Language code

A language code is a standardized identifier used to represent individual languages, language variants, and language groups consistently across international contexts, as defined by the ISO 639 series of standards developed by the International Organization for Standardization (ISO). These codes, typically two or three lowercase letters, enable precise referencing in software localization, bibliographic systems, web content tagging, and linguistic research, promoting interoperability and reducing ambiguity in global communication. For example, "en" denotes English in the two-letter format, while "eng" serves the same purpose in the three-letter format.

The standards, continually updated since their first publication, form a harmonized framework that specifies rules for the selection, formation, presentation, and usage of these identifiers, including reference names in English and French. The core parts are ISO 639-1, which provides 184 two-letter codes for widely used languages, primarily for terminology and general use; ISO 639-2, which offers three-letter codes for bibliographic (B) and terminological (T) purposes covering 486 languages; and ISO 639-3, which extends coverage to approximately 7,900 individual languages (as of 2024) for comprehensive ethnographic and linguistic applications. Additional parts, such as ISO 639-5 for language families and ISO 639-6 for language variants (alpha-4 codes, since withdrawn), address hierarchical relationships among languages.

Maintained by the ISO 639/RA (Registration Authority) in collaboration with partner organizations, the standards exclude reconstructed proto-languages, computer programming languages, and markup languages in order to focus solely on human languages. The latest edition, ISO 639:2023, emphasizes principles for combining language codes with other identifiers, such as country codes from ISO 3166, to form extended tags like "en-US" for American English, widely adopted in protocols such as those from the Internet Engineering Task Force (IETF). These codes play a critical role in multilingual environments, supporting data processing and cultural preservation efforts worldwide.

Definition and Purpose

Core Definition

A language code is a standardized abbreviation or identifier used to represent languages, dialects, or language families in a concise, machine-readable format, typically consisting of two or three letters. These codes uniquely identify linguistic entities, encompassing individual languages (whether living, extinct, ancient, or constructed), variants, and broader groups such as families. Developed under international standards like ISO 639, they ensure consistency across global applications without relying on lengthy descriptive names.

Language codes are distinct from related identifiers in other ISO standards, such as country codes under ISO 3166, which denote geographic territories and their subdivisions using two- or three-letter alpha codes (e.g., "US" for United States), or script codes under ISO 15924, which specify writing systems like Latin or Cyrillic with four-letter codes (e.g., "Latn" for Latin script). Language codes address linguistic classification only, while these other codes address territory and orthography; keeping the three apart prevents conflation in combined tagging systems.

Basic formats include two-letter alpha-2 codes for widely used languages (e.g., "en" for English, "fr" for French) and three-letter alpha-3 codes for broader or more specific coverage (e.g., "eng" for English, "fra" for French). These short formats prioritize brevity and universality for computational processing. The primary purpose of language codes is to provide unambiguous, machine-readable labels that reduce confusion in multilingual environments, such as software localization, data interchange, and content tagging, where precise language attribution is essential for functionality and accessibility.

Primary Uses

Language codes play a crucial role in internationalization (i18n), enabling software and applications to adapt content for diverse linguistic and cultural contexts. They are used to tag user interfaces, messages, and resources during localization, allowing developers to select appropriate translations based on user preferences, such as displaying text in English (en) or Spanish (es), including variants like es-MX for Mexican Spanish. Separating translatable elements from code in this way makes content management in global software products more efficient, reducing development costs and improving the experience of users across regions.

In linguistic documentation, language codes support the cataloging and preservation of languages, particularly endangered ones, by providing standardized identifiers for resources like dictionaries, grammars, and audio recordings. For instance, ISO 639-3 codes, such as "ayb" for Ayizo, are employed in databases to track approximately 7,900 languages (as of 2024), including those at risk of extinction, aiding researchers in organizing and accessing materials for revitalization efforts. This systematic coding ensures consistent referencing in academic and archival systems, helping to document linguistic diversity before potential loss.

Language codes are integral to data exchange formats like XML and JSON, where they specify the language of content to ensure accurate interpretation and processing across systems. In XML, the xml:lang attribute, using BCP 47 tags (e.g., fr for French), declares the natural language of elements to support rendering, searching, and accessibility features in documents. Similarly, in JSON-based APIs and metadata schemas, these codes appear in fields that denote the language of strings, promoting interoperability in web services and data serialization.

In global communication protocols, language codes enable content negotiation and the specification of content languages to facilitate multilingual interactions. For content negotiation, the HTTP Accept-Language request header (e.g., en-US,fr) allows clients to state their preferred languages, while servers respond with a Content-Language header indicating the language of the delivered resource. In email protocols, the Content-Language header, defined in RFC 3282, tags messages with codes like de for German, assisting recipients and filters in handling multilingual correspondence.
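The negotiation step described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a complete implementation of HTTP content negotiation: the function names `parse_accept_language` and `choose_language` are hypothetical, the wildcard range `*` is not handled, and matching falls back only from a full tag to its primary language subtag.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (tag, weight) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        weight = 1.0  # a range without an explicit q parameter has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip(), weight))
    # sorted() is stable, so ranges with equal weight keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: list[str], default: str = "en") -> str:
    """Return the first available language that satisfies the client's preferences."""
    lookup = {tag.lower(): tag for tag in available}
    for tag, weight in parse_accept_language(header):
        if weight <= 0:
            continue
        candidate = tag.lower()
        # Try the full tag first, then its primary language subtag (en-US -> en)
        for key in (candidate, candidate.split("-")[0]):
            if key in lookup:
                return lookup[key]
    return default


print(parse_accept_language("en-US,fr;q=0.8,de;q=0.5"))
print(choose_language("en-US,fr;q=0.8", ["fr", "de"]))
```

A server using this sketch would then echo the chosen value in its Content-Language response header.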

Historical Development

Early Classification Systems

In the 19th century, efforts to systematize language identification emerged within comparative linguistics, focusing on genealogical classification rather than standardized codes. August Schleicher, a German linguist, advanced this through his Stammbaumtheorie (family-tree theory), which he likened to biological evolution in his 1863 work Die Darwinsche Theorie und die Sprachwissenschaft and applied initially to the Indo-European languages. This approach treated languages as organic entities evolving from common ancestors, enabling the reconstruction of proto-languages and laying foundational principles for cataloging linguistic diversity. Schleicher's methods built on earlier comparative works, such as those of Franz Bopp, which emphasized systematic comparison of linguistic forms to establish language families. These 19th-century endeavors were precursors to modern language databases in that they prioritized exhaustive inventories and hierarchical classifications of the world's languages, though they relied on descriptive nomenclature rather than abbreviated codes. For instance, Schleicher's Compendium der vergleichenden Grammatik der indogermanischen Sprachen (1861–62) provided detailed typologies that influenced subsequent ethnolinguistic surveys.

In the early 20th century, the Summer Institute of Linguistics (SIL), founded in 1934 by William Cameron Townsend, initiated extensive ethnolinguistic surveys to document underrepresented languages, particularly in the Americas. These surveys, starting with fieldwork among indigenous groups such as the Kaqchikel in Guatemala and communities in Mexico, aimed to identify and describe languages for translation and literacy programs, producing informal lists that cataloged hundreds of varieties by the 1940s. SIL's efforts emphasized practical identification through native names and geographic markers, influencing later code development; in 1951, this work led to the first edition of Ethnologue, a language inventory that initially covered only 46 entries.

Parallel to SIL's initiatives, library systems began adopting abbreviated identifiers for languages in cataloging. The Library of Congress developed three-letter codes in the 1960s as part of the MARC (Machine-Readable Cataloging) format to standardize bibliographic entries, predating formal ISO standards and facilitating efficient indexing of multilingual materials. These codes, such as "eng" for English, were used internally for years before alignment with international norms.

By the 1950s, international organizations recognized the need for global catalogs to support education and cultural preservation. UNESCO's 1953 monograph The Use of Vernacular Languages in Education urged comprehensive linguistic surveys to map mother tongues worldwide, leading to informal lists and inventories compiled through collaborative efforts with linguists and governments. This report highlighted the urgency of documenting the world's many languages, setting the stage for standardized systems.

Modern Standardization

The modern standardization of language codes began with the establishment of ISO/TC 37, the International Organization for Standardization's technical committee on language and terminology, which became operational in 1952 to coordinate the principles of terminology work and later expanded to include language coding standards. This committee provided the institutional framework for developing systematic, internationally agreed-upon codes, shifting from earlier ad hoc systems toward formalized, maintainable identifiers suitable for global use.

Key milestones in this evolution include the publication of the first edition of ISO 639 in 1988, which introduced two-letter alpha-2 codes for major languages and was coordinated with the country codes of ISO 3166 to facilitate bibliographic and terminological applications. This was followed by ISO 639-2 in 1998, which established three-letter alpha-3 codes for bibliographic and terminological contexts, expanding coverage to more language varieties and to language groups. The most significant advancement came with ISO 639-3 in 2007, which aimed to assign a unique three-letter code to every known individual language, including extinct and ancient ones, thereby creating a comprehensive registry.

A pivotal role in this expansion was played by SIL International, designated as the registration authority for ISO 639-3, which developed the standard from extensive linguistic data in sources like Ethnologue and processed requests that brought coverage to over 7,000 living languages by the late 2000s. Ongoing updates and revisions have further refined the system; for instance, with the publication of ISO 639-3 in 2007, specific codes were assigned to individual constructed (artificial) languages, building on the collective "art" identifier from ISO 639-2 to accommodate growing interest in engineered languages such as those used in fiction and international communication. These developments ensure the codes remain adaptable to emerging needs while maintaining stability for practical implementation.

Classification Challenges

Linguistic and Dialectal Issues

One of the central challenges in assigning language codes arises from the debate over distinguishing languages from dialects, where mutual intelligibility serves as a primary linguistic criterion but often conflicts with sociopolitical realities. According to ISO 639-3, varieties are considered distinct languages if they lack mutual intelligibility or form part of a chain in which intelligibility diminishes significantly between the endpoints. However, this criterion proves problematic in cases like Serbian (srp) and Croatian (hrv), which are almost fully mutually intelligible in their standard forms because they share grammar, phonology, and core vocabulary, yet receive separate codes under ISO 639-3 owing to post-Yugoslav national identities and political separation.

Dialect continua further complicate coding efforts, as gradual variation across regions blurs the boundaries between distinct varieties. In such continua, speakers at adjacent points understand each other well, but speakers at distant points do not, so any line drawn for code assignment is to some degree arbitrary. Arabic exemplifies this issue: it forms a continuum stretching across North Africa and the Middle East, where Standard Arabic (arb) coexists with highly divergent spoken forms. ISO 639-3 assigns around 30 individual codes to these varieties to capture their limited intelligibility with the standard and with one another.

To address these challenges, ISO 639-3 introduces the concept of macrolanguages, which group closely related varieties under a single code while allowing individual codes for member varieties that lack full mutual intelligibility. Arabic (ara) functions as such a macrolanguage, unifying approximately 30 specific codes (e.g., Egyptian Arabic, arz; Levantine Arabic, apc) that represent a cluster of varieties treated as a cohesive unit in broader contexts like international standards. This approach balances linguistic granularity with practical utility, though it still requires decisions on inclusion based on shared lexical and structural features.

Sociopolitical factors profoundly influence code assignments, often overriding purely linguistic criteria, particularly in post-colonial settings. In Africa, colonial legacies elevated European languages to official status while fragmenting indigenous ones, and codes have since been split or merged under policies aimed at fostering unity or ethnic recognition. For instance, post-independence governments in countries like Senegal have promoted vernaculars such as Wolof (wol) through language policy, elevating its status despite its continuum ties to other varieties, reflecting efforts to counter colonial hierarchies.
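The macrolanguage grouping described above amounts to a lookup from individual codes to an umbrella code. The following Python sketch illustrates the idea with a tiny hand-written table; the table is an illustrative subset rather than the full ISO 639-3 macrolanguage mapping, and the names `MACROLANGUAGE_MEMBERS`, `macrolanguage_of`, and `same_macrolanguage` are hypothetical.

```python
from typing import Optional

# Illustrative subset only: the real ISO 639-3 macrolanguage tables are far larger.
MACROLANGUAGE_MEMBERS = {
    "ara": {"arb", "arz", "apc"},  # Arabic: Standard, Egyptian, Levantine
    "zho": {"cmn", "yue"},         # Chinese: Mandarin, Yue
}

# Reverse index built once: individual code -> macrolanguage code
MEMBER_TO_MACRO = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage that contains this individual code, if any."""
    return MEMBER_TO_MACRO.get(code.lower())


def same_macrolanguage(a: str, b: str) -> bool:
    """True when both codes resolve to the same macrolanguage (or are identical)."""
    macro_a = macrolanguage_of(a) or a.lower()
    macro_b = macrolanguage_of(b) or b.lower()
    return macro_a == macro_b


print(macrolanguage_of("arz"))           # ara
print(same_macrolanguage("arz", "apc"))  # True
print(same_macrolanguage("arz", "cmn"))  # False
```

An application that only distinguishes "Arabic" from "not Arabic" can compare at the macrolanguage level, while one that needs dialect-specific resources keeps the individual codes.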

Practical Implementation Difficulties

The proliferation of language codes in standards like ISO 639-3, which encompasses approximately 7,900 individual codes for known human languages, poses significant maintenance challenges because linguistic vitality is dynamic. This expansive set requires ongoing updates to account for newly documented languages, such as minority tongues in remote regions, and for entries that must be retired. SIL International, as the registration authority, processes code changes annually, but the sheer volume of entries (covering living, extinct, ancient, and constructed languages) demands rigorous verification to prevent redundancies or inaccuracies. These updates keep coverage comprehensive but strain resources, as linguistic surveys must continually monitor global diversity to propose additions, or retirements for languages shown to be non-existent or merged with others.

Mapping between different coding schemes, particularly the limited set of 184 two-letter codes in ISO 639-1 and the detailed three-letter identifiers of ISO 639-3, introduces incompatibilities that complicate practical adoption in software and databases. ISO 639-1 prioritizes major languages for broad interoperability, often using macrolanguage codes like "zh" for Chinese, which corresponds to many distinct entries in ISO 639-3 (e.g., "cmn" for Mandarin Chinese, "yue" for Yue Chinese). This mismatch in granularity causes problems in transitional systems, where collective codes such as "cai" for Central American Indian languages in ISO 639-2 have no single counterpart among the individual identifiers of ISO 639-3 and may require extensive remapping. Such discrepancies hinder seamless integration, as developers must implement fallback mechanisms to handle unmapped or retired codes without disrupting functionality.

The registration authority process for ISO 639-3, managed by SIL International, adds further delay through its structured but time-intensive approval workflow. Proposals for new codes, modifications, or retirements are accepted on an annual cycle, followed by public posting for review until mid-December, with final decisions processed early in the subsequent year and published by January 31. This timeline typically spans 6 to 12 months, depending on the submission date, and involves evaluation by linguists to verify linguistic distinctiveness and avoid conflicts with existing codes. While this review safeguards the quality of the code set, it slows responses to urgent needs, such as documenting endangered languages before they vanish.

Coverage gaps persist for specialized language types like sign languages and creoles, though updates in the 2020s have incrementally extended the partial coverage present since the standard's inception in 2007. ISO 639-3 initially drew on Ethnologue data, which under-represented sign languages, leading to only a limited set of codes (e.g., "bzs" for Brazilian Sign Language) until later revisions incorporated more sign languages on the basis of improved documentation. Similarly, creoles, often viewed as hybrid forms, faced inconsistent classification, with codes like "cab" for Garifuna added progressively to reflect their status as distinct natural languages, while ongoing requests highlight remaining omissions for lesser-documented creoles in multilingual regions. These enhancements via annual change requests mitigate gaps but underscore the challenge of balancing exhaustive coverage with verifiable evidence.

Major Coding Schemes

ISO 639 Standards

The ISO 639 standards form a hierarchical family of international codes developed by the International Organization for Standardization (ISO) to represent names of languages and language groups in a compact, unambiguous manner, facilitating their use in applications such as software localization, bibliographic systems, and international communication. These codes are maintained through designated agencies and evolve to address varying levels of linguistic granularity, from major world languages to individual varieties and families. The standards emphasize stability: codes are assigned according to established linguistic criteria, and retired identifiers are never reused, which preserves historical integrity. The latest edition, ISO 639:2023, harmonizes the framework and specifies principles for language coding.

ISO 639-1 provides two-letter alphabetic codes for 184 major languages, designed for general-purpose applications where brevity is essential, such as software localization and web standards. These codes prioritize widely spoken national or international languages, ensuring broad accessibility without requiring extensive lists. For example, "en" denotes English and "fr" denotes French, allowing simple identification in contexts like user interfaces or metadata tagging.

ISO 639-2 extends this framework with three-letter codes. For a small number of languages it defines two variants: the bibliographic variant (e.g., "fre" for French), used primarily in library catalogs and academic indexing, and the terminological variant (e.g., "fra" for French), applied in technical documentation and terminology databases. For most languages, such as English ("eng"), the two variants are identical. This part covers approximately 464 individual languages and some groups, bridging the gap between broad usage and detailed cataloging needs while harmonizing with ISO 639-1 where possible.

ISO 639-3 further expands coverage to approximately 7,900 known languages (as of 2024), including living, extinct, ancient, and constructed ones, using unique three-letter codes to achieve near-comprehensive representation of global linguistic diversity. Maintained by SIL International, it allocates codes through a formal request process that evaluates linguistic distinctiveness, and retired codes are never reused so that references remain consistent over time. An example is "ara" for Arabic, which supports detailed ethnolinguistic analysis in research.

ISO 639-5 introduces three-letter codes for language families and groups, supplementing earlier parts by enabling representation of broader classifications not covered as individual languages. For instance, "afa" identifies the Afro-Asiatic language family, encompassing branches like Semitic and Berber, which aids in organizing linguistic hierarchies for educational and archival purposes.
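The relationship between the two-letter codes and the bibliographic (B) and terminological (T) three-letter variants can be illustrated with a small lookup table. The Python sketch below uses a hand-picked sample of entries, not the full code lists, and the names `LanguageCodes`, `SAMPLE`, `to_alpha3`, and `to_alpha2` are hypothetical helpers introduced only for this illustration.

```python
from typing import NamedTuple, Optional


class LanguageCodes(NamedTuple):
    alpha2: Optional[str]  # ISO 639-1 code, absent for many entries
    alpha3_b: str          # ISO 639-2 bibliographic code
    alpha3_t: str          # ISO 639-2 terminological code
    name: str


# Small illustrative sample; the real tables hold hundreds of entries.
SAMPLE = [
    LanguageCodes("en", "eng", "eng", "English"),
    LanguageCodes("fr", "fre", "fra", "French"),
    LanguageCodes("de", "ger", "deu", "German"),
    LanguageCodes("zh", "chi", "zho", "Chinese"),
    LanguageCodes(None, "afa", "afa", "Afro-Asiatic languages"),
]

# Index every known code form (alpha-2, B, and T) to its record
INDEX = {}
for record in SAMPLE:
    for code in (record.alpha2, record.alpha3_b, record.alpha3_t):
        if code is not None:
            INDEX[code] = record


def to_alpha3(code: str, bibliographic: bool = False) -> Optional[str]:
    """Convert any known code form to its three-letter T (default) or B code."""
    record = INDEX.get(code.lower())
    if record is None:
        return None
    return record.alpha3_b if bibliographic else record.alpha3_t


def to_alpha2(code: str) -> Optional[str]:
    """Convert any known code form to its two-letter code, if one exists."""
    record = INDEX.get(code.lower())
    return record.alpha2 if record else None


print(to_alpha3("fr"))                       # fra
print(to_alpha3("fra", bibliographic=True))  # fre
print(to_alpha2("ger"))                      # de
print(to_alpha2("afa"))                      # None
```

Returning `None` for unknown or unmapped codes leaves the decision about a fallback to the caller, which matters because many three-letter codes have no two-letter equivalent.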

IETF BCP 47 and Extensions

The IETF Best Current Practice 47 (BCP 47) provides a standardized framework for constructing language tags that identify human languages in Internet protocols and applications, extending beyond standalone identifiers by incorporating additional subtags for greater specificity. A tag is a sequence of one or more subtags separated by hyphens, following the general structure: primary language subtag, optionally followed by script, region, variant, extension, and private use subtags (e.g., "en-Latn-US" for English in Latin script as used in the United States). The primary language subtag is typically a two- or three-letter code from ISO 639; the script subtag uses four-letter codes from ISO 15924 to denote writing systems; the region subtag employs two-letter codes from ISO 3166-1 or three-digit codes from UN M.49 for geographic or administrative areas; and variant subtags (five to eight alphanumeric characters, or four characters beginning with a digit) are registered to indicate specific dialects or historical forms. This integration allows BCP 47 tags to combine linguistic and contextual elements into a single, extensible identifier suitable for HTTP, XML, and other web standards.

BCP 47 supports key extensions to accommodate specialized or legacy needs within its structure. Private use subtags begin with "x-" followed by one or more subtags defined by private agreement among users, enabling custom extensions without conflicting with registered elements (e.g., "en-x-foo" for a proprietary variant of English). Grandfathered tags, which predate the modern registry, are preserved for backward compatibility and include irregular forms starting with "i-" (e.g., "i-navajo" for Navajo, now mapped to the preferred value "nv") or other legacy patterns; no new tags of this kind can be registered, and many are mapped to preferred equivalents in the registry. Extension subtags, introduced by single-character singletons (e.g., "u-" for Unicode locale extensions as defined in RFC 6067), allow for standardized additions such as calendar or numbering systems, further enhancing the tag's utility in software and protocols.

The system is governed by RFC 5646 (published in 2009), which defines the syntax, semantics, and validity rules for tags, including their case-insensitive treatment and canonicalization to ensure interoperability. It establishes an IANA-maintained registry of subtags and grandfathered tags, whose update with the current set of subtags is documented in RFC 5645, to track descriptions, deprecations, and preferred values while preventing conflicts. The companion matching rules of BCP 47 (RFC 4647) prioritize exact matches but allow fallback to broader tags (e.g., a request for "en-US" being satisfied by "en" if needed), supporting flexible language negotiation in applications. This framework has been widely adopted in IETF RFCs for protocols requiring language identification, promoting consistency across the Internet ecosystem.
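The subtag structure described above can be made concrete with a short parser. The Python sketch below is a deliberately simplified illustration, not a conforming RFC 5646 implementation: it recognizes only language, script, region, variant, and private use subtags, omits extended language subtags, extension singletons, and grandfathered tags, and does not check subtags against the IANA registry. The names `TAG_RE`, `parse_tag`, and `normalize_case` are hypothetical.

```python
import re
from typing import Optional

# Simplified shape: language[-script][-region][-variant...][-x-private...]
TAG_RE = re.compile(
    r"""^
    (?P<language>[a-z]{2,3})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?
    $""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_tag(tag: str) -> Optional[dict]:
    """Split a tag into its subtags, or return None if it does not fit the shape."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    private = match.group("private")
    return {
        # Conventional casing: language lower, script Title, region UPPER
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": [v.lower() for v in match.group("variants").split("-") if v],
        "private": [p.lower() for p in private.split("-") if p] if private else [],
    }


def normalize_case(tag: str) -> Optional[str]:
    """Rebuild the tag with conventional casing (case only, no registry mapping)."""
    parts = parse_tag(tag)
    if parts is None:
        return None
    subtags = [parts["language"]]
    if parts["script"]:
        subtags.append(parts["script"])
    if parts["region"]:
        subtags.append(parts["region"])
    subtags.extend(parts["variants"])
    if parts["private"]:
        subtags.append("x")
        subtags.extend(parts["private"])
    return "-".join(subtags)


print(parse_tag("en-Latn-US"))
print(normalize_case("EN-latn-us"))  # en-Latn-US
```

Because tags are case-insensitive, the casing applied here is purely conventional; a production system would more likely rely on an established internationalization library than on a hand-written pattern.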

Applications and Implementation

In Computing and Software

Language codes, standardized primarily through IETF BCP 47, are integral to internationalization and text processing, enabling applications to handle multilingual content by specifying the language for rendering, collation, and script selection. In text rendering, these tags inform processes like layout and font fallback, helping complex scripts display correctly when combined with Unicode encoding, which supports 159,801 assigned characters across 172 scripts in Unicode 17.0 (as of September 2025). For instance, a language tag like "ar-SA" indicates Arabic as used in Saudi Arabia, text that is rendered right to left, which libraries like ICU (International Components for Unicode) can use to tailor processing.

In web technologies, language codes are applied via the HTML lang attribute to declare the primary language of document elements, aiding accessibility tools, search engines, and styling by informing screen readers and hyphenation rules. This attribute accepts BCP 47 tags, such as lang="fr-CA" for Canadian French, and the declared language is inherited by child elements unless overridden. In CSS, the :lang() pseudo-class selector uses these codes for language-specific styling, such as applying a font to French text with :lang(fr) { font-family: Garamond; }, allowing targeted rules without altering the HTML structure.

Programming libraries leverage language codes for localization, adapting output to cultural conventions like date formats or currency symbols. In Python, the locale module uses codes in identifiers like "en_US.UTF-8" to set regional settings; for example, calling locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8') selects German number formatting with commas as decimal separators, provided that locale is installed on the system. Similarly, Java's Locale class constructs objects from language and country codes, as in Locale.forLanguageTag("ja-JP"), which influences DateFormat and NumberFormat output such as currency symbols and year-month-day ordering.

For content management, language codes facilitate tagging in databases to enforce locale-specific collation during queries. In SQL Server, the COLLATE clause applies rules like COLLATE French_CI_AS for case-insensitive French sorting, ensuring accurate comparisons in multilingual tables. Search engines use these codes in hreflang annotations to deliver language-targeted results; for example, Google interprets hreflang="es-MX" as marking Spanish-language content intended for users in Mexico, improving relevance in multilingual queries.

Systems address unknown or ambiguous codes through fallback mechanisms, defaulting to the "und" (undetermined) tag available in BCP 47 when no specific language matches, which prevents errors when processing mixed or unidentified content. This allows graceful degradation, such as rendering text without language-specific hyphenation, while broader matching rules cover related variants, for example falling back from "en-GB" to "en".
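The fallback behavior described above can be sketched as progressive truncation of a tag, in the spirit of the lookup scheme in RFC 4647. The Python code below is a minimal illustration rather than a full implementation of that RFC; the function names `fallback_chain` and `resolve`, and the choice of "und" as the final default, are assumptions made for this example.

```python
def fallback_chain(tag: str) -> list[str]:
    """List the tag followed by progressively shorter prefixes of it."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag, so drop it too
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def resolve(tag: str, available: list[str], default: str = "und") -> str:
    """Return the most specific available tag for the request, else the default."""
    lookup = {candidate.lower(): candidate for candidate in available}
    for candidate in fallback_chain(tag):
        match = lookup.get(candidate.lower())
        if match is not None:
            return match
    return default


print(fallback_chain("zh-Hant-CN-x-private1-private2"))
print(resolve("en-GB", ["en", "fr"]))  # en
print(resolve("ja-JP", ["en", "fr"]))  # und
```

Returning "und" instead of raising an error lets the caller continue with language-neutral behavior, such as skipping hyphenation, when nothing suitable is available.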

In Linguistics and International Standards

In linguistics, language codes play a crucial role in documenting and assessing the vitality of the world's languages through comprehensive catalogs. Ethnologue, a primary reference for language data, employs ISO 639-3 three-letter codes to identify over 7,000 living languages, enabling detailed entries on their ecology, speaker populations, and status. These codes support vitality assessments using the Expanded Graded Intergenerational Disruption Scale (EGIDS), which evaluates degrees of endangerment based on intergenerational transmission and institutional support, as seen in the 26th edition's digital profiles for 7,168 languages. Similarly, Glottolog uses its own Glottocodes, stable alphanumeric identifiers, to catalog languages, dialects, and families; it does not provide vitality metrics but offers a foundational inventory for linguistic research and documentation with precise cross-referencing. The UNESCO Atlas of the World's Languages in Danger integrates these standards, assigning ISO 639 codes to approximately 2,500 endangered languages to map their geographic distribution, speaker numbers, and threat levels, aiding global preservation efforts.

Language codes are embedded in international agreements to facilitate multilingual access and the protection of intellectual property. In World Intellectual Property Organization (WIPO) data exchange standards such as ST.86, two-letter codes specify the languages of document submissions and translations, supporting consistent handling of filings across linguistic boundaries in over 190 member states. The European Union's multilingual policies, governing 24 official languages, incorporate ISO 639 codes in document classification and legislative drafting to standardize language tagging in official communications, supporting equal access under the Treaty on the Functioning of the European Union (Article 342). At the United Nations, translation services for the six official languages (Arabic, Chinese, English, French, Russian, Spanish) and others use three-letter codes to manage over 10,000 documents annually, enabling efficient workflow in multilingual proceedings as outlined in the UN's documentation guidelines.

Bibliographic standards in libraries rely on language codes for precise cataloging and retrieval. The MARC (Machine-Readable Cataloging) format, maintained by the Library of Congress, adopts three-letter ISO 639-2 codes in the 041 field to denote the language of textual content, original versions, and translations, facilitating global interoperability among shared catalogs that index millions of records. This integration, harmonized since 1998, allows librarians to tag resources accurately, supporting multilingual discovery in academic and public collections.

In linguistic research, language codes enable the tagging of datasets in corpus linguistics for cross-linguistic analysis. Tools such as Wmatrix use language identifiers to annotate multilingual corpora, allowing researchers to compare syntactic patterns, semantic shifts, or discourse features across languages in projects examining typological diversity. This standardized tagging, as in Glottolog-linked datasets, underpins comparative studies by ensuring consistent language identification, as evidenced in analyses of over 80 phylogenetic trees derived from coded inventories.

References

  1. [1]
    ISO 639:2023 - Code for individual languages and language groups
    In stockThis document specifies the ISO 639 language code and establishes the harmonized terminology and general principles of language coding.
  2. [2]
    ISO 639-2 Language Code List - Library of Congress
    Dec 21, 2017 · ISO 639-2 is the alpha-3 code in Codes for the representation of names of languages-- Part 2. There are 20 languages that have alternative codes.
  3. [3]
    ISO 639-1:2002 - Codes for the representation of names of languages
    This part of ISO 639 provides a code consisting of language code elements comprising two-letter language identifiers for the representation of names of ...
  4. [4]
    ISO 639:2023(en), Code for individual languages and language ...
    This document specifies the ISO 639 language code and establishes the harmonized terminology and general principles of language coding. It provides rules for ...
  5. [5]
    ISO 639-2:1998(en), Codes for the representation of names of ...
    This part of ISO 639 provides two sets of three-letter alphabetic codes for the representation of names of languages, one for terminology applications and the ...
  6. [6]
    ISO 639-2 Language Code Agency - The Library of Congress
    This document contains the ISO 639-2 Alpha-3 codes for the representation of names of languages.Codes for the representation of... · Development of ISO 639-2 · Date of changeMissing: early 20th century pre-<|separator|>
  7. [7]
    ISO 639 — Language code
    ISO 639, Code for individual languages and language groups, can be applied across many types of organization and situations.
  8. [8]
    ISO 3166 — Country Codes
    ISO 3166 is an international standard which defines codes representing names of countries and their subdivisions. The standard specifies basic guidelines for ...Country Codes Collection · Glossary for ISO 3166 · ISO 3166-1:2020 · ISO/TC 46Missing: 15924 | Show results with:15924
  9. [9]
    ISO 15924 Registration Authority - Unicode
    Welcome to the official site of the ISO 15924 Registration Authority (ISO 15924/RA). ISO has appointed the Unicode Consortium as the Registration Authority for ...
  10. [10]
    ISO 639-1:2002(en), Codes for the representation of names of ...
    ISO 639-1 was devised primarily for use in terminology, lexicography and linguistics. ISO 639-2 represents all languages contained in ISO 639-1 and in addition ...
  11. [11]
    Frequently Asked Questions (FAQ) - Codes for the representation of ...
    ISO 639 provides two sets of language codes, one as a two-character code set (639-1) and another as a three-character code set (639-2) for the representation ...
  12. [12]
    Standard locale names - Globalization - Microsoft Learn
    Apr 8, 2025 · BCP 47 is the standard that defines the most-commonly used format for specifying a locale. This format is used by Windows and many other environments.
  13. [13]
    Choosing a language tag - W3C
    The BCP 47 specification allows for an additional, 3-letter subtag immediately after the initial primary language subtag. This is called an extended language ...
  14. [14]
    Three-letter Codes for Identifying Languages - Ethnologue
    ISO 639-3 was devised to enable the uniform identification of all known languages in a wide range of applications, particularly including information systems.
  15. [15]
    RFC 9110: HTTP Semantics
    HTTP uses language tags within the Accept-Language and Content-Language header fields. Accept-Language uses the broader language-range production defined in ...
  16. [16]
    RFC 3282: Content Language Headers
    This document defines a "Content-language:" header, for use in cases where one desires to indicate the language of something that has RFC 822-like headers.
  17. [17]
    August Schleicher | Indo-European languages, comparative ...
    The comparative method was developed and used successfully in the 19th century to reconstruct this parent language, Proto-Indo-European, and has since been ...
  18. [18]
    Our History - SIL Global
    SIL Global (then known as the Summer Institute of Linguistics) began in 1934 as a summer training program in Arkansas, USA, with two students.
  19. [19]
    History | Ethnologue Free
    Under the new scheme, SIL International has been named as the Language Coding Agency for preprocessing requests for changes to ISO 639-3 and for publishing ...
  20. [20]
    [PDF] The Use of vernacular languages in education
    Linguistic Factors. A careful survey of the linguistic situation of a region by linguists is essential before it is decided which languages should be used in.
  21. [21]
    History of ISO/TC 37. - Standardization - Infoterm
    Apr 29, 2013 · In 1936 ISA/TC 37 “Terminology” was established. In 1952 ISO/TC 37 “Terminology (principles and co-ordination)” became operational, its ...
  22. [22]
    History of ISO 639. - Infoterm
    Mar 23, 2024 · ISO 639 and ISO 3166. In 1988, ISO/TC 37 published the 1st edition of ISO 639 Code for the representation of names of languages coinciding with ...
  23. [23]
    ISO 639-2:1998 - Codes for the representation of names of languages
    Publication date: 1998-11. Stage: Withdrawal of International Standard [95.99]. Edition: 1. Number of pages: 66. Technical Committee: ISO/TC 37/SC 2.
  24. [24]
    ISO 639-3:2007 - Codes for the representation of names of languages
    ISO 639-3:2007 attempts to provide as complete an enumeration of languages as possible, including living, extinct, ancient and constructed languages.
  25. [25]
    ISO 639-3 Language Codes Released with SIL as Registration ...
    Feb 5, 2007 · The new standard, released February 5, 2007, greatly expands upon the 478 codes formerly provided by ISO 639-2, having the goal of comprehensive ...
  26. [26]
    ISO 639-3
    - **Number of Codes**: Not explicitly stated, but based on Ethnologue 15th edition and harmonized with ISO 639-2 and Linguist List.
  27. [27]
    How to Distinguish Languages and Dialects - MIT Press Direct
    The more practical problem with the criterion of mutual intelligibility is that measurements are usually simply not available. The second approach was ...
  28. [28]
    [PDF] To what degree are Croatian and Serbian the same language?
    Thus the fact of mutual intelligibility is difficult to raise to the level of linguistic evidence in favor of linguistic identity, especially because of the ...
  29. [29]
    [PDF] The Ethics of Language Development - SIL Global
    For example, Arabic [ara] is classified as a macrolanguage that includes about thirty different varieties of Arabic, each with its own ISO 639-3 code.
  30. [30]
    Arabic Language (ARA) - Ethnologue
    Arabic is classified as a “macrolanguage” in the ISO 639 standard and is assigned to [ara] as its three-letter code.
  31. [31]
    [PDF] Colonization, Globalization and Language Vitality in Africa
    Aug 29, 2008 · Overall, the populations the most isolated from the socioeconomic structure inherited from the colonial regimes are those that have held on to ...
  32. [32]
    How many languages are there in the world? | Ethnologue Free
    7159 languages are in use today. That number is constantly in flux, because we're learning more about the world's languages every day.
  33. [33]
    FAQs | SIL Global
    Accepted changes are processed in early January and posted by January 31st the following year.
  34. [34]
    [PDF] Issues to Resolve in ISO 639 - Unicode
    A macro-language, then, must be an entry in ISO 639-1 or ISO 639-2 that corresponds to multiple closely-related entries in ISO 639-3 that share a common name ...
  35. [35]
    [PDF] Lexvo.org: Language-Related Information for the Linguistic Linked ...
    For deprecated ISO 639-3 language codes, the system points to relevant alternative language identifiers. The information that Lexvo.org serves is ...
  36. [36]
    [PDF] PCC Guidelines for the Use of ISO 639-3 Language Codes in MARC ...
    Jan 12, 2023 · SIL is the official maintainer of ISO 639-3. ISO 639-3 language codes may be searched or downloaded at the SIL site at: https://iso639-3.sil ...
  37. [37]
    ISO 639-4:2010 - Codes for the representation of names of languages
    ISO 639-4:2010 gives the general principles of language coding using the codes that are specified in the other parts of ISO 639 and their combination with ...
  38. [38]
    ISO 639-5 Language Coding Agency - The Library of Congress
    Set 5 of ISO 639 supplements the coding of language groups and language families in ISO 639-2. ... This is the official site of the ISO 639-5 Language ...
  39. [39]
    Codes for the representation of names of languages (ISO 639-5 ...
    ISO 639-5 codes ordered by Identifier ; euq, Basque (family), basque (famille) ; fiu, Finno-Ugrian languages, finno-ougriennes, langues ; fox, Formosan languages ...
  40. [40]
    ISO 639-6:2009 - Codes for the representation of names of languages
    ISO 639-6:2009 specifies a method for establishing four-letter language identifiers (alpha-4) and language reference names for language variants.
  41. [41]
    RFC 5646 - Tags for Identifying Languages - IETF Datatracker
    This document describes the structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language ...
  42. [42]
  43. [43]
  44. [44]
  45. [45]
  46. [46]
  47. [47]
  48. [48]
  49. [49]
    RFC 5645 - Update to the Language Subtag Registry
    This memo defines the procedure used to update the IANA Language Subtag Registry, in conjunction with the publication of RFC 5646, for use in forming tags for ...
  50. [50]
  51. [51]
    iana.org language subtag registry
    ... language Subtag: ab Description: Abkhazian Added: 2005-10-16 Suppress-Script: Cyrl %% Type: language Subtag: ae Description: Avestan Added: 2005-10-16 ...
  52. [52]
    Unicode Locale Data Markup Language (LDML)
    Unicode language and locale identifiers inherit the design and the repertoire of subtags from [BCP47] Language Tags. There are some extensions and restrictions ...
  53. [53]
    Language tags in HTML and XML - W3C
    Mar 3, 2014 · Language tag syntax is defined by the IETF's BCP 47. BCP stands for 'Best Current Practice', and is a persistent name for a series of RFCs ...
  54. [54]
    locale — Internationalization services — Python 3.14.0 documentation
    The `locale` module accesses the POSIX locale database, allowing programmers to handle cultural issues without knowing each country's specifics.
  55. [55]
    Locale (Java Platform SE 8 ) - Oracle Help Center
    Use getCountry to get the country (or region) code and getLanguage to get the language code. You can use getDisplayCountry to get the name of the country ...
  56. [56]
    COLLATE (Transact-SQL) - SQL Server - Microsoft Learn
    Mar 4, 2025 · The COLLATE clause can be applied only for the char, varchar, text, nchar, nvarchar, and ntext data types. COLLATE uses collate_name to refer to ...
  57. [57]
    Language Tags and Locale Identifiers for the World Wide Web - W3C
    Oct 7, 2020 · The Unicode locale language tag extension [RFC6067] uses the -u- subtag, and provides subtags for selecting different locale-based formats ...
  58. [58]
    Ethnologue | Languages of the world
    Find, read about, and research all 7159 living languages. Ethnologue is the ultimate source of information on the world's languages.
  59. [59]
    Welcome to the 26th edition | Ethnologue Free
    Feb 21, 2023 · The 26th Ethnologue lists 7,168 living languages, 101 extinct languages, and includes digital vitality assessments, with 16,000 updates.
  60. [60]
    Glottolog 5.2
    Glottolog provides a comprehensive catalogue of the world's languages, language families and dialects. It assigns a unique and stable identifier (the ...
  61. [61]
    [PDF] ST.86 - Standards - WIPO
    Feb 25, 2007 · ISO/IEC 10646 – UCS – Unicode UTF-8 MUST be used for character set. 25. ISO 639-1 (2-Letter Language Codes) MUST be used for Language Codes.
  62. [62]
    Glossary:Language codes - Statistics Explained - Eurostat
    Languages within or without the European Union (EU) have been assigned a two-letter language code, always written in small letters, and used for coding ...
  63. [63]
    Language Code - the United Nations
    Jul 1, 2009 · Language codes are 3-letter, lower-case codes indicating language versions. Official UN languages are listed first, followed by non-official ...
  64. [64]
    ISO 639-2: International standard for language codes
    The Library of Congress has been designated the Registration Authority for ISO 639-2; after initial publication of the ISO list, future development of the list ...
  65. [65]
    Wmatrix corpus analysis and comparison tool - UCREL
    Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to natural language processing tools such as the USAS and CLAWS ...