Language code
A language code is a standardized abbreviation used to identify and represent individual languages, language variants, and language groups in a consistent manner across international contexts, as defined by the ISO 639 series of standards developed by the International Organization for Standardization (ISO).[1] These codes, typically consisting of two or three lowercase letters, enable precise referencing in applications such as software localization, bibliographic systems, web content tagging, and linguistic research, promoting interoperability and reducing ambiguity in global communication.[2] For example, the code "en" denotes English in the two-letter format, while "eng" serves the same purpose in the three-letter format.[3]

The ISO 639 standards, first established in the late 20th century and continually updated, form a harmonized framework that specifies rules for the selection, formation, presentation, and usage of these identifiers, including reference names in English and French.[4] The core parts include ISO 639-1, which provides 184 two-letter codes for widely used languages, primarily in information technology and general use; ISO 639-2, offering three-letter codes for bibliographic (B) and terminological (T) purposes, covering more than 480 languages and language groups; and ISO 639-3, which extends coverage to approximately 7,900 individual languages (as of 2024) for comprehensive ethnographic and linguistic applications.[3][5] Additional parts, such as ISO 639-5 for language families and the now-withdrawn ISO 639-6 for language variants (alpha-4 codes), address hierarchical relationships among languages.[1] Maintained by designated registration authorities for each part, such as the Library of Congress for ISO 639-2 and SIL International for ISO 639-3, the standards exclude codes for reconstructed proto-languages, computer programming languages, and markup languages, focusing solely on natural human languages.[6]

The latest edition, ISO 639:2023, emphasizes principles for combining language codes with other identifiers, such as country codes from ISO 3166, to form extended tags like "en-US" for American English, widely adopted in protocols such as those from the Internet Engineering Task Force (IETF).[1] These codes play a critical role in multilingual environments, supporting accessibility, data processing, and cultural preservation efforts worldwide.[2]

Definition and Purpose
Core Definition
A language code is a standardized abbreviation or identifier used to represent languages, dialects, or language families in a concise, machine-readable format, typically consisting of two or three letters. These codes facilitate the unique identification of linguistic entities, encompassing individual languages (whether living, extinct, ancient, or constructed), variants, and broader groups such as families. Developed under international standards like ISO 639, they ensure consistency across global applications without relying on lengthy descriptive names.[7]

Language codes are distinct from related identifiers in other ISO standards, such as country codes under ISO 3166, which denote geographic territories and their subdivisions using two- or three-letter alpha codes (e.g., "US" for United States), or script codes under ISO 15924, which specify writing systems like Latin or Cyrillic with four-letter codes (e.g., "Latn" for Latin script). While language codes focus solely on linguistic classification, these other standards address nationality or orthographic aspects, preventing conflation in combined tagging systems.[8][9]

Basic formats include two-letter alpha-2 codes for widely used languages (e.g., "en" for English, "fr" for French) and three-letter alpha-3 codes for broader or more specific coverage (e.g., "eng" for English, "fra" for French). These short formats prioritize brevity and universality for computational processing.[10][5]

The primary purpose of language codes is to provide unambiguous, machine-readable labels that mitigate confusion in multilingual environments, such as software localization, data interchange, and digital content tagging, where precise language attribution is essential for functionality and accessibility.[11]

Primary Uses
Language codes play a crucial role in internationalization (i18n), enabling software and applications to adapt content for diverse linguistic and cultural contexts. They are used to tag user interfaces, messages, and resources during localization, allowing developers to select appropriate translations based on user preferences, such as displaying text in English (en) or Spanish (es) variants like es-MX for Mexican Spanish. This facilitates efficient content management in global software products by separating translatable elements from code, reducing development costs and improving user experience across regions.[12][13]

In linguistic documentation, language codes support the cataloging and preservation of languages, particularly endangered ones, by providing standardized identifiers for resources like dictionaries, grammars, and audio recordings. For instance, ISO 639-3 codes, such as "ayb" for Ayizo, are employed in databases to track approximately 7,900 languages (as of 2024), including those at risk of extinction, aiding researchers in organizing and accessing materials for revitalization efforts. This systematic coding ensures consistent referencing in academic and archival systems, helping to document linguistic diversity before potential loss.[7][14][15]

Language codes are integral to data exchange standards like XML and JSON, where they specify the language of content to ensure accurate interpretation and processing across systems. In XML, the xml:lang attribute, using BCP 47 tags (e.g., fr for French), declares the language of elements to support rendering, searching, and accessibility features in documents. Similarly, in JSON-based APIs and metadata schemas, these codes appear in fields to denote string languages, promoting interoperability in web services and data serialization. In global communication protocols, language codes enable negotiation and specification of content languages to facilitate multilingual interactions.
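As a concrete illustration of such negotiation, the sketch below parses an HTTP Accept-Language-style header value into quality-weighted preferences. It is a minimal, illustrative parser only, not a full RFC 9110 implementation (it ignores wildcards and malformed q-values):

```python
# Minimal sketch: parse an Accept-Language header value into
# (tag, quality) pairs, highest preference first.
# Illustrative only; a production parser should follow RFC 9110 / RFC 4647.

def parse_accept_language(header: str) -> list:
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # default quality when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        prefs.append((tag.strip(), q))
    # Sort by descending quality; Python's sort is stable, so the
    # header's original order breaks ties, as the protocol implies.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return prefs

print(parse_accept_language("en-US,fr;q=0.8,de;q=0.5"))
# [('en-US', 1.0), ('fr', 0.8), ('de', 0.5)]
```

A server would intersect this ordered list with the languages it can actually serve, which is the negotiation step the Content-Language response header then reports.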
For web content negotiation, HTTP headers like Accept-Language (e.g., en-US,fr) allow clients to request preferred languages, while servers respond with Content-Language headers to indicate the delivered resource's language, optimizing delivery in diverse environments. In email protocols, the Content-Language header, defined in RFC 3282, tags messages with codes like de for German, assisting recipients and filters in handling multilingual correspondence.[16][17]

Historical Development
Early Classification Systems
In the 19th century, efforts to systematize language identification emerged within comparative linguistics, focusing on genealogical classification rather than standardized codes. August Schleicher, a German linguist, advanced this through his Stammbaumtheorie (family-tree theory), which he likened to biological evolution in his 1863 work Die Darwinsche Theorie und die Sprachwissenschaft, applied initially to Indo-European languages.[18] This approach treated languages as organic entities evolving from common ancestors, enabling the reconstruction of proto-languages and laying foundational principles for cataloging linguistic diversity.[18]

Schleicher's methods built on earlier comparative works, such as those by Franz Bopp and Jacob Grimm, which emphasized systematic comparison of grammar and vocabulary to establish language families. These 19th-century endeavors served as precursors to modern databases like Glottolog by prioritizing exhaustive inventories and hierarchical classifications of global languages, though they relied on descriptive nomenclature rather than abbreviated codes. For instance, Schleicher's Compendium der vergleichenden Grammatik der indogermanischen Sprachen (1861–62) provided detailed typologies that influenced subsequent ethnolinguistic surveys.[18]

In the early 20th century, the Summer Institute of Linguistics (SIL), founded in 1934 by William Cameron Townsend, initiated extensive ethnolinguistic surveys to document underrepresented languages, particularly in the Americas.
These surveys, starting with fieldwork among indigenous groups like the Kaqchikel in Guatemala and Mixtec in Mexico, aimed to identify and describe languages for translation and literacy programs, producing informal lists that cataloged hundreds of varieties by the 1940s.[19] SIL's efforts emphasized practical identification through native names and geographic markers, influencing later code development; by 1951, this work culminated in the first edition of Ethnologue, a comprehensive language inventory initially covering 46 entries.[20]

Parallel to SIL's initiatives, library systems began adopting abbreviated identifiers for languages in cataloging. The Library of Congress developed three-letter codes in the 1960s as part of the MARC (Machine-Readable Cataloging) format to standardize bibliographic entries, predating formal ISO standards and facilitating efficient indexing of multilingual materials. These codes, such as "eng" for English, were used internally for over a decade before alignment with international norms.[11]

By the 1950s, international organizations recognized the need for global language catalogs to support education and cultural preservation. UNESCO's 1953 monograph The Use of Vernacular Languages in Education urged comprehensive linguistic surveys to map mother tongues worldwide, leading to informal code lists and inventories compiled through collaborative efforts with linguists and governments. This report highlighted the urgency of documenting the world's many languages, setting the stage for standardized systems.[21]

Modern Standardization
The modern standardization of language codes began with the establishment of ISO/TC 37, the International Organization for Standardization's technical committee on language and terminology, which became operational in 1952 to formulate general principles of terminology and terminological lexicography, later expanding to include language coding standards.[22] This committee provided the institutional framework for developing systematic, internationally agreed-upon codes, shifting from earlier ad-hoc systems toward formalized, maintainable identifiers suitable for global use in documentation, computing, and linguistics.

Key milestones in this evolution include the publication of the first edition of ISO 639 in 1988, which introduced two-letter alpha-2 codes for major languages to facilitate bibliographic and terminological applications.[23] This was followed by ISO 639-2 in 1998, which established three-letter alpha-3 codes specifically for bibliographic and technical contexts, expanding coverage to include more language varieties while providing distinct codes for broader and narrower uses.[24] The most significant advancement came with ISO 639-3 in 2007, which aimed to assign unique three-letter codes to all known individual languages, including extinct and ancient ones, thereby creating a comprehensive registry.[25] A pivotal role in this expansion was played by SIL International, designated as the registration authority for ISO 639-3, which developed the standard based on extensive linguistic data from sources like Ethnologue and processed requests to cover over 7,000 living languages by the late 2000s.[26]

Ongoing updates and revisions have further refined the system; for instance, with the publication of ISO 639-3 in 2007, specific codes were assigned to constructed (artificial) languages, building on the collective "art" identifier from ISO 639-2 to accommodate growing interest in engineered languages like
those used in fiction, international communication, and computational linguistics.[27] These developments ensure the codes remain adaptable to emerging needs while maintaining stability for practical implementation.

Classification Challenges
Linguistic and Dialectal Issues
One of the central challenges in assigning language codes arises from the debate over distinguishing languages from dialects, where mutual intelligibility serves as a primary linguistic criterion but often conflicts with sociopolitical realities.[4] According to ISO 639 standards, varieties are considered distinct languages if they lack mutual intelligibility or form part of a chain where intelligibility diminishes significantly between endpoints.[28] However, this criterion proves problematic in cases like Serbian (srp) and Croatian (hrv), which exhibit near-complete mutual intelligibility—approaching 100% in standard forms due to shared grammar, phonology, and core lexicon—yet receive separate codes under ISO 639-3 owing to post-Yugoslav national identities and political separation.[29]

Dialect continua further complicate coding efforts, as gradual variations across regions blur boundaries between distinct varieties. In such continua, speakers at adjacent points maintain high mutual intelligibility, but distant ones do not, making it arbitrary to draw lines for code assignment. Arabic exemplifies this issue, encompassing a dialect continuum from the Maghreb to the Arabian Peninsula, where Modern Standard Arabic (arb) coexists with highly divergent spoken forms; ISO 639-3 assigns over 30 individual codes to these varieties to capture their limited intelligibility with the standard and among themselves.[30]

To address these challenges, ISO 639-3 introduces the concept of macrolanguages, which group closely related varieties under a single code while allowing individual codes for components lacking full mutual intelligibility.
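The macrolanguage relationship is, in effect, a many-to-one mapping from individual-language codes to an enclosing code. The sketch below models it with a small dictionary; the codes shown are real ISO 639-3 identifiers, but the dictionary itself is a hypothetical, illustrative subset rather than an official registry export:

```python
# Illustrative subset of the ISO 639-3 macrolanguage relation:
# individual-language code -> enclosing macrolanguage code.
# (Hypothetical helper data, not an official registry export.)
MACROLANGUAGE_OF = {
    "arb": "ara",  # Modern Standard Arabic -> Arabic
    "arz": "ara",  # Egyptian Arabic        -> Arabic
    "apc": "ara",  # Levantine Arabic       -> Arabic
    "cmn": "zho",  # Mandarin Chinese       -> Chinese
    "yue": "zho",  # Cantonese              -> Chinese
}

def broaden(code: str) -> str:
    """Return the macrolanguage code if one exists, else the code itself."""
    return MACROLANGUAGE_OF.get(code, code)

print(broaden("arz"))  # ara
print(broaden("eng"))  # eng (no macrolanguage; returned unchanged)
```

An application can use such a table to roll detailed codes up to the unit a broader standard expects, while retaining the finer code for archival purposes.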
Arabic (ara) functions as such a macrolanguage, unifying approximately 30 specific codes (e.g., Egyptian Arabic, arz; Levantine Arabic, apc) that represent a cluster of varieties treated as a cohesive unit in broader contexts like international standards.[31] This approach balances linguistic granularity with practical utility, though it still requires decisions on inclusion based on shared lexical and structural features.

Sociopolitical factors profoundly influence code assignments, often overriding purely linguistic criteria, particularly in post-colonial settings. In Africa, colonial legacies elevated European languages as official while fragmenting indigenous ones, leading to code proliferation or consolidation driven by national policies aimed at fostering unity or ethnic recognition. For instance, post-independence governments in countries like Senegal have promoted vernaculars such as Wolof (wol) through policy, elevating its status despite continuum ties to other West Atlantic varieties, reflecting efforts to counter colonial hierarchies.[32]

Practical Implementation Difficulties
The proliferation of language codes in standards like ISO 639-3, which encompasses approximately 7,900 individual codes for known human languages,[15] poses significant maintenance challenges due to the dynamic nature of linguistic vitality. This expansive set requires ongoing updates to account for emerging languages, such as newly documented minority tongues in remote regions, and the obsolescence of others, including extinct varieties that no longer have speakers. For instance, SIL International, as the registration authority, facilitates annual code changes to incorporate such shifts, but the sheer volume—covering living, extinct, ancient, and constructed languages—demands rigorous verification to prevent redundancies or inaccuracies.[25][33] These updates ensure comprehensive coverage but strain resources, as linguistic surveys must continually monitor global diversity to propose additions or retirements for languages proven non-existent or merged with others.

Mapping between different coding schemes, particularly ISO 639-1's limited set of 184 two-letter codes and ISO 639-3's detailed three-letter identifiers, introduces incompatibilities that complicate practical adoption in software and databases.[11] ISO 639-1 prioritizes major languages for broad interoperability, often using collective or macrolanguage codes like "zh" for Chinese, which correspond to dozens of distinct entries in ISO 639-3 (e.g., "cmn" for Mandarin, "yue" for Cantonese).
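The one-to-many relationship just described can be sketched as a lookup table from two-letter codes to candidate three-letter codes. The mapping below is a hypothetical, deliberately incomplete subset for illustration, not a complete registry:

```python
# Illustrative sketch: bridging ISO 639-1 two-letter codes and the
# finer-grained ISO 639-3 three-letter codes. The mapping data is a
# hypothetical subset, not a complete registry export.
ISO639_1_TO_3 = {
    "en": ["eng"],
    "zh": ["cmn", "yue", "wuu"],  # "zh" spans many Sinitic varieties
    "ar": ["arb", "arz", "apc"],  # "ar" likewise covers a dialect cluster
}

def expand(alpha2: str) -> list:
    """List the ISO 639-3 codes a two-letter code may correspond to."""
    return ISO639_1_TO_3.get(alpha2, [])

print(expand("zh"))  # ['cmn', 'yue', 'wuu']
print(expand("xx"))  # []  (unknown code; caller must decide a fallback)
```

The empty-list case is where real systems need an explicit policy, since a two-letter code with no known expansion cannot be resolved automatically.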
This granularity mismatch leads to deprecated or unmapped codes in transitional systems; collective codes—such as "cai" for Central American Indian languages in ISO 639-2, which has no single ISO 639-3 equivalent—must be resolved against ISO 639-3's individual identifiers, potentially requiring extensive data migration in international standards applications.[34] Such discrepancies hinder seamless integration, as developers must implement fallback mechanisms to handle unmapped or retired codes without disrupting functionality.[35]

The registration authority process for ISO 639-3, managed by SIL International, further exacerbates implementation delays through its structured yet time-intensive approval workflow. Proposals for new codes, modifications, or retirements are accepted from September 1 to August 31 annually, followed by public posting for review until mid-December, with final approvals processed in early January of the subsequent year and published by January 31.[33] This timeline typically spans 6 to 12 months, depending on submission date, and involves linguist evaluations to verify linguistic distinctiveness and avoid conflicts with existing codes. While this ensures quality control, it slows responses to urgent needs, such as documenting endangered emerging languages before they vanish.

Coverage gaps persist for specialized language types like sign languages and creoles, though updates in the 2020s have incrementally addressed partial inclusions from the standard's 2007 inception.
ISO 639-3 initially drew from Ethnologue data, which under-represented sign languages, leading to only a handful of codes (e.g., "bzs" for Brazilian Sign Language) until expanded listings in recent revisions incorporated more variants based on improved documentation.[36] Similarly, creoles and mixed languages—often viewed as hybrid forms—faced inconsistent classification, with codes such as "cab" for Garifuna (a language of mixed Arawakan and Carib heritage) assigned to reflect their status as distinct natural languages, while ongoing requests highlight remaining omissions for lesser-documented creoles in multilingual regions. These enhancements via annual change requests mitigate gaps but underscore the challenge of balancing exhaustive coverage with verifiable evidence.[33]

Major Coding Schemes
ISO 639 Standards
The ISO 639 standards form a hierarchical family of international codes developed by the International Organization for Standardization (ISO) to represent names of languages and language groups in a compact, unambiguous manner, facilitating their use in information technology, documentation, and international communication.[7] These codes are maintained through designated agencies and evolve to address varying levels of linguistic granularity, from major world languages to individual dialects and families. The standards emphasize stability, with codes assigned based on established linguistic criteria and no reuse of retired identifiers to preserve historical integrity. The latest edition, ISO 639:2023, harmonizes the framework and specifies principles for language coding.[1]

ISO 639-1 provides two-letter alphabetic codes for 184 major languages, designed for general-purpose applications where brevity is essential, such as in software localization and web standards.[11] These codes prioritize widely spoken national or international languages, ensuring broad accessibility without requiring extensive lists.
For example, "en" denotes English and "fr" denotes French, allowing simple identification in diverse contexts like user interfaces or metadata tagging.[3]

ISO 639-2 extends this framework with three-letter codes in two variants for specialized domains: the bibliographic variant, used primarily in library catalogs and academic indexing, and the terminological variant, applied in technical documentation and terminology databases. The two variants coincide for most languages (e.g., "eng" for English), but about twenty languages have distinct pairs, such as bibliographic "fre" versus terminological "fra" for French.[6] This part covers more than 480 codes for individual languages and some groups, bridging the gap between broad usage and detailed cataloging needs while harmonizing with ISO 639-1 where possible.[2]

ISO 639-3 further expands coverage to approximately 7,900 known languages (as of 2024), including living, extinct, ancient, and constructed ones, using unique three-letter codes to achieve near-comprehensive representation of global linguistic diversity.[15] Maintained by SIL International, it allocates codes through a formal request process that evaluates linguistic distinctiveness, with principles ensuring no reuse of retired codes to maintain referential consistency over time.[27] An example is "ara" for Arabic, which supports detailed ethnolinguistic analysis in research and data management.[37]

ISO 639-5 introduces three-letter codes for language families and groups, supplementing earlier parts by enabling representation of broader classifications not covered as individual languages.[38] For instance, "afa" identifies the Afro-Asiatic language family, encompassing branches like Semitic and Berber, which aids in organizing linguistic hierarchies for educational and archival purposes.[39]

IETF BCP 47 and Extensions
The IETF Best Current Practice 47 (BCP 47) provides a standardized framework for constructing language tags to identify human languages in Internet protocols and applications, extending beyond standalone language identifiers by incorporating additional subtags for greater specificity.[40] These tags are formed as a sequence of one or more subtags separated by hyphens, following the general structure: primary language subtag, optionally followed by script, region, variant, extension, and private use subtags (e.g., "en-Latn-US" for English in Latin script as used in the United States).[41] The primary language subtag is typically a two- or three-letter code from ISO 639, while the script subtag uses four-letter codes from ISO 15924 to denote writing systems, the region subtag employs two-letter codes from ISO 3166-1 or three-digit codes from UN M.49 for geographic or administrative areas, and variant subtags (five to eight characters) are registered to indicate specific dialects or historical forms.[42] This integration allows BCP 47 tags to combine linguistic and contextual elements into a single, extensible identifier suitable for protocols like HTTP, XML, and internationalization standards.[43]

BCP 47 supports key extensions to accommodate specialized or legacy needs within its structure.
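The language-script-region structure described above can be decomposed with simple length-based rules. The sketch below handles only the common language[-script][-region] shape; it is illustrative and deliberately far short of a full RFC 5646 parser (no variants, extensions, or private use subtags):

```python
# Sketch: split a simple BCP 47 tag into language, script, and region
# subtags using the length-based conventions of the tag structure.
# Handles only the common language[-script][-region] shape; a real
# parser must implement the full RFC 5646 grammar.

def split_tag(tag: str) -> dict:
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            result["script"] = sub.title()   # ISO 15924, e.g. "Latn"
        elif len(sub) == 2 and sub.isalpha():
            result["region"] = sub.upper()   # ISO 3166-1, e.g. "US"
        elif len(sub) == 3 and sub.isdigit():
            result["region"] = sub           # UN M.49, e.g. "419"
    return result

print(split_tag("en-Latn-US"))
# {'language': 'en', 'script': 'Latn', 'region': 'US'}
```

The case-normalization in the sketch (lowercase language, titlecase script, uppercase region) mirrors the canonical letter-casing conventions the registry uses, even though tag matching itself is case-insensitive.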
Private use subtags begin with "x-" followed by one or more subtags defined by private agreement among users, enabling custom extensions without conflicting with registered elements (e.g., "en-x-foo" for a proprietary variant of English).[44] Grandfathered tags, which predate the modern registry, are preserved for backward compatibility and include irregular forms starting with "i-" (e.g., "i-navajo", deprecated in favor of "nv") or other legacy patterns; these are not to be created anew but may be mapped to preferred equivalents in the registry.[45] Extension subtags, introduced via single-character singletons (e.g., "u-" for Unicode locale extensions as defined in RFC 6067), allow for standardized additions like collation or numbering systems, further enhancing the tag's utility in software and protocols.[46]

The system is governed by RFC 5646 (published in 2009), which defines the syntax, semantics, and validity rules for tags, including case-insensitive matching and canonicalization to ensure interoperability.[40] It establishes an IANA-maintained registry of subtags and grandfathered tags, updated through the registration process defined in RFC 5646 itself (with RFC 5645 updating the registry to incorporate ISO 639-3 identifiers), to track descriptions, deprecations, and preferred values while preventing conflicts.[47][48] Matching rules for BCP 47 tags, defined in RFC 4647, prioritize exact matches but allow fallback to broader tags (e.g., "en-US" falling back to "en" when needed), supporting flexible language negotiation in applications.[49] This framework has been widely adopted in IETF RFCs for protocols requiring language identification, promoting consistency across the Internet ecosystem.[50]

Applications and Implementation
In Computing and Software
Language codes, standardized primarily through IETF BCP 47, are integral to Unicode and UTF-8 text processing, enabling applications to handle multilingual content by specifying the language for rendering, collation, and script selection.[51] In Unicode, these tags inform processes like bidirectional text layout and font fallback, ensuring correct display of scripts such as Arabic or Devanagari when combined with UTF-8 encoding, which supports the 159,801 assigned characters across 172 scripts in Unicode 17.0 (as of September 2025).[52] For instance, a language tag like "ar-SA" signals right-to-left rendering for Arabic text in Saudi Arabia, optimizing processing in libraries like ICU (International Components for Unicode).

In web technologies, language codes are applied via the HTML lang attribute to declare the primary language of document elements, aiding accessibility tools, search engines, and styling by informing screen readers and hyphenation rules.[53] This attribute accepts BCP 47 tags, such as lang="fr-CA" for Canadian French, which propagates to child elements unless overridden. In CSS, the :lang() pseudo-class selector uses these codes for language-specific styling, like applying a serif font to French text with :lang(fr) { font-family: Garamond; }, allowing targeted rules without altering HTML structure.
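The bidirectional behavior mentioned above is driven by a per-character Unicode property that Python exposes in the standard unicodedata module; a small sketch of inspecting it (the is_rtl_char helper is an illustrative simplification, not a full bidi algorithm):

```python
import unicodedata

# Sketch: inspect the Unicode bidirectional category of characters,
# the property that drives right-to-left layout for scripts like Arabic.
# 'AL' = Arabic Letter, 'R' = other Right-to-Left, 'L' = Left-to-Right.

def is_rtl_char(ch: str) -> bool:
    """Simplified check; the full bidi algorithm (UAX #9) is far richer."""
    return unicodedata.bidirectional(ch) in ("AL", "R")

print(unicodedata.bidirectional("A"))       # L
print(unicodedata.bidirectional("\u0627"))  # AL (Arabic letter alef)
print(is_rtl_char("\u05D0"))                # True (Hebrew letter alef)
```

A language tag like "ar-SA" supplements this character-level data with document-level intent, which rendering engines use for font fallback and default paragraph direction.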
Programming libraries leverage language codes for localization, adapting output to cultural conventions like date formats or currency symbols. In Python, the locale module uses codes in identifiers like "en_US.UTF-8" to set regional settings, enabling functions such as locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8') for German number formatting with commas as decimal separators.[54] Similarly, Java's Locale class represents combinations of ISO 639 language codes and ISO 3166 country codes; a Locale built from a BCP 47 tag, as in Locale.forLanguageTag("ja-JP"), influences DateFormat and NumberFormat for Japanese yen symbols and year-month-day ordering.[55]
For content management, language codes facilitate tagging in databases to enforce locale-specific collation during queries. In SQL Server, the COLLATE clause applies rules like COLLATE French_CI_AS for case-insensitive French sorting, ensuring accurate comparisons in multilingual tables storing varchar data.[56] Search engines use these codes in hreflang annotations to deliver language-targeted results; for example, Google interprets hreflang="es-MX" to prioritize Mexican Spanish content for users in that region, improving relevance in multilingual queries.
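Emitting such hreflang annotations from a table of locale-to-URL mappings is mechanical; the sketch below generates the alternate-link markup for a page's localized variants (the URLs and the helper function are hypothetical examples):

```python
# Sketch: emit <link rel="alternate" hreflang="..."> markup for a
# page's localized variants. URLs and helper are hypothetical examples.

def hreflang_links(variants: dict) -> str:
    lines = [
        f'<link rel="alternate" hreflang="{tag}" href="{url}" />'
        for tag, url in sorted(variants.items())
    ]
    return "\n".join(lines)

print(hreflang_links({
    "en-US": "https://example.com/en-us/",
    "es-MX": "https://example.com/es-mx/",
}))
```

Each variant page would carry the full set of links, including one pointing at itself, so search engines can associate the whole cluster of translations.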
Systems address challenges with unknown or ambiguous codes through fallback mechanisms, defaulting to the "und" (undetermined) tag from BCP 47 when no specific language matches, preventing errors in processing mixed or unidentified content.[40] This allows graceful degradation, such as rendering text without language-specific hyphenation, while broader matching rules extend to related variants like falling back from "en-GB" to "en".[57]
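The fallback behavior described above can be sketched with the truncation idea from RFC 4647's "lookup" scheme: strip subtags from the right until a supported tag is found, defaulting to "und". This is a simplified illustration (the full scheme also skips stray single-character subtags during truncation):

```python
# Sketch of RFC 4647-style "lookup" fallback: progressively truncate
# subtags from the right until a supported tag matches; fall back to
# "und" (undetermined) if nothing does. Simplified for illustration.

def lookup(requested: str, supported: set) -> str:
    parts = requested.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in supported:
            return candidate
        parts.pop()  # drop the rightmost subtag and retry
    return "und"

SUPPORTED = {"en", "en-US", "fr"}
print(lookup("en-GB", SUPPORTED))        # en
print(lookup("en-US-x-foo", SUPPORTED))  # en-US
print(lookup("ja", SUPPORTED))           # und
```

Returning "und" rather than raising an error is what allows graceful degradation: the content is still processed, just without language-specific behavior such as hyphenation.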