Windows code page

Windows code pages are character encoding schemes employed by the Windows operating system to represent text using numeric identifiers, primarily supporting legacy applications, older mail servers, and console interfaces that predate widespread Unicode adoption. These code pages extend the 7-bit ASCII set (code points 0x00–0x7F) with additional characters in the 8-bit range (0x80–0xFF) to accommodate international glyphs, diacritics, and symbols specific to various languages and locales. Commonly referred to as "ANSI code pages," they are based on drafts from the American National Standards Institute (ANSI) but differ slightly from official standards like ISO 8859-1, with code page 1252 serving as the default for Western European languages in many English-language Windows installations.

Historically, code pages emerged in the 1980s and 1990s to address the limitations of 7-bit ASCII in supporting diverse writing systems, starting with 8-bit single-byte character sets (SBCS) for Latin-based scripts and evolving to double-byte (DBCS) and multi-byte (MBCS) variants for East Asian languages such as Japanese (code page 932, based on Shift JIS) and Simplified Chinese (code page 936, based on GBK). Distinct from ANSI code pages, OEM code pages (like 437 for English) were designed for MS-DOS environments, emphasizing line-drawing characters and compatibility with FAT file systems, and they remain relevant for console and legacy hardware interactions.

In Windows, the active code page is system-wide and influences ANSI functions (suffixed with "A"), while Unicode-based APIs (suffixed with "W") bypass code pages entirely for broader compatibility. Although Windows has relied on Unicode internally since Windows NT for comprehensive global text handling, code pages persist for backward compatibility, enabling functions like MultiByteToWideChar and WideCharToMultiByte to convert between legacy encodings and Unicode.
Microsoft recommends transitioning to Unicode encodings, such as UTF-8 (code page 65001), for new applications to avoid data corruption risks associated with varying system-default code pages across locales. Over 50 code pages are supported in modern Windows versions, covering languages from Arabic (code page 1256) to Vietnamese (code page 1258), but their use is increasingly limited to specific scenarios like regional file naming or command-line tools.

Fundamentals

Definition and Purpose

A Windows code page is a character encoding system that defines a mapping between byte values, typically in an 8-bit range, and specific characters, enabling the representation of text in legacy Windows applications and files. These mappings associate sequences of bytes (most commonly single bytes, giving 256 possible characters) with code points, allowing for the encoding and decoding of text data. Developed by Microsoft, code pages extend the limitations of 7-bit ASCII by incorporating an additional 128 characters in the upper byte range (0x80–0xFF), which vary according to language or regional requirements.

The primary purpose of Windows code pages is to facilitate internationalization in software and systems prior to the widespread adoption of Unicode, supporting non-ASCII characters for various scripts such as Latin extensions, Cyrillic, Arabic, and others. By providing deterministic translations (often one-to-one for single-byte sets or many-to-one for multi-byte variants), code pages ensure compatibility for legacy applications, older mail and news servers, command-line tools, and document formats that rely on regional character sets. For instance, code page identifiers like CP1252 are used for Western European languages, assigning unique byte values to accented letters and symbols not covered by basic ASCII. This approach addressed the need for localized text handling in global markets without requiring a universal encoding standard at the time.

Key characteristics of Windows code pages include their identification by numeric codes (e.g., 1252 for ANSI Western European), support for both single-byte character sets (SBCS) and double- or multi-byte character sets (DBCS/MBCS) for denser scripts, and a fixed mapping that remains consistent within a given code page. Commonly referred to as "ANSI code pages" in Windows contexts, they are not identical to formal ANSI or ISO standards but are based on drafts like ISO 8859-1, with Microsoft-specific extensions for broader compatibility. While modern Windows primarily uses Unicode internally for universal character support, code pages persist for backward compatibility in transitional environments.
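
The locale dependence of the upper byte range can be illustrated with a minimal Python sketch. It relies on Python's built-in codecs (`cp1252`, `cp1251`, `cp1253`) as stand-ins for the Windows mapping tables; those codecs are an assumption of this example, not the Windows NLS data itself.

```python
import unicodedata

# One byte from the upper range (0x80-0xFF) and one from the ASCII range.
UPPER_BYTE = bytes([0xE9])
ASCII_BYTE = b"A"

# Python codec names standing in for three Windows ANSI code pages.
CODE_PAGES = ["cp1252", "cp1251", "cp1253"]

# The upper-range byte decodes to a different character per code page...
decoded = {cp: UPPER_BYTE.decode(cp) for cp in CODE_PAGES}
# ...while the ASCII-range byte decodes identically everywhere.
ascii_decoded = {cp: ASCII_BYTE.decode(cp) for cp in CODE_PAGES}

for cp in CODE_PAGES:
    char = decoded[cp]
    print(f"{cp}: 0xE9 -> U+{ord(char):04X} {unicodedata.name(char)}")
```

The same byte yields a Latin, a Cyrillic, and a Greek letter respectively, which is why text stored without a record of its code page cannot be decoded reliably.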

Types of Windows Code Pages

Windows code pages are categorized into several main types based on their intended use and mechanisms within the operating system. These include ANSI code pages, OEM code pages, multi-byte character set (MBCS) code pages, and Unicode-based code pages, each serving distinct roles in handling text data across different contexts such as graphical user interfaces, consoles, and internationalized applications.

ANSI code pages, also known as active code pages (ACP), are primarily used for text rendering in Windows graphical user interfaces (GUI), file I/O operations, and legacy text files, with the default varying by system locale, for instance code page 1252 (Windows-1252) for English-language systems. These single-byte encodings map the 256 possible byte values to characters, supporting Western European languages in the default case, and are retrieved programmatically via the GetACP API function.

OEM code pages, in contrast, are designed for console applications, command-line interfaces, and compatibility with MS-DOS-era systems, often differing from ANSI pages to accommodate hardware-specific character sets like box-drawing symbols. For example, code page 437 serves as the OEM default for English locales, and it can be queried using the GetOEMCP function. These pages ensure proper display in text-mode environments but are locale-dependent and not suitable for cross-system data exchange without verification.

Multi-byte code pages (MBCS) extend single-byte capabilities to support languages with extensive character sets, such as East Asian scripts, by employing a variable-length scheme where most characters use a single byte but others require a lead byte followed by a trail byte to encode extended glyphs. Examples include code page 932 for Japanese (Shift JIS) and 936 for Simplified Chinese (GBK), which allow Windows applications to process double-byte characters in MBCS-aware strings. This approach enables denser representation of non-Latin scripts but requires careful byte parsing to distinguish single- from multi-byte sequences.

Unicode code pages represent a modern bridge between legacy systems and universal text encoding, exposing UTF formats directly as code page identifiers; notable examples are code page 1200 for UTF-16 little-endian and 65001 for UTF-8, which support all Unicode characters without locale-specific limitations. These are increasingly recommended over traditional pages to avoid data corruption from varying system defaults, and they integrate with APIs like MultiByteToWideChar for conversions.

All Windows code pages are identified by unique numeric identifiers (e.g., 1252 for ANSI Western European), which applications pass to conversion functions like MultiByteToWideChar to specify an encoding. Character mappings for these code pages are stored in National Language Support (NLS) files, such as C_1252.NLS, located in the %SystemRoot%\System32 directory, while active settings are configurable via registry keys under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage (e.g., ACP for ANSI). These mappings are loaded into memory by system DLLs like kernel32.dll during runtime for efficient text processing.
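
The four categories can be compared side by side with a short Python sketch. It assumes Python's built-in codecs (`cp1252`, `cp437`, `cp932`, `utf-16-le`, `utf-8`) behave like the corresponding Windows code pages for the characters shown; it does not call any Windows API.

```python
# ANSI vs OEM: the same byte is a letter in one and a box-drawing glyph in the other.
byte = bytes([0xC9])
as_ansi = byte.decode("cp1252")  # ANSI Western European
as_oem = byte.decode("cp437")    # OEM United States

# MBCS: ASCII stays one byte wide, a kanji takes a lead byte plus a trail byte.
ascii_width = len("A".encode("cp932"))
kanji_width = len("\u65e5".encode("cp932"))  # U+65E5, the kanji for "day"

# Unicode code pages: 1200 corresponds to UTF-16LE, 65001 to UTF-8.
e_acute_utf16 = "\u00e9".encode("utf-16-le")
e_acute_utf8 = "\u00e9".encode("utf-8")

print(f"0xC9 as ANSI: U+{ord(as_ansi):04X}, as OEM: U+{ord(as_oem):04X}")
print(f"cp932 widths: ASCII={ascii_width}, kanji={kanji_width}")
print(f"UTF-16LE: {e_acute_utf16.hex()}, UTF-8: {e_acute_utf8.hex()}")
```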

Historical Development

Origins in MS-DOS and Early Windows

The origins of Windows code pages trace back to the MS-DOS era in the 1980s, where they emerged as essential extensions to handle international characters and graphics on the IBM PC platform. In 1981, with the release of the IBM PC, code page 437 (CP437) was introduced as the original OEM code page, extending the 7-bit US-ASCII standard to an 8-bit, 256-character set that included box-drawing graphics, mathematical symbols, and a selection of European accented characters to support text-based user interfaces and early applications. This code page, also known as OEM-US or PC-8, was designed for compatibility with the IBM PC's hardware, particularly its display adapters, and became the foundational encoding for MS-DOS systems.

Key milestones in the 1980s built upon CP437 as the baseline for subsequent code pages. IBM established CP437 as the standard for the US English market, while Microsoft extended this framework for international MS-DOS versions to accommodate diverse linguistic needs. A notable example is CP850, introduced in 1987 with MS-DOS 3.3, which served as a multilingual extension for Western European languages, Latin America, and Canada, incorporating a broader set of Latin-1 characters while retaining compatibility with CP437's structure. These developments allowed MS-DOS to support country-specific variants, loaded dynamically to adapt to regional requirements without altering the core operating system.

Technically, these early code pages operated as 8-bit supersets of 7-bit US-ASCII, where the first 128 characters (hex 00-7F) matched ASCII exactly, and the upper 128 (hex 80-FF) provided extensions for localized content such as diacritics. In MS-DOS, country-specific code pages were configured via the CONFIG.SYS file during boot, using commands like COUNTRY=XXX to load appropriate support files (e.g., COUNTRY.SYS) and DISPLAY.SYS to set console code pages, enabling switching between encodings like CP437 for the United States or equivalents for other regions.
This modular approach ensured hardware and software compatibility across global markets. Early Windows versions from 1.0 (1985) to 3.x (up to 1992) inherited these MS-DOS OEM code pages primarily for console and backward compatibility, maintaining support for CP437 and its variants in command-line interfaces and file systems. For graphical user interfaces (GUI), Windows introduced "ANSI" code pages—distinct from true ANSI standards but based on ECMA-94—to handle text rendering, with the initial set in Windows 1.0 evolving to include additional characters by Windows 3.1 and tying selections to regional settings for localized installations. This dual system of OEM for legacy DOS integration and ANSI for native Windows applications laid the groundwork for the platform's character encoding architecture.

Evolution and Standardization

With the release of Windows 95 in 1995 and Windows NT 4.0 in 1996, Microsoft formalized the Windows-125x series as the primary ANSI code pages for single-byte character encodings in the Windows environment, establishing them as the default for text handling in graphical applications. These code pages, such as CP1252 for Western European languages, were designed to extend the ASCII range (0x00-0x7F) while aligning as closely as possible with the ISO/IEC 8859 standards, for instance mapping CP1252 to ISO/IEC 8859-1 for Latin-1 characters. However, alignments were incomplete due to proprietary extensions, including the addition of 27 printable characters in the 0x80-0x9F range of CP1252, which ISO/IEC 8859-1 reserves for control codes.

Simultaneously, Windows 95 introduced enhanced support for multi-byte character sets through Double-Byte Character Sets (DBCS), enabling efficient handling of languages requiring more than 256 characters, such as those in East Asia, by using lead and trail bytes for extended glyphs while preserving ASCII compatibility. This formalization was documented in the Windows 95 SDK, where conversion tables for code pages like CP1252 were provided in files such as UNICODE.BIN, supporting up to 18 code pages in international editions for bidirectional mappings between code pages and internal representations.

Standardization efforts involved Microsoft's submission of the Windows-125x mappings to the Internet Assigned Numbers Authority (IANA) for official charset registration, with CP1250 (Central European) registered on May 3, 1996, following collaborative development with ISO/IEC to incorporate ISO 8859-2 mappings while adding vendor-specific characters. In the mid-1990s, Microsoft expanded the Windows-125x series to address global market needs, introducing CP1251 for Cyrillic scripts in 1995 to support languages like Russian and Bulgarian, building on earlier non-English Windows 3.1 implementations but integrating it fully into the Windows 95/NT ecosystem.
This was followed by CP1256 for Arabic in 1996, which incorporated right-to-left text rendering and visual ordering adjustments, registered with IANA on May 3, 1996, to facilitate telecom and document exchange in Middle Eastern regions. These expansions reflected influences from ITU-T recommendations for international telecom encodings, such as those in Recommendation T.61 for teletex services, where Microsoft adapted mappings to ensure compatibility with global data transmission standards despite proprietary deviations. Key documentation of these code pages, including detailed glyph tables and conversion matrices, appeared in Microsoft SDK releases from 1995 onward, serving as authoritative references for developers despite the incomplete harmonization with ISO standards due to extensions for Windows-specific typography.
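
The figure of 27 printable characters in the 0x80–0x9F range of CP1252 can be checked with a small Python sketch. It assumes Python's `cp1252` and `latin-1` codecs as stand-ins for CP1252 and ISO/IEC 8859-1. Python treats the five unassigned CP1252 bytes as decoding errors, whereas Windows conversion functions may map them differently, so the helper `printable_in_c1` (a name introduced for this example) simply skips them.

```python
import unicodedata

C1_RANGE = range(0x80, 0xA0)  # the 32 byte values 0x80-0x9F


def printable_in_c1(codec: str) -> list[int]:
    """Return byte values in 0x80-0x9F that decode to a non-control character."""
    result = []
    for value in C1_RANGE:
        try:
            char = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this codec
        if unicodedata.category(char) != "Cc":  # "Cc" marks control characters
            result.append(value)
    return result


cp1252_printable = printable_in_c1("cp1252")
latin1_printable = printable_in_c1("latin-1")

print(f"cp1252 printable bytes in 0x80-0x9F: {len(cp1252_printable)}")
print(f"latin-1 printable bytes in 0x80-0x9F: {len(latin1_printable)}")
```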

Transition to Unicode

The Windows NT family has used Unicode internally since Windows NT 3.1 in 1993, initially with UCS-2 encoding, providing Unicode support for the enterprise line while the consumer Windows 9x series continued relying on code pages. With the release of Windows 2000 in 2000, Microsoft upgraded the operating system's internal text encoding to native UTF-16, including surrogate pairs for full Unicode support beyond the Basic Multilingual Plane across applications and system components. This change marked a significant pivot from reliance on single-byte and multi-byte code pages, which were limited to specific language sets, toward a unified encoding capable of handling global scripts. However, to maintain compatibility with existing software, Windows retained support for legacy code pages through conversion APIs such as MultiByteToWideChar and WideCharToMultiByte, allowing applications to translate between code page-based strings and UTF-16 Unicode.

Windows XP, released in 2001, introduced UTF-8 as code page 65001 (CP_UTF8), enabling limited support for this variable-length encoding in APIs and file handling, though initial implementations suffered from bugs, particularly in console output and certain localization scenarios. These issues persisted for years, prompting developers to favor UTF-16 for reliability, but Microsoft addressed many through cumulative updates, with notable improvements in console handling by Windows 10 version 1607 (Anniversary Update) in 2016, enhancing stability for non-Unicode applications. By Windows 10 version 1903 (May 2019 Update), further refinements included the activeCodePage manifest property, allowing apps to declare UTF-8 as their default code page, and a system-wide option to set UTF-8 as the active ANSI code page via registry or settings (e.g., enabling "Beta: Use Unicode UTF-8 for worldwide language support" under Administrative language settings).

As Windows evolved, the role of code pages diminished, with Microsoft explicitly marking them as legacy components by Windows 11 in 2021 to discourage new development reliance on them in favor of Unicode encodings like UTF-8 and UTF-16. This deprecation trend emphasized UTF-8 adoption for legacy non-Unicode apps through configuration tweaks, such as registry keys under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage, while ensuring backward compatibility for older systems and tools. Overall, the transition underscored Unicode's superiority for internationalization, reducing the fragmentation caused by region-specific code pages while preserving compatibility via robust conversion mechanisms.
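
The conversion path described above (legacy bytes to Unicode, then Unicode to UTF-8 or UTF-16) can be sketched in Python. Here `bytes.decode` and `str.encode` play the roles that MultiByteToWideChar and WideCharToMultiByte play in Win32; that analogy, and the use of Python's `cp1252` codec, are assumptions of this illustration. The last lines show the corruption that results when bytes are decoded with the wrong code page.

```python
# U+00E9 is e-acute, U+20AC is the euro sign; both exist in code page 1252.
text = "caf\u00e9 \u20ac5"

# Text as a legacy application would have stored it under code page 1252.
legacy_bytes = text.encode("cp1252")

# Step 1: legacy bytes -> Unicode (the MultiByteToWideChar direction).
unicode_text = legacy_bytes.decode("cp1252")

# Step 2: Unicode -> a Unicode encoding (the WideCharToMultiByte direction).
utf8_bytes = unicode_text.encode("utf-8")
utf16_bytes = unicode_text.encode("utf-16-le")

# Decoding with the wrong code page silently corrupts text ("mojibake"):
# the two UTF-8 bytes of e-acute become two unrelated cp1252 characters.
mojibake = "\u00e9".encode("utf-8").decode("cp1252")

print(len(legacy_bytes), len(utf8_bytes), len(utf16_bytes))
print([f"U+{ord(c):04X}" for c in mojibake])
```

The round trip is lossless only because every character in the sample exists in code page 1252; characters outside the target code page cannot survive the reverse conversion.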

Single-Byte Code Page Families

Windows-125x Series

The Windows-125x series comprises a family of 8-bit single-byte code pages, designated as code pages 1250 through 1258, developed by Microsoft to support Western European and other non-East Asian scripts in Windows operating systems. These code pages serve as the primary ANSI code pages for graphical user interfaces, extending the ISO/IEC 8859 family of standards by incorporating additional glyphs not defined in the ISO specifications, such as curly quotation marks, em dashes, and other typographic symbols in the range 0x80–0x9F. Unlike the ISO 8859 standards, which reserve this C1 control range for non-printing characters, the Windows-125x implementations assign printable characters to these bytes to better accommodate common usage in word processing and display applications.

Each code page in the series targets specific linguistic regions, mapping the first 128 bytes (0x00–0x7F) identically to the ASCII standard while using the extended range (0x80–0xFF) for language-specific accented letters, symbols, and punctuation. For instance, code page 1252 (Western European, also known as Windows Latin 1) was adopted in the 1980s as the default for English and other Western European languages, based on an early American National Standards Institute (ANSI) draft that preceded the finalization of ISO 8859-1. It includes characters like the en dash (–) at 0x96, which falls in the 0x80–0x9F block that ISO 8859-1 reserves for control codes, while characters from 0xA0 upward, such as the non-breaking space at 0xA0, are shared with ISO 8859-1.
Code page 1250 supports Central European languages (e.g., Polish, Czech) by extending ISO 8859-2 with additional diacritics; 1251 handles Cyrillic scripts (e.g., Russian, Bulgarian), covering the repertoire of ISO 8859-5 in a different layout; 1253 covers Greek, extending ISO 8859-7; 1254 addresses Turkish needs, modifying ISO 8859-9; 1255 encodes Hebrew, a right-to-left script, drawing from ISO 8859-8; 1256 supports Arabic, drawing on ISO 8859-6; 1257 serves Baltic languages (e.g., Latvian, Lithuanian), extending ISO 8859-4 and ISO 8859-13; and 1258 accommodates Vietnamese, combining Latin characters with tone marks in a manner similar to but distinct from VISCII. The following table summarizes the key code pages in the series, their primary language coverage, and .NET encoding names:
| Code Page | Description | Primary Languages/Region | .NET Name |
|---|---|---|---|
| 1250 | ANSI Central European | Central/Eastern European (Latin script) | windows-1250 |
| 1251 | ANSI Cyrillic | Cyrillic (Russian, Ukrainian, etc.) | windows-1251 |
| 1252 | ANSI Latin 1 (Western) | Western European (English, French, etc.) | windows-1252 |
| 1253 | ANSI Greek | Greek | windows-1253 |
| 1254 | ANSI Turkish | Turkish | windows-1254 |
| 1255 | ANSI Hebrew | Hebrew | windows-1255 |
| 1256 | ANSI Arabic | Arabic | windows-1256 |
| 1257 | ANSI Baltic | Baltic (Latvian, Lithuanian, etc.) | windows-1257 |
| 1258 | ANSI/OEM Vietnamese | Vietnamese | windows-1258 |
These mappings ensure compatibility with legacy single-byte systems while prioritizing printable output over strict adherence to international standards. A notable update to code page 1252 occurred in the late 1990s, incorporating the euro symbol (€) at byte 0x80 to support the introduction of the European currency, alongside the addition of characters like Ž and ž; similar practical enhancements appear across the series to address regional requirements.

In Windows environments, the active ANSI code page from the 125x series is determined by the system locale and retrieved programmatically via the GetACP function, which returns the identifier for the operating system's current default ANSI code page (e.g., 1252 on English systems). These code pages remain prevalent in legacy text files (e.g., .txt documents saved without Unicode specification), email protocols, and older applications that rely on single-byte processing for efficiency in non-Unicode contexts, though Microsoft recommends transitioning to Unicode encodings like UTF-8 for broader compatibility and to avoid locale-dependent variations.
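
As a hedged illustration of querying the active code pages, the Python sketch below calls the Win32 functions GetACP and GetOEMCP through `ctypes` when running on Windows and returns `None` elsewhere. The helpers `system_code_pages` and `codec_name` are hypothetical names introduced for this example, and the `cp<number>` naming convention is a simplification that holds for common code pages in Python's codec registry, not for every Windows identifier.

```python
import sys


def system_code_pages():
    """Return (ANSI, OEM) code page identifiers on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; ctypes.windll only exists on Windows

    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP(), kernel32.GetOEMCP()


def codec_name(code_page: int) -> str:
    """Map a Windows code page number to a Python codec name (simplified)."""
    return "utf-8" if code_page == 65001 else f"cp{code_page}"


pages = system_code_pages()
if pages is None:
    print("Not running on Windows; no system code pages to query.")
else:
    print(f"ANSI code page: {pages[0]}, OEM code page: {pages[1]}")

# Bytes assigned in the code page 1252 update mentioned above.
euro_byte = "\u20ac".encode(codec_name(1252))     # euro sign
z_caron_byte = "\u017d".encode(codec_name(1252))  # capital Z with caron
print(euro_byte.hex(), z_caron_byte.hex())
```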

OEM and DOS Code Pages

OEM and DOS code pages refer to a family of 8-bit character encodings designed for use in MS-DOS systems and console applications, where they handle text display and input in legacy environments. Unlike ANSI code pages, which prioritize linguistic characters in the upper byte range, OEM code pages allocate many values from 0x80 to 0xFF to graphical symbols, including line-drawing elements, block characters, and shading patterns for text-based user interfaces. These encodings originated with the IBM PC in the early 1980s, evolving from the initial support for regional variations.

In MS-DOS, OEM code pages were configured and loaded through the COUNTRY= directive in the CONFIG.SYS file, which specified a country code and optional code page identifier to enable appropriate character sets, keyboard layouts, and formatting conventions from a supporting file like COUNTRY.SYS. This mechanism allowed MS-DOS to adapt to different locales without altering the core 7-bit ASCII base (0x00–0x7F), which remained consistent across code pages. The first such code page, CP437, was introduced in 1981 for the IBM PC and included dedicated slots for box-drawing characters to support applications like early text editors and games. Subsequent OEM code pages extended this model to other regions, replacing graphics symbols with language-specific characters while retaining many visual elements for compatibility. Key examples include:
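| Code Page | Name | Region/Language |
|---|---|---|
| 437 | IBM437 | United States |
| 737 | IBM737 | Greek |
| 775 | IBM775 | Baltic |
| 850 | IBM850 | Western European (Multilingual Latin 1) |
| 852 | IBM852 | Central European (Latin 2) |
| 855 | IBM855 | Cyrillic (Russian) |
| 857 | IBM857 | Turkish |
| 860 | IBM860 | Portuguese |
| 861 | IBM861 | Icelandic |
| 863 | IBM863 | Canadian French |
| 865 | IBM865 | Nordic |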
These code pages, first expanded in MS-DOS 3.3 in 1987, provided essential support for non-English scripts in DOS applications. A core characteristic of OEM code pages is their emphasis on legacy graphics in the 128–255 range, enabling block-mode interfaces in text-mode programs, though this limits their capacity for full international text compared to later encodings. In Windows, the active OEM code page can be retrieved programmatically using the GetOEMCP function, which returns the identifier for the system's default console encoding.

These code pages maintain a legacy role in modern Windows, particularly in the Command Prompt, where they govern console output and input for compatibility with older executables. The chcp command allows switching between supported OEM code pages, but only the system's installed OEM page renders correctly in the console window, underscoring their persistence for DOS-era software and file systems like FAT. Despite the shift to Unicode, OEM code pages remain integral for emulating historical behaviors in virtual DOS environments and non-Unicode-aware tools.
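
The trade-off between graphics symbols and language-specific characters can be seen by decoding the same bytes under two OEM code pages. This is a minimal sketch that assumes Python's `cp437` and `cp850` codecs match the corresponding OEM tables.

```python
import unicodedata

# Top edge of a double-line box as drawn by DOS programs: corner, line, corner.
FRAME = bytes([0xC9, 0xCD, 0xBB])
frame_437 = FRAME.decode("cp437")
frame_850 = FRAME.decode("cp850")

# Byte 0xB5 is a box-drawing glyph in CP437 but an accented letter in CP850,
# which gave up some mixed single/double line characters for Latin-1 letters.
b5_437 = bytes([0xB5]).decode("cp437")
b5_850 = bytes([0xB5]).decode("cp850")

print("Frame identical in both code pages:", frame_437 == frame_850)
print("0xB5 in cp437:", unicodedata.name(b5_437))
print("0xB5 in cp850:", unicodedata.name(b5_850))
```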

Other Single-Byte Encodings

Windows supports several single-byte encodings derived from international standards beyond its proprietary Windows-125x and OEM families, including adaptations of the ISO/IEC 8859 series, ITU-T-related standards, and KOI8 variants. These encodings facilitate interoperability with global standards for text handling in applications, particularly in regions where specific scripts require precise mapping to 8-bit code points.

The ISO/IEC 8859 series provides single-byte encodings for various Latin-based scripts, with Windows assigning dedicated code page identifiers for direct support. For instance, code page 28591 corresponds to ISO/IEC 8859-1 (Latin-1), covering Western European languages with characters such as accented letters and symbols for English, French, German, and others. Similarly, code page 28599 maps to ISO/IEC 8859-9 (Latin-5), tailored for Turkish, incorporating letters like Ğ (U+011E) and dotless ı (U+0131) to support the Turkish alphabet. These Windows mappings adhere closely to the ISO standards but include platform-specific implementations for accent handling and control codes, differing from the extensions in Windows' own code pages.

ITU-T code pages in Windows address telecommunications needs, drawing from standards like ISO/IEC 6937. Code page 20269 implements ISO/IEC 6937, a non-spacing accent encoding for Latin scripts used in early telecommunications systems, allowing combined diacritics for efficient transmission of accented characters in bandwidth-limited environments. Additionally, code page 20866 supports KOI8-R, a Cyrillic encoding standardized in the 1990s for Russian text, originating from Soviet-era computing but adapted for post-Cold War interoperability with Unix systems.

KOI8 variants extend this support to other Cyrillic scripts, enhancing legacy Unix-Windows data interchange. Code page 21866 corresponds to KOI8-U, an extension of KOI8-R for Ukrainian, incorporating characters like Є (U+0404) and І (U+0406) while maintaining compatibility with the Russian base for shared Cyrillic layouts. These encodings remain relevant for processing older files from Eastern European systems, where full Unicode adoption has been gradual.

All these single-byte encodings are integrated into Windows through conversion APIs, such as MultiByteToWideChar and WideCharToMultiByte, enabling conversion to and from Unicode (UTF-16) for applications requiring backward compatibility. Support persists in modern Windows versions, including mappings for rare standards that have seen limited updates since the early 2000s, ensuring reliable handling of international legacy data without requiring custom implementations.
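
The relationships between these standards-based encodings can be checked with a brief Python sketch. It assumes Python's `iso8859_1`, `iso8859_9`, `koi8_r`, and `koi8_u` codecs correspond to code pages 28591, 28599, 20866, and 21866; the helper `can_encode` is a name introduced for this example.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# ISO 8859-9 replaces six Icelandic letters of ISO 8859-1 with Turkish ones;
# three of the affected byte values are decoded here under both standards.
turkish_slots = bytes([0xD0, 0xF0, 0xFD])
as_latin1 = turkish_slots.decode("iso8859_1")
as_latin5 = turkish_slots.decode("iso8859_9")

# KOI8-U adds Ukrainian letters such as U+0404 that KOI8-R lacks, while
# Russian text encodes to identical bytes in both.
ukrainian_ie = "\u0404"
russian_word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Russian for "hello"
same_russian = russian_word.encode("koi8_r") == russian_word.encode("koi8_u")

print([f"U+{ord(c):04X}" for c in as_latin1])
print([f"U+{ord(c):04X}" for c in as_latin5])
print(can_encode(ukrainian_ie, "koi8_u"), can_encode(ukrainian_ie, "koi8_r"))
```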

Multi-Byte and Specialized Code Page Families

East Asian Multi-Byte Code Pages

East Asian multi-byte code pages in Windows support CJK (Chinese, Japanese, Korean) languages by combining single-byte character sets (SBCS) for ASCII compatibility with double-byte character sets (DBCS) for ideographic scripts. These encodings use variable-width representations, where the first 128 code points (0x00–0x7F) encode ASCII characters in a single byte, while extended characters require two bytes: a lead byte typically in the range 0x81–0xFE followed by a trail byte that together represent a hanzi, kanji, or Hangul syllable. The lead byte signals the start of a multi-byte sequence, allowing parsers to distinguish between single-byte and double-byte characters during text processing. For example, in code page 932, lead bytes occupy the ranges 0x81–0x9F and 0xE0–0xFC, enabling encoding of thousands of characters beyond the 7-bit ASCII limit. Trail byte ranges vary by code page and can overlap with ASCII values (in code page 932, for instance, trail bytes include 0x40–0x7E), so a byte can only be interpreted correctly by scanning from the start of the string. This structure ensures compatibility with 8-bit systems while accommodating the vast character sets needed for East Asian scripts. Prominent East Asian multi-byte code pages in Windows include:
  • CP932: A Microsoft variant of the Shift JIS encoding for Japanese, developed in the 1990s to handle JIS X 0208 characters plus extensions; it supports over 6,000 kanji and kana.
  • CP936: The encoding for Simplified Chinese, initially based on GB 2312 but extended to GBK in Windows to include additional characters from GB 13000.1 for better coverage of modern usage in mainland China and Singapore.
  • CP949: An extension of EUC-KR based on the KS C 5601 standard for Korean, incorporating unified Hangul syllables and Hanja characters for compatibility with Windows Korean locales.
  • CP950: An extension of the Big5 encoding for Traditional Chinese, used in Taiwan and Hong Kong, with added characters for regional variants and compatibility.
These code pages feature Windows-specific extensions that go beyond international standards to include vendor-proprietary characters. For instance, CP932 adds NEC special characters (such as row 13 extensions) and selections from IBM extended sets, enabling support for legacy Japanese applications and fonts not covered in base JIS X 0208. Similar extensions in CP936, CP949, and CP950 incorporate additional glyphs for practical Windows usage, such as user-defined characters or regional variants. In Windows environments, these multi-byte code pages serve as legacy encodings for East Asian regional settings, particularly in older applications, file systems, and command-line interfaces where Unicode adoption is incomplete. The operating system includes APIs like IsDBCSLeadByteEx to validate lead bytes specifically in code pages 932, 936, 949, and 950, facilitating safe string manipulation and preventing misinterpretation of byte sequences in multi-lingual text.
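
The lead-byte parsing described in this section can be sketched in Python for code page 932. The function `is_cp932_lead_byte` is a hypothetical stand-in for what IsDBCSLeadByteEx reports for that code page, using the lead-byte ranges 0x81–0x9F and 0xE0–0xFC; the sketch assumes well-formed input and does no error handling for truncated sequences.

```python
def is_cp932_lead_byte(value: int) -> bool:
    """Return True if the byte starts a two-byte character in code page 932."""
    return 0x81 <= value <= 0x9F or 0xE0 <= value <= 0xFC


def split_cp932(data: bytes) -> list[bytes]:
    """Split a CP932 byte string into one- and two-byte character units."""
    units = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following trail byte as well.
        width = 2 if is_cp932_lead_byte(data[i]) else 1
        units.append(data[i:i + width])
        i += width
    return units


# "ABC" followed by the two kanji U+65E5 U+672C (the word for "Japan").
sample = "ABC\u65e5\u672c"
encoded = sample.encode("cp932")
units = split_cp932(encoded)
decoded_units = [unit.decode("cp932") for unit in units]

print(f"{len(encoded)} bytes -> {len(units)} characters")
```

Scanning forward from the start of the buffer is the design choice that matters here: because a trail byte can fall in the ASCII range, inspecting a byte in isolation cannot tell whether it is a standalone character or the second half of one.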

EBCDIC Code Pages

EBCDIC, or Extended Binary Coded Decimal Interchange Code, is an 8-bit character encoding developed by IBM in the early 1960s as part of its System/360 mainframe architecture to facilitate data interchange on punched cards and tapes. This encoding extends earlier BCD-based codes used in IBM systems, assigning 256 possible values to characters while maintaining compatibility with decimal arithmetic through a structured bit layout. Unlike ASCII, which follows a sequential ordering where digits precede letters, EBCDIC places alphabetic characters before numeric digits in its collating sequence, a design choice rooted in legacy punched-card sorting practices.

Windows provides support for EBCDIC code pages mainly to enable interoperability with IBM mainframe environments, such as z/OS, where EBCDIC remains the native encoding for legacy applications and data stores. These code pages are classified as non-native encodings in Windows and are handled through system conversion APIs rather than as default system locales. Key variants include those tailored for specific languages and regions, using the EBCDIC framework's flexibility for adaptations. For instance, the high-order "zone" bits (bits 7-4) categorize characters into groups like punctuation, letters, and digits, while the low-order "numeric" bits (bits 3-0) define specific symbols within those groups, allowing variants to remap accented or national characters without altering the core structure. The following table summarizes prominent Windows EBCDIC code pages, their associated names, and primary uses, drawn from Microsoft's supported code page specifications:
| Code Page | Name | Description | CCSID (IBM Equivalent) |
|---|---|---|---|
| 037 | IBM EBCDIC US-Canada | Standard for English text in North America | 37 |
| 500 | IBM EBCDIC International | Supports Western European languages | 500 |
| 870 | IBM EBCDIC Multilingual Latin 2 | For Central and Eastern European Latin scripts | 870 |
| 1047 | IBM EBCDIC Open Systems Latin 1 | POSIX-compliant variant for Western Latin | 1047 |
These mappings facilitate direct data exchange, such as file transfers or terminal emulation, between Windows applications and IBM host systems, where zone bit variations accommodate country-specific characters like accented letters. Euro symbol extensions, such as in code page 1140 (a variant of 037), were added in later updates to support modern currency needs.

In contemporary Windows environments, EBCDIC code pages are infrequently used for new development due to the prevalence of Unicode, but they persist for integration via conversion functions in the Win32 API, such as MultiByteToWideChar and WideCharToMultiByte, which handle EBCDIC-to-Unicode translations. This support is particularly relevant in enterprise scenarios involving Microsoft Host Integration Server, which enables data flow between Windows clients and mainframes. Microsoft maintains over 40 EBCDIC variants in its code page library, ensuring compatibility without native rendering support in Windows itself.
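
The zone and numeric structure, and the different collating order, can be illustrated with a short Python sketch. It assumes Python's `cp037` codec matches code page 037; `zone_and_numeric` is a helper name introduced for this example.

```python
def zone_and_numeric(byte_value: int) -> tuple[int, int]:
    """Split an EBCDIC byte into its zone (bits 7-4) and numeric (bits 3-0) parts."""
    return byte_value >> 4, byte_value & 0x0F


# Uppercase letters use zones 0xC-0xE; digits use zone 0xF.
letter_a = "A".encode("cp037")
digit_1 = "1".encode("cp037")

# Sorting by encoded byte value shows the collating difference from ASCII:
# EBCDIC orders lowercase, then uppercase, then digits.
chars = ["1", "A", "a"]
ascii_order = sorted(chars, key=lambda c: c.encode("ascii"))
ebcdic_order = sorted(chars, key=lambda c: c.encode("cp037"))

# Converting host data to Unicode is a plain decode with the right code page.
host_bytes = "HELLO".encode("cp037")
round_trip = host_bytes.decode("cp037")

print(f"A = 0x{letter_a[0]:02X}, zone/numeric = {zone_and_numeric(letter_a[0])}")
print(f"1 = 0x{digit_1[0]:02X}, zone/numeric = {zone_and_numeric(digit_1[0])}")
print("ASCII order:", ascii_order, "EBCDIC order:", ebcdic_order)
```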

Macintosh Compatibility Code Pages

Windows includes a series of code pages specifically designed for compatibility with Apple Macintosh character encodings, facilitating cross-platform text handling for applications and file transfers. The foundational encoding is MacRoman (CP10000), a single-byte character set introduced by Apple in 1984 to support Western European languages on Macintosh computers. MacRoman uses bytes 0-127 for ASCII compatibility but assigns distinct characters to the 128-255 range, incorporating symbols like the dagger (†), the Apple logo, and various mathematical symbols, with byte assignments that differ significantly from the equivalent ranges in Windows encodings such as CP1252. These compatibility code pages extend beyond Roman scripts to cover other Macintosh language systems, mapping them to Windows representations for bidirectional conversion. Key examples include support for East Asian and right-to-left scripts, ensuring that text from Mac OS applications could be processed in Windows without loss of meaning.
| Code Page ID | IANA Name | Description |
|---|---|---|
| 10000 | macintosh | MAC Roman; Western European (Mac) |
| 10001 | x-mac-japanese | Japanese (Mac) |
| 10002 | x-mac-chinesetrad | Traditional Chinese (Big5; Mac) |
| 10003 | x-mac-korean | Korean (Mac) |
| 10004 | x-mac-arabic | Arabic (Mac) |
| 10005 | x-mac-hebrew | Hebrew (Mac) |
| 10006 | x-mac-greek | Greek (Mac) |
| 10007 | x-mac-cyrillic | Cyrillic (Mac) |
| 10008 | x-mac-chinesesimp | Simplified Chinese (GB2312; Mac) |
This table lists the primary Macintosh compatibility code pages supported in Windows, each tailored to a specific Mac OS script system. Bidirectional conversions between these code pages and other formats, including Unicode, are enabled through Windows National Language Support (NLS) APIs such as MultiByteToWideChar and WideCharToMultiByte, which handle mapping for accurate round-trip preservation where possible. These mechanisms were essential for pre-Unicode file exchanges, such as sharing documents between Macintosh and Windows systems in mixed environments like publishing or office workflows. Developed during the 1990s amid growing interoperability needs between Microsoft Windows and Apple Macintosh platforms, these code pages addressed the challenges of diverse legacy encodings but have seen limited adoption in contemporary applications, overshadowed by Unicode standards.
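
A minimal Python sketch of such a cross-platform exchange follows, assuming Python's `mac_roman` and `cp1252` codecs stand in for code pages 10000 and 1252. It shows what a Windows program sees when it misreads MacRoman bytes, and the correct conversion through Unicode.

```python
# U+00E9 is e-acute and U+2020 is the dagger; both exist in MacRoman and CP1252
# but at different byte positions.
original = "caf\u00e9\u2020"

# The text as a classic Mac OS application would have stored it.
mac_bytes = original.encode("mac_roman")

# Misreading the bytes as CP1252 produces unrelated characters.
misread = mac_bytes.decode("cp1252")

# Correct exchange: decode with the source code page, then encode for the target.
converted = mac_bytes.decode("mac_roman").encode("cp1252")

print("MacRoman bytes:", mac_bytes.hex())
print("Misread as cp1252:", [f"U+{ord(c):04X}" for c in misread])
print("Converted to cp1252:", converted.hex())
```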

UTF-8 and UTF-16 Integration

Windows has used UTF-16 as its primary internal text encoding since the introduction of Windows NT 3.1 in 1993, initially in the form of UCS-2 and later extended to full UTF-16 with surrogate pairs, which represent Unicode code points beyond the 65,536 positions of the Basic Multilingual Plane (BMP). Code page 1200 designates a UCS-2-compatible version of UTF-16 in little-endian byte order, the native byte order on Windows systems, limited to the BMP, while code page 1201 specifies big-endian UTF-16 with the same limitation. UTF-16 serves as the native format for Win32 API calls involving wide characters (WCHAR), giving efficient handling of Unicode data in the operating system kernel and in user-mode applications, though code page conversions via 1200/1201 do not fully support surrogate pairs.

In contrast, UTF-8 support in Windows, designated as code page 65001, was introduced with Windows XP in 2001 as a variable-width encoding that uses 1 to 4 bytes per character to represent the full Unicode repertoire. UTF-8 enables compatibility with web standards and cross-platform text files, but its adoption was initially limited by incomplete system integration. Significant improvements began with Windows 10 version 1903 (May 2019 Update), which enhanced UTF-8 handling in the console, file I/O, and API functions, making it viable for broader system-wide use without relying solely on UTF-16 conversions.

These encodings are integrated into the Windows code page mechanisms primarily through the string conversion APIs: WideCharToMultiByte accepts code page 65001 to convert UTF-16 wide strings to UTF-8 byte sequences, and MultiByteToWideChar performs the reverse. Developers can specify these code pages explicitly for portability, bypassing locale-dependent ANSI code pages.
Additionally, since Windows 10 version 1903, UTF-8 can be set as the active ANSI code page (ACP) in two ways: through a beta system locale option in Settings > Time & Language > Language > Administrative language settings (enabling "Beta: Use Unicode UTF-8 for worldwide language support," which sets ACP to 65001 in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage), or through the stable activeCodePage manifest element set to "UTF-8" for individual applications. The manifest element was expanded in Windows 11 to also accept "Legacy" or a specific code page. Either route makes UTF-8 the default for the non-Unicode (A) API variants such as CreateFileA.

Early implementations of code page 65001 in Windows XP and Server 2003 exhibited limitations and bugs, including inconsistent handling of invalid sequences and only partial support for some operations in multi-byte contexts. These issues, such as errors in surrogate pair processing and in flag support in the conversion functions, were progressively addressed through updates, with key fixes to error detection arriving by Windows Vista in 2006 and further refinements in subsequent service packs. Despite these advancements, UTF-8 remains non-default in most Windows configurations to maintain compatibility with legacy single-byte code pages, though Microsoft recommends transitioning to UTF-8 or UTF-16 for new applications to avoid encoding pitfalls.
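The width and byte-order behaviour described in this section can be illustrated with a short Python sketch. It relies only on Python's standard `utf-8`, `utf-16-le`, and `utf-16-be` codecs rather than the Windows conversion APIs, and the sample characters and the helper name `widths` are chosen here for illustration.

```python
# Sample characters: U+0041, U+00E9, U+20AC (all in the BMP) and U+1F600
# (outside the BMP, so UTF-16 needs a surrogate pair).
SAMPLES = ["A", "é", "€", "😀"]


def widths(ch: str) -> dict:
    """Count UTF-8 bytes and UTF-16 code units for one character."""
    return {
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", widths(ch))

# The same surrogate pair (D83D DE00) serialized in both byte orders,
# corresponding to the layouts of code pages 1200 and 1201.
le = "😀".encode("utf-16-le")
be = "😀".encode("utf-16-be")
print(le.hex(" "), "|", be.hex(" "))
```

Code that assumes one 16-bit unit per character, as UCS-2 did, will miscount the last sample, which is the practical consequence of the BMP limitation mentioned above.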

Other Unicode-Derived Code Pages

In addition to the primary UTF-8 and UTF-16 encodings, Windows supports several other code pages derived from or closely related to Unicode standards, primarily for legacy compatibility and specialized applications. Code page 1200 corresponds to UTF-16 in little-endian byte order, encoding the Basic Multilingual Plane (BMP) of ISO/IEC 10646, and is available only to managed applications (e.g., .NET). This format evolved from UCS-2, an earlier fixed-width 16-bit encoding limited to the BMP and incapable of representing characters beyond U+FFFF because it has no surrogates; UCS-2 has been deprecated in favor of full UTF-16 since Unicode 2.0, though Windows maintains backward compatibility for applications that assume the legacy behavior. Code page 1201 provides UTF-16 in big-endian byte order, mirroring the structure of CP1200 with reversed byte serialization for network or other contexts where big-endian is conventional, and it is likewise available only to managed code such as .NET applications.

Code page 65000 implements UTF-7, a variable-length encoding designed to be safe for transport over 7-bit channels like email: ASCII text passes through unmodified, and non-ASCII characters are encoded using a modified Base64 scheme delimited by escape sequences. Developed as part of early Unicode efforts, UTF-7 prioritizes compatibility with legacy protocols but is less efficient than UTF-8 and is rarely used outside specific contexts like IMAP folder names.

Windows also supports UTF-32 via code pages 12000 (little-endian) and 12001 (big-endian), fixed-width 32-bit formats that map code points directly without surrogate pairs, available only to managed applications for scenarios requiring explicit 32-bit Unicode handling. The Standard Compression Scheme for Unicode (SCSU), a Unicode Technical Standard that reduces storage needs through dynamic windowing of frequently used character ranges, sees limited implementation in Windows products such as SQL Server for internal Unicode compression, but it has no dedicated code page identifier and is not exposed for general file or text handling.
Experimental encodings like UTF-EBCDIC, proposed to map Unicode onto EBCDIC-compatible structures for mainframe interoperability, remain unimplemented as Windows code pages and are confined to niche or vendor-specific extensions.
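As a rough illustration of the UTF-7 and UTF-32 properties described above, the sketch below uses Python's built-in `utf-7`, `utf-32-le`, and `utf-32-be` codecs. These are Python's implementations, assumed here to behave like code pages 65000, 12000, and 12001 for this simple input; the sample string is arbitrary.

```python
text = "Hi €"  # arbitrary sample: three ASCII characters plus U+20AC

# UTF-7: ASCII passes through, the euro sign becomes a "+...-" run of
# modified Base64, so every output byte fits in 7 bits.
utf7 = text.encode("utf-7")
print(utf7)

# UTF-32: exactly four bytes per code point, in either byte order.
utf32le = text.encode("utf-32-le")
utf32be = text.encode("utf-32-be")
print(len(utf32le), utf32le[:4].hex(" "), "|", utf32be[:4].hex(" "))
```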

Usage in Windows

API and System Implementation

Windows code pages are integrated into the operating system through the National Language Support (NLS) subsystem, which provides APIs in kernel32.dll for managing code pages and converting between them and Unicode representations. The NLS framework loads locale-specific data, including code page translation tables stored in binary NLS files (e.g., c_1252.nls for Windows-1252), from the %SystemRoot%\System32 directory, giving applications access to character mappings without embedding them.

Core APIs for querying the active code pages include GetACP, which retrieves the current ANSI code page identifier used for non-Unicode text in the system locale, and GetOEMCP, which returns the OEM code page identifier typically used for console and compatibility operations. These functions let applications determine the system's default mappings for ANSI and OEM contexts, respectively, ensuring compatibility with legacy single-byte encodings. For validation, IsValidCodePage checks whether a specified identifier (e.g., 1252 for Western European) is supported by the system, returning a nonzero value if it is.

Character conversion between multi-byte code pages and Unicode is handled primarily by MultiByteToWideChar and its counterpart WideCharToMultiByte, both exported from kernel32.dll. MultiByteToWideChar translates a string from a specified code page to UTF-16, with flags such as MB_PRECOMPOSED (the default) directing the function to produce precomposed characters where possible instead of separate base and combining marks. Conversely, WideCharToMultiByte maps UTF-16 to a target code page and supports similar flags to control decomposition behavior for compatibility with legacy applications. These APIs rely on NLS-loaded tables for accurate mappings and are essential for bridging code page-based data with internal Unicode processing.

Error handling in these conversion functions addresses invalid byte sequences, which occur when input bytes do not map to valid characters in the source code page.
Without the MB_ERR_INVALID_CHARS flag, MultiByteToWideChar substitutes such sequences with the system's default character (typically '?'), allowing partial conversions to proceed rather than failing entirely; with the flag set, the function returns 0 and GetLastError reports ERROR_NO_UNICODE_TRANSLATION when invalid input is encountered. The substitution mechanism prevents crashes in legacy code but can lead to silent data loss, so callers that need fidelity should set the flag and should confirm beforehand with IsValidCodePage that the code page itself is available.

Starting with Windows 2000, the system uses UTF-16 as the internal encoding for strings and UI elements, reducing reliance on code pages for core operations while maintaining support for backward compatibility. Starting with Windows 10 version 1903, beta system-wide UTF-8 support is available as an alternative to traditional ANSI code pages, configurable via administrative settings, and this feature is also supported in Windows 11.
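The query functions and the two error-handling modes can be sketched in Python as follows. Two assumptions apply: `query_code_pages` calls GetACP and GetOEMCP through `ctypes` and only does so on Windows, and the strict and lenient decoders use Python's own codecs as an analogy for setting or omitting MB_ERR_INVALID_CHARS (Python substitutes U+FFFD, not the Windows default character). The helper names are hypothetical.

```python
import sys


def query_code_pages():
    """Return (ACP, OEMCP) from kernel32 on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP(), kernel32.GetOEMCP()


def decode_lenient(raw: bytes, codec: str) -> str:
    # Analogous to omitting MB_ERR_INVALID_CHARS: invalid input is
    # replaced (here with U+FFFD) and the conversion continues.
    return raw.decode(codec, errors="replace")


def decode_strict(raw: bytes, codec: str):
    # Analogous to setting MB_ERR_INVALID_CHARS: invalid input makes the
    # whole conversion fail, reported here as None.
    try:
        return raw.decode(codec, errors="strict")
    except UnicodeDecodeError:
        return None


BAD = b"ok \xff"  # 0xFF can never appear in well-formed UTF-8

print(query_code_pages())
print(repr(decode_lenient(BAD, "utf-8")))
print(decode_strict(BAD, "utf-8"))
```

The lenient path always returns something displayable, which is why it is easy to overlook the information it discards; the strict path forces the caller to decide what to do with bad input.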

Regional and Language Settings

In Windows, users configure code pages through the Regional and Language Settings in the Settings app (or legacy Control Panel), specifically under Time & Language > Region > Administrative language settings, where the system locale can be changed to select a non-Unicode code page for legacy applications. For example, selecting the Russian (Russia) system locale sets the default ANSI code page to Windows-1251 (CP1251), which determines how non-Unicode programs interpret text characters. This configuration affects the active code page used by the system for ANSI operations, and changes require a restart to take effect.

Locale IDs (LCIDs) in Windows associate specific code pages with languages and regions, enabling the system to retrieve relevant encoding information for internationalization. Each LCID is a 32-bit value combining a primary language identifier (lower 10 bits), sublanguage (next 6 bits), sort ID (next 4 bits), and reserved bits, with the default ANSI code page tied to the locale; for instance, LCID 1049 corresponds to Russian (Russia) and links to CP1251. Applications can query these associations using the GetLocaleInfo function with the LOCALE_IDEFAULTANSICODEPAGE constant to obtain the ANSI code page for a given LCID, supporting runtime locale-specific text handling.

Multilingual User Interface (MUI) packs and language feature updates install additional locales and their associated code pages during or after installation via the Settings app under Time & Language > Language > Add a language. These packs extend system support for non-default languages, automatically incorporating the corresponding code pages without altering the primary system locale. In recent versions, such as Windows 10 and Windows 11, Microsoft has promoted UTF-8 as a configurable system locale option (labeled as a beta feature in Administrative settings), allowing users to set it as the default active code page (CP65001) for broader compatibility across legacy and modern applications.
These settings directly impact compatibility for legacy applications that rely on the system locale's ANSI code page rather than Unicode, such as older software or console tools, where mismatched locales can result in garbled text display. Enabling UTF-8 as the system locale enhances cross-language support in such apps by standardizing on a Unicode-based encoding, though it may require application-specific adjustments for optimal rendering.
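The LCID bit layout described above can be unpacked with a few shifts and masks. The sketch below is plain Python arithmetic and does not call GetLocaleInfo; the function name `split_lcid` and the two-entry lookup table are illustrative stand-ins written by hand, not data read from Windows.

```python
def split_lcid(lcid: int) -> dict:
    """Split an LCID into the fields described in the text."""
    lang_id = lcid & 0xFFFF                   # low 16 bits: language ID
    return {
        "primary_language": lang_id & 0x3FF,  # lower 10 bits
        "sublanguage": lang_id >> 10,         # next 6 bits
        "sort_id": (lcid >> 16) & 0xF,        # next 4 bits
    }


# Hand-written sample of the LCID -> default ANSI code page association
# (Russian (Russia) -> 1251, English (United States) -> 1252).
ANSI_CODE_PAGE_BY_LCID = {1049: 1251, 1033: 1252}

for lcid in (1049, 1033):
    print(hex(lcid), split_lcid(lcid), ANSI_CODE_PAGE_BY_LCID[lcid])
```

On Windows, the association held in the hand-written table would instead come from GetLocaleInfo with LOCALE_IDEFAULTANSICODEPAGE.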

Limitations and Challenges

Common Encoding Problems

One prevalent issue in handling Windows code pages is mojibake, where text becomes garbled because an incorrect code page is assumed during decoding. Different code pages assign different characters to the same byte values, so a mismatch produces systematic misinterpretation; ANSI code pages can vary across systems or be changed on a single system, resulting in data corruption when a file encoded in one code page is read using another. A common example involves a file saved in Windows-1252 (CP1252), which maps the euro symbol (€) to byte 0x80: when the file is interpreted as CP850 (OEM Multilingual Latin I), this byte maps to Ç (C with cedilla), so the euro sign appears as Ç (other mismatches yield stray characters such as â), or as a replacement character in some displays if the glyph is unsupported. Such mismatches are particularly frequent in legacy applications or in cross-system file transfers without explicit encoding metadata.

Round-trip conversion problems further complicate text handling: converting from a code page to Unicode and back can fail to preserve the original data because some mappings are not reversible. In multi-byte code pages like CP932 (Shift-JIS for Japanese), multiple byte sequences may map to a single Unicode character, and the reverse conversion cannot unambiguously reconstruct the original bytes, leading to loss of information or altered content. The issue is exacerbated when system and thread code pages differ, because conversion functions like MultiByteToWideChar and WideCharToMultiByte rely on the active code page context, potentially causing corruption during round-trip operations in multithreaded environments.

Locale mismatches amplify these risks, especially when files from regions using incompatible code pages are processed on systems configured for different locales. For example, a document encoded in CP1251 (Windows Cyrillic) containing Russian text such as "Привет" (byte sequence 0xCF 0xF0 0xE8 0xE2 0xE5 0xF2) will appear corrupted, as Latin gibberish like "Ïðèâåò", when viewed using a Latin-based code page like CP1252 on a Western European-configured Windows system.
This corruption stems from the fundamental incompatibility between script-specific code pages, where bytes intended for Cyrillic glyphs coincide with Latin control or printable characters in other pages, rendering the text unusable without proper locale-aware handling.

Detecting the correct code page poses significant challenges, particularly for legacy files lacking a byte order mark (BOM), which traditional Windows code page encodings never carry, unlike UTF-8 or UTF-16 files. Without a BOM, applications must rely on heuristics or user intervention, often leading to trial-and-error decoding attempts. In console environments, the chcp command switches the active code page (e.g., chcp 1252 to set CP1252), but this changes only how the console interprets and displays bytes; it does not detect or correct the encoding of existing files, complicating troubleshooting in mixed-locale scenarios.
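The two mojibake examples and the BOM check discussed above can be reproduced with a short Python sketch. It uses Python's `cp1251`, `cp1252`, and `cp850` codecs as stand-ins for the Windows tables, and the helper names `misread` and `sniff_bom` are hypothetical.

```python
import codecs


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode it with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


print(misread("Привет", "cp1251", "cp1252"))  # Cyrillic read as Latin
print(misread("€", "cp1252", "cp850"))        # byte 0x80 read as CP850


def sniff_bom(raw: bytes):
    """Return a codec name if the data starts with a known BOM, else None.

    Only UTF-8 and UTF-16 are checked. A UTF-32 LE BOM begins with the
    same two bytes as the UTF-16 LE BOM and would be misreported here.
    """
    for bom, name in (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if raw.startswith(bom):
            return name
    return None


print(sniff_bom("Привет".encode("utf-8-sig")))
print(sniff_bom("Привет".encode("cp1251")))  # legacy file: no BOM
```

The last call returns nothing useful, which is the situation described above: for a BOM-less legacy file the bytes alone do not say which code page produced them.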

Deprecation and Migration Strategies

Microsoft recommends that developers prioritize Unicode encodings, such as UTF-8 or UTF-16, for new Windows applications to avoid the limitations of legacy code pages and ensure broad international support. This guidance emphasizes UTF-8 for its compact variable-width representation and compatibility with web standards, while UTF-16 remains suitable for internal string processing in Windows APIs. Since Windows 10 version 1903, Microsoft has provided a system configuration option to set UTF-8 as the default ANSI code page for legacy non-Unicode applications, accessible via the "Use Unicode UTF-8 for worldwide language support" setting in Region > Administrative settings, which sets the ACP value under the registry key HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage to 65001. This feature, initially introduced as beta, enables smoother transitions by interpreting legacy API calls through UTF-8, though it requires a system reboot and may impact performance in some scenarios.

For bulk migration of existing data from code pages to Unicode, several tools facilitate conversion. PowerShell cmdlets, such as Get-Content -Encoding <codepage> to read files in legacy encodings (e.g., Default for ANSI) followed by Set-Content -Encoding utf8, allow scripted conversion of text files to UTF-8. Windows native APIs like MultiByteToWideChar and WideCharToMultiByte provide programmatic conversion capabilities for developers integrating migration into applications. Additionally, the iconv utility, installable on Windows through third-party distributions or similar environments, supports command-line bulk conversions, such as iconv -f WINDOWS-1252 -t UTF-8 input.txt > output.txt, for handling multiple files from specific code pages.

Best practices for migration include embedding a byte order mark (BOM) in UTF-8 files to help legacy Windows applications detect the encoding, particularly for text files opened in Notepad or Excel.
After conversion, a check with APIs like IsTextUnicode, which applies heuristic tests to judge whether a buffer is likely to contain Unicode text, can help flag potential corruption from mismatched code pages, although it cannot guarantee data integrity. Enterprises adopting Windows 10 and later have increasingly implemented these strategies during upgrades, often combining automated scripts with testing phases to handle large-scale data shifts in multilingual environments. As of 2025, Microsoft continues to promote UTF-8 adoption without announcing full deprecation of code pages, maintaining backward compatibility for legacy software while encouraging Unicode as the standard for future development.
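As an alternative to the PowerShell and iconv commands mentioned above, a file-level conversion can be sketched in Python. The function name `convert_to_utf8`, its parameters, and the sample file contents are hypothetical; the source code page has to be known in advance, since nothing in this sketch detects it.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_codec="cp1252", with_bom=False):
    """Re-encode one text file from a legacy code page to UTF-8.

    Decoding is strict, so bytes that are invalid in source_codec raise
    UnicodeDecodeError instead of being silently replaced.
    """
    text = Path(src).read_bytes().decode(source_codec)
    target = "utf-8-sig" if with_bom else "utf-8"  # utf-8-sig adds a BOM
    Path(dst).write_bytes(text.encode(target))
    return text


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "legacy.txt"
    target = Path(workdir) / "converted.txt"
    source.write_bytes("prix: 5 €".encode("cp1252"))
    convert_to_utf8(source, target, with_bom=True)
    converted = target.read_bytes()

print(converted.hex(" "))
```

Writing with a BOM follows the best practice described above for files that legacy Windows applications will open; pass `with_bom=False` for consumers that expect plain UTF-8.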
