
UTF-EBCDIC

UTF-EBCDIC is a Unicode transformation format designed specifically for compatibility with EBCDIC-based systems, enabling the encoding of all valid Unicode scalar values using a variable-length scheme of 1 to 7 bytes per code point. It achieves this through a two-step process: first converting Unicode code points into an intermediate I8-sequence modeled after UTF-8 but extended to handle up to 7 bytes, and then applying a reversible one-to-one mapping of those bytes to EBCDIC byte values that respect conventions such as those in IBM's code page 1047. Developed by IBM and formalized in Unicode Technical Report #16, UTF-EBCDIC preserves the single-byte encoding of 65 control characters and 82 invariant graphic characters from EBCDIC, allowing legacy applications on platforms like IBM mainframes to process Unicode data without corruption of existing content. The format ensures that unrecognized multi-byte sequences can be safely ignored or skipped by parsers, while maintaining the relative order of scalar values in multi-byte encodings. Unlike UTF-8, which is optimized for ASCII systems, UTF-EBCDIC avoids placing multi-byte sequence bytes in ranges that EBCDIC reserves for control codes and invariant graphics, preventing conflicts with legacy data. Primarily intended for internal use within homogeneous EBCDIC environments rather than open interchange, UTF-EBCDIC supports the integration of Unicode into legacy mainframe systems via specific conversion APIs, such as those in the iconv family, without requiring full system-wide changes. Its implementation facilitates the handling of variant characters and international text in applications originally designed for EBCDIC, bridging the gap between modern Unicode standards and older mainframe architectures.

Introduction

Definition and Purpose

UTF-EBCDIC is a variable-length Unicode Transformation Format designed to encode all valid Unicode scalar values using sequences of 1 to 7 eight-bit bytes per code point. It follows a structure similar to UTF-8 but is specifically adapted for EBCDIC-based systems, incorporating a modified intermediate encoding step followed by a reversible byte mapping to align with EBCDIC's non-contiguous graphic zones and control code arrangements. This encoding ensures that common characters remain represented as single bytes while supporting the full range of Unicode code points. The primary purpose of UTF-EBCDIC is to enable Unicode compatibility within legacy environments, such as systems running z/OS or IBM i, where traditional EBCDIC code pages like CCSID 037 predominate. It facilitates the processing and interchange of Unicode data in these homogeneous EBCDIC networks without requiring extensive modifications to existing applications, allowing single-byte and multi-byte content to coexist in the same stream. Unlike encodings intended for open systems, UTF-EBCDIC is optimized for internal use in EBCDIC-dominant infrastructures, bridging the gap between modern Unicode requirements and historical data formats. A key benefit of UTF-EBCDIC is its preservation of single-byte invariants for common characters, including uppercase and lowercase letters (A-Z, a-z), digits (0-9), and various controls, which reduces conversion overhead and maintains compatibility for legacy workflows. For example, it encodes 65 control characters (drawn from U+0000 to U+009F, including the C1 controls) and 95 graphic characters as single bytes, matching EBCDIC conventions, while multi-byte sequences handle other characters that legacy applications can safely skip during parsing. IBM standardized UTF-EBCDIC as CCSID 1210 to support this encoding on its platforms.

Development History

In the early 1990s, the emergence of Unicode in 1991 presented challenges for EBCDIC systems, which had long relied on the encoding standard IBM developed in the 1960s for business data processing. EBCDIC's widespread entrenchment in legacy infrastructure necessitated adaptations to incorporate Unicode support without disrupting existing applications, prompting IBM to investigate EBCDIC-compatible transformation formats. IBM's National Language Technical Centre at its Toronto Laboratory initiated development of an EBCDIC-friendly encoding, initially termed EF-UTF, in the late 1990s. The team, including Baldev Soor, Alexis Cheng, Rick Pond, Ibrahim Meru, and V.S. Umamaheswaran, disclosed the proposal to the Unicode Technical Committee on June 2, 1998. This effort built on prior EBCDIC extensions, such as CCSID 1047 for Latin-1 support in mainframe environments, to ensure compatibility with existing code pages. The Unicode Technical Committee approved the format as UTF-EBCDIC and published Unicode Technical Report #16 (UTR #16) on June 19, 2000, formalizing a specification originally drafted against Unicode 2.0. Standardization progressed with the release of UTR #16 version 7.2 on April 29, 2001, aligning minor aspects with Unicode 3.0 for consistency. IBM integrated UTF-EBCDIC as CCSID 1210, with adoption accelerating through z/OS 1.2 (September 2001), which enhanced Unicode services for mainframe compatibility. No significant revisions occurred after 2002, reflecting the format's stability for EBCDIC systems.

Encoding Principles

Basic Structure

UTF-EBCDIC employs a variable-length encoding scheme to represent Unicode scalar values, utilizing between 1 and 7 bytes per code point depending on its value (up to 5 bytes suffice for the Unicode range U+0000 to U+10FFFF). The format ensures compatibility with EBCDIC-based systems by aligning byte patterns with traditional EBCDIC conventions, with single-byte characters occupying the range 0x00 to 0x9F. Characters in the Unicode range U+0000 through U+009F, which include the EBCDIC control characters and the invariant graphic characters, are encoded as a single byte to maintain transparency with legacy EBCDIC data. For all other characters, the encoding extends to 2 through 7 bytes: the leading header byte signals the sequence length and supplies the initial bits of the scalar value, followed by one or more continuation bytes in the range 0xA0 to 0xBF, each providing 5 data bits. At the intermediate I8 level, header bytes occupy ranges adapted from UTF-8: 0xC0 to 0xDF for 2-byte sequences, 0xE0 to 0xEF for 3-byte, 0xF0 to 0xF7 for 4-byte, 0xF8 to 0xFB for 5-byte, 0xFC to 0xFD for 6-byte, and 0xFE to 0xFF for 7-byte starters; the subsequent byte-mapping step repositions these values to avoid overlap with standard EBCDIC graphic and control zones. A key architectural feature is the absence of zero-extension or padding bytes; every byte in a multi-byte sequence contributes meaningful data bits toward reconstructing the scalar value, distinguishing the format from encodings like UTF-16 that rely on surrogate pairs for the supplementary planes. This direct encoding approach ensures efficient representation without unnecessary overhead. The format fully supports the Unicode repertoire up to U+10FFFF, encompassing all 17 planes, with 5 bytes required only for code points U+40000 to U+10FFFF. To mitigate security risks such as byte-sequence confusion attacks, overlong encodings, in which a code point is represented with more bytes than necessary, are explicitly prohibited: every scalar value must use its shortest valid sequence.
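The length rule above follows directly from the 5-bit payload of each continuation byte; a minimal sketch for the Unicode range (the function name and structure are illustrative, not from UTR #16):

```python
def utf_ebcdic_i8_length(cp: int) -> int:
    """Number of bytes in the intermediate I8-sequence for a Unicode
    scalar value: each continuation byte carries 5 data bits, so the
    thresholds grow by factors of 32."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0xA0:        # single-byte zone: EBCDIC controls and invariants
        return 1
    if cp < 0x400:       # 2-byte: 5 lead bits + 5 trail bits
        return 2
    if cp < 0x4000:      # 3-byte: 4 + 5 + 5 bits
        return 3
    if cp < 0x40000:     # 4-byte: 3 + 5 + 5 + 5 bits
        return 4
    if cp <= 0x10FFFF:   # 5 bytes cover the rest of Unicode
        return 5
    raise ValueError("beyond the Unicode range")

print(utf_ebcdic_i8_length(0x41))      # 1: 'A' stays a single byte
print(utf_ebcdic_i8_length(0x20AC))    # 3: the euro sign
print(utf_ebcdic_i8_length(0x10FFFF))  # 5
```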

Byte Mapping Rules

The byte mapping rules for UTF-EBCDIC follow a two-step encoding process to convert a Unicode scalar value N into a sequence of 1 to 7 bytes compatible with EBCDIC environments. First, N is transformed into an intermediate I8-sequence using a modified UTF-8-style algorithm whose continuation bytes carry 5 data bits rather than 6. The number of bytes in the I8-sequence is determined by the value of N: 1 byte for 0 ≤ N < 0xA0 (U+0000 to U+009F), 2 bytes for 0xA0 ≤ N < 0x400 (U+00A0 to U+03FF), 3 bytes for 0x400 ≤ N < 0x4000 (U+0400 to U+3FFF), 4 bytes for 0x4000 ≤ N < 0x40000 (U+4000 to U+3FFFF), 5 bytes for 0x40000 ≤ N < 0x400000 (U+40000 to U+3FFFFF), 6 bytes for 0x400000 ≤ N < 0x4000000 (U+400000 to U+3FFFFFF), and 7 bytes for 0x4000000 ≤ N ≤ 0x7FFFFFFF, covering the original UCS-4 space beyond Unicode. In the I8-sequence generation, lead bytes use specific bit patterns to indicate sequence length, while trailing bytes follow a uniform pattern of 101xxxxx. For example, in a 2-byte sequence the lead byte is 110yyyyy, where yyyyy holds the high 5 bits of N, and the trail byte is 101xxxxx with the low 5 bits. The bytes are derived using bit shifting and masking: for a 2-byte sequence, the first byte is 0xC0 | ((N >> 5) & 0x1F) and the second byte is 0xA0 | (N & 0x1F). Similar operations apply to longer sequences; for 3 bytes, the first byte is 0xE0 | ((N >> 10) & 0x0F), the second 0xA0 | ((N >> 5) & 0x1F), and the third 0xA0 | (N & 0x1F). This construction yields the shortest possible encoding, with no overlong forms. For instance, U+00A0 (no-break space) produces the I8-sequence 0xC5 0xA0. The second step applies a reversible byte mapping from each I8 byte to a corresponding UTF-EBCDIC byte, defined in a fixed table that preserves EBCDIC conventions.
This mapping adjusts for EBCDIC code-page zones, such as placing the invariant Latin letters in their EBCDIC positions beginning at 0xC1 (e.g., I8 0x41 for U+0041 maps to 0xC1), while remapping lead bytes such as I8 0xC0 to 0x74 so that multi-byte sequences avoid reserved or conflicting zones. The table keeps the 82 graphic characters invariant between the ASCII and EBCDIC subsets, and it accommodates OS-specific control-character conventions, such as the swap of line feed and new line found in some EBCDIC environments. Decoding reverses this process: bytes are unmapped to the I8-sequence using the inverse table, and the I8-sequence is then validated for well-formedness. Validation checks the sequence length implied by the lead byte's bit pattern (e.g., 110xxxxx must start a 2-byte sequence followed by exactly one 101xxxxx byte), rejects overlong encodings (e.g., 0xC0 0xA0 for U+0000), and confirms that trailing bytes match 101xxxxx. Lead bytes must fall within the expected ranges, such as 0xC0–0xDF for 2-byte starters. A "shadow flag" mechanism, based on the leading bits after unmapping, classifies bytes (e.g., flag '1' for single-byte characters, '9' for trailers) to detect malformed input efficiently. For error handling, UTF-EBCDIC conforms to Unicode Standard requirements: invalid or ill-formed sequences, such as unmatched trailers or prohibited patterns (e.g., 0xFF 0xFF), trigger replacement with the Unicode replacement character U+FFFD, ensuring robust processing in EBCDIC systems.
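The first step (scalar value to I8-sequence) and its validating inverse can be sketched as follows. This covers only the Unicode range (1 to 5 bytes) and omits the second step, the fixed 256-entry I8-to-EBCDIC byte table from UTR #16; function names are my own:

```python
def encode_i8(cp: int) -> bytes:
    """Step 1: Unicode scalar value -> I8-sequence. UTF-8-like, but each
    continuation byte is 101xxxxx and carries only 5 data bits."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0xA0:                       # EBCDIC single-byte zone
        return bytes([cp])
    if cp < 0x400:
        lead, ntrail = 0xC0, 1          # 110yyyyy
    elif cp < 0x4000:
        lead, ntrail = 0xE0, 2          # 1110zzzz
    elif cp < 0x40000:
        lead, ntrail = 0xF0, 3          # 11110www
    else:
        lead, ntrail = 0xF8, 4          # 111110vv
    trails = [0xA0 | ((cp >> (5 * i)) & 0x1F) for i in range(ntrail)]
    return bytes([lead | (cp >> (5 * ntrail))] + trails[::-1])

def decode_i8(seq: bytes) -> int:
    """Inverse of encode_i8, rejecting malformed and overlong input.
    Surrogate values are rejected by the re-encode check at the end."""
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    if lead < 0xA0:
        n, cp = 1, lead
    elif 0xC0 <= lead < 0xE0:
        n, cp = 2, lead & 0x1F
    elif 0xE0 <= lead < 0xF0:
        n, cp = 3, lead & 0x0F
    elif 0xF0 <= lead < 0xF8:
        n, cp = 4, lead & 0x07
    elif 0xF8 <= lead < 0xFC:
        n, cp = 5, lead & 0x03
    else:                               # stray trailer or invalid lead
        raise ValueError("invalid lead byte")
    if len(seq) != n:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b & 0xE0 != 0xA0:            # continuations must be 101xxxxx
            raise ValueError("bad continuation byte")
        cp = (cp << 5) | (b & 0x1F)
    if encode_i8(cp) != bytes(seq):     # shortest-form (overlong) check
        raise ValueError("overlong encoding")
    return cp

print(encode_i8(0x00A0).hex())                       # c5a0
print(decode_i8(bytes.fromhex("c5a0")) == 0x00A0)    # True
```

Note that `decode_i8(bytes([0xC0, 0xA0]))` raises `ValueError`: that sequence would decode to U+0000, whose shortest form is the single byte 0x00.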

Codepage Details

Layout Organization

The UTF-EBCDIC encoding utilizes a 256-value codepage structure, systematically divided into zones to balance legacy compatibility with extensibility. The range 0x00–0x3F is designated for control characters, directly aligning with traditional EBCDIC conventions for these elements. The 0x40–0x7F range contains many invariant graphic characters and punctuation, such as the space at 0x40 and the period at 0x4B, reflecting the layout inherent in EBCDIC designs. Latin uppercase letters are accommodated in positions 0xC1–0xC9, 0xD1–0xD9, and 0xE2–0xE9, while at the intermediate I8 level the 0xC0–0xDF range serves as lead bytes for 2-byte sequences. This zonal organization preserves EBCDIC's non-contiguous layout, including gaps such as scattered punctuation placements, to ensure fidelity with legacy code pages like CCSID 037. By maintaining these structural idiosyncrasies in the invariant positions, UTF-EBCDIC functions as a superset of prior EBCDIC encodings, allowing single-byte codes to map precisely to established byte values without requiring data transformation in existing applications. Beyond the single-byte invariants, multi-byte extensions enable encoding of the full Unicode repertoire. Two-byte sequences (I8 lead byte 0xC0–0xDF) address characters through U+03FF. Three-byte sequences (I8 lead byte 0xE0–0xEF) cover code points up to U+3FFF, while 4-byte sequences (I8 lead byte 0xF0–0xF7) handle the rest of the Basic Multilingual Plane, including CJK ideographs, and reach into the supplementary planes; code points from U+40000 upward require 5-byte sequences (I8 lead byte 0xF8–0xFB). Trailing bytes across these sequences are restricted to 0xA0–0xBF to prevent overlap with the single-byte zones.
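At the I8 level (before the final I8-to-EBCDIC byte swap), the zone boundaries above amount to a simple classification of each byte; a sketch with an illustrative function name:

```python
def i8_byte_class(b: int) -> str:
    """Classify an I8 byte by the zone boundaries described above."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    if b < 0xA0:
        return "single-byte"      # controls and invariant graphics
    if b < 0xC0:
        return "continuation"     # 101xxxxx, 5 data bits
    if b < 0xE0:
        return "2-byte lead"
    if b < 0xF0:
        return "3-byte lead"
    if b < 0xF8:
        return "4-byte lead"
    if b < 0xFC:
        return "5-byte lead"
    if b < 0xFE:
        return "6-byte lead"
    return "7-byte lead"

print(i8_byte_class(0x40))   # single-byte (the space in EBCDIC)
print(i8_byte_class(0xB3))   # continuation
print(i8_byte_class(0xF0))   # 4-byte lead
```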

Invariant Characters

In UTF-EBCDIC, invariant characters consist of 82 graphic characters, including the space, that occupy the same code positions across most single-byte EBCDIC code pages, ensuring compatibility with legacy EBCDIC data. These characters, drawn from the ASCII repertoire, are encoded using a single byte in their traditional EBCDIC positions: uppercase letters A–Z at hex C1–C9, D1–D9, and E2–E9; lowercase letters a–z at hex 81–89, 91–99, and A2–A9; and digits 0–9 at hex F0–F9. For instance, the space (U+0020) is encoded at hex 40, and the letter 'A' (U+0041) at hex C1, differing from ASCII positions like hex 41 for 'A' to align with EBCDIC conventions. This invariant set corresponds closely to the invariant portion of ISO/IEC 646, covering A–Z, a–z, 0–9, and common punctuation such as ?, ., ,, ;, :, -, /, (, ), %, &, *, +, =, and ', but mapped to EBCDIC-specific zones rather than ASCII positions; the 13 variant graphic characters (including !, @, #, $, [, ], {, and }) are excluded because their positions differ across national EBCDIC code pages. By preserving these positions without alteration, UTF-EBCDIC allows direct operations with legacy hardware and software, preventing data corruption in mixed-environment files where invariant characters appear alongside multi-byte sequences. This design facilitates seamless processing on mainframe systems, as applications can treat invariant bytes as native EBCDIC without requiring full Unicode awareness. To maintain security and uniqueness, UTF-EBCDIC forbids overlong representations for invariant characters, mandating the shortest possible byte sequence, namely the single-byte form, for these code points. This rule mirrors restrictions in other Unicode encodings like UTF-8, preventing potential exploits such as byte-sequence injection or denial-of-service attacks during parsing.
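The letter, digit, and space positions listed above can be tabulated and cross-checked against a legacy EBCDIC code page (Python ships a cp037 codec; the table construction itself is my own sketch):

```python
# Single-byte positions for the invariant letters and digits, built from
# the non-contiguous EBCDIC ranges given in the text.
INVARIANT = {" ": 0x40}
for groups, starts in (
        (("ABCDEFGHI", "JKLMNOPQR", "STUVWXYZ"), (0xC1, 0xD1, 0xE2)),
        (("abcdefghi", "jklmnopqr", "stuvwxyz"), (0x81, 0x91, 0xA2))):
    for group, base in zip(groups, starts):
        for i, ch in enumerate(group):
            INVARIANT[ch] = base + i
for i, ch in enumerate("0123456789"):
    INVARIANT[ch] = 0xF0 + i

# Cross-check: these positions match a legacy EBCDIC code page byte-for-byte.
for ch, b in INVARIANT.items():
    assert ch.encode("cp037")[0] == b

print(hex(INVARIANT["A"]))   # 0xc1
print(hex(INVARIANT["z"]))   # 0xa9
```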

Implementations

IBM Variant

IBM's official implementation of UTF-EBCDIC, designated CCSID 1210, serves as the primary encoding for the full Unicode repertoire on EBCDIC-based systems. This variant has been natively integrated into z/OS since version 1 release 2, released in 2001, and is utilized in components such as DB2 for handling Unicode data. Conversion utilities like iconv enable transformations between UTF-EBCDIC and other encodings, supporting system-wide data processing. The encoding handles supplementary characters in a manner analogous to UTF-8, representing them directly as multi-byte sequences rather than through the surrogate mechanism of UTF-16. Error handling in this implementation includes configurable modes within locales, such as replacement of invalid UTF-EBCDIC byte sequences with the question mark (?) character to maintain data integrity during processing.

Oracle UTFE

Oracle's Universal Transformation Format for EBCDIC (UTFE) is a partial implementation of the UTF-EBCDIC encoding form, introduced in Oracle Database 8i specifically for EBCDIC-based platforms to enable Unicode support there, playing a role analogous to the AL32UTF8 character set on ASCII platforms. This variant optimizes data handling for database applications on mainframe systems, providing a variable-length encoding scheme that aligns with EBCDIC byte patterns while extending to multilingual text processing, and it integrates with legacy EBCDIC code pages like WE8EBCDIC500 through built-in conversion functions. However, as of Oracle Database 21c, UTFE is not recommended for new implementations. A primary distinction of UTFE from standard UTF-EBCDIC lies in its handling of characters beyond the Basic Multilingual Plane (BMP): it treats UTF-16 surrogate pairs as independent encoding units rather than mapping supplementary characters directly to scalar values. This surrogate-based approach, analogous to CESU-8 in UTF-8 contexts, preserves compatibility with Oracle's internal Unicode processing and Java's UTF-16 representations, avoiding disruptions in database operations that assume surrogate preservation. As a result, UTFE employs 1 to 4 bytes for BMP characters, extending to 8 bytes for astral-plane code points, which are encoded via surrogate pairs as two 4-byte sequences. UTFE is thus an Oracle-specific partial implementation of UTF-EBCDIC, incorporating extensions for database functionality such as transparent conversions between EBCDIC-hosted Unicode data and client-side ASCII/UTF-8 streams. Byte patterns in UTFE maintain compatibility for invariant characters and common scripts, while the added multi-byte sequences for supplementary characters support the full Unicode 3.0 repertoire, a code space of more than one million code points.
This design prioritizes operational efficiency in Oracle's EBCDIC ecosystem without requiring full system migrations to ASCII-based encodings.
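The surrogate-splitting arithmetic described above can be sketched as follows (the helper name is mine; the 8-byte total follows from the I8 length thresholds given earlier, under which every surrogate value falls in the 4-byte zone):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into the
    UTF-16 surrogate pair that a CESU-8-style scheme such as UTFE
    reportedly encodes as two independent units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)

hi, lo = to_surrogates(0x1F600)   # an emoji outside the BMP
print(hex(hi), hex(lo))           # 0xd83d 0xde00
# Both surrogates lie in 0x4000..0x3FFFF, the 4-byte zone of the
# 5-bits-per-continuation I8 scheme, so a surrogate-preserving
# encoding spends 2 * 4 = 8 bytes per astral character.
assert all(0x4000 <= s < 0x40000 for s in (hi, lo))
```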

Usage and Compatibility

In Mainframe Environments

In IBM mainframe environments, UTF-EBCDIC serves as a specialized Unicode encoding form designed for compatibility with EBCDIC-based systems, primarily facilitating Unicode data storage and transmission in z/OS environments. It is supported through z/OS Unicode Services as CCSID 1210, enabling the handling of internationalized text in various system components, including VSAM files for persistent data organization and JES2/JES3 job streams for batch job processing and output management. This integration allows legacy EBCDIC applications to process Unicode data without extensive modifications, as UTF-EBCDIC preserves single-byte representations for control and invariant characters aligned with common EBCDIC code pages like CCSID 1047, while using multi-byte sequences for other Unicode characters. Software support for UTF-EBCDIC is embedded within key programming languages and database systems on z/OS. Compilers such as Enterprise COBOL and Enterprise PL/I leverage Unicode Services APIs, such as CUNLCNV for character conversion, to manage UTF-EBCDIC data alongside other Unicode forms like UTF-8 (CCSID 1208) and UTF-16 (CCSID 1200). In DB2 for z/OS, Unicode columns typically utilize CCSID 1208 for storage, but the database supports conversions to and from CCSID 1210 via built-in Unicode Services, ensuring integration with EBCDIC-encoded tables in mixed environments. These capabilities extend to batch utilities such as CUNMTUNI, supporting efficient processing of international text in batch applications. UTF-EBCDIC has also played a role in enabling globalization within the banking and finance sectors on mainframes, where high-volume transaction processing demands reliable multi-language support. For instance, updates to EBCDIC code pages in the late 1990s, including the addition of the euro symbol (€) in CCSIDs such as 1140 and 1141 starting around 1999, prepared systems for the currency's introduction and aligned with the broader Unicode support delivered through UTF-EBCDIC.
This allowed financial institutions to handle diverse currencies and scripts in EBCDIC environments without full system overhauls, supporting compliance with international standards while maintaining legacy compatibility. A key application of UTF-EBCDIC lies in batch processing scenarios involving mixed legacy and Unicode files, where its design minimizes the need for full conversion by retaining invariant bytes identical to EBCDIC for common characters such as the invariant ASCII subset. In z/OS, this is particularly useful for JES-managed job streams transmitting internationalized reports or VSAM datasets storing hybrid content, reducing processing overhead in workloads like financial batch runs. Unicode Services handles these operations through APIs that support buffer-based conversions, ensuring scalability for large-scale data flows without performance degradation from unnecessary byte transformations.

Migration Considerations

Adopting UTF-EBCDIC in legacy systems presents several challenges, primarily due to the encoding's variable-length nature. Non-invariant characters, which include the 13 variant ASCII graphic characters that differ across EBCDIC code pages, must be represented using multi-byte sequences of 2 to 5 bytes, necessitating scanning and replacement operations in applications originally designed for single-byte processing. Additionally, the potential for buffer overflows arises from data expansion: EBCDIC's single-byte characters can expand up to 5 times in size when converted to UTF-EBCDIC representations of supplementary characters beyond the Basic Multilingual Plane. Common pitfalls in migration include mishandling of surrogate pairs in applications developed before the publication of UTF-EBCDIC in 2000, as these pairs, used in UTF-16 to encode characters from U+10000 to U+10FFFF, correspond to 4- to 5-byte UTF-EBCDIC sequences that legacy code may interpret incorrectly without proper decoding. To mitigate such issues, best practices recommend leveraging the International Components for Unicode (ICU) library for robust conversions between EBCDIC code pages and UTF-EBCDIC; ICU supports aliases like ibm-1210 for this mapping. Thorough testing for round-trip integrity is essential, involving converting data to UTF-EBCDIC and back to its original CCSID to ensure no data loss, particularly for display and processing in mainframe environments. Tools like DFSORT on z/OS provide support for Unicode formats through conversion utilities, facilitating data manipulation during migration. From a cost-benefit perspective, migration incurs minimal changes for data dominated by invariant characters, such as the 82 graphic characters consistent across EBCDIC code pages, which remain single-byte, allowing many legacy applications to process UTF-EBCDIC streams transparently. However, full audits are required for scripts like CJK or other non-Latin writing systems, which rely on multi-byte encodings and may demand significant buffer resizing and code updates to avoid corruption.
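Round-trip testing of the kind recommended above can be sketched with a stand-in codec, since stock Python ships no UTF-EBCDIC converter; cp037 represents the legacy side here, and with ICU available an ibm-1210 converter could take its place:

```python
def round_trip_ok(data: bytes, codec: str = "cp037") -> bool:
    """Decode legacy bytes to Unicode and re-encode, reporting whether
    the original bytes survive unchanged (no loss, no substitution)."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        return False

record = "INVOICE 0042".encode("cp037")
print(round_trip_ok(record))             # True: invariant characters survive
print(round_trip_ok(b"\x80", "ascii"))   # False: decode fails, loss flagged
```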

Comparisons

With UTF-8

UTF-EBCDIC and UTF-8 share fundamental similarities as variable-length Unicode encoding forms, both supporting the full range of Unicode scalar values through multi-byte sequences and adhering to standard Unicode encoding algorithms. Like UTF-8, UTF-EBCDIC uses continuation bytes to pack additional data bits, in its case bytes in the range 0xA0 to 0xBF carrying 5 bits each, enabling extension beyond the single-byte invariants; however, UTF-EBCDIC requires up to 5 bytes per code point for the Unicode range, while UTF-8 needs at most 4. Despite these parallels, UTF-EBCDIC diverges significantly in structure to align with EBCDIC conventions rather than ASCII, placing invariant characters (such as controls and common graphics) in traditional EBCDIC byte positions; for example, the character 'A' is encoded as the single byte 0xC1 in UTF-EBCDIC, in contrast to 0x41 in UTF-8. This EBCDIC-oriented layout, including the use of zone bits for case distinction, results in encoded text that typically requires at least as many bytes as UTF-8, and often more, leading to larger file sizes for common Latin text due to the less compact multi-byte mapping. Transcoding between UTF-8 and UTF-EBCDIC necessitates a complete decode and re-encode via the intermediate I8-sequence (a modified UTF-8-like form), as the distinct bit patterns preclude any direct byte-for-byte remapping; the process is nonetheless lossless and round-trip safe. A primary distinction is UTF-EBCDIC's deliberate avoidance of ASCII-compatible assumptions, preserving EBCDIC string-handling behaviors in mainframe environments for seamless integration with legacy I/O operations, whereas UTF-8's ASCII subset makes it the de facto standard for internet protocols and open systems, rendering UTF-EBCDIC unsuitable for broad internet use.

With Legacy EBCDIC

Legacy EBCDIC code pages, such as CCSID 037 used for US English, are restricted to 256 code points, encompassing a limited set of Latin letters, digits, punctuation, and control codes tailored to early business computing requirements. UTF-EBCDIC extends this foundation by incorporating multi-byte sequences to encode the full range of Unicode code points, a code space exceeding 1.1 million values, including scripts beyond Latin alphabets such as CJK ideographs and emoji. This extension allows EBCDIC-based systems to handle international text without abandoning their historical character mappings, as defined in Unicode Technical Report #16. Compatibility with legacy EBCDIC is achieved through a direct single-byte subset, where the base characters and control codes match those in CCSID 1047, so no remapping is needed for existing US-English data in most applications. For instance, invariant characters, a core set of 82 graphic symbols preserved across EBCDIC variants, remain encoded as single bytes, allowing legacy tools to process mixed UTF-EBCDIC streams by treating unrecognized multi-byte sequences as opaque data without corrupting single-byte content. However, non-Latin scripts necessitate new multi-byte handling, requiring updates to parsing logic in applications that encounter them. A distinctive feature of UTF-EBCDIC is the preservation of gaps and orderings from the legacy layout, such as positioning lowercase letters before uppercase (e.g., 'a' at 0x81 preceding 'A' at 0xC1 in CCSID 037), which contrasts with the contiguous A–Z and a–z ordering of ASCII, where uppercase precedes lowercase. This retention facilitates the continued use of byte-oriented pattern-matching tools on EBCDIC systems, as relative positions are maintained and unused ranges are reserved for multi-byte expansion without disrupting legacy search behaviors. While UTF-EBCDIC ensures strong compatibility with EBCDIC environments, it introduces a trade-off in portability: data encoded in this form is not directly usable on ASCII-based systems, necessitating conversion utilities for exchange with open standards like UTF-8.
This design prioritizes seamless integration within EBCDIC ecosystems over universal portability.
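The ordering quirk and letter-range gaps noted above are easy to confirm with Python's cp037 codec, standing in for CCSID 037:

```python
a, A = "a".encode("cp037")[0], "A".encode("cp037")[0]
print(hex(a), hex(A))        # 0x81 0xc1: lowercase precedes uppercase
assert a < A                 # EBCDIC byte order
assert ord("A") < ord("a")   # ASCII has the opposite order

# The i-to-j discontinuity shows one of the preserved gaps:
i, j = "i".encode("cp037")[0], "j".encode("cp037")[0]
print(hex(i), hex(j))        # 0x89 0x91: not adjacent
assert j - i != 1
```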
