Data Coding Scheme
A Data Coding Scheme (DCS) is a one-octet field defined in the Short Message Service (SMS) protocol for GSM, UMTS, LTE, and other mobile cellular networks. It specifies the encoding format of the message user data field (TP-UD) and may indicate additional attributes such as message class or compression.[1] The field lets recipient devices correctly interpret short messages and cell broadcast messages by signaling the character set: the GSM 7-bit default alphabet (up to 160 characters), 8-bit binary data (up to 140 octets), or UCS2 (up to 70 characters). Standardized in 3GPP Technical Specification TS 23.038, the DCS supports interoperability across mobile networks by preventing garbled text or data loss caused by encoding mismatches.

The DCS octet divides into two 4-bit nibbles: the most significant nibble (bits 7-4) identifies the coding group, while the least significant nibble (bits 3-0) provides group-specific details such as character set indicators or message classes.[1] For instance, coding group 00xx is used for general data coding: bit 5 signals compression (0 for uncompressed, 1 for compressed per TS 23.042), bit 4 indicates whether a message class is present, bits 3-2 select the character set (00 for GSM 7-bit, 01 for 8-bit data, 10 for UCS2, 11 reserved), and bits 1-0 denote the message class if bit 4 is set (e.g., 00 for Class 0, immediate display; 01 for Class 1, ME storage).
Other groups handle specialized cases, such as 01xx for messages marked for automatic deletion and 1100 to 1110 for message waiting indications (MWI), while reserved values default to GSM 7-bit encoding to maintain backward compatibility.[1]

In practice, the DCS enables efficient SMS delivery in protocols like SMPP (Short Message Peer-to-Peer), where it informs gateways and handsets about text encoding to support international characters or binary payloads like ringtones and logos.[2] Common values include DCS 0x00 for plain GSM 7-bit text without class specification and 0x08 for UCS2-encoded messages, giving broad support across diverse devices within the 140-octet user data limit (reduced further when a user data header is present).[3] Evolving with mobile standards, the DCS remains in use for legacy and modern applications, including over-the-air updates and multimedia messaging precursors.

Fundamentals
Definition and Purpose
The Data Coding Scheme (DCS) is an 8-bit field within the Transfer Protocol Data Unit (TPDU) of Short Message Service (SMS) protocols, used to specify the encoding of user data, including character sets, languages, and message handling attributes. This field, denoted TP-Data-Coding-Scheme (TP-DCS), appears in key TPDUs such as SMS-DELIVER, SMS-SUBMIT, and SMS-STATUS-REPORT, providing the metadata needed for message interpretation without prior negotiation between sending and receiving entities.[4]

The primary purpose of DCS is to enable receiving devices, such as mobile stations (MS), to correctly decode and render message content, supporting features like multilingual text display and specific delivery behaviors, including message classes and compression status. By indicating the alphabet used, such as the GSM 7-bit default or UCS2, it ensures accurate processing of diverse data types, from plain text to binary content, across heterogeneous networks. It also allows distinct handling behaviors, such as immediate display for flash messages or storage for normal ones.

DCS was introduced in the 1990s by the European Telecommunications Standards Institute (ETSI) in the GSM 03.38 standard for alphabets and data coding (now 3GPP TS 23.038), with the TP-DCS field itself specified in the contemporaneous GSM 03.40 standard, to provide standardized SMS handling as mobile networks expanded internationally.[5] Over time, it evolved to accommodate additional encodings and features such as the Enhanced Messaging Service (EMS), initially introduced in 3GPP Release 99 with enhancements in Releases 4 and 5, while preserving backward compatibility with early GSM implementations.
Key benefits of DCS include interoperability between diverse devices and carriers, which reduces errors such as garbled text from encoding mismatches, and efficient resource use through options like data compression. These attributes have been instrumental in the widespread adoption of SMS, supporting billions of daily messages worldwide by ensuring reliable, context-aware delivery.

Bit Structure and Encoding
The Data Coding Scheme (DCS) is an 8-bit field used in Short Message Service (SMS) protocols to specify the encoding and handling attributes of the message user data.[5] The octet's bits are numbered 7 (most significant) to 0 (least significant): bits 7 to 4 define the coding group, and bits 3 to 0 provide group-specific details such as character set, message class, or indications.[5] The general data coding group (bits 7 to 6 set to 00) spans values 0x00 to 0x3F, of which 0x00 to 0x0F are the uncompressed values without a message class; higher ranges cover automatic deletion, reserved groups, and specialized indications.[5]

In the general data coding group, bit 5 indicates whether compression is applied (0 for uncompressed, 1 for compressed), bit 4 signals the presence of a message class (0 for absent, 1 for present), bits 3 to 2 specify the character set encoding (00 for GSM 7-bit default alphabet, 01 for 8-bit data, 10 for UCS2 16-bit, 11 reserved), and bits 1 to 0 denote the message class when bit 4 is 1 (00 for Class 0, 01 for Class 1 ME-specific, 10 for Class 2 SIM-specific, 11 for Class 3 TE-specific).[5] Bit 7 acts as a high-level indicator: it is 0 for the general and automatic-deletion groups and 1 for the reserved, message waiting indication, and data coding/message class groups.[5] In the message waiting indication groups (bits 7 to 4 as 1100, 1101, or 1110), bits 3 to 0 instead carry the indication status and type (bit 3 for active/inactive, bits 1 to 0 for voicemail, fax, email, or other).[5]

The encoding rules combine these bits into standardized values, ensuring compatibility across SMS implementations; for instance, values 0x00 to 0x0F (binary 0000xxxx) use only bits 3 to 2 to select the alphabet, with 00 giving the GSM 7-bit default, and bits 1 to 0 carry no class meaning.[5] Reserved coding groups (bits 7 to 4 as 1000 to 1011) are held for future extensions, and receiving devices treat undefined values by falling back to the default GSM 7-bit encoding.[5] A common example is DCS value 0x00 (binary 00000000), which denotes uncompressed data using the GSM 7-bit default alphabet with no message class or additional indications.[5]

| Coding Group (Bits 7-4) | Bit 5 Meaning | Bit 4 Meaning | Bits 3-2 Meaning | Bits 1-0 Meaning | Example Hex Value |
|---|---|---|---|---|---|
| 00xx (General Data) | Compression (0/1) | Class present (0/1) | Character set (00=GSM 7-bit, 01=8-bit, 10=UCS2) | Class (if bit 4=1: 00=Class 0, etc.) | 0x00 (GSM 7-bit, no class) |
| 01xx (Auto Deletion) | Same as 00xx | Same as 00xx | Same as 00xx | Same as 00xx | 0x55 (8-bit, Class 1) |
| 1100 (MWI Discard) | N/A | N/A | Indication active (0/1, bit 3) | Type (00=Voicemail, etc.) | 0xC0 (Inactive voicemail) |
| 1101 (MWI Store, GSM 7-bit) | N/A | N/A | Same as 1100 | Same as 1100 | 0xD9 (Active fax) |
| 1110 (MWI Store, UCS2) | N/A | N/A | Same as 1100 | Same as 1100 | 0xEA (Active email) |
| 1111 (Data/Class) | N/A | N/A | Bit 3 reserved (0); bit 2 message coding (0=GSM 7-bit, 1=8-bit data) | Class (00=Class 0, 01=Class 1, 10=Class 2, 11=Class 3) | 0xF0 (GSM 7-bit, Class 0) |
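
As an illustration of the bit layout tabulated above, the following Python sketch decodes a DCS octet into its fields. The names `decode_dcs` and `DecodedDcs` are hypothetical, chosen for this example only. The mapping follows the coding groups described in this section, and reserved coding groups are reported as GSM 7-bit in line with the fallback behavior noted earlier; it is a minimal sketch, not a complete TS 23.038 implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Bits 3-2 in the general (00xx) and auto-deletion (01xx) groups.
CHARSETS = {0b00: "GSM 7-bit", 0b01: "8-bit data", 0b10: "UCS2", 0b11: "reserved"}
# Bits 1-0 in the message waiting indication groups.
MWI_TYPES = {0b00: "voicemail", 0b01: "fax", 0b10: "email", 0b11: "other"}
MWI_GROUPS = {0b1100: "mwi-discard", 0b1101: "mwi-store", 0b1110: "mwi-store-ucs2"}


@dataclass
class DecodedDcs:
    group: str
    charset: str
    compressed: bool = False
    message_class: Optional[int] = None  # None when no class is signaled
    mwi_active: Optional[bool] = None    # only set for MWI groups
    mwi_type: Optional[str] = None       # only set for MWI groups


def decode_dcs(dcs: int) -> DecodedDcs:
    """Split a DCS octet into coding group, character set, and flags."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must fit in one octet")
    high = dcs >> 4  # bits 7-4

    if high <= 0b0111:  # 00xx general data coding, 01xx automatic deletion
        group = "general" if high <= 0b0011 else "auto-deletion"
        has_class = bool(dcs & 0x10)  # bit 4
        return DecodedDcs(
            group=group,
            charset=CHARSETS[(dcs >> 2) & 0b11],
            compressed=bool(dcs & 0x20),  # bit 5
            message_class=(dcs & 0b11) if has_class else None,
        )

    if high in MWI_GROUPS:
        charset = "UCS2" if high == 0b1110 else "GSM 7-bit"
        return DecodedDcs(
            group=MWI_GROUPS[high],
            charset=charset,
            mwi_active=bool(dcs & 0x08),  # bit 3: indication sense
            mwi_type=MWI_TYPES[dcs & 0b11],
        )

    if high == 0b1111:  # data coding / message class
        charset = "8-bit data" if dcs & 0x04 else "GSM 7-bit"  # bit 2
        return DecodedDcs(group="data-class", charset=charset, message_class=dcs & 0b11)

    # 1000..1011: reserved coding groups, treated as GSM 7-bit.
    return DecodedDcs(group="reserved", charset="GSM 7-bit")
```

For example, `decode_dcs(0xD9)` reports a stored, active fax indication, matching the corresponding table row, while `decode_dcs(0x08)` reports uncompressed UCS2 with no message class.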
Character Sets and Message Classes
Supported Character Sets
The Data Coding Scheme (DCS) specifies multiple character encoding options to accommodate diverse text representations in short message services, primarily through bit combinations in an 8-bit field that indicate the alphabet or data type used for the message user data. These encodings balance efficiency, capacity, and multilingual support, with selections made via bits 3 and 2 in the general coding group (bits 7-4 set to 00xx).[6]

The GSM 7-bit default alphabet serves as the primary encoding for text messages, supporting a 128-character set optimized for Western European languages, including basic Latin letters, digits, and common symbols such as punctuation and currency signs. Characters are packed as 7-bit units into 8-bit octets, allowing a maximum of 160 characters per short message while maintaining compatibility across GSM networks. An extension table gives access to additional symbols like curly braces, but each such character occupies two septets (the escape code plus the character), reducing the remaining capacity by one. This encoding is mandatory for all mobile stations (MS) and service centers (SC).[6]

For non-textual content, the 8-bit data coding treats the message as an unstructured octet stream, suitable for applications like ringtone downloads, operator logos, or other binary payloads, with a capacity of up to 140 octets per message. This scheme provides flexibility for user-defined data without imposing a specific character interpretation, though it requires explicit handling by receiving devices.[6]

UCS2, a 16-bit encoding based on the Unicode standard, enables support for a broader range of international scripts, including non-Latin languages such as Arabic, Hebrew, Chinese, and Cyrillic, by representing each character with two octets. This results in a reduced capacity of 70 characters per message but facilitates global interoperability for multilingual content.
UCS2 is selected when bits 3-2 are set to 10 in the general coding group.[6]

Reserved coding combinations, particularly when bits 3-2 are set to 11 in the general group, default to the GSM 7-bit alphabet for backward compatibility, so that unsupported schemes do not disrupt basic text transmission. National language variants, such as Turkish or Portuguese, are supported through shift tables within the GSM 7-bit encoding: a single shift table substitutes language-specific characters for the default extension table and is reached through the escape character (0x1B), while a locking shift table replaces the default alphabet for the whole message. The tables in use are signaled by information elements in the user data header rather than by the DCS, so characters can be added or replaced without changing the DCS coding group or incurring full Unicode overhead.[6]

The specification has evolved through multiple 3GPP releases, with the latest version 19.0.0 (October 2025, Release 19) maintaining core character set definitions while incorporating updates to language-specific information as needed.[6]

| Coding Group (Bits 7-4) | Bits 3-2 Character Set | Description |
|---|---|---|
| 0000 (General) | 00 | GSM 7-bit default alphabet |
| 0000 (General) | 01 | 8-bit binary/data |
| 0000 (General) | 10 | UCS2 (16-bit Unicode) |
| 0000 (General) | 11 | Reserved (defaults to GSM 7-bit) |
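
The capacity figures above (160 characters, 140 octets, 70 characters) all follow from the same 140-octet user data field. The Python sketch below illustrates the arithmetic by packing septets into octets and by encoding text as UCS2. The function names are hypothetical, and `gsm7_ascii_septets` is a deliberate simplification: it assumes the input contains only letters, digits, and spaces, whose GSM 7-bit codes coincide with ASCII, rather than implementing the full default alphabet.

```python
import string

# Characters whose GSM 7-bit default alphabet code equals their ASCII code.
# Assumption for this sketch: other characters are simply rejected.
_ASCII_COMPATIBLE = set(string.ascii_letters + string.digits + " ")


def gsm7_ascii_septets(text: str) -> list:
    """Map text to septet values, for the ASCII-compatible subset only."""
    for ch in text:
        if ch not in _ASCII_COMPATIBLE:
            raise ValueError(f"character {ch!r} is outside this sketch's subset")
    return [ord(ch) for ch in text]


def pack_septets(septets) -> bytes:
    """Pack 7-bit values into octets, filling each octet from its low bits."""
    out = bytearray()
    acc = 0   # bit accumulator
    bits = 0  # number of valid bits currently held in acc
    for s in septets:
        acc |= (s & 0x7F) << bits
        bits += 7
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc & 0xFF)  # final partial octet, padded with zero bits
    return bytes(out)


def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian 16-bit units (two octets per character)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS2 cannot represent characters outside the BMP")
    return text.encode("utf-16-be")


if __name__ == "__main__":
    print(pack_septets(gsm7_ascii_septets("hello")).hex())  # e8329bfd06
    print(len(pack_septets([0] * 160)))                     # 140
    print(len(encode_ucs2("a" * 70)))                       # 140
```

Packing 160 septets yields exactly 160 × 7 / 8 = 140 octets, and 70 UCS2 characters at two octets each fill the same 140 octets.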
Message Class Types
The Data Coding Scheme (DCS) in SMS messaging categorizes messages into four distinct classes that determine their handling, storage, and presentation by the receiving device. These classes are indicated within the TP-DCS octet, allowing the Short Message Service Center (SMSC) or originating entity to specify how the message should be processed upon delivery. In the general data coding group (bits 7-6 = 00), the class is encoded in bits 1 and 0 and is valid only when bit 4 is set to 1 (class present); bits 3-2 continue to select the character set, so a class can accompany any alphabet. The bit patterns for bits 1-0 are 00 for Class 0, 01 for Class 1, 10 for Class 2, and 11 for Class 3. When present, the class overrides the device's default handling of the message.

| Class | Bit Pattern (Bits 1-0) | Description | Storage and Handling |
|---|---|---|---|
| 0 | 00 | Immediate display (flash message) | Displayed immediately on the device screen without user interaction; not stored automatically in the ME or SIM. |
| 1 | 01 | Mobile Equipment (ME)-specific | Stored in the device's ME memory; accessible via the user interface and potentially the SIM if configured. |
| 2 | 10 | SIM-specific | Stored on the SIM or USIM card; the ME acknowledges delivery only after the message has been transferred there. |
| 3 | 11 | Terminal Equipment (TE)-specific | Forwarded to an external TE, such as a connected computer or peripheral, for storage and processing. |
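
To make the class encoding concrete, the Python sketch below builds a general data coding group DCS value from a character set, an optional message class, and a compression flag. The function `make_dcs` and the keys of `CHARSET_BITS` are hypothetical names for this illustration; the bit positions follow the general data coding layout described in the bit structure section.

```python
from typing import Optional

# Bits 3-2 of the general data coding group (hypothetical key names).
CHARSET_BITS = {"gsm7": 0b00, "8bit": 0b01, "ucs2": 0b10}


def make_dcs(charset: str = "gsm7",
             message_class: Optional[int] = None,
             compressed: bool = False) -> int:
    """Build a general data coding group DCS octet (bits 7-6 = 00)."""
    if charset not in CHARSET_BITS:
        raise ValueError(f"unknown charset {charset!r}")
    dcs = CHARSET_BITS[charset] << 2      # bits 3-2: character set
    if compressed:
        dcs |= 0x20                       # bit 5: compression
    if message_class is not None:
        if message_class not in (0, 1, 2, 3):
            raise ValueError("message class must be 0, 1, 2, or 3")
        dcs |= 0x10 | message_class       # bit 4: class present; bits 1-0: class
    return dcs


if __name__ == "__main__":
    print(hex(make_dcs()))                        # 0x0  (GSM 7-bit, no class)
    print(hex(make_dcs("gsm7", message_class=0))) # 0x10 (flash message)
    print(hex(make_dcs("ucs2", message_class=1))) # 0x19 (UCS2, Class 1)
```

Leaving `message_class` as `None` keeps bit 4 clear, so bits 1-0 stay zero and carry no class meaning.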
Indication and Control Features
Message Waiting Indication
Message Waiting Indication (MWI) is a feature within the Data Coding Scheme (DCS) that enables short message service centers to notify mobile devices of pending messages, such as voicemails or faxes, through dedicated SMS messages. Specific DCS values signal the presence or absence of waiting messages, allowing the receiving mobile equipment (ME) to display indicators such as icons or play alert sounds without delivering full message content. Introduced as part of the ETSI GSM 03.40 specification in its version 5.1.0 released in March 1996, MWI has been a standard component of SMS protocols for legacy circuit-switched networks.[8][5]

The DCS octet for MWI belongs to the indication groups: bits 7 to 4 are set to 1100 for discard mode (the message is processed but need not be stored), 1101 for store mode with text in the GSM 7-bit alphabet, or 1110 for store mode with text in UCS2. Bit 3 indicates the sense (1 for active/waiting, 0 for inactive/no waiting), bit 2 is reserved (set to 0), and bits 1 to 0 specify the type: 00 for voicemail message waiting, 01 for fax, 10 for electronic mail, or 11 for other. For the "other" category, up to 14 additional subtypes can be distinguished via associated SIM/USIM storage records (EF_MWIS), enabling support for diverse services like short message waiting or multimedia alerts. Upon receipt, the ME interprets the DCS to update the device's visual or audible indicators and stores the status in the SIM/USIM's Message Waiting Indication Status (MWIS) elementary file, regardless of available memory; the originating address may also be retained if supported.[5][7]

For voicemail specifically, the basic DCS MWI sets the active/inactive flag, while new and deleted message counters are updated via separate SMS messages employing the Special SMS Message Indication in the user data field, which conveys numerical counts and timestamps.
This layered approach allows precise status reporting without overloading the core indication SMS. The receiving device acknowledges MWI SMS irrespective of storage capacity, ensuring reliable delivery of alerts.[9][10]

Although widely implemented in GSM and UMTS networks, MWI support is not universal, as it remains an optional feature dependent on network operator and device capabilities. In modern IP-based messaging systems like Rich Communication Services (RCS), which build on the IP Multimedia Subsystem, traditional SMS-based MWI has been largely supplanted by SIP-based protocols defined in 3GPP TS 24.606 since Release 7 (2006), with ongoing enhancements through 2023 standards favoring push notifications and integrated indicators over DCS signaling.[11][12]

Automatic Deletion Mechanisms
In the SMS Data Coding Scheme, the automatic deletion mechanism allows the short message (SM) originator to mark messages for automatic removal from the mobile equipment (ME) or universal subscriber identity module ((U)SIM) after the recipient has read them. This feature provides temporary storage without permanent retention, which helps with privacy and storage management in mobile networks. It was introduced in 3GPP Release 4 via change request TP-000074 to support automatic removal of read SMS messages.[13]

The mechanism is encoded in the TP-Data-Coding-Scheme (TP-DCS) field of the SMS transfer protocol data unit (TPDU), where bits 7-4 are set to 01xx (coding group for messages marked for automatic deletion), with bits 5-0 following the same coding as the general data coding group (00xx), defining elements like character set, data compression, and message class. This coding applies irrespective of the message class, ensuring consistent behavior across SMS types stored in the ME or (U)SIM. Upon delivery and reading, the receiving ME must delete the message without intervention from the end user or any targeted application, with the deletion process being manufacturer-specific. Mobile equipment manufacturers may optionally implement a user-accessible setting to prevent this automatic deletion, allowing recipients to retain flagged messages if needed. Unlike SMS, the Cell Broadcast Service (CBS) Data Coding Scheme does not include an equivalent automatic deletion group, as CBS messages are transient broadcasts without individual storage semantics.[13]

The bit structure for the TP-DCS field in the automatic deletion group is as follows:

| Bits | Function | Values |
|---|---|---|
| 7-6 | Coding Group | 01 (Message marked for automatic deletion; with bits 5-4 below this forms 01xx) |
| 5 | Data Compression | 0 = Uncompressed; 1 = Compressed (per 3GPP TS 23.042) |
| 4 | Message Class Indication | 0 = No message class; 1 = Message class present (bits 1-0 used) |
| 3-2 | Character Set / Indication | 00 = GSM 7-bit default alphabet; 01 = 8-bit data; 10 = UCS2 (16-bit); 11 = Reserved |
| 1-0 | Message Class (if bit 4=1) | 00 = Class 0 (immediate display); 01 = Class 1 (ME-specific); 10 = Class 2 ((U)SIM-specific); 11 = Class 3 (TE-specific) |
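
Because the automatic deletion group reuses the general data coding layout in bits 5-0, converting between the two only changes bits 7-6. The Python sketch below illustrates this under that assumption; the function names are hypothetical and chosen for this example.

```python
def mark_for_auto_deletion(dcs: int) -> int:
    """Turn a general data coding DCS (00xx xxxx) into its 01xx xxxx form."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must fit in one octet")
    if dcs >> 6 != 0b00:
        raise ValueError("expected a general data coding group value (00xx xxxx)")
    return dcs | 0x40  # set bit 6; bits 5-0 keep their meaning


def is_marked_for_auto_deletion(dcs: int) -> bool:
    """Return True when bits 7-6 are 01 (automatic deletion group)."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must fit in one octet")
    return (dcs >> 6) == 0b01


if __name__ == "__main__":
    print(hex(mark_for_auto_deletion(0x15)))   # 0x55 (8-bit data, Class 1)
    print(is_marked_for_auto_deletion(0x55))   # True
    print(is_marked_for_auto_deletion(0x15))   # False
```

The character set, compression flag, and message class of the original value are preserved, so `0x15` (8-bit data, Class 1) becomes `0x55` with the same interpretation plus the deletion marking.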