offsetof
The offsetof macro is a standard utility in the C programming language, defined in the <stddef.h> header, that expands to an integer constant expression of type size_t representing the byte offset of a specified structure or union member from the beginning of its enclosing aggregate type. Introduced in the ANSI X3.159-1989 standard (also known as C89), it provides a portable mechanism for low-level memory manipulation without relying on implementation-specific assumptions about structure layout.[1]
In subsequent revisions of the ISO/IEC 9899 C standard, including C99 (ISO/IEC 9899:1999), C11 (ISO/IEC 9899:2011), and C23 (ISO/IEC 9899:2024), the macro retains its core semantics: offsetof(type, member-designator) yields the offset of the designated member within a complete structure or union type. The type and member-designator must be such that, given a hypothetical declaration static type t;, the expression &(t.member-designator) evaluates to an address constant. The behavior is undefined if the member is a bit-field, and C23 additionally makes it undefined for the type argument to define a new type inside the offsetof invocation. The result is an integer constant expression that reflects padding and alignment and can designate flexible array members, though the exact value is determined by the implementation's layout rules and therefore varies across architectures.[2][3]
The macro is particularly valuable in systems programming for tasks like serialization, database record handling, and generic data access, where structure layouts must be navigated dynamically without embedding hardcoded offsets. For example, to allocate space for a structure with a variable-length array, one might use malloc(offsetof(struct example, var_array) + computed_size), where var_array is the flexible member.[1]
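The allocation pattern mentioned above can be sketched as follows. This is a minimal illustration rather than a canonical implementation: struct packet and packet_alloc are hypothetical names, and the overflow guard is an added precaution that the text does not describe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record type: a fixed header followed by a flexible array member. */
struct packet {
    size_t count;     /* number of elements stored in values[] */
    int    values[];  /* flexible array member (C99); must be the last member */
};

/* Hypothetical helper: allocate a packet with room for n ints.
 * Returns NULL if the size would overflow or if malloc fails. */
struct packet *packet_alloc(size_t n)
{
    /* Bytes that precede values[] in the structure. */
    const size_t header = offsetof(struct packet, values);

    /* Reject sizes whose byte count would not fit in size_t. */
    if (n > (SIZE_MAX - header) / sizeof(int))
        return NULL;

    struct packet *p = malloc(header + n * sizeof(int));
    if (p != NULL) {
        p->count = n;
        for (size_t k = 0; k < n; k++)
            p->values[k] = 0;
    }
    return p;
}
```

A caller would use packet_alloc(4) to obtain storage for four elements and release it with free when done.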
In C++, the offsetof macro is provided by <cstddef> and follows the C semantics for standard-layout classes. Since C++17, use with a class that is not standard-layout is conditionally-supported, while applying it to a static data member or a member function is undefined behavior.[4] Proposals in recent C++ standards discussions, targeting C++26, have explored redefining it as a keyword for enhanced expressiveness, though it remains a macro in current drafts.[5]
Fundamentals
Definition
The offsetof macro is a standard library facility in both C and C++ used to determine the byte offset of a member within a structure or union. It is defined with the signature offsetof(type, member-designator), where type specifies a structure or union type, and member-designator identifies a particular member of that type. The macro expands to an integer constant expression of type size_t, representing the offset in bytes from the beginning of an instance of type to the specified member. In C, it is provided in the <stddef.h> header; in C++, it is available via the <cstddef> header.[2][6]
This offset calculation accounts for the layout of the structure or union as defined by the compiler, including any padding bytes inserted to satisfy alignment requirements. Structure padding refers to additional unused bytes added between members (or after the last member) to ensure that each member begins at a memory address that is a multiple of its alignment boundary, which is typically the size of the member's type or a power of two dictated by the hardware architecture. Alignment ensures efficient memory access and adherence to processor constraints, preventing issues like bus errors on misaligned reads. Without such padding, offsets computed by offsetof would not reflect the actual memory layout used at runtime.[2][6]
For instance, consider a simple structure containing a char member followed by an int member on a typical 32-bit or 64-bit system where int requires 4-byte alignment:
```c
#include <stddef.h>

struct example {
    char c; // 1 byte, offset 0
    int i;  // 4 bytes, but padded to start at offset 4
};
```
Here, offsetof(struct example, c) evaluates to 0, while offsetof(struct example, i) evaluates to 4, because 3 bytes of padding are inserted after c to align i. offsetof therefore reports the actual layout, padding included: without padding, i would sit at offset 1 and the two members would occupy only 5 bytes, whereas sizeof(struct example) is typically 8.[2][6]
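The amount of padding can also be derived from the offsets themselves instead of being assumed. The sketch below repeats the struct example definition so that it stands alone; padding_before_i is a hypothetical helper name introduced for illustration.

```c
#include <stddef.h>

struct example {
    char c;
    int  i;
};

/* Hypothetical helper: number of padding bytes between c and i,
 * computed as "where i starts" minus "where c ends". */
size_t padding_before_i(void)
{
    size_t end_of_c   = offsetof(struct example, c) + sizeof(char);
    size_t start_of_i = offsetof(struct example, i);
    return start_of_i - end_of_c;
}
```

On an ABI with a 4-byte-aligned int this would report 3, but the function makes no such assumption and simply reflects whatever layout the compiler chose.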
History and Standardization
The offsetof macro was introduced in the ANSI C standard (X3.159-1989), ratified in December 1989, as part of the <stddef.h> header to enable portable computation of the byte offset of a structure or union member from the beginning of the object.[7] This addition addressed the need in low-level programming for a standardized mechanism to perform offset calculations, avoiding reliance on implementation-defined pointer arithmetic or ad-hoc techniques that varied across compilers and systems.[7] Prior to formal standardization, similar offset macros existed in Unix environments, influencing its design and inclusion to support common practices in memory management and generic data handling.[8] The International Organization for Standardization (ISO) adopted this as part of the ISO/IEC 9899:1990 standard (commonly called C90), marking the first international ratification.[9]
Subsequent revisions to the C standard refined but did not fundamentally alter offsetof. C99 (ISO/IEC 9899:1999) clarified the specification and introduced flexible array members: the last member of a structure with more than one named member may have an incomplete array type (e.g., int data[]). offsetof yields a well-defined offset for such a member, enabling portable dynamic allocation such as malloc(offsetof(struct s, data) + n * sizeof(int)). Implementations remain free to provide the macro through an ordinary macro expansion or a compiler intrinsic.[10] The C11 standard (ISO/IEC 9899:2011) introduced no notable changes to the macro, continuing its role as a compile-time constant expression of type size_t.[9] These evolutions emphasized reliability in environments requiring precise structure layout knowledge, such as embedded systems and system programming.
In C++, offsetof inherited its definition from C but with restrictions tied to language features. The C++98 standard (ISO/IEC 14882:1998) limited its use to plain old data (POD) types, where behavior was well-defined only for structures without virtual functions, user-defined constructors, or private/protected non-static data members. C++11 (ISO/IEC 14882:2011) introduced the concept of standard-layout classes to formalize C-compatible memory layouts and relaxed the requirement from POD to standard-layout, a broader category that, for example, permits user-provided constructors. In C++17 (ISO/IEC 14882:2017), use with classes that are not standard-layout became conditionally-supported, allowing compilers either to provide defined behavior or to issue diagnostics.[11]
Implementation Details
Standard Implementation
A traditional implementation of the offsetof macro in C relies on pointer arithmetic applied to a null pointer cast to the structure type, computing the byte offset of a member without requiring runtime memory access. The C standard itself only specifies the result: the macro, defined in <stddef.h>, must expand to an integer constant expression of type size_t representing the offset in bytes from the beginning of the structure to the specified member.[12]
The classic portable form of the macro is #define offsetof(type, member) ((size_t)((char *)&((type *)0)->member - (char *)0)). Here, the null pointer (type *)0 serves as a fictional base address of zero for the structure instance. The member access ->member conceptually adds the offset to this base, and taking the address & yields a pointer to the member's location. Subtracting the base address (char *)0 then isolates the offset, with the char * casts ensuring byte-level arithmetic for portability across different member types. This technique works because the offset is invariant regardless of the actual base address, allowing compile-time evaluation.[13]
However, this implementation invokes undefined behavior, as it involves forming a pointer to a potentially invalid (null) address and accessing a member through it, which violates C standard rules on pointer dereferencing (even if no explicit dereference occurs). In practice, modern compilers treat this as a constant expression and optimize it away without generating code that accesses memory, mitigating the risks on typical platforms.[13]
An alternative formulation avoids the direct use of the null pointer by shifting the base address slightly: #define offsetof(type, member) ((size_t)((char *)&(((type *)1)->member) - (char *)1)). This casts the integer 1 to a structure pointer, accesses the member (yielding address 1 + offset), and subtracts 1 to recover the offset. While still technically undefined behavior due to the invalid pointer, it sidesteps issues on architectures where null has special handling (e.g., trapped access), and compilers similarly optimize it to a constant.[13]
A self-contained version in C appears below. It includes <stddef.h> only for size_t and uses a distinct name, MY_OFFSETOF (chosen here for illustration), because defining offsetof again after including <stddef.h> would conflict with the standard macro.
```c
#include <stddef.h> // For size_t

#define MY_OFFSETOF(type, member) \
    ((size_t)((const volatile char *)&(((type *)0)->member) - (const volatile char *)0))
```
The const volatile qualifiers on the char casts let the macro be applied to members of any cv-qualification without a cast that discards qualifiers; no memory is read in either case. Compilers such as GCC and Clang fold the entire expression to a constant during compilation, so the generated code contains no runtime computation and no null dereference.[13]
Compiler Extensions
Compilers often provide built-in functions or extensions to implement or extend the offsetof macro, enabling compile-time offset computation without relying on the potentially undefined behavior (UB) associated with pointer arithmetic in the standard macro definition. These extensions ensure defined behavior, support for non-standard-layout types in some cases, and optimization opportunities by treating offsets as constants.[14]
In GCC and Clang, the __builtin_offsetof function serves as the core implementation for offsetof, computing the byte offset of a member within a structure or union at compile time. This built-in avoids the (type*)0 cast and pointer dereference that can invoke UB, particularly in C++ for non-POD types, and supports dependent types in templates. It returns a size_t constant expression, allowing use in constexpr contexts without runtime evaluation. Clang mirrors GCC's behavior for compatibility, extending support to scenarios where the standard macro would fail.[14][15]
Microsoft Visual C++ (MSVC) also provides __builtin_offsetof as an intrinsic, and its standard library headers can define the offsetof macro in terms of it (selected with the _CRT_USE_BUILTIN_OFFSETOF macro). The intrinsic is evaluated at compile time and accounts for alignment and padding under the Microsoft ABI, avoiding UB by relying on the compiler's knowledge of type layouts. It supports constant expressions in C++, and because the result is a compile-time constant it adds no runtime overhead.[16]
Other compilers offer proprietary variants; for instance, IBM XL C uses __offsetof as a built-in extension that computes offsets while natively managing alignment, padding, and architecture-specific features on z/OS or AIX platforms. These vendor-specific builtins guarantee defined behavior across their ecosystems, often extending beyond standard requirements to include non-standard-layout aggregates.[17]
The primary advantages of these compiler extensions include compile-time evaluation, which eliminates runtime overhead and enables constant folding by optimizers; guaranteed defined behavior without UB risks from invalid pointer operations; and enhanced support for complex types like those with virtual bases or inheritance in C++. No special compilation flags are typically required, though enabling C++ standards like -std=c++11 or higher in GCC/Clang ensures constexpr compatibility.[14][16]
To replace the standard offsetof with a builtin, consider this C++ example using GCC/Clang or MSVC:
```cpp
#include <cstddef> // For std::size_t

struct Example {
    int a;
    char b;
    double c;
};

// Compile-time constant: 4 on typical ABIs, where b directly follows the 4-byte int a
constexpr std::size_t offset_b = __builtin_offsetof(Example, b);
static_assert(offset_b == 4, "Offset mismatch");
```
This computes the offset of b as a constexpr value, avoiding any pointer arithmetic and allowing verification at compile time. In contrast, a hand-written null-pointer macro is generally not usable in a constant expression without compiler extensions. For IBM XL C, the equivalent uses __offsetof(Example, b).[14][16]
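The same idea carries over to C. The sketch below is an illustration, not any vendor's header: MEMBER_OFFSET and offset_of_b are hypothetical names, and the preprocessor test assumes that only GCC-compatible compilers (those defining __GNUC__ or __clang__) should take the builtin path, with every other compiler falling back to the standard macro.

```c
#include <stddef.h>

struct Example {
    int    a;
    char   b;
    double c;
};

#if defined(__GNUC__) || defined(__clang__)
/* GCC and Clang: call the builtin directly. */
#define MEMBER_OFFSET(type, member) __builtin_offsetof(type, member)
#else
/* Other compilers: rely on the standard macro from <stddef.h>. */
#define MEMBER_OFFSET(type, member) offsetof(type, member)
#endif

/* C11 compile-time check: both spellings must agree. */
_Static_assert(MEMBER_OFFSET(struct Example, b) == offsetof(struct Example, b),
               "builtin and standard offsets differ");

/* Hypothetical accessor so the value can also be inspected at run time. */
size_t offset_of_b(void)
{
    return MEMBER_OFFSET(struct Example, b);
}
```

Wrapping the builtin behind a project-local macro keeps the compiler-specific spelling in one place.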
Usage Examples
Basic Offset Calculation
The offsetof macro is commonly used in simple C programs to determine the byte offset of a structure member from the beginning of the structure, facilitating tasks such as memory layout verification and data serialization.[18] As defined in the C standard, it expands to an integer constant expression of type size_t suitable for compile-time calculations.[18]
Consider a basic structure with members of varying types to illustrate offset computation and the impact of alignment padding. The following example defines a structure containing a char, an int, and a double, then uses offsetof to print their offsets. On typical 64-bit systems with 4-byte int alignment and 8-byte double alignment, the char resides at offset 0, the int at offset 4 (with 3 bytes of padding after the char to satisfy the int's alignment requirement), and the double at offset 8 (with no additional padding in this case).[19] This demonstrates how the C standard mandates that members are allocated in declaration order with padding inserted as needed to meet each member's alignment constraints, ensuring efficient access.[19]
```c
#include <stddef.h>
#include <stdio.h>

struct example {
    char c;   // 1 byte
    int i;    // 4 bytes, typically aligned to 4-byte boundary
    double d; // 8 bytes, typically aligned to 8-byte boundary
};

int main(void) {
    printf("Offset of c: %zu\n", offsetof(struct example, c));
    printf("Offset of i: %zu\n", offsetof(struct example, i));
    printf("Offset of d: %zu\n", offsetof(struct example, d));
    printf("Size of struct: %zu\n", sizeof(struct example));
    return 0;
}
```
When compiled and run on a system with the aforementioned alignments, the output might be:
```
Offset of c: 0
Offset of i: 4
Offset of d: 8
Size of struct: 16
```
Here, the 16-byte total comprises 1 byte for c, 3 bytes of padding, 4 bytes for i, and 8 bytes for d. No padding is needed between i and d, and none is needed at the end because 16 is already a multiple of the structure's 8-byte alignment. Offsets thus accumulate in declaration order plus any required padding, while sizeof additionally includes whatever trailing padding the structure's alignment demands (none in this case).[19] This relationship allows developers to inspect layout rules programmatically, relating offsets directly to alignment multiples as per C's object representation requirements.[19]
In memory layout inspection or serialization scenarios, offsets from offsetof enable precise byte positioning for reading or writing structure data to/from streams. For instance, to serialize the int member, one could seek to the computed offset and write its 4 bytes, bypassing padding to optimize storage or network transmission.[18] This approach verifies that the structure's effective size matches the last offset plus the member's size (plus any trailing padding), providing insight into alignment without relying on platform-specific assumptions.[19]
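A minimal sketch of this padding-skipping approach follows. The names pack_example, unpack_example, and EXAMPLE_PACKED_SIZE are hypothetical, and the bytes are copied in native byte order, so the packed form is only meaningful between programs that share endianness and type sizes.

```c
#include <stddef.h>
#include <string.h>

struct example {
    char   c;
    int    i;
    double d;
};

/* Bytes needed when the members are stored back to back, without padding. */
#define EXAMPLE_PACKED_SIZE (sizeof(char) + sizeof(int) + sizeof(double))

/* Hypothetical helper: copy each member from its offset in *src into out,
 * skipping padding. Returns the number of bytes written. */
size_t pack_example(const struct example *src, unsigned char *out)
{
    const unsigned char *base = (const unsigned char *)src;
    size_t n = 0;

    memcpy(out + n, base + offsetof(struct example, c), sizeof src->c);
    n += sizeof src->c;
    memcpy(out + n, base + offsetof(struct example, i), sizeof src->i);
    n += sizeof src->i;
    memcpy(out + n, base + offsetof(struct example, d), sizeof src->d);
    n += sizeof src->d;
    return n;
}

/* Hypothetical inverse: rebuild the structure from a packed buffer. */
void unpack_example(const unsigned char *in, struct example *dst)
{
    unsigned char *base = (unsigned char *)dst;
    size_t n = 0;

    memcpy(base + offsetof(struct example, c), in + n, sizeof dst->c);
    n += sizeof dst->c;
    memcpy(base + offsetof(struct example, i), in + n, sizeof dst->i);
    n += sizeof dst->i;
    memcpy(base + offsetof(struct example, d), in + n, sizeof dst->d);
}
```

On the layout discussed above, the packed form takes 13 bytes instead of the 16 that sizeof(struct example) reports.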
For basic error handling, offsetof performs compile-time validation: using an invalid member designator (e.g., a non-existent field or bit-field) typically results in a compiler error, as the macro requires a complete structure type and valid member.[18] This ensures type safety in simple uses, preventing runtime issues from malformed expressions.
In Generic Programming
In generic programming, the offsetof macro plays a crucial role in enabling type-agnostic code by facilitating the recovery of enclosing structures from pointers to their members, particularly through the container_of idiom. This macro, commonly used in systems programming, is defined as container_of(ptr, type, member) = (type *)((char *)(ptr) - offsetof(type, member)), where ptr is a pointer to the member, type is the enclosing structure type, and member is the name of the embedded field. By subtracting the compile-time offset of the member from the member's address, it computes the base address of the container structure, allowing flexible traversal and manipulation without embedding type-specific pointers or duplicating code for each structure variant.[20]
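The definition above can be written in strictly portable C, without compiler extensions, as in the following sketch. The types struct link and struct task and the function task_from_node are hypothetical names used only to exercise the macro; this portable form omits the type check that extension-based variants add.

```c
#include <stddef.h>

/* Portable form of the idiom: step back from the member to the container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical embedded link and a structure that contains it. */
struct link {
    struct link *next;
};

struct task {
    int         id;
    struct link node;  /* embedded member; not at offset 0 */
};

/* Recover the enclosing task from a pointer to its embedded node. */
struct task *task_from_node(struct link *n)
{
    return container_of(n, struct task, node);
}
```

The caller must guarantee that the pointer really refers to the node member of a struct task; the macro cannot check this.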
In the Linux kernel, offsetof underpins container_of to support efficient list traversal and embedded structure recovery, promoting reusable, generic data structures across diverse kernel subsystems. For instance, the kernel's doubly-linked list implementation embeds a struct list_head within various types like device objects or process descriptors, and container_of (via the list_entry wrapper) recovers the full container during iteration, enabling type-safe navigation without hardcoding offsets or requiring inheritance-like mechanisms in C. This approach is evident in routines like list_for_each_entry, which iterates over a list head, applies container_of to each node pointer to retrieve the enclosing type, and processes the payload, thus avoiding boilerplate for each list-using structure.[21]
The technique extends to generic linked lists and queues in void-pointer contexts, where offsetof enables offset-based navigation for heterogeneous data without runtime type information. In such designs, a generic node structure holds void* data and next/prev pointers, while user code supplies offsets computed via offsetof to extract containers from node pointers, allowing a single list implementation to handle multiple payload types like tasks or buffers in embedded systems.[21]
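One way such a design might look is sketched below, assuming a singly-linked stack for brevity. All names (struct node, struct glist, glist_push, glist_pop, struct task, struct buffer) are hypothetical; the list stores the offset of the embedded node once and uses it to convert between payload pointers and node pointers.

```c
#include <stddef.h>

/* Link embedded inside every payload type. */
struct node {
    struct node *next;
};

/* Generic list: remembers where the node lives inside its payload type. */
struct glist {
    struct node *head;
    size_t       node_offset;  /* offsetof(payload type, embedded node) */
};

/* Push a payload; the embedded node is located via the stored offset. */
void glist_push(struct glist *l, void *item)
{
    struct node *n = (struct node *)((char *)item + l->node_offset);
    n->next = l->head;
    l->head = n;
}

/* Pop the most recently pushed payload, or NULL if the list is empty. */
void *glist_pop(struct glist *l)
{
    struct node *n = l->head;
    if (n == NULL)
        return NULL;
    l->head = n->next;
    return (char *)n - l->node_offset;  /* back from node to payload */
}

/* Two unrelated payload types, with the node at different offsets. */
struct task   { int id; struct node link; };
struct buffer { struct node link; char data[4]; };
```

A list of tasks would be initialized as { NULL, offsetof(struct task, link) } and a list of buffers as { NULL, offsetof(struct buffer, link) }, with both sharing the same push and pop code.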
In C++, offsetof supports template metaprogramming for trait-based offset computation, where templates deduce member offsets at compile time to build generic containers or serializers without runtime overhead.
A simple implementation of container_of and its usage in a doubly-linked list can be demonstrated as follows, adapted from kernel patterns:
```c
#include <stddef.h> // For offsetof

// Note: as in the kernel, this macro relies on GNU C extensions
// (statement expressions and typeof); it is not strictly conforming ISO C.
#define container_of(ptr, type, member) ({                  \
    const typeof(((type *)0)->member) *__mptr = (ptr);      \
    (type *)((char *)__mptr - offsetof(type, member));      \
})

// Generic doubly-linked list node
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define INIT_LIST_HEAD(ptr) do {                            \
    (ptr)->next = (ptr); (ptr)->prev = (ptr);               \
} while (0)

// Insert new between two known consecutive nodes
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

// Insert new right after head
static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}

// Example container structure
struct example {
    int id;
    struct list_head list;
};

// List head
struct example_list {
    struct list_head head;
};

// Initialize and add
void init_example(struct example *ex, int id) {
    ex->id = id;
    INIT_LIST_HEAD(&ex->list);
}

void add_to_list(struct example_list *elist, struct example *ex) {
    list_add(&ex->list, &elist->head);
}

// Traversal using container_of
void traverse_list(struct example_list *elist) {
    struct list_head *pos;
    for (pos = elist->head.next; pos != &elist->head; pos = pos->next) {
        struct example *ex = container_of(pos, struct example, list);
        (void)ex; // Process ex->id
    }
}
```
This code defines a minimal doubly-linked list with container_of for recovering the struct example from list nodes during traversal, illustrating offset-based genericity in practice.[21]
Limitations
Type and Layout Constraints
The offsetof macro imposes strict requirements on the types and members to which it can be applied, ensuring well-defined behavior in both C and C++. In C, the type argument must be a complete object type that is either a structure or a union, and the member designator must refer to a non-bit-field member; applying offsetof outside these constraints results in undefined behavior. These rules are specified in the C11 standard, section 7.19, paragraph 3, which defines offsetof as expanding to a constant expression of type size_t representing the byte offset from the beginning of the object to the specified member, assuming compliance with the type constraints.
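The following sketch shows a few uses that stay within the C constraints: a union, a nested member designator, and an array element with a constant index. The type and function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Every member of a union starts at the beginning of the union. */
union value {
    int           i;
    double        d;
    unsigned char bytes[8];
};

struct inner {
    char tag;
    int  code;
};

struct outer {
    short        id;
    struct inner in;
    int          table[4];
};

/* Nested designator: offset of code inside the embedded struct inner. */
size_t nested_code_offset(void)
{
    return offsetof(struct outer, in.code);
}

/* Array element designator with a constant index. */
size_t table_elem_offset(void)
{
    return offsetof(struct outer, table[2]);
}
```

In both functions the designator satisfies the address-constant requirement, so the results equal the sums of the component offsets.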
In C++, the constraints evolved across standards to align with language features while maintaining compatibility with C. In C++98 and C++03, the type must be a plain old data (POD) type, which covers structures and unions without virtual functions, user-declared constructors, or non-POD members that could affect layout predictability. Starting with C++11, the requirement shifted to standard-layout classes, which broadens applicability (user-provided constructors are allowed) but still excludes classes with virtual functions, virtual base classes, non-static data members of differing access control, or non-standard-layout bases and members. The member must be a non-static data member; applying offsetof to a static data member or a member function yields undefined behavior, and bit-fields are likewise excluded. This is detailed in C++11 section 18.2, which covers the <cstddef> header.
These type and layout constraints stem from the need to guarantee that member offsets are fixed and computable at compile time, independent of object instantiation. For instance, a simple structure like struct { int x; char y; }; is a standard-layout type in C++11, allowing offsetof to reliably compute the offset of y as 4 bytes (assuming a 4-byte int). In contrast, a class such as struct C { virtual void f(); int x; }; is not standard-layout because of the virtual function: implementations typically add a hidden vtable pointer whose placement the standard does not specify, so offsetof(C, x) is undefined behavior in C++11 and C++14 and only conditionally-supported from C++17. Similarly, multiple inheritance and virtual base classes produce layouts that the standard does not pin down, placing the resulting offsets outside the macro's guarantees.
Portability Issues
The portability of the offsetof macro is significantly influenced by architecture- and ABI-specific alignment rules, which determine how much padding is inserted between structure members. For instance, int is typically aligned to a 4-byte boundary on x86, while ABIs that align double to 8 bytes (as is common on ARM and PowerPC systems) may introduce additional padding that changes the offsets computed by offsetof for subsequent members.[22] These differences arise because compilers follow the target platform's alignment requirements to avoid performance penalties or hardware faults. Padding is determined by alignment, register width, and bus constraints rather than by byte order, so big-endian and little-endian systems with the same alignment rules produce the same offsets.[22]
Compiler implementations exhibit variances that can lead to warnings, errors, or incorrect results when using offsetof, particularly in older versions lacking full support for modern standards. For example, pre-Visual Studio 2017 releases of MSVC did not fully diagnose invalid uses of offsetof on non-standard-layout classes, potentially leading to undefined behavior for types involving inheritance or virtual functions, whereas GCC and Clang provided earlier compliance but issued warnings for non-standard-layout usage.[16] Edge cases, such as applying offsetof to bit-fields or static members, may trigger diagnostic messages or undefined results in strict conformance modes across compilers, emphasizing the need for conditional compilation to handle these discrepancies.[16]
Although the C and C++ standards define offsetof to yield a valid constant for compliant types, common hand-written implementations, such as ((size_t)&(((TYPE *)0)->MEMBER)), technically invoke undefined behavior by accessing a member through a null pointer, which can manifest as crashes on certain systems or as reports from analysis tools. In practice, optimizing compilers fold the expression to the correct offset at compile time, but building with the undefined behavior sanitizer (UBSan) in GCC or Clang may flag the null-pointer member access or insert trap instructions like ud2, causing runtime failures.[23] This implementation artifact has led to portability pitfalls in embedded or safety-critical environments where memory protection traps null dereferences, potentially halting execution on architectures with hardware-enforced null page isolation.[23]
To mitigate these issues, cross-platform verification of offsetof results is essential, using tools that inspect compiled object files for actual layout details. On Linux/ELF systems, the pahole utility from the dwarves package analyzes structure padding and offsets by parsing DWARF debug information, allowing developers to confirm alignment-induced variances without runtime execution.[24] Complementarily, objdump can disassemble binaries to reveal symbol offsets and section alignments, aiding in comparisons across architectures like x86 and ARM, though it requires manual interpretation for complex structures.[25] Such testing workflows ensure consistency in multi-target builds, revealing discrepancies that static analysis might overlook.
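Alongside external tools, layout expectations can be pinned in the source itself with C11 _Static_assert, so that a build for a target with a different layout fails at compile time. The sketch below is illustrative only: struct record_header and its expected offsets are assumptions chosen for the example, not a format described in this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record header; the expected offsets are assumptions. */
struct record_header {
    uint32_t magic;    /* expected at offset 0 */
    uint16_t version;  /* expected at offset 4 */
    uint16_t flags;    /* expected at offset 6 */
    uint64_t length;   /* expected at offset 8 */
};

/* Each assertion stops the build if the target ABI lays the header out differently. */
_Static_assert(offsetof(struct record_header, magic)   == 0, "magic moved");
_Static_assert(offsetof(struct record_header, version) == 4, "version moved");
_Static_assert(offsetof(struct record_header, flags)   == 6, "flags moved");
_Static_assert(offsetof(struct record_header, length)  == 8, "length moved");
_Static_assert(sizeof(struct record_header) == 16, "unexpected padding");
```

Ordering members from the start so that each one falls on its natural boundary, as here, keeps the layout free of padding on common ABIs, which is what lets the assertions hold across targets.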
Prior to the C89 standard's introduction of offsetof in <stddef.h>, developers relied on non-portable techniques to compute member offsets, often resulting in architecture-specific inconsistencies. Pointer arithmetic tricks, such as casting zero to a structure pointer and offsetting to a member address, mimicked the later standard implementation but invoked undefined behavior without standardization, leading to crashes or incorrect values on systems with protected null pages or differing pointer sizes.[7] These pre-C89 methods proliferated in legacy codebases, complicating migrations to standardized environments due to their dependence on compiler-specific behaviors and lack of cross-platform guarantees.