Executable and Linkable Format
The Executable and Linkable Format (ELF) is a standard file format for executables, relocatable object files, shared object libraries, and core dumps, primarily used on Unix-like operating systems to define the structure and organization of binary data for loading and linking.[1] Developed in the early 1990s by Unix System Laboratories (a subsidiary of AT&T) and Sun Microsystems as part of System V Release 4 (SVR4), ELF first appeared in Solaris 2.0 and has since become the de facto standard for many open-source and commercial Unix variants, including Linux, BSD, and Solaris.[2] Its design emphasizes flexibility, portability across architectures, and support for dynamic linking, replacing earlier formats like a.out and COFF to streamline software development and execution.[1] At its core, an ELF file begins with a fixed ELF header that provides essential metadata, including the magic bytes (0x7F 'E' 'L' 'F'), the file's class (32-bit or 64-bit), data encoding (little-endian or big-endian), the target architecture (e.g., x86, ARM), and the file type (relocatable, executable, shared object, or core).[3] Following the header, ELF files may include a program header table—an array of entries describing loadable segments for process image creation, such as code, data, and dynamic linking information—and a section header table that details smaller, linkable sections like .text (executable code), .data (initialized variables), .bss (uninitialized data), and .symtab (symbol table).[2] This dual structure allows ELF to serve both runtime loading (via program headers) and static linking/relocation (via section headers), enabling efficient memory mapping and shared library usage without redundant code duplication.[1]
ELF's extensibility supports versioning for symbols, relocation entries for address resolution, and notes sections for auxiliary information like debugging data or operating system-specific details, making it adaptable to modern features like position-independent code (PIC) and multi-architecture binaries.[1] Widely implemented in toolchains such as GCC and binutils, ELF facilitates cross-compilation and has influenced formats in non-Unix environments, underscoring its role as a foundational element in software portability and system reliability.[2]
History and Development
Origins and 86open Principles
The Executable and Linkable Format (ELF) originated in the late 1980s, when Unix System Laboratories (USL) developed it as part of System V Release 4 (SVR4) to overcome the limitations of the earlier a.out format, providing a more flexible structure for executables, object files, and shared libraries.[1][4] USL collaborated with Sun Microsystems, incorporating elements of Sun's dynamic shared library system from SunOS 4.x (introduced in 1988), which enabled runtime linking of libraries to reduce executable sizes and improve modularity.[4] ELF was first specified in the SVR4 Application Binary Interface (ABI). One of its earliest widespread deployments was Solaris 2.0 (also known as SunOS 5.0), released by Sun Microsystems in 1992, which demonstrated the format's compatibility with SVR4-based environments.[1][4] Other Unix variants followed suit in the early 1990s, as the format's design facilitated portability across x86 systems without requiring OS-specific modifications.[1]
In response to growing fragmentation among proprietary executable formats on x86 Unix platforms, the 86open project was founded in 1997 by a consortium including the Santa Cruz Operation (SCO) and Linux vendors to establish a unified standard that would allow binaries to run across diverse Unix implementations. The project focused on building consensus for a common ABI and ultimately endorsed ELF as the solution before concluding in 1999.[4][5][6]
The core principles of 86open emphasized a simple yet extensible file structure to accommodate future enhancements without breaking compatibility, robust support for dynamic linking to enable shared libraries, generation of position-independent code (PIC) for relocatable executables, and deliberate exclusion of OS-specific dependencies to ensure broad interoperability across Unix variants.[4] These guidelines addressed the proprietary silos of the era, promoting ELF's adoption as a vendor-neutral format that prioritized efficiency and cross-platform usability.[5]
Standardization and Evolution
The formal standardization of the Executable and Linkable Format (ELF) was led by the Tool Interface Standard (TIS) committee, a consortium of industry leaders formed in 1993 to define portable formats for Unix-like systems. The TIS adopted ELF, originally developed for System V Release 4, as the standard object file format and published version 1.1 of the Portable Formats Specification in October 1993, extracting and refining ELF details from the System V Application Binary Interface.[7] This effort culminated in the TIS ELF Specification version 1.2 in May 1995, which incorporated minor fixes, clarifications, and extensions for broader portability across 32-bit architectures.[1]
Subsequent evolution included the Generic Application Binary Interface (gABI), released in March 1997 as part of the System V ABI edition 4.1, which defined processor-independent ELF conventions to promote cross-distribution compatibility in Linux and other environments.[8] The gABI built on TIS foundations by specifying common ELF usage, such as dynamic linking and symbol resolution, while allowing processor supplements for specific architectures. System V ABI extensions further refined ELF for operating system interfaces, including process initialization and function calling sequences.
Support for 64-bit architectures emerged in the mid-1990s, with ELFCLASS64 defined to accommodate larger address spaces; an initial ELF-64 specification was developed for the Alpha processor around 1995, enabling 64-bit object files on Digital Unix. This was extended to x86-64 in the early 2000s through processor-specific ABIs, maintaining backward compatibility with 32-bit ELF while supporting extended data types and relocations.
In the 2000s, ELF evolved with security enhancements, notably the introduction of RELRO (Relocation Read-Only) as a linker option in GNU ld around 2007, which marks relocation sections as read-only after processing to prevent runtime tampering. Partial and full RELRO modes balanced performance and protection, becoming standard in distributions for mitigating exploits targeting global offset tables.
Adaptation to additional architectures proceeded alongside these changes. Formal ARM ELF specifications were published in 1999 to support embedded and mobile processors.[9] For RISC-V, ELF support was integrated into toolchains starting in the mid-2010s, with the processor-specific ABI specification finalized in 2021 to enable open-source implementations across microcontrollers and servers.
In 2025, Xinuos published version 4.2 of the ELF specification and released a draft of version 4.3 for public review, formalizing updates such as separating the ELF spec from the gABI.[10]
File Format Specifications
Overall Structure and Layout
The Executable and Linkable Format (ELF) organizes files in a hierarchical structure that begins with a fixed-size ELF header containing metadata about the file's layout and type. The header is typically followed by the program header table (zero or more entries describing loadable segments for runtime execution) and then by the actual content of those segments or sections. The file usually concludes with a section header table that catalogs all sections, such as code, data, and debugging information, enabling link-time processing. The exact positions of both tables are given by offsets stored in the ELF header, and the sections themselves normally occupy the space between the program headers and the section header table.[1]
ELF supports distinct file types tailored to different stages of software development and use. Executable files rely on program headers to define loadable segments that the operating system maps directly into memory for execution. Relocatable object files emphasize sections as the primary units, facilitating combination with other objects during linking to produce executables or shared libraries. Shared libraries incorporate dynamic linking mechanisms, including dedicated sections for symbol tables and relocation entries to resolve references at load time or runtime. Core dump files preserve process state, encompassing memory segments, thread information, and register values for post-mortem analysis.[1]
ELF files vary by class and data encoding to accommodate diverse hardware. The 32-bit class uses 32-bit addresses and types suitable for traditional systems, while the 64-bit class employs 64-bit addressing for larger memory spaces and modern architectures. Data encoding supports little-endian byte order for processors like x86 or big-endian for others like some PowerPC variants. Identification begins with the magic bytes 0x7F 'E' 'L' 'F' in the file's initial bytes, distinguishing ELF from other formats. Common file types include ET_EXEC for standalone executables, ET_DYN for shared objects and other position-independent code, and ET_REL for relocatable objects awaiting linking.[1]
Program headers and section headers provide complementary perspectives on the file: program headers offer a runtime-oriented view by grouping sections into coarse-grained segments optimized for efficient loading and execution, whereas section headers deliver a fine-grained, link-time view that permits the static linker to manipulate individual sections for tasks like relocation and symbol merging. This separation enhances modularity, allowing tools to operate on either view as needed without redundancy.[1]
ELF Header Details
An ELF file begins with a fixed-size header that provides the metadata needed to interpret the rest of the file. The header is located at offset zero and contains an array of identification bytes followed by core structural fields, allowing parsers to validate the file format and determine its class (32-bit or 64-bit), encoding, and other attributes before processing anything else. The design promotes portability by standardizing field order, with field sizes varying only by class to accommodate different address widths. The header is 52 bytes long in 32-bit ELF files and 64 bytes long in 64-bit files, reflecting the use of 4-byte or 8-byte values for its address and offset fields.
The initial 16 bytes form the e_ident array, which serves as the file's "magic number" and configuration descriptor. Specifically, bytes 0-3 (EI_MAG) must contain the byte 0x7F followed by the ASCII characters 'E', 'L', 'F' to identify an ELF file; byte 4 (EI_CLASS) specifies the file class as 1 for 32-bit or 2 for 64-bit; byte 5 (EI_DATA) indicates data encoding as 1 for little-endian or 2 for big-endian; byte 6 (EI_VERSION) is always set to 1 for the current ELF version; byte 7 (EI_OSABI) denotes the operating system/ABI target, such as 0 for System V or 3 for Linux; and byte 8 (EI_ABIVERSION) provides the ABI version number, with bytes 9-15 reserved for padding (EI_PAD) initialized to zero. This array enables immediate format validation, as any mismatch (e.g., incorrect magic bytes) signals an invalid ELF file, preventing erroneous parsing.[11]
Following e_ident, the header includes several core fields that describe the file's type, target machine, and layout pointers, all encoded in the byte order given by EI_DATA. The e_type field (2 bytes) classifies the file as ET_NONE (0, no file type), ET_REL (1, relocatable), ET_EXEC (2, executable), ET_DYN (3, shared object), or ET_CORE (4, core dump). The e_machine field (2 bytes) identifies the target architecture, such as EM_386 (3) for Intel 80386 or EM_X86_64 (62) for AMD x86-64. The e_version field (4 bytes) is fixed at 1, matching EI_VERSION for consistency. The e_entry field (4 or 8 bytes, depending on class) holds the virtual address of the program's entry point. Layout offsets are provided by e_phoff (4 or 8 bytes) for the program header table position and e_shoff (4 or 8 bytes) for the section header table position, both relative to the file start. The e_flags field (4 bytes) carries processor-specific flags; its interpretation depends on e_machine, and architectures that define no flags, such as x86, leave it at 0. Header metadata includes e_ehsize (2 bytes) indicating the header's own size (52 or 64); e_phentsize (2 bytes) and e_phnum (2 bytes) for program header entry size (typically 32 or 56 bytes) and count; and e_shentsize (2 bytes), e_shnum (2 bytes), and e_shstrndx (2 bytes) detailing section header entry size (40 or 64 bytes), total count, and index of the string table section for section names. These fields collectively guide the loader or linker in navigating the file without prior knowledge of its internal structure.[11]
To maintain alignment and portability, ELF headers adhere to strict padding and byte-order rules: the e_ident padding bytes are always zero, and all multi-byte fields (e.g., addresses and offsets) are stored in the endianness specified by EI_DATA, with natural alignment for 32-bit (4-byte) and 64-bit (8-byte) variants to avoid unaligned access issues on target architectures. This structure facilitates robust parsing by allowing tools to first verify the header's integrity, through magic checks, version consistency, and size validations, before advancing to variable components, thereby minimizing errors in cross-platform or multi-architecture environments. For instance, a mismatch in e_ehsize would indicate a corrupted or non-standard file, prompting immediate rejection.[11]
| Field | Offset (32-bit) | Size (32-bit) | Type | Description |
|---|---|---|---|---|
| e_ident | 0 | 16 bytes | Array | Identification bytes for magic, class, data, version, OS/ABI, ABI version, and padding. |
| e_type | 16 | 2 bytes | Elf32_Half | File type (e.g., executable, shared object). |
| e_machine | 18 | 2 bytes | Elf32_Half | Target architecture (e.g., EM_386). |
| e_version | 20 | 4 bytes | Elf32_Word | Object file version (always 1). |
| e_entry | 24 | 4 bytes | Elf32_Addr | Entry point virtual address. |
| e_phoff | 28 | 4 bytes | Elf32_Off | Program header table offset. |
| e_shoff | 32 | 4 bytes | Elf32_Off | Section header table offset. |
| e_flags | 36 | 4 bytes | Elf32_Word | Processor-specific flags. |
| e_ehsize | 40 | 2 bytes | Elf32_Half | ELF header size in bytes. |
| e_phentsize | 42 | 2 bytes | Elf32_Half | Program header entry size. |
| e_phnum | 44 | 2 bytes | Elf32_Half | Number of program header entries. |
| e_shentsize | 46 | 2 bytes | Elf32_Half | Section header entry size. |
| e_shnum | 48 | 2 bytes | Elf32_Half | Number of section header entries. |
| e_shstrndx | 50 | 2 bytes | Elf32_Half | Index of section name string table. |
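The header layout above can be read with a few lines of Python. The following is a minimal sketch, not a reference implementation: the names `ElfHeader` and `parse_elf_header` are hypothetical, and the only checks performed are the magic bytes, the class and data bytes, and e_ehsize.

```python
import struct
from collections import namedtuple

# Hypothetical container for the fields discussed above.
ElfHeader = namedtuple(
    "ElfHeader",
    "ei_class ei_data e_type e_machine e_version e_entry e_phoff e_shoff "
    "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf_header(data):
    """Parse the ELF header found at the start of `data` (bytes)."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic bytes")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2):
        raise ValueError("unknown EI_CLASS")
    if ei_data not in (1, 2):
        raise ValueError("unknown EI_DATA")

    endian = "<" if ei_data == 1 else ">"   # EI_DATA selects the byte order
    word = "I" if ei_class == 1 else "Q"    # 4- or 8-byte address/offset fields
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
    # then six 2-byte fields from e_ehsize to e_shstrndx.
    fmt = endian + "HHI" + 3 * word + "I" + 6 * "H"

    expected_size = 16 + struct.calcsize(fmt)  # 52 or 64 bytes
    if len(data) < expected_size:
        raise ValueError("truncated ELF header")
    header = ElfHeader(ei_class, ei_data, *struct.unpack_from(fmt, data, 16))
    if header.e_ehsize != expected_size:
        raise ValueError("e_ehsize does not match the file class")
    return header
```

Using an explicit `<` or `>` prefix makes `struct` apply standard sizes without padding, which is why the computed header size comes out to exactly 52 or 64 bytes.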
Program Headers and Segments
The program header table in an ELF file is an array of program header entries that describe the layout of loadable segments for creating a process image during execution. This table is optional in general but required for executable and shared object files; relocatable object files typically omit it. The ELF header references the table using the e_phoff field for its file offset and e_phnum for the number of entries. Each entry is a fixed-size structure: 32 bytes for 32-bit ELF files (Elf32_Phdr) and 56 bytes for 64-bit files (Elf64_Phdr). The table enables the operating system loader to map segments directly into memory without relying on section-level details, facilitating efficient runtime loading.[1]
Each program header entry contains fields that specify the segment's type, location, size, and attributes. The structure for a 32-bit ELF (Elf32_Phdr) is defined as follows:
| Field | Type | Description |
|---|---|---|
| p_type | Elf32_Word | Specifies the segment type, indicating how the entry should be interpreted (e.g., loadable segment or auxiliary information).[1] |
| p_offset | Elf32_Off | File offset where the segment begins, in bytes from the start of the file. For loadable segments, this must be congruent to p_vaddr modulo p_align; for non-loadable segments, it points to in-file data.[1] |
| p_vaddr | Elf32_Addr | Virtual address where the segment should be loaded in memory. Loadable segments start at this address after mapping.[1] |
| p_paddr | Elf32_Addr | Physical address for the segment, typically used in embedded systems or kernels; ignored by most user-space loaders.[1] |
| p_filesz | Elf32_Word | Size of the segment in the file, in bytes; for loadable segments, this is the portion copied from the file to memory.[1] |
| p_memsz | Elf32_Word | Size of the segment in memory, in bytes; may exceed p_filesz for segments requiring zero-initialization (e.g., BSS-like areas).[1] |
| p_flags | Elf32_Word | Access permissions: PF_X (execute, bit 0, value 0x1), PF_W (write, bit 1, value 0x2), PF_R (read, bit 2, value 0x4). These map to memory protection settings like read-only or executable.[1] |
| p_align | Elf32_Word | Alignment constraint: for loadable segments, p_vaddr and p_offset must be congruent modulo this value, which should be a positive power of two (e.g., 0x1000 for page alignment). Values of 0 or 1 mean no alignment is required.[1] |
The p_type field defines the segment's purpose, with standard values outlined in the ELF specification. Common types include PT_NULL (0, unused entry), PT_LOAD (1, loadable segment for code or data), PT_DYNAMIC (2, dynamic linking information such as the locations of the dynamic symbol and relocation tables), PT_INTERP (3, path to the program interpreter, e.g., "/lib/ld-linux.so.2"), PT_NOTE (4, auxiliary notes such as build IDs or core dump metadata), and PT_GNU_STACK (0x6474e551, a GNU extension whose flags state whether the stack needs to be executable, used for security hardening). Other types like PT_TLS (7, thread-local storage) and processor-specific extensions may appear depending on the platform. Loadable segments (PT_LOAD) are the core of execution, typically dividing the image into read-execute (text) and read-write (data) portions.[1][12]
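To make the entry layout and the type and flag values concrete, here is a small Python sketch that decodes one program header entry. The helper names `parse_phdr` and `describe` are hypothetical. The permission bit values (PF_X = 0x1, PF_W = 0x2, PF_R = 0x4) and the 64-bit field order, in which p_flags directly follows p_type, follow the System V gABI.

```python
import struct

# Segment types named in the text (value -> name).
PT_NAMES = {
    0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP",
    4: "PT_NOTE", 7: "PT_TLS", 0x6474E551: "PT_GNU_STACK",
}

# Permission bits as defined in the System V gABI.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

FIELDS_32 = ("p_type", "p_offset", "p_vaddr", "p_paddr",
             "p_filesz", "p_memsz", "p_flags", "p_align")
# Elf64_Phdr places p_flags right after p_type.
FIELDS_64 = ("p_type", "p_flags", "p_offset", "p_vaddr",
             "p_paddr", "p_filesz", "p_memsz", "p_align")


def parse_phdr(data, offset, is_64bit=False, little_endian=True):
    """Decode one program header entry starting at `offset` in `data`."""
    endian = "<" if little_endian else ">"
    if is_64bit:
        fmt, names = endian + "II6Q", FIELDS_64   # 56 bytes
    else:
        fmt, names = endian + "8I", FIELDS_32     # 32 bytes
    return dict(zip(names, struct.unpack_from(fmt, data, offset)))


def describe(phdr):
    """Render a one-line summary of a decoded entry."""
    name = PT_NAMES.get(phdr["p_type"], hex(phdr["p_type"]))
    flags = phdr["p_flags"]
    perms = (("R" if flags & PF_R else "-")
             + ("W" if flags & PF_W else "-")
             + ("X" if flags & PF_X else "-"))
    return f"{name} {perms} vaddr={phdr['p_vaddr']:#010x} memsz={phdr['p_memsz']}"
```

A caller would normally loop over e_phnum entries, stepping by e_phentsize from e_phoff, and call `parse_phdr` at each position.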
In dynamic loading, the program header table guides the loader (such as ld.so on Linux) to construct the process address space. The loader reads the table, maps PT_LOAD segments into virtual memory at their p_vaddr with appropriate p_flags protections, initializes extra memory for p_memsz > p_filesz, and processes auxiliary segments like PT_DYNAMIC for relocation and symbol resolution or PT_INTERP to invoke the dynamic linker itself. This segment-based approach allows efficient loading without parsing finer-grained sections, supporting position-independent code in shared libraries. For instance, the initial process image combines segments from the executable and interpreter, with the loader applying relocations post-mapping.[1]
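The mapping step can be illustrated with a deliberately simplified Python sketch. It is not how a real loader works: actual loaders map whole pages with mmap and apply memory protections, whereas the hypothetical `build_image` function below only copies bytes into a dictionary keyed by virtual address. It does show the two rules that matter for PT_LOAD segments: the first p_filesz bytes come from the file, and the remaining p_memsz - p_filesz bytes are zero-filled.

```python
PT_LOAD = 1


def build_image(file_bytes, phdrs):
    """Return {p_vaddr: segment bytes} for every PT_LOAD entry.

    `phdrs` is an iterable of dicts carrying the program header fields
    p_type, p_offset, p_vaddr, p_filesz, p_memsz and p_align.
    """
    image = {}
    for ph in phdrs:
        if ph["p_type"] != PT_LOAD:
            continue  # auxiliary segments are not copied in this sketch
        if ph["p_memsz"] < ph["p_filesz"]:
            raise ValueError("p_memsz is smaller than p_filesz")
        align = ph["p_align"]
        if align > 1 and ph["p_vaddr"] % align != ph["p_offset"] % align:
            raise ValueError("p_vaddr and p_offset differ modulo p_align")
        end = ph["p_offset"] + ph["p_filesz"]
        if end > len(file_bytes):
            raise ValueError("segment extends past the end of the file")
        contents = file_bytes[ph["p_offset"]:end]
        # Zero-fill the tail that exists only in memory (BSS-like data).
        contents += bytes(ph["p_memsz"] - ph["p_filesz"])
        image[ph["p_vaddr"]] = contents
    return image
```

Keeping the validation checks in front of the copy mirrors the order in which a careful loader would reject a malformed file before touching memory.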
Section Headers and Contents
The section header table in an ELF file is an array of section header entries that describe the layout and attributes of each section within the object file, enabling tools like linkers and debuggers to interpret the file's contents.[13] Each entry is a fixed-size structure (40 bytes for 32-bit ELF, Elf32_Shdr, and 64 bytes for 64-bit ELF, Elf64_Shdr), and the ELF header provides pointers to this table via the e_shoff (offset to the table), e_shnum (number of entries), and e_shentsize (size of each entry) fields.[11] The table typically appears at the end of the file, and section names are stored as indices into a dedicated string table section, often named .shstrtab.[13] The Elf32_Shdr structure consists of the following fields:
| Field | Type | Size (bytes) | Description |
|---|---|---|---|
| sh_name | Elf32_Word | 4 | An index into the section header string table section (.shstrtab), giving the name of this section as a null-terminated string.[11] |
| sh_type | Elf32_Word | 4 | A value specifying the type of section, such as program data (SHT_PROGBITS), a symbol table (SHT_SYMTAB), or a section that occupies no file space (SHT_NOBITS).[13] |
| sh_flags | Elf32_Word | 4 | Section flags: a bitmask of attributes such as writable (SHF_WRITE), allocated in memory during execution (SHF_ALLOC), or containing executable instructions (SHF_EXECINSTR). Bits covered by SHF_MASKOS are reserved for OS-specific semantics.[11] |
| sh_addr | Elf32_Addr | 4 | The virtual address at which the section should reside in memory, if applicable (0 if not relevant).[13] |
| sh_offset | Elf32_Off | 4 | The offset in bytes from the beginning of the file to the first byte of the section.[11] |
| sh_size | Elf32_Word | 4 | The size in bytes of the section. A section of type SHT_NOBITS (e.g., .bss) may have a nonzero size even though it occupies no space in the file.[13] |
| sh_link | Elf32_Word | 4 | An index into the section header table for a related section, such as the associated string table for symbol tables (interpretation depends on sh_type).[11] |
| sh_info | Elf32_Word | 4 | Extra information, often an index into another section or table, with meaning varying by sh_type (e.g., target section for relocations).[13] |
| sh_addralign | Elf32_Word | 4 | The alignment requirement for the section in memory, expressed as a power of 2 (0 or 1 means no alignment).[11] |
| sh_entsize | Elf32_Word | 4 | The size in bytes of each entry if the section holds a table of fixed-size entries (e.g., symbols); 0 otherwise.[13] |
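As an illustration of how sh_name and the section name string table work together, the following Python sketch reads a 32-bit little-endian section header table and resolves each section's name. The function name `read_section_headers` is hypothetical, and the sketch assumes e_shoff, e_shnum, and e_shstrndx have already been taken from the ELF header.

```python
import struct

SHDR_FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
               "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")
SHDR32 = struct.Struct("<10I")  # little-endian Elf32_Shdr, 40 bytes


def read_section_headers(data, e_shoff, e_shnum, e_shstrndx, e_shentsize=40):
    """Return a list of dicts, one per section header, with a 'name' key."""
    headers = []
    for i in range(e_shnum):
        fields = SHDR32.unpack_from(data, e_shoff + i * e_shentsize)
        headers.append(dict(zip(SHDR_FIELDS, fields)))

    # The section at index e_shstrndx holds the NUL-terminated names.
    strtab = headers[e_shstrndx]
    start = strtab["sh_offset"]
    names = data[start:start + strtab["sh_size"]]

    for header in headers:
        begin = header["sh_name"]
        end = names.index(b"\0", begin)  # raises ValueError if unterminated
        header["name"] = names[begin:end].decode("ascii")
    return headers
```

The entry at index 0 is the reserved null section header, so its resolved name is the empty string.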
Example Hexdump and Parsing
To illustrate the practical structure of an ELF file, consider a minimal 32-bit executable for the x86 architecture that prints "Hello world" and exits on Linux. Such files begin with the ELF identification bytes, followed by the rest of the ELF header and the program headers, as defined in the official ELF specification. This example is 116 bytes in total, with no section headers (e_shoff=0) and a single program header.[1][15]
The following hexdump shows the full content of this minimal 32-bit ELF executable (little-endian byte order). The magic bytes (0x7F 'E' 'L' 'F') identify the file as ELF, and the three bytes after them give the file class (32-bit), data encoding (little-endian), and version.[1]
```
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 03 00 01 00 00 00 54 80 04 08 34 00 00 00 |........T...4...|
00000020 00 00 00 00 00 00 00 00 34 00 20 00 01 00 00 00 |........4. .....|
00000030 00 00 00 00 01 00 00 00 00 00 00 00 00 80 04 08 |................|
00000040 00 80 04 08 74 00 00 00 74 00 00 00 05 00 00 00 |....t...t.......|
00000050 00 10 00 00 b0 04 31 db 43 b9 69 80 04 08 31 d2 |......1.C.i...1.|
00000060 b2 0b cd 80 31 c0 40 cd 80 48 65 6c 6c 6f 20 77 |....1.@..Hello w|
00000070 6f 72 6c 64 |orld|
```
In this hexdump, bytes 0x00-0x0F form the e_ident array: 0x7F 0x45 0x4C 0x46 (magic number), 0x01 (EI_CLASS for 32-bit), 0x01 (EI_DATA for little-endian), 0x01 (EI_VERSION), 0x00 (EI_OSABI for System V), and padding zeros. Bytes 0x10-0x11 hold e_type = 0x0002 (ET_EXEC for executable), and 0x12-0x13 hold e_machine = 0x0003 (EM_386 for Intel 80386). The e_version at 0x14-0x17 is 0x00000001. The e_entry at 0x18-0x1B is 0x08048054 (entry point virtual address). e_phoff at 0x1C-0x1F is 0x00000034 (program header offset at byte 52). e_shoff at 0x20-0x23 is 0x00000000 (no section headers). e_flags at 0x24-0x27 is 0x00000000. e_ehsize at 0x28-0x29 is 0x0034 (52 bytes). e_phentsize at 0x2A-0x2B is 0x0020 (32 bytes per program header). e_phnum at 0x2C-0x2D is 0x0001 (one entry). e_shentsize at 0x2E-0x2F is 0x0000, e_shnum at 0x30-0x31 is 0x0000, and e_shstrndx at 0x32-0x33 is 0x0000 (no sections).[1]
Parsing proceeds sequentially from the file start. First, validate the magic bytes at offset 0 to ensure ELF format and extract e_ident for architecture details: class (32-bit vs. 64-bit determines header size and field widths), data (a little-endian file requires byte swapping when read on a big-endian host), and OS/ABI for compatibility. Next, read e_type to confirm the file is an executable (ET_EXEC = 2); e_machine specifies the target ISA (e.g., 3 for x86). The e_entry field provides the virtual address to jump to after loading (0x08048054). Since e_shoff=0, there are no sections to parse.
Program headers begin at e_phoff=0x34 (byte 52): the single PT_LOAD entry (p_type=1 at bytes 0x34-0x37) describes a loadable segment with p_offset=0x00000000 (file offset), p_vaddr=0x08048000 (virtual address, page-aligned), p_paddr=0x08048000, p_filesz=0x00000074 (116 bytes from file), p_memsz=0x00000074 (116 bytes in memory), p_flags=0x00000005 (read and execute), and p_align=0x00001000 (4 KB page). This segment encompasses the entire file, including the code starting at file offset 0x54 (virtual 0x08048054): it performs sys_write (eax=4, ebx=1 for stdout, ecx=0x08048069 for string buffer, edx=11 for length) via int 0x80, then sys_exit (eax=1) via int 0x80, printing "Hello world" before terminating. The string resides at file offset 0x69 (virtual 0x08048069).[1][15]
Common pitfalls in parsing include ignoring the endianness given by EI_DATA, which swaps the bytes of multi-byte fields (e.g., misreading e_type as 0x0200 instead of 0x0002), and assuming the 64-bit structure for 32-bit files, which changes field sizes and offsets (64-bit headers are 64 bytes long, with 8-byte address and offset fields). A parser that assumes the wrong class reads fields at the wrong offsets and may read past the end of its buffer. In this minimal example, the absence of sections keeps the file small but limits debuggability; the self-contained loadable segment demonstrates runtime execution without separate link-time sections.[1]
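The 116-byte file can also be assembled programmatically, which makes the relationship between the header fields and the file layout explicit. The Python sketch below packs the ELF header and the single program header with `struct` and appends the machine code and string taken from the hexdump. The function name `build_minimal_elf` is hypothetical, and the code bytes are copied from the example rather than produced by an assembler.

```python
import struct

BASE = 0x08048000          # p_vaddr of the single PT_LOAD segment
EHDR_SIZE, PHDR_SIZE = 52, 32


def build_minimal_elf():
    """Return the bytes of the minimal 32-bit x86 'Hello world' executable."""
    # Machine code from the example: write(1, msg, 11), then an exit call.
    code = bytes.fromhex(
        "b004 31db 43 b969800408 31d2 b20b cd80 31c0 40 cd80"
    )
    message = b"Hello world"
    total = EHDR_SIZE + PHDR_SIZE + len(code) + len(message)

    e_ident = b"\x7fELF\x01\x01\x01"        # 32-bit, little-endian, version 1
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        e_ident,                            # zero-padded to 16 bytes by "16s"
        2,                                  # e_type: ET_EXEC
        3,                                  # e_machine: EM_386
        1,                                  # e_version
        BASE + EHDR_SIZE + PHDR_SIZE,       # e_entry: code follows the headers
        EHDR_SIZE,                          # e_phoff
        0,                                  # e_shoff: no section headers
        0,                                  # e_flags
        EHDR_SIZE, PHDR_SIZE, 1,            # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                            # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<8I",
        1,                                  # p_type: PT_LOAD
        0,                                  # p_offset: map from start of file
        BASE, BASE,                         # p_vaddr, p_paddr
        total, total,                       # p_filesz, p_memsz
        5,                                  # p_flags: PF_R | PF_X
        0x1000,                             # p_align
    )
    return ehdr + phdr + code + message
```

Writing the result to a file and marking it executable should yield a runnable program only on a Linux system that can execute 32-bit x86 binaries; on other hosts the bytes remain useful as a parsing test fixture.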
Tools and Utilities
Core Manipulation Tools
The GNU Binutils suite comprises a set of command-line utilities essential for creating, modifying, and inspecting Executable and Linkable Format (ELF) files in software development workflows. Developed and maintained by the GNU Project, these tools facilitate the manipulation of ELF object files, executables, and libraries by handling linking, section copying, disassembly, and archiving operations.[16]
The GNU linker, known as ld, serves as the primary tool for combining multiple ELF object files (.o) and libraries into a single executable program or shared library. It resolves symbols, applies relocations, and generates the final ELF binary by processing input sections and program headers as defined in the ELF specification. For instance, the command ld -o output.elf input1.o input2.o -lc links two object files with the standard C library to produce an executable (in practice, linking against the C library also requires the C runtime startup objects, which is why programs are usually linked through the compiler driver). To create a shared library, the option -shared is used, as in ld -shared -o libexample.so input.o, which produces a position-independent ELF file suitable for dynamic loading.
objcopy enables the copying and transformation of ELF files, allowing developers to manipulate sections, such as removing debugging symbols or converting formats. It can extract specific sections, adjust headers, or strip unnecessary data to reduce file size. A common usage is stripping all symbols with objcopy --strip-all input.elf output.elf (the output file is given as the second positional argument), which removes symbol tables and debug information while preserving the executable's functionality, aiding in production builds. Other options include --only-section=.text to copy just the code section.
For disassembly and inspection, objdump disassembles ELF sections to reveal machine code in assembler mnemonics, alongside headers and symbol tables. It is particularly useful for verifying the output of compilation and linking steps. Key flags include -d to disassemble executable sections, -h to display section headers, and -t for the symbol table. An example command, objdump -d input.elf, outputs the disassembled instructions from code sections, helping developers analyze binary structure. When combined with -S, it intermixes source code if available, as in objdump -S -d input.elf.[17]
The readelf utility provides detailed parsing of ELF file structures, displaying headers, program headers, section headers, symbols, and relocations without disassembly. It offers flags tailored to specific components, such as -h for the ELF header, -S for section headers, -l for program headers, and -s for the symbol table. For example, readelf -h input.elf prints the main ELF header fields like magic number and architecture, while readelf -S input.elf lists all sections with their sizes and attributes. This tool is invaluable for verifying format compliance during development.
Finally, ar functions as an archiver for creating static libraries in .a format, which bundle multiple ELF object files for use in linking. It maintains file metadata like timestamps and permissions within the archive. Common operations include creating an archive with ar rcs libexample.a obj1.o obj2.o, where r inserts files (replacing existing members), c creates the archive without warning if it does not already exist, and s generates a symbol index for efficient linking. The --record-libdeps option can track inter-library dependencies. Thin archives, supported in modern versions, reference external ELF files instead of embedding them, optimizing build processes.[18]
Analysis and Debugging Tools
The GNU Debugger (GDB) serves as a foundational tool for runtime debugging of ELF executables on Unix-like systems, enabling developers to inspect program execution, set breakpoints within specific ELF sections such as .text or .data, and load ELF-formatted core dump files to analyze crash states. GDB examines dynamic symbols from the ELF's .dynsym section and supports symbol resolution through debugging information embedded in ELF files, facilitating step-by-step execution tracing and variable inspection during runtime.[19] This integration with ELF structures allows precise control over loaded segments and addresses, making it essential for diagnosing issues in dynamically linked binaries.[20]
Strace is a diagnostic utility that traces system calls and signals made by ELF executables running on Linux, providing visibility into kernel interactions without requiring source code modifications.[21] By attaching to an ELF process via the ptrace mechanism, strace logs calls such as open, read, and mmap related to ELF loading and execution, helping identify I/O bottlenecks or permission errors in runtime behavior.[22] It supports filtering by syscall type or file paths, such as those involving ELF shared libraries, to focus analysis on specific aspects of program flow.[21]
Complementing strace, ltrace traces dynamic library calls and signals in ELF binaries, intercepting invocations to functions in shared objects like libc.so during execution.[23] This tool records entry and return points for library APIs, revealing how ELF programs interact with dynamically loaded code and aiding in the diagnosis of linkage or API misuse issues.[24] Ltrace also captures associated system calls when invoked with appropriate flags, offering a layered view of runtime dependencies beyond kernel boundaries.[25]
The ldd command, part of the GNU C Library (glibc), lists the shared library dependencies of an ELF executable or library by having the dynamic linker resolve them without running the program.[26] It relies on the ELF's program headers and dynamic section to output the required .so files, the addresses at which they would be loaded, and any libraries that cannot be found, which is crucial for verifying linkage integrity before deployment. For example, running ldd on a binary reveals paths to libraries like libm.so.6, highlighting potential portability issues across systems.[27]
Security auditing tools like checksec evaluate ELF binaries for protective features, checking attributes such as RELRO (Relocation Read-Only) to prevent GOT overwrites, NX (No eXecute) stack to block code execution in data areas, and PIE (Position Independent Executable) for ASLR compatibility.[28] By inspecting ELF headers and sections, checksec reports on canary usage for stack smashing protection and on FORTIFY_SOURCE, which substitutes bounds-checked variants of common C library functions, enabling quick assessments of binary hardening against exploits.[29] These audits are performed statically on ELF files, providing a security posture overview without execution.[30]
The ELFkickers suite is a collection of small utilities for inspecting and manipulating ELF files.[31] Its tools include elfls, which lists a file's program and section header entries, sstrip, which removes everything that is not part of the loadable memory image, and elftoc, which renders an ELF file as C source. Such utilities facilitate examination of ELF structures, revealing anomalies like unusual permissions or alignments that may indicate tampering.[32] This set is particularly useful for reverse engineering and quality assurance, focusing on static properties rather than runtime behavior.[33]
Valgrind, a suite of dynamic analysis tools, detects memory errors in running ELF processes primarily through its Memcheck tool, which shadows allocations and tracks accesses in loaded segments.[34] It instruments ELF binary execution to identify leaks, invalid reads/writes, and use-after-free bugs, with support for debugging information from ELF DWARF sections to pinpoint issues in source code lines.[35] Valgrind's compatibility with ELF on Linux allows comprehensive profiling of heap and stack usage in dynamically linked applications, often revealing subtle defects missed by static checks.[36]
Usage and Adoption
Primary Use in Unix-like Systems
The Executable and Linkable Format (ELF) is the predominant binary file format for executables, shared libraries, and core dumps in Unix-like systems, enabling efficient loading and execution by kernels and dynamic linkers. In Linux, ELF has been the default format since kernel version 1.0 released in 1994, with initial support introduced in development kernel 0.99.13 in 1993. The Linux kernel integrates ELF loading through the binfmt_elf handler, which registers the format so that the execve system call can recognize ELF binaries during process creation. Upon execution, the kernel validates the ELF header's magic bytes and structure, then uses the program header table to map loadable segments into virtual memory, before passing control to the dynamic loader named in the binary's PT_INTERP segment, typically /lib/ld-linux.so.2 on 32-bit x86 systems or /lib64/ld-linux-x86-64.so.2 on x86-64.[37][38]
In BSD variants, ELF adoption occurred in the mid-1990s as a replacement for the older a.out format to support advanced features like dynamic linking and shared libraries. FreeBSD introduced ELF header files in version 2.2.6 (1996), marking the beginning of its transition, with full native support solidified by version 3.0 in 1998, including FreeBSD-specific extensions such as brand notes in ELF headers to denote compatibility features like ABI versioning. NetBSD transitioned to ELF as its primary format for i386 and sparc ports starting with release 1.5 in 2000, maintaining backward compatibility for a.out binaries through tools like elf2aout for bootloaders and debugging utilities. OpenBSD added initial ELF support in version 1.2 (1996) and made it the native format across all platforms from version 5.4 (2013) onward. These implementations leverage ELF's program headers for kernel-level segment mapping, ensuring portable execution across architectures.[39][40][41]
Solaris, originating from System V Release 4 (SVR4) in 1988, was one of the first Unix systems to adopt ELF as its standard format with Solaris 2.0 in 1992, inheriting SVR4's design for object files and executables. The illumos project, an open-source continuation of OpenSolaris since 2010, retains this SVR4-derived ELF support, including unique extensions like the .SUNW_cap section, which specifies software and hardware capabilities such as required CPU instructions or platform features to guide linking and loading decisions. In both systems, the kernel performs ELF header validation to confirm file integrity and architecture compatibility, maps program segments into the process address space, and defers relocation resolution (such as adjusting addresses for position-independent code) to the runtime linker, ld.so.1, which processes dynamic relocation entries from .rel or .rela sections.[42][43]
While macOS primarily employs the Mach-O format for native binaries, it supports ELF indirectly through cross-compilation toolchains in Xcode and third-party GNU binutils, allowing developers to generate ELF files targeting Linux or embedded Unix-like systems without altering the host OS's loader. This partial integration facilitates porting and building for Unix environments but does not involve direct kernel handling of ELF files on macOS itself.
Overall, ELF's standardization in Unix-like kernels emphasizes robust header validation to reject malformed binaries, memory-efficient segment mapping for shared libraries, and deferred relocation processing to minimize load times and enable address space layout randomization for security.[38]
Adoption in Non-Unix Environments
The Executable and Linkable Format (ELF) has seen partial adoption in Windows environments through compatibility layers and development tools that bridge POSIX-like functionality with the native Portable Executable (PE) format. Cygwin, a POSIX emulation layer for Windows, supports handling ELF files via libraries such as ELFIO, enabling developers to read and generate ELF binaries within a Unix-like environment while ultimately producing PE executables for Windows execution.[44] Similarly, MinGW provides a GNU toolchain for Windows that can be configured to generate ELF object files for cross-compilation purposes, though native Windows applications remain in PE format.[45] ReactOS, an open-source implementation of the Windows NT kernel, incorporates ELF internals in its debugging subsystem, such as the dbghelp module, to parse ELF modules for compatibility and analysis tasks.[46]
BeOS and its successor Haiku adopted ELF as the native executable format starting in the late 1990s, leveraging its Unix-like heritage for efficient binary handling on x86 architectures after transitioning from the earlier Preferred Executable Format (PEF) used on PowerPC.[47] This choice facilitated compatibility with Unix tools and binutils, allowing Haiku to maintain a lightweight yet robust binary ecosystem without major modifications to the core ELF specification.[48]
Fuchsia, Google's modular capability-based operating system, employs ELF via its built-in ELF runner for launching executable components.[49]
In firmware and real-time operating systems (RTOS), ELF has been adapted for resource-constrained embedded environments, notably in uClinux distributions for microcontrollers lacking memory management units (MMUs). uClinux employs ELF as the base format for executables and shared libraries, with modifications to support flat loading and no-MMU operation, enabling deployment on systems like ARM-based devices.[50] Cross-platform toolchains further extend this adoption; for instance, LLVM and Clang can generate ELF binaries targeting non-Unix architectures directly from Windows or other hosts, facilitating development for embedded and hybrid systems.[51]
A key challenge in ELF's adoption outside Unix environments stems from application binary interface (ABI) differences, particularly in calling conventions. The System V AMD64 ABI, used with ELF on x86-64 Unix-like systems, passes the first six integer or pointer arguments in registers RDI, RSI, RDX, RCX, R8, and R9, whereas the Windows x64 ABI uses RCX, RDX, R8, and R9 for the first four arguments and requires the caller to reserve 32 bytes of shadow space on the stack.[52] These differences necessitate adaptations in loaders and linkers to ensure compatibility, often requiring wrappers or recompilation for cross-environment execution.
Applications in Embedded and Specialized Systems
The Executable and Linkable Format (ELF) finds significant application in game consoles, where it supports development kits, homebrew software, and proprietary variants tailored to hardware constraints. In the PlayStation series, the PS2 employs ELF files for homebrew applications and executable packing, enabling modular loading of code segments optimized for the Emotion Engine processor. Similarly, the PS3 utilizes SELF (Signed Executable and Linkable Format), a cryptographically signed extension of ELF, for system executables and dynamic libraries (SPRX files), ensuring secure loading on the Cell Broadband Engine architecture from 2006 onward. For the Nintendo Switch, homebrew development relies on ELF-based formats like .nro files, which are loaded via custom loaders to run unsigned code on the Tegra X1 SoC, facilitating community-driven applications since the console's 2017 release.
In mobile ecosystems, ELF serves native code execution in Android, where the Native Development Kit (NDK) compiles C/C++ libraries into ELF shared objects (.so files) linked against Bionic libc, enabling high-performance components in apps since Android 1.0 in 2008. This format allows dynamic loading of optimized binaries for ARM architectures, supporting features like graphics rendering and signal processing in resource-limited environments. On iOS, while the native Mach-O format dominates, ELF plays a role in jailbreak tools through custom loaders that inject ELF shared objects, as demonstrated by developer Comex's "food" module, which enabled loading Android-compatible ELF libraries like libflashplayer.so on jailbroken devices.
Historical and specialized uses of ELF extend to PowerPC-based systems, where AmigaOS 4 adopted ELF executables to replace the earlier Extended Hunk Format for PowerPC accelerator cards, providing a standardized structure for binaries on Amiga hardware since 2006. MorphOS, a lightweight OS for PowerPC Macs and Amiga clones, also employs ELF for its executables, leveraging the format's flexibility for efficient media-centric applications on constrained 32-bit and 64-bit PowerPC processors. IBM's AIX operating system on PowerPC historically favored the proprietary XCOFF format but incorporated ELF support through optional toolchains for compatibility with Linux environments, allowing cross-compilation of ELF binaries for PowerPC targets.
In blockchain platforms, ELF underpins node implementations running on Unix-like hosts. Ethereum clients, such as the Go Ethereum (Geth) implementation, produce ELF executables for Linux deployments, facilitating consensus and transaction processing on x86 and ARM hosts since the network's 2015 launch. Solana validators, built in Rust, compile to ELF binaries for Unix systems, with on-chain programs specifically packaged as ELF files containing BPF bytecode, enabling high-throughput validation on diverse hardware including ARM-based servers.
For embedded and IoT systems, ELF's modularity supports resource-constrained architectures like ARM and RISC-V. Raspberry Pi OS, a Debian derivative for ARM processors, uses ELF for all native executables and libraries, allowing seamless deployment of Linux applications on Raspberry Pi boards since 2012, including later models such as the Raspberry Pi 4. SiFive's RISC-V boards, such as the HiFive series, rely on ELF for firmware and application binaries, with toolchains handling relocations and sections tailored to embedded needs. To minimize footprint in these environments, optimizations like stripping unnecessary sections (e.g., debug symbols via binutils' strip tool) reduce ELF file sizes by up to 50% without affecting runtime functionality, as applied in RISC-V software for IoT edge devices.
Extensions and Variants
Multi-Architecture Support Initiatives
As adoption grew, the format evolved to support multiple architectures through the e_machine field in the ELF header, a 16-bit identifier that specifies the target processor, enabling compatibility with over 60 architectures including ARM (EM_ARM=40), MIPS (EM_MIPS=8), and x86-64 (EM_X86_64=62). This design choice decoupled the core file structure from architecture-specific details, allowing ELF to serve as a flexible container for binaries across diverse hardware without requiring format redesigns.[4]
To accommodate varying system conventions, ELF incorporates Application Binary Interface (ABI) extensions that build on the generic System V ABI. The generic ABI (gABI), the processor-independent part of the System V ABI that Linux standards such as the Linux Standard Base build on, provides a baseline for ELF usage across Linux distributions, defining common conventions for object files, executables, and shared libraries while leaving room for processor-specific adaptations.[53] Architecture-specific ABIs extend this foundation; for instance, the ARM Embedded Application Binary Interface (EABI), finalized in 2009, tailors ELF for ARM processors by specifying details like relocation types, dynamic linking tags, and procedure call standards to ensure efficient execution on resource-constrained embedded systems. These extensions maintain backward compatibility with the gABI while addressing unique requirements, such as ARM's support for both little- and big-endian byte orders via the e_ident[EI_DATA] field.[4]
Efforts to consolidate multiple architectures into a single ELF file have included proposals like FatELF, which embeds several architecture-specific ELF binaries within one container file.[54]
Cross-compilation standards have further enhanced ELF's multi-architecture capabilities, particularly through the LLVM project's toolchain. LLVM/Clang supports generating portable ELF object files and executables for numerous targets via the target triple specification (e.g., armv7-linux-gnueabihf for ARM ELF), enabling developers to build binaries for remote architectures from a single host system without architecture-specific toolchains.[51] This portability relies on ELF's extensible structure, such as the e_flags field for architecture-specific attributes, and integrates with standards like the gABI to produce compatible outputs for linking and execution across ecosystems.[51]
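The role of e_ident[EI_DATA] and e_machine described above can be illustrated with a minimal Python sketch that decodes the first 20 bytes of an ELF header. The function names describe_elf_header and describe_elf_file and the MACHINES table are hypothetical names chosen for this illustration, and the table covers only the three e_machine values mentioned in this section; a real tool would consult the full list in the ABI.

```python
import struct

# Partial map of e_machine values; only the ones named in the text.
MACHINES = {8: "EM_MIPS", 40: "EM_ARM", 62: "EM_X86_64"}


def describe_elf_header(header: bytes) -> dict:
    """Decode class, byte order, type, and machine from an ELF header.

    Illustrative helper (not part of any library); needs the first 20 bytes.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    # EI_DATA selects the byte order of every multi-byte field after e_ident.
    order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are the two 16-bit fields after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    return {
        "class": "ELF32" if ei_class == 1 else "ELF64",
        "byte_order": "little" if ei_data == 1 else "big",
        "type": e_type,
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }


def describe_elf_file(path: str) -> dict:
    """Read just enough of a file on disk to describe its ELF header."""
    with open(path, "rb") as f:
        return describe_elf_header(f.read(20))
```

Because e_type and e_machine sit at the same offsets in both ELF32 and ELF64 headers, the sketch can identify the target architecture before it knows how to parse the class-dependent fields that follow.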