Position-independent code
Position-independent code (PIC) is a form of machine code designed to execute correctly regardless of its absolute location in memory, relying on relative addressing, indirection tables, and dynamic resolution mechanisms rather than hardcoded absolute addresses.[1] This allows the code to be loaded and run at any memory address without requiring runtime modifications or relocations, distinguishing it from position-dependent code that assumes a fixed loading address.[2] PIC is primarily used to enable efficient sharing of code across multiple processes, as in shared libraries and dynamic linkers, where the same code segment can be mapped once and referenced by many programs simultaneously.[3] It supports key system features such as position-independent executables (PIE), which extend these benefits to entire programs, and address space layout randomization (ASLR), a security measure that randomizes memory layouts to hinder exploitation of vulnerabilities.[2] In modern operating systems like those based on ELF (Executable and Linkable Format), PIC facilitates modular software design, including loadable modules and optimized handling of read-only text segments.[4] Compilers such as GCC and Clang generate PIC through specific flags: in GCC, -fPIC produces unrestricted position-independent code suitable for shared objects using a global offset table (GOT) for address resolution, while -fpic imposes machine-specific limits on GOT entries for smaller binaries; analogous options -fPIE and -fpie apply to executables.[4] Clang mirrors this with -fPIC for shared libraries and -fPIE for executables, ensuring compatibility with dynamic loaders that resolve external references at startup.[1] On architectures like ARM, PIC employs PC-relative accesses or static base registers to maintain fixed offsets between code and data, supporting scenarios from embedded systems to full OS environments.[2] Overall, PIC enhances performance by minimizing relocation overhead and promotes secure, flexible software deployment.[3]
Fundamentals
Definition and Principles
Position-independent code (PIC) refers to machine code that can be loaded and executed at any arbitrary memory address without requiring any modifications to the code itself. This contrasts with position-dependent code, also known as absolute code, which embeds fixed, hardcoded memory addresses that assume a specific load location, necessitating adjustments if relocated.[5][6] The core principle of PIC relies on relative addressing modes, where instructions reference targets using offsets from the current position rather than absolute locations. For instance, program counter (PC)-relative addressing computes effective addresses by adding a signed offset to the value of the program counter (the address of the current instruction plus a small increment), enabling branches, jumps, and data accesses to remain valid regardless of the base load address. This approach avoids embedding absolute addresses in the instruction stream, allowing the code to support dynamic loading into non-contiguous memory regions and facilitating efficient sharing across multiple processes.[7][6][5] PIC is particularly useful for shared libraries, as it permits a single instance of the library code to be mapped into different processes' address spaces at varying virtual addresses.[7] A simple example in x86 assembly illustrates the difference: a relative jump like jmp label encodes the distance to the target label as an offset from the instruction's position (e.g., +10 bytes ahead), making it position-independent, whereas an absolute jump like jmp 0x400000 directly specifies a fixed memory address, rendering it position-dependent and requiring relocation if moved.[8][6]
Unlike relocatable code, which can be moved but requires runtime or link-time modifications via a relocation table to update embedded addresses, PIC demands no such adjustments for the code section itself, though data references may still need resolution.[9][6]
Benefits and Challenges
Position-independent code (PIC) offers several key advantages in software design, particularly in environments requiring flexibility and resource efficiency. One primary benefit is its support for efficient shared libraries, where the same code can be loaded and executed at different memory addresses across multiple processes without modification, thereby reducing overall memory usage by allowing read-only code segments to be shared in physical memory. This sharing eliminates the need for private copies of the text segment in each process, minimizing swap space reservations and improving system-wide memory efficiency. Additionally, PIC facilitates dynamic loading of modules, such as plugins, by enabling code to be relocated at runtime without fixed address dependencies, which is essential for modular systems like loadable libraries in embedded or extensible applications.[10][2] Another significant advantage is enhanced portability, as PIC allows code to be executed from any valid memory location, making it suitable for embedded systems or virtualized environments where address layouts may vary. For instance, in bare-metal embedded targets, self-relocating executables can be loaded dynamically without hardware-specific adjustments. Furthermore, PIC underpins security features like address space layout randomization (ASLR), which randomizes the base addresses of executables and libraries to mitigate exploits such as return-oriented programming by making memory layouts unpredictable.[2][11][12] Despite these benefits, PIC introduces notable challenges, primarily related to performance and implementation overhead. The use of indirect addressing mechanisms, such as global offset tables for data and procedure linkage tables for function calls, imposes a runtime penalty due to additional memory accesses and register pressure, particularly in code with frequent branches or jumps. 
On 32-bit x86 architectures, this can result in up to 26% performance degradation in some benchmarks, with an arithmetic average of about 10%, though the impact is negligible on 64-bit systems like x86_64 or AArch64.[11][12][13] Additionally, generating PIC often increases code size because relative references and indirection structures replace direct absolute addressing, leading to larger binaries in some cases. While startup-time overhead from relocations is another concern, it is generally minor compared to runtime effects in branch-intensive applications.[11][12]
Implementation Techniques
Code Relocation Strategies
Code relocation strategies enable executable code to operate correctly regardless of its load address in memory, primarily through the use of relative addressing that avoids embedding absolute addresses in instructions. These techniques ensure that branches, calls, and loads within the code segment reference locations via offsets from the current program counter (PC), allowing the entire code block to be relocated without modification.[14] Relative addressing modes form the foundation of these strategies, where instructions compute target addresses by adding a signed offset to the PC value at the time of execution. For instance, in x86-64 architecture, PC-relative branches and calls use 32-bit relative offsets, while loads and stores leverage RIP-relative addressing, introduced in the Intel 64 architecture to simplify position-independent code generation by treating the instruction pointer (RIP) as the base register for memory accesses. This mode encodes the offset directly in the instruction, enabling efficient access to nearby code or constants without absolute addresses, and supports up to ±2^31 bytes displacement. Similarly, absolute jumps are avoided by using relative instructions such as CALL rel32, where the offset is encoded statically by the assembler relative to the instruction location.[14] Compilers play a central role in implementing these strategies through targeted code generation flags that enforce relative addressing. In GCC and Clang, the -fPIC flag instructs the compiler to produce position-independent code by favoring PC-relative instructions for branches, calls, and data accesses within the code segment, while deferring absolute references to runtime resolution where necessary. For PIC-compliant jumps in inline assembly, developers use relative opcodes like the x86 CALL rel32, which specifies a relative displacement from the current instruction pointer.
These offsets are resolved at link time, ensuring the code remains valid after relocation.[15] Architecture-specific optimizations further refine these techniques to align with hardware capabilities. In ARM architectures, such as ARMv8-A, relative branches employ PC-relative addressing modes in load/store instructions and branch operations, where the offset is added to the PC to reach targets within a 26-bit signed range for B and BL instructions, facilitating position-independent execution in Thumb or AArch64 modes. For RISC-V, position-independent instructions rely on the AUIPC (Add Upper Immediate to PC) opcode paired with load/store or jump instructions; AUIPC adds a 20-bit immediate (shifted left by 12 bits) to the current PC and stores the result in a register, enabling PC-relative addressing for jumps via JAL (Jump and Link) with its 20-bit PC-relative offset, supporting relocations up to ±1 MiB without absolute dependencies. These methods collectively ensure code segments remain self-contained and relocatable, often integrating briefly with data relocation mechanisms for full executables.[16]
Data Relocation and Tables
In position-independent code (PIC), data relocation addresses the challenge of referencing global variables and functions without embedding absolute addresses, which would break when the code is loaded at different memory locations. The Global Offset Table (GOT) serves as a key mechanism for this, functioning as an array of entries that store absolute addresses for global data and functions. Positioned in the data section of the executable or shared library, the GOT is accessed via relative offsets from the code, allowing the runtime loader to patch these entries with actual addresses post-loading. This indirection ensures that data references remain valid regardless of the load address, maintaining the read-only nature of the code segment.[17][18] Complementing the GOT for function calls is the Procedure Linkage Table (PLT), a series of stubs in the code section that enable indirect invocation of external functions without fixed addresses. Each PLT entry initially points to a resolver routine in the dynamic linker; upon the first call, it triggers lazy binding, where the linker resolves the target function's address and updates the corresponding GOT entry. Subsequent calls then jump directly to the resolved address via the GOT, avoiding repeated resolution overhead. This setup supports efficient dynamic linking while preserving PIC properties, as the PLT uses relative addressing to access the GOT.[19][17] The dynamic relocation process occurs primarily at load time, managed by the runtime loader, which processes relocation entries to adjust GOT contents. For each global symbol, the loader calculates the absolute address based on the library's load base and patches the relevant GOT slot—for instance, using relocation types like R_X86_64_GLOB_DAT for data or R_X86_64_JUMP_SLOT for functions. This one-time adjustment ensures all indirect references resolve correctly without modifying the code itself.
In assembly, GOT access typically involves PC-relative loading, such as the x86-64 instruction lea rax, [rip + GOT_offset], which computes the GOT's base address relative to the instruction pointer before adding a symbol-specific offset to reach the entry; the effective address is then dereferenced to obtain the target's value. Relative code addressing facilitates this table access by providing offsets from the current program counter.[20][18]
PIC implementations distinguish between small and large models to optimize GOT usage based on address space constraints, particularly differing in 32-bit and 64-bit systems. In the small model, prevalent in 64-bit environments, the GOT is assumed to lie within a 32-bit offset from any code location, enabling direct PC-relative access without an explicit GOT base register—for example, global data is loaded via mov rax, [rip + offset_to_got_entry]. This simplifies code and reduces instructions. Conversely, the large model, necessary for broader address ranges in 64-bit systems or certain 32-bit scenarios with extended addressing, requires establishing a GOT pointer in a register (e.g., via a multi-instruction prologue loading into rbx), followed by base-plus-offset addressing. The effective address calculation in this case follows the form:
\text{Effective address} = \text{base register} + \text{offset}
where the base is the GOT pointer, accommodating 64-bit offsets for symbols potentially far from the code. These models balance performance and flexibility, with the small model favored for its efficiency in typical deployments.[21][18]
Historical Evolution
Early Innovations
The development of position-independent code (PIC) emerged in the 1960s as a response to the demands of early time-sharing systems, which required efficient memory utilization and sharing among multiple users to minimize storage redundancy and enable concurrent access. In these systems, code needed to be relocatable to arbitrary memory locations without modification, driven by the need to support diverse user workloads on limited hardware resources. This motivation was particularly evident in pioneering projects aimed at creating multi-user environments, where static addressing would have hindered dynamic loading and sharing.[22][23] Multics, initiated in 1965 by a collaboration between MIT's Project MAC (led by Fernando J. Corbató), Bell Telephone Laboratories, and General Electric, introduced PIC through its segmented memory model to facilitate time-sharing on the GE-645 computer, delivered in 1967. The virtual address space consisted of up to 2^14 segments, each up to 2^18 36-bit words, referenced via generalized addresses (segment number and word offset) that were location-independent. Procedure segments were designed as pure, non-self-modifying code, allowing them to be loaded at arbitrary physical addresses and shared across processes without recompilation or relocation, a key enabler for its multi-user design serving remote terminals efficiently. Dynamic linking resolved symbolic references to these generalized addresses at runtime, supported by supervisor software and descriptor tables. This approach reduced memory clutter by promoting reusable code segments, aligning with the project's goal of an "information utility" for scalable computing.[22][23] Similarly, IBM's Time Sharing System/360 (TSS/360), made available on a trial basis in 1967 for the System/360 Model 67, employed PIC modules to support dynamic linking in a mainframe time-sharing environment.
Each routine maintained separate virtual constants (V-cons) for code addresses and relocatable constants (R-cons) for data, stored in data segments; callers copied the appropriate R-con into a save area before invocation, enabling the code to execute regardless of its load address. This brute-force method separated code from position-dependent data, allowing shared modules to be loaded dynamically without per-process relocation, which was essential for concurrent task execution and resource sharing among programmer-users at terminals. Developed by IBM to compete in multi-user computing, TSS/360's PIC facilitated efficient mainframe utilization but remained experimental and unsupported as a product.[24][25] Early PIC approaches, however, were constrained by their dependence on specialized hardware support, such as the GE-645's segmentation capabilities for Multics or the Model 67's virtual addressing extensions for TSS/360, limiting portability across different architectures. These systems prioritized hardware-software integration for performance in controlled environments, laying groundwork for later, more generalized techniques.[23][25]
Standardization in Modern OS
In the late 1980s, SunOS 4.x introduced position-independent code (PIC) as a core feature for shared libraries, enabling multiple processes to share the same code segments without requiring relocation at load time. This innovation, detailed in the seminal USENIX paper on SunOS shared libraries, relied on compiler-generated PIC using flags like -pic, which produced relocatable references resolved by the runtime linker at process startup. The a.out object file format was extended with dedicated relocation tables to support this dynamic loading mechanism, marking an early formalization of PIC in Unix-like systems.[26] The Executable and Linkable Format (ELF), developed and standardized by Unix System Laboratories in the early 1990s as part of the System V Application Binary Interface (ABI), established a comprehensive standard for PIC across Unix variants. ELF explicitly defines relocation types tailored for position-independent code, such as R_386_PC32, which supports PC-relative addressing to compute symbol offsets without absolute addresses, facilitating efficient shared library usage. This specification, first published in the Tool Interface Standard (TIS), provided a unified structure for sections like .rel.dyn and .rela.plt, where PIC relocations are stored and processed by the dynamic linker.[27] The shift from the a.out format to ELF during the 1990s significantly improved the portability of PIC implementations. Whereas a.out relied on ad-hoc vendor extensions for relocation data, often limiting interoperability, ELF's standardized relocation entries and segment mapping enabled consistent PIC support across diverse Unix architectures and vendors, streamlining dynamic linking and reducing compatibility issues in heterogeneous environments.[28] Key milestones in ELF's adoption include the 1992 release of the initial specification, which was immediately integrated into Solaris 2.0 (SunOS 5.0) for production use in shared libraries and executables. 
Linux followed suit in the mid-1990s: the kernel gained full ELF support with version 1.2 in 1995, Linux libc 5 provided initial ELF support the same year, and the GNU C Library (glibc) achieved full support with version 2.0 in 1997, accelerating widespread PIC standardization in open-source Unix derivatives.[27][28] In contrast to proprietary formats like COFF, which offered limited or architecture-specific relocation mechanisms, ELF's explicit PIC support—through dedicated types for global offset tables (GOT) and procedure linkage tables (PLT)—ensured greater flexibility and efficiency for relocatable code, promoting cross-platform adoption without custom modifications.[27]
Applications in Operating Systems
Unix-like Systems
In Unix-like systems, position-independent code (PIC) is primarily implemented through the Executable and Linkable Format (ELF), which serves as the foundation for dynamic linking. In Linux distributions, the GNU Compiler Collection (GCC) enables PIC generation via the -fPIC flag during compilation, producing code that accesses global data and functions through a Global Offset Table (GOT) and Procedure Linkage Table (PLT). These tables allow relocations to be resolved at runtime by the dynamic linker ld.so, avoiding fixed addresses and supporting load-time address randomization.[4][29]
Shared object files (.so) in Linux leverage PIC to facilitate memory-efficient library sharing across multiple processes, as the code can be loaded at varying virtual addresses without per-process relocation. For instance, a typical compilation workflow using GCC involves first compiling source files with position independence: gcc -fPIC -c example.c, followed by linking into a shared library: gcc -shared -o libexample.so example.o. The resulting .so file contains deferred relocations that ld.so processes upon loading, updating GOT entries for data accesses and PLT stubs for function calls.[29]
Solaris implements PIC within the ELF framework using architecture-specific relocation types, such as R_AMD64_GOTPCREL for instruction-pointer-relative GOT addressing and R_AMD64_PLT32 for PLT-based procedure calls, enabling efficient dynamic linking in shared objects. These relocations differ from standard ELF by incorporating Sun Microsystems extensions for SPARC and x64 architectures, optimizing for Solaris's runtime environment while maintaining compatibility with generic ELF tools.[30]
FreeBSD adopts standard ELF relocations for PIC in shared libraries, with the ld-elf.so.1 dynamic linker handling GOT and PLT resolutions similar to Linux, though it includes BSD-specific variations like optimized thread-local storage support and stricter validation of ELF headers for security. This approach ensures seamless integration with FreeBSD's ports system, where PIC is mandatory for dynamic libraries to support address space layout randomization.[31]
In macOS, a Unix-like system based on Darwin, PIC is supported through the Mach-O executable format for shared libraries (.dylib files) and frameworks. The dynamic linker dyld resolves symbols at load time using techniques like relative addressing with @rpath paths and a dynamic loader environment, enabling code sharing and ASLR without fixed addresses. Compilers like Clang generate PIC code by default for shared objects via flags such as -dynamiclib, integrating with Xcode build tools for modular app development.[32]
Shared libraries have long been compiled with PIC by default in GCC-based toolchains. For enhanced security, many contemporary Linux distributions, including Ubuntu since version 16.10 on architectures like amd64, default to position-independent executables (PIE) via flags like -fPIE, promoting features like address space layout randomization and reducing relocation overhead, often set implicitly by build tools such as dpkg-buildflags. For performance tuning, GCC offers -fpic as an alternative to -fPIC for scenarios with limited global references, generating smaller code by assuming a GOT size under 64 KB (e.g., on x86), though -fPIC is preferred for larger libraries to avoid runtime errors from GOT overflow.[4][33]
Windows Systems
In Windows operating systems, dynamic-link libraries (DLLs) serve as the primary units for position-independent code, allowing them to be loaded at varying base addresses across different processes to support memory sharing and address space layout randomization (ASLR).[34] Unlike static linking, DLLs enable code reuse by resolving external dependencies at runtime through the Import Address Table (IAT), a data structure in the Portable Executable (PE) format that the loader populates with actual function addresses from dependent modules.[34] This mechanism facilitates relocation without requiring the DLL's code to be recompiled, though it relies on the loader to adjust references if the preferred load address is unavailable.[35] The PE file format supports this through base relocations, a table in the optional header that lists offsets requiring adjustment when the image is loaded away from its preferred base address.[34] At load time, the Windows loader applies these fixups to code and data pointers, making the DLL operational at the new location.[36] This approach contrasts with the Unix-like Global Offset Table (GOT) mechanism, where position-independent code often uses runtime indirection for symbol resolution, leading to direct addressing in Windows post-relocation but with load-time processing overhead for extensive fixups.[35] To compile DLLs compatible with ASLR and position-independent loading, developers use the /DYNAMICBASE linker flag in Microsoft Visual C++ (MSVC), which sets the dynamic base flag in the PE header and generates necessary relocation information.[37] For example, a DLL might export functions using __declspec(dllexport) (e.g., extern "C" __declspec(dllexport) int Add(int a, int b) { return a + b; }), while an executable imports them via __declspec(dllimport) and the IAT, allowing seamless linking without hardcoded addresses.[37]
DLLs were introduced in Windows 1.0 in 1985 as a means to share code and resources efficiently in 16-bit environments, evolving significantly with the shift to 32-bit in Windows 95 and further with 64-bit architectures in Windows XP (2001), where RIP-relative addressing enabled more efficient position-independent code without as many relocations for instructions.[38][39]
A key limitation in Windows is that main executables (EXEs) have traditionally not been compiled as position-independent, relying on a fixed base address unless explicitly enabled with flags like /DYNAMICBASE, in contrast to Unix-like position-independent executables (PIE) that are more routinely supported.[37] This design prioritizes load-time optimization for primary images but can complicate ASLR implementation for standalone applications.[40]
Advanced Topics
Position-Independent Executables
Position-independent executables (PIE) extend position-independent code (PIC) principles to the main program binary, compiling the entire executable—code, data, and dependencies—as relocatable, allowing it to load at any random base address without fixed assumptions about its location in memory. This contrasts with traditional position-dependent executables, which are linked to a specific load address and cannot be relocated without modification. PIE enables full address space layout randomization (ASLR) for the executable itself, randomizing not only libraries and stack but also the program's text and data segments.[41][42] To generate a PIE binary, compilers like GCC use the -pie linker flag alongside PIC options such as -fPIE, producing an ELF file with type ET_DYN (shared object) rather than the conventional ET_EXEC (executable). For instance, the command gcc -fPIE -pie main.c -o main creates a PIE executable. Analysis with tools like readelf confirms this: running readelf -h main displays "Type: DYN (Shared object file)" in the header, indicating its relocatable nature, while a non-PIE binary shows "Type: EXEC (Executable file)." In Linux, the execve system call invokes the dynamic linker (ld.so), which loads the PIE binary into a randomized virtual memory region, applying an offset based on the kernel's ASLR configuration to prevent predictable addressing.[43]
Adoption of PIE has grown for security hardening, becoming the default in major distributions to enable comprehensive ASLR. Fedora enabled PIE by default for all packages starting with Fedora 23 in 2015, applying it across architectures with exceptions for performance-critical components. Ubuntu introduced default PIE compilation starting in Ubuntu 16.10 (Yakkety Yak) for amd64, ppc64el, and s390x architectures.[44][45]
Compared to PIC shared libraries, which relocate only relative to the program's base, PIE requires relocating the entire executable, including global offsets and GOT/PLT entries, leading to higher initial load-time overhead from additional dynamic relocations—averaging about 16% performance impact on startup (though absolute times are minimal, ranging from 0.2 to 11 ms) but negligible at runtime. This trade-off enhances security by randomizing the executable's base address, making it harder for attackers to predict code locations for exploits like return-oriented programming.[12][42]