Calling convention
In computer science and computer architecture, a calling convention is a standardized set of rules that governs how one piece of code transfers control to a subroutine or function, passes arguments to it, handles the return of values, and manages the program's stack and registers to ensure proper execution and data integrity.[1][2][3] These conventions specify critical details such as the order and mechanism for passing parameters—typically prioritizing registers for the first few arguments (e.g., up to six or eight, depending on the architecture) before spilling excess to the stack—and define how return values are placed, often in specific registers like %rax in x86-64 or x10 in RISC-V.[1][2]
They also delineate register preservation rules, distinguishing between caller-saved registers (which the calling code must preserve if needed) and callee-saved registers (which the called function must restore), to prevent unintended data corruption across function boundaries.[1][4] Stack management is another core aspect, including alignment requirements (e.g., 16-byte boundaries), the direction of stack growth (usually downward), and responsibility for cleanup—such as whether the caller or callee adjusts the stack pointer after the call.[1][2]
Calling conventions are architecture-specific and often influenced by operating systems and application binary interfaces (ABIs), with prominent examples including the System V AMD64 ABI for Linux on x86-64, the Microsoft x64 convention for Windows, and the ARM Procedure Call Standard for ARM processors.[1][5][4] Their primary purpose is to enable interoperability, allowing code compiled by different compilers or written in various languages to link and execute correctly, while optimizing for performance by minimizing unnecessary memory accesses.[1][2] Violations of these conventions can lead to runtime errors, such as stack overflows or incorrect parameter values, making adherence essential in low-level programming like assembly and systems development.[6][3]
Introduction
Definition and Purpose
A calling convention is a standardized set of rules that specifies how subroutines or functions receive parameters from their callers, manage the stack and registers, and return values to ensure interoperability between separately compiled modules of code.[7][8] These conventions define the interface for function invocation at the machine code level, governing aspects such as argument passing locations and the preservation of caller state.[1] The primary purpose of calling conventions is to facilitate modular programming by providing a consistent mechanism for procedure linkage across different compilers, assemblers, and tools, thereby preventing runtime errors such as stack corruption or incorrect parameter access.[7] They are essential for generating correct machine code that operates reliably in mixed-language environments or with external libraries.[8] By standardizing these low-level interactions, calling conventions enable binary compatibility, allowing object files from disparate sources to link successfully without recompilation.[7]
Calling conventions emerged in the 1970s alongside the rise of structured programming and early compilers, particularly with architectures like the PDP-11, where initial subprogram calling rules were formalized to support efficient subroutine invocation and reentrancy.[9] Over time, they evolved to accommodate optimizations, portability across platforms, and advanced features like dynamic linking. Key benefits include enhanced debugging through predictable stack frame layouts, support for profiling and disassembly tools, and the integration of inline assembly within high-level languages.[7]
Related Concepts
The Application Binary Interface (ABI) defines the low-level conventions for how compiled programs interact with the operating system, libraries, and other binaries on a specific architecture, encompassing aspects such as calling conventions, data type representations, and memory alignments. Calling conventions form a key subset of the ABI, dictating how function arguments are passed (e.g., via registers or stack) and results are returned, while the broader ABI also specifies the sizes, alignments, and layouts of fundamental data types like integers and structures to ensure binary compatibility across modules. For instance, in the System V ABI for x86-64, integers are typically 32 or 64 bits with natural alignment, whereas floating-point types may require specific padding to match hardware expectations.[10][11]
In contrast, the Application Programming Interface (API) operates at a higher level, providing a source-code interface for developers to interact with software modules, libraries, or systems without concern for underlying binary details. While an API might define function signatures and behaviors in languages like C or Java, it does not specify binary-level details such as register usage or object layouts, which are instead governed by the ABI to enable linking of independently compiled code. This distinction ensures that source-code portability via APIs does not imply binary compatibility, as changes in the ABI (e.g., due to compiler updates) can break executables even if the API remains unchanged.[12][13]
The Procedure Linkage Table (PLT) and Global Offset Table (GOT) are critical components in dynamic linking for formats like ELF, enabling runtime resolution of external function and data addresses in shared libraries. The PLT serves as a trampoline for indirect function calls, initially redirecting to a dynamic linker stub that populates the GOT with the actual addresses upon first invocation, thus supporting lazy binding and position-independent code. These mechanisms depend on the platform's calling conventions to correctly pass control and parameters during resolution; for example, in the x86-64 System V ABI, calls to PLT entries use the standard register-based parameter passing to invoke the linker resolver without disrupting the caller's stack frame.[14][7]
Inline assembly and Foreign Function Interfaces (FFI) rely on calling conventions to seamlessly integrate low-level assembly code with high-level languages, ensuring that function calls across language boundaries adhere to the expected ABI. In FFI, high-level languages like Rust or Haskell declare foreign functions using conventions such as extern "C" to match the C ABI, which specifies parameter passing and return mechanisms, allowing safe invocation of C libraries while wrapping calls in unsafe blocks to handle potential memory or threading issues. Similarly, inline assembly in compilers like GCC or LLVM requires explicit adherence to the calling convention (e.g., preserving callee-saved registers and aligning the stack) to avoid corrupting the caller's state when embedding assembly snippets directly in C code.[15]
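As a minimal sketch of that last point, the GCC extended inline assembly below declares its inputs and outputs through constraints so the compiler can keep the surrounding register state consistent with the calling convention; the function name and the x86-64 AT&T-syntax instruction are illustrative assumptions rather than code from any particular project.

    #include <stdint.h>

    /* Hypothetical helper using GCC extended inline assembly on x86-64.
     * The operand constraints tell the compiler which registers the snippet
     * reads and writes, so it will not keep live values in a register the
     * assembly would silently overwrite. */
    static inline uint64_t add_pair(uint64_t a, uint64_t b)
    {
        uint64_t result;
        __asm__ ("lea (%1, %2), %0"    /* result = a + b via address arithmetic */
                 : "=r" (result)        /* output: any general-purpose register */
                 : "r" (a), "r" (b));   /* inputs: registers chosen by compiler */
        /* No clobber list is needed here: LEA writes only the output operand
         * and modifies neither the flags nor any callee-saved register. */
        return result;
    }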
Debugging formats such as DWARF and PDB incorporate calling convention details to facilitate stack unwinding and symbol resolution during runtime analysis or exception handling. DWARF's Call Frame Information (CFI) encodes rules derived from the ABI—such as register mappings and stack adjustments at each instruction—to enable debuggers like GDB to reconstruct call stacks by virtually restoring registers and pointers across frames. Likewise, Microsoft's PDB format embeds unwind data in the PE optional header, using ABI-specific conventions (e.g., x64 table-based unwinding) to trace execution paths and locate variables, ensuring accurate backtraces even in optimized binaries.[16]
Types of Calling Conventions
Calling conventions can be classified by their approach to parameter handling, which determines how arguments are passed between functions. Stack-based conventions push all parameters onto the call stack, offering simplicity and compatibility across varying numbers of arguments but incurring overhead from memory accesses.[7] Register-based conventions pass the first few arguments in CPU registers to minimize memory operations and improve speed, particularly for small argument counts, while spilling excess parameters to the stack.[7] Hybrid conventions combine these methods, using registers for initial parameters and the stack for overflow, as seen in modern 64-bit systems where up to four integer arguments are register-passed before stack usage.[17]
Another key classification distinguishes conventions by the responsibility for stack cleanup after parameter passing. In caller-cleanup conventions, such as __cdecl, the calling function adjusts the stack pointer to remove arguments post-call, enabling variable-argument functions like printf but adding code size due to repeated cleanup instructions.[18][7] Conversely, callee-cleanup conventions, exemplified by __stdcall, require the called function to clean the stack using a fixed argument count known at compile time, which reduces caller overhead and executable size for APIs with consistent signatures, such as the Windows API.[7]
Performance-optimized variants like fastcall and vectorcall extend standard conventions by prioritizing register usage for specific argument types. Fastcall passes the first two integer parameters in registers—ECX and EDX on x86—to accelerate short calls, with remaining arguments on the stack, though its benefits diminish in 64-bit environments where registers are more abundant.[19][7] Vectorcall builds on fastcall by incorporating SSE/AVX registers (e.g., XMM0–XMM5 or YMM0–YMM5) for up to six vector arguments on x86 or four on x64, reducing stack pressure for SIMD-heavy code and enabling by-value passing of aggregates larger than eight bytes, which enhances throughput in vectorized computations.[20]
Conventions also vary in support for position-independent code (PIC), which allows relocatable binaries without address fixes at load time. Position-dependent conventions assume fixed load addresses, simplifying direct jumps but limiting shared libraries. Position-independent variants, as in the System V ABI, employ IP-relative addressing or a Global Offset Table (GOT) accessed via a dedicated register like %r15 for indirect calls, enabling dynamic relocation while maintaining calling efficiency across small (IP-relative only), medium (with large data offsets), or large (full 64-bit GOT) models.[21]
The evolution of calling conventions reflects architectural shifts from memory-constrained early systems to performance-oriented designs. Early conventions, prevalent in 16-bit x86 environments like DOS, relied heavily on stack-based models with limited registers, prioritizing simplicity for segmented memory.[22] Over time, as RISC architectures emphasized register abundance, conventions transitioned to register-heavy approaches, such as RISC-V's use of eight argument registers (a0–a7) for integers and fa0–fa7 for floats, reducing latency and aligning with compiler optimizations in 64-bit and embedded systems.[23][7]
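To make the cleanup and register-passing distinctions above concrete, the declarations below sketch how the three classic conventions appear in C source for a 32-bit Windows target; the function names and parameter lists are illustrative assumptions, and the convention keywords are the MSVC spellings.

    /* Caller cleans the stack: permits variadic argument lists such as printf's. */
    int __cdecl sum_all(int count, ...);

    /* Callee cleans the stack with RET n: typical for fixed-signature Win32 APIs. */
    int __stdcall api_handler(void *handle, unsigned message);

    /* First two 32-bit arguments travel in ECX and EDX; any others go on the stack. */
    int __fastcall clamp(int value, int limit);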
Core Elements
Parameter Passing Methods
Parameter passing methods in calling conventions define how arguments are transferred from the caller to the callee, ensuring interoperability between functions compiled by different tools. These methods balance efficiency, safety, and flexibility, with choices influencing runtime performance and memory usage. Common techniques include passing data by value, by reference, or specialized variants for output parameters, each implemented via registers or the stack as per the convention's rules.[24]
In pass-by-value, the caller copies the argument's value into a location accessible to the callee, such as a register or stack slot, preventing modifications to the original data. This approach is straightforward and isolates the function's effects but can be inefficient for large data types due to copying overhead. For instance, in languages like C, primitive types like integers are typically passed by value.[24][17]
Pass-by-reference, also known as pass-by-pointer, involves the caller providing the memory address of the argument, allowing the callee to access and potentially modify the original data without copying. This method is more efficient for large or complex structures, as only the address (often fitting in a register) is transferred, though it introduces risks like unintended aliasing where multiple parameters refer to the same memory. It is prevalent in C-like languages for modifiable parameters.[24]
For output parameters, pass-by-result and pass-by-copy-restore provide mechanisms to return modified values to the caller. In pass-by-result, the callee writes to a caller-allocated location without initial copying, suitable for pure outputs but requiring careful memory management to avoid uninitialized access. Pass-by-copy-restore, or value-result, copies the argument into the callee's scope at entry, allows modifications, and copies the result back upon exit; this handles inputs that may be overwritten but incurs double-copy overhead and can lead to order-dependent behavior in multi-parameter calls. These variants appear in languages like Ada for in-out parameters.[24][25]
When passing structures or aggregates, conventions distinguish between small and large types to optimize transfer. Small structs, often those fitting within one or two registers (e.g., up to 16 bytes in some standards), are passed by value directly in registers for speed, while larger ones are passed by reference or on the stack to avoid excessive copying. Alignment rules typically require padding to natural boundaries (e.g., 8-byte), adding minor memory overhead but ensuring efficient access. For example, in standard ABIs, a 4-byte struct might occupy a full 8-byte register slot.[17]
Variadic arguments, as in C's ellipsis notation (...), follow hybrid rules where fixed parameters use the standard method (registers first), but additional arguments spill to the stack regardless of size, enabling flexible but less optimized handling. The callee accesses them via macros like va_start, which rely on the stack layout for enumeration, often with alignment to word size. This accommodates variable counts but complicates optimization due to unknown types.[26][17]
Trade-offs in these methods center on speed versus flexibility: register-based passing (for values or pointers) minimizes latency compared to stack pushes, which involve memory operations, but registers are limited, forcing spills for many arguments. By-value suits small, immutable data to avoid aliasing risks, while by-reference or copy variants enable modifications at the cost of potential overhead—e.g., copying a large struct by value might double memory traffic. Alignment padding, though small (typically 0-7 bytes per parameter), accumulates in calls with aggregates, influencing overall efficiency. Overall, conventions prioritize register use for the first few parameters to leverage CPU speed, resorting to the stack only when necessary.[24]
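As a short sketch of the variadic handling described above, the C function below walks its trailing arguments with the stdarg macros, which encapsulate the platform's layout rules so the same source works whether those arguments arrive in registers or on the stack; the function name is illustrative.

    #include <stdarg.h>

    /* Sums `count` int arguments supplied through the ellipsis. */
    int sum_ints(int count, ...)
    {
        va_list args;
        va_start(args, count);           /* begin enumerating the variadic area */
        int total = 0;
        for (int i = 0; i < count; i++)
            total += va_arg(args, int);  /* each argument fetched per the ABI's layout */
        va_end(args);
        return total;
    }

    /* Usage: sum_ints(3, 10, 20, 12) returns 42. */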
Stack and Register Usage
In calling conventions, the stack frame represents the memory allocation on the call stack for a function invocation, typically including the return address, saved registers, local variables, and sometimes parameters if not fully passed in registers. The return address stores the location to resume execution in the caller after the function completes, while saved registers preserve the caller's state for callee-saved registers. Local variables occupy space allocated dynamically based on the function's needs, and parameters may spill onto the stack beyond the initial register allocation. This layout ensures proper isolation between function invocations and facilitates stack unwinding for debugging or exception handling.[27]
A typical stack frame layout, growing downward from higher to lower memory addresses, can be represented as follows:

    Higher addresses
    +-------------------+
    | Caller parameters |   (including spills if exceeding registers)
    +-------------------+
    | Return address    |   <- pushed by the CALL instruction
    +-------------------+
    | Saved registers   |   <- e.g., frame pointer (optional)
    +-------------------+
    | Local variables   |   <- allocated in the prologue
    +-------------------+
    | Spill area / Temp |   <- for additional data if needed
    +-------------------+   <- %rsp (stack pointer)
    Lower addresses

This structure varies slightly by platform but maintains the core components for runtime management. For instance, in the PowerPC ELF ABI, the stack frame header includes a back chain to the previous frame, condition register save area, link register (return address) save area, and additional metadata like TOC save area.[28]
The function prologue establishes the stack frame at entry, typically by saving the previous frame pointer, setting the current frame pointer, and allocating space for locals, while the epilogue reverses these steps at exit to restore the stack and return control. A common sequence in x86 conventions involves instructions like push ebp to save the caller's base pointer, mov ebp, esp to establish the new frame pointer, and sub esp, N to allocate local space in the prologue; the epilogue then uses mov esp, ebp, pop ebp, and ret to deallocate and return. These operations ensure the stack pointer is adjusted correctly and registers are preserved, with multiple epilogues possible for functions with early exits.[29][7]
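For illustration, here is a small C function alongside the kind of unoptimized 32-bit x86 prologue and epilogue a compiler might emit for it; the exact instruction sequence varies with compiler and optimization level, so the assembly comment is a sketch rather than actual compiler output.

    /* C source */
    int scale(int x)
    {
        int local = x * 3;
        return local + 1;
    }

    /* Possible unoptimized 32-bit x86 translation (illustrative):
     *   push ebp          ; prologue: save the caller's frame pointer
     *   mov  ebp, esp     ;           establish the new frame pointer
     *   sub  esp, 4       ;           reserve space for `local`
     *   ...               ; body: compute x * 3 + 1 from [ebp+8] into EAX
     *   mov  esp, ebp     ; epilogue: release the local storage
     *   pop  ebp          ;           restore the caller's frame pointer
     *   ret               ;           pop the return address and resume the caller
     */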
Register preservation rules divide registers into caller-saved (volatile) and callee-saved (non-volatile) categories to balance efficiency and reliability across function calls. Caller-saved registers, used for temporaries like intermediate computations, may be freely modified by the callee without restoration, placing the burden on the caller to save them if needed before the call. In contrast, callee-saved registers hold longer-lived values and must be preserved by the callee, which saves them in the prologue (often on the stack) and restores them in the epilogue. This convention minimizes overhead: callers avoid saving registers they won't use post-call, while callees only save those they modify. For example, in the x64 Microsoft convention, registers like RAX and RCX are caller-saved, while RBX and R12-R15 are callee-saved.[17][7]
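A brief sketch of how this split constrains code generation under the x86-64 System V convention, where RBX is callee-saved and registers such as RCX, R10, and R11 are caller-saved; helper() and the variable names are illustrative assumptions.

    long helper(long x);        /* external function, assumed for illustration */

    long sum_twice(long a, long b)
    {
        long total = a + b;     /* if kept in a caller-saved register, `total` must be
                                   spilled or copied before the call below */
        long h = helper(total); /* helper() may freely clobber caller-saved registers */
        return total + h;       /* if the compiler instead parks `total` in callee-saved
                                   RBX, this function must save and restore RBX in its
                                   own prologue and epilogue */
    }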
Some conventions incorporate a red zone, a reserved area immediately below the stack pointer that a function may use for fast temporary storage without explicit allocation, enabling optimizations by avoiding prologue adjustments for small locals. In the x86-64 System V ABI, this zone spans 128 bytes and is guaranteed not to be clobbered by asynchronous events such as signal handlers, allowing leaf functions to use it directly for efficiency. However, its use is platform-specific and requires compiler awareness to prevent conflicts with interrupt handlers.[27]
Stack pointer alignment requirements ensure optimal performance for vector operations and SIMD instructions, typically mandating that the stack pointer be aligned to a 16-byte boundary upon function entry and maintained throughout, except in prologues/epilogues or leaf routines. Misalignment can incur hardware penalties, such as additional cycles for unaligned memory access, so conventions enforce this via prologue adjustments (e.g., subtracting multiples of 16 from the stack pointer). In 64-bit systems like x86-64, this 16-byte rule supports efficient use of 128-bit XMM registers, with higher alignments (e.g., 32 bytes) for 256-bit operations in some extensions.[27][17][7]
To mitigate stack overflows that could corrupt frames, many conventions integrate overflow protection mechanisms like canaries (or security cookies), which insert a random value between locals and sensitive components such as the return address. The prologue places this canary on the stack, and the epilogue verifies it before return; a mismatch triggers termination to prevent exploitation. Enabled via compiler flags like Microsoft's /GS or GCC's -fstack-protector, this adds minimal runtime overhead while protecting against buffer overruns in vulnerable functions.[30]
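As an illustration, the function below contains the kind of on-stack buffer that typically receives a canary when built with GCC's -fstack-protector or MSVC's /GS; the check is inserted by the compiler, not written by hand, and the function itself is a made-up example.

    #include <string.h>

    void copy_name(char *dst, const char *src)
    {
        char buf[32];                    /* local buffer adjacent to saved state */
        /* With stack protection enabled, the prologue places a random canary
         * between buf and the return address; the epilogue re-checks it before
         * ret and aborts the program if an overflow of buf has overwritten it. */
        strncpy(buf, src, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        strcpy(dst, buf);                /* hand the bounded copy back to the caller */
    }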
Function Return Mechanisms
In calling conventions, the return address is typically pushed onto the stack by the caller's call instruction immediately before control transfers to the callee. Upon completion, the callee restores control to the caller by popping this address from the stack and branching to it, ensuring resumption of execution at the instruction following the original call. This mechanism maintains the integrity of the program's control flow across function boundaries.[21]
Scalar return values, such as integers or small floating-point numbers that fit within a single register or pair of registers, are conventionally returned in designated registers to minimize overhead and enable efficient access by the caller. For instance, integer scalars are placed in general-purpose registers, while floating-point scalars use dedicated floating-point or vector registers. This approach leverages hardware efficiency for common data types.[17][21]
For larger or composite return values, such as structures exceeding a certain size threshold (often 128 bits or more), calling conventions require the caller to allocate memory in advance and pass a hidden pointer to this location as an implicit parameter. The callee then writes the return value directly to this caller-provided memory, and may return the pointer itself in a register to inform the caller of the location. This "pass-by-reference" strategy for returns avoids the inefficiency of copying large data on the stack or in registers. Void-returning functions, by contrast, transfer control back to the caller without producing any value, simply executing the return mechanism to pop the address and branch.[7][21]
Exception handling integrates with calling conventions through defined stack unwinding procedures, where runtime systems use frame information (such as DWARF tables or equivalent ABI-specified metadata) to traverse the call stack, invoke destructors for local objects, and propagate exceptions to appropriate handlers. This ensures that resources are properly cleaned up during error conditions without violating the convention's stack discipline. Tail call optimization further enhances efficiency by allowing a function's final call to another function to reuse the current stack frame, effectively replacing the return with a direct jump, provided the callee adheres to the same convention and no conflicting operations (like large returns or exception setup) are required; this is particularly useful for recursive algorithms to prevent stack overflow.[31][32][33]
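A short sketch of the hidden-pointer mechanism for large returns; the struct, its size, and the function name are illustrative, and the exact size threshold at which a return moves to memory is ABI-specific.

    /* 32 bytes: larger than the return registers on common 64-bit ABIs. */
    struct vec4 { double x, y, z, w; };

    struct vec4 make_vec4(double x, double y, double z, double w)
    {
        struct vec4 v = { x, y, z, w };
        return v;                 /* compiled as if the caller had passed the address
                                     of its result object as a hidden first argument
                                     and the callee wrote the fields through it */
    }

    /* Caller side: `struct vec4 v = make_vec4(1, 2, 3, 4);` reserves storage for v
     * and supplies its address implicitly; the convention dictates who copies what. */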
Platform Variations
Multiple Conventions per Platform
Platforms support multiple calling conventions to accommodate legacy software, optimize for specific use cases such as operating system kernels versus user-space applications, and facilitate interoperability between different programming languages and libraries.[34] Legacy support is particularly crucial on Windows, where older applications developed under 16-bit and early 32-bit environments rely on conventions like __cdecl for compatibility with existing binaries and dynamic link libraries (DLLs).[5] Optimization differences arise because conventions vary in stack management and register usage; for instance, __stdcall reduces overhead in frequent calls by having the callee clean the stack, making it suitable for API functions, while __fastcall prioritizes speed by passing initial arguments in registers like ECX and EDX on x86. Language interoperability is enhanced by allowing conventions tailored to language runtimes, such as those bridging C++ and COM components on Windows.[35]
On Windows x86 (32-bit), the platform supports several conventions including __cdecl (default for C/C++), __stdcall (used for Win32 API calls), and __fastcall (for performance-critical functions with few arguments).[5] In contrast, Linux primarily adheres to the System V ABI, which defines a single dominant convention for user-space applications—passing up to six integer or pointer arguments in registers (RDI, RSI, RDX, RCX, R8, R9 on x86-64)—but includes alternatives for system calls and legacy 32-bit code using __cdecl-like behavior.[21] These alternatives on Linux often emerge in cross-compilation scenarios or when interfacing with Windows binaries via tools like Wine, where emulation layers handle convention translations.[36]
Developers select conventions explicitly in code using compiler-specific pragmas or attributes to ensure compatibility. In Microsoft Visual C++ (MSVC), keywords like __stdcall or __fastcall are added to function declarations, such as int __stdcall MyFunction(int a);, directing the compiler to generate code adhering to the specified stack and register rules.[37] GCC and Clang provide similar functionality through the stdcall or fastcall attributes, or via command-line flags like -mregparm=3 for register-based passing, allowing fine-grained control within the same binary.[38] This selection mechanism enables mixing conventions in a single program, such as using __stdcall for Windows API interactions while defaulting to System V for internal functions on Linux.
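The declarations below sketch the two spellings side by side for a 32-bit x86 target; the function names are illustrative, and the GCC/Clang forms shown use the documented stdcall and fastcall attributes.

    /* GCC/Clang attribute syntax (32-bit x86 target). */
    int __attribute__((stdcall))  api_entry(int arg);
    int __attribute__((fastcall)) hot_path(int a, int b);

    /* Equivalent MSVC keyword syntax. */
    int __stdcall  ApiEntry(int arg);
    int __fastcall HotPath(int a, int b);

    /* Function-pointer types must carry the same convention as their targets,
     * or caller and callee will disagree about stack cleanup (MSVC spelling). */
    typedef int (__stdcall *api_fn)(int);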
Interoperability challenges arise when conventions mismatch between caller and callee, often leading to stack corruption, incorrect parameter values, or application crashes due to improper stack pointer adjustments.[35] For example, if code compiled to expect __cdecl calls a function that was actually built as __stdcall, both the callee and the caller adjust the stack pointer for the arguments, leaving it off balance and corrupting subsequent operations.[39] Reverse engineering tools like disassemblers (e.g., IDA Pro or Ghidra) aid in detection by analyzing prologue/epilogue code patterns, such as the presence of RET n instructions indicating callee stack cleanup in __stdcall versus a plain RET with caller-side cleanup in __cdecl.[7]
The evolution of calling conventions on platforms reflects a shift from rigid, architecture-fixed standards in early systems to selectable, flexible models for modern embedded and desktop environments. Early 16-bit x86 systems under MS-DOS enforced conventions tied to segment registers for simplicity, but 32-bit transitions introduced multiplicity to support diverse APIs and reduce migration friction.[34] By the x86-64 era, platforms like Windows consolidated to a single fastcall-like convention for efficiency, while Linux's System V ABI emphasized register usage to minimize stack pressure, yet retained options for legacy via compiler flags.[7] This progression prioritizes backward compatibility in desktop OSes alongside optimization in resource-constrained embedded systems, where selectable conventions allow tailoring to hardware constraints like limited registers.[21]
ABI Integration
Calling conventions are integral components of Application Binary Interfaces (ABIs), which define the low-level interface between applications and the operating system, including how data is laid out in memory, how system calls are invoked, and how dynamic linking occurs to resolve symbols at runtime. In ABIs, calling conventions specify the mechanics of function invocation, such as register allocation and stack frame construction, ensuring that compiled code from different compilers or languages can interoperate seamlessly within the same ecosystem. For instance, data layout rules in an ABI dictate padding and alignment for structures passed via calling conventions, preventing misalignment faults on hardware, while system calls often adhere to the platform's calling convention to maintain consistency between user-space and kernel interactions. Dynamic linking relies on these conventions to correctly marshal arguments when loading shared objects, as mismatches can lead to runtime errors like stack corruption.[21]
Prominent platform-specific ABIs incorporate calling conventions with tailored rules for alignment and padding to optimize performance and hardware compatibility. The System V ABI, widely used in Unix-like systems such as Linux and BSD, mandates 16-byte stack alignment before function calls and classifies parameters into integer, floating-point, or memory categories, with padding added to structures for natural alignment (e.g., up to 16 bytes for vector types). In contrast, the Microsoft x64 ABI for Windows employs a similar 16-byte alignment but introduces "shadow space" (32 bytes reserved on the stack for the callee) and passes the first four integer parameters in RCX, RDX, R8, and R9, with floating-point arguments in XMM0–XMM3; padding for structures follows an 8-byte maximum alignment in most cases, differing from System V's vector-aware rules. These differences arise from historical OS choices: System V emphasizes Unix portability, while Microsoft's design prioritizes integration with its runtime libraries, affecting how padding is inserted for aggregate types in parameter passing.[21][40][7]
ABI versioning ensures long-term compatibility by providing stability guarantees, particularly for critical interfaces like system calls, to prevent disruptions from evolving calling conventions. In Linux, the kernel commits to ABI stability for the syscall interface, meaning existing syscall numbers, argument layouts, and return conventions remain unchanged across kernel versions unless deprecated with advance notice, allowing binaries compiled years ago to execute without modification. This stability extends to user-space ABIs like glibc, where calling conventions for public symbols are frozen to avoid breaking shared library dependencies. Such guarantees contrast with less stable systems, where kernel updates might alter low-level conventions, but Linux's policy has supported decades of binary compatibility since the early 2000s.[41][42]
Cross-ABI issues emerge when mixing code from libraries adhering to different conventions, such as System V and Microsoft x64 in multi-OS environments or hybrid binaries, often requiring wrappers or thunks to bridge mismatches in register usage and stack management. Thunks are small assembly stubs that adapt argument passing—for example, relocating parameters from Windows' RCX/RDX registers to System V's RDI/RSI—enabling shared libraries to interoperate without recompilation. In Windows, thunks facilitate calls between ARM64EC ("emulation compatible") and x64 code, handling volatility differences and alignment, while tools like GCC's ABI attributes generate such adapters automatically for cross-convention linkage. These mechanisms are essential for dynamic loading in shared libraries, where unresolved symbols might invoke functions under varying conventions, preventing crashes from misaligned stacks or lost arguments.[43][38]
Post-2010 ABI extensions have incorporated support for advanced types like SIMD vectors and 128-bit integers to leverage modern hardware features without breaking compatibility. The System V ABI supplements, updated around 2013–2015, define passing of 128-bit integers (__int128 in GCC) by splitting them across two general-purpose registers (e.g., RAX:RDX) or spilling to the stack if registers are exhausted, with similar rules for returning values. For SIMD types, such as 128-bit or 256-bit vectors (e.g., AVX), ABIs classify them as HFA (Homogeneous Floating-Point Aggregates) or memory-passed if exceeding register limits, with alignment padded to 16 or 32 bytes to match vector unit requirements; Microsoft's x64 ABI aligns these similarly but uses XMM/YMM registers for up to four 64-bit floats in HFAs. These extensions, driven by SIMD's prevalence in compute-intensive applications, ensure efficient data transfer while maintaining backward compatibility through optional feature detection.[21][44][17]
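A minimal example of one such extension in use: GCC/Clang's __int128, which the System V rules keep in general-purpose register pairs when it fits; the function name is illustrative.

    /* 64x64 -> 128-bit multiply. Under the x86-64 System V ABI the __int128
     * result is typically returned in the RAX/RDX register pair rather than
     * through a hidden memory pointer. */
    __int128 widen_mul(long long a, long long b)
    {
        return (__int128)a * (__int128)b;
    }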
Architectures
x86 (32-bit)
In 32-bit x86 architectures, calling conventions primarily rely on the stack for parameter passing, with the stack growing downward from high to low addresses and maintaining 4-byte alignment for efficiency.[7] These conventions originated in the 1980s with early compilers for the Intel 8086 processor, evolving from 16-bit segmented memory models in DOS environments to flat 32-bit models in operating systems like Windows and Unix variants to ensure compatibility and performance.[7][45] Function return values are typically placed in the EAX register for scalar types up to 32 bits, while larger or complex returns use a caller-provided pointer.[5][46]
The cdecl convention, the default for C language functions on both Unix and Windows platforms, passes parameters on the stack from right to left, allowing support for variable arguments.[5] The caller is responsible for cleaning the stack after the function returns, which can lead to slightly larger executables due to repeated cleanup code but provides flexibility for variadic functions.[18] No registers are specifically reserved for parameters in cdecl, preserving the general-purpose registers like EDI, ESI, EBP, and EBX across calls.[45]
The stdcall convention, similar to cdecl in parameter ordering and stack-based passing, differs by having the callee clean the stack, which is beneficial for functions with a fixed number of arguments as it avoids redundant cleanup instructions.[37] It is the standard for the Win32 API, enabling efficient calls to DLL exports by standardizing stack management and reducing code size in callers.[5] Like cdecl, it does not use dedicated registers for parameters and maintains 4-byte stack alignment.[7]
The fastcall convention optimizes for speed by passing the first two 32-bit parameters in the ECX and EDX registers, with any additional parameters pushed onto the stack from right to left; the callee handles stack cleanup.[19] Microsoft and Intel variants align closely, though minor differences exist in naming conventions, such as Microsoft's use of leading "@" symbols in symbol names (e.g., @function@8).[46][45] This approach suits small, frequently called functions but does not support variable arguments.[5]
The thiscall convention, specific to C++ member functions, passes the "this" pointer in the ECX register, with subsequent parameters on the stack from right to left, and the callee performing cleanup.[5] It builds on fastcall principles for the implicit parameter but is the default for non-variadic C++ methods in Microsoft compilers, ensuring compatibility with object-oriented calling patterns.[46] Return values follow the standard EAX usage, and the convention preserves key registers like EBX and EBP.[45]
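For illustration, the same two-argument prototype under the three C-visible conventions, with the decorated symbol names that Microsoft's 32-bit tools typically emit noted in comments; the function names are arbitrary and the decorations sketch the usual scheme rather than output from a specific compiler version.

    int __cdecl    add_c(int a, int b);   /* symbol: _add_c     caller removes the 8 bytes of arguments */
    int __stdcall  add_s(int a, int b);   /* symbol: _add_s@8   callee returns with RET 8               */
    int __fastcall add_f(int a, int b);   /* symbol: @add_f@8   a in ECX, b in EDX, callee cleanup      */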
x86-64
The x86-64 architecture, also known as AMD64, introduced calling conventions optimized for 64-bit processing, leveraging expanded register sets to reduce stack pressure compared to 32-bit x86 designs. These conventions vary by platform, with the System V ABI predominant on Unix-like systems (Linux, macOS, BSD) and the Microsoft x64 ABI on Windows. Both prioritize register-based parameter passing for efficiency in 64-bit addressing, supporting larger memory spaces and improved performance through direct register utilization.[47][17][7]
In the System V AMD64 ABI, the first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, in left-to-right order, while the first eight floating-point arguments use XMM0 through XMM7. Additional arguments beyond these limits are passed on the stack in right-to-left order, with each occupying 8 bytes and the stack maintaining 16-byte alignment. Integer return values are placed in RAX (with RDX for multi-word results), and floating-point returns use XMM0 (or XMM0 and XMM1 for larger types).[47][7]
The Windows x64 calling convention passes the first four integer or pointer arguments in RCX, RDX, R8, and R9, with the first four floating-point arguments in the lower 64 bits of XMM0 through XMM3. The caller must allocate 32 bytes of "shadow space" on the stack immediately before the call for these registers, allowing the callee to spill values there without further adjustment. Subsequent arguments are pushed onto the stack in right-to-left order, 8-byte aligned. Returns follow similar patterns to System V, with scalars in RAX or XMM0, though user-defined types larger than 64 bits are returned through a hidden caller-provided pointer, which the callee also returns in RAX.[17]
The vector calling convention extends these ABIs to handle Advanced Vector Extensions (AVX) and beyond, passing up to six vector arguments (e.g., __m128 or __m256 types) in XMM0–XMM5 or YMM0–YMM5 on Windows, and up to eight in System V using XMM0–XMM7 or YMM0–YMM7. For AVX-512, it utilizes ZMM0–ZMM7 (System V) or ZMM0–ZMM3 (Windows), with stack alignment increased to 32 bytes for 256-bit vectors and 64 bytes for 512-bit. Homogeneous vector aggregates with four or fewer elements are passed in consecutive vector registers, while larger ones use references to avoid excessive register pressure. This convention, introduced in 2013 for Microsoft compilers, enhances performance in vector-intensive applications like graphics by minimizing stack spills.[48][7]
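As an illustration of the register assignments described above, a function with eight integer arguments; the function name is arbitrary and the placements in the comment follow the two conventions as summarized here.

    /* Illustrative System V AMD64 placement for sum8(1,2,3,4,5,6,7,8):
     *   a->RDI  b->RSI  c->RDX  d->RCX  e->R8  f->R9
     *   g, h -> pushed onto the stack (right to left), 8 bytes each
     * The 64-bit result is returned in RAX.
     *
     * Under the Microsoft x64 convention the same call would place only
     * a..d in RCX, RDX, R8, R9, reserve 32 bytes of shadow space, and
     * pass e..h on the stack. */
    long sum8(long a, long b, long c, long d, long e, long f, long g, long h)
    {
        return a + b + c + d + e + f + g + h;
    }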
Both conventions enforce 16-byte stack alignment at the point of function calls, with the stack pointer (RSP) adjusted to maintain this invariant outside of function prologs and epilogs. The System V ABI includes a 128-byte "red zone" below RSP, usable by leaf functions for temporary storage without explicit stack allocation, further optimizing for 64-bit code execution. These features support seamless 64-bit addressing, enabling position-independent code and efficient handling of large address spaces.[47][7]
The x86-64 calling conventions emerged with the AMD64 architecture's release in 2003, initially specified in AMD's programmer manuals and refined through platform-specific ABIs to capitalize on the extended register file (16 general-purpose registers) and 64-bit operations for reduced overhead in function calls.[49][7]