Calling convention
In computer science and computer architecture, a calling convention is a standardized set of rules that governs how one piece of code transfers control to a subroutine or function, passes arguments to it, handles the return of values, and manages the program's stack and registers to ensure proper execution and data integrity.[1][2][3] These conventions specify critical details such as the order and mechanism for passing parameters—typically prioritizing registers for the first few arguments (e.g., up to six or eight, depending on the architecture) before spilling excess to the stack—and define how return values are placed, often in specific registers like %rax in x86-64 or x10 in RISC-V.[1][2]
They also delineate register preservation rules, distinguishing between caller-saved registers (which the calling code must preserve if needed) and callee-saved registers (which the called function must restore), to prevent unintended data corruption across function boundaries.[1][4] Stack management is another core aspect, including alignment requirements (e.g., 16-byte boundaries), the direction of stack growth (usually downward), and responsibility for cleanup—such as whether the caller or callee adjusts the stack pointer after the call.[1][2]
Calling conventions are architecture-specific and often influenced by operating systems and application binary interfaces (ABIs), with prominent examples including the System V AMD64 ABI for Linux on x86-64, the Microsoft x64 convention for Windows, and the ARM Procedure Call Standard for ARM processors.[1][5][4] Their primary purpose is to enable interoperability, allowing code compiled by different compilers or written in various languages to link and execute correctly, while optimizing for performance by minimizing unnecessary memory accesses.[1][2] Violations of these conventions can lead to runtime errors, such as stack overflows or incorrect parameter values, making adherence essential in low-level programming like assembly and systems development.[6][3]
Introduction
Definition and Purpose
A calling convention is a standardized set of rules that specifies how subroutines or functions receive parameters from their callers, manage the stack and registers, and return values to ensure interoperability between separately compiled modules of code.[7][8] These conventions define the interface for function invocation at the machine code level, governing aspects such as argument passing locations and the preservation of caller state.[1] The primary purpose of calling conventions is to facilitate modular programming by providing a consistent mechanism for procedure linkage across different compilers, assemblers, and tools, thereby preventing runtime errors such as stack corruption or incorrect parameter access.[7] They are essential for generating correct machine code that operates reliably in mixed-language environments or with external libraries.[8] By standardizing these low-level interactions, calling conventions enable binary compatibility, allowing object files from disparate sources to link successfully without recompilation.[7]
Calling conventions emerged in the 1970s alongside the rise of structured programming and early compilers, particularly with architectures like the PDP-11, where initial subprogram calling rules were formalized to support efficient subroutine invocation and reentrancy.[9] Over time, they evolved to accommodate optimizations, portability across platforms, and advanced features like dynamic linking. Key benefits include enhanced debugging through predictable stack frame layouts, support for profiling and disassembly tools, and the integration of inline assembly within high-level languages.[7]
Related Concepts
The Application Binary Interface (ABI) defines the low-level conventions for how compiled programs interact with the operating system, libraries, and other binaries on a specific architecture, encompassing aspects such as calling conventions, data type representations, and memory alignments. Calling conventions form a key subset of the ABI, dictating how function arguments are passed (e.g., via registers or stack) and results are returned, while the broader ABI also specifies the sizes, alignments, and layouts of fundamental data types like integers and structures to ensure binary compatibility across modules. For instance, in the System V ABI for x86-64, integers are typically 32 or 64 bits with natural alignment, whereas floating-point types may require specific padding to match hardware expectations.[10][11]
In contrast, the Application Programming Interface (API) operates at a higher level, providing a source-code interface for developers to interact with software modules, libraries, or systems without concern for underlying binary details. While an API might define function signatures and behaviors in languages like C or Java, it does not specify binary-level details such as register usage or object layouts, which are instead governed by the ABI to enable linking of independently compiled code. This distinction ensures that source-code portability via APIs does not imply binary compatibility, as changes in the ABI (e.g., due to compiler updates) can break executables even if the API remains unchanged.[12][13]
The Procedure Linkage Table (PLT) and Global Offset Table (GOT) are critical components in dynamic linking for formats like ELF, enabling runtime resolution of external function and data addresses in shared libraries. The PLT serves as a trampoline for indirect function calls, initially redirecting to a dynamic linker stub that populates the GOT with the actual addresses upon first invocation, thus supporting lazy binding and position-independent code. These mechanisms depend on the platform's calling conventions to correctly pass control and parameters during resolution; for example, in the x86-64 System V ABI, calls to PLT entries use the standard register-based parameter passing to invoke the linker resolver without disrupting the caller's stack frame.[14][7]
Inline assembly and Foreign Function Interfaces (FFI) rely on calling conventions to seamlessly integrate low-level assembly code with high-level languages, ensuring that function calls across language boundaries adhere to the expected ABI. In FFI, high-level languages like Rust or Haskell declare foreign functions using conventions such as extern "C" to match the C ABI, which specifies parameter passing and return mechanisms, allowing safe invocation of C libraries while wrapping calls in unsafe blocks to handle potential memory or threading issues. Similarly, inline assembly in compilers like GCC or LLVM requires explicit adherence to the calling convention (e.g., preserving callee-saved registers and aligning the stack) to avoid corrupting the caller's state when embedding assembly snippets directly in C code.[15]
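As a minimal sketch of that last point, the GCC extended inline assembly below declares its inputs and outputs through constraints so the compiler can keep the surrounding register state consistent with the calling convention; the function name and the x86-64 AT&T-syntax instruction are illustrative assumptions rather than code from any particular project.

    #include <stdint.h>

    /* Hypothetical helper using GCC extended inline assembly on x86-64.
     * The operand constraints tell the compiler which registers the snippet
     * reads and writes, so it will not keep live values in a register the
     * assembly would silently overwrite. */
    static inline uint64_t add_pair(uint64_t a, uint64_t b)
    {
        uint64_t result;
        __asm__ ("lea (%1, %2), %0"    /* result = a + b via address arithmetic */
                 : "=r" (result)        /* output: any general-purpose register */
                 : "r" (a), "r" (b));   /* inputs: registers chosen by compiler */
        /* No clobber list is needed here: LEA writes only the output operand
         * and modifies neither the flags nor any callee-saved register. */
        return result;
    }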
Debugging formats such as DWARF and PDB incorporate calling convention details to facilitate stack unwinding and symbol resolution during runtime analysis or exception handling. DWARF's Call Frame Information (CFI) encodes rules derived from the ABI—such as register mappings and stack adjustments at each instruction—to enable debuggers like GDB to reconstruct call stacks by virtually restoring registers and pointers across frames. Likewise, Microsoft's PDB format embeds unwind data in the PE optional header, using ABI-specific conventions (e.g., x64 table-based unwinding) to trace execution paths and locate variables, ensuring accurate backtraces even in optimized binaries.[16]
Types of Calling Conventions
Calling conventions can be classified by their approach to parameter handling, which determines how arguments are passed between functions. Stack-based conventions push all parameters onto the call stack, offering simplicity and compatibility across varying numbers of arguments but incurring overhead from memory accesses.[7] Register-based conventions pass the first few arguments in CPU registers to minimize memory operations and improve speed, particularly for small argument counts, while spilling excess parameters to the stack.[7] Hybrid conventions combine these methods, using registers for initial parameters and the stack for overflow, as seen in modern 64-bit systems where up to four integer arguments are register-passed before stack usage.[17]
Another key classification distinguishes conventions by the responsibility for stack cleanup after parameter passing. In caller-cleanup conventions, such as __cdecl, the calling function adjusts the stack pointer to remove arguments post-call, enabling variable-argument functions like printf but adding code size due to repeated cleanup instructions.[18][7] Conversely, callee-cleanup conventions, exemplified by __stdcall, require the called function to clean the stack using a fixed argument count known at compile time, which reduces caller overhead and executable size for APIs with consistent signatures, such as the Windows API.[7]
Performance-optimized variants like fastcall and vectorcall extend standard conventions by prioritizing register usage for specific argument types. Fastcall passes the first two integer parameters in registers—ECX and EDX on x86—to accelerate short calls, with remaining arguments on the stack, though its benefits diminish in 64-bit environments where registers are more abundant.[19][7] Vectorcall builds on fastcall by incorporating SSE/AVX registers (e.g., XMM0–XMM5 or YMM0–YMM5) for up to six vector arguments on x86 or four on x64, reducing stack pressure for SIMD-heavy code and enabling by-value passing of aggregates larger than eight bytes, which enhances throughput in vectorized computations.[20]
Conventions also vary in support for position-independent code (PIC), which allows relocatable binaries without address fixes at load time. Position-dependent conventions assume fixed load addresses, simplifying direct jumps but limiting shared libraries. Position-independent variants, as in the System V ABI, employ IP-relative addressing or a Global Offset Table (GOT) accessed via a dedicated register like %r15 for indirect calls, enabling dynamic relocation while maintaining calling efficiency across small (IP-relative only), medium (with large data offsets), or large (full 64-bit GOT) models.[21]
The evolution of calling conventions reflects architectural shifts from memory-constrained early systems to performance-oriented designs. Early conventions, prevalent in 16-bit x86 environments like DOS, relied heavily on stack-based models with limited registers, prioritizing simplicity for segmented memory.[22] Over time, as RISC architectures emphasized register abundance, conventions transitioned to register-heavy approaches, such as RISC-V's use of eight argument registers (a0–a7) for integers and fa0–fa7 for floats, reducing latency and aligning with compiler optimizations in 64-bit and embedded systems.[23][7]
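To make the cleanup and register-passing distinctions above concrete, the declarations below sketch how the three classic conventions appear in C source for a 32-bit Windows target; the function names and parameter lists are illustrative assumptions, and the convention keywords are the MSVC spellings.

    /* Caller cleans the stack: permits variadic argument lists such as printf's. */
    int __cdecl sum_all(int count, ...);

    /* Callee cleans the stack with RET n: typical for fixed-signature Win32 APIs. */
    int __stdcall api_handler(void *handle, unsigned message);

    /* First two 32-bit arguments travel in ECX and EDX; any others go on the stack. */
    int __fastcall clamp(int value, int limit);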
Core Elements
Parameter Passing Methods
Parameter passing methods in calling conventions define how arguments are transferred from the caller to the callee, ensuring interoperability between functions compiled by different tools. These methods balance efficiency, safety, and flexibility, with choices influencing runtime performance and memory usage. Common techniques include passing data by value, by reference, or specialized variants for output parameters, each implemented via registers or the stack as per the convention's rules.[24]
In pass-by-value, the caller copies the argument's value into a location accessible to the callee, such as a register or stack slot, preventing modifications to the original data. This approach is straightforward and isolates the function's effects but can be inefficient for large data types due to copying overhead. For instance, in languages like C, primitive types like integers are typically passed by value.[24][17]
Pass-by-reference, also known as pass-by-pointer, involves the caller providing the memory address of the argument, allowing the callee to access and potentially modify the original data without copying. This method is more efficient for large or complex structures, as only the address (often fitting in a register) is transferred, though it introduces risks like unintended aliasing where multiple parameters refer to the same memory. It is prevalent in C-like languages for modifiable parameters.[24]
For output parameters, pass-by-result and pass-by-copy-restore provide mechanisms to return modified values to the caller. In pass-by-result, the callee writes to a caller-allocated location without initial copying, suitable for pure outputs but requiring careful memory management to avoid uninitialized access. Pass-by-copy-restore, or value-result, copies the argument into the callee's scope at entry, allows modifications, and copies the result back upon exit; this handles inputs that may be overwritten but incurs double-copy overhead and can lead to order-dependent behavior in multi-parameter calls. These variants appear in languages like Ada for in-out parameters.[24][25]
When passing structures or aggregates, conventions distinguish between small and large types to optimize transfer. Small structs, often those fitting within one or two registers (e.g., up to 16 bytes in some standards), are passed by value directly in registers for speed, while larger ones are passed by reference or on the stack to avoid excessive copying. Alignment rules typically require padding to natural boundaries (e.g., 8-byte), adding minor memory overhead but ensuring efficient access. For example, in standard ABIs, a 4-byte struct might occupy a full 8-byte register slot.[17]
Variadic arguments, as in C's ellipsis notation (...), follow hybrid rules where fixed parameters use the standard method (registers first), but additional arguments spill to the stack regardless of size, enabling flexible but less optimized handling. The callee accesses them via macros like va_start, which rely on the stack layout for enumeration, often with alignment to word size. This accommodates variable counts but complicates optimization due to unknown types.[26][17]
Trade-offs in these methods center on speed versus flexibility: register-based passing (for values or pointers) minimizes latency compared to stack pushes, which involve memory operations, but registers are limited, forcing spills for many arguments. By-value suits small, immutable data to avoid aliasing risks, while by-reference or copy variants enable modifications at the cost of potential overhead—e.g., copying a large struct by value might double memory traffic. Alignment padding, though small (typically 0-7 bytes per parameter), accumulates in calls with aggregates, influencing overall efficiency. Overall, conventions prioritize register use for the first few parameters to leverage CPU speed, resorting to the stack only when necessary.[24]
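As a short sketch of the variadic handling described above, the C function below walks its trailing arguments with the stdarg macros, which encapsulate the platform's layout rules so the same source works whether those arguments arrive in registers or on the stack; the function name is illustrative.

    #include <stdarg.h>

    /* Sums `count` int arguments supplied through the ellipsis. */
    int sum_ints(int count, ...)
    {
        va_list args;
        va_start(args, count);           /* begin enumerating the variadic area */
        int total = 0;
        for (int i = 0; i < count; i++)
            total += va_arg(args, int);  /* each argument fetched per the ABI's layout */
        va_end(args);
        return total;
    }

    /* Usage: sum_ints(3, 10, 20, 12) returns 42. */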
Stack and Register Usage
In calling conventions, the stack frame represents the memory allocation on the call stack for a function invocation, typically including the return address, saved registers, local variables, and sometimes parameters if not fully passed in registers. The return address stores the location to resume execution in the caller after the function completes, while saved registers preserve the caller's state for callee-saved registers. Local variables occupy space allocated dynamically based on the function's needs, and parameters may spill onto the stack beyond the initial register allocation. This layout ensures proper isolation between function invocations and facilitates stack unwinding for debugging or exception handling.[27]
A typical stack frame layout, growing downward from higher to lower memory addresses, can be represented as follows:

    Higher addresses
    +-------------------+
    | Caller parameters |   (including spills if exceeding registers)
    +-------------------+
    | Return address    |   <- pushed by the CALL instruction
    +-------------------+
    | Saved registers   |   <- e.g., frame pointer (optional)
    +-------------------+
    | Local variables   |   <- allocated in the prologue
    +-------------------+
    | Spill area / Temp |   <- for additional data if needed
    +-------------------+   <- %rsp (stack pointer)
    Lower addresses

This structure varies slightly by platform but maintains the core components for runtime management. For instance, in the PowerPC ELF ABI, the stack frame header includes a back chain to the previous frame, condition register save area, link register (return address) save area, and additional metadata like TOC save area.[28]
The function prologue establishes the stack frame at entry, typically by saving the previous frame pointer, setting the current frame pointer, and allocating space for locals, while the epilogue reverses these steps at exit to restore the stack and return control. A common sequence in x86 conventions involves instructions like push ebp to save the caller's base pointer, mov ebp, esp to establish the new frame pointer, and sub esp, N to allocate local space in the prologue; the epilogue then uses mov esp, ebp, pop ebp, and ret to deallocate and return. These operations ensure the stack pointer is adjusted correctly and registers are preserved, with multiple epilogues possible for functions with early exits.[29][7]
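For illustration, here is a small C function alongside the kind of unoptimized 32-bit x86 prologue and epilogue a compiler might emit for it; the exact instruction sequence varies with compiler and optimization level, so the assembly comment is a sketch rather than actual compiler output.

    /* C source */
    int scale(int x)
    {
        int local = x * 3;
        return local + 1;
    }

    /* Possible unoptimized 32-bit x86 translation (illustrative):
     *   push ebp          ; prologue: save the caller's frame pointer
     *   mov  ebp, esp     ;           establish the new frame pointer
     *   sub  esp, 4       ;           reserve space for `local`
     *   ...               ; body: compute x * 3 + 1 from [ebp+8] into EAX
     *   mov  esp, ebp     ; epilogue: release the local storage
     *   pop  ebp          ;           restore the caller's frame pointer
     *   ret               ;           pop the return address and resume the caller
     */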
Register preservation rules divide registers into caller-saved (volatile) and callee-saved (non-volatile) categories to balance efficiency and reliability across function calls. Caller-saved registers, used for temporaries like intermediate computations, may be freely modified by the callee without restoration, placing the burden on the caller to save them if needed before the call. In contrast, callee-saved registers hold longer-lived values and must be preserved by the callee, which saves them in the prologue (often on the stack) and restores them in the epilogue. This convention minimizes overhead: callers avoid saving registers they won't use post-call, while callees only save those they modify. For example, in the x64 Microsoft convention, registers like RAX and RCX are caller-saved, while RBX and R12-R15 are callee-saved.[17][7]
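A brief sketch of how this split constrains code generation under the x86-64 System V convention, where RBX is callee-saved and registers such as RCX, R10, and R11 are caller-saved; helper() and the variable names are illustrative assumptions.

    long helper(long x);        /* external function, assumed for illustration */

    long sum_twice(long a, long b)
    {
        long total = a + b;     /* if kept in a caller-saved register, `total` must be
                                   spilled or copied before the call below */
        long h = helper(total); /* helper() may freely clobber caller-saved registers */
        return total + h;       /* if the compiler instead parks `total` in callee-saved
                                   RBX, this function must save and restore RBX in its
                                   own prologue and epilogue */
    }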
Some conventions incorporate a red zone, a reserved area immediately below the stack pointer that a function may use for fast temporary storage without explicit allocation, enabling optimizations by avoiding prologue adjustments for small locals. In the x86-64 System V ABI, this zone spans 128 bytes and is guaranteed not to be clobbered by asynchronous events such as signal handlers, allowing leaf functions to use it directly for efficiency. However, its use is platform-specific and requires compiler awareness to prevent conflicts with interrupt handlers.[27]
Stack pointer alignment requirements ensure optimal performance for vector operations and SIMD instructions, typically mandating that the stack pointer be aligned to a 16-byte boundary upon function entry and maintained throughout, except in prologues/epilogues or leaf routines. Misalignment can incur hardware penalties, such as additional cycles for unaligned memory access, so conventions enforce this via prologue adjustments (e.g., subtracting multiples of 16 from the stack pointer). In 64-bit systems like x86-64, this 16-byte rule supports efficient use of 128-bit XMM registers, with higher alignments (e.g., 32 bytes) for 256-bit operations in some extensions.[27][17][7]
To mitigate stack overflows that could corrupt frames, many conventions integrate overflow protection mechanisms like canaries (or security cookies), which insert a random value between locals and sensitive components such as the return address. The prologue places this canary on the stack, and the epilogue verifies it before return; a mismatch triggers termination to prevent exploitation. Enabled via compiler flags like Microsoft's /GS or GCC's -fstack-protector, this adds minimal runtime overhead while protecting against buffer overruns in vulnerable functions.[30]
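As an illustration, the function below contains the kind of on-stack buffer that typically receives a canary when built with GCC's -fstack-protector or MSVC's /GS; the check is inserted by the compiler, not written by hand, and the function itself is a made-up example.

    #include <string.h>

    void copy_name(char *dst, const char *src)
    {
        char buf[32];                    /* local buffer adjacent to saved state */
        /* With stack protection enabled, the prologue places a random canary
         * between buf and the return address; the epilogue re-checks it before
         * ret and aborts the program if an overflow of buf has overwritten it. */
        strncpy(buf, src, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        strcpy(dst, buf);                /* hand the bounded copy back to the caller */
    }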
Function Return Mechanisms
In calling conventions, the return address is typically pushed onto the stack by the caller's call instruction immediately before control transfers to the callee. Upon completion, the callee restores control to the caller by popping this address from the stack and branching to it, ensuring resumption of execution at the instruction following the original call. This mechanism maintains the integrity of the program's control flow across function boundaries.[21]
Scalar return values, such as integers or small floating-point numbers that fit within a single register or pair of registers, are conventionally returned in designated registers to minimize overhead and enable efficient access by the caller. For instance, integer scalars are placed in general-purpose registers, while floating-point scalars use dedicated floating-point or vector registers. This approach leverages hardware efficiency for common data types.[17][21]
For larger or composite return values, such as structures exceeding a certain size threshold (often 128 bits or more), calling conventions require the caller to allocate memory in advance and pass a hidden pointer to this location as an implicit parameter. The callee then writes the return value directly to this caller-provided memory, and may return the pointer itself in a register to inform the caller of the location. This "pass-by-reference" strategy for returns avoids the inefficiency of copying large data on the stack or in registers. Void-returning functions, by contrast, transfer control back to the caller without producing any value, simply executing the return mechanism to pop the address and branch.[7][21]
Exception handling integrates with calling conventions through defined stack unwinding procedures, where runtime systems use frame information (such as DWARF tables or equivalent ABI-specified metadata) to traverse the call stack, invoke destructors for local objects, and propagate exceptions to appropriate handlers. This ensures that resources are properly cleaned up during error conditions without violating the convention's stack discipline. Tail call optimization further enhances efficiency by allowing a function's final call to another function to reuse the current stack frame, effectively replacing the return with a direct jump, provided the callee adheres to the same convention and no conflicting operations (like large returns or exception setup) are required; this is particularly useful for recursive algorithms to prevent stack overflow.[31][32][33]
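A short sketch of the hidden-pointer mechanism for large returns; the struct, its size, and the function name are illustrative, and the exact size threshold at which a return moves to memory is ABI-specific.

    /* 32 bytes: larger than the return registers on common 64-bit ABIs. */
    struct vec4 { double x, y, z, w; };

    struct vec4 make_vec4(double x, double y, double z, double w)
    {
        struct vec4 v = { x, y, z, w };
        return v;                 /* compiled as if the caller had passed the address
                                     of its result object as a hidden first argument
                                     and the callee wrote the fields through it */
    }

    /* Caller side: `struct vec4 v = make_vec4(1, 2, 3, 4);` reserves storage for v
     * and supplies its address implicitly; the convention dictates who copies what. */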
Platform Variations
Multiple Conventions per Platform
Platforms support multiple calling conventions to accommodate legacy software, optimize for specific use cases such as operating system kernels versus user-space applications, and facilitate interoperability between different programming languages and libraries.[34] Legacy support is particularly crucial on Windows, where older applications developed under 16-bit and early 32-bit environments rely on conventions like __cdecl for compatibility with existing binaries and dynamic link libraries (DLLs).[5] Optimization differences arise because conventions vary in stack management and register usage; for instance, __stdcall reduces overhead in frequent calls by having the callee clean the stack, making it suitable for API functions, while __fastcall prioritizes speed by passing initial arguments in registers like ECX and EDX on x86. Language interoperability is enhanced by allowing conventions tailored to language runtimes, such as those bridging C++ and COM components on Windows.[35]
On Windows x86 (32-bit), the platform supports several conventions including __cdecl (default for C/C++), __stdcall (used for Win32 API calls), and __fastcall (for performance-critical functions with few arguments).[5] In contrast, Linux primarily adheres to the System V ABI, which defines a single dominant convention for user-space applications—passing up to six integer or pointer arguments in registers (RDI, RSI, RDX, RCX, R8, R9 on x86-64)—but includes alternatives for system calls and legacy 32-bit code using __cdecl-like behavior.[21] These alternatives on Linux often emerge in cross-compilation scenarios or when interfacing with Windows binaries via tools like Wine, where emulation layers handle convention translations.[36]
Developers select conventions explicitly in code using compiler-specific pragmas or attributes to ensure compatibility. In Microsoft Visual C++ (MSVC), keywords like __stdcall or __fastcall are added to function declarations, such as int __stdcall MyFunction(int a);, directing the compiler to generate code adhering to the specified stack and register rules.[37] GCC and Clang provide similar functionality through the stdcall or fastcall attributes, or via command-line flags like -mregparm=3 for register-based passing, allowing fine-grained control within the same binary.[38] This selection mechanism enables mixing conventions in a single program, such as using __stdcall for Windows API interactions while defaulting to System V for internal functions on Linux.
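The declarations below sketch the two spellings side by side for a 32-bit x86 target; the function names are illustrative, and the GCC/Clang forms shown use the documented stdcall and fastcall attributes.

    /* GCC/Clang attribute syntax (32-bit x86 target). */
    int __attribute__((stdcall))  api_entry(int arg);
    int __attribute__((fastcall)) hot_path(int a, int b);

    /* Equivalent MSVC keyword syntax. */
    int __stdcall  ApiEntry(int arg);
    int __fastcall HotPath(int a, int b);

    /* Function-pointer types must carry the same convention as their targets,
     * or caller and callee will disagree about stack cleanup (MSVC spelling). */
    typedef int (__stdcall *api_fn)(int);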
Interoperability challenges arise when conventions mismatch between caller and callee, often leading to stack corruption, incorrect parameter values, or application crashes due to improper stack pointer adjustments.[35] For example, if code compiled to expect __cdecl calls a function that was actually built as __stdcall, both the callee and the caller adjust the stack pointer for the arguments, leaving it off balance and corrupting subsequent operations.[39] Reverse engineering tools like disassemblers (e.g., IDA Pro or Ghidra) aid in detection by analyzing prologue/epilogue code patterns, such as the presence of RET n instructions indicating callee stack cleanup in __stdcall versus a plain RET with caller-side cleanup in __cdecl.[7]
The evolution of calling conventions on platforms reflects a shift from rigid, architecture-fixed standards in early systems to selectable, flexible models for modern embedded and desktop environments. Early 16-bit x86 systems under MS-DOS enforced conventions tied to segment registers for simplicity, but 32-bit transitions introduced multiplicity to support diverse APIs and reduce migration friction.[34] By the x86-64 era, platforms like Windows consolidated to a single fastcall-like convention for efficiency, while Linux's System V ABI emphasized register usage to minimize stack pressure, yet retained options for legacy via compiler flags.[7] This progression prioritizes backward compatibility in desktop OSes alongside optimization in resource-constrained embedded systems, where selectable conventions allow tailoring to hardware constraints like limited registers.[21]
ABI Integration
Calling conventions are integral components of Application Binary Interfaces (ABIs), which define the low-level interface between applications and the operating system, including how data is laid out in memory, how system calls are invoked, and how dynamic linking occurs to resolve symbols at runtime. In ABIs, calling conventions specify the mechanics of function invocation, such as register allocation and stack frame construction, ensuring that compiled code from different compilers or languages can interoperate seamlessly within the same ecosystem. For instance, data layout rules in an ABI dictate padding and alignment for structures passed via calling conventions, preventing misalignment faults on hardware, while system calls often adhere to the platform's calling convention to maintain consistency between user-space and kernel interactions. Dynamic linking relies on these conventions to correctly marshal arguments when loading shared objects, as mismatches can lead to runtime errors like stack corruption.[21]
Prominent platform-specific ABIs incorporate calling conventions with tailored rules for alignment and padding to optimize performance and hardware compatibility. The System V ABI, widely used in Unix-like systems such as Linux and BSD, mandates 16-byte stack alignment before function calls and classifies parameters into integer, floating-point, or memory categories, with padding added to structures for natural alignment (e.g., up to 16 bytes for vector types). In contrast, the Microsoft x64 ABI for Windows employs a similar 16-byte alignment but introduces "shadow space" (32 bytes reserved on the stack for the callee) and passes the first four integer parameters in RCX, RDX, R8, and R9, with floating-point arguments in XMM0–XMM3; padding for structures follows an 8-byte maximum alignment in most cases, differing from System V's vector-aware rules. These differences arise from historical OS choices: System V emphasizes Unix portability, while Microsoft's design prioritizes integration with its runtime libraries, affecting how padding is inserted for aggregate types in parameter passing.[21][40][7]
ABI versioning ensures long-term compatibility by providing stability guarantees, particularly for critical interfaces like system calls, to prevent disruptions from evolving calling conventions. In Linux, the kernel commits to ABI stability for the syscall interface, meaning existing syscall numbers, argument layouts, and return conventions remain unchanged across kernel versions unless deprecated with advance notice, allowing binaries compiled years ago to execute without modification. This stability extends to user-space ABIs like glibc, where calling conventions for public symbols are frozen to avoid breaking shared library dependencies. Such guarantees contrast with less stable systems, where kernel updates might alter low-level conventions, but Linux's policy has supported decades of binary compatibility since the early 2000s.[41][42]
Cross-ABI issues emerge when mixing code from libraries adhering to different conventions, such as System V and Microsoft x64 in multi-OS environments or hybrid binaries, often requiring wrappers or thunks to bridge mismatches in register usage and stack management. Thunks are small assembly stubs that adapt argument passing—for example, relocating parameters from Windows' RCX/RDX registers to System V's RDI/RSI—enabling shared libraries to interoperate without recompilation. In Windows, thunks facilitate calls between ARM64EC ("emulation compatible") and x64 code, handling volatility differences and alignment, while tools like GCC's ABI attributes generate such adapters automatically for cross-convention linkage. These mechanisms are essential for dynamic loading in shared libraries, where unresolved symbols might invoke functions under varying conventions, preventing crashes from misaligned stacks or lost arguments.[43][38]
Post-2010 ABI extensions have incorporated support for advanced types like SIMD vectors and 128-bit integers to leverage modern hardware features without breaking compatibility. The System V ABI supplements, updated around 2013–2015, define passing of 128-bit integers (__int128 in GCC) by splitting them across two general-purpose registers (e.g., RAX:RDX) or spilling to the stack if registers are exhausted, with similar rules for returning values. For SIMD types, such as 128-bit or 256-bit vectors (e.g., AVX), ABIs classify them as HFA (Homogeneous Floating-Point Aggregates) or memory-passed if exceeding register limits, with alignment padded to 16 or 32 bytes to match vector unit requirements; Microsoft's x64 ABI aligns these similarly but uses XMM/YMM registers for up to four 64-bit floats in HFAs. These extensions, driven by SIMD's prevalence in compute-intensive applications, ensure efficient data transfer while maintaining backward compatibility through optional feature detection.[21][44][17]
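A minimal example of one such extension in use: GCC/Clang's __int128, which the System V rules keep in general-purpose register pairs when it fits; the function name is illustrative.

    /* 64x64 -> 128-bit multiply. Under the x86-64 System V ABI the __int128
     * result is typically returned in the RAX/RDX register pair rather than
     * through a hidden memory pointer. */
    __int128 widen_mul(long long a, long long b)
    {
        return (__int128)a * (__int128)b;
    }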
Architectures
x86 (32-bit)
In 32-bit x86 architectures, calling conventions primarily rely on the stack for parameter passing, with the stack growing downward from high to low addresses and maintaining 4-byte alignment for efficiency.[7] These conventions originated in the 1980s with early compilers for the Intel 8086 processor, evolving from 16-bit segmented memory models in DOS environments to flat 32-bit models in operating systems like Windows and Unix variants to ensure compatibility and performance.[7][45] Function return values are typically placed in the EAX register for scalar types up to 32 bits, while larger or complex returns use a caller-provided pointer.[5][46]
The cdecl convention, the default for C language functions on both Unix and Windows platforms, passes parameters on the stack from right to left, allowing support for variable arguments.[5] The caller is responsible for cleaning the stack after the function returns, which can lead to slightly larger executables due to repeated cleanup code but provides flexibility for variadic functions.[18] No registers are specifically reserved for parameters in cdecl, preserving the general-purpose registers like EDI, ESI, EBP, and EBX across calls.[45]
The stdcall convention, similar to cdecl in parameter ordering and stack-based passing, differs by having the callee clean the stack, which is beneficial for functions with a fixed number of arguments as it avoids redundant cleanup instructions.[37] It is the standard for the Win32 API, enabling efficient calls to DLL exports by standardizing stack management and reducing code size in callers.[5] Like cdecl, it does not use dedicated registers for parameters and maintains 4-byte stack alignment.[7]
The fastcall convention optimizes for speed by passing the first two 32-bit parameters in the ECX and EDX registers, with any additional parameters pushed onto the stack from right to left; the callee handles stack cleanup.[19] Microsoft and Intel variants align closely, though minor differences exist in naming conventions, such as Microsoft's use of leading "@" symbols in symbol names (e.g., @function@8).[46][45] This approach suits small, frequently called functions but does not support variable arguments.[5]
The thiscall convention, specific to C++ member functions, passes the "this" pointer in the ECX register, with subsequent parameters on the stack from right to left, and the callee performing cleanup.[5] It builds on fastcall principles for the implicit parameter but is the default for non-variadic C++ methods in Microsoft compilers, ensuring compatibility with object-oriented calling patterns.[46] Return values follow the standard EAX usage, and the convention preserves key registers like EBX and EBP.[45]
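For illustration, the same two-argument prototype under the three C-visible conventions, with the decorated symbol names that Microsoft's 32-bit tools typically emit noted in comments; the function names are arbitrary and the decorations sketch the usual scheme rather than output from a specific compiler version.

    int __cdecl    add_c(int a, int b);   /* symbol: _add_c     caller removes the 8 bytes of arguments */
    int __stdcall  add_s(int a, int b);   /* symbol: _add_s@8   callee returns with RET 8               */
    int __fastcall add_f(int a, int b);   /* symbol: @add_f@8   a in ECX, b in EDX, callee cleanup      */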
x86-64
The x86-64 architecture, also known as AMD64, introduced calling conventions optimized for 64-bit processing, leveraging expanded register sets to reduce stack pressure compared to 32-bit x86 designs. These conventions vary by platform, with the System V ABI predominant on Unix-like systems (Linux, macOS, BSD) and the Microsoft x64 ABI on Windows. Both prioritize register-based parameter passing for efficiency in 64-bit addressing, supporting larger memory spaces and improved performance through direct register utilization.[47][17][7]
In the System V AMD64 ABI, the first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, in left-to-right order, while the first eight floating-point arguments use XMM0 through XMM7. Additional arguments beyond these limits are passed on the stack in right-to-left order, with each occupying 8 bytes and the stack maintaining 16-byte alignment. Integer return values are placed in RAX (with RDX for multi-word results), and floating-point returns use XMM0 (or XMM0 and XMM1 for larger types).[47][7]
The Windows x64 calling convention passes the first four integer or pointer arguments in RCX, RDX, R8, and R9, with the first four floating-point arguments in the lower 64 bits of XMM0 through XMM3. The caller must allocate 32 bytes of "shadow space" on the stack immediately before the call for these registers, allowing the callee to spill values there without further adjustment. Subsequent arguments are pushed onto the stack in right-to-left order, 8-byte aligned. Returns follow similar patterns to System V, with scalars in RAX or XMM0, though user-defined types larger than 64 bits are returned through a hidden caller-provided pointer, which the callee also returns in RAX.[17]
The vector calling convention extends these ABIs to handle Advanced Vector Extensions (AVX) and beyond, passing up to six vector arguments (e.g., __m128 or __m256 types) in XMM0–XMM5 or YMM0–YMM5 on Windows, and up to eight in System V using XMM0–XMM7 or YMM0–YMM7. For AVX-512, it utilizes ZMM0–ZMM7 (System V) or ZMM0–ZMM3 (Windows), with stack alignment increased to 32 bytes for 256-bit vectors and 64 bytes for 512-bit. Homogeneous vector aggregates with four or fewer elements are passed in consecutive vector registers, while larger ones use references to avoid excessive register pressure. This convention, introduced in 2013 for Microsoft compilers, enhances performance in vector-intensive applications like graphics by minimizing stack spills.[48][7]
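As an illustration of the register assignments described above, a function with eight integer arguments; the function name is arbitrary and the placements in the comment follow the two conventions as summarized here.

    /* Illustrative System V AMD64 placement for sum8(1,2,3,4,5,6,7,8):
     *   a->RDI  b->RSI  c->RDX  d->RCX  e->R8  f->R9
     *   g, h -> pushed onto the stack (right to left), 8 bytes each
     * The 64-bit result is returned in RAX.
     *
     * Under the Microsoft x64 convention the same call would place only
     * a..d in RCX, RDX, R8, R9, reserve 32 bytes of shadow space, and
     * pass e..h on the stack. */
    long sum8(long a, long b, long c, long d, long e, long f, long g, long h)
    {
        return a + b + c + d + e + f + g + h;
    }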
Both conventions enforce 16-byte stack alignment at the point of function calls, with the stack pointer (RSP) adjusted to maintain this invariant outside of function prologs and epilogs. The System V ABI includes a 128-byte "red zone" below RSP, usable by leaf functions for temporary storage without explicit stack allocation, further optimizing for 64-bit code execution. These features support seamless 64-bit addressing, enabling position-independent code and efficient handling of large address spaces.[47][7]
The x86-64 calling conventions emerged with the AMD64 architecture's release in 2003, initially specified in AMD's programmer manuals and refined through platform-specific ABIs to capitalize on the extended register file (16 general-purpose registers) and 64-bit operations for reduced overhead in function calls.[49][7]