Indirect branch

An indirect branch is a type of control-flow instruction in a machine instruction set that transfers execution to a target address computed dynamically at run time, typically loaded from a register or memory location, rather than using a fixed or immediate value in the instruction itself. Unlike direct branches, which specify a relative offset from the current program counter and allow the target to be resolved early in the processor pipeline, indirect branches support absolute jumps to any location in the address space but leave the destination uncertain until execution.

Indirect branches are prevalent in modern programming paradigms, particularly in object-oriented languages like C++ and Java, where they enable flexible constructs such as method calls via virtual function tables, indirect function calls through pointers, and switch statements with computed indices. For instance, in assembly language, instructions like ARM's BX <Rd> or LDR pc, [Rd] exemplify indirect branches by using register contents to determine the target address. These branches occur frequently in polymorphic code (up to once every 50 instructions in C++ programs), where they provide the runtime polymorphism that software extensibility depends on.

A key challenge with indirect branches lies in branch prediction: the processor must anticipate not just whether the branch is taken but also which of many possible destinations it will reach. This leads to higher misprediction rates (often 25-37% in benchmarks like SPECint95) that penalize superscalar designs by stalling the pipeline. Advanced predictors, such as those using history-based indexing or target caches, aim to mitigate this by associating branch addresses with likely targets, though accuracy remains lower than for direct branches.

Indirect branches gained significant attention starting in 2018 for security vulnerabilities, notably in attacks like Spectre, where mispredicted targets can leak sensitive data across security domains. Mitigations, including Intel's Indirect Branch Predictor Barrier (IBPB) and Indirect Branch Restricted Speculation (IBRS), insert barriers to flush or restrict predictor state between privilege levels, helping to prevent unauthorized speculation while preserving performance in trusted code paths. However, as of 2025, new variants such as Branch Privilege Injection (CVE-2024-45332) have demonstrated limitations in these defenses, underscoring ongoing challenges.

Fundamentals

Definition

An indirect branch is a control-flow instruction in machine code that transfers program execution to a target address computed indirectly, typically by loading the address from a register or memory location at run time, rather than specifying a fixed or immediate value within the instruction itself. This mechanism allows the processor to perform an absolute jump to any location in the address space, where the destination is not determinable at compile time but is resolved dynamically during execution.

Key characteristics of indirect branches include their support for non-sequential program flow, where the next instruction address is calculated from runtime data. They play a crucial role in implementing flexible constructs such as function pointers, which allow calls to routines whose locations are determined at run time, and switch statements, which are often compiled into jump tables for efficient multi-way branching.

A notable example in early minicomputer architectures is the PDP-11 series by Digital Equipment Corporation (DEC) in the 1970s, which introduced flexible addressing modes including indirect jumps via instructions like JMP with deferred addressing (denoted by @). These features supported modular programming by allowing jumps to addresses stored in memory or registers, facilitating subroutine calls and dynamic linking.
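The jump-table idea mentioned above can be modeled in a few lines. The following is a minimal sketch, not compiler output: the handler names and the dispatch function are hypothetical, and a Python list of callables stands in for an array of code addresses.

```python
def op_add(a, b):
    return a + b


def op_sub(a, b):
    return a - b


def op_mul(a, b):
    return a * b


# Jump table: position i holds the handler for opcode i, much like an
# array of code addresses emitted for a dense switch statement.
JUMP_TABLE = [op_add, op_sub, op_mul]


def dispatch(opcode, a, b):
    """Select a handler by index and call it (an indirect call)."""
    # A compiled switch performs the same bounds check before indexing.
    if not 0 <= opcode < len(JUMP_TABLE):
        raise ValueError(f"unknown opcode {opcode}")
    handler = JUMP_TABLE[opcode]  # target is loaded from the table at run time
    return handler(a, b)          # the call site names no fixed target
```

The call site in dispatch is the analogue of the indirect branch: which function runs depends entirely on the value of opcode at run time.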

Distinction from Direct Branches

Direct branches incorporate a fixed target directly into the instruction encoding, typically using PC-relative addressing where a signed offset is added to the current program counter (PC) to determine the destination. This offset, often 16 bits in architectures like MIPS, allows the target to be known at compile time, enabling straightforward decoding and execution without additional register or memory accesses. In contrast, indirect branches specify the target address indirectly, usually by referencing a value stored in a register or memory location, which is resolved dynamically at run time. The distinction reflects the need for static versus dynamic control flow in programs.

The primary differences lie in predictability, flexibility, and encoding efficiency. Direct branches have fixed, embedded targets, so only their direction needs to be predicted; simple mechanisms such as two-bit saturating counters achieve accuracies of roughly 95% in typical workloads. Indirect branches, however, have targets that vary with runtime conditions, making prediction more challenging and often resulting in misprediction rates of 25-40% without advanced predictors like return address stacks or multi-level history tables. Regarding flexibility, indirect branches support runtime decisions essential for structures like virtual function calls or switch statements, where the target depends on computed values, whereas direct branches are suited to fixed control flows. Instruction encoding for direct branches is generally compact, as the offset requires fewer bits than storing a full address, which reduces code size in position-independent executables.

In use cases, direct branches are commonly employed for loops, conditional jumps, and unconditional transfers within a function, as seen in MIPS instructions like BEQ (branch if equal) or BNE (branch if not equal), which use a 16-bit PC-relative offset. Indirect branches, exemplified by JR (jump register) or JALR (jump and link register), enable computed gotos, procedure returns, and polymorphic calls in object-oriented code, where the target is loaded from a virtual function table at execution time. These examples highlight how direct branches promote efficient, predictable execution in straightforward paths, while indirect branches provide the adaptability needed for complex, data-driven program flows.
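The difference in how the two kinds of target are obtained can be made concrete with MIPS-style arithmetic. This is a sketch under the classic MIPS32 convention that a conditional branch target is the address of the following instruction plus the sign-extended 16-bit offset shifted left by two; the function names and the register-file dictionary are illustrative assumptions.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def direct_target(pc, imm16):
    """Target of a BEQ/BNE-style branch: known from the instruction bits alone."""
    # The offset counts instructions (4 bytes each) relative to PC + 4.
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def indirect_target(registers, rs):
    """Target of a JR-style jump: whatever the register holds at run time."""
    return registers[rs] & 0xFFFFFFFF
```

direct_target needs only the branch's own address and encoding, so a decoder can compute it immediately; indirect_target cannot be evaluated until the register value is available.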

Implementation

Operational Mechanism

An indirect branch instruction is fetched from memory by the processor's fetch unit, typically as part of the pipeline's initial stage, where the current program counter (PC) points to the instruction's location. During the decode stage, the processor interprets the opcode and operands to identify the source of the target address, such as a register or memory location.

In the execute stage, the target is obtained in one of two ways. In register-indirect mode, it is read directly from the specified register (e.g., a branch to the contents of RAX in x86 or Rd in ARM). In memory-indirect mode, the processor first computes an effective address and then loads the target from memory at that address (e.g., LDR pc, [Rd] in ARM, which branches to the word stored at the address held in Rd). Base and index registers with scaling or offsets can further refine the effective address, which the arithmetic logic unit (ALU) calculates before the load.

The resolved target address is then written to the program counter (RIP in x86, PC in ARM), redirecting the fetch unit to the new location for subsequent execution. If the branch was mispredicted, speculative instructions already in the pipeline must be flushed. This mechanism applies to variants like indirect jumps and calls, where the primary difference lies in whether a return address is saved, for example by pushing it onto the stack. In pipelined processors, address resolution typically occurs in the execute stage, which delays confirmation of the next fetch address compared to direct branches and can stall the pipeline until the target is known.
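The two resolution paths described above can be sketched as a toy model. The register names, the dictionary-based memory, and the function names here are illustrative assumptions, not a description of any real microarchitecture.

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Compute base + index * scale + displacement, as an address unit would."""
    address = disp
    if base is not None:
        address += regs[base]
    if index is not None:
        address += regs[index] * scale
    return address


def resolve_register_indirect(regs, reg):
    """Register-indirect: the register already holds the branch target."""
    return regs[reg]


def resolve_memory_indirect(regs, memory, **operand):
    """Memory-indirect: compute an address, then load the target from it."""
    return memory[effective_address(regs, **operand)]
```

For example, with regs = {"rbx": 0x2000, "rcx": 2} and memory = {0x2018: 0x5000}, calling resolve_memory_indirect(regs, memory, base="rbx", index="rcx", scale=8, disp=8) follows the same steps as an x86 operand of the form [rbx + rcx*8 + 8] and yields 0x5000. The extra load is why the memory-indirect form resolves later than the register-indirect form.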

Types of Indirect Branches

Indirect branches are categorized primarily by their functional role in program control flow, distinguishing between unconditional transfers, subroutine invocations, and subroutine returns. These variants enable flexible addressing in scenarios where the target location is determined at run time rather than compile time.

Indirect jumps provide an unconditional transfer of control to an address computed from a register or memory location, without preserving a return address. They are commonly employed in implementations of switch statements, where a jump table indexes into an array of addresses based on a computed value, or in state machines that transition between states using dynamically selected pointers. For instance, in x86 architecture, the JMP instruction with a register or memory operand (e.g., JMP r/m) effects such a transfer by loading the target directly into the instruction pointer. Similarly, in ARM architecture, instructions like BX (Branch and Exchange) or LDR pc, [Rn] branch to the address held in the specified register or loaded from memory, supporting absolute jumps across the address space.

Indirect calls extend the jump mechanism by saving the return address on the stack or in a dedicated register, facilitating subroutine invocation to a runtime-computed target. This form is essential for dynamic linking, where procedure linkage tables (PLTs) and global offset tables (GOTs) hold addresses resolved at load time, and for callbacks in event-driven systems that dispatch to handler functions via pointers. In x86, the CALL instruction with an indirect operand (e.g., CALL r/m) pushes the return address onto the stack before jumping to the computed address. In ARM, indirect calls are typically realized by loading the target into a register followed by BLX (Branch with Link and Exchange), which performs the branch while storing the return address in the Link Register (LR).

Indirect returns conclude subroutine execution by loading the return address from the stack or a link register and transferring back to the caller, accommodating stack frames in recursive or nested calls. The x86 RET instruction pops the return address from the stack into the instruction pointer, optionally adjusting the stack pointer by an immediate for frame cleanup. In ARM, returns often use BX LR, which branches to the address in the link register, or POP {pc}, which loads it from the stack into the program counter.

Architectural support for indirect branches varies between RISC and CISC designs. RISC architectures, such as ARM, often use load-then-branch sequences with fixed-length instructions, where an address is loaded into a general-purpose register before a branch like BX or BLX, though single-instruction memory-indirect branches are possible via instructions like LDR pc, [Rn]. This separation favors pipelining efficiency. In contrast, CISC architectures like x86 integrate more complex addressing modes within single instructions, allowing memory-indirect operands (e.g., JMP [mem] or CALL [mem]) that fetch the target in one instruction, though this introduces variable-length encoding and decoding overhead. These differences stem from RISC's focus on uniform, simple operations versus CISC's aim to reduce instruction count through multifaceted addressing.
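The three roles differ only in what happens to the return address. A minimal sketch of an x86-like stack discipline follows; the ToyCPU class, its register names, and the one-unit instruction length are hypothetical simplifications.

```python
class ToyCPU:
    """Toy machine with a program counter, named registers, and a call stack."""

    def __init__(self, pc=0):
        self.pc = pc
        self.regs = {}
        self.stack = []

    def jmp_indirect(self, reg):
        # Indirect jump: replace the PC; nothing is saved.
        self.pc = self.regs[reg]

    def call_indirect(self, reg, insn_len=1):
        # Indirect call: save the address of the next instruction, then jump.
        self.stack.append(self.pc + insn_len)
        self.pc = self.regs[reg]

    def ret(self):
        # Return: the target is whatever address was saved on the stack.
        self.pc = self.stack.pop()
```

A link-register design such as ARM's would store the return address in a register instead of calling stack.append, but the control transfer itself is the same.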

Examples and Syntax

Assembler Syntax

In assembly languages, indirect branches are typically expressed using instructions that specify the target address via a register, memory location, or implicit reference rather than a fixed label or offset. Common generic forms include jumps to a register (e.g., JMP reg), calls through a memory location with an offset (e.g., CALL [mem + offset]), and returns that implicitly fetch the target from the stack (e.g., RET). These forms allow dynamic control flow based on runtime-computed addresses.

Syntax varies across instruction set architectures (ISAs). In x86, an indirect jump uses the JMP instruction with a register operand, such as jmp eax (Intel syntax), where the register holds the 32-bit or 64-bit target address; the memory-indirect form is written as jmp [ebx + 4]. Both near (intra-segment) and far (inter-segment) indirect jumps are supported. In ARM (AArch32), the Branch and Exchange (BX) instruction performs an indirect branch to a register, written as bx r0, which takes the 32-bit address from r0 and switches instruction sets based on the least significant bit. In MIPS, the Jump Register (JR) instruction specifies the target via a register, as in jr t0, using the 32-bit address in t0 for control transfer.

Encoding details for these instructions involve specific opcodes and operand handling to accommodate address sizes. In x86, indirect JMP uses opcode FF /4 for near jumps (register or memory-indirect) and FF /5 for far indirect jumps, where the ModR/M byte selects the operand and the operand size depends on the mode (32-bit in protected mode, 64-bit in long mode). ARM's BX is encoded in a 32-bit word with the condition code in bits 31-28, the fixed pattern 0x12FFF1 in bits 27-4, and bits 3-0 specifying the register Rm; in AArch64, BR uses a different encoding in the Unconditional Branch (Register) class with 64-bit registers. MIPS JR employs an R-type format with opcode 000000 (6 bits) and function code 001000 (6 bits, decimal 8), where the rs field (bits 25-21) specifies the register holding the 32-bit target, and the rt and rd fields are zero. These encodings fit each architecture's addressing model; x86-64, for example, also allows RIP-relative memory operands for position-independent code.
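The encodings described above are simple enough to assemble by hand. The following sketch builds the instruction words from the documented field layouts; the function names are made up for illustration, and the x86 helper assumes one of the eight legacy registers (numbers 0-7, so no REX prefix is needed).

```python
def encode_mips_jr(rs):
    """MIPS JR: opcode 000000, rs in bits 25-21, function code 001000."""
    assert 0 <= rs < 32
    return (rs << 21) | 0b001000


def encode_arm_bx(rm, cond=0xE):
    """AArch32 BX: cond in bits 31-28, 0x12FFF1 in bits 27-4, Rm in bits 3-0."""
    assert 0 <= rm < 16 and 0 <= cond < 16
    return (cond << 28) | 0x012FFF10 | rm


def encode_x86_jmp_reg(reg):
    """x86 near indirect JMP: opcode FF, ModR/M with mod=11, /4, rm=reg."""
    assert 0 <= reg < 8
    modrm = (0b11 << 6) | (4 << 3) | reg
    return bytes([0xFF, modrm])
```

As a cross-check against common disassembly, encode_mips_jr(31) gives 0x03E00008 (jr $ra), encode_arm_bx(14) gives 0xE12FFF1E (bx lr with the always condition), and encode_x86_jmp_reg(0) gives the bytes FF E0 (jmp eax or jmp rax, depending on mode).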

Architectural Examples

In x86 architecture, indirect branches are commonly implemented using the CALL instruction with an indirect operand, allowing the target address to be specified via a register or memory location. For instance, the assembly snippet mov ebx, offset function_ptr; call [ebx] loads the address of a function pointer into the EBX register and performs an indirect call to the address stored there, pushing the return address onto the stack as in a direct call. This mechanism is detailed in the Intel 64 and IA-32 Architectures Software Developer's Manual, which specifies the encoding for near indirect CALL as FF /2 with a register/memory operand.

In ARM architecture, particularly in Thumb mode, the BX (Branch and Exchange) instruction facilitates indirect branches by jumping to an address held in a register while potentially switching between ARM and Thumb instruction sets based on the least significant bit of the address. An example is ldr r0, =target_address; bx r0, where R0 holds the runtime-determined branch target and execution continues in Thumb state if the LSB is 1. The ARM Architecture Reference Manual describes BX as taking the target from the specified register Rm; BX does not modify the link register, which is why BX LR serves as a subroutine return.

The RISC-V instruction set employs the JALR (Jump and Link Register) instruction for register-indirect jumps, computing the target as the sum of a base register and a 12-bit signed immediate offset, while optionally storing the return address in a destination register. A typical indirect jump example is jalr x0, x1, 0, which branches to the address in x1 without linking (since rd=x0 is hardwired to zero). According to the RISC-V Unprivileged ISA Specification, JALR uses the I-type format, and the least significant bit of the computed sum is cleared to form the target address.

These indirect branch instructions underpin dynamic dispatch in object-oriented programming, where virtual function calls resolve at run time to the appropriate method based on the object's type, as seen in languages like C++ and Java that rely on vtables for polymorphic behavior.
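The JALR semantics described above fit in a few lines. This is a sketch of the architectural behavior only; the function signature, the list-based register file, and the default 32-bit width are assumptions made for illustration.

```python
def sign_extend12(value):
    """Interpret the low 12 bits of value as a signed two's-complement number."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def jalr(regs, pc, rd, rs1, imm12, xlen=32):
    """Model JALR: return the jump target and write the link value to rd."""
    mask = (1 << xlen) - 1
    # Read rs1 before writing rd, so rd == rs1 still jumps to the old value.
    target = (regs[rs1] + sign_extend12(imm12)) & mask & ~1
    if rd != 0:  # x0 is hardwired to zero; writes to it are discarded
        regs[rd] = (pc + 4) & mask
    return target
```

With a 32-entry list regs in which regs[1] holds 0x80000101, calling jalr(regs, 0x1000, 0, 1, 0) models jalr x0, x1, 0: it returns 0x80000100 (low bit cleared) and leaves every register unchanged.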

Performance Implications

Branch Prediction Issues

Indirect branches pose significant challenges to branch predictors in modern CPUs because their target addresses are computed dynamically at run time, typically loaded into a register, making them inherently harder to anticipate than direct branches, where the target is statically encoded in the instruction. A conditional direct branch requires only a binary taken/not-taken decision, whereas an indirect branch requires the predictor to forecast a full address from potentially numerous possibilities. The result is higher misprediction rates: typically 25-37% for indirect branches versus around 3% for direct conditional branches.

Key issues arise from the multi-target problem, where a single indirect branch can resolve to many different addresses depending on the register's value, such as in virtual function calls or switch statements with computed indices in object-oriented code. Predictors may need to distinguish among dozens of targets per branch, but limited resources like branch target buffers (BTBs) or history tables often fail to capture this variability accurately. Additionally, aliasing in predictor structures exacerbates errors: multiple indirect branches or history patterns can map to the same table entry, leading to conflict misses in which an incorrect target is selected because entries share an index in sets with low associativity. These mispredictions trigger pipeline flushes, discarding speculatively executed instructions and incurring penalties of 10-20 cycles per event in contemporary out-of-order processors with deep pipelines.

The overall impact is amplified in workloads heavy with indirect branches, such as object-oriented code with frequent dynamic dispatch. Studies on benchmarks like SPECint95 reveal that indirect branches account for 20-50% of total mispredictions despite comprising only about 15% of dynamic branches, with misprediction rates reaching up to 66% in individual programs. In one analysis, indirect branches were responsible for 55.7% of all branch mispredictions across a diverse benchmark set, underscoring their disproportionate effect on execution efficiency.
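Both failure modes, multiple targets and aliasing, show up even in a very small simulation. The sketch below models a direct-mapped, untagged BTB that predicts each branch will repeat its last target; the class name, table sizes, and synthetic traces are assumptions for illustration and do not correspond to any specific processor.

```python
class LastTargetBTB:
    """Direct-mapped, untagged table predicting a repeat of the last target."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        # Drop the two low bits (assumed 4-byte instructions), then wrap.
        return (pc >> 2) % self.entries

    def predict_and_update(self, pc, actual_target):
        """Return True if the stored target was correct, then train."""
        i = self._index(pc)
        correct = self.table[i] == actual_target
        self.table[i] = actual_target
        return correct


def misprediction_rate(trace, entries=16):
    """Fraction of (pc, target) pairs in trace that a fresh BTB mispredicts."""
    btb = LastTargetBTB(entries)
    misses = sum(not btb.predict_and_update(pc, t) for pc, t in trace)
    return misses / len(trace)


# One branch, one target: only the cold first access misses.
monomorphic = [(0x40, 0x1000)] * 100
# One branch alternating between two targets: last-target always guesses wrong.
alternating = [(0x40, 0x1000 if i % 2 == 0 else 0x2000) for i in range(100)]
# Two single-target branches whose addresses collide in a 16-entry table.
aliased = [(0x40, 0x1000), (0x80, 0x3000)] * 50
```

In this model the aliased trace mispredicts on every access with 16 entries, because the two branches keep overwriting each other's entry, but only on the two cold accesses with 64 entries. The alternating trace is not helped by a larger table at all, which is why history-based indexing is needed for multi-target branches.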

Optimization Strategies

Hardware approaches to optimizing indirect branches primarily involve advanced branch predictors and specialized buffers that address the multi-target nature of these instructions. The ITTAGE predictor, an extension of the TAGE architecture tailored for indirect branches, employs multiple tables indexed by global history and branch address, with a tagless base component and tagged components using geometrically increasing history lengths to select the most relevant target. This design enables accurate prediction of multiple targets by prioritizing the longest matching history and incorporating confidence counters, outperforming traditional single-target mechanisms. Indirect branch target buffers (BTBs), enhanced to track multiple targets per entry, cache recent targets and use two-level adaptive predictors to resolve destinations based on historical patterns, reducing resolution latency in pipelined processors.

Software techniques mitigate indirect branch overhead through run-time or compile-time transformations. Binary rewriting, as in JumpSwitches, dynamically patches indirect calls into conditional direct calls by learning hot targets at run time, bypassing costly retpolines while maintaining security against attacks like Spectre. Thunking inserts intermediate stubs to redirect indirect branches, often combined with binary rewriting to serialize execution and avoid mispredictions in vulnerable contexts. Compiler optimizations, such as indirect call promotion (ICP), use profile-guided data to replace indirect calls with compares, conditional branches, and direct calls to frequent targets, enabling further inlining and reducing misprediction rates.

Case studies illustrate these strategies' impact on modern CPUs. In Intel's Sandy Bridge architecture (2011), an enlarged BTB and two-level indirect predictor supporting up to 24 targets per branch improved pattern recognition over Nehalem, with subsequent microarchitectures like Skylake (2015) achieving 20-30% reductions in mispredictions for complex indirect branches through enhanced TAGE-like components. AMD's Zen 3 (2020) and later cores incorporate TAGE-based predictors with larger BTBs (e.g., 1024 L1 entries), yielding excellent indirect jump accuracy and up to 1 taken indirect branch per clock cycle. ICP in LLVM-based compilers has delivered 2-9% speedups on SPEC benchmarks by promoting hot indirect calls. JumpSwitches rewriting provided up to 20% performance gains over retpoline thunks in system call workloads.

These optimizations involve trade-offs. Advanced predictors like ITTAGE increase hardware complexity with larger tables and history mechanisms, potentially raising power consumption by 10% or more in branch-heavy workloads, though techniques like partial tagging limit the cost relative to the accuracy gained. Multi-target BTBs demand more silicon area for associativity, trading improved speculation efficiency against higher dynamic power in high-frequency designs.
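The shape of indirect call promotion can be illustrated with a small model. This is only a sketch of the transformation's structure: every Python call is dynamically dispatched, so nothing here becomes a true direct call, and the names promote, indirect_call, double, and negate are hypothetical.

```python
from collections import Counter


def double(x):
    return 2 * x


def negate(x):
    return -x


def indirect_call(target, *args):
    """The original call site: the target is known only at run time."""
    return target(*args)


def promote(profile, fallback):
    """Build a call site guarded by a compare against the hottest target."""
    hot_target, _count = profile.most_common(1)[0]

    def promoted_call(target, *args):
        if target is hot_target:        # compare plus conditional branch
            return hot_target(*args)    # stands in for a direct, inlinable call
        return fallback(target, *args)  # cold path keeps the indirect call

    return promoted_call


# Hypothetical profile: double was the target in 90 of 100 sampled calls.
profile = Counter({double: 90, negate: 10})
promoted = promote(profile, indirect_call)
```

In a compiler, the guarded branch is an ordinary conditional branch that direction predictors handle well, and the direct call on the hot path can be inlined; the result of promoted(f, x) is the same as indirect_call(f, x) for any target f.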
