Profile-guided optimization
Profile-guided optimization (PGO), also known as feedback-directed optimization (FDO), is a compiler technique that leverages runtime profiling data collected from representative program executions to inform and enhance static optimization decisions, such as function inlining, branch prediction, register allocation, and code layout, ultimately improving application performance and potentially reducing binary size.[1][2][3] The process typically involves three phases: first, compiling the program with instrumentation to generate profiling code; second, executing the instrumented binary under realistic workloads to collect data on execution frequencies, such as branch probabilities and call-site frequencies; and third, recompiling the program using the profile data to apply targeted optimizations that prioritize frequently executed paths.[2][4][5]

This approach contrasts with purely static optimization by incorporating dynamic behavior, enabling more precise transformations that can yield speedups of 2-14% in benchmarks, depending on the workload and compiler.[1][6] PGO has been implemented in major compilers and toolchains, including Microsoft Visual C++ (via flags like /GENPROFILE and /USEPROFILE), GCC and Clang/LLVM (using -fprofile-generate and -fprofile-use), the Go compiler (with CPU profiles from runtime/pprof), and others such as IBM XL C/C++ and the Android NDK's Clang-based builds, often supporting architectures such as x86, x64, ARM, and PowerPC.[2][5][4]

While effective, PGO introduces challenges such as the overhead of instrumentation (up to 16% runtime slowdown during profiling) and the need for profiles that accurately represent production workloads, since unrepresentative profiles lead to suboptimal optimizations.[6][3] Recent advancements explore alternatives, such as machine learning-based inference of profiles to bypass collection costs, achieving up to 83% of traditional PGO benefits with minimal overhead.[3][6]
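As a concrete sketch of the three phases, the following minimal C program, with build commands shown as a trailing comment, walks through an instrument-run-recompile cycle using the GCC flags named above (the file name and workload are hypothetical):

    /* pgo_demo.c -- a hypothetical program with one hot and one cold path. */
    #include <stdio.h>

    int process(int x) {
        if (x % 100 == 0)        /* cold: true for ~1% of inputs */
            return x * 3;
        return x + 1;            /* hot: true for ~99% of inputs */
    }

    int main(void) {
        long long sum = 0;
        for (int i = 0; i < 1000000; i++)
            sum += process(i);
        printf("%lld\n", sum);
        return 0;
    }

    /*
     * Phase 1 (instrument): gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo
     * Phase 2 (train):      ./pgo_demo        (writes a pgo_demo.gcda profile)
     * Phase 3 (optimize):   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo
     */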
Fundamentals
Definition and Overview
Profile-guided optimization (PGO) is a compiler optimization technique that uses runtime profile data, gathered from instrumented executions of a program, to inform and refine static optimization decisions, ultimately enhancing the performance of the resulting executable.[5] This method allows compilers to tailor optimizations to the behavior observed during typical runs, rather than relying solely on conservative assumptions.[2] PGO is also referred to as feedback-directed optimization (FDO) or profile-directed feedback (PDF).[1]

Its core principle is to integrate static compile-time analysis with dynamic runtime information, enabling targeted improvements in areas such as code layout for better cache locality, function inlining based on call frequencies, branch prediction aligned with observed probabilities, and register allocation that prioritizes frequently used variables.[7][8] By leveraging execution profiles, PGO bridges the gap between static heuristics, which cannot capture workload-specific patterns, and the precision of runtime insight.[9] Understanding PGO presupposes the basic contrast between static analysis (performed at compile time, without executing the program) and dynamic analysis (derived from actual runs).[9]

For instance, consider a loop with a conditional branch where static analysis assumes a balanced 50% probability for each outcome; if profile data indicates that one side is taken 90% of the time, the compiler can reorder the code to place the hot path first, improving branch prediction accuracy and reducing processor stalls, as sketched below.[10] The roots of profile-guided optimization trace back to the late 1980s and early 1990s, with early work such as Pettis and Hansen's 1990 paper on profile-guided code positioning exploring the use of execution profiles to guide compilation.[9][11]
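A minimal C sketch of that situation (function and variable names are illustrative, not from any particular codebase):

    /* Statically, the compiler may treat both branch outcomes as equally
     * likely; a profile showing the condition true ~90% of the time lets
     * it place the hot path on the fall-through and move the cold path
     * out of line. */
    long sum_nonnegative(const int *data, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] >= 0)        /* profiled: true in ~90% of executions */
                total += data[i];    /* hot path */
            else
                total -= data[i];    /* cold path */
        }
        return total;
    }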
Historical Development
The concept of profile-guided optimization (PGO) traces its origins to early efforts in compiler design that leveraged runtime profiling for better code performance. In 1992, Joseph Fisher and Stefan Freudenberger introduced the use of runtime profile information for static branch prediction, an early application of profiling to guide compiler decisions about program behavior.[9][12] This built on Pettis and Hansen's 1990 work on profile-guided code positioning, which used execution profiles to optimize procedure and basic-block placement for better instruction cache performance.[9][11] These efforts emphasized empirical data from execution traces to inform optimizations such as branch prediction and code layout, setting the stage for more sophisticated feedback-directed techniques.[9]

Advancements accelerated in the 1990s with the formal introduction of PGO in commercial compilers, driven by the need to optimize for increasingly complex superscalar processors. Intel's compiler team developed profile-guided techniques between 1992 and 1993, incorporating them into its C/C++ compilers to improve code positioning, inlining, and branch optimization based on execution profiles, which yielded significant performance gains on Pentium processors.[10] Similarly, Microsoft began integrating PGO into Visual C++ in the late 1990s, initially targeting the Itanium architecture to enhance whole-program optimization with runtime feedback.[13] These efforts marked a shift from purely static analysis to hybrid methods combining compile-time heuristics with real-world execution insights.[9]

The early 2000s saw PGO's adoption in open-source compilers, broadening its accessibility and enabling widespread experimentation in diverse environments. The GNU Compiler Collection (GCC) introduced support for profile-guided optimization in version 3.3, released in 2003, letting developers use the instrumentation flags -fprofile-generate and -fprofile-use for feedback-directed enhancements such as improved function inlining and loop optimization.[14] As profiling overhead became a concern for large-scale applications, low-overhead variants emerged; notably, Google's AutoFDO in 2013 used hardware performance monitoring units (PMUs) for sampling-based profiling, automating feedback collection without full instrumentation and achieving 5-10% performance improvements in warehouse-scale systems.[15][9]

Recent developments have focused on hardware-assisted and adaptive PGO to address modern challenges such as deployment overhead and profile staleness. Intel's experimental Hardware Profile-Guided Optimization (HWPGO), introduced in the 2024 oneAPI compiler release, leverages processor event-based sampling (PEBS) and last branch records for non-intrusive profiling during production runs, enabling optimization of already highly optimized binaries without dedicated recompilation cycles.[16] As of 2025, HWPGO remains under active experimentation in Intel's toolchain, with work toward broader hardware support.[9] Concurrently, studies through 2024 have tackled stale profiles in evolving binaries, proposing techniques such as multi-level hash matching to align outdated profiles with updated codebases and sustain optimization efficacy in dynamic software environments.[17][9]
Process
Instrumentation and Profiling
In the instrumentation phase of profile-guided optimization (PGO), the compiler inserts probes into the intermediate representation or generated code during a dedicated build to capture runtime behavior. These probes are typically counters on branches and edges of the control flow graph, allowing the collection of execution frequencies without altering the program's semantics. For instance, tools in LLVM and GCC add instrumentation code that increments counters at key points, such as conditional branches or function entries, producing an instrumented binary suitable for profiling.[2][18]

Profile collection then consists of executing the instrumented binary on representative workloads, which records dynamic execution patterns. The program is run through typical use cases while the probes log metrics such as branch taken/not-taken ratios, function call frequencies, and loop iteration counts into profile files (e.g., .profdata in LLVM). The collected data reflects the hot paths and usage patterns actually encountered during execution.[19][20]

Common profile types include edge profiles, which measure the frequency of transitions between basic blocks in the control flow graph to inform branch prediction and layout decisions; value profiles, which track the most frequent runtime values of variables or operands to enable specialization; and call graph profiles, which capture function invocation hierarchies and frequencies to guide inlining and devirtualization. These profiles provide a statistical view of program behavior, with edge profiles being foundational for control flow analysis.[19][21]

Instrumentation introduces runtime overhead, typically a 10% to 50% slowdown depending on the program's complexity and the profiling granularity, since the counters and logging add extra instructions and memory accesses. This overhead can be reduced to as little as 1-5% in production-like scenarios through sampling techniques, such as hardware-based sampling with performance counters, or through selective instrumentation.[22][16]

The accuracy of profiles hinges on using representative inputs that mirror production workloads, as mismatched data can yield optimizations tuned for rarely executed paths. For example, in web browsers such as Chromium, profiling simulates common webpage loads and user interactions via benchmark suites to capture realistic rendering and JavaScript execution patterns. These collected profiles subsequently inform the compiler's optimization decisions in later phases.[23][19]
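Conceptually, the probes behave like compiler-maintained counters attached to control-flow edges. The following hand-written C sketch mimics that behavior; the counter names and the dump format are illustrative, not any compiler's actual instrumentation:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical counters for the two outgoing edges of one branch. */
    static uint64_t edge_taken, edge_not_taken;

    int abs_value(int x) {
        if (x < 0) {
            edge_taken++;        /* instrumentation: count this edge */
            return -x;
        }
        edge_not_taken++;        /* instrumentation: count the other edge */
        return x;
    }

    /* An instrumented binary writes such counts to a profile file at exit
     * (e.g., .gcda for GCC, .profraw for LLVM); here we simply print them. */
    void dump_profile(void) {
        fprintf(stderr, "taken=%llu not_taken=%llu\n",
                (unsigned long long)edge_taken,
                (unsigned long long)edge_not_taken);
    }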
Optimization Using Profiles
In the feedback phase of profile-guided optimization (PGO), the compiler reads profile data files generated from prior execution runs and applies this information to optimization decisions during code generation. For instance, in GCC, the compiler processes .gcda files containing execution counts for branches, calls, and basic blocks when invoked with the -fprofile-use flag, enabling transformations biased toward observed runtime behavior.[24] Similarly, in LLVM-based compilers like Clang, profile data in formats such as .profdata is loaded via flags like -fprofile-instr-use, allowing the optimizer to prioritize hot paths and frequent operations.[25] This phase completes the multi-phase compilation workflow: an initial instrumentation build produces an executable that collects profiles during a training run on representative inputs, after which the feedback compilation generates the final optimized binary, as in the sketch below.[26]
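For the LLVM toolchain, the full cycle looks roughly as follows (shown as a comment block; app.c and training-input are placeholder names, while the flags, the llvm-profdata tool, and the default.profraw/default.profdata file names follow Clang's documented defaults):

    /*
     * Instrumentation build:
     *   clang -O2 -fprofile-instr-generate app.c -o app
     * Training run (writes default.profraw):
     *   ./app training-input
     * Merge raw profiles into the indexed format Clang consumes:
     *   llvm-profdata merge -output=default.profdata default.profraw
     * Feedback build using the collected profile:
     *   clang -O2 -fprofile-instr-use=default.profdata app.c -o app
     */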
One key optimization enabled by profiles is improved function inlining at hot call sites, where the compiler selectively replaces calls to frequently executed functions with their bodies to reduce call overhead and enable further transformations. Profile data identifies "hot" callees by invocation count, so aggressive inlining is applied only where it pays off, as implemented in modern compilers like GCC and Clang.[24][25] Function reordering leverages call graph profiles to rearrange procedures in memory, placing frequently interacting functions close together to improve instruction cache locality and reduce fetch latency.[26] Branch layout optimization uses the measured probabilities to reorder conditional code, positioning likely-taken branches on faster execution paths: the probability p of a branch is computed from the arc counts in the profile as p = taken_count / total_executions, so a branch taken 900 times in 1,000 executions has p = 0.9, and this value guides basic block sequencing.[25][24]
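GCC (since version 9) and recent Clang also expose a manual analogue of such profile-derived probabilities, the __builtin_expect_with_probability builtin. The sketch below uses it to supply by hand the same 0.9 hint that -fprofile-use would derive from the counts; the function and variable names are illustrative:

    /* Tell the compiler the condition is true with probability 0.9,
     * mimicking the hint a profile with taken_count/total = 0.9 provides. */
    long positive_sum(const int *data, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            if (__builtin_expect_with_probability(data[i] > 0, 1, 0.9))
                total += data[i];   /* hot path: favored in block layout */
            else
                total -= 1;         /* cold path */
        }
        return total;
    }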
Advanced applications include value-based optimizations, where profiles of frequent operand values enable code specialization, such as generating tailored versions of loops or conditionals for the common cases observed in training runs.[25] For indirect calls, profiles annotate the potential targets with their relative frequencies, allowing the compiler to promote likely targets to direct calls or to optimize virtual function dispatch for better prediction and reduced indirection cost.[25] These techniques trace back to early work on using execution profiles to guide code placement and are now standard components of optimizing compiler pipelines.
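Indirect-call promotion can be pictured as the source-level transformation below; compilers actually perform it on their intermediate representation, and the handler names and the 95% figure here are illustrative:

    typedef int (*handler_t)(int);

    int fast_handler(int x) { return x + 1; }

    /* Before promotion: every call through h is indirect. */
    int dispatch(handler_t h, int x) {
        return h(x);
    }

    /* After promotion: the profile shows h == fast_handler on ~95% of
     * calls, so the compiler tests for the hot target and emits a direct
     * (and now inlinable) call, keeping the indirect call as a fallback. */
    int dispatch_promoted(handler_t h, int x) {
        if (h == fast_handler)
            return fast_handler(x);  /* hot: direct, predictable, inlinable */
        return h(x);                 /* cold: indirect fallback */
    }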