Gprof

Gprof is a performance analysis tool included in the GNU Binutils suite that generates execution profiles for C, Pascal, or Fortran 77 programs by analyzing data from instrumented executions. It incorporates the time spent in called routines into the profile of each caller, enabling developers to pinpoint the functions and code paths that consume the most CPU time during program execution. Developed as part of the GNU Project, gprof requires programs to be compiled with the -pg flag using GCC or a compatible compiler, which inserts code that produces a gmon.out file containing runtime statistics.

The tool processes this profile data into several report formats: a flat profile that lists functions by time percentage and self time, a call graph showing caller-callee relationships with call counts and estimated inclusive times, and optional annotated source listings highlighting execution hotspots. Key options allow customization, such as -p for the flat profile, -q for the call graph, -A for source annotation, -k to exclude specific arcs, and -E to suppress display of certain symbols. Gprof handles cycles in the call graph by treating each cycle as a unit when propagating times, and it can sum multiple profile files for aggregated analysis. It assumes normal program termination, via exit(3) or a return from main, so that the profile data is written.

GNU gprof was originally written by Jay Fenlason and has been maintained as a component of Binutils since its early versions, with the current documentation covering releases up to version 2.45. It remains a standard optimization tool on Unix-like systems despite limitations such as its focus on CPU time rather than I/O or memory, and its lack of support for multi-threaded or dynamically linked programs without extensions. Its integration with the GNU toolchain keeps it in common use for open-source and performance-critical applications.

Overview and Usage

Purpose and Features

Gprof is a hybrid instrumentation and sampling profiler for Unix-like systems, distributed as part of the GNU Binutils suite of tools. Developed originally at the University of California, Berkeley, it extends the capabilities of the earlier Unix profiler prof(1) by incorporating call graph analysis to provide more detailed insight into program execution. The primary goal of gprof is to identify the functions that account for the majority of a program's execution time and to show the call relationships between functions, enabling developers to pinpoint bottlenecks in modular software. This approach attributes runtime costs not just to individual routines but also to the abstractions they implement, facilitating targeted optimizations.

Key features of gprof include flat profiles, which list functions sorted by descending execution time along with call counts, and call graphs, which depict caller-callee interactions while propagating time from called routines back to their callers. These outputs help reveal unexpected call patterns and the cumulative impact of subroutine hierarchies. Gprof can profile programs written in languages such as C, C++, Pascal, and Fortran 77, compiled with GCC or compatible compilers. Programs are instrumented via the -pg compilation flag to generate the necessary profiling data.

Basic Usage Workflow

To profile a program with gprof, the first step is to compile the source code with instrumentation enabled. This is achieved by passing the -pg flag to the GNU Compiler Collection (GCC), which inserts calls to profiling functions at the entry of each function. For instance, the command gcc -pg -g -o myprogram source.c compiles the file source.c into an executable named myprogram, where -g adds debugging information for better symbol resolution in the output.

Once compiled, the instrumented program is executed in the usual manner, such as ./myprogram, with any necessary input provided via standard input, arguments, or files. During execution, the program collects data on function calls and execution times, writing this information to a file named gmon.out upon normal termination (via return from main or an exit call). If the program crashes or is interrupted abnormally, the gmon.out file may not be generated, requiring re-execution under controlled conditions.

After execution, the profiling data is analyzed using the gprof command, which processes the gmon.out file alongside the executable to produce human-readable reports. The basic invocation is gprof myprogram gmon.out > profile.txt, which generates a report containing a flat profile of the time distribution across functions and a call graph showing caller-callee relationships. Additional options allow customized views; for example, gprof -A myprogram gmon.out produces an annotated source listing with execution counts shown alongside the original code.

For scenarios involving multiple runs to accumulate more accurate data, several profile files can be merged using gprof -s myprogram gmon1.out gmon2.out, which combines the inputs into a single gmon.sum file. This summed data is then analyzed as usual, such as gprof myprogram gmon.sum > combined_profile.txt, providing aggregated statistics over repeated executions. To create multiple distinct profile files, rename or move gmon.out after each execution (e.g., mv gmon.out gmon1.out), then run the program again to generate the next file.
A complete example workflow uses a simple C program (fib.c) that computes Fibonacci numbers iteratively to demonstrate time spent in a loop-heavy function:
c
#include <stdio.h>
#include <stdlib.h>

long fib(int n) {
    long a = 0, b = 1, c;
    if (n <= 1) return n;
    for (int i = 2; i <= n; i++) {
        c = a + b;
        a = b;
        b = c;
    }
    return b;
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
        int n = atoi(argv[1]);
        printf("Fib(%d) = %ld\n", n, fib(n));
    }
    return 0;
}
The terminal sequence proceeds as follows:
$ gcc -pg -g -o fib fib.c
$ ./fib 40
Fib(40) = 102334155
$ gprof fib gmon.out > fib_profile.txt
$ cat fib_profile.txt
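This produces a report containing a flat profile and a call graph, with fib invoked once from main. Because the iterative computation finishes far faster than the 10 ms sampling interval, the time columns for main and fib will typically read at or near zero, while the call counts remain exact. For multiple runs, rename gmon.out (e.g., mv gmon.out gmon1.out), execute ./fib 40 again to generate a second file, then sum and analyze as described.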

Implementation

Code Instrumentation

Gprof achieves precise call counting and call graph construction through compile-time instrumentation, primarily provided by the GNU Compiler Collection (GCC). When compiling with the -pg flag, GCC inserts a call to the monitoring function mcount (or _mcount or __mcount, depending on the platform and configuration) at the entry point of every instrumented function. This instrumentation enables the collection of exact caller-callee relationships without relying on statistical sampling.

The mcount function is the core mechanism for recording dynamic call graph arcs during program execution. Upon invocation, mcount examines the program's stack frame to determine the address of the call site in the caller (parent) routine and the address of the current (callee or child) function. It then increments a counter in an in-memory structure, using the call site as the primary key and the callee address as a secondary key to track the number of times each arc is traversed. This process builds a directed call graph that captures the program's dynamic call structure, including self-calls and recursion, and the function also initializes the necessary data structures on its first invocation. Cycles in the call graph, including those from recursion, are recorded as ordinary arcs and are detected and collapsed into strongly connected components during post-processing. Leaf functions, which make no outgoing calls, contribute no descendant time, so only their own execution time is propagated to callers in post-processing, in proportion to call frequencies.

Integration with GCC's profiling support occurs through linkage to the profiling runtime, which provides the implementation of mcount, internal routines such as __mcount_internal, and cleanup functions such as _mcleanup for dumping data to the gmon.out file at program termination. Depending on the platform, this runtime is supplied by the C library together with profiling startup code or by a separate library such as libgmon.a. Separately compiled modules can be combined freely, provided that -pg is used during both compilation and linking.
Unlike pure sampling-based profilers that approximate call frequencies through periodic interrupts, gprof's instrumentation approach yields exact call counts by explicitly logging each invocation, though at the cost of added runtime overhead from the inserted calls.
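The arc-recording step can be sketched as a small table keyed on the call-site and callee addresses. This is a minimal illustration under stated assumptions, not the actual mcount implementation: the names arc_table, record_arc, and arc_count, the fixed table size, and the linear-probing scheme are all hypothetical and chosen only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; the real profiling runtime differs. */
#define ARC_SLOTS 1024u /* assumed capacity, must be a power of two */

struct arc {
    uintptr_t from_pc;   /* call site inside the caller (primary key) */
    uintptr_t self_pc;   /* entry address of the callee (secondary key) */
    unsigned long count; /* times this arc was traversed; 0 marks an empty slot */
};

struct arc_table {
    struct arc slots[ARC_SLOTS]; /* must start out zero-initialized */
};

/* Slot index for a key pair at a given linear-probe step. */
static size_t arc_slot(uintptr_t from_pc, uintptr_t self_pc, size_t probe)
{
    uintptr_t h = from_pc * 31u + self_pc;
    return (size_t)((h + probe) & (ARC_SLOTS - 1u));
}

/* Count one traversal of the arc from_pc -> self_pc.
 * Returns 0 on success, -1 if the table is full. */
int record_arc(struct arc_table *t, uintptr_t from_pc, uintptr_t self_pc)
{
    for (size_t probe = 0; probe < ARC_SLOTS; probe++) {
        struct arc *a = &t->slots[arc_slot(from_pc, self_pc, probe)];
        if (a->count == 0) { /* first time this arc is seen */
            a->from_pc = from_pc;
            a->self_pc = self_pc;
            a->count = 1;
            return 0;
        }
        if (a->from_pc == from_pc && a->self_pc == self_pc) {
            a->count++;
            return 0;
        }
    }
    return -1;
}

/* Look up how many times an arc was traversed (0 if never recorded). */
unsigned long arc_count(const struct arc_table *t, uintptr_t from_pc,
                        uintptr_t self_pc)
{
    for (size_t probe = 0; probe < ARC_SLOTS; probe++) {
        const struct arc *a = &t->slots[arc_slot(from_pc, self_pc, probe)];
        if (a->count == 0)
            return 0;
        if (a->from_pc == from_pc && a->self_pc == self_pc)
            return a->count;
    }
    return 0;
}
```

In this sketch, every call from the same call site to the same callee bumps one counter, which is why the resulting call counts are exact rather than sampled. A real implementation must also be cheap enough to run on every function entry, which is where the overhead discussed later comes from.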

Runtime Profiling

Gprof collects runtime profiling data by combining instrumentation code with statistical sampling to capture both call frequencies and execution time estimates. The instrumentation aspect relies on the mcount function, which is invoked at the entry of each profiled routine in programs compiled with the -pg flag; this function records directed arcs in the call graph by identifying the caller-callee pair (using the return address on the stack) and incrementing the call count for that arc. These arc records form the structural backbone of the call graph, enabling later attribution of time to specific callers.

Meanwhile, sampling provides the time data: the operating system generates periodic interrupts via a timer, such as through the setitimer mechanism, at a typical interval of 10 milliseconds (corresponding to a 100 Hz sampling rate). Each interrupt triggers a handler that records the current program counter (PC) value, approximating the location of execution at that instant. The sampled PC values are aggregated into a histogram, which serves as the primary source for estimating routine execution times. This histogram is an array of fixed-size bins (typically 16-bit counters) covering the program's text segment, where each bin corresponds to a range of addresses and tallies the number of samples falling within it. The total number of samples multiplied by the sampling interval yields the program's overall run time, and per-bin counts estimate the self time of routines.

Upon program termination, the _mcleanup function writes the raw data to the gmon.out file in a binary format: a header with metadata (e.g., text segment range, histogram dimensions, and clock resolution), followed by the histogram section, and then the call graph section, a sequence of records each containing the from-address, self-address, and call count for one arc. This structure allows post-processing to propagate histogram-derived time estimates along the arcs, attributing inclusive time (self plus descendants) to callers while preserving the exact call counts from mcount.
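The mapping from sampled program counter values to histogram bins can be sketched as follows. This is an illustrative approximation only: the struct pc_histogram layout and the function names are assumptions made for the example, not the format used by the profiling runtime or by gmon.out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory histogram covering the address range [low_pc, high_pc). */
struct pc_histogram {
    uintptr_t low_pc;  /* lowest profiled address (inclusive) */
    uintptr_t high_pc; /* highest profiled address (exclusive) */
    size_t nbins;      /* number of equally sized address bins */
    uint16_t *bins;    /* one 16-bit sample counter per bin */
};

/* Record one timer sample. Returns 1 if the PC fell inside the profiled
 * range and was counted, 0 if it was outside and ignored. */
int histogram_sample(struct pc_histogram *h, uintptr_t pc)
{
    if (h->nbins == 0 || pc < h->low_pc || pc >= h->high_pc)
        return 0;
    uintptr_t span = h->high_pc - h->low_pc;
    size_t idx = (size_t)(((unsigned long long)(pc - h->low_pc) * h->nbins) / span);
    if (h->bins[idx] != UINT16_MAX) /* saturate instead of wrapping around */
        h->bins[idx]++;
    return 1;
}

/* Estimated seconds spent in bins [first, last): sample count times period. */
double bins_to_seconds(const struct pc_histogram *h, size_t first, size_t last,
                       double period_s)
{
    unsigned long samples = 0;
    for (size_t i = first; i < last && i < h->nbins; i++)
        samples += h->bins[i];
    return (double)samples * period_s;
}
```

With a 0.01 second period, four samples landing in a routine's bins would be reported as roughly 0.04 seconds of self time. Because a bin can span the boundary between two routines, a post-processor has to apportion such a bin's count between them, which this sketch leaves out.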
Gprof provides partial support for dynamic linking and shared libraries, with limitations stemming from runtime loading. Both the main executable and shared libraries must be compiled with -pg to include profiling code, yet dynamically loaded libraries may lack initialized profiling structures, leading to incomplete arc recording or segmentation faults. Accurate call graphs therefore often require static linking (e.g., via -static or -static-libgcc) or explicit instrumentation of all dependencies to avoid missing data.

To manage output in environments with multiple processes or parallel runs, the GMON_OUT_PREFIX environment variable customizes the output filename: if it is set, the process ID is appended to the given prefix in place of the default gmon.out name (e.g., GMON_OUT_PREFIX=myprof yields myprof.1234 on glibc systems), preventing overwrites and keeping per-process data separate.

Output Analysis

Gprof generates reports in two primary formats, the flat profile and the call graph, which users analyze to identify performance bottlenecks in their programs. The flat profile provides a straightforward summary of the time spent in each function, independent of calling relationships, while the call graph shows the hierarchical call structure and how time is attributed across functions. The time figures in both are derived from sampling-based measurements collected during program execution, allowing developers to pinpoint the functions consuming the most CPU time.

The flat profile is organized as a table sorted in decreasing order of self seconds (time spent executing the function itself, excluding time in called subroutines), then by call count, and then alphabetically by name. Key columns include:
  • % time: The percentage of the total run time spent in the function (self time relative to overall execution).
  • cumulative seconds: The running sum of self seconds for this function and all functions listed above it.
  • self seconds: The time spent directly in the function.
  • calls: The number of times the function was invoked (blank for functions that were not profiled).
  • self ms/call: Average self time per call, in milliseconds.
  • total ms/call: Average total time (self plus descendants) per call, in milliseconds.
  • name: The function name.
This format helps users quickly identify hotspots, such as functions with a high % time, without considering call hierarchies. For instance, in a sample output from a simple program, the flat profile might show a file I/O function like open consuming 33.34% of the run time across 7208 calls, indicating a potential bottleneck in repeated file operations.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 33.34      0.02     0.02     7208     0.00     0.00  open
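The derived columns of the flat profile are simple ratios of the sampled times and the exact call counts. A minimal sketch of that arithmetic follows; the function names are hypothetical and the code is not taken from gprof itself.

```c
#include <stddef.h>

/* Percentage of the total sampled run time spent in one function's own code. */
double percent_time(double self_s, double total_s)
{
    return total_s > 0.0 ? 100.0 * self_s / total_s : 0.0;
}

/* Average milliseconds per call; 0 when no call count was recorded. */
double ms_per_call(double seconds, unsigned long calls)
{
    return calls > 0 ? 1000.0 * seconds / (double)calls : 0.0;
}

/* Fill cumulative[i] with the running sum of self_s[0..i].
 * Rows are assumed to be sorted by decreasing self time already. */
void cumulative_seconds(const double *self_s, double *cumulative, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += self_s[i];
        cumulative[i] = sum;
    }
}
```

For the open row above, 0.02 seconds spread over 7208 calls works out to about 0.0028 ms per call, which is why the self ms/call column prints as 0.00 at two decimal places.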
The call graph complements the flat profile by depicting the program's dynamic call structure as a graph of arcs between functions. Each entry begins with an index number (a consecutive integer used for cross-referencing), followed by the primary line showing the function's self time, the time spent in its children, and the number of times it was called. Lines above the primary line represent callers, detailing how much time and how many calls originated from each, while lines below list callees (children). On caller and callee lines, call counts are shown in the form n/m: the calls along that arc out of the total non-recursive calls to the callee. Inclusive time encompasses the function and all its descendants, whereas exclusive (self) time excludes them. This view reveals not just individual hotspots but also how time propagates through the call hierarchy, aiding in understanding indirect performance impacts. Cycles are collapsed into single entries during this analysis.

Time propagation in the call graph attributes execution time from leaf functions upward to their callers based on call frequencies. Specifically, a function's total time is the sum of its self time and the time propagated from its children, where each child's time is divided among its callers in proportion to the number of calls made by each; recursive cycles are treated as a single unit to avoid unbounded attribution. For example, if function A calls B 10 times, B's total time is 0.1 seconds, and A is B's only caller, then the full 0.1 seconds propagates to A; if another caller made 30 further calls to B, A would receive only 10/40 of B's time, or 0.025 seconds. This mechanism ensures the call graph reflects the full cost of a function, including the cost of its subroutines, enabling users to trace bottlenecks back to root causes such as excessive calls to costly routines.

Users can customize output via command-line options to focus the analysis. The -p option prints only the flat profile, suppressing the call graph, while -q prints only the call graph, omitting the flat profile; by default, gprof outputs both.
The -z option includes functions with zero usage (never called or zero time) in the flat profile, which is useful for verifying completeness or spotting routines that are never called. These options, combined with symbol specifications (e.g., -p symspec), allow targeted reports for specific functions or patterns. In practice, interpreting these reports involves cross-referencing the flat profile for top consumers and the call graph for context; for the open example above, the call graph might show open invoked heavily from a loop in main, confirming it as the primary bottleneck and suggesting optimizations such as batching I/O operations.
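The proportional propagation rule used for the call graph in this section can be sketched in a few lines. This is a simplified model under stated assumptions: it ignores cycles, the struct child_arc type and both function names are hypothetical, and gprof's own post-processing is more involved.

```c
#include <stddef.h>

/* One outgoing arc of a parent function, with the data needed to
 * decide how much of the child's time the parent inherits. */
struct child_arc {
    double child_total_s;            /* child's self time plus its descendants' time */
    unsigned long calls_from_parent; /* calls along this arc */
    unsigned long child_total_calls; /* calls to the child from all callers */
};

/* Share of the child's time charged to this parent: proportional to calls. */
double propagated_share(const struct child_arc *a)
{
    if (a->child_total_calls == 0)
        return 0.0;
    return a->child_total_s *
           ((double)a->calls_from_parent / (double)a->child_total_calls);
}

/* Inclusive time of a parent: its self time plus every child's share. */
double inclusive_time(double self_s, const struct child_arc *arcs, size_t n)
{
    double total = self_s;
    for (size_t i = 0; i < n; i++)
        total += propagated_share(&arcs[i]);
    return total;
}
```

The rule assumes that every call to a child costs the same on average, regardless of the caller. When that assumption fails, for example when one caller passes much larger inputs, the attribution can be misleading even though the call counts themselves are exact.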

History

Berkeley Origins

Gprof was developed in the early 1980s at the University of California, Berkeley, by Susan L. Graham, Peter B. Kessler, and Marshall K. McKusick, all affiliated with the Computer Science Division of the Electrical Engineering and Computer Science Department. The tool emerged from efforts to profile and optimize a code generator, addressing the challenges of evaluating abstractions in large, modular programs composed of small routines. This academic work built on the Unix tradition of execution profiling, aiming to provide insight into both routine-level execution times and inter-routine call relationships.

The foundational description of gprof appeared in the paper "Gprof: A Call Graph Execution Profiler," presented at the 1982 SIGPLAN Symposium on Compiler Construction. A key innovation was its hybrid approach, combining instrumentation for precise call counting with sampling of the program counter to estimate execution times, enabling the attribution of a routine's time to its callers in a hierarchical manner. This method allowed for low-overhead profiling that reflected the program's logical structure, distinguishing it from purely statistical tools by incorporating dynamic call information.

Gprof was released as part of the 4.2BSD Unix distribution in 1983, extending the existing prof(1) tool, which offered only basic flat profiling. Its initial implementation was tailored specifically for the VAX architecture, with compiler support for inserting monitoring routines into C, Fortran 77, and Pascal programs at compile time. Integration with the BSD kernel was provided through configuration options such as -p in config(8), allowing kernel profiling via utilities such as kgmon(8), which collected data into a gmon.out file for post-processing with gprof. This setup enabled detailed analysis of kernel performance, such as optimizing pathname translation routines, while keeping overhead to 5-25% of execution time.

GNU Development

The GNU implementation of gprof was developed by Jay Fenlason in 1988 as a profiling tool compatible with the original Berkeley Unix version, specifically designed to work with the GNU C compiler. This effort addressed the need for performance analysis in the emerging GNU ecosystem, enabling developers to profile execution times and call graphs in programs compiled with GNU tools. Integration with the GNU Compiler Collection (GCC) was achieved through the -pg flag, which instruments code during compilation to generate profiling data files compatible with gprof; this support has been available since early GCC versions, including the 1.x series released around the same period. As part of the GNU Binutils suite, gprof's development aligned with the broader binary utilities project, ensuring its distribution and maintenance under the GNU umbrella.

Key enhancements in the GNU version focused on portability beyond BSD systems, allowing gprof to operate on the diverse platforms and architectures supported by GNU tools. Improvements to output formatting provided more detailed and configurable reports, along with support for basic-block counting in the profiling data. The version history of gprof is closely tied to Binutils releases, with incremental updates adding support for new architectures; for instance, the Binutils 2.x series from the early 1990s extended compatibility to additional platforms. From its inception, gprof has been released under the GNU General Public License (GPL), in keeping with the free software principles of the GNU Project.

Limitations and Accuracy

Sampling Errors

Gprof's time measurements rely on statistical sampling of the program counter at fixed intervals, which introduces inherent inaccuracies. The primary source of error is the variance in sample counts: the expected relative error of a function's time estimate is approximately $1/\sqrt{n}$, where n is the number of samples taken during its execution (the function's total run time divided by the sampling period). This Poisson-like behavior means that shorter executions yield larger relative errors. For instance, with a sampling period of 10 milliseconds (0.01 seconds), a function that runs for 1 second produces about 100 samples, for an expected error of roughly 10% of the measured time. A function that runs for 100 seconds increases n to 10,000, reducing the error to approximately 1%.

Functions whose total run time is comparable to or shorter than the sampling period are especially affected: they may receive zero or only a few samples, so their reported times are unreliable. The gprof manual emphasizes that run-time figures are unreliable when not substantially larger than the sampling period, highlighting this limitation for fine-grained code regions.

These sampling errors carry over into the call graph, where self times (directly sampled) are attributed to parents via post-processing that distributes child times upward along call arcs. Errors in a callee's estimated time directly influence the times propagated to its callers, potentially magnifying inaccuracies in higher-level functions, especially in deep call chains or in cycles, whose members' times are aggregated.
This attribution mechanism, while enabling a hierarchical view, compounds statistical variance from leaves to roots, reducing precision in inclusive time metrics for complex programs. To mitigate these errors, users can extend program runtime by increasing input sizes, thereby boosting sample counts and narrowing confidence intervals without altering the sampling mechanism. Accumulating data across multiple independent runs using the gprof -s option merges gmon.out files to effectively increase n, improving accuracy for the same total execution effort. Finer sampling intervals are possible through system-level adjustments, such as configuring higher-frequency timers (e.g., modifying clock ticks), though this introduces trade-offs like increased overhead and potential compatibility issues on certain platforms.
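The error estimate for sampled times can be worked through explicitly. The symbols T (a function's true run time), p (the sampling period), and \sigma_T (the expected error in the measured time) are introduced here only for this derivation, and the figures are the 10 ms examples used in this section.

```latex
\begin{align*}
n &= \frac{T}{p}, \qquad
\sigma_T \approx \sqrt{n}\,p, \qquad
\frac{\sigma_T}{T} \approx \frac{\sqrt{n}\,p}{n\,p} = \frac{1}{\sqrt{n}} \\[4pt]
T = 1\ \text{s},\ p = 0.01\ \text{s}: \quad
n &= 100, \quad \sigma_T \approx 10 \times 0.01\ \text{s} = 0.1\ \text{s}
\quad (10\%) \\
T = 100\ \text{s},\ p = 0.01\ \text{s}: \quad
n &= 10\,000, \quad \sigma_T \approx 100 \times 0.01\ \text{s} = 1\ \text{s}
\quad (1\%)
\end{align*}
```

Since the relative error falls only as the square root of the sample count, halving it takes roughly four times as many samples, whether they come from one longer run or from several runs summed with gprof -s.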

Overhead and Compatibility Issues

Gprof introduces notable performance overhead primarily through its instrumentation mechanism, which inserts calls to the mcount() function at each routine entry to record call graph data. This can result in execution slowdowns ranging from 30% to over 260% in call-intensive programs, such as those with frequent small function invocations or object-oriented designs, where the added cost distorts timings significantly. In contrast, the sampling component, which captures program counter values at approximately 100 Hz via operating system interrupts, imposes minimal additional overhead, typically a few microseconds per sample, though it accumulates over extended runs and is more pronounced in signal-based implementations than in kernel-assisted ones.

Compatibility challenges further limit gprof's applicability. It provides poor support for multi-threaded programs, as the mcount() implementation in some C libraries is not thread-safe, leading to inaccurate or missing per-thread data and potential race conditions in call counts. Similarly, kernel-mode profiling is unsupported, as GNU gprof targets user-space applications and lacks mechanisms for kernel instrumentation. For fully dynamic shared libraries, symbol mismatches and segmentation faults can occur if profiling code executes before library initialization, often requiring static linking (-static or -static-libgcc) as a workaround.

Platform constraints are most evident outside Unix-like systems. Gprof performs best in Unix-like environments, where it integrates seamlessly with GCC and the binutils suite. On Windows, support is limited to POSIX-emulating layers such as Cygwin, which may introduce additional inaccuracies due to differing runtime behaviors. Embedded systems, such as ARM Cortex-M microcontrollers, require custom adaptations like modified toolchains and support for no-OS environments to enable profiling.

Given these issues, gprof is best suited to single-threaded programs during development, where overhead can be tolerated for initial bottleneck identification; it should be avoided in production or multi-threaded scenarios to prevent distortion or crashes.

Legacy and Modern Context

Historical Reception

Upon its release in the early 1980s, gprof received significant recognition within the programming languages community for introducing call graph profiling, a method that attributes execution time across calling relationships in programs. The original paper presenting gprof, "gprof: A Call Graph Execution Profiler" by Susan L. Graham, Peter B. Kessler, and Marshall K. McKusick (presented at the 1982 SIGPLAN Symposium on Compiler Construction), was selected as one of the 50 most influential papers in a retrospective of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) spanning 1979 to 1999, chosen for impact and technical excellence by a committee of past PLDI chairs. This inclusion highlighted its enduring contribution to profiling tools, placing it among four standout papers from 1982.

Over the following years, gprof achieved widespread adoption as a standard component of Unix toolchains, integrated into Berkeley Software Distribution (BSD) systems and the GNU Project's binutils, enabling developers to routinely profile programs for optimization. It extended and largely superseded earlier flat profilers like AT&T's prof by providing hierarchical insight, influencing subsequent profiling methodologies in Unix environments. This integration facilitated its use across academic, research, and commercial software development, particularly for systems software.

Contemporary literature praised gprof for its simplicity in delivering actionable data without excessive complexity, despite acknowledged inaccuracies in handling shared subroutines and recursive calls. A 1993 analysis noted these limitations, such as erroneous time attribution in programs with common subroutines, but emphasized that gprof's straightforward design made it valuable for initial performance investigations in production settings. Overall, gprof long remained a cornerstone tool in BSD and GNU ecosystems, balancing utility with ease of use amid growing software complexity.

Successors and Alternatives

Over time, gprof has become outdated for many modern applications due to its lack of native support for multi-threaded programs and its reliance on compile-time instrumentation, which introduces significant overhead depending on the workload. These limitations make it unsuitable for shared libraries or concurrent code without custom modifications, leading to incomplete or inaccurate results in contemporary environments.

A direct successor addressing these issues is gprofng, first integrated into the GNU Binutils suite with version 2.39 in August 2022 as the next-generation GNU profiler. Derived from Oracle's Sun Studio profiler lineage, gprofng uses sampling-based techniques without requiring program recompilation, enabling low-overhead analysis of production binaries written in C, C++, Java, or Scala. It provides full support for shared libraries and multi-threaded applications, generating call graphs and performance metrics that address gprof's threading and instrumentation shortcomings.

Beyond gprofng, several alternatives have gained prominence for their advanced capabilities. Perf, a Linux kernel-based sampler, offers low-overhead profiling via hardware performance counters, capturing events like cache misses and branch mispredictions without program modifications. Valgrind's Callgrind tool provides deterministic call-graph profiling with detailed instruction-level data, and can also simulate cache and branch-prediction behavior, in single- or multi-threaded programs. Intel VTune Profiler provides comprehensive performance analysis, including threading efficiency and GPU utilization, making it suitable for high-performance computing workloads.

Gprof remains included in the latest Binutils release (2.45.1 as of November 2025), but it is now primarily recommended for legacy single-threaded analysis where minimal setup is needed. For ongoing development, migration to gprofng is advised to handle modern binaries effectively while maintaining compatibility with the GNU ecosystem.
