
CoreMark

CoreMark is a standardized benchmark suite designed to evaluate the core performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems by executing a set of representative algorithms and producing a single, comparable score. Developed by the Embedded Microprocessor Benchmark Consortium (EEMBC), it was introduced in June 2009 as a free, open-source alternative to outdated benchmarks such as Dhrystone, which suffered from synthetic code, heavy compiler-optimization bias, and a lack of standardized reporting. The motivation behind its creation was to provide a meaningful, real-world metric for CPU efficiency in resource-constrained environments, focusing on common data structures and operations rather than misleading indicators such as clock speed or floating-point operations per second.

At its core, CoreMark incorporates three primary algorithms to test diverse aspects of processor functionality: list processing, which manipulates linked lists through pointer operations, sorting, and reversal to assess memory access and cache behavior; matrix manipulation, featuring integer matrix multiplications with constants, vectors, and other matrices to evaluate loop efficiency and instruction-set accelerators; and state machine processing, using switch and if-else statements to parse input strings and count state transitions, probing control flow and branching performance. These algorithms are self-verifying through 16-bit cyclic redundancy checks (CRCs) and require only about 2 KB of memory, making the benchmark suitable for devices ranging from 8-bit MCUs to 64-bit processors. To prevent compiler optimizations from skewing results, all data is initialized at run time, and the timed execution avoids external library calls.

Performance is quantified in CoreMarks, representing iterations completed per second, often normalized as CoreMark/MHz to account for clock-speed variations, with results requiring disclosure of compiler versions, flags, and platform details for transparency. Certified scores, verified by EEMBC's certification lab, ensure adherence to strict rules, and the benchmark has become the de facto standard for embedded CPU comparisons, with extensions like CoreMark-Pro adding parallelism and floating-point workloads, and integrations in power-efficiency suites such as ULPMark. Widely adopted by vendors, CoreMark facilitates objective evaluations in applications ranging from consumer IoT devices to automotive systems.
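
The self-verification idea can be illustrated with a small CRC-16 routine. This is a generic sketch using the common reflected polynomial 0xA001 (CRC-16/ARC); the function names are illustrative and it is not the benchmark's exact crcu8/crcu16 code, though the pattern is the same in spirit:

```c
#include <stdint.h>

/* Illustrative CRC-16 update, LSB-first with the reflected polynomial
 * 0xA001 (CRC-16/ARC). CoreMark's own CRC helpers follow the same
 * general pattern; this sketch is not the benchmark's code. */
uint16_t crc16_update(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        int carry = (data ^ crc) & 1;  /* mix the next data bit in */
        crc >>= 1;
        data >>= 1;
        if (carry)
            crc ^= 0xA001;             /* apply the polynomial */
    }
    return crc;
}

/* Run the CRC over a buffer, as the benchmark does over its list,
 * matrix, and state-machine results to catch skipped computation. */
uint16_t crc16_buf(const uint8_t *buf, int len, uint16_t crc)
{
    for (int i = 0; i < len; i++)
        crc = crc16_update(buf[i], crc);
    return crc;
}
```

Because the checksum depends on every byte of every result, a compiler cannot elide the timed work without producing a wrong CRC.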

Overview

Definition and Purpose

CoreMark is an industry-standard benchmark developed by the Embedded Microprocessor Benchmark Consortium (EEMBC) to evaluate the performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems. It serves as a simple, portable tool for measuring core CPU efficiency, largely independent of broader system characteristics beyond the processor core itself. The primary purpose of CoreMark is to deliver a single, standardized metric focused on workloads common in embedded applications, facilitating fair and repeatable comparisons across diverse platforms and compilers. By isolating core performance from external factors such as memory subsystems and I/O operations, it provides a quick indicator of processing capability suitable for first-pass evaluations. Key design principles emphasize hardware performance over compiler optimizations, achieved through realistic mixtures of read/write, integer, and control operations that prevent pre-computable results and misleading synthetic metrics. The benchmark excludes external library calls, relying on compact ANSI C code limited to about 16 KB in size, so that scores reflect the processor's intrinsic abilities without dependencies on system-specific features. Introduced to overcome limitations in legacy benchmarks, CoreMark simulates practical embedded tasks involving data manipulation and control flow, such as list processing and matrix operations.

Development History

CoreMark was developed in 2009 by Shay Gal-On, then director of software development at the Embedded Microprocessor Benchmark Consortium (EEMBC), to address the limitations of existing embedded processor benchmarks and provide a modern standard for measuring CPU performance in resource-constrained environments. The benchmark emerged as a response to the inadequacies of older metrics such as Dhrystone, which had become outdated for contemporary embedded systems. EEMBC launched CoreMark 1.0 on June 1, 2009, making it the organization's first openly available benchmark, distributed freely under a permissive license to encourage widespread adoption and porting across diverse architectures. The initial release was accompanied by porting documentation and webinar resources from September 2009 to facilitate implementation on various platforms while adhering to strict execution rules for consistent scoring. By 2010, CoreMark had surpassed 2,000 downloads, establishing it as a de facto industry standard for embedded performance evaluation. A further milestone came in 2012, when EEMBC reported that nearly 8,000 users had downloaded the benchmark, underscoring its role in standardizing performance measurement under EEMBC's oversight. EEMBC also integrated CoreMark into its certification processes through a dedicated lab, ensuring verified scores for official submissions and promoting reliability across processor vendors. CoreMark's evolution continued with ongoing updates to enhance portability, including adaptations for new compiler toolchains and architectures, while maintaining its core simplicity. In 2015, EEMBC introduced CoreMark-Pro as an advanced variant incorporating multiple workloads for more comprehensive testing of multi-core and higher-end processors, though the original CoreMark remained the baseline for basic embedded benchmarking.
In October 2023, EEMBC merged with the Standard Performance Evaluation Corporation (SPEC) to form SPEC's Embedded Group, which continues to maintain CoreMark, certify scores, and support periodic refinements for evolving embedded ecosystems, solidifying its position as a standardized tool for embedded performance assessment.

Methodology

Algorithms Used

CoreMark employs three distinct algorithms designed to evaluate key aspects of embedded processor performance, focusing exclusively on integer operations without floating-point or input/output dependencies. These algorithms—list processing, matrix manipulation, and a state machine—use fixed-size datasets to ensure portability across architectures and compilers, with a 16-bit cyclic redundancy check (CRC) integrated for self-verification of outputs.

The list-processing algorithm simulates common memory-access patterns in embedded applications by performing find and sort operations on a linked list. It operates on a number of elements determined by the pointer size and the available memory block, consisting of list headers and data items, where the list is initialized with a mix of sequential and non-sequential pointers to exercise cache behavior and pointer manipulation. Operations include searching for specific values, reversing the list structure, and sorting on 16-bit data values and indices, all without relying on dynamic allocation, to maintain portability. This targets the pointer-heavy computations and data handling typical of resource-constrained systems. A CRC is computed on the list data to verify integrity.

Matrix manipulation assesses arithmetic throughput and loop efficiency by executing multiply operations on small matrices of fixed dimensions derived from the memory block. It uses two 16-bit input matrices and one 32-bit output matrix, with operations involving constants, vectors, or full matrices, including bit extractions to exercise data handling. These computations mimic common numeric tasks in embedded devices, emphasizing efficient integer math without specialized instructions unless naturally supported by the architecture. The fixed matrix sizes prevent variability in execution time across platforms, and a CRC is applied to the results for validation.

The state machine algorithm models the control logic found in embedded systems, such as protocol parsing, by implementing a simple finite state machine that scans a byte stream to detect valid comma-separated numbers. It transitions through nine states, counting visits and restoring any deliberately corrupted data, which tests branch prediction and conditional execution. The input stream is initialized at run time to a fixed size, ensuring a consistent workload while evaluating the overhead of integer-based decision-making. A CRC verifies the state machine outputs.

The three algorithms are executed sequentially in each iteration, and the benchmark runs multiple iterations until the total execution time reaches at least 10 seconds to ensure statistical reliability; the combined iteration count forms the basis for the final score.
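
The flavor of the state-machine workload can be sketched with a deliberately simplified scanner. The states, function name, and input format below are illustrative only; the benchmark itself uses nine states and also records every transition:

```c
#include <ctype.h>

/* Simplified sketch in the spirit of CoreMark's state-machine workload:
 * scan a comma-separated byte stream and count the fields that form
 * valid (optionally signed) integers. Names and states are illustrative. */
enum scan_state { FIELD_START, SAW_SIGN, IN_NUMBER, IN_INVALID };

int count_valid_int_fields(const char *s)
{
    enum scan_state st = FIELD_START;
    int valid = 0;

    for (;; s++) {
        char c = *s;
        if (c == ',' || c == '\0') {           /* end of one field */
            if (st == IN_NUMBER)
                valid++;
            st = FIELD_START;
            if (c == '\0')
                return valid;
        } else {
            switch (st) {                      /* branchy transitions */
            case FIELD_START:
                if (c == '+' || c == '-')
                    st = SAW_SIGN;
                else
                    st = isdigit((unsigned char)c) ? IN_NUMBER : IN_INVALID;
                break;
            case SAW_SIGN:
                st = isdigit((unsigned char)c) ? IN_NUMBER : IN_INVALID;
                break;
            case IN_NUMBER:
                if (!isdigit((unsigned char)c))
                    st = IN_INVALID;
                break;
            case IN_INVALID:
                break;                         /* sink until next comma */
            }
        }
    }
}
```

Dense, data-dependent switch statements like this are hard for a compiler to precompute and stress branch prediction, which is exactly what the workload probes.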

Execution and Porting Rules

Porting CoreMark requires adapting the benchmark to the target platform solely through the platform-specific porting-layer files—core_portme.h, core_portme.c, and core_portme.mak—without altering the core algorithm logic in files such as core_main.c, core_list_join.c, core_matrix.c, core_state.c, or core_util.c. This preserves portability across diverse embedded systems while maintaining the benchmark's integrity, as the implementation relies exclusively on standard ANSI C with integer arithmetic and prohibits external libraries and floating-point operations. For example, the matrix manipulation and list-processing routines must remain unchanged, preserving their original integer-based computations.

During execution, CoreMark runs in an iterative loop, performing the core algorithms repeatedly until accumulating at least 10 seconds of wall-clock time to ensure statistical reliability, though longer runs (e.g., 30 seconds) are recommended for precision. The benchmark uses a default memory block of 2000 bytes, which can be scaled, along with the iteration count, to suit platforms of differing capability. Comprehensive reporting is mandatory, detailing the compiler version, optimization flags, platform specifications (e.g., clock speed and memory configuration), and any parallel-execution details to enable fair comparisons.

Validation is enforced through built-in cyclic redundancy check (CRC) mechanisms that compute expected checksums for the list-join, matrix-multiplication, and state-machine components using predefined seed values (e.g., 0, 0, 0x66 or 0x3415, 0x3415, 0x66) and buffer sizes; discrepancies indicate invalid implementations or illicit optimizations that bypass the computational work. For official certification, vendors must submit their results, along with build details, to the EEMBC lab, where they undergo rigorous review to confirm adherence to the run rules.

Time measurement relies on platform-specific timers integrated into the porting layer (e.g., via the start_time() and stop_time() functions), capturing elapsed wall-clock time to derive iteration rates, with scores often also reported normalized by clock frequency (CoreMark/MHz) to mitigate biases from differing clock speeds. This normalization facilitates architecture-level comparisons that focus on efficiency rather than absolute speed.
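
As a concrete illustration, a hosted (non-bare-metal) port might implement the timing hooks with the C library's clock(). This is a minimal sketch of the porting layer's shape under that assumption, not EEMBC's reference code, and the CORE_TICKS definition shown is only one of the forms the layer allows:

```c
#include <time.h>

/* Minimal hosted sketch of CoreMark's core_portme.c timing hooks.
 * A bare-metal MCU port would read a hardware timer or SysTick
 * counter here instead of the C library clock(). */
typedef clock_t CORE_TICKS;

static CORE_TICKS start_time_val, stop_time_val;

void start_time(void) { start_time_val = clock(); }
void stop_time(void)  { stop_time_val  = clock(); }

/* Raw elapsed ticks between start_time() and stop_time(). */
CORE_TICKS get_time(void)
{
    return (CORE_TICKS)(stop_time_val - start_time_val);
}

/* Convert ticks to seconds; the score is iterations / time_in_secs(). */
double time_in_secs(CORE_TICKS ticks)
{
    return (double)ticks / CLOCKS_PER_SEC;
}
```

Keeping all timer details behind these four hooks is what lets the timed core code stay byte-for-byte identical across platforms.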

Scoring System

The CoreMark score is calculated as the total number of iterations completed divided by the execution time in seconds, where each iteration represents one complete pass through the benchmark's workloads. This yields iterations per second, a direct measure of throughput, while the requirement that the benchmark run for a minimum duration—typically at least 10 seconds—minimizes timing inaccuracies. To facilitate comparisons across processors operating at different clock speeds, scores are often normalized as CoreMark/MHz by dividing the raw score by the processor's frequency in megahertz. Because compiler optimizations can influence results, reports must include the compiler version and flags used, though the primary emphasis remains on hardware performance rather than software tuning. CoreMark deliberately produces a single-number score for straightforward reporting and comparison, with full disclosure of parameters such as seed values (e.g., 0, 0, 0x66), buffer sizes (e.g., 2000 bytes), and platform specifics required for reproducibility. Certified scores, verified by the EEMBC Certification Lab, undergo rigorous analysis to confirm adherence to the official run and reporting rules, distinguishing them from self-reported results. Unlike Dhrystone's DMIPS, the score is an absolute iteration rate rather than a ratio against a reference machine, so higher values directly indicate higher throughput.
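
The arithmetic is simple enough to state as code; the helper names below are illustrative, not part of the benchmark:

```c
/* CoreMark score arithmetic (illustrative helper names):
 * raw score = iterations per second, commonly also reported
 * normalized per MHz for cross-clock comparisons. */
double coremark_score(unsigned long iterations, double total_secs)
{
    return (double)iterations / total_secs;
}

double coremark_per_mhz(double score, double clock_mhz)
{
    return score / clock_mhz;
}
```

For example, 34,000 iterations completed in 10 seconds on a 1 GHz part gives 3,400 CoreMarks, or 3.4 CoreMark/MHz.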

Comparisons with Other Benchmarks

Versus Dhrystone

Dhrystone, developed in the 1980s, suffers from several limitations that undermine its reliability as a benchmark for modern embedded processors. Its code base is outdated and synthetic, making it highly susceptible to aggressive compiler optimizations, such as loop unrolling or inlining, which can bypass intended computational work and inflate scores disproportionately compared to real application performance. Additionally, Dhrystone heavily relies on string library functions like strcmp() and strcpy(), which account for 10-20% of execution time and primarily measure library optimization rather than core processor capabilities. Its scoring in DMIPS (Dhrystone MIPS) or VAX-equivalent MIPS is based on an obsolete reference machine (VAX 11/750), resulting in non-intuitive metrics that do not reflect contemporary embedded workloads involving real-time control or data processing. CoreMark was explicitly developed by EEMBC in 2009 as a replacement for Dhrystone to address these vulnerabilities, particularly its exposure to compiler tweaks that distort meaningful performance evaluation. Unlike Dhrystone, CoreMark employs non-optimizable algorithms—such as list processing, matrix manipulation, state machine behaviors, and CRC computations—that are driven by runtime values and include self-verification checks to prevent code elimination or pre-computation by compilers. It excludes external library calls from the timed execution portion, ensuring that all measured code is self-contained and focused on intrinsic processor performance. Furthermore, CoreMark uses an iteration-based scoring system (iterations per second, normalized to CoreMark/MHz) that emphasizes portable integer workloads representative of embedded tasks like data manipulation and control logic, without relying on architecture-specific conversions. 
These design choices highlight the fundamental difference between the benchmarks: Dhrystone prioritizes synthetic, MIPS-like metrics that vary with tools and hardware, whereas CoreMark delivers hardware-centric, directly comparable scores better aligned with the needs of modern embedded systems. By enforcing strict run and reporting rules—such as requiring exact compiler versions and flags—CoreMark ensures reproducibility and fairness, closing the standardization gaps that plague Dhrystone results.

Versus Other Embedded Benchmarks

CoreMark-Pro extends the original CoreMark benchmark with additional workloads, including floating-point operations and multi-threaded execution, providing broader coverage of processor capabilities beyond basic integer performance. While CoreMark consists of a single workload with four components—list processing, matrix manipulation, state machine operations, and CRC calculations—CoreMark-Pro adds five integer workloads (JPEG compression, ZIP compression, XML parsing, SHA-256 hashing, and a more memory-intensive variant of the original CoreMark) and four floating-point workloads (a fast Fourier transform, linear algebra derived from LINPACK, improved Livermore loops, and a neural-network algorithm). This expansion allows CoreMark-Pro to evaluate memory subsystems and a broader range of performance characteristics in 32-bit and 64-bit microprocessors, in contrast to CoreMark's emphasis on core pipeline efficiency in simpler devices from 8-bit to 64-bit. The original CoreMark remains preferable for basic testing owing to its small footprint—roughly 16 KB of code and 2 KB of data—which makes it simpler to port and execute on resource-limited microcontrollers.

In comparison to EEMBC's ULPMark suite, particularly ULPMark-CoreMark, CoreMark prioritizes raw performance without incorporating power-consumption analysis. ULPMark-CoreMark builds directly on CoreMark by measuring iterations per millijoule, integrating energy alongside performance through optimized configurations at varying voltages, such as a performance-focused mode and energy-efficient modes at the device's lowest voltage or a 3 V baseline. ULPMark thus provides a holistic view of active-power efficiency for ultra-low-power MCUs, whereas CoreMark focuses exclusively on computational throughput in iterations per second, suitable where power profiling is not required.

Relative to academic and industry alternatives such as MiBench, CoreMark offers superior portability and a minimal resource footprint tailored to microcontrollers. MiBench, a suite of application-derived benchmarks simulating embedded workloads in categories such as automotive, network, and consumer applications, demands more complex setups involving larger codebases and dependencies, making it less suited to constrained MCU environments. CoreMark, by contrast, employs a single synthetic program with straightforward algorithms that avoid library calls such as malloc, ensuring easy porting across architectures and emphasizing single-threaded core speed over detailed application simulation.

As of 2025, Embench represents another open-source alternative, with versions such as Embench IoT 2.0 and Embench DSP 1.0 focusing on realistic, portable workloads for modern IoT and digital-signal-processing applications. Unlike CoreMark's synthetic algorithms, Embench draws from real-world applications to better reflect connected-device behavior while maintaining low resource demands; CoreMark's standardized, certified scoring nonetheless keeps it a preferred choice for simple core-performance comparisons in industry.

Unlike the SPEC CPU suite, which targets desktops and servers and evaluates compute-intensive workloads, CoreMark is optimized for resource-constrained systems through its lightweight design and focus on MCU-relevant operations. SPEC CPU comprises diverse integer and floating-point tests that require substantial memory and run-time resources, often exceeding the capabilities of typical embedded devices, whereas CoreMark's small code size and avoidance of complex I/O or threading enable reliable execution on low-end microcontrollers without specialized setups.

Adoption and Results

Usage in Industry

CoreMark has been widely adopted by semiconductor vendors for evaluating and marketing microcontroller units (MCUs) in embedded systems, with major vendors submitting certified scores to EEMBC for validation and comparison. These vendors use CoreMark to demonstrate processor performance in product datasheets and technical specifications, enabling fair comparisons across architectures from 8-bit to 64-bit devices. The benchmark's strict porting rules, which require a platform-neutral implementation, have facilitated this broad uptake by ensuring consistent and verifiable results across vendor ecosystems. In practice, CoreMark supports processor selection during the development of consumer devices, automotive electronic control units (ECUs), and other embedded products, where it provides a simple metric for assessing computational efficiency in resource-constrained environments. For instance, Renesas integrates CoreMark into its MCU portfolios for automotive and industrial applications, using it to highlight performance in control and data-processing tasks. Integration into vendor compiler toolchains and SDKs allows automated testing and optimization during software development. CoreMark is often combined with power-consumption metrics in benchmarks such as ULPMark-CoreMark, which extends its utility to low-power designs by measuring energy efficiency in active scenarios, a critical factor for battery-operated and wearable consumer products. The open-source release of CoreMark in 2009, with ongoing maintenance by EEMBC, has enabled its use in academic research and custom extensions, while more than 800 scores—many certified—have been submitted to EEMBC since then (as of November 2025, 858 scores are listed), influencing iterative improvements in chip designs across the industry.

Example Performance Scores

CoreMark performance scores vary significantly across processor architectures, influenced primarily by clock speed, pipeline depth, and cache size. For instance, the Arm Cortex-M0 achieves an official rating of 2.33 CoreMark/MHz, while the Cortex-M0+ improves to 2.46 CoreMark/MHz, reflecting enhancements in instruction-execution efficiency. The Cortex-M4, with its more advanced pipeline and optional floating-point unit, attains 3.54 CoreMark/MHz, enabling higher throughput in signal-processing tasks. Higher-end processors such as the Cortex-A series can exceed 10 CoreMark/MHz under optimized conditions; for example, the Allwinner H616, based on Cortex-A53 cores, reports 13.10 CoreMark/MHz at 1.5 GHz. These per-MHz figures allow comparisons independent of clock frequency, though total scores scale with it—for a typical 100 MHz MCU with a Cortex-M4 core, total CoreMarks might range from roughly 300 to 400, depending on implementation details such as memory-access latency.

Only scores verified by the EEMBC Certification Lab are considered official and eligible for the CoreMark logo, ensuring adherence to the strict run rules; a certified score of 721 CoreMarks has been established for devices such as certain Renesas RX family processors running at around 120 MHz. The public EEMBC database, which includes both certified and self-reported results, reveals variations due to compiler choice—for example, the Renesas RA4C1 (Cortex-M4) achieves 314 CoreMarks with Arm Compiler 6.16 but 324 CoreMarks with the IAR compiler at 80 MHz, a difference attributable to optimization levels and code-generation efficiency. GCC and IAR compilers often show 5-10% differences in similar setups, underscoring the importance of standardized reporting.

Among recent uploads to the EEMBC database (as of November 2025), top scores for advanced microcontrollers demonstrate CoreMark's scalability: the Renesas RA8T2, using a Cortex-M85 core, reaches 6,379 CoreMarks at 1 GHz (6.38 CoreMark/MHz), far surpassing entry-level devices and highlighting the gains from deeper pipelines and larger caches in modern embedded systems. These examples illustrate how CoreMark quantifies performance evolution, with the scoring system described above applied consistently to yield these iteration-based results.
Processor Example | CoreMarks/MHz | Total CoreMarks | Clock Speed | Compiler | Certified? | Source
ARM Cortex-M0 (generic) | 2.33 | N/A | N/A | N/A | Official rating | Arm Developer
Renesas RA4C1 (Cortex-M4) | 3.93 | 314 | 80 MHz | Arm Compiler 6.16 | No | EEMBC scores
Allwinner H616 (Cortex-A53) | 13.10 | 19,656 | 1.5 GHz | GCC 7.5.0 | No | EEMBC scores
Renesas RA8T2 (Cortex-M85) | 6.38 | 6,379 | 1 GHz | Arm Compiler | No | EEMBC scores
Baseline device (e.g., Renesas RX) | ~6.01 | 721 | ~120 MHz | N/A | Yes | EEMBC CoreMark
