
CoreMark

CoreMark is a standardized benchmark suite designed to evaluate the core performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems by executing a set of representative algorithms and producing a single, comparable score. Developed by the Embedded Microprocessor Benchmark Consortium (EEMBC), it was introduced in June 2009 as a free, open-source alternative to outdated benchmarks such as Dhrystone, which suffered from synthetic code, heavy compiler-optimization bias, and a lack of standardized reporting. The motivation behind its creation was to provide a meaningful, real-world metric for CPU efficiency in resource-constrained environments, focusing on common data structures and operations rather than misleading indicators such as clock speed or floating-point operations per second.

At its core, CoreMark incorporates three primary algorithms to test diverse aspects of processor functionality: list processing, which manipulates linked lists through pointer operations, sorting, and reversal to assess memory access and cache behavior; matrix manipulation, featuring integer matrix multiplications with constants, vectors, and other matrices to evaluate loop efficiency and instruction-set accelerators; and state machine processing, using switch and if-else statements to parse input strings and count state transitions, probing control flow and branching performance. These algorithms are self-verifying through 16-bit cyclic redundancy checks (CRCs) and require only about 2 KB of memory, making the benchmark suitable for devices ranging from 8-bit MCUs to 64-bit processors. To prevent compiler optimizations from skewing results, all data is initialized at run time, and the timed execution avoids external library calls.

Performance is quantified in CoreMarks, representing iterations completed per second, often normalized as CoreMark/MHz to account for clock-speed variations, with results requiring disclosure of compiler versions, flags, and platform details for transparency. Certified scores, verified by EEMBC's certification lab, ensure adherence to strict rules, and the benchmark has become the de facto standard for embedded CPU comparisons, with extensions like CoreMark-Pro adding parallelism and floating-point workloads, and integrations in power-efficiency suites such as ULPMark. Widely adopted by vendors, CoreMark facilitates objective evaluations in applications ranging from consumer IoT devices to automotive systems.
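
The self-verification idea can be illustrated with a small CRC-16 routine. This is a generic sketch using the common reflected polynomial 0xA001 (CRC-16/ARC); the function names are illustrative and it is not the benchmark's exact crcu8/crcu16 code, though the pattern is the same in spirit:

```c
#include <stdint.h>

/* Illustrative CRC-16 update, LSB-first with the reflected polynomial
 * 0xA001 (CRC-16/ARC). CoreMark's own CRC helpers follow the same
 * general pattern; this sketch is not the benchmark's code. */
uint16_t crc16_update(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        int carry = (data ^ crc) & 1;  /* mix the next data bit in */
        crc >>= 1;
        data >>= 1;
        if (carry)
            crc ^= 0xA001;             /* apply the polynomial */
    }
    return crc;
}

/* Run the CRC over a buffer, as the benchmark does over its list,
 * matrix, and state-machine results to catch skipped computation. */
uint16_t crc16_buf(const uint8_t *buf, int len, uint16_t crc)
{
    for (int i = 0; i < len; i++)
        crc = crc16_update(buf[i], crc);
    return crc;
}
```

Because the checksum depends on every byte of every result, a compiler cannot elide the timed work without producing a wrong CRC.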

Overview

Definition and Purpose

CoreMark is an industry-standard benchmark developed by the Embedded Microprocessor Benchmark Consortium (EEMBC) to evaluate the performance of central processing units (CPUs) and microcontrollers (MCUs) in embedded systems. It serves as a simple, portable tool for measuring core CPU efficiency, largely independent of broader system characteristics beyond the processor core itself. The primary purpose of CoreMark is to deliver a single, standardized metric focused on workloads common in embedded applications, facilitating fair and repeatable comparisons across diverse platforms and compilers. By isolating core performance from external factors such as memory subsystems and I/O operations, it provides a quick indicator of processing capability suitable for first-pass evaluations. Key design principles emphasize hardware performance over compiler optimizations, achieved through realistic mixtures of read/write, integer, and control operations that prevent pre-computable results and misleading synthetic metrics. The benchmark excludes external library calls, relying on compact ANSI C code limited to about 16 KB in size, so that scores reflect the processor's intrinsic abilities without dependencies on system-specific features. Introduced to overcome limitations in legacy benchmarks, CoreMark simulates practical embedded tasks involving data manipulation and control flow, such as list processing and matrix operations.

Development History

CoreMark was developed in 2009 by Shay Gal-On, then director of software development at the Embedded Microprocessor Benchmark Consortium (EEMBC), to address the limitations of existing embedded processor benchmarks and provide a modern standard for measuring CPU performance in resource-constrained environments. The benchmark emerged as a response to the inadequacies of older metrics such as Dhrystone, which had become outdated for contemporary embedded systems. EEMBC launched CoreMark 1.0 on June 1, 2009, making it the organization's first openly available benchmark, distributed freely under a permissive license to encourage widespread adoption and porting across diverse architectures. The initial release was accompanied by porting documentation and webinar resources from September 2009 to facilitate implementation on various platforms while adhering to strict execution rules for consistent scoring. By 2010, CoreMark had surpassed 2,000 downloads, establishing it as a de facto industry standard for embedded performance evaluation. A further milestone came in 2012, when EEMBC reported that nearly 8,000 users had downloaded the benchmark, underscoring its role in standardizing performance measurement under EEMBC's oversight. EEMBC also integrated CoreMark into its certification processes through a dedicated lab, ensuring verified scores for official submissions and promoting reliability across processor vendors. CoreMark's evolution continued with ongoing updates to enhance portability, including adaptations for new compiler toolchains and architectures, while maintaining its core simplicity. In 2015, EEMBC introduced CoreMark-Pro as an advanced variant incorporating multiple workloads for more comprehensive testing of multi-core and higher-end processors, though the original CoreMark remained the baseline for basic embedded benchmarking.
In October 2023, EEMBC merged with the Standard Performance Evaluation Corporation (SPEC) to form SPEC's Embedded Group, which continues to maintain CoreMark, certify scores, and support periodic refinements for evolving embedded ecosystems, solidifying its position as a standardized tool for embedded performance assessment.

Methodology

Algorithms Used

CoreMark employs three distinct algorithms designed to evaluate key aspects of embedded processor performance, focusing exclusively on integer operations without floating-point or input/output dependencies. These algorithms—list processing, matrix manipulation, and a state machine—use fixed-size datasets to ensure portability across architectures and compilers, with a 16-bit cyclic redundancy check (CRC) integrated for self-verification of outputs.

The list-processing algorithm simulates common memory-access patterns in embedded applications by performing find and sort operations on a linked list. It operates on a number of elements determined by the pointer size and the available memory block, consisting of list headers and data items, where the list is initialized with a mix of sequential and non-sequential pointers to exercise cache behavior and pointer manipulation. Operations include searching for specific values, reversing the list structure, and sorting on 16-bit data values and indices, all without relying on dynamic allocation, to maintain portability. This targets the pointer-heavy computations and data handling typical of resource-constrained systems. A CRC is computed on the list data to verify integrity.

Matrix manipulation assesses arithmetic throughput and loop efficiency by executing multiply operations on small matrices of fixed dimensions derived from the memory block. It uses two 16-bit input matrices and one 32-bit output matrix, with operations involving constants, vectors, or full matrices, including bit extractions to exercise data handling. These computations mimic common numeric tasks in embedded devices, emphasizing efficient integer math without specialized instructions unless naturally supported by the architecture. The fixed matrix sizes prevent variability in execution time across platforms, and a CRC is applied to the results for validation.

The state machine algorithm models the control logic found in embedded systems, such as protocol parsing, by implementing a simple finite state machine that scans a byte stream to detect valid comma-separated numbers. It transitions through nine states, counting visits and restoring any deliberately corrupted data, which tests branch prediction and conditional execution. The input stream is initialized at run time to a fixed size, ensuring a consistent workload while evaluating the overhead of integer-based decision-making. A CRC verifies the state machine outputs.

The three algorithms are executed sequentially in each iteration, and the benchmark runs multiple iterations until the total execution time reaches at least 10 seconds to ensure statistical reliability; the combined iteration count forms the basis for the final score.
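
The flavor of the state-machine workload can be sketched with a deliberately simplified scanner. The states, function name, and input format below are illustrative only; the benchmark itself uses nine states and also records every transition:

```c
#include <ctype.h>

/* Simplified sketch in the spirit of CoreMark's state-machine workload:
 * scan a comma-separated byte stream and count the fields that form
 * valid (optionally signed) integers. Names and states are illustrative. */
enum scan_state { FIELD_START, SAW_SIGN, IN_NUMBER, IN_INVALID };

int count_valid_int_fields(const char *s)
{
    enum scan_state st = FIELD_START;
    int valid = 0;

    for (;; s++) {
        char c = *s;
        if (c == ',' || c == '\0') {           /* end of one field */
            if (st == IN_NUMBER)
                valid++;
            st = FIELD_START;
            if (c == '\0')
                return valid;
        } else {
            switch (st) {                      /* branchy transitions */
            case FIELD_START:
                if (c == '+' || c == '-')
                    st = SAW_SIGN;
                else
                    st = isdigit((unsigned char)c) ? IN_NUMBER : IN_INVALID;
                break;
            case SAW_SIGN:
                st = isdigit((unsigned char)c) ? IN_NUMBER : IN_INVALID;
                break;
            case IN_NUMBER:
                if (!isdigit((unsigned char)c))
                    st = IN_INVALID;
                break;
            case IN_INVALID:
                break;                         /* sink until next comma */
            }
        }
    }
}
```

Dense, data-dependent switch statements like this are hard for a compiler to precompute and stress branch prediction, which is exactly what the workload probes.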

Execution and Porting Rules

Porting CoreMark requires adapting the benchmark to the target platform solely through the platform-specific porting-layer files—core_portme.h, core_portme.c, and core_portme.mak—without altering the core algorithm logic in files such as core_main.c, core_list_join.c, core_matrix.c, core_state.c, or core_util.c. This preserves portability across diverse embedded systems while maintaining the benchmark's integrity, as the implementation relies exclusively on standard ANSI C with integer arithmetic and prohibits external libraries and floating-point operations. For example, the matrix manipulation and list-processing routines must remain unchanged, preserving their original integer-based computations.

During execution, CoreMark runs in an iterative loop, performing the core algorithms repeatedly until accumulating at least 10 seconds of wall-clock time to ensure statistical reliability, though longer runs (e.g., 30 seconds) are recommended for precision. The benchmark uses a default memory block of 2000 bytes, which can be scaled, along with the iteration count, to suit platforms of differing capability. Comprehensive reporting is mandatory, detailing the compiler version, optimization flags, platform specifications (e.g., clock speed and memory configuration), and any parallel-execution details to enable fair comparisons.

Validation is enforced through built-in cyclic redundancy check (CRC) mechanisms that compute expected checksums for the list-join, matrix-multiplication, and state-machine components using predefined seed values (e.g., 0, 0, 0x66 or 0x3415, 0x3415, 0x66) and buffer sizes; discrepancies indicate invalid implementations or illicit optimizations that bypass the computational work. For official certification, vendors must submit their results, along with build details, to the EEMBC lab, where they undergo rigorous review to confirm adherence to the run rules.

Time measurement relies on platform-specific timers integrated into the porting layer (e.g., via the start_time() and stop_time() functions), capturing elapsed wall-clock time to derive iteration rates, with scores often also reported normalized by clock frequency (CoreMark/MHz) to mitigate biases from differing clock speeds. This normalization facilitates architecture-level comparisons that focus on efficiency rather than absolute speed.
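
As a concrete illustration, a hosted (non-bare-metal) port might implement the timing hooks with the C library's clock(). This is a minimal sketch of the porting layer's shape under that assumption, not EEMBC's reference code, and the CORE_TICKS definition shown is only one of the forms the layer allows:

```c
#include <time.h>

/* Minimal hosted sketch of CoreMark's core_portme.c timing hooks.
 * A bare-metal MCU port would read a hardware timer or SysTick
 * counter here instead of the C library clock(). */
typedef clock_t CORE_TICKS;

static CORE_TICKS start_time_val, stop_time_val;

void start_time(void) { start_time_val = clock(); }
void stop_time(void)  { stop_time_val  = clock(); }

/* Raw elapsed ticks between start_time() and stop_time(). */
CORE_TICKS get_time(void)
{
    return (CORE_TICKS)(stop_time_val - start_time_val);
}

/* Convert ticks to seconds; the score is iterations / time_in_secs(). */
double time_in_secs(CORE_TICKS ticks)
{
    return (double)ticks / CLOCKS_PER_SEC;
}
```

Keeping all timer details behind these four hooks is what lets the timed core code stay byte-for-byte identical across platforms.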

Scoring System

The CoreMark score is calculated as the total number of iterations completed divided by the execution time in seconds, where each iteration represents one complete pass through the benchmark's workloads. This yields iterations per second, a direct measure of throughput, while the requirement that the benchmark run for a minimum duration—typically at least 10 seconds—minimizes timing inaccuracies. To facilitate comparisons across processors operating at different clock speeds, scores are often normalized as CoreMark/MHz by dividing the raw score by the processor's frequency in megahertz. Because compiler optimizations can influence results, reports must include the compiler version and flags used, though the primary emphasis remains on hardware performance rather than software tuning. CoreMark deliberately produces a single-number score for straightforward reporting and comparison, with full disclosure of parameters such as seed values (e.g., 0, 0, 0x66), buffer sizes (e.g., 2000 bytes), and platform specifics required for reproducibility. Certified scores, verified by the EEMBC Certification Lab, undergo rigorous analysis to confirm adherence to the official run and reporting rules, distinguishing them from self-reported results. Unlike Dhrystone's DMIPS, the score is an absolute iteration rate rather than a ratio against a reference machine, so higher values directly indicate higher throughput.
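
The arithmetic is simple enough to state as code; the helper names below are illustrative, not part of the benchmark:

```c
/* CoreMark score arithmetic (illustrative helper names):
 * raw score = iterations per second, commonly also reported
 * normalized per MHz for cross-clock comparisons. */
double coremark_score(unsigned long iterations, double total_secs)
{
    return (double)iterations / total_secs;
}

double coremark_per_mhz(double score, double clock_mhz)
{
    return score / clock_mhz;
}
```

For example, 34,000 iterations completed in 10 seconds on a 1 GHz part gives 3,400 CoreMarks, or 3.4 CoreMark/MHz.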

Comparisons with Other Benchmarks

Versus Dhrystone

Dhrystone, developed in the 1980s, suffers from several limitations that undermine its reliability as a benchmark for modern embedded processors. Its code base is outdated and synthetic, making it highly susceptible to aggressive compiler optimizations, such as loop unrolling or inlining, which can bypass intended computational work and inflate scores disproportionately compared to real application performance. Additionally, Dhrystone heavily relies on string library functions like strcmp() and strcpy(), which account for 10-20% of execution time and primarily measure library optimization rather than core processor capabilities. Its scoring in DMIPS (Dhrystone MIPS) or VAX-equivalent MIPS is based on an obsolete reference machine (VAX 11/750), resulting in non-intuitive metrics that do not reflect contemporary embedded workloads involving real-time control or data processing. CoreMark was explicitly developed by EEMBC in 2009 as a replacement for Dhrystone to address these vulnerabilities, particularly its exposure to compiler tweaks that distort meaningful performance evaluation. Unlike Dhrystone, CoreMark employs non-optimizable algorithms—such as list processing, matrix manipulation, state machine behaviors, and CRC computations—that are driven by runtime values and include self-verification checks to prevent code elimination or pre-computation by compilers. It excludes external library calls from the timed execution portion, ensuring that all measured code is self-contained and focused on intrinsic processor performance. Furthermore, CoreMark uses an iteration-based scoring system (iterations per second, normalized to CoreMark/MHz) that emphasizes portable integer workloads representative of embedded tasks like data manipulation and control logic, without relying on architecture-specific conversions. 
These design choices highlight the fundamental difference between the benchmarks: Dhrystone prioritizes synthetic, MIPS-like metrics that vary with tools and hardware, whereas CoreMark delivers hardware-centric, directly comparable scores better aligned with the needs of modern embedded systems. By enforcing strict run and reporting rules—such as requiring exact compiler versions and flags—CoreMark ensures reproducibility and fairness, closing the standardization gaps that plague Dhrystone results.

Versus Other Embedded Benchmarks

CoreMark-Pro extends the original CoreMark benchmark with additional workloads, including floating-point operations and multi-threaded execution, providing broader coverage of processor capabilities beyond basic integer performance. While CoreMark consists of a single workload with four components—list processing, matrix manipulation, state machine operations, and CRC calculations—CoreMark-Pro adds five integer workloads (JPEG compression, ZIP compression, XML parsing, SHA-256 hashing, and a more memory-intensive variant of the original CoreMark) and four floating-point workloads (a fast Fourier transform, linear algebra derived from LINPACK, improved Livermore loops, and a neural-network algorithm). This expansion allows CoreMark-Pro to evaluate memory subsystems and a broader range of performance characteristics in 32-bit and 64-bit microprocessors, in contrast to CoreMark's emphasis on core pipeline efficiency in simpler devices from 8-bit to 64-bit. The original CoreMark remains preferable for basic testing owing to its small footprint—roughly 16 KB of code and 2 KB of data—which makes it simpler to port and execute on resource-limited microcontrollers.

In comparison to EEMBC's ULPMark suite, particularly ULPMark-CoreMark, CoreMark prioritizes raw performance without incorporating power-consumption analysis. ULPMark-CoreMark builds directly on CoreMark by measuring iterations per millijoule, integrating energy alongside performance through optimized configurations at varying voltages, such as a performance-focused mode and energy-efficient modes at the device's lowest voltage or a 3 V baseline. ULPMark thus provides a holistic view of active-power efficiency for ultra-low-power MCUs, whereas CoreMark focuses exclusively on computational throughput in iterations per second, suitable where power profiling is not required.

Relative to academic and industry alternatives such as MiBench, CoreMark offers superior portability and a minimal resource footprint tailored to microcontrollers. MiBench, a suite of application-derived benchmarks simulating embedded workloads in categories such as automotive, network, and consumer applications, demands more complex setups involving larger codebases and dependencies, making it less suited to constrained MCU environments. CoreMark, by contrast, employs a single synthetic program with straightforward algorithms that avoid library calls such as malloc, ensuring easy porting across architectures and emphasizing single-threaded core speed over detailed application simulation.

As of 2025, Embench represents another open-source alternative, with versions such as Embench IoT 2.0 and Embench DSP 1.0 focusing on realistic, portable workloads for modern IoT and digital-signal-processing applications. Unlike CoreMark's synthetic algorithms, Embench draws from real-world applications to better reflect connected-device behavior while maintaining low resource demands; CoreMark's standardized, certified scoring nonetheless keeps it a preferred choice for simple core-performance comparisons in industry.

Unlike the SPEC CPU suite, which targets desktops and servers and evaluates compute-intensive workloads, CoreMark is optimized for resource-constrained systems through its lightweight design and focus on MCU-relevant operations. SPEC CPU comprises diverse integer and floating-point tests that require substantial memory and run-time resources, often exceeding the capabilities of typical embedded devices, whereas CoreMark's small code size and avoidance of complex I/O or threading enable reliable execution on low-end microcontrollers without specialized setups.

Adoption and Results

Usage in Industry

CoreMark has been widely adopted by semiconductor vendors for evaluating and marketing microcontroller units (MCUs) in embedded systems, with major vendors submitting certified scores to EEMBC for validation and comparison. These vendors use CoreMark to demonstrate processor performance in product datasheets and technical specifications, enabling fair comparisons across architectures from 8-bit to 64-bit devices. The benchmark's strict porting rules, which require a platform-neutral implementation, have facilitated this broad uptake by ensuring consistent and verifiable results across vendor ecosystems. In practice, CoreMark supports processor selection during the development of consumer devices, automotive electronic control units (ECUs), and other embedded products, where it provides a simple metric for assessing computational efficiency in resource-constrained environments. For instance, Renesas integrates CoreMark into its MCU portfolios for automotive and industrial applications, using it to highlight performance in control and data-processing tasks. Integration into vendor compiler toolchains and SDKs allows automated testing and optimization during software development. CoreMark is often combined with power-consumption metrics in benchmarks such as ULPMark-CoreMark, which extends its utility to low-power designs by measuring energy efficiency in active scenarios, a critical factor for battery-operated and wearable consumer products. The open-source release of CoreMark in 2009, with ongoing maintenance by EEMBC, has enabled its use in academic research and custom extensions, while more than 800 scores—many certified—have been submitted to EEMBC since then (as of November 2025, 858 scores are listed), influencing iterative improvements in chip designs across the industry.

Example Performance Scores

CoreMark performance scores vary significantly across processor architectures, influenced primarily by clock speed, pipeline depth, and cache size. For instance, the Arm Cortex-M0 achieves an official rating of 2.33 CoreMark/MHz, while the Cortex-M0+ improves to 2.46 CoreMark/MHz, reflecting enhancements in instruction-execution efficiency. The Cortex-M4, with its more advanced pipeline and optional floating-point unit, attains 3.54 CoreMark/MHz, enabling higher throughput in signal-processing tasks. Higher-end processors such as the Cortex-A series can exceed 10 CoreMark/MHz under optimized conditions; for example, the Allwinner H616, based on Cortex-A53 cores, reports 13.10 CoreMark/MHz at 1.5 GHz. These per-MHz figures allow comparisons independent of clock frequency, though total scores scale with it—for a typical 100 MHz MCU with a Cortex-M4 core, total CoreMarks might range from roughly 300 to 400, depending on implementation details such as memory-access latency.

Only scores verified by the EEMBC Certification Lab are considered official and eligible for the CoreMark logo, ensuring adherence to the strict run rules; a certified score of 721 CoreMarks has been established for devices such as certain Renesas RX family processors running at around 120 MHz. The public EEMBC database, which includes both certified and self-reported results, reveals variations due to compiler choice—for example, the Renesas RA4C1 (Cortex-M4) achieves 314 CoreMarks with Arm Compiler 6.16 but 324 CoreMarks with the IAR compiler at 80 MHz, a difference attributable to optimization levels and code-generation efficiency. GCC and IAR compilers often show 5-10% differences in similar setups, underscoring the importance of standardized reporting.

Among recent uploads to the EEMBC database (as of November 2025), top scores for advanced microcontrollers demonstrate CoreMark's scalability: the Renesas RA8T2, using a Cortex-M85 core, reaches 6,379 CoreMarks at 1 GHz (6.38 CoreMark/MHz), far surpassing entry-level devices and highlighting the gains from deeper pipelines and larger caches in modern embedded systems. These examples illustrate how CoreMark quantifies performance evolution, with the scoring system described above applied consistently to yield these iteration-based results.
Processor Example | CoreMarks/MHz | Total CoreMarks | Clock Speed | Compiler | Certified? | Source
ARM Cortex-M0 (generic) | 2.33 | N/A | N/A | N/A | Official rating | Arm Developer
Renesas RA4C1 (Cortex-M4) | 3.93 | 314 | 80 MHz | Arm Compiler 6.16 | No | EEMBC scores
Allwinner H616 (Cortex-A53) | 13.10 | 19,656 | 1.5 GHz | GCC 7.5.0 | No | EEMBC scores
Renesas RA8T2 (Cortex-M85) | 6.38 | 6,379 | 1 GHz | Arm Compiler | No | EEMBC scores
Baseline device (e.g., Renesas RX) | ~6.01 | 721 | ~120 MHz | N/A | Yes | EEMBC CoreMark
