ARM Cortex-A9

The ARM Cortex-A9 is a high-performance, power-efficient 32-bit processor core developed by ARM, implementing the ARMv7-A architecture and designed for embedded applications in low-power, thermally constrained, and cost-sensitive devices. Introduced on March 31, 2008, with its initial revision (r0p0), it supports the ARM, Thumb, and Thumb-2 instruction sets and can be deployed in single-core or multi-core configurations.

Key features of the Cortex-A9 include a dual-issue, partially out-of-order, 8-stage superscalar pipeline for enhanced instruction throughput, dynamic branch prediction, and configurable L1 caches of 16 KB, 32 KB, or 64 KB per core, with support for an optional unified L2 cache of up to 8 MB. It incorporates the ARMv7 memory management unit (MMU) for virtual memory handling, TrustZone security extensions for protected execution environments, and optional NEON Advanced SIMD and Vector Floating-Point (VFPv3) units for media and floating-point acceleration. The multiprocessor variant, known as Cortex-A9 MPCore, scales to four cores, with cache coherency maintained by a Snoop Control Unit (SCU) and extended to external masters through the Accelerator Coherency Port (ACP), facilitating symmetric multiprocessing (SMP) in systems requiring parallel performance.

In terms of performance, the Cortex-A9 delivers more than 50% higher single-core performance than its predecessor, the Cortex-A8, while maintaining low power consumption suitable for battery-operated devices; it also integrates CoreSight components for debug and trace. Widely deployed since its launch, the core powers applications in smartphones, digital TVs, and enterprise systems, with notable implementations in SoCs such as the NVIDIA Tegra 2, SPEAr1300, and OMAP4. Its maturity and its availability as either speed-optimized or power-optimized IP made it a foundational choice for ARM-based systems-on-chip (SoCs) in the late 2000s and early 2010s.

Introduction and History

Development Timeline

The Cortex-A9 was developed by ARM as part of the ARMv7-A architecture family, succeeding the single-core Cortex-A8 and emphasizing multi-core scalability to address increasing performance needs in mobile devices. ARM officially announced the Cortex-A9 single-core and MPCore multi-core processors on October 8, 2007, at the ARM Developers' Conference in Santa Clara, California, highlighting their support for up to four cache-coherent cores based on the ARMv7 instruction set. The initial processor release occurred in 2008, with first silicon samples becoming available in late 2009; early demonstrations included ST-Ericsson's multiprocessing implementation running an operating system at a private event in February 2009. Commercial availability began in 2010, as volume shipments of Cortex-A9-based silicon entered multiple market segments, including smartphones and embedded systems. Partnerships such as the one with ST-Ericsson enabled rapid adoption through early implementations like the U8500 platform.

Position in ARM Portfolio

The ARM Cortex-A9 serves as a high-performance, out-of-order processor core within the ARMv7-A architecture profile, designed for applications processors in devices that require robust computational capability while maintaining power efficiency. It introduced partial out-of-order execution to the ARM portfolio, a significant advance over its predecessor, the Cortex-A8, which relied on an in-order pipeline and was limited to single-core implementations. The Cortex-A9 also supported multi-core configurations, paving the way for its successor, the Cortex-A15, which further refined out-of-order processing with a wider superscalar design for higher performance demands.

Targeted at markets such as smartphones, tablets, and embedded systems, the Cortex-A9 balanced high performance with low power consumption, making it suitable for thermally constrained and cost-sensitive environments where multimedia and general-purpose computing were key. Within the broader ARMv7-A family, it was positioned above lower-power options such as the Cortex-A5, optimized for minimal area and energy use in basic embedded tasks, and the Cortex-A7, which focused on efficiency for entry-level devices with performance comparable to the A9 in a smaller footprint.

ARM offered the Cortex-A9 under a flexible licensing model, providing it either as synthesizable intellectual property (IP) in RTL format for custom integration across various process nodes, or as pre-optimized hard macros tailored for specific processes to accelerate time-to-market. This approach enabled scalability, including dual-core configurations, to meet diverse system requirements without overhauling the core design.

Core Architecture

Processor Microarchitecture

The ARM Cortex-A9 processor employs an out-of-order superscalar microarchitecture to deliver high performance, implementing the ARMv7-A architecture with support for the Thumb-2 instruction set for efficient code density. The design incorporates dynamic scheduling, allowing instructions to execute out of program order when dependencies permit, which maximizes resource utilization and reduces stalls in the execution pipeline. The integer pipeline consists of up to 8 stages, balancing instruction throughput against the power and area constraints typical of ARM's application processors.

A key aspect of the design is its support for dual issue, in which up to two instructions can be dispatched per cycle from a variable-length decoder that processes the mixed 16- and 32-bit Thumb-2 encodings. The partially out-of-order model applies primarily to integer execution, with load/store operations also benefiting from dynamic reordering to overlap memory accesses. Branch prediction uses a 2-level dynamic predictor built around a Global History Buffer (GHB) configurable to 1024, 2048, 4096, 8192, or 16384 entries, together with a Branch Target Address Cache (BTAC) and a return stack, to anticipate branch outcomes and reduce misprediction penalties.

The core can be configured as a single core or in multi-core setups, such as the dual-core variant of the Cortex-A9 MPCore, where coherency between cores is maintained by the Snoop Control Unit and external accesses use the AMBA AXI interconnect protocol. This flexibility lets designers tailor the processor to varying performance needs while integrating with AMBA-based system buses for instruction, data, and peripheral access.
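The general idea behind a 2-level dynamic predictor with a global history can be sketched in a few lines of Python. This is a textbook-style illustration, not the Cortex-A9's actual implementation: the class name, the XOR indexing of the table by program counter and history, and the 2-bit saturating counters are all assumptions chosen for clarity. Only the table sizes (1024 to 16384 entries) come from the text above.

```python
class GlobalHistoryPredictor:
    """Toy 2-level predictor: global history selects a 2-bit counter."""

    def __init__(self, entries=1024):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.counters = [1] * entries  # 0..3, start at weakly not-taken
        self.history = 0               # most recent outcomes, newest in bit 0

    def _index(self, pc):
        # Hypothetical hash: word-aligned PC XOR global history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)
```

A larger table reduces aliasing between unrelated branches at the cost of area, which is one plausible reason the entry count is left configurable.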

Pipeline and Execution Units

The ARM Cortex-A9 features an 8-stage integer pipeline designed for out-of-order execution, enabling superscalar processing with up to two instructions issued per cycle in optimal conditions. The stages are fetch, where instructions are retrieved from the instruction cache; decode, which can process up to two instructions simultaneously; rename, which applies register renaming to remove false dependencies; dispatch, which allocates instructions to the appropriate queues; issue, which schedules ready instructions to execution units; execute, which performs the computations; writeback, which returns results to the register file; and retire, which commits instructions in program order while handling exceptions. This structure supports speculative execution to minimize stalls from branches and dependencies.

The execution units include two integer arithmetic logic units (ALUs) for address calculations and general-purpose operations, a dedicated multiply-accumulate (MAC) unit, and a load/store unit capable of one load and one store operation per cycle. These units allow up to four instructions to be in execution in a cycle, including two ALU operations, one memory access, and one branch, enhancing throughput in integer workloads.

Floating-point operations are supported through an optional VFPv3 unit, which has a separate pipeline for scalar floating-point instructions compliant with IEEE 754. The VFPv3 unit achieves one double-precision multiply-accumulate operation every two cycles and supports both single- and double-precision arithmetic.

In multi-core configurations, the Snoop Control Unit (SCU) manages cache coherency by implementing a snooping protocol that keeps data consistent across up to four cores through directed snoop requests and responses. Power efficiency is enhanced by clock gating, which disables clocks to inactive pipeline stages and units, and by power gating, which allows individual cores to enter low-power states, alongside support for dynamic voltage and frequency scaling.
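The role of the rename stage can be illustrated with a minimal sketch. It assumes a hypothetical machine with 16 architectural and 32 physical registers and never frees a physical register, so it is a model of the idea rather than of the Cortex-A9's real rename logic. Each instruction is written as a tuple `(dest, src1, src2)` of architectural register numbers.

```python
def rename(instrs, num_arch=16, num_phys=32):
    """Map architectural registers to fresh physical ones per destination.

    Returns a list of (phys_dest, phys_src1, phys_src2, ...) tuples.
    Simplification: physical registers are never returned to the free list.
    """
    mapping = list(range(num_arch))           # arch reg -> current phys reg
    free = list(range(num_arch, num_phys))    # unused physical registers
    renamed = []
    for dest, *srcs in instrs:
        # Sources read the most recent mapping (true dependencies survive).
        phys_srcs = tuple(mapping[s] for s in srcs)
        if not free:
            raise RuntimeError("out of physical registers")
        phys_dest = free.pop(0)
        # A fresh destination removes write-after-write and
        # write-after-read hazards on the architectural name.
        mapping[dest] = phys_dest
        renamed.append((phys_dest, *phys_srcs))
    return renamed
```

In the sequence `[(1, 2, 3), (4, 1, 5), (1, 6, 7)]`, the third instruction reuses register 1 as a destination. After renaming it writes a different physical register from the first, so it no longer has to wait for the second instruction to read the old value.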

Memory Hierarchy

The ARM Cortex-A9 processor features a multi-level memory hierarchy optimized for high-performance embedded applications, comprising Level 1 (L1) caches tightly integrated with the core, an optional external Level 2 (L2) unified cache, a two-level translation lookaside buffer (TLB) for address translation, and a memory management unit (MMU) for virtual memory support. This design balances low-latency access with scalability in single- and multi-core configurations.

The L1 caches are Harvard-style, with separate instruction and data caches that are each configurable to 16 KB, 32 KB, or 64 KB. Both are 4-way set-associative with 32-byte cache lines. The data cache operates in write-back mode to minimize bus traffic and supports write-allocate policies for cacheable regions.

The L2 cache is a unified, external structure implemented via the ARM PrimeCell PL310 controller, configurable from 128 KB to 8 MB in 128 KB increments and typically organized as 16-way set-associative. It connects to the core through dedicated AXI master interfaces, providing shared access in multi-core setups and supporting exclusive caching modes to avoid duplication between the L1 and L2 levels.

The TLB uses a two-level structure to reduce MMU lookup overhead. The first level consists of separate micro-TLBs: a 32-entry fully associative data micro-TLB and a configurable 32- or 64-entry instruction micro-TLB. The second-level main TLB is unified for instructions and data, implemented as a configurable 2-way set-associative structure of 64 to 512 entries plus four fully associative lockable entries, which allow critical translations to be retained. The MMU provides virtual-to-physical address translation and protection, supporting 4 KB small pages as the base granule, along with section (1 MB) and supersection (16 MB) mappings in the standard ARMv7 configuration.

In multi-core variants, the Cortex-A9 uses AMBA AXI interfaces (typically two 64-bit AXI masters) for all external memory accesses, with the Snoop Control Unit (SCU) ensuring coherency by snooping transactions and broadcasting invalidations across cores. This AXI-compatible setup supports system-level interconnects while maintaining low-latency coherency for up to four cores.
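The L1 geometry above (4 ways, 32-byte lines, 16 KB to 64 KB) fixes how an address splits into tag, set index, and line offset. The following sketch works through that arithmetic. The function names are illustrative, and the split is the standard one for a set-associative cache, not a statement about how the Cortex-A9 hardware indexes its caches internally (for example, whether virtual or physical bits are used).

```python
def cache_geometry(size_bytes, ways=4, line_bytes=32):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, size_bytes, ways=4, line_bytes=32):
    """Decompose a 32-bit address into (tag, set_index, line_offset)."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


for kib in (16, 32, 64):
    sets, off, idx = cache_geometry(kib * 1024)
    print(f"{kib} KB: {sets} sets, {off} offset bits, {idx} index bits")
```

For a 32 KB cache this gives 256 sets, so address `0x12345678` maps to tag `0x91A2`, set `0xB3`, and byte offset `0x18`.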

Key Features

SIMD and Vector Processing

The ARM Cortex-A9 incorporates the NEON Advanced SIMD extension as part of its ARMv7-A architecture, providing a dedicated media processing engine for vector operations. NEON operates on vectors of up to 128 bits, processing multiple data elements in parallel, and has a register file of 32 64-bit registers (equivalently, 16 128-bit registers) that supports both integer and floating-point operations. These registers are shared with the VFPv3 unit, allowing scalar and vector floating-point code to interoperate. Integer operations handle unsigned and signed data types from 8-bit to 64-bit, including polynomial arithmetic over GF(2), while floating-point support in NEON covers single-precision (32-bit) formats; double-precision arithmetic is available only as scalar VFP operations.

NEON instructions provide efficient vector arithmetic, such as VADD for element-wise addition and VMUL for multiplication, operating on vectors with up to 16 elements (for example, sixteen 8-bit integers or four 32-bit floats per 128-bit register). Saturating variants prevent overflow by clamping results to the representable range, and rounding variants give more precise shifts and conversions. The ARMv7 architecture also defines an optional Fused Multiply-Add extension to VFPv3 and NEON, whose VFMA instructions compute a*b + c without intermediate rounding, reducing error accumulation in chained computations; this applies to both scalar and vector forms, with up to four single-precision elements per instruction.

In terms of performance, the NEON unit is described as achieving up to 8 single-precision floating-point operations per cycle when paired instructions (for example, a multiply followed by an add) can be dispatched together. This throughput is most relevant to multimedia acceleration, such as H.264 video decoding, where NEON handles inverse transforms on multiple pixel blocks in parallel, and to graphics workloads such as shading. These capabilities make NEON well suited to applications that process audio, video, and image data streams.
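The difference between ordinary and saturating lane-wise addition can be shown with a small Python model of one 128-bit register holding sixteen unsigned 8-bit lanes. The function names mirror the NEON mnemonics VADD and VQADD for readability, but they are plain Python stand-ins written for this example, not bindings to any real intrinsic library.

```python
LANES_U8 = 16  # sixteen 8-bit lanes fill one 128-bit register


def _check(vec):
    if len(vec) != LANES_U8 or any(not 0 <= x <= 255 for x in vec):
        raise ValueError("expected 16 unsigned 8-bit lanes")


def vadd_u8(a, b):
    """Model of a wrapping lane-wise add: each lane is taken modulo 256."""
    _check(a)
    _check(b)
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def vqadd_u8(a, b):
    """Model of a saturating lane-wise add: each lane clamps at 255."""
    _check(a)
    _check(b)
    return [min(x + y, 255) for x, y in zip(a, b)]


bright = [250] * LANES_U8
boost = [10] * LANES_U8
print(vadd_u8(bright, boost)[0], vqadd_u8(bright, boost)[0])
```

With pixel values of 250 and an increment of 10, the wrapping add yields 4 in every lane while the saturating add yields 255. This is why saturating forms are generally preferred for image and audio samples.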

Integer and Floating-Point Operations

The ARM Cortex-A9 processor implements scalar integer operations as part of the ARMv7-A architecture, supporting both the traditional 32-bit ARM instruction set and the Thumb-2 instruction set, which combines 16-bit and 32-bit instructions to achieve better code density with performance comparable to ARM code. In the ARM instruction set, most scalar integer instructions can be executed conditionally, so that they take effect only if a specified condition (such as equality or greater-than) is met, which helps reduce branching. The architecture also includes media-oriented instructions for digital signal processing (DSP) tasks, such as SMLAD, which performs two signed 16-bit multiplies and adds both products to a 32-bit accumulator, useful for audio and image processing.

Cycle timings for integer operations vary by type but emphasize low latency for common operations. Basic data-processing instructions such as ADD and SUB complete in a single cycle. Multiply operations, such as MUL for 32-bit results, typically require 3-5 cycles depending on operand size and whether accumulation is involved. Hardware integer division (SDIV and UDIV) is an optional part of ARMv7-A that the Cortex-A9 does not implement, so division is normally handled by software library routines and takes considerably longer than a multiply. These timings describe individual instructions without interlocks; out-of-order execution in the Cortex-A9 can hide part of the latency by scheduling independent operations around them.

For floating-point operations, the Cortex-A9 integrates an optional Vector Floating-Point (VFPv3) unit that handles single-precision (32-bit) and double-precision (64-bit) computations in compliance with the IEEE 754 standard, supporting scientific and graphics workloads. The VFPv3 unit provides multiply-accumulate operations; where the optional Fused Multiply-Add extension is implemented, multiplication and addition are combined into a single rounded result to reduce error accumulation in iterative calculations. Floating-point addition and subtraction require 3 cycles, while division ranges from 14 cycles for single precision to 28 cycles for double precision. The unit can be disabled for power savings in integer-only applications.

The Cortex-A9 also supports the optional Jazelle extension, which accelerates Java execution by allowing direct interpretation of most Java bytecodes in a third execution state alongside the ARM and Thumb states, though it is rarely used in modern implementations because of advances in just-in-time compilation.
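The dual 16-bit multiply of SMLAD is easier to follow with a bit-level model. The sketch below follows the architectural definition of the instruction as commonly documented: the products of the two signed halfword pairs are added to a 32-bit accumulator. It deliberately omits the Q (overflow) flag. The helper `pack` is a hypothetical convenience introduced only for this example.

```python
def _s16(value):
    """Interpret the low 16 bits of value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _s32(value):
    """Interpret the low 32 bits of value as a signed word."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def pack(hi, lo):
    """Pack two signed 16-bit values into one 32-bit register image."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def smlad(rn, rm, ra):
    """Model of SMLAD: ra + lo(rn)*lo(rm) + hi(rn)*hi(rm), wrapped to 32 bits.

    Simplification: the Q flag that real hardware sets on overflow is ignored.
    """
    low = _s16(rn) * _s16(rm)
    high = _s16(rn >> 16) * _s16(rm >> 16)
    return (_s32(ra) + low + high) & 0xFFFFFFFF


# Two taps of a dot product: 3*4 + (-2)*5, added to an accumulator of 10.
print(smlad(pack(3, -2), pack(4, 5), 10))
```

One such instruction performs two taps of a filter or dot product, which is the source of its benefit for 16-bit audio samples.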

Security and Virtualization Support

The Cortex-A9 incorporates ARM TrustZone technology, which provides hardware-enforced isolation between a secure world for sensitive operations, such as cryptographic processing, and a normal world for general-purpose software. This separation is achieved through a dedicated secure state in the processor, where the secure world has exclusive access to protected resources while the normal world operates under restricted privileges. All bus transactions originating from the processor carry a Non-Secure (NS) bit, which tags accesses as secure or non-secure and enables peripherals and memory systems to enforce isolation at the hardware level.

World switching between secure and normal states is performed through Secure Monitor Calls (SMC), which trigger an exception into Monitor mode, a privileged mode dedicated to handling transitions between the two worlds. The interrupt controller integrates with TrustZone by routing interrupts to either secure or non-secure handlers based on configuration bits, such as the FIQ enable bit, so that secure interrupts remain protected from normal-world software. The memory management unit (MMU) supports separate page tables for the secure and non-secure worlds, with the NS bit determining which translation table is active during address resolution.

The ARMv7-A architecture also defines optional Virtualization Extensions, which support hypervisors through two-stage memory address translation. In that scheme, stage-1 translation maps virtual addresses to intermediate physical addresses (IPAs) within a guest operating system, while stage-2 translation, managed by the hypervisor, maps IPAs to physical addresses; the hypervisor runs in a non-secure Hyp mode to oversee guest isolation. These extensions, together with the Large Physical Address Extension (LPAE) that widens physical addresses to 40 bits (up to 1 TB of memory), were first implemented in later cores such as the Cortex-A15 and Cortex-A7. The Cortex-A9 itself implements the Security Extensions only, so virtualization on A9-based systems relies on software techniques rather than hardware two-stage translation.
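How a memory system might use the NS bit to enforce isolation can be sketched as a simple address filter. The memory map, the `Region` type, and the `access_allowed` function are hypothetical and exist only for this illustration. Real systems implement the check in hardware blocks on the bus, and their behaviour for unmapped addresses may differ from the fault modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    secure_only: bool  # True: only secure-world transactions may access


# Hypothetical memory map for the example.
MEMORY_MAP = (
    Region("secure_rom", 0x0000_0000, 0x0001_0000, secure_only=True),
    Region("key_store", 0x1000_0000, 0x0000_1000, secure_only=True),
    Region("dram", 0x8000_0000, 0x4000_0000, secure_only=False),
)


def access_allowed(addr, non_secure, regions=MEMORY_MAP):
    """Return True if a transaction tagged with the given NS bit may proceed.

    non_secure=True models NS=1 (normal world), False models NS=0 (secure).
    Addresses outside every region are rejected in this model.
    """
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return not (region.secure_only and non_secure)
    return False
```

In this model a normal-world read of the key store at `0x1000_0000` is refused, while the same read issued from the secure world succeeds, and both worlds can reach DRAM.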

Implementations

Single-Core Configurations

The ARM Cortex-A9 single-core processor, also known as the uniprocessor variant, is implemented as a standalone high-performance core without multi-core clustering, targeting applications that require scalable performance. ARM offers this configuration in both synthesizable RTL and hard macro forms to ease integration into systems-on-chip (SoCs) on advanced process nodes. Hard macros are available on 40 nm and 28 nm processes, giving optimized area and power for production designs.

In terms of operating frequency, the single-core Cortex-A9 reaches up to 2.5 GHz in speed-optimized hard macro implementations on 28 nm. Typical clock speeds in deployed devices range from 1 to 2 GHz, balancing power and thermal constraints in battery-powered products. Power consumption for a single core is approximately 500 mW at 1 GHz in power-optimized variants.

Configuration flexibility is a key aspect of single-core setups. L1 caches can be configured as 16 KB, 32 KB, or 64 KB for both the instruction and data sides, with four-way set associativity. An optional unified L2 cache, managed by the L2C-310 controller, supports sizes up to 8 MB. Additional options include Jazelle hardware acceleration for direct Java bytecode execution and the ThumbEE extension for just-in-time compilation in dynamic environments. ARM delivers the single-core Cortex-A9 as intellectual property (IP) suitable for standalone use, often integrated via the uniprocessor package, which excludes the multi-core interconnect.

Multi-Core Variants

The ARM Cortex-A9 MPCore implements multi-core configurations to enable symmetric multiprocessing (SMP), with support for up to four cores in a single cluster while maintaining cache coherency. The dual-core variant is the most prevalent implementation, favored in many designs for its balance of performance and power efficiency, since quad-core setups increase thermal and energy demands without proportional benefit in typical workloads.

In dual-core MPCore setups, the two Cortex-A9 processors share a unified L2 cache, configurable up to 8 MB, through the PL310 controller, which provides low-latency access and supports speculative linefills. The Snoop Control Unit (SCU) keeps the cores' L1 data caches coherent using a snoop-based protocol that broadcasts operations to maintain consistency across the cluster. The SCU also arbitrates accesses and handles evictions, integrating with the cores' AXI interfaces for memory transactions.

Cache coherency in multi-core Cortex-A9 systems follows a MESI-like protocol for intra-cluster L1 interactions, and the Accelerator Coherency Port (ACP) extends it so that external AXI masters can make coherent accesses. The integrated Generic Interrupt Controller (GIC) version 1.0 distributes interrupts across cores, supporting up to 224 shared peripheral interrupts (SPIs) along with per-core private interrupts for timers and watchdogs, which supports task scheduling in SMP environments.

Dual-core configurations show near-linear scaling in threaded applications, with representative implementations achieving almost 2x the single-core throughput while consuming only about 40% more power.
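The MESI-like behaviour mentioned above can be illustrated with a minimal state model for one cache line shared by several cores. This is the textbook MESI protocol (Modified, Exclusive, Shared, Invalid) in simplified form, offered as an assumption about the general scheme rather than a description of the SCU's exact transitions. Write-backs and data transfers are not modelled.

```python
M, E, S, I = "M", "E", "S", "I"


class CoherentLine:
    """Tracks the MESI state of a single cache line in each core's L1."""

    def __init__(self, cores=2):
        self.state = [I] * cores

    def read(self, core):
        """Core loads from the line."""
        if self.state[core] != I:
            return  # hit: M, E and S all satisfy a read locally
        sharers = [c for c, st in enumerate(self.state) if c != core and st != I]
        for c in sharers:
            # A snooped read downgrades M or E copies to Shared
            # (a Modified copy would also be written back here).
            self.state[c] = S
        self.state[core] = S if sharers else E

    def write(self, core):
        """Core stores to the line; all other copies are invalidated."""
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = I
        self.state[core] = M


line = CoherentLine(cores=2)
line.read(0)    # core 0 gets the only copy
line.read(1)    # both cores now share it
line.write(1)   # core 1 takes ownership, core 0 is invalidated
print(line.state)
```

The sequence ends with core 0 in Invalid and core 1 in Modified. A later read by core 0 would bring both copies back to Shared.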

Integration in SoCs

The ARM Cortex-A9 core was widely integrated into systems-on-chip (SoCs) for mobile and embedded applications during the early 2010s, using its ARMv7-A architecture to enable efficient multi-core processing in power-constrained devices.

NVIDIA's Tegra 2, released in 2010, featured a dual-core Cortex-A9 configuration clocked at 1 GHz, making it one of the first mobile SoCs with symmetric multiprocessing support. This SoC powered early tablets, combining the CPU with an integrated GPU for multimedia applications.

Samsung's Exynos 4210, introduced in 2011, incorporated a dual-core Cortex-A9 operating at 1.4 GHz, paired with a Mali-400 MP4 GPU for improved graphics rendering in smartphones, where it supported playback and multitasking.

Apple's A5 SoC, also launched in 2011 and later revised on a 32 nm process, used a dual-core Cortex-A9 design clocked at 800 MHz in its iPhone 4S variant and at 1 GHz in the iPad 2 configuration; the implementation included custom optimizations for power efficiency alongside a PowerVR SGX543MP2 GPU.

Texas Instruments' OMAP 4 series, spanning models such as the OMAP4430 and OMAP4460 from 2011 onward, employed dual-core Cortex-A9 processors scalable up to 1.5 GHz, targeted at both consumer mobile devices and embedded systems. These SoCs included dedicated video accelerators, making them suitable for smartphones such as the Motorola Droid RAZR and for automotive applications.

An example of a quad-core implementation is the NXP i.MX 6Quad, released in 2012 on a 40 nm process, featuring four Cortex-A9 cores at 1.0 GHz with integrated 2D/3D graphics acceleration. It has been widely adopted in automotive and embedded systems for applications requiring higher parallelism.

Other notable integrations included low-cost SoCs for budget tablets, such as Rockchip's RK3066 from 2012, which featured a dual-core Cortex-A9 at up to 1.6 GHz with a Mali-400 GPU for affordable media consumption devices. Some early entrants targeted similar markets with older cores; Allwinner's A10, for example, used a single Cortex-A8 core instead, which highlights the Cortex-A9's role in bridging performance and cost in emerging markets.

Applications and Performance

Device Adoption

The ARM Cortex-A9 processor powered several first-generation 4G smartphones, including models built on the NVIDIA Tegra 2. These devices marked early adoption of high-speed mobile connectivity and enabled advanced multimedia and multitasking capabilities.

In the tablet market, the Cortex-A9 saw significant uptake through the iPad 2, which used Apple's custom A5 with a dual-core Cortex-A9 configuration, contributing to over 30 million units sold during its lifecycle and establishing tablets as mainstream consumer devices. Tegra 2-based tablets likewise used a dual-core Cortex-A9, improving portability and performance for media consumption.

The processor also appeared in set-top boxes and early smart televisions, notably powering Google TV platforms such as LG's L9 chipset-based models, which integrated a dual-core Cortex-A9 for streaming and app integration. These implementations brought internet-connected features to home entertainment systems, with LG's early Google TV devices such as the 47LM6700 series as examples.

In automotive and embedded applications, the Freescale (now NXP) i.MX6 series, based on single- to quad-core Cortex-A9 configurations, was widely used in in-vehicle systems for features such as media playback. The i.MX6 was also suited to rugged environments and powered vehicle dashboards.

The Cortex-A9 reached its market peak as the dominant processor of the 2011-2013 mobile ecosystem, with widespread shipments across licensees enabling billions of devices in smartphones, tablets, and embedded systems.

Benchmark Comparisons

The Cortex-A9 processor shows substantial performance gains over the Cortex-A8, delivering more than 50% higher overall performance in single-core setups thanks to its out-of-order execution and dual-issue pipeline. In some workloads it achieves roughly twice the performance of the Cortex-A8 at equivalent clock speeds, while tasks using SIMD extensions show up to three times the throughput, benefiting from enhanced vector processing and fewer pipeline stalls. Geekbench 2 results put dual-core Cortex-A9 configurations at approximately 800-1000 points, on par with the Intel Atom N450 in contemporary applications. Compared with the later Cortex-A15, the A9 is 30-50% slower per clock cycle in CPU-intensive tasks but consumes less power, making it suitable for efficiency-focused designs. Power efficiency is around 1000 DMIPS per watt in 28 nm processes, as evaluated with Dhrystone, with the core rated at 2.5 DMIPS/MHz.

Benchmark | Cortex-A9 (Single-Core, ~1 GHz) | Comparison Context
Dhrystone | 2.5 DMIPS/MHz | Baseline for power-normalized efficiency in 28 nm.
Geekbench 2 (Dual-Core) | ~800-1000 | Comparable to Intel Atom N450 multi-threaded loads.

NEON acceleration further boosts multimedia benchmarks, contributing to the A9's edge in vector-heavy workloads over in-order designs like the A8.
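The Dhrystone figures quoted in this section combine by simple arithmetic, as the short sketch below shows. The function names are illustrative, and the results are only what the quoted ratings imply when multiplied out, not independent measurements. The `cores` multiplier assumes ideal scaling and is therefore an upper bound.

```python
def peak_dmips(dmips_per_mhz, clock_mhz, cores=1):
    """Dhrystone throughput implied by a per-MHz rating and a clock speed."""
    return dmips_per_mhz * clock_mhz * cores


def implied_power_watts(dmips, dmips_per_watt):
    """Power implied by a throughput figure and an efficiency rating."""
    return dmips / dmips_per_watt


single = peak_dmips(2.5, 1000)               # 2.5 DMIPS/MHz at 1 GHz
print(single, implied_power_watts(single, 1000.0))
```

At 2.5 DMIPS/MHz and 1 GHz a single core is rated at 2500 DMIPS, which at 1000 DMIPS per watt corresponds to 2.5 W. Efficiency ratings depend heavily on process node and on whether the speed- or power-optimized implementation is used, so figures taken from different configurations should not be mixed.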

Legacy and Modern Relevance

The ARM Cortex-A9 contributed significantly to ARM's dominance in the mobile market by introducing scalable multi-core configurations that balanced performance and efficiency for battery-constrained devices. Its MPCore variant, supporting up to four cache-coherent cores, enabled high-performance applications in early smartphones and tablets and set the stage for heterogeneous architectures such as big.LITTLE. This multi-core approach helped ARM capture a substantial share of a growing market and influenced the shift toward clustered processing in portable devices.

As of 2025, the Cortex-A9 remains relevant in embedded and industrial applications, particularly where cost and long-term stability outweigh the need for cutting-edge performance. For instance, NXP's i.MX 6DualPlus processor, featuring dual Cortex-A9 cores, remains available for multimedia-enabled IoT devices and automotive systems such as e-cockpits. Similarly, Artila's Matrix-770 serves as an Ubuntu Core-based IIoT gateway, relying on the A9's reliability in low-to-mid-range solutions. These uses show its persistence in sectors such as gateways and as an alternative to higher-end single-board computers, where mature ecosystems ensure ongoing viability.

ARM has not declared the Cortex-A9 end-of-life and maintains support through long-term maintenance agreements, with implementations such as NXP's i.MX 6 series projected to receive updates until at least 2035. New licensing for the A9 has diminished since the mid-2010s in favor of ARMv8-based designs, but existing deployments benefit from sustained vendor support, including compatibility and patches for deployed systems.

The Cortex-A9 shaped subsequent multi-core ARM designs by pioneering cache-coherent multiprocessing in the high-performance segment. Because it implements the ARMv7-A architecture, its software remains compatible with ARMv8 processors through the AArch32 execution state, allowing legacy A9 code to run on modern 64-bit ARM systems that support that state without major rewrites. It has, however, been outpaced by ARMv8 cores in power efficiency; for example, the Cortex-A53 delivers single-threaded performance comparable to the A9 while consuming approximately 40% less area and energy, making newer cores preferable for demanding applications. The A9 nevertheless remains cost-effective for low-end embedded tasks, where its proven integration and lower licensing overhead justify continued use.

References

  1. [1]
    Cortex-A9 Product Support - Arm Developer
    Get help with your questions about the Cortex-A9 with our documentation, downloads, training videos, and product support content and services.
  2. [2]
    Cortex-A9 Technical Reference Manual r4p1 - Arm Developer
    This is the Technical Reference Manual for the Cortex-A9 processor.Missing: core | Show results with:core
  3. [3]
    The Cortex-A9 processor - Arm Developer
    The Cortex-A9 processor can be configured with up to four cores delivering peak performance when required. Configurability and flexibility makes the Cortex-A9 ...
  4. [4]
    A Walk Through the Cortex-A Mobile Roadmap - Arm Developer
    Nov 19, 2013 · Cortex-A9. Shortly after the introduction of Cortex-A8, ARM introduced our first multi-core ARMv7 CPU, the cortex-a9. The Cortex-A9 made use ...Missing: announcement 2006
  5. [5]
    Cortex-A9 makes good on ARM's multicore promise - EE Times
    Oct 8, 2007 · The new architecture adds to ARM's established multiprocessor capability with an accelerator coherence port supporting hardware accelerators and ...
  6. [6]
    Cortex-A9 makes good on ARM's multicore promise - EDN
    Oct 8, 2007 · The move, announced at the ARM Developers' Conference in Santa Clara, Calif., follows up on a July announcement of multiprocessing extensions to ...
  7. [7]
    The Birth & Evolution of Cortex-A9, and What's Coming Next…
    Oct 26, 2013 · The initial release in 2008 transformed rapidly into volume shipments and Cortex-A9 MPCore based silicon is available since 2010 across close ...Missing: first date
  8. [8]
    ST-Ericsson set to show multiprocessing Cortex-A9 running ...
    Feb 16, 2009 · ST-Ericsson said the demonstration, running on an ARM Cortex-A9 based system would be done at a “private event.” ST-Ericsson claimed that by ...Missing: 2006 tape- out
  9. [9]
    ST-Ericsson's U8500 brings dual-core 1.2GHz ARM Cortex-A9 to the ...
    Feb 15, 2010 · ST-Ericsson's powerhouse U8500 system-on-chip has come a major step closer to appearing in mainstream devices with today's newly announced ...
  10. [10]
    [PDF] Cortex-A9 Technical Reference Manual - Arm
    The Cortex-A9 processor is a high-performance, low-power, ARM macrocell with an L1 cache ... Out of order execution is not always possible. Some ...
  11. [11]
    The Birth & Evolution of Cortex-A9, and What's Coming Next…
    Oct 26, 2013 · Just over 6 years ago, in October 2007, ARM introduced a product that had the potential to change the world. Not just in one market (like ...Missing: timeline history
  12. [12]
    [PDF] ARM Cortex processors - Public - August 2017
    Cortex-A5/A7. Smallest and lowest power. Armv7-A. Cortex-A15/A17. Infrastructure ... low power. Cortex-M0+. Highest energy efficiency. Cortex-M4. Mainstream.
  13. [13]
    The Top 5 Things to Know about Cortex-A7 - Arm Developer
    Nov 1, 2013 · The Cortex-A7 is fully compatible with the Cortex-A15 processor (which is the highest performance ARMv7-A processor). The Cortex-A7 processor ...
  14. [14]
    [PDF] Cortex-A9 Technical Reference Manual - Arm
    First release for r1p0. 30 September 2009. D. Non-Confidential Restricted Access. First release for r2p0. 27 November 2009. E. Non-Confidential. Second release ...
  15. [15]
    Processor properties - Arm Developer
    ... Development Studio · Arm ... Cortex-A7. Cortex-A8. Cortex-A9. Cortex-A12. Cortex-A15. Release date. Dec 2009. Oct 2011. July 2006. March 2008. June 2013. April ...
  16. [16]
    Cortex-A9 Technical Reference Manual r4p1 - Arm Developer
    The Prefetch Unit implements 2-level dynamic branch prediction with a Global History Buffer (GHB), a Branch Target Address Cache (BTAC) and a return stack.
  17. [17]
    The ARM Cortex-A9 Processors - Design And Reuse
    Oct 8, 2007 · There are many examples of applications that demand the qualities of low cost and efficient performance: connected mobile computers and other ...
  18. [18]
    Cortex-A9 Floating-Point Unit Technical Reference Manual r4p1
    Summary of Floating-Point Pipeline for Cortex-A9 VFPv3 (Throughput for Double-Precision FMA or Multiply-Accumulate Operations)
  19. [19]
    About the Cortex-A9 NEON MPE - Arm Developer
    The Cortex-A9 NEON MPE extends the Cortex-A9 functionality to provide support for the ARM v7 Advanced SIMD and Vector Floating-Point v3 (VFPv3) instruction ...
  20. [20]
    Fused Multiply-Add extension - Arm Developer
    The Fused Multiply-Add extension optionally extends the VFPv3 and the NEON architectures. It provides VFP and NEON instructions that perform multiply and ...
  21. [21]
    Cortex-A9 NEON MPE instructions - Arm Developer
    Table 3.1 shows the instructions supported by the Cortex-A9 NEON MPE, and the instruction set that they are in, either Advanced SIMD or VFP.
  22. [22]
    Efficient SIMD and Algorithmic Optimization Techniques for H264 ...
    Jul 18, 2016 · This paper explains the two novel optimization techniques conducted on H.264 decoder (Baseline profile), Cortex A9 platform, to get the best performance.
  23. [23]
    Appendix B. Instruction Cycle Timings - Cortex-A9 - Arm Developer
    This chapter describes the cycle timings of integer instructions on Cortex-A9 processors. It contains the following sections: About instruction cycle timing.
  24. [24]
    Cortex-A9 Technical Reference Manual r4p1
    Summary of Jazelle Extension in Cortex-A9
  25. [25]
    Cortex-A9 MPCore Technical Reference Manual r4p1 - Arm Developer
    This is a technical reference manual for the Cortex-A9 MPCore, intended to assist in its use, and is for a developed product.
  26. [26]
    TrustZone for Cortex-A – Arm®
    TrustZone is a system-wide security approach with hardware-enforced isolation, creating a secure world for trusted boot and OS, and a non-secure world.
  27. [27]
    ARM produces hard Cortex A9 for high performance | Electronics ...
    ARM has produced a hard macro version of its Cortex-A9 processor which has been sold as soft IP since 2007. The idea is to give users of Cortex A9 a high ...
  28. [28]
    TSMC's 28nm Based ARM Cortex-A9 Test Chip Reaches Beyond ...
    May 3, 2012 · “At 3.1 GHz this 28HPM dual-core processor implementation is twice as fast as its counterpart at TSMC 40nm under the same operating conditions, ...
  29. [29]
    ARM Announces 2 GHz Dual-Core Cortex-A9 - BDTI
    Sep 23, 2009 · On September 21st ARM announced a new high-speed, hard macro implementation of the Cortex-A9 architecture, called “Osprey.” (A hard macro is ...
  30. [30]
    Cortex-A9 Technical Reference Manual r2p0 - Arm Developer
    ARM, Thumb, and ThumbEE instruction set support. TrustZone ... three voltage domains. optional Preload Engine. optional Jazelle hardware acceleration.
  31. [31]
    [PDF] Cortex-A9 MPCore Technical Reference Manual
    First release for r1p0. 2 October 2009. D. Non-Confidential Restricted Access. First release for r2p0. 27 November 2009. E. Non-Confidential Unrestricted Access.
  32. [32]
    [PDF] The Benefits of Multiple CPU Cores in Mobile Devices | NVIDIA
    Dec 1, 2010 · Dual-core ARM Cortex A9 Architecture ... voltage, the dual core CPU consumes lower power than a single core CPU for the same.
  33. [33]
    [PDF] Whitepaper NVIDIA® Tegra™ Multi-processor Architecture
    Dual-Core ARM Cortex A9 CPU: NVIDIA Tegra features the world's first dual core CPU for mobile applications in addition to support for Symmetric Multi-Processing ...
  34. [34]
    Exynos 4210 - Samsung - WikiChip
    Mar 28, 2018 · General Specs. Family, Exynos. Series, Exynos 4. Frequency, 1,400 MHz. Microarchitecture. ISA, ARMv7 (ARM). Microarchitecture, Cortex-A9.
  35. [35]
  36. [36]
  37. [37]
    A5 - The Apple Wiki
    Sep 22, 2025 · The original processor is the S5L8940. Manufactured by Samsung, the processor itself is dual-core. The processor (H4P) is clocked at 850MHz in ...
  38. [38]
    [PDF] OMAP™ 4 mobile applications platform - Texas Instruments
    SMP parallel processing for higher performance and efficiency – TI's OMAP 4 applications processor is one of the first dual-core, ARM Cortex-A9 MPCore based.
  39. [39]
    [PDF] OMAP™ 4 mobile applications platform - Texas Instruments
    Feb 19, 2009 · Device Features. • Dual-core ARM Cortex-A9 MPCore SMP general-purpose processors for higher performance and efficiency. • IVA 3 Hardware ...
  40. [40]
    AllWinner A10 Soc / Processor - NotebookCheck.net Tech
    Dec 31, 2012 · It contains a 1.2 GHz ARM Cortex A8 core (ARMv7), a ARM Mali 400 (single core) graphics card and a video processing unit. It is produced in 55nm ...
  41. [41]
    Low End Mac's Guide to Apple A-Series Processors
    Sep 20, 2015 · 2011: Apple A5. Apple A5 CPU Based on the dual-core ARM Cortex-A9 processor, Apple's 1 GHz A5 was introduced with the iPad 2 in March 2011.
  42. [42]
    Samsung Galaxy Tab 10.1 LTE - Device Specification
    Samsung Galaxy Tab 10.1 LTE - Specifications: CPU: 2x 1.0 GHz ARM Cortex-A9; Cores: 2; GPU: ULP GeForce, 333 MHz; Cores: 8; RAM: 1 GB, 300 MHz
  43. [43]
    Google TV Gets New Legs with LG ARM TV - Linux.com
    Jun 14, 2012 · LG's new ”LG47” is the first Google TV device powered by an ARM processor: LG's new L9 chipset, based on a 1GHz, dual-core Cortex-A9 CPU. In ...
  44. [44]
    i.MX 6 Processors - Multicore, Arm ® Cortex - NXP Semiconductors
    High-performance i.MX 6 applications processors deliver exceptional multimedia capabilities and scalable integration for automotive and industrial designs.
  45. [45]
    SABRE|Automotive-Infotainment|i.MX6 - NXP Semiconductors
    The SABRE for automotive infotainment based on the i.MX 6 series of processors enables rapid deployment of automotive consumer user experiences.
  46. [46]
    ARM Cortex-A57 and A53 vs Cortex A8, A9, A15 and A7 - ITPro
    Oct 5, 2022 · The A7 is 50 per cent more powerful than the A8, while the A15 is 40 per cent more powerful than the A9. A smaller manufacturing process means ...
  47. [47]
    ARM Cortex series (A8/A9/A15/A7) NEON multimedia ... - EEWorld
    Jul 13, 2016 · NEON is a SIMD data processing architecture. The 256-byte register stack contains 32 64-bit wide registers or 16 128-bit wide registers. All ...
  48. [48]
  49. [49]
    The ARM Cortex-A9 Can Beat Out The Intel Atom - Phoronix
    Sep 3, 2012 · Here's some interesting test results recently uploaded to OpenBenchmarking.org that compares the performance of ARM Cortex A8 and Cortex A9 ...
  50. [50]
    ARM Cortex A9 vs ARM Cortex A15 - What to expect, and what's the ...
    May 22, 2012 · ARM Cortex A9 vs ARM Cortex A15 - What to expect, and what's ... 2.5 GHz per core, something we'll probably be able to see around mid ...
  51. [51]
    First 28nm ARM Cortex-A9 Processor Optimization Pack now ...
    Feb 23, 2012 · New ARM Physical IP Delivers Optimized Performance and Energy-Efficiency for 28nm Applications, such as Mobile.
  52. [52]
    ARM Cortex-A9 Overview - element14 Community
    Jun 1, 2012 · Supporting the configuration of 16, 32 or 64KB four way associative L1 caches, with up to 8MB of L2 cache through the optional L2 cache ...
  53. [53]
    [PDF] Competitive, Synthesizable, Parameterized RISC-V Processor
    Jun 13, 2015 · We have demonstrated BOOM running Linux, SPEC CINT2006, and CoreMark. BOOM, configured similarly to an ARM Cortex-A9, achieves 3.91 CoreMarks/ ...
  54. [54]
    ARM's big.LITTLE Concept - Semiconductor Engineering
    Nov 8, 2012 · It is also claimed to have “PC-class” performance. The Cortex-A53 is said to deliver the performance of a Cortex-A9, is 40%+ smaller in the ...
  55. [55]
    [PDF] The ARM Cortex-A9 Processors - BioRobotics
    Using a convenient synthesizable flow and IP deliverables, the Cortex-A9 processor provides an ideal upgrade path for existing ARM11™ processor-class ...
  56. [56]
    i.MX 6DualPlus Applications Processors | Dual Arm Cortex-A9 for ...
    The i.MX 6DualPlus is a high performance applications processor with two Arm Cortex-A9 cores and advanced 3D graphics and multimedia features.
  57. [57]
    Product of the Week: Artila's Matrix-770 Ubuntu-Core-Based Cortex ...
    The Matrix-770 Ubuntu-core-based Cortex-A9 IIoT Gateway from Artila Electronics is an industrial IoT (IIoT) gateway designed to provide the previously mentioned ...
  58. [58]
    i.MX 6SoloX Applications Processors | Arm Cortex-A9 Cortex-M4
    The i.MX 6SoloX processor utilizes both Arm Cortex-A9 and Cortex-M4 cores to enable secure, connected homes and vehicles within the IoT.
  59. [59]
    The future of 32-bit Linux - LWN.net
    Dec 4, 2020 · MX6 (Cortex-A9) family up to 20 years ending in 2035, and Microchip recently announced their SAMA7G54 (Cortex-A7) that may live even longer. On ...
  60. [60]
    Support: Product Status – Arm®
    Our product portfolio evolves in response to advances in technology, industry demands, strategic acquisitions, product maturity and the development of new ...
  61. [61]
    [PDF] ARM Cortex-A Series Programmer's Guide for ARMv8-A - CS140e
    Mar 24, 2015 · For the avoidance of doubt, ARM makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and ...
  62. [62]
    ARM's Cortex A53: Tiny But Important - Chips and Cheese
    May 28, 2023 · According to ARM's keynote presentation, Cortex A53 can deliver the same performance as the 2-wide, out-of-order Cortex A9 while being 40% ...
  63. [63]
    [PDF] Which ARM Cortex Core Is Right for Your Application - Silicon Labs
    The Cortex-A9 has been ... It can be clocked up to 600 MHz (delivering 2.45 DMIPS/MHz), has an 8-stage pipeline with dual-issue, pre-fetch and branch.
  64. [64]
    [PDF] Arm Cortex-A Processor Comparison Table
    The Cortex-A series of applications processors provide a range of solutions for devices undertaking complex compute tasks, such as hosting a rich operating ...