
ARM Cortex-A76

The ARM Cortex-A76 is a high-performance, 64-bit CPU core developed by Arm, implementing the Armv8.2-A architecture with support for extensions including Armv8.1-A, Armv8.3-A (LDAPR only), Armv8.4-A (SDOT/UDOT), Armv8.5-A (PSTATE SSBS), the Cryptographic Extension, and the RAS Extension. Announced on May 31, 2018, it features a superscalar, out-of-order microarchitecture built for DynamIQ technology, designed to deliver laptop-class single-threaded performance while maintaining smartphone-level power efficiency for demanding tasks in mobile and edge devices. The core supports AArch64 execution at all exception levels (EL0-EL3) and AArch32 at EL0 only, with ISA compatibility for the A64, A32, and T32 instruction sets.

Key architectural elements include a non-blocking, high-throughput L1 cache system with 64 KB instruction and 64 KB data caches, a private L2 cache configurable from 128 KB to 512 KB per core, and an optional shared L3 cache of up to 4 MB. The core incorporates decoupled branch prediction, a 4-wide decode unit, a fourth-generation prefetcher for instructions and data, and dual-issue 128-bit vector and floating-point units that double the throughput of prior Arm CPUs. It supports up to four CPUs per DynamIQ cluster, 40-bit physical addressing, error protection for reliability, and AMBA ACE or CHI interfaces for coherency, along with GICv4 interrupts, Armv8-A timers, CoreSight v3 debug, and ETMv4.2 trace capabilities.

In terms of performance, Arm quotes a 35% uplift in single-threaded performance over the predecessor Cortex-A75 and 40% better power efficiency, enabling extended battery life for complex workloads such as machine learning (with a 4x improvement in low-precision tasks) and productivity applications. Optimized for 7 nm and more advanced process nodes, the core targets premium smartphones, laptops, automotive systems (including ASIL-D safety compliance via the Cortex-A76AE variant), and other edge-to-cloud devices requiring high efficiency and compute intensity.

Development

Announcement

The ARM Cortex-A76 CPU core was announced by Arm on May 31, 2018, as its latest premium processor design for high-performance applications. The unveiling occurred alongside Computex 2018 in Taipei, where Arm emphasized the core's role in enabling "laptop-class performance with mobile efficiency" through advancements in DynamIQ technology. Internally codenamed "Enyo," the Cortex-A76 implements the Armv8.2-A instruction set and is optimized for 7 nm manufacturing processes, supporting clock speeds up to 3.0 GHz.

During the announcement, Arm positioned the Cortex-A76 as the successor to the Cortex-A75 and highlighted that it runs kernel-level code in 64-bit mode only. Key performance claims included a 35% uplift in single-threaded performance over the A75 at the same power envelope, or up to 40% improved power efficiency at equivalent performance levels, based on internal evaluations using TSMC's 7 nm process. Arm also touted 4x faster machine learning inference and improvements in complex workloads like web browsing compared to the previous generation, underscoring its focus on sustained performance.

Arm indicated that production availability would come first, with commercial silicon expected in devices launching in the second half of 2019, enabling broader adoption in smartphones, tablets, and high-end routers. The announcement also coincided with reveals of complementary IP, including the Mali-G76 GPU and Mali-V76 video processor, to form a cohesive IP suite for next-generation SoCs.

Design Objectives

Development of the Cortex-A76 began in 2013. The primary objective was to bridge the performance gap between smartphones and laptops, delivering high-end computational capabilities while maintaining the power efficiency essential for battery-constrained devices. Announced on May 31, 2018, as part of Arm's client CPU roadmap, the core aimed to support the transition to 7 nm process nodes and enable always-connected experiences in the era of 5G connectivity. This design philosophy addressed the slowing pace of Moore's Law by focusing on architectural innovations that provide substantial single-threaded performance gains without proportionally increasing power consumption.

Key targets included a 35% uplift in single-threaded performance compared to the preceding Cortex-A75, with an emphasis on superscalar out-of-order execution and advanced branch prediction to handle complex workloads more effectively. The microarchitecture was re-engineered to prioritize energy efficiency for sustained tasks, such as productivity applications and emerging machine learning at the edge, while extending battery life in mobile scenarios. Arm emphasized that these improvements would allow the Cortex-A76 to run desktop-like applications on smartphones and laptops, fostering a unified computing experience across devices.

In terms of applications, the Cortex-A76 was optimized for premium SoCs targeting smartphones, laptops, and automotive systems, with variants like the Cortex-A76AE incorporating safety features for autonomous vehicles. The design sought to balance raw performance against the thermal and power envelopes typical of mobile platforms, enabling features like always-on processing and high-fidelity graphics without compromising responsiveness.

Architecture

Microarchitecture Overview

The ARM Cortex-A76 is a high-performance, 64-bit CPU core implementing the Armv8.2-A architecture, featuring a ground-up redesigned out-of-order superscalar microarchitecture optimized for sustained high performance in mobile and edge applications. It is designed for integration within Arm's DynamIQ technology, allowing flexible multi-core configurations in DynamIQ Shared Units (DSUs) with up to four Cortex-A76 cores per cluster. The core supports 40-bit physical addressing for up to 1 TB of memory and includes separate 64 KB instruction and 64 KB data L1 caches, each 4-way set-associative and virtually indexed, physically tagged. A private L2 cache per core, configurable from 128 KB to 512 KB, provides low-latency access with 9-cycle load-to-use latency, while an optional shared L3 cache in the DSU ranges from 512 KB to 4 MB.

The front end emphasizes high throughput and efficient branch handling through a branch predictor that operates independently of the fetch unit, enabling the predictor to run at double the fetch bandwidth to mask misprediction penalties. The fetch unit delivers 4 to 8 instructions per cycle, supported by multi-level branch target caches and an indirect predictor to maximize accuracy and throughput in complex code paths. Following fetch, the pipeline includes Arm's first 4-wide decode stage, capable of renaming and dispatching up to 8 micro-operations per cycle to the out-of-order engine, which features a deep reorder buffer for handling dependencies and speculation. This design contributes to a 35% increase in single-threaded performance compared to the predecessor Cortex-A75.

In the execution back end, the Cortex-A76 employs quad-issue integer execution with three simple arithmetic logic units (ALUs) for basic operations and one complex ALU handling multi-cycle instructions like division and multiplication, enabling high throughput for scalar workloads. Floating-point and Advanced SIMD (NEON) processing is powered by dual 128-bit pipelines, doubling the SIMD/FP bandwidth over prior designs and delivering up to 4x performance for low-precision machine learning inference tasks.

The load/store unit supports deep memory-level parallelism with a fourth-generation hardware prefetcher, optimizing for bandwidth-intensive applications while maintaining a 4-cycle L1 load-to-use latency; it interfaces via AMBA ACE or CHI protocols for system-level coherency. Security features include Arm TrustZone, the optional Cryptographic Extension (AES, SHA, PMULL), and RAS (Reliability, Availability, Serviceability) support, with error protection available for caches and interconnects. Overall, Arm attributes a 35% performance uplift and a 40% power efficiency improvement over the Cortex-A75 to these elements.
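
The vector and addressing figures above translate into simple peak-rate arithmetic. The following is a minimal sketch, not a performance model: it assumes each of the two 128-bit pipelines can start one fused multiply-add or one 8-bit dot-product instruction per cycle, which the text does not state, and all function names are illustrative.

```python
# Back-of-envelope peak rates for two 128-bit vector pipelines.
# Assumption (not from the text): each pipeline can start one fused
# multiply-add (FMA) or one 8-bit dot-product instruction every cycle.

VECTOR_BITS = 128
PIPES = 2


def lanes(element_bits: int) -> int:
    """Elements of the given width that fit in one 128-bit register."""
    return VECTOR_BITS // element_bits


def peak_fma_flops_per_cycle(element_bits: int) -> int:
    """Each lane of an FMA counts as two floating-point operations."""
    return PIPES * lanes(element_bits) * 2


def peak_int8_macs_per_cycle() -> int:
    """A 128-bit SDOT/UDOT performs 16 int8 multiply-accumulates."""
    return PIPES * lanes(8)


def addressable_bytes(pa_bits: int) -> int:
    """Size of a physical address space with pa_bits address bits."""
    return 1 << pa_bits


if __name__ == "__main__":
    print("FP32 FLOPs/cycle:", peak_fma_flops_per_cycle(32))
    print("FP16 FLOPs/cycle:", peak_fma_flops_per_cycle(16))
    print("int8 MACs/cycle: ", peak_int8_macs_per_cycle())
    print("40-bit PA space: ", addressable_bytes(40) // 2**40, "TiB")
```

Under these assumptions FP32 peaks at 16 FLOPs per cycle and 8-bit dot products at 32 multiply-accumulates per cycle, while a 40-bit physical address covers 1 TiB.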

Pipeline Design

The ARM Cortex-A76 implements a high-performance, superscalar, out-of-order pipeline optimized for both integer and floating-point workloads in power-constrained environments. This design supports the Armv8.2-A architecture and integrates with DynamIQ technology for flexible multi-core configurations. The pipeline emphasizes sustained performance through advanced speculation and parallelism, targeting applications from smartphones to edge servers.

The pipeline spans 13 stages, balancing the depth needed for high clock frequencies (up to 3.0 GHz on 7 nm processes) against latency. The front end is 4-wide: the fetch unit delivers up to 8 instructions per cycle from the 64 KB L1 instruction cache, and the decoder turns the fixed-length Arm instructions into micro-operations (uops). Macro-op fusion reduces decode pressure and improves density for common instruction sequences. Once decoded, uops pass through a register rename stage before dispatch, and a reorder buffer supporting a 128-entry instruction window enables dynamic reordering to tolerate dependencies and hide latencies.

The back end features an 8-wide dispatch to specialized execution pipelines: three simple integer ALUs, one complex integer ALU, two load/store units, a branch execution unit, and two vector/floating-point units for floating-point and Advanced SIMD operations. The load/store units connect to the 64 KB L1 data cache and exploit memory-level parallelism with dual ports for concurrent accesses. Retirement occurs in order at up to 4 instructions per cycle, ensuring architectural state consistency while the out-of-order engine maximizes utilization. This structure allows the core to sustain high IPC, with reported uplifts of 35% in single-threaded performance over the Cortex-A75.

Branch prediction plays a critical role in keeping the pipeline fed. The predictor is decoupled from the fetch unit so that it can compute branch targets and directions ahead of time. It incorporates a multilevel branch target buffer (BTB) with 2x the bandwidth of the fetch unit, supports indirect branches, and improves accuracy on complex control flow, reducing misprediction rates compared to prior generations. A misprediction incurs an 11-cycle penalty. Overall, these elements enable the Cortex-A76 to deliver desktop-like performance with mobile efficiency, as seen in implementations like Qualcomm's Snapdragon 855.
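
The cost of the 11-cycle misprediction penalty can be illustrated with the textbook stall formula, CPI = base CPI + (branch fraction × misprediction rate × penalty). This is a rough sketch only: the base CPI, branch fraction, and misprediction rates below are hypothetical inputs chosen for illustration, not measurements of the Cortex-A76.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int = 11) -> float:
    """Cycles per instruction once misprediction stalls are added.

    penalty_cycles defaults to the 11-cycle figure quoted in the text;
    every other input is a hypothetical workload parameter.
    """
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    return base_cpi + stall_cpi


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, base CPI of 0.5.
    for rate in (0.01, 0.02, 0.05):
        cpi = effective_cpi(base_cpi=0.5, branch_fraction=0.2,
                            mispredict_rate=rate)
        print(f"mispredict rate {rate:.0%}: CPI {cpi:.3f}, IPC {1 / cpi:.2f}")
```

Even in this simple model, a few percentage points of misprediction rate visibly lower IPC, which is why predictor accuracy matters so much in a deep, wide pipeline.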

Memory Hierarchy

The ARM Cortex-A76 employs a multi-level memory hierarchy optimized for low latency and high bandwidth, featuring private per-core L1 and L2 caches alongside an optional shared L3 cache to support efficient access patterns in multi-core configurations. This design balances the demands of sustained performance with power efficiency, enabling the core to handle complex workloads while minimizing stalls from memory dependencies.

At the first level, each Cortex-A76 core includes a private 64 KB instruction cache (L1I) and a 64 KB data cache (L1D), both implemented as 4-way set-associative structures with 64-byte lines. The L1 caches support write-back and write-allocate policies, with a load-to-use latency of 4 cycles, allowing the out-of-order engine to overlap memory operations effectively and reduce pipeline bubbles. The L1 also incorporates prefetching mechanisms that detect common access patterns, such as sequential or stride-based loads, and fetch data into the cache ahead of demand.

The second-level cache (L2) is private to each core and configurable in size from 128 KB to 512 KB. It is an 8-way set-associative, inclusive, unified cache that backs the L1 caches with a load-to-use latency of approximately 9 cycles. The L2 provides a 256-bit read interface and a matching write interface, supporting up to two 128-bit loads or stores per cycle to sustain the core's dual-issue memory capabilities, while coherency in multi-core systems is maintained through AMBA ACE or CHI protocols. The inclusive design simplifies coherency management by invalidating L1 lines when the corresponding L2 line is evicted.

An optional shared L3 cache, ranging from 512 KB to 4 MB, can be implemented at the cluster level to serve multiple Cortex-A76 cores, with a latency of 26 to 31 cycles. It improves shared-data access in scenarios like multi-threaded applications, integrates with the system's interconnect fabric to maintain coherency, and supports error protection for reliability.

The memory management unit (MMU) complements the caches with dedicated translation lookaside buffers (TLBs) that accelerate virtual-to-physical address translation. The L1 instruction TLB (ITLB) and data TLB (DTLB) are each 48-entry fully associative arrays, natively supporting page sizes of 4 KB, 16 KB, 64 KB, 2 MB, 32 MB, and 512 MB. These are backed by a unified L2 TLB with 1280 entries organized as 5-way set-associative, which handles misses from the L1 TLBs and interfaces with the page table walker to minimize translation overhead. Translations produce physical addresses of up to 40 bits.
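
The cache and TLB parameters above determine set counts and TLB reach through simple division and multiplication. The sketch below works through them; it assumes 64-byte cache lines, and the helper names are illustrative rather than taken from any Arm document.

```python
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


def tlb_sets(entries: int, ways: int) -> int:
    """Number of sets in a set-associative TLB."""
    return entries // ways


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Memory covered when every entry maps a page of the given size."""
    return entries * page_bytes


if __name__ == "__main__":
    print("L1 sets (64 KB, 4-way):", cache_sets(64 * KB, ways=4))
    print("L2 TLB sets (1280 entries, 5-way):", tlb_sets(1280, ways=5))
    print("L1 TLB reach, 4 KB pages:", tlb_reach(48, 4 * KB) // KB, "KB")
    print("L2 TLB reach, 4 KB pages:", tlb_reach(1280, 4 * KB) // MB, "MB")
    print("L2 TLB reach, 2 MB pages:", tlb_reach(1280, 2 * MB) // MB, "MB")
```

With 4 KB pages the 48-entry L1 TLB covers only 192 KB, which is why support for larger page sizes matters for workloads with big memory footprints.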

Key Features

Instruction Set Extensions

The ARM Cortex-A76 core implements the Armv8.2-A architecture, supporting the AArch64 execution state with the fixed-length 32-bit A64 instruction set, as well as the 32-bit AArch32 execution state using the A32 (ARM) and T32 (Thumb) instruction sets. AArch32 support is limited to EL0 (user mode). These base instruction sets provide the foundation for general-purpose computing, including scalar integer operations, Advanced SIMD (NEON) for vector processing, and floating-point arithmetic via the VFPv4 architecture.

The core incorporates several extensions to the Armv8-A base, enhancing atomic operations, cryptography, reliability, and memory consistency. The Armv8.1-A extension adds atomic memory access instructions under the Large System Extensions (LSE), including load-add (LDADD), load-clear (LDCLR), load-set (LDSET), and swap (SWP) variants for byte, halfword, word, and doubleword sizes in AArch64. Each performs its read-modify-write as a single instruction, without a load-exclusive/store-exclusive retry loop, which improves scalability of locks and lock-free data structures in multi-core systems. Armv8.1-A also adds the Advanced SIMD rounding doubling multiply-accumulate instructions (SQRDMLAH and SQRDMLSH).

Building on Armv8.1-A, the core implements the Armv8.2-A half-precision (FP16) floating-point data-processing instructions in the scalar and Advanced SIMD units, so that operations such as FMUL and FCVT can work on FP16 values. Armv8.2-A also adds enhancements for large systems, such as improved virtualization and memory management, though the Cortex-A76 does not implement the Scalable Vector Extension (SVE). From Armv8.4-A, the core adds the dot product instructions to Advanced SIMD (UDOT and SDOT for unsigned and signed 8-bit dot products), which accelerate matrix multiplications and are particularly beneficial for machine learning workloads.
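
To make the dot product instructions concrete, the following is a small reference model of the vector form of SDOT on a 128-bit register: each of the four 32-bit accumulator lanes receives the sum of four signed 8-bit products. It is an illustrative emulation in Python, not code for the core itself, and the function names are hypothetical.

```python
def _check_int8(values):
    """Validate that a vector holds 16 signed 8-bit elements."""
    if len(values) != 16 or any(not -128 <= v <= 127 for v in values):
        raise ValueError("expected 16 values in the int8 range")


def _wrap_int32(x: int) -> int:
    """Wrap an integer to signed 32-bit two's complement."""
    return (x + 2**31) % 2**32 - 2**31


def sdot(acc, a, b):
    """Emulate a 128-bit signed dot product.

    acc holds four int32 lanes; a and b hold sixteen int8 elements each.
    Lane i accumulates a[4i..4i+3] . b[4i..4i+3].
    """
    if len(acc) != 4:
        raise ValueError("expected 4 accumulator lanes")
    _check_int8(a)
    _check_int8(b)
    result = []
    for lane in range(4):
        total = acc[lane]
        for k in range(4):
            total += a[4 * lane + k] * b[4 * lane + k]
        result.append(_wrap_int32(total))
    return result


if __name__ == "__main__":
    print(sdot([0, 0, 0, 0], list(range(1, 17)), [1] * 16))
```

One instruction therefore performs sixteen multiply-accumulates, which is where the speedup for 8-bit quantized inference comes from.
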
The Armv8.5-A extension provides support for the PSTATE Speculative Store Bypass Safe (SSBS) bit, which helps mitigate speculative store bypass vulnerabilities.

An optional Cryptographic Extension integrates hardware acceleration directly into the Advanced SIMD unit with A64, A32, and T32 instructions. These include AES instructions (AESE for encrypt, AESD for decrypt, AESMC and AESIMC for mix columns and its inverse), SHA-1 instructions (SHA1C, SHA1M, SHA1H, SHA1SU0, SHA1SU1), SHA-256 instructions (SHA256H, SHA256H2, SHA256SU0, SHA256SU1), and polynomial multiplication (PMULL and PMULL2, the carryless multiply used in GCM mode). The later optional SHA-3, SHA-512, SM3, and SM4 instructions are not implemented. The extension significantly boosts throughput for encryption and hashing in security-critical applications. The CRC-32 instructions (CRC32B, CRC32H, CRC32W, CRC32X, CRC32CB, CRC32CH, CRC32CW, CRC32CX) are also implemented; they belong to the base architecture from Armv8.1-A onward rather than to the Cryptographic Extension.

The Reliability, Availability, and Serviceability (RAS) extension, introduced in Armv8.2-A, adds the Error Synchronization Barrier (ESB) instruction across A32, T32, and A64 to ensure errors are synchronized before execution proceeds, along with new system registers (for example, ERRIDR_EL1 for error record identification and ERXFR_EL1 for the feature register of the selected error record). These facilitate hardware error detection, reporting, and recovery, enhancing robustness in high-reliability environments.

Finally, the core provides partial support for Armv8.3-A through the Load-Acquire RCpc (Release Consistent processor consistent) instructions LDAPR, LDAPRB, and LDAPRH. These loads offer weaker ordering than the full load-acquire (LDAR): an LDAPR may be reordered with an earlier store-release (STLR) to a different address, which improves performance in concurrent code that needs only acquire semantics. Other Armv8.3-A features, such as pointer authentication, are not supported.
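
On Linux, most of the extensions described above are reported as flags on the `Features` line of `/proc/cpuinfo`. The sketch below parses that line and reports which flags are present. The flag names are the arm64 hwcap strings used by the Linux kernel; the sample text is a made-up illustration rather than a capture from a real device, and the function names are hypothetical.

```python
# Linux arm64 hwcap flag -> feature discussed in the text.
FEATURE_FLAGS = {
    "atomics": "Armv8.1-A LSE atomics (LDADD, SWP, ...)",
    "asimdrdm": "Armv8.1-A SQRDMLAH/SQRDMLSH",
    "fphp": "Armv8.2-A scalar half-precision floating point",
    "asimdhp": "Armv8.2-A Advanced SIMD half precision",
    "asimddp": "Dot product (SDOT/UDOT)",
    "lrcpc": "Armv8.3-A RCpc loads (LDAPR)",
    "ssbs": "Armv8.5-A PSTATE.SSBS",
    "aes": "Cryptographic Extension: AES",
    "pmull": "Cryptographic Extension: PMULL",
    "sha1": "Cryptographic Extension: SHA-1",
    "sha2": "Cryptographic Extension: SHA-256",
    "crc32": "CRC-32 instructions",
}

# Illustrative sample only; a real system lists one block per CPU.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm crc32 atomics fphp asimdhp cpuid "
    "asimdrdm lrcpc dcpop asimddp\n"
)


def parse_features(cpuinfo_text: str) -> set:
    """Return the flags from the first 'Features' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def feature_report(cpuinfo_text: str) -> dict:
    """Map each flag of interest to True/False depending on presence."""
    present = parse_features(cpuinfo_text)
    return {flag: flag in present for flag in FEATURE_FLAGS}


if __name__ == "__main__":
    # On a real arm64 Linux system, pass open("/proc/cpuinfo").read().
    for flag, found in feature_report(SAMPLE_CPUINFO).items():
        status = "yes" if found else "no"
        print(f"{flag:9s} {status:3s} {FEATURE_FLAGS[flag]}")
```

Because the Cryptographic Extension is optional, a given Cortex-A76 implementation may report the other flags while omitting `aes`, `pmull`, `sha1`, and `sha2`, as in the sample.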

Security and Virtualization

The ARM Cortex-A76 core, based on the Armv8-A architecture, provides hardware support for security through TrustZone technology, which enforces isolation between secure and non-secure execution environments, with switching controlled at exception level EL3 (Secure Monitor). This enables the implementation of a Trusted Execution Environment (TEE) that protects sensitive data and operations, such as cryptographic keys and secure boot, from untrusted software in the normal world. TrustZone extends to peripherals, interrupts, and memory, allowing system-wide partitioning configured by the secure monitor. Additionally, the optional Cryptographic Extension accelerates common security algorithms, including AES encryption and decryption in modes such as ECB and GCM, as well as SHA-1 and SHA-256 hashing, enabling efficient handling of secure communications and integrity checks.

For virtualization, the Cortex-A76 implements the Armv8-A virtualization support, with EL2 (hypervisor) mode managing multiple guest operating systems with isolated virtual address spaces and resources. The memory management unit (MMU) facilitates this through stage-2 address translation, in which the hypervisor maps each guest's intermediate physical addresses to real physical addresses, protecting guests from one another. The core also includes the Virtualization Host Extensions (VHE) from Armv8.1-A, which allow a host OS to run at EL2 with near-native performance by reducing unnecessary traps and context switches. VHE, combined with Address Space ID (ASID) management at EL2, lowers overhead in multi-tenant environments.

These features carry over to DynamIQ Shared Unit (DSU) configurations, where multiple Cortex-A76 cores operate under the same security and virtualization model, supporting scalable deployments in devices requiring both isolation and efficiency, such as smartphones and edge servers.
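
The two-stage translation used for virtualization can be sketched as two chained lookups: the guest's stage-1 tables map a virtual address (VA) to an intermediate physical address (IPA), and the hypervisor's stage-2 tables map that IPA to a physical address (PA). The toy model below uses flat dictionaries and a single 4 KB page size in place of real multi-level page tables; all names and mappings are hypothetical.

```python
PAGE = 4096  # Toy model: a single 4 KB page size.


class TranslationFault(Exception):
    """Raised when a page has no mapping at the given stage."""


def _lookup(table: dict, address: int, stage: int) -> int:
    """Translate one address through a flat page-number table."""
    page, offset = divmod(address, PAGE)
    if page not in table:
        raise TranslationFault(f"stage {stage} fault at {address:#x}")
    return table[page] * PAGE + offset


def translate(va: int, stage1: dict, stage2: dict) -> int:
    """VA -> IPA using the guest's tables, then IPA -> PA using the hypervisor's."""
    ipa = _lookup(stage1, va, stage=1)
    return _lookup(stage2, ipa, stage=2)


if __name__ == "__main__":
    guest_tables = {0x10: 0x80}          # guest maps VA page 0x10 to IPA page 0x80
    hypervisor_tables = {0x80: 0x40000}  # hypervisor maps IPA page 0x80 to PA page 0x40000
    print(hex(translate(0x10123, guest_tables, hypervisor_tables)))
```

A guest can only reach physical pages that appear in the hypervisor's stage-2 table, which is the mechanism that isolates guests from each other.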

Performance and Efficiency

Benchmark Results

The ARM Cortex-A76 demonstrated significant performance advancements over its predecessor, the Cortex-A75, particularly in integer and floating-point workloads. In SPECint2006 benchmarks, the A76 achieved a 25% improvement in integer performance compared to the A75 when evaluated at the same process node and frequency. Similarly, SPECfp2006 results showed a 35% uplift in floating-point performance under identical conditions. These gains were borne out by early implementations such as Huawei's Kirin 980 SoC, where the A76-based cores delivered 1.89 times the integer performance and 2.04 times the floating-point performance of the Cortex-A73 in the Snapdragon 835 (2.6 GHz versus 2.45 GHz).

Efficiency metrics further highlighted the A76's design strengths, with Arm reporting up to 40% better power efficiency at performance equivalent to the A75, enabling sustained operation without excessive thermal constraints. Memory subsystem enhancements contributed substantially: LMBench tests indicated a 90% increase in memory bandwidth over the A75, reducing bottlenecks in data-intensive tasks. In real-world integrations like Qualcomm's Snapdragon 855, which clocked its fastest A76 core at up to 2.84 GHz, single-threaded Geekbench 4 scores reached approximately 3,500, a 45% leap over the Snapdragon 845's A75 configuration, while multi-threaded scores approached 11,000.
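
| Benchmark | Cortex-A76 (vs. A75 unless noted) | Implementation example |
| --- | --- | --- |
| SPECint2006 (Integer) | +25% | Iso-process/frequency |
| SPECfp2006 (Floating-Point) | +35% | Iso-process/frequency |
| Memory bandwidth (LMBench) | +90% | N/A |
| SPECint2006 (vs. A73) | 1.89x | Kirin 980 @ 2.6 GHz |
| SPECfp2006 (vs. A73) | 2.04x | Kirin 980 @ 2.6 GHz |
| Geekbench 4 Single-Core (vs. A75) | +45% | Snapdragon 855 @ 2.84 GHz |
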
Overall, these results positioned the A76 as a foundational core for premium devices, balancing high throughput with the power constraints typical of battery-powered systems. Arm's internal modeling projected significant uplifts in SPEC suites across early adopters.
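
Because the Kirin 980 and Snapdragon 835 figures were measured at different clock speeds, it can help to separate the frequency contribution from the per-clock contribution. The sketch below does that arithmetic using only the numbers quoted in this section; the function names are illustrative, and the result is a rough normalization that ignores memory and process differences.

```python
def per_clock_ratio(speedup: float, freq_new_ghz: float,
                    freq_old_ghz: float) -> float:
    """Divide out the clock-speed ratio to estimate per-clock gain."""
    return speedup / (freq_new_ghz / freq_old_ghz)


def implied_baseline(score: float, uplift_percent: float) -> float:
    """Baseline score implied by a score and a percentage uplift over it."""
    return score / (1 + uplift_percent / 100)


if __name__ == "__main__":
    # A76 at 2.6 GHz versus A73 at 2.45 GHz, as quoted in the text.
    print(f"SPECint per-clock: {per_clock_ratio(1.89, 2.6, 2.45):.2f}x")
    print(f"SPECfp per-clock:  {per_clock_ratio(2.04, 2.6, 2.45):.2f}x")
    # Geekbench 4 single-core: about 3,500 described as a 45% uplift.
    print(f"Implied A75 score: {implied_baseline(3500, 45):.0f}")
```

After normalizing for frequency, the quoted gains over the A73 work out to roughly 1.78x for integer and 1.92x for floating-point, and the Geekbench figure implies a Snapdragon 845 baseline of about 2,400.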

Power Consumption

The ARM Cortex-A76 is engineered for high performance within the constrained power envelopes typical of mobile and embedded systems, achieving significant efficiency gains through microarchitectural optimizations such as improved branch prediction, wider execution pipelines, and enhanced prefetching mechanisms that reduce energy wasted on pipeline stalls and cache misses. These design choices enable the core to deliver laptop-class computational throughput while adhering to smartphone-level power budgets, supporting extended battery life in devices like premium mobiles and always-connected PCs.

Compared to its predecessor, the Cortex-A75, the A76 provides a 40% improvement in power efficiency at equivalent performance levels. This uplift stems from targeted reductions in area and power overheads in the out-of-order engine, alongside integration with Arm's DynamIQ technology, which facilitates heterogeneous clustering with low-power cores like the Cortex-A55 for workload-specific frequency and voltage scaling. Arm cites over 20 hours of battery life in Arm-based devices running productivity applications as a practical result of such efficiencies.

The core's power profile benefits from features including fine-grained power domains for the vector and floating-point units, as well as support for Arm's Maximum Power Mitigation Mechanism (MPMM), which uses activity monitors to dynamically cap power draw during thermal events without full throttling. When implemented on 7 nm process nodes at frequencies up to 3 GHz, these elements keep the A76 competitive in energy per instruction, particularly for machine learning inference tasks, where Arm claims 4x the performance of prior generations at iso-power. Overall, the design prioritizes efficiency for sustained workloads, balancing peak performance against leakage and active power dissipation.
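
A headline such as "40% better power efficiency" can be read in two ways, and the two readings give different power figures. The short sketch below compares them; both interpretations are assumptions made for illustration, since the text does not say which one Arm intended.

```python
def relative_power_from_perf_per_watt(gain_percent: float) -> float:
    """Power at equal performance if performance-per-watt rises by gain_percent."""
    return 1 / (1 + gain_percent / 100)


def relative_power_from_power_cut(cut_percent: float) -> float:
    """Power at equal performance if power itself falls by cut_percent."""
    return 1 - cut_percent / 100


if __name__ == "__main__":
    a = relative_power_from_perf_per_watt(40)
    b = relative_power_from_power_cut(40)
    print(f"40% more performance per watt -> {a:.1%} of the original power")
    print(f"40% less power                -> {b:.1%} of the original power")
```

Read as performance per watt, a 40% gain means about 71% of the original power at the same performance; read as a direct power reduction, it means 60%.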

Implementations and Usage

Licensing Model

The ARM Cortex-A76 core is licensed by Arm as semiconductor intellectual property (IP) to semiconductor manufacturers, fabless design companies, and system integrators for incorporation into custom system-on-chip (SoC) designs. Licensees configure the core within Arm's DynamIQ Shared Unit (DSU) to build scalable clusters and integrate it with other Arm IP such as GPUs, interconnects, and controllers via standard AMBA interfaces.

The primary licensing pathway for the Cortex-A76 is Arm Flexible Access, a subscription-based model that provides broad, low-barrier entry to Arm's IP portfolio, including the Cortex-A series. Under this program, eligible parties, ranging from startups and research institutions to established enterprises, gain access to IP, models, and tools without upfront license fees, with license costs deferred until tape-out or production. Qualifying startups and academic users receive zero-cost access for prototyping and evaluation, while commercial production incurs per-project fees or royalties scaled to volume.

Arm also supports traditional licensing options, such as perpetual or time-bound licenses, which involve negotiated upfront payments for rights followed by per-unit royalties upon commercialization. These models allow for customized configurations and are tailored to high-volume producers, ensuring compliance with Arm's specifications while permitting limited modifications under separate agreements. All licenses are royalty-bearing, in line with Arm's ecosystem-driven business strategy.

Adopted SoCs and Devices

The ARM Cortex-A76 core saw widespread adoption in high-end mobile system-on-chips (SoCs) starting in late 2018, primarily for premium smartphones seeking improved performance and efficiency over previous generations. Early implementations focused on DynamIQ configurations combining A76 performance cores with Cortex-A55 efficiency cores, giving balanced big.LITTLE designs for demanding workloads. These SoCs marked a shift toward laptop-class CPU capabilities in mobile devices while staying within the power constraints of battery-powered platforms.

HiSilicon's Kirin 980 was the first commercial SoC to integrate the Cortex-A76, announced in September 2018 and fabricated on a 7 nm process. It features a tri-cluster setup with two high-performance A76 cores at 2.6 GHz, two mid-performance A76 cores at 1.92 GHz, and four A55 cores at 1.8 GHz, delivering up to 75% better single-threaded performance than the prior Kirin 970. This SoC powered Huawei flagship devices, including the Mate 20, Mate 20 Pro, and Honor View 20, and emphasized AI through its dual-NPU design.

Qualcomm's Snapdragon 855, also on 7 nm and launched in December 2018, adopted a similar tri-cluster approach with one prime A76-based core at 2.84 GHz, three performance A76-based cores at 2.42 GHz, and four A55-based cores at 1.8 GHz under the Kryo 485 branding. This configuration provided a 45% CPU uplift over the Snapdragon 845. It was integrated into numerous Android flagships, such as the Samsung Galaxy S10 series and OnePlus 7, driving widespread availability in global markets.

Samsung's Exynos 9820, introduced in February 2019 on an 8 nm process, blended custom M4 cores with A76 cores for its premium lineup, using two M4 cores at 2.73 GHz, two A76 cores at 2.2 GHz, and four A55 cores at 1.95 GHz. This hybrid design aimed for optimized multimedia and gaming performance, appearing in regional variants of the Galaxy S10 and Note 10 series.

Subsequent iterations extended A76 usage to mid-range and 5G SoCs. The HiSilicon Kirin 990 (2019, 7 nm+ EUV) upgraded to two A76 cores at 2.86 GHz and two at 2.09 GHz alongside four A55 cores and incorporated an integrated 5G modem; it drove Huawei's Mate 30 Pro and P40 series. Qualcomm's Snapdragon 720G (2020, 8 nm) targeted affordable devices with two A76 cores at 2.3 GHz and six A55 cores at 1.8 GHz, featured in phones like the Realme 6 Pro and Redmi Note 9S. MediaTek's Helio G99, announced in May 2022 on a 6 nm process, features two A76 cores at 2.2 GHz and six A55 cores at 2.0 GHz with a Mali-G57 MC2 GPU, aimed at budget gaming smartphones; it powers devices such as the Poco M5 and Narzo 50 series.

Beyond smartphones, the A76 found applications in embedded and development platforms. Rockchip's RK3588 (2022, 8 nm) includes four A76 cores at up to 2.4 GHz and four A55 cores, with a 6 TOPS NPU and 8K video support; it powers single-board computers (SBCs) such as the Radxa Rock 5B, Orange Pi 5, and BPI-M7, used in media players and prototyping. Broadcom's BCM2712, used in the Raspberry Pi 5 released in October 2023 and built on a 16 nm process, integrates four A76 cores at 2.4 GHz with a VideoCore VII GPU, targeted at hobbyist, educational, and general-purpose computing applications. Allwinner's A733, launched in late 2024 on a 12 nm process, combines two A76 cores at 2.0 GHz and six A55 cores at 1.8 GHz with an optional 3 TOPS NPU and an E902 RISC-V core, supporting up to 16 GB of memory in tablets and laptops such as the Teclast P50Ai. In programmable hardware, Intel's Agilex 5 D-Series FPGAs (2023) incorporate two A76 cores in their hard processor system (HPS) alongside two A55 cores, enabling customizable designs for industrial applications.

| SoC | Manufacturer | Core configuration | Process node | Launch year | Example devices/platforms |
| --- | --- | --- | --- | --- | --- |
| Kirin 980 | HiSilicon | 2×A76 @ 2.6 GHz + 2×A76 @ 1.92 GHz + 4×A55 | 7 nm | 2018 | Huawei Mate 20 Pro, Honor View 20 |
| Snapdragon 855 | Qualcomm | 1×A76 @ 2.84 GHz + 3×A76 @ 2.42 GHz + 4×A55 | 7 nm | 2018 | Samsung Galaxy S10, OnePlus 7 |
| Exynos 9820 | Samsung | 2×M4 @ 2.73 GHz + 2×A76 @ 2.2 GHz + 4×A55 | 8 nm | 2019 | Samsung Galaxy S10 (Exynos variant) |
| Kirin 990 | HiSilicon | 2×A76 @ 2.86 GHz + 2×A76 @ 2.09 GHz + 4×A55 | 7 nm+ | 2019 | Huawei Mate 30 Pro, P40 Pro |
| Snapdragon 720G | Qualcomm | 2×A76 @ 2.3 GHz + 6×A55 @ 1.8 GHz | 8 nm | 2020 | Realme 6 Pro, Xiaomi Redmi Note 9S |
| Helio G99 | MediaTek | 2×A76 @ 2.2 GHz + 6×A55 @ 2.0 GHz | 6 nm | 2022 | Xiaomi Poco M5, Narzo 50 |
| RK3588 | Rockchip | 4×A76 @ 2.4 GHz + 4×A55 | 8 nm | 2022 | Radxa Rock 5B, Orange Pi 5 |
| BCM2712 | Broadcom | 4×A76 @ 2.4 GHz | 16 nm | 2023 | Raspberry Pi 5 |
| A733 | Allwinner | 2×A76 @ 2.0 GHz + 6×A55 @ 1.8 GHz | 12 nm | 2024 | Teclast P50Ai |
| Agilex 5 HPS | Intel | 2×A76 + 2×A55 | N/A (FPGA) | 2023 | Agilex 5 D-Series FPGA development kits |
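
On the heterogeneous SoCs listed above, software running on Linux can tell which logical CPUs are Cortex-A76 cores by reading the `CPU implementer` and `CPU part` fields of `/proc/cpuinfo`, which reflect the core's MIDR register. The sketch below relies on the implementer code 0x41 for Arm and the part numbers 0xD0B (Cortex-A76) and 0xD05 (Cortex-A55); the sample text and function names are illustrative assumptions, not output from a specific board.

```python
ARM_IMPLEMENTER = 0x41
PART_NAMES = {0xD05: "Cortex-A55", 0xD0B: "Cortex-A76"}

# Illustrative sample: one little core and one big core.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "CPU implementer\t: 0x41\n"
    "CPU part\t: 0xd05\n"
    "\n"
    "processor\t: 4\n"
    "CPU implementer\t: 0x41\n"
    "CPU part\t: 0xd0b\n"
)


def core_types(cpuinfo_text: str) -> dict:
    """Map logical CPU number to a core name based on implementer and part."""
    cores = {}
    cpu = None
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu, implementer = int(value), None
        elif key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and cpu is not None:
            part = int(value, 16)
            if implementer == ARM_IMPLEMENTER:
                cores[cpu] = PART_NAMES.get(part, f"Arm part {part:#x}")
            else:
                cores[cpu] = "non-Arm implementer"
    return cores


if __name__ == "__main__":
    # On a real arm64 Linux system, pass open("/proc/cpuinfo").read().
    for cpu, name in sorted(core_types(SAMPLE_CPUINFO).items()):
        print(f"cpu{cpu}: {name}")
```

A scheduler hint or benchmark harness could use such a mapping to pin latency-sensitive threads to the A76 cores. Vendor-branded derivatives may report a different implementer or part number, in which case they will not match this table.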

References

  1. [1]
    Arm Cortex‑A76 Core Technical Reference Manual
    The Cortex®‑A76 core implements the Arm®v8‑A architecture and some architecture extensions. It also supports interconnect, interrupt, timer, debug, and trace ...
  2. [2]
    For Small Screens to Large: Introducing a New Suite of IP for ...
    May 31, 2018 · Arm Cortex-A76 CPU: New microarchitecture enabling 35 percent more performance year-over-year for increased productivity; Arm Mali-G76 GPU: ...
  3. [3]
    Cortex-A76 | Laptop-class Performance with Mobile Efficiency - Arm
    Arm Cortex-A76 is a high-performance processor that delivers laptop-class performance with mobile efficiency, advancing computing for smartphones and ...
  4. [4]
    Cortex-A76 Product Support - Arm Developer
    Specifications ; Microarchitecture · Physical addressing (PA), 40-bit ; Memory system and external interfaces, L1 I-Cache / D-Cache, 64KB ; Memory system and ...
  5. [5]
    Cortex-A76: Laptop-Class Performance With Mobile Efficiency
    May 31, 2018 · Moreover, Cortex-A76 supports dual-issue native 16B (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU.
  6. [6]
    Announcing the Arm Cortex-A76AE - Arm Community
    Sep 26, 2018 · Introducing the world's first autonomous-class processor with integrated safety. As the name suggests, the Cortex-A76AE is based on the recently ...
  7. [7]
    Cortex-A76 - Microarchitectures - ARM - WikiChip
    Cortex-A76 and future cores roadmap. Development of the Cortex-A76 started in 2013. Arm formally announced Enyo during Computex on May 31 2018. Process ...
  8. [8]
    Arm emits Cortex-A76 – its first 64-bit-only CPU core (in kernel mode)
    May 31, 2018 · Arm's latest top-end processor core design – the Cortex-A76 unveiled today – is its first CPU to only run kernel-level code that is 64-bit.
  9. [9]
    ARM unveils Cortex-A76 and Mali-G76 - higher performance, better ...
    Jun 1, 2018 · Both are optimized for machine learning and will be built on a 7 nm process. Each year new flagships come out with new chipsets, ...
  10. [10]
    Cortex-A76, Mali-G76, and ML chip designs pump up AI
    Jun 13, 2018 · Arm recently announced a new Cortex-A76 architecture that is claimed to boost the processing of AI and ML algorithms on edge computing devices ...
  11. [11]
    Arm announces the Cortex-A76 CPU, the Mali-G76 GPU, and the ...
    Jun 1, 2018 · The company has announced three products. The first is the Cortex-A76 CPU. Then we have the Mali-G76 GPU, which is followed by the Mali-V76 VPU (Video ...<|separator|>
  12. [12]
    Accelerating Mobile and Laptop Performance - Arm Newsroom
    Aug 16, 2018 · The Cortex-A76 is delivering an unprecedented 35% gain in performance compared to the previous generation, without compromising Arm's efficiency ...Missing: objectives | Show results with:objectives
  13. [13]
    Arm Cortex-A76AE: A new age of high-performance processing with ...
    Sep 26, 2018 · Cortex-A76AE has been designed to act as the decision engine in next generation ADAS and Autonomous Vehicle systems. It delivers a 30% uplift in ...
  14. [14]
    3.5.3. Functional Description of the Arm Cortex-A76 Core - Intel
    Hard Processor System Technical Reference Manual: Agilex™ 5 SoCs. Download PDF. ID 814346. Date 7/19/2024. Version. 25.3 (latest), 25.1 · 24.3.1 · 24.3 ...
  15. [15]
    [PDF] State-of-the-art Processor Design Robert Mullins
    Our superscalar pipeline cannot process instructions faster than they are supplied, so maintaining a good instruction fetch rate is very important. Potential ...
  16. [16]
    The Neoverse N1 CPU: No-Compromise Performance - AnandTech
    Feb 20, 2019 · Arm Announces Neoverse N1 & E1 Platforms & CPUs ... The N1 CPU shares the same pipeline organisation we've seen on the Cortex A76.
  17. [17]
    Arm Cortex-A76 Software Optimization Guide
    Arm Cortex-A76 Software Optimization Guide. Software Optimization Guide. This document is only available in a PDF version. Click Download to view.
  18. [18]
    Arm Cortex-A76 And Mali-G76 Architectures For Next-Gen Mobile ...
    Chris Goetting · May 31, 2018 · Arm has revealed Cortex-A76 CPU, Mali-G76 GPU, and V76 VPU designs to fuel the next generation mobile devices.
  19. [19]
    Data prefetching - Arm Cortex‑A76 Core Technical Reference Manual
    The memory system acts by taking actions that aim to reduce the latency of the memory access when they occur. ... L1 instruction cache and L2 cache. Instruction ...
  20. [20]
    Arm Cortex-A76 CPU deep dive - Android Authority
    May 31, 2018 · The Cortex-A76 is a ground-up microarchitecture redesign which emphasizes improving peak performance and, perhaps more importantly, sustaining it in compact ...
  21. [21]
    Main functions - Arm Cortex‑A76 Core Technical Reference Manual
    Main functions ; Instruction L1 TLB, 48 entries, fully associative. ; Data L1 TLB, 48 entries, fully associative. ; L2 TLB, 1280 entries, 5-way set associative.
  22. [22]
    Arm Cortex-A76 Core Technical Reference Manual r3p1
    Characteristic, Note. 5-way, set associative, 1280-entry cache. Stores: VA to PA mappings for 4KB, 16KB, 64KB, 2MB, 32MB, 512MB, and 1GB block sizes.
  23. [23]
    The Armv8.1 architecture extension - Arm Developer
    The Armv8.1 architecture extension is an extension to Armv8.0. It adds mandatory and optional architectural features. Some features must be implemented together ...
  24. [24]
    The ARMv8-A architecture and its ongoing development
    Dec 2, 2014 · Instruction set enhancements. ARMv8.1 includes the following additions to the A64 instruction set: A set of AArch64 atomic read-write ...
  25. [25]
    The Armv8.2 architecture extension - Arm Developer
    The Armv8.2 architecture extension is an extension to Armv8.1. It adds mandatory and optional architectural features. Some features must be implemented together ...
  26. [26]
    About the Cryptographic Extension - Cortex-A76 - Arm Developer
    This document describes the optional cryptographic features of the Cortex-A76 core. It includes descriptions of the registers used by the Cryptographic ...
  27. [27]
    ldapr (a64) - Arm Compiler armasm User Guide
    This instruction is supported in the Arm®v8.3-A architecture and later. It is optionally supported in the Armv8.2-A architecture with the RCpc extension. Usage.
  28. [28]
    TrustZone for Cortex-A – Arm®
    Arm TrustZone technology offers an efficient, system-wide approach to security with hardware-enforced isolation built into the CPU.
  29. [29]
    Arm Cortex‑A76 Core Technical Reference Manual
    Summary of Supported Standards and Specifications for Cortex-A76
  30. [30]
    Arm Cortex‑A76 Core Cryptographic Extension Technical Reference ...
    Release Information ; 0301-00, 16 November 2018, Non-Confidential ; 0400-00, 30 April 2019, Non-Confidential ; 0401-00, 31 July 2020, Non-Confidential ; 0401-01, 30 ...
  31. [31]
    AArch64 behavior - Cortex-A76 - Arm Developer
    The Cortex-A76 is Armv8 compliant, supports AArch64, has address translation with 4KB, 16KB, or 64KB granules, and supports VHE.
  32. [32]
    Virtualization host extensions - Arm Developer
    This guide describes the virtualization support in the Armv8-A AArch64 ... The DynamIQ processors (Cortex-A55, Cortex-A75 and Cortex-A76) support Virtualization ...
  33. [33]
    HiSilicon Kirin 980 Benchmarks Show Cortex A76's ... - CNX Software
    Nov 23, 2018 · SPECS2006 confirmed the good performance with 1.89x higher integer score, and 2.04x faster for the floating-point benchmark. If we adjust the ...
  34. [34]
    Arm Flexible Access
    Arm Flexible Access provides no-cost or low-cost access to Arm IP, tools, and training, with fees only due at manufacture. It has three tiers, and early ...
  35. [35]
    Licensing Arm Technology and Subscriptions
    Arm offers licensing for a wide range of IPs and tools, enabling companies of all sizes to develop chips and platforms for diverse global markets.
  36. [36]
    Kirin 980 Chipset | HiSilicon Official Site
    Kirin 980 supports the world's fastest LPDDR4X mobile phone WiFi chipset to reach peak downloads of 1.7 Gbit/s. First 7 nm SoC. Built on Arm Cortex-A76 ...
  37. [37]
    HiSilicon Kirin 980 SoC - Benchmarks and Specs - Notebookcheck
    Nov 1, 2018 · The performance cluster contains two Cortex-A76 cores with up to 2.6 GHz, the balanced cluster contains also two Cortex-A76 cores but with only ...
  38. [38]
    Here's what's new in the Qualcomm Snapdragon 855
    Dec 5, 2018 · At its heart, the Qualcomm Snapdragon 855 is a 7nm octa-core processor. ... Arm's latest Cortex-A76 CPU part. Qualcomm boasts a 45 percent CPU ...
  39. [39]
    Snapdragon 855 - Qualcomm - WikiChip
    Snapdragon 855 is a high-performance 64-bit ARM LTE system on a chip designed by Qualcomm and introduced in late 2018.
  40. [40]
    Samsung's Exynos 9820 to come with tri-cluster, 2+2+4 CPU
    Jul 11, 2018 · The 2+2+4 core configuration stands for 2x Mongoose M4 cores, 2x high-performance Cortex-A76 or Cortex-A75 cores and 4x low-power and efficient ...
  41. [41]
    Kirin 990 Chipset | HiSilicon Official Site
    Key Features ; CPU · 2x Cortex-A76@2.86 GHz · 2x Cortex-A76@2.09 GHz · 4x Cortex-A55@1.86 GHz ; GPU · 16-core Mali-G76 ; AI · HUAWEI Da Vinci Architecture · 1x Ascend ...
  42. [42]
    HiSilicon Kirin 990 SoC - Benchmarks and Specs - Notebookcheck
    Dec 22, 2019 · The Kirin 990 is produced in the modern 7nm process at TSMC. Codename, Cortex-A76/-A55. Clock Rate, <=2860 MHz. Number of Cores / Threads, 8 / 8.
  43. [43]
    [PDF] White Paper - Intel Agilex® 5 FPGA D-Series
    The Arm Cortex-A76 cores in the HPS operate at clock speeds as fast as 1.8 GHz and the Arm Cortex-A55 cores operate at clock speeds as fast as 1.5 GHz. This ...