ARM Cortex-A57
The ARM Cortex-A57 is a high-performance, 64-bit processor core based on the ARMv8-A architecture. Each cluster contains one to four cores in a symmetric multiprocessing (SMP) arrangement, with per-core L1 instruction and data caches backed by a shared unified L2 cache, and the design targets demanding mobile and system-on-chip (SoC) applications.[1][2] First announced in October 2012, with a first tape-out in April 2013 on TSMC's 16 nm FinFET process, the Cortex-A57 brought 64-bit computing to ARM's portfolio, supporting both AArch64 (native 64-bit execution) and AArch32 (backward compatible with the 32-bit ARMv7 instruction set).[3][4][1] It incorporates ARM TrustZone security, the NEON Advanced SIMD extensions for multimedia processing, a VFPv4 floating-point unit, and hardware virtualization support, enabling efficient handling of complex workloads such as gaming, video decoding, and multitasking in smartphones and tablets.[1][5] To optimize power efficiency in heterogeneous computing environments, the Cortex-A57 was frequently paired with the low-power Cortex-A53 in big.LITTLE configurations, allowing dynamic core switching based on workload demands for balanced performance and battery life.[1] Multicore coherence is achieved through the AMBA 5 CHI or AMBA 4 ACE protocols, supporting scalable clusters for larger SoC designs, while debug and trace capabilities are provided by CoreSight components.[1] Although succeeded by newer, more efficient cores such as the Cortex-A72, the A57 remains notable for pioneering 64-bit ARM processing in consumer devices, powering early implementations such as NVIDIA's Tegra X1 SoC.[1]
Introduction
Overview
The ARM Cortex-A57 is a high-performance, 64-bit CPU core compatible with the ARMv8-A architecture, designed for demanding applications in mobile devices, embedded systems, and servers. Announced by ARM Holdings on October 30, 2012, as part of the Cortex-A50 series, it introduced 64-bit computing capabilities to ARM's processor lineup while maintaining backward compatibility with 32-bit ARMv7 software. Known internally by the codename Atlas, the core targets scenarios requiring significant computational power with energy efficiency, serving as the "big" component in heterogeneous computing setups.[6][7][8]

The Cortex-A57 supports configurations of one to four cores per cluster in a symmetric multiprocessing (SMP) arrangement, with the option for multiple coherent clusters connected via AMBA 5 CHI or AMBA 4 ACE interfaces. It employs a 3-way superscalar, out-of-order execution pipeline to achieve high instruction throughput. In practical implementations, cores can operate at clock speeds of up to about 2.5 GHz, depending on the manufacturing process (for example, TSMC's 16 nm FinFET+). This design enables scalability for multi-core systems while optimizing for power-constrained environments.[1][9]

Key integration features include the mandatory NEON Advanced SIMD and DSP extensions for vector processing, a VFPv4 floating-point unit for enhanced numerical computations, hardware virtualization support for efficient guest OS management, ARM TrustZone for secure execution environments, and the Thumb-2 instruction set for compact code density. The core is particularly suited to big.LITTLE heterogeneous architectures, pairing with efficiency-focused cores such as the Cortex-A53 to dynamically balance performance and power across workloads in mobile and embedded platforms.[1][6]
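On a Linux system built around a Cortex-A57 (or any other ARMv8-A core), the instruction-set features summarized above can be queried from user space through the kernel's auxiliary vector. The following minimal C sketch assumes an AArch64 Linux target; getauxval() and the HWCAP_* bits are the standard kernel/glibc interface there, but the crypto-related bits are only set if the optional Cryptography Extension is implemented in that particular SoC.

```c
/* Minimal sketch: query AArch64 feature bits exposed by the Linux kernel via
 * the auxiliary vector.  On a Cortex-A57 system the FP and Advanced SIMD
 * (NEON) bits are always present; the AES/CRC32 bits depend on the optional
 * extensions chosen by the SoC integrator. */
#include <stdio.h>
#include <sys/auxv.h>      /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>     /* HWCAP_* bit definitions (AArch64 Linux only) */

int main(void)
{
    unsigned long caps = getauxval(AT_HWCAP);

    printf("FP    : %s\n", (caps & HWCAP_FP)    ? "yes" : "no");
    printf("ASIMD : %s\n", (caps & HWCAP_ASIMD) ? "yes" : "no");
    printf("AES   : %s\n", (caps & HWCAP_AES)   ? "yes" : "no");
    printf("CRC32 : %s\n", (caps & HWCAP_CRC32) ? "yes" : "no");
    return 0;
}
```

Because the kernel reports a single system-wide capability set, the output is the same whether the thread happens to be scheduled on a Cortex-A57 or a Cortex-A53 core in a big.LITTLE pairing.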
Development History
The development of the ARM Cortex-A57 was initiated as part of ARM Holdings' strategic transition to the 64-bit ARMv8-A architecture, aimed at rivaling x86 processors in emerging markets such as smartphones, tablets, and servers while preserving the low-power characteristics essential for mobile computing.[6] This shift addressed the growing demand for higher computational capability in battery-constrained devices and data centers, where 64-bit processing enabled better handling of large datasets and multitasking.[10]

Key milestones included the core's public unveiling on October 30, 2012, at ARM TechCon, alongside the Cortex-A53, as the first implementation of ARM's 64-bit processor series.[6] The core achieved its first tape-out in April 2013 through a collaboration with TSMC on 16 nm FinFET technology, an early validation of the design on advanced nodes.[4] First silicon became available in late 2014 with the sampling of initial implementations such as Samsung's Exynos 5433 SoC, followed by a full production ramp in 2015 as partners integrated the core into commercial products.[11]

The primary design goals centered on delivering desktop-class performance for demanding applications while upholding the power efficiency critical to mobile platforms, with an emphasis on superscalar out-of-order execution to raise instructions per cycle (IPC).[6] This approach targeted a threefold increase in single-threaded performance over contemporary 32-bit superphone processors, without a proportional rise in power consumption, and supported scalable configurations up to multi-core clusters.[6]

The Cortex-A57 was developed internally by ARM Holdings' engineering team, targeting process nodes ranging from 28 nm to 16 nm for optimized yield and efficiency, with close collaborations involving partners such as TSMC for fabrication tape-outs and early adopters such as Qualcomm and NVIDIA to refine integration for real-world deployment.[4] These partnerships facilitated rapid prototyping and validation, ensuring compatibility with existing ARM ecosystems. Initial target markets focused on high-end mobile system-on-chips (SoCs) for premium smartphones and tablets, with deliberate extensions to server environments, exemplified by AMD's adoption of the core in its "Seattle" processor platform announced in 2013 for energy-efficient data center applications.[12]
Microarchitecture
Pipeline and Execution Units
The ARM Cortex-A57 features a 15-stage integer pipeline designed for high-performance out-of-order execution, enabling efficient handling of complex workloads while supporting both the 64-bit AArch64 and 32-bit AArch32 instruction sets.[13] The pipeline begins with a fetch stage that retrieves up to three instructions per cycle from the instruction stream, followed by a multi-stage decode that handles up to three instructions simultaneously, including register renaming to resolve dependencies and eliminate hazards such as write-after-read and write-after-write. Subsequent stages include dispatch, where instructions are allocated to the appropriate queues, and issue, which dynamically schedules micro-operations from reservation stations for out-of-order processing. Execution occurs across specialized units, with results collected in a reorder buffer to ensure in-order retirement for architectural correctness and to support speculative execution with minimal stalls.[5][14]

The execution units are organized around a 3-way superscalar design. Three integer pipelines comprise two symmetric arithmetic logic units (ALUs) for basic operations such as add, subtract, and bitwise logic, each with a 1-cycle latency, and a third pipeline dedicated to integer multiply-accumulate operations and additional ALU tasks, including an iterative divider for division instructions. A dedicated branch execution unit handles control-flow resolution, while a load/store unit manages memory-access instructions and can issue one load and one store per cycle. For floating-point and vector processing, the Cortex-A57 includes two asymmetric FP/NEON pipelines: one for simpler scalar and SIMD operations (F0) and another for complex tasks such as fused multiply-add, divides, and the cryptography extensions (F1), implementing the full VFPv4 floating-point unit with double-precision support and 128-bit Advanced SIMD (NEON) operations across 32 vector registers.[5][15][14]

This out-of-order architecture provides a reordering window of up to 40 instruction bundles in flight (each capable of holding multiple instructions), with dynamic scheduling via reservation stations to maximize unit utilization and hide latencies, such as the 5-cycle latency of 64-bit integer multiplies (with a throughput of one per cycle). Integer operations generally exhibit low latency to sustain high instruction throughput, while the FP/NEON units provide balanced scalar and vector performance, enabling dual issue of many 128-bit NEON instructions under favorable conditions. The design prioritizes parallelism within the 3-wide issue width, allowing the pipeline to dispatch a mix of integer, load/store, and FP instructions each cycle without requiring software reordering.[15][14]
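The practical consequence of these parallel integer and FP/NEON pipelines is that code offering independent operations can retire more work per cycle than code dominated by a single dependency chain. The C sketch below is a generic illustration of that idea (it is not taken from ARM documentation): splitting a reduction across several accumulators gives the out-of-order scheduler independent additions it can spread over more than one execution pipeline.

```c
#include <stdio.h>
#include <stddef.h>

/* Single-accumulator reduction: each addition depends on the previous one,
 * so the loop is limited to the latency of a single dependency chain. */
static long sum_serial(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: the additions within one iteration do not
 * depend on each other, giving an out-of-order, multi-ALU core such as the
 * Cortex-A57 independent work to schedule onto its parallel pipelines. */
static long sum_unrolled(const long *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* remainder */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}

int main(void)
{
    enum { N = 1024 };
    long v[N];
    for (size_t i = 0; i < N; i++)
        v[i] = (long)i;
    printf("serial   : %ld\n", sum_serial(v, N));
    printf("unrolled : %ld\n", sum_unrolled(v, N));
    return 0;
}
```

Whether the manual split helps in practice depends on the compiler's own unrolling and on memory bandwidth, so it should be read as an illustration of the microarchitecture rather than a recommended optimization.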
Memory Hierarchy and Caches
The ARM Cortex-A57 processor features a multi-level memory hierarchy designed to balance performance and power efficiency in 64-bit ARMv8-A systems. At the lowest level, each core includes separate L1 caches for instructions and data. The L1 instruction cache is 48 KB, organized as 3-way set-associative with 64-byte cache lines, and supports optional dual-bit parity protection on both data and tag RAMs to detect errors.[16] The L1 data cache is 32 KB, implemented as 2-way set-associative with the same 64-byte line size, and includes optional error-correcting code (ECC) protection per 32 bits for data integrity.[16] The L1 caches are virtually indexed and physically tagged, enabling low-latency access during instruction fetch and load/store operations.

The Level 2 (L2) cache is a unified, inclusive store that backs the L1 data cache, ensuring that all L1 data cache contents are also present in L2 to simplify coherence and eviction handling. Configurable as 512 KB, 1 MB, or 2 MB per cluster, the L2 cache is 16-way set-associative with 64-byte lines and provides ECC protection per 64 bits. In multi-core configurations, the L2 cache is shared among up to four cores within a cluster, promoting efficient data sharing while keeping the L1 caches private to each core. The Cortex-A57 does not incorporate an on-chip L3 cache; instead, it relies on external system-level memory controllers and interconnects for higher-level caching and main-memory access.

Translation lookaside buffers (TLBs) manage virtual-to-physical address translation. Each core has a dedicated L1 instruction TLB with 48 fully associative entries and an L1 data TLB with 32 fully associative entries, both supporting page sizes such as 4 KB, 64 KB, and 1 MB. A unified (instruction and data) L2 TLB with 1024 entries, organized as 4-way set-associative, backs the L1 TLBs in each core to reduce translation overhead.

For multi-cluster coherence, the Cortex-A57 supports the Coherent Hub Interface (CHI), an AMBA 5 protocol that enables scalable cache coherency across clusters. This interface handles snoop requests and ensures consistency without an integrated L3 cache, deferring larger-scale sharing to the system interconnect and memory controllers. The core implements 44-bit physical addressing, allowing access to up to 16 TB of physical memory within the ARMv8-A architecture.
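On Linux, the cache geometry an SoC vendor chose for a given Cortex-A57 cluster (the L1 sizes are fixed, while the L2 size is a build-time option) can be inspected from user space through sysfs. The C sketch below simply walks the standard /sys/devices/system/cpu/cpu0/cache/ hierarchy; the file names are the generic Linux ones rather than anything ARM-specific, and the program is illustrative rather than exhaustive.

```c
#include <stdio.h>

/* Walk /sys/devices/system/cpu/cpu0/cache/indexN/ and print the attributes
 * relevant to the hierarchy described above (level, type, size, ways, line). */
int main(void)
{
    const char *attrs[] = { "level", "type", "size",
                            "ways_of_associativity", "coherency_line_size" };
    char path[160], buf[64];

    for (int idx = 0; ; idx++) {
        /* Stop when the next cache index does not exist. */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        FILE *probe = fopen(path, "r");
        if (!probe)
            break;
        fclose(probe);

        printf("cache index%d\n", idx);
        for (unsigned a = 0; a < sizeof(attrs) / sizeof(attrs[0]); a++) {
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                     idx, attrs[a]);
            FILE *f = fopen(path, "r");
            if (!f)
                continue;            /* attribute not exported on this kernel */
            if (fgets(buf, sizeof(buf), f))
                printf("  %-22s %s", attrs[a], buf);  /* values end in '\n' */
            fclose(f);
        }
    }
    return 0;
}
```

On a Cortex-A57 cluster this would typically report a 48 KB 3-way L1 instruction cache, a 32 KB 2-way L1 data cache, and whichever of the 512 KB to 2 MB L2 options the SoC integrator selected, all with 64-byte lines.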
Branch Prediction and Other Features
The ARM Cortex-A57 incorporates a two-level dynamic branch predictor based on global history to anticipate branch outcomes and reduce pipeline stalls from control-flow changes. The predictor works in conjunction with a Branch Target Buffer (BTB) that caches branch instructions and their targets for quick lookup, featuring a 64-entry L1 BTB for low-latency access and a larger L2 BTB of 2048 to 4096 entries to cover a broader set of branches. An indirect predictor with 512 total entries, supporting up to 16 targets per indirect branch, addresses jumps with variable destinations, such as virtual function calls or switch statements. Complementing these, a 32-entry return address stack predicts function returns by recording call sites, while a static predictor handles cases not covered dynamically, assuming backward conditional branches are taken and forward conditional branches are not.[15][17]

This branch prediction system enables speculative execution to overlap branch resolution with ongoing instruction processing, but incurs a misprediction penalty of 15 to 19 cycles when a forecast proves incorrect, depending on the pipeline depth affected and the branch type. The design prioritizes accuracy to minimize such flushes, leveraging global history patterns for effective performance across diverse workloads, including server and mobile applications. Branch predictor maintenance operations, such as the AArch32 BPIALL (invalidate all entries) and BPIMVA (invalidate by virtual address) instructions, allow software to flush the predictor when needed, for example during context switches.[18][17]

In addition to branch handling, the Cortex-A57 includes hardware virtualization extensions at the EL2 exception level, which trap and emulate sensitive operations for guest operating systems, facilitating secure multi-tenant environments as defined in the ARMv8-A architecture. TrustZone security extensions enable isolation between a secure world for trusted code and a non-secure world for general applications, enforced through dedicated registers such as SCR_EL3, to protect cryptographic keys and sensitive data from unauthorized access. For media processing, the core integrates Advanced SIMD (NEON) units with thirty-two 128-bit vector registers, allowing single instructions to perform parallel operations on multiple data elements, such as vectorized floating-point or integer computations for audio, video, and graphics acceleration.[19]

Debugging and tracing are supported via the CoreSight infrastructure, including an Embedded Trace Macrocell (ETM) compliant with the ETMv4 architecture, which captures real-time instruction execution traces without interrupting program flow. This enables non-intrusive profiling and debugging, with trace data output through AMBA Trace Bus (ATB) interfaces and integration with cross-triggering for multi-core synchronization. The Performance Monitors Unit (PMU), implementing PMUv3, further aids analysis by counting events such as branch mispredictions and cache accesses, configurable via dedicated registers for software optimization.
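Events of the kind the PMU exposes, including the branch mispredictions discussed above, are most easily read on Linux through the perf_event_open system call, which maps generic hardware events onto the core's PMU counters. The C sketch below counts branch misses around a loop with a hard-to-predict condition; it is a minimal illustration for a Linux target rather than an ARM-provided tool, and the loop body is an arbitrary example workload.

```c
#include <linux/perf_event.h>   /* perf_event_attr, PERF_COUNT_HW_* */
#include <sys/syscall.h>        /* __NR_perf_event_open */
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

/* perf_event_open has no glibc wrapper, so invoke it via syscall(). */
static long perf_open(struct perf_event_attr *attr)
{
    return syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;  /* mapped to a PMU event */
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = (int)perf_open(&attr);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* Example workload: a data-dependent branch the predictor struggles with. */
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; i++)
        if ((i * 2654435761u) & 0x100)
            sum += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long misses = 0;
    read(fd, &misses, sizeof(misses));
    printf("branch misses: %lld (sum=%lu)\n", misses, sum);
    close(fd);
    return 0;
}
```

The same mechanism can count cycles or cache events, which is how misprediction penalties and cache behaviour on cores such as the A57 are usually characterized in practice.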
Implementations
Commercial Chips and SoCs
The ARM Cortex-A57 core was integrated into several high-profile system-on-chips (SoCs) for mobile, embedded, and server applications, marking its commercial debut in the mid-2010s. These implementations typically paired the high-performance A57 cores with efficiency-oriented Cortex-A53 cores in big.LITTLE configurations, leveraging the 64-bit ARMv8 architecture in smartphones, tablets, gaming consoles, and data center hardware.[20][21]

Qualcomm Snapdragon 810, announced in April 2014 and commercially available in early 2015, featured four Cortex-A57 cores clocked up to 2.0 GHz alongside four Cortex-A53 cores at 1.5 GHz, fabricated on a 20 nm process node.[20][22] This SoC powered flagship smartphones such as the HTC One M9 and Sony Xperia Z5, integrating the Adreno 430 GPU for graphics processing and supporting advanced features such as 4K video capture.[23][24]

NVIDIA Tegra X1, released in 2015 on a 20 nm process, incorporated four Cortex-A57 cores capable of reaching up to 2.2 GHz, combined with four Cortex-A53 cores in a big.LITTLE arrangement.[21][25] It found applications in consumer electronics such as the Nintendo Switch handheld console, where the A57 cores were clocked at 1.02 GHz for balanced power efficiency, as well as in automotive infotainment systems.[26] NVIDIA's follow-on Tegra X2 (Parker) retained four Cortex-A57 cores but paired them with two of NVIDIA's custom Denver 2 cores in a hybrid configuration to improve single-threaded performance.[27]

Samsung Exynos 5433, introduced in 2014 and built on a 20 nm process, utilized four Cortex-A57 cores at 1.9 GHz paired with four Cortex-A53 cores at 1.3 GHz.[28][29] This SoC debuted in devices including the Samsung Galaxy Note 4 phablet and Galaxy Alpha smartphone, with the Mali-T760 GPU handling graphics and enabling 64-bit computing for improved multitasking.[30] It was later extended to devices such as the Galaxy Note Edge and Galaxy Tab S2.[30]

Samsung Exynos 7420, announced in 2015 and fabricated on a 14 nm FinFET process, featured four Cortex-A57 cores at up to 2.1 GHz alongside four Cortex-A53 cores at 1.5 GHz.[31] This SoC powered devices such as the Samsung Galaxy S6 and S6 Edge smartphones, integrating a Mali-T760 MP8 GPU and supporting fast charging.

AMD Opteron A1100 series, codenamed Seattle and released in January 2016 on a 28 nm process, offered configurations with four or eight Cortex-A57 cores, targeting server and data center workloads.[32][33] The design included up to 8 MB of shared L3 cache, dual-channel DDR4 memory support with ECC, PCIe 3.0 interfaces, and integrated 10 Gigabit Ethernet for scalable enterprise applications.[34][33]
Licensing and Variants
The ARM Cortex-A57 processor core was licensed by ARM Holdings to semiconductor partners for integration into custom system-on-chips (SoCs), following ARM's standard intellectual-property (IP) model of upfront licensing fees plus per-unit royalties on shipped devices.[35] The core was offered in flexible formats, including synthesizable register-transfer level (RTL) descriptions for custom optimization and hard macros for faster implementation on specific process nodes.[5] By 2014, ARM had secured over 50 licensing agreements for the ARMv8-A architecture encompassing the Cortex-A57 and Cortex-A53 cores, with adoption spanning more than 20 partners focused on high-performance applications.[36] The majority of implementations targeted high-end mobile devices, while extensions supported server and embedded systems through configurations compatible with big.LITTLE heterogeneous processing.[37]

The standard Cortex-A57 variant supported one to four cores per cluster, with provisions for multi-cluster configurations of up to eight cores when paired with low-power Cortex-A53 cores in big.LITTLE setups for balanced performance and efficiency.[2] Custom implementations included modifications by partners such as NVIDIA, which combined licensed Cortex-A57 clusters with its own custom cores in later Tegra SoCs.[27] Implementations of the Cortex-A57 spanned multiple process nodes, starting with early designs on 28 nm for initial validation, transitioning to mainstream 20 nm production for mobile SoCs, and advancing to 16 nm FinFET and 14 nm nodes for improved density and efficiency in later products.[38][39] The Cortex-A57 has been succeeded by newer cores such as the Cortex-A72 and Cortex-A73.
Performance Characteristics
Benchmark Results
The ARM Cortex-A57 core delivered competitive performance in mid-2010s mobile benchmarks, showcasing its out-of-order execution capabilities in integer and floating-point workloads. In standard CPU tests, it achieved instructions per cycle (IPC) ratings of 2.5 to 3.0 in typical integer tasks, reflecting its wide issue width and advanced branch prediction. Floating-point performance reached up to 8 GFLOPS per core in double-precision operations, enabling efficient handling of vectorized computations in applications such as multimedia processing.[40] (A minimal sketch of how such per-core throughput figures can be measured follows the benchmark table below.)

For broader synthetic benchmarks, the NVIDIA Tegra X1 SoC, featuring four Cortex-A57 cores at up to 2 GHz, recorded Geekbench 4 single-core scores of about 1500 and multi-core scores near 5000 in quad-core configurations.[41][42] Similarly, the Snapdragon 810 achieved AnTuTu scores of roughly 70,000 in 2015-era tests, establishing a baseline for high-end Android devices of that period.[20] The core excelled in JavaScript and browser workloads, completing the SunSpider benchmark in approximately 345 ms on optimized setups, highlighting its strengths in dynamic code execution.[43] However, real-world sustained performance was often limited by thermal throttling in mobile SoCs, where clock speeds dropped under prolonged loads to manage heat. Within the ARM family, the Cortex-A57 offered roughly twice the single-threaded performance of the preceding Cortex-A15 in comparable tasks, driven by its 64-bit architecture and improved superscalar design.[43][44]

| Benchmark | Metric | Example Score (Cortex-A57 Implementation) | Clock Speed | Source |
|---|---|---|---|---|
| Geekbench 4 | Single-core | ~1500 | 2 GHz (Tegra X1) | NotebookCheck Tegra X1 Benchmarks[41] |
| Geekbench 4 | Multi-core (quad) | ~5000 | 2 GHz (Tegra X1) | LanOC Shield TV Review[42] |
| AnTuTu (v6) | Total | ~70,000 | 2 GHz (Snapdragon 810) | Ubergizmo Snapdragon 810 Preview[45] |
| SunSpider 1.0 | Total time | ~345 ms | 2 GHz (Snapdragon 810) | SlashGear Snapdragon 810 Benchmarks[43] |
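A per-core GFLOPS figure such as the one quoted above is a throughput measurement: floating-point operations executed divided by elapsed time. The C sketch below shows, under simple assumptions, how such a number can be estimated with a wall-clock timer and a multiply-add loop; the array size and repetition count are arbitrary choices for illustration, and real results depend heavily on compiler flags, vectorization, and thermal throttling.

```c
/* Rough illustration of estimating floating-point throughput: time a long run
 * of multiply-add operations and divide the operation count by elapsed time.
 * This is a sketch, not a calibrated benchmark. */
#include <stdio.h>
#include <time.h>

#define N    400
#define REPS 200000

int main(void)
{
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.5; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPS; r++)
        for (int i = 0; i < N; i++)
            c[i] = c[i] * a[i] + b[i];        /* 2 FLOPs per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double flops = 2.0 * (double)N * (double)REPS;

    /* Printing c[0] keeps the compiler from discarding the loop entirely. */
    printf("~%.2f GFLOPS over %.3f s (checksum %f)\n",
           flops / secs / 1e9, secs, c[0]);
    return 0;
}
```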