Explicitly parallel instruction computing

Explicitly parallel instruction computing (EPIC) is a microprocessor instruction set architecture paradigm that enables compilers to explicitly specify instruction-level parallelism, allowing multiple operations to execute concurrently without relying on the complex hardware scheduling mechanisms typical of superscalar designs. Developed through a collaboration between Hewlett-Packard (HP) and Intel starting in 1994, EPIC forms the basis of the IA-64 instruction set used in the Itanium processor family, aiming to achieve high performance in 64-bit computing for servers and workstations by overcoming limitations in traditional RISC and CISC architectures, such as branch mispredictions and memory latency. EPIC evolved from very long instruction word (VLIW) concepts but incorporates advanced features like predication, which uses predicate registers to conditionally execute instructions and reduce branches, and speculation, including control and data speculation to break dependences and expose more parallelism. These mechanisms, supported by compiler optimizations, allow EPIC processors to issue multiple independent operations per cycle, often bundled into 128-bit instructions comprising three 41-bit operations, potentially scaling to wide-issue machines with minimal hardware complexity. The architecture also includes innovations such as rotating register files for efficient loop handling, branch registers for decoupled branch handling, and mechanisms like the Memory Conflict Buffer to manage speculative loads safely. Introduced publicly in 1997 at the Microprocessor Forum, EPIC was implemented in Intel's Merced processor (later branded Itanium) released in 2001, with subsequent generations like Itanium 2 improving performance through enhanced speculation and predication support. Studies on EPIC prototypes, such as the IMPACT project at the University of Illinois, demonstrated average speedups of 83% across benchmarks by integrating these features, highlighting its potential for instruction-level parallelism in integer and floating-point workloads.
Despite its technical innovations, EPIC's adoption was limited by ecosystem challenges, though it influenced subsequent research in compiler-directed parallelism and explicitly parallel architectures.

Historical Development

Origins in VLIW Architectures

Very Long Instruction Word (VLIW) architectures represent an early approach to exploiting instruction-level parallelism (ILP) by relying on the compiler to explicitly specify multiple independent operations within a single, extended instruction format, allowing the hardware to execute them concurrently without complex runtime scheduling hardware. In VLIW designs, the compiler performs static scheduling, analyzing dependencies across basic blocks or traces to pack operations into fixed-length instruction words, typically ranging from 128 to 256 bits or more, which encode several operations (e.g., arithmetic, load/store) targeted to specific functional units. This contrasts with superscalar architectures, where dynamic hardware dispatches instructions at runtime; in VLIW, the absence of such dispatch logic simplifies the processor datapath but shifts the burden entirely to compiler optimizations like trace scheduling. The conceptual foundations of VLIW emerged from research at Yale University in the late 1970s and early 1980s, led by Joseph A. Fisher, who initially explored global microcode compaction techniques to generate horizontal microcode for emulators of machines like the CDC-6600. Fisher's seminal 1981 paper introduced trace scheduling, a global compaction algorithm that identifies likely execution paths (traces) through the control flow graph and schedules operations along them, enabling parallelism beyond basic block boundaries while inserting compensation code for less frequent paths. This work directly inspired VLIW, culminating in the ELI-512 prototype developed at Yale in the early 1980s, an academic simulator and code generator for an idealized VLIW machine capable of executing up to 512 RISC-level operations in parallel, demonstrating the feasibility of compiler-driven ILP extraction.
By the mid-1980s, these ideas transitioned to commercial implementations, with Multiflow Computer releasing the TRACE series starting with the TRACE-14 in 1987 as the first VLIW minisupercomputer, with configurations supporting up to 28 operations per cycle in the TRACE-28 model. Concurrently, Cydrome's Cydra 5, also launched in 1987, introduced a heterogeneous multiprocessor design with a 256-bit VLIW numeric processor supporting seven parallel operations, emphasizing departmental supercomputing for numerical applications. Core principles of VLIW emphasize compiler responsibility for all parallelism detection and scheduling, with fixed instruction formats dictating that unfilled slots be padded with no-operation (NOP) instructions to maintain slot alignment across functional units, ensuring lockstep execution. Without dynamic hardware mechanisms for dependency resolution or reordering, VLIW performance hinges on accurate static analysis, but early designs suffered notable limitations: the absence of branch predication mechanisms often required code duplication along conditional paths to fill instruction slots, leading to significant code bloat, sometimes doubling or tripling program size for branch-intensive code. Additionally, sensitivity to compiler inaccuracies, such as suboptimal trace selection or unpredicted data dependencies, could result in underutilized slots and reduced ILP, as the hardware lacked adaptability to runtime variations. Binary incompatibility further hindered adoption, as varying numbers of functional units, slot widths, and latencies across VLIW implementations (e.g., Multiflow's 28 slots versus Cydrome's 7) rendered executables non-portable without recompilation. These rigidities in VLIW, particularly around code size and portability, later motivated extensions like Explicitly Parallel Instruction Computing (EPIC), which aimed to enhance flexibility while retaining compiler-driven parallelism.
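The NOP-padding behavior described above can be sketched in a few lines. This is an illustrative toy only: the slot count, unit layout, and operation strings are invented for the example and do not correspond to any real VLIW encoder.

```python
# Illustrative sketch (not a real compiler backend): packing statically
# scheduled operations into fixed-width VLIW instruction words. Each word
# has one slot per functional unit; any slot without a ready operation
# must be padded with a NOP, which is the source of VLIW code bloat.

NOP = "nop"

def pack_vliw_words(schedule, num_slots):
    """schedule: list of cycles, each a dict {slot_index: operation}.
    Returns a list of fully padded instruction words (tuples)."""
    words = []
    for cycle in schedule:
        word = tuple(cycle.get(slot, NOP) for slot in range(num_slots))
        words.append(word)
    return words

# Two cycles on a hypothetical 4-slot machine (ALU, ALU, MEM, BR):
schedule = [
    {0: "add r1,r2,r3", 2: "ld r4,[r5]"},   # cycle 0: two slots idle
    {0: "sub r6,r1,r4"},                     # cycle 1: three slots idle
]
words = pack_vliw_words(schedule, num_slots=4)
for w in words:
    print(w)
```

Even this tiny schedule is half NOPs, which mirrors the padding overhead that branch-intensive code imposed on early VLIW machines.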

Formation of EPIC by HP and Intel

In June 1994, Hewlett-Packard (HP) and Intel announced an alliance to co-develop a next-generation 64-bit processor architecture, driven by the recognized limitations of contemporary RISC designs in fully exploiting instruction-level parallelism (ILP) for high-performance computing. This partnership sought to create a scalable solution for enterprise servers and scientific workloads, where traditional superscalar processors struggled with dynamic scheduling overheads that limited ILP extraction. HP's contributions stemmed from its 1990s internal research projects on VLIW-inspired architectures, influenced by earlier work from VLIW companies such as Multiflow and Cydrome, including the 1988 hiring of key experts Bob Rau and Michael Schlansker from Cydrome to advance compiler techniques for parallelism. In 1997, Schlansker and Rau coined the term "Explicitly Parallel Instruction Computing" (EPIC) during their collaborative efforts with Intel, framing it as an evolution of VLIW that emphasized explicit compiler-hardware cooperation to specify parallelism more flexibly than VLIW's rigid lockstep execution model. A seminal 1997 presentation and subsequent whitepaper by HP and Intel detailed EPIC's principles, highlighting its roots in VLIW as the foundation for explicit parallelism indication. The core design goals of EPIC included overcoming VLIW's inflexibility by allowing compilers to annotate independent instructions for parallel execution, incorporating 64-bit addressing to handle vast memory requirements in high-performance systems, and ensuring inherent scalability through massive register files and branch prediction aids. HP specifically advanced predication concepts, building on conditional nullification features from its PA-RISC architecture to reduce branch penalties via if-conversion, while Intel provided microarchitectural expertise derived from the i860's RISC innovations and the Pentium Pro's pipeline. These efforts culminated in the evolution of EPIC into the formal IA-64 instruction set architecture specification, publicly revealed by Intel and HP in May 1999.

Core Architectural Principles

Instruction Bundling and Parallelism Specification

In Explicitly Parallel Instruction Computing (EPIC), instructions are grouped into fixed 128-bit bundles to facilitate the explicit specification of parallelism. Each bundle consists of three 41-bit instructions, known as syllables, and a 5-bit template field, totaling 128 bits. This structure ensures that instructions are fetched and aligned in a predictable manner, allowing the hardware to process them as a unit without complex dynamic scheduling. The 5-bit template in each bundle defines the execution unit types for the three syllables—such as M for memory operations, I for integer operations, F for floating-point operations, B for branches, and L/X for long immediates and other extended instructions (A-type arithmetic instructions can execute on either M or I units)—and indicates the presence of stops for serialization. A fixed set of basic template patterns is defined, with variations that signal parallel execution within the bundle (no stops) or sequential execution across stops, enabling the compiler to pack independent operations without relying on hardware dependency checks. Stops, denoted in assembly as ;;, mark boundaries between instruction groups, ensuring that instructions across a stop are serialized while those within a group can proceed concurrently if data-independent. This mechanism provides flexibility beyond traditional very long instruction word (VLIW) formats by allowing instruction groups to span multiple bundles. EPIC's approach to parallelism is explicit, with the compiler responsible for annotating independent instructions within bundles for simultaneous issue to multiple functional units, in contrast to dynamic out-of-order scheduling in superscalar processors. By leveraging the template and stop information, the hardware can dispatch all instructions in a group in parallel, provided no true data dependencies exist, thereby shifting the burden of instruction-level parallelism (ILP) extraction to compile-time analysis. This enables theoretical ILP of up to 6-9 operations per cycle in implementations like the Itanium processor family, depending on the number of available execution units.
For example, template 0 (MII) might bundle a memory load in the first slot with two parallel integer ALU operations in the second and third slots, such as { .mii ld8 r1 = [r2] ; add r3 = r4, r5 ; add r6 = r7, r8 ;; }, where the add operations execute concurrently with the load if independent, demonstrating the compiler's role in ILP extraction. EPIC instructions follow a 41-bit format, comprising an opcode, a 6-bit qualifying predicate field, source and destination register specifiers, and immediate values where applicable, supporting operations across the various unit types. The architecture provides 128 general-purpose registers (GRs), with registers r32 through r127 forming a rotating register file that facilitates software pipelining by automatically renaming registers across loop iterations, reducing the need for explicit loop unrolling and enhancing ILP without complex renaming hardware.
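The bundle arithmetic above (a 5-bit template plus three 41-bit slots filling exactly 128 bits) can be illustrated with a small decoder. This is a conceptual sketch of the field layout only, with the template taken from the low-order bits and the slot values chosen as arbitrary test patterns:

```python
# Illustrative sketch of the IA-64 bundle layout: a 128-bit bundle holds
# a 5-bit template plus three 41-bit instruction slots (5 + 3*41 = 128),
# with the template in the low-order bits and slots packed above it.

SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1

def decode_bundle(bundle):
    """Split a 128-bit integer into (template, [slot0, slot1, slot2])."""
    assert 0 <= bundle < (1 << 128)
    template = bundle & 0x1F                    # low 5 bits
    slots = [(bundle >> (5 + SLOT_BITS * i)) & SLOT_MASK for i in range(3)]
    return template, slots

# Round-trip check with template 0 (MII) and arbitrary 41-bit slot values:
template, slots = 0, [0x123456789AB, 0x0DEADBEEF42, 0x1FFFFFFFFFF]
bundle = template
for i, s in enumerate(slots):
    bundle |= s << (5 + SLOT_BITS * i)
print(decode_bundle(bundle))
```

The round trip makes the alignment property concrete: because every bundle is exactly 128 bits, the hardware can locate the template and all three syllables with fixed shifts and masks, with no variable-length decoding.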

Predication and Speculation Mechanisms

In Explicitly Parallel Instruction Computing (EPIC) architectures, predication enables conditional execution of instructions without relying on branches, using a dedicated set of 64 one-bit predicate registers (PR0 to PR63) to qualify operations. Each instruction can specify a qualifying predicate (qp) from these registers, such that if the predicate value is 1 (true), the instruction executes normally; otherwise, it is suppressed and treated as a no-op. For instance, the syntax (p1) add r1 = r2, r3 executes the addition only if predicate register p1 is true, allowing the compiler to express control flow directly through predicates rather than explicit jumps. The predication mechanism operates by transforming traditional branch constructs into predicated instruction blocks during compilation, a process known as if-conversion. The compiler identifies suitable branches, typically those guarding short code sequences, and replaces them with parallel paths where instructions from both branches are issued together, guarded by complementary predicates (e.g., p1 for the then-path and its complement for the else-path). The hardware then executes the entire block, nullifying unnecessary instructions based on predicate values, which facilitates the formation of hyperblocks: large, straight-line sequences of operations that maximize instruction-level parallelism (ILP) by overlapping control-dependent code. This approach shifts control decisions from runtime branches to compile-time annotations, minimizing pipeline disruptions from branch mispredictions. Complementing predication, EPIC incorporates multiple forms of speculation to handle uncertainties in control flow, data dependencies, and memory addressing, enabling aggressive reordering of instructions. Control speculation allows code following a branch to execute early, guided by compiler-provided hints, while data speculation permits loads to occur before potentially conflicting stores, and address speculation involves tentative memory address calculations.
Recovery from speculative failures is managed through deferred exception handling, using Not-a-Thing (NaT) bits in general registers to mark invalid results and an Advanced Load Address Table (ALAT) to track speculative loads for later validation. Key instructions support these speculative operations, such as the advanced load ld8.a (or ld.a), which speculatively loads 8 bytes and records the access in the ALAT without immediately faulting on conflicts. Validation occurs via the check load ld8.c (or ld.c), which consults the corresponding ALAT entry and either confirms success or re-executes the load if a conflict (e.g., an intervening store to the same address) invalidated the entry. Predicates integrate seamlessly with these instructions—for example, a predicated check can conditionally validate speculative results—ensuring safe execution even in uncertain environments while avoiding costly rollbacks. These mechanisms collectively enhance EPIC's ability to extract ILP by mitigating control and memory hazards. Benchmarks demonstrate that predication eliminates a substantial portion of branches, with if-conversion removing up to 29% of mispredicted branches in SPEC2000 workloads, while combined predication and speculation yield an average 79% performance improvement over non-speculative baselines, with individual benchmarks reaching speedups of up to 2.85x. Predicated instructions are packaged within instruction bundles to maintain explicit parallelism, with execution ultimately gated on runtime resolution of their predicate conditions.
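The if-conversion idea above can be simulated in plain Python: both arms of a branch are issued, each gated by a complementary predicate, and only the arm whose predicate is true takes effect. The register names and the tiny "ISA" here are invented for illustration and are not real IA-64 semantics:

```python
# Minimal simulation of predicated (if-converted) execution: every
# instruction carries a qualifying predicate; when the predicate is
# false the instruction is nullified (its result is never written).

def run_predicated(regs, preds, program):
    """program: list of (qp, dest, fn) where fn computes a value from regs.
    An instruction writes dest only when its qualifying predicate is set."""
    for qp, dest, fn in program:
        if preds[qp]:
            regs[dest] = fn(regs)
    return regs

# Source:  if (r2 == 0) r1 = r3 + r4; else r1 = r3 - r4;
# After if-conversion: a compare sets complementary predicates p1/p2,
# then both arms are issued together, guarded by those predicates.
regs = {"r1": 0, "r2": 0, "r3": 10, "r4": 3}
preds = {"p1": regs["r2"] == 0, "p2": regs["r2"] != 0}  # cmp.eq p1,p2 = r2,0
program = [
    ("p1", "r1", lambda r: r["r3"] + r["r4"]),   # (p1) add r1 = r3, r4
    ("p2", "r1", lambda r: r["r3"] - r["r4"]),   # (p2) sub r1 = r3, r4
]
print(run_predicated(regs, preds, program)["r1"])
```

Because both arms are in the instruction stream unconditionally, there is no branch to mispredict; the cost is that the nullified arm still occupies issue slots, which is why if-conversion pays off mainly for short, hard-to-predict branches.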

Major Implementations

The Itanium Processor Family

The Itanium processor family represented Intel's commercial implementation of the Explicitly Parallel Instruction Computing (EPIC) architecture, developed in collaboration with Hewlett-Packard (HP) and targeted at enterprise servers and high-performance computing (HPC) environments. The inaugural processor, codenamed Merced and launched in June 2001, operated at clock speeds of 733 MHz and 800 MHz with 2 MB or 4 MB of L3 cache, marking the first production EPIC design with a 6-wide issue capability. Merced featured six specialized execution units—two arithmetic logic units (A-units), two floating-point units (F-units), and two memory units (M-units) for loads and stores, along with branch units (B-units) and extended units (X-units)—enabling parallel execution of up to six instructions as scheduled by the compiler. The register file included 128 general-purpose 64-bit registers for integer operations, 128 82-bit floating-point registers, 64 one-bit predicate registers for conditional execution, and 8 branch registers to support explicit parallelism without traditional dynamic scheduling. Its EPIC-specific pipeline emphasized in-order issue with no hardware reordering, relying instead on compiler-directed instruction bundling and predication to exploit instruction-level parallelism. Subsequent generations evolved the design for higher performance and multi-core scalability. The McKinley processor, released in 2002, increased the clock speed to 1 GHz and incorporated improved branch prediction mechanisms to reduce misprediction penalties in EPIC bundles. Madison, introduced in 2003, reached 1.5 GHz. Montecito arrived in 2006 as Intel's first dual-core Itanium, utilizing a 90 nm process with integrated dual-core execution and Hyper-Threading Technology for better multi-threaded performance. Tukwila, launched in 2010, scaled to quad-core configurations on a 65 nm process and introduced the QuickPath Interconnect for faster inter-processor communication in multi-socket systems. Poulson followed in 2012 with eight cores on a 32 nm process, featuring a 12-wide issue pipeline, enhanced multithreading, and over 3.1 billion transistors to boost HPC throughput.
The final model, Kittson (part of the Itanium 9700 series), debuted in the second quarter of 2017 with up to eight cores at 2.66 GHz and 32 MB cache, continuing incremental optimizations before production wound down. Early benchmarks demonstrated Itanium's strengths in native EPIC code, with the Itanium 2 achieving SPECfp_base2000 scores of 1356 at 1 GHz—reflecting 20-30% fewer branches and 40% fewer memory operations than equivalent Alpha 21264 binaries—yielding approximately 1.5-2x speedup over DEC Alpha processors in select HPC workloads like floating-point intensive simulations. However, the x86 compatibility mode, implemented via software emulation (the IA-32 Execution Layer), introduced substantial overhead, often reducing performance to 50-70% of native x86 execution on contemporary processors. Production of the family, a joint effort between Intel and HP, peaked at around 100,000 units shipped annually in 2004 before declining amid market shifts. Intel announced the discontinuation of the 9700 series in 2019, accepting final orders until January 30, 2020, with shipments ceasing on July 29, 2021, effectively ending the joint fabrication and development partnership.

Alternative EPIC-Inspired Designs

Following the commercial launch of Itanium, research into EPIC principles persisted in academic and specialized industrial settings, adapting explicit parallelism for niche applications and embedded systems. These designs often hybridized EPIC's instruction bundling and predication with VLIW elements or reconfigurable hardware to address limitations in fixed-bundle approaches. By the 2010s, approximately 5-10 prototypes and extensions were documented in IEEE and ACM proceedings, demonstrating EPIC's versatility beyond general-purpose servers. The Elbrus processor family, developed by Russia's MCST since the early 2000s, represents a prominent non-Western EPIC-inspired implementation targeted at military and government workloads. The Elbrus-2000, introduced in 2001, employs a 64-bit architecture with in-order execution, featuring instruction bundling for explicit parallelism and a Predicate Logic Unit (PLU) that converts control dependencies into predicated operations to eliminate branches. This design supports up to 22.6 billion 8-bit operations per second at 300 MHz, with six arithmetic-logic units (ALUs) distributed across two clusters, each backed by 64 KB L1 data caches and synchronized register files. Later iterations, such as the Elbrus-8C in 2018, evolved into an 8-core VLIW-EPIC hybrid on a 28 nm process, issuing 5-8 operations per cycle while retaining predication for branch elimination and adding asynchronous array prefetching to mitigate cache misses in HPC tasks. Subsequent models, such as the Elbrus-8SV released in 2023, further refined the architecture on a 28 nm process, supporting up to 1.5 GHz with integrated GPU elements for enhanced HPC and secure computing applications in Russia as of 2025. Unlike pure EPIC designs like Itanium, Elbrus incorporates dynamic scheduling via prefetch buffers and thread-level parallelization in its compiler, achieving speedups of 1.37-1.61 on SPEC benchmarks through loop splitting and dependence analysis.
These processors prioritize energy efficiency and security for domestic supercomputing, with peak floating-point performance comparable to contemporary Intel chips in specialized simulations. Academic prototypes further extended EPIC concepts, emphasizing reconfigurability and domain-specific optimizations. The IMPACT project at the University of Illinois at Urbana-Champaign (UIUC) in the 1990s developed compiler techniques for EPIC architectures, including extensions for media processing through hyperblock formation and predicated execution to boost instruction-level parallelism (ILP) in control-intensive workloads like video encoding. These efforts, validated on simulators, exposed up to 2.3x performance gains via structural transformations such as if-conversion and predication, influencing later EPIC toolchains. Similarly, the TRIPS project at the University of Texas at Austin in the 2000s introduced a configurable EDGE (Explicit Data Graph Execution) architecture inspired by EPIC's explicit ILP, organizing 16 execution units in a 4x4 grid for dynamic issue of bundled instructions up to 128 per hyperblock. TRIPS emphasized reconfigurability over fixed bundles, using operand networks for dataflow-like execution and achieving polymorphous modes for ILP, thread-level parallelism (TLP), and data-level parallelism (DLP) in a tiled layout. This grid-based design contrasted with Itanium's rigid scheduling by allowing runtime adaptation, targeting scalable performance in nanoscale chips. Other experimental efforts included HP's 1999 Merced simulator, which prototyped early bundling and predication mechanisms before hardware realization, aiding validation of explicit parallelism. Post-2000 research, spurred by Itanium's mixed reception, focused on hybrids like Elbrus's dynamic elements and TRIPS's reconfigurability, with IEEE papers highlighting 5-10 such prototypes by 2020 that advanced explicit parallelism for embedded and supercomputing domains without relying on commercial x86 dominance.

Impact and Legacy

Commercial Challenges and Discontinuation

The EPIC architecture, as implemented in the Itanium processor family, faced significant commercial challenges stemming from poor compatibility with the dominant x86 ecosystem. Early Itanium systems relied on software emulation for x86 applications, which resulted in significant performance penalties compared to native x86 execution on contemporary processors. This incompatibility deterred adoption, as most applications required either recompilation or emulation, limiting Itanium's appeal beyond specialized environments. Compiler technology for EPIC proved immature upon Itanium's 2001 launch, struggling to extract the instruction-level parallelism (ILP) promised by the architecture's explicit bundling. Optimization lags persisted through 2005, with compilers unable to fully utilize predication and speculation mechanisms, leading to underwhelming real-world performance despite theoretical advantages. These software hurdles delayed development and reinforced perceptions of Itanium as unreliable for general-purpose computing. Market factors exacerbated these technical issues, as competition intensified from x86-based alternatives. The 2003 release of AMD's Opteron processors offered 64-bit extensions with full x86 compatibility at lower cost and power, capturing server market share from Itanium. Similarly, IBM's POWER architectures provided robust scalability for high-performance computing (HPC) without EPIC's compilation demands. Early Itanium models, such as the Itanium 2, consumed up to 150W TDP, contributing to high power and cooling costs that undermined efficiency claims. Efforts to build an EPIC ecosystem faltered, with limited vendor support and software availability locking out broader adoption. Intel's subsequent strategic pivot marked a turning point, redirecting resources toward x86 enhancements like the Xeon family amid declining Itanium sales. The last new design, the 9700-series (Kittson), shipped in 2017, with Intel accepting orders until January 2020 and ceasing shipments by July 2021, signaling full end-of-life (EOL) for new hardware. Legacy support from HPE for Itanium-based servers and HP-UX 11i v3 extends until December 31, 2025.
Economically, HP and Intel invested over $10 billion collectively in R&D and promotion by 2006, including HP's $3 billion commitment from 2004. Despite niche successes in HPC—such as NASA's Columbia supercomputer and HP Integrity systems—Itanium captured less than 1% of the server market. Post-2010 SPEC benchmarks highlighted persistent deficits, with Itanium lagging behind x86 processors in integer workloads due to ecosystem limitations rather than raw hardware capability.

Influence on Modern Computing Research

Despite the commercial discontinuation of the Itanium processor family in 2021, core EPIC principles such as predication and explicit instruction bundling continue to influence contemporary instruction set architectures and research in instruction-level parallelism (ILP). Predication, a mechanism to conditionally execute instructions without branches to reduce hazards, was a hallmark of EPIC designs and has been adopted in modern extensions. For instance, ARM's Scalable Vector Extension 2 (SVE2), introduced in the 2020s, incorporates advanced predication using predicate registers to mask vector operations, enabling efficient handling of irregular data patterns in vectorized workloads. Similarly, the RISC-V Vector Extension (RVV 1.0, ratified in 2021) supports vector predication through mask registers, allowing compilers to explicitly control parallelism in vector instructions, echoing EPIC's compiler-centric approach to ILP. EPIC's speculation mechanisms, which allow compilers to advance instructions past unresolved branches or memory dependencies, have parallels in modern GPU architectures. NVIDIA's Volta architecture (2017) enhanced its single-instruction, multiple-thread (SIMT) model with independent thread scheduling, speculatively executing divergent paths within warps to improve utilization, drawing on EPIC-inspired ideas for managing parallelism in massively threaded environments. This has influenced subsequent GPU designs for machine learning and scientific computing, where explicit scheduling aids in exploiting ILP under irregular control flow. In academic settings, EPIC concepts remain integral to computer architecture education. Post-2006 editions of Computer Architecture: A Quantitative Approach by Hennessy and Patterson dedicate sections and appendices to EPIC, VLIW, and their role in ILP, using Itanium as a case study to illustrate compiler-hardware co-design. Recent research builds on these foundations, exploring compiler techniques for heterogeneous cores that incorporate EPIC-like bundling to optimize ILP across CPU-GPU systems.
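The mask-register style of predication found in SVE2 and RVV can be illustrated with a scalar simulation. The function below is a conceptual sketch of lane masking only; it is not tied to any real intrinsic or instruction API:

```python
# Sketch of mask-style vector predication as in SVE2 or the RISC-V
# vector extension: a per-lane predicate (mask) decides which lanes of
# a vector operation write back, leaving inactive lanes untouched.

def masked_add(dest, a, b, mask):
    """Return a new vector: dest[i] = a[i] + b[i] where mask[i] is set,
    otherwise the prior dest[i] is preserved (merge-style masking)."""
    return [x + y if m else d for d, x, y, m in zip(dest, a, b, mask)]

dest = [0, 0, 0, 0]
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
mask = [True, False, True, False]    # e.g. lanes where a[i] is odd
print(masked_add(dest, a, b, mask))  # inactive lanes keep their old value
```

The connection to EPIC is that the mask, like an IA-64 qualifying predicate, lets the compiler express a data-dependent condition without a branch, so irregular control flow becomes a per-lane write-enable instead of a control hazard.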
Modern adaptations extend EPIC ideas to emerging domains like edge computing, where low-power parallelism is critical. Proposals for RISC-V extensions in 2022 incorporated vector predication to enable EPIC-style bundling in resource-constrained devices, facilitating efficient execution of parallel tasks in IoT and embedded applications. In the 2020s, research on hybrid EPIC designs for accelerators has gained traction, exploring partial explicit scheduling to boost tensor operations, though full implementations remain experimental. Quantum-inspired ILP efforts, such as DARPA's Underexplored Systems for Utility-Scale Quantum Computing (US2QC) program initiated in 2022, draw on EPIC's explicit parallelism to inform classical-quantum hybrid solvers for optimization problems. These developments underscore EPIC's enduring conceptual legacy in pushing the boundaries of parallelism research.
