References
-
[1]
[PDF] Hardware Acceleration for Knowledge Graph Processing · There are several kinds of hardware acceleration, including GPU (Graphics Processing Unit) for graphics and parallel tasks, DSP (Digital Signal Processor) for ...
-
[2]
[PDF] ACCELERATOR ARCHITECTURES - Duke Computer Science · Examples of accelerators include floating-point coprocessors, graphics ... accelerator hardware. Most accelerator spaces are, however, less mature ...
-
[3]
Hardware Accelerator - an overview | ScienceDirect Topics · Hardware accelerators are defined as dedicated hardware components, such as GPUs, TPUs, and FPGAs, that enhance energy efficiency by offloading ...
-
[4]
AI Acceleration - ML Systems Textbook · This chapter examines hardware acceleration principles and methodologies for machine learning systems. The analysis begins with the historical evolution of ...
-
[5]
[PDF] Hardware Acceleration for Knowledge Graph Processing · Hardware acceleration involves harnessing the capabilities of specialized hardware components designed to perform specific tasks more efficiently than what ...
-
[6]
The Church-Turing Thesis (Stanford Encyclopedia of Philosophy) · Jan 8, 1997 · The Church-Turing thesis concerns the concept of an effective or systematic or mechanical method, as used in logic, mathematics and computer science.
- [7]
-
[8]
William Thomson's Tide Predicting Machine, 1872 · The machine is a mechanical analogue computer which traces the tidal curve for a given location, by combining ten astronomical components.
-
[9]
Were Tide Prediction Machines the First Analogue Computers? · Apr 14, 2020 · Invented in around 1873, tidal prediction machines were analogue computers that provided an accurate and efficient means of predicting the ocean tide.
-
[10]
ENIAC - CHM Revolution - Computer History Museum · ENIAC (Electronic Numerical Integrator And Computer), built between 1943 and 1945, the first large-scale computer to run at electronic speed without being slowed ...
-
[11]
The Electronic Computers, Part 3: ENIAC - Creatures of Thought · Oct 23, 2017 · Other functional units included multipliers and function generators for doing table ... By November 1945, ENIAC was fully functioning. It could ...
-
[12]
The IBM 704 - Columbia University · The IBM 704 Computer (1954). The first mass-produced computer with core memory and floating-point arithmetic, whose designers included John Backus.
-
[13]
IBM's 704, the First Computer to Incorporate Indexing & Floating ... · In 1954 IBM announced the 704 Data Processing System. Though the company did not designate it as a computer, it was the first commercially ...
-
[14]
The IBM 7030, aka Stretch · The IBM 7030, or Stretch, was IBM's first supercomputer, using transistors and advanced storage, and was the fastest for three years. It was used for nuclear ...
-
[15]
[PDF] Planning a Computer System: Project Stretch · Mar 17, 2003 · The project started toward the end of 1954. By then IBM was producing several stored-program digital computers: the IBM 650, a medium-sized ...
-
[16]
[PDF] First draft report on the EDVAC by John von Neumann - MIT · June 30, 1945. This is an exact copy of the original typescript draft as obtained from the University of Pennsylvania. Moore School Library except that a ...
-
[17]
[PDF] Von Neumann Computers 1 Introduction - Purdue Engineering · Jan 30, 1998 · In 1945, von Neumann wrote the paper "First Draft of a Report on the EDVAC," which was the first written description of what has become to ...
-
[18]
How the von Neumann bottleneck is impeding AI computing · Feb 9, 2025 · Processors hit what is called the von Neumann bottleneck, the lag that happens when data moves slower than computation.
-
[19]
The IBM 701 - Columbia University · Jan 1, 2004 · The IBM 701 Defense Calculator (1952) was IBM's first production computer. It was designed primarily for scientific calculation.
-
[20]
[PDF] Buchholz: The System Design of the IBM Type 701 Computer · The IBM 701 had improved arithmetic/logic, direct input/output control, was designed on paper, was a parallel binary computer with a large memory, and used ...
-
[21]
[PDF] PARALLEL OPERATION IN THE CONTROL DATA 6600 · Each of the ten peripheral processors contains its own memory for program and buffer areas, thereby isolating and protecting the more critical system control ...
-
[22]
[PDF] CONTROL DATA® 6600 Computer System Reference Manual · The advanced design techniques incorporated in the system provide for extremely fast solutions to data processing, scientific, and control center problems. ( ...
-
[23]
[PDF] The Compute Architecture of Intel Processor Graphics Gen9 · Aug 14, 2015 · Execution Unit (EUs) Architecture ... Within the EUs, branch instructions are dispatched to a dedicated branch unit to facilitate SIMD.
-
[24]
[PDF] Cost-effective Hardware Acceleration of Multimedia Applications · General-purpose microprocessors augmented with SIMD execution units enhance multimedia applications by exploiting data level parallelism.
-
[25]
Direct Memory Access (DMA): Working, Principles, and Benefits · Mar 14, 2024 · DMA lets hardware devices transfer data between themselves and memory without involving the CPU. Learn more about direct memory access here.
-
[26]
Direct Memory Access (DMA) and Interrupt Handling - EventHelix · In this article we will cover Direct Memory Access (DMA) and Interrupt Handling. Knowledge of DMA and interrupt handling would be useful in writing code that ...
-
[27]
[1907.06948] Coprocessors: failures and successes - ar5iv - arXiv · The typical example is the Intel 8087 coprocessor. Like Motorola's 68881 and 68882 coprocessors from the same era, the 8087 is working on an 80-bit floating ...
-
[28]
Milestones: Intel 8087 Math Coprocessor, 1980 · Sep 29, 2025 · John Palmer collaborated with Kahan on the design of the data types, mathematical operations, exception handling, and details like rounding.
-
[29]
[PDF] Intel 8087 Math CoProcessor · The Intel 8087 is a math co-processor that adds math instructions to the 8086/8088, increasing speed for applications using math operations.
-
[30]
[PDF] Reconfigurable Hardware Accelerators: Opportuni - arXiv · This chapter focuses on leading research on reconfigurable computing accelerators and expounds the current research status in the field of reconfigurable ...
-
[31]
System level tradeoffs between ASIC and FPGA accelerators · Some common ways to implement accelerators are configurable circuits (FPGAs), application-specific circuits (ASICs), and application-specific processors (ASPs).
-
[32]
[PDF] arXiv:1802.04799v3 [cs.LG] 5 Oct 2018 · Oct 5, 2018 · On CPUs, memory latency hiding is achieved implicitly with simultaneous multithreading [14] or hardware prefetching [10, 20]. GPUs rely on ...
-
[33]
Intel® Instruction Set Extensions Technology · The Intel® Streaming SIMD Extensions (Intel® SSE) were introduced into the IA-32 architecture in the Pentium III processor family. These extensions enhance the ...
-
[34]
[PDF] Introduction to Intel® Advanced Vector Extensions - | HPC @ LLNL · May 23, 2011 · Intel AVX is a set of instructions for SIMD operations on Intel CPUs, extending previous SIMD offerings, processing multiple data in a single ...
-
[35]
[PDF] 1995 • First out of order, 3 wide Pentium • First superscalar from intel · Pentium Pro. • 1995. • First out of order, 3 wide. Pentium. • First superscalar from ... • Dynamic execution. • Dataflow order. • Reservation station – hold ...
-
[36]
[PDF] The CRAY-1 Computer System - cs.wisc.edu · To be efficient at processing short vectors, vector startup times must be small. On the CRAY-1, vector instructions may issue at a rate of one instruction.
-
[37]
SIMD Directives - OpenMP · The simd construct enables the execution of multiple iterations of the associated loops concurrently by means of SIMD instructions.
-
[38]
Our History: Innovations Over the Years - NVIDIA · Founded on April 5, 1993, by Jensen Huang, Chris Malachowsky, and Curtis Priem, with a vision to bring 3D graphics to the gaming and multimedia markets.
-
[39]
About CUDA | NVIDIA Developer · Since its introduction in 2006, CUDA has been widely deployed through thousands of applications and published research papers, and supported by an installed ...
-
[40]
[PDF] Fermi™ - NVIDIA · Oct 4, 2009 · A Brief History of GPU Computing. The graphics processing unit (GPU), first invented by NVIDIA in 1999, is the most pervasive parallel ...
-
[41]
[PDF] Xilinx Overview | AMD · A decade later, in 1994, Xilinx released the Virtex® FPGA, a breakthrough in the architecture and performance of their original FPGA. The Evolution of a FPGA.
-
[42]
[PDF] Xilinx Virtex-II Pro and Virtex-II Pro X FPGA User Guide · Mar 23, 2005 · ... Virtex-II Pro and Virtex-II Pro X FPGA User Guide. UG012 (v4.0) 23 March 2005. The following table shows the revision history for this document.
-
[43]
TEXAS INSTRUMENTS UNVEILS DSP-BASED SYSTEM-ON-A ... · Dec 10, 1999 · TI introduced the world's first commercially successful programmable DSP in 1982, and today is the world leader in programmable DSPs with 47.1 ...
-
[44]
[PDF] The TMS320 Family of Digital Signal Processors - Texas Instruments · The TMS320 family are digital signal processors, with the TMS320C30 being a floating-point 33-MFLOP device for applications like digital filtering and image ...
-
[45]
A Brief History of the Single-Chip DSP, Part II - EEJournal · Sep 8, 2021 · ... TI rolled out the first TMS320 DSPs in April, 1982. However, just building the chip was not sufficient for a new technology like this. TI ...
-
[46]
How Cisco beat chip world to net - EE Times · Oct 20, 2000 · It's Toaster 2 that lies at the heart of the Cisco 10000 edge-services router, in which complex packet forwarding is handled by two processors ...
-
[47]
Toaster: A High Speed Packet Processing Engine - Andrew McRae · Toaster is composed of an array of 16 CPUs, arranged as 4 rows and columns. The core CPUs are a cisco designed CPU optimised for packet processing. A key aspect ...
-
[48]
[PDF] Network processors: Guiding design through analysis - cs.wisc.edu · Cisco's Toaster 2 contains 16 XMC (express microcontroller) cores. Each core ... White paper, Juniper Networks, September 2000. [21] A. A. Stepanov and ...
-
[49]
In-Datacenter Performance Analysis of a Tensor Processing Unit · Apr 16, 2017 · This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural ...
-
[50]
Apple's 'Neural Engine' Infuses the iPhone With AI Smarts - WIRED · Sep 13, 2017 · Apple fires the first shot in a war over mobile-phone chips with a 'neural engine' designed to speed speech, image processing.
-
[51]
Hybrid quantum-classical simulation of periodic materials · Aug 17, 2025 · In this work, we investigate the band gap of periodic materials in a lattice-Hamiltonian representation by means of a hybrid, quantum-classical ...
-
[52]
Samsung Develops Industry's First High Bandwidth Memory with AI ... · Feb 17, 2021 · HBM-PIM design has demonstrated impressive performance and power gains on important classes of AI applications, so we look forward to working ...
-
[53]
Hardware Architecture and Software Stack for PIM Based on ... · Our evaluation at the system level shows that our PIM improves the performance of memory-bound neural network kernels and applications by 11.2× and 3.5×, ...
-
[54]
Photonic neuromorphic accelerator for convolutional neural ... - Nature · Apr 28, 2025 · Photonic accelerators have risen as energy efficient, low latency counterparts to computational hungry digital modules for machine learning ...
-
[55]
Photonics for sustainable AI | Communications Physics - Nature · Oct 14, 2025 · Photonic computing has emerged as a promising alternative to CMOS through its energy-efficient computing capabilities in the optical domain.
-
[56]
Summit GPU Supercomputer Enables Smarter Science · Jun 8, 2018 · Summit is capable of delivering a peak 200 petaflops, ten times faster than its Titan predecessor, the first GPU-accelerated system that started ...
-
[57]
Unprecedented cloud resolution in a GPU-enabled full-physics ... · A high-resolution benchmark using 4600 nodes on Summit demonstrates the computational capability of the GPU-enabled E3SM-MMF code in a full physics climate ...
-
[58]
[PDF] Accelerating Large Scale de novo Metagenome Assembly Using ... · Nov 19, 2021 · Our GPU implementation outperforms the CPU version by about 7x and boosts the performance of MetaHipMer by 42% when running on 64 Summit nodes.
-
[59]
An in-depth look at Google's first Tensor Processing Unit (TPU) · May 12, 2017 · In this post, we'll take an in-depth look at the technology inside the Google TPU and discuss how it delivers such outstanding performance.
-
[60]
A Decentralized and Energy-Efficient Distributed Learning ... - EdgeAI · Aug 7, 2025 · This system is built on NXP low-power microcontrollers powered by Arm Cortex-M33 cores, equipped with dedicated hardware accelerators (NPU + ...
-
[61]
[PDF] TPM 2.0 Part 1 - Architecture - Trusted Computing Group · Mar 13, 2014 · This document is an intermediate draft for comment only and is subject to change without notice.
-
[62]
NIST Releases First 3 Finalized Post-Quantum Encryption Standards · Aug 13, 2024 · NIST has finalized its principal set of encryption algorithms designed to withstand cyberattacks from a quantum computer.
-
[63]
Performance Analysis and Industry Deployment of Post-Quantum ... · Mar 17, 2025 · This study focuses on the performance evaluation of post-quantum cryptographic algorithms, specifically Kyber and Dilithium, by benchmarking their execution ...
-
[64]
[PDF] SQL2FPGA: Automatic Acceleration of SQL Query Processing on ... · In this paper, we present SQL2FPGA, an FPGA accelerator-aware compiler to automatically map SQL queries onto the heterogeneous CPU-FPGA platforms. Our SQL2FPGA ...
-
[65]
What is floating-point operations per second (FLOPS)? - TechTarget · Aug 22, 2023 · FLOPS is a measure of a computer's performance based on the number of floating-point arithmetic calculations that the processor can perform within a second.
-
[66]
A guide to AI TOPS and NPU performance metrics | Qualcomm · Apr 24, 2024 · TOPS is a measurement of the potential peak AI inferencing performance based on the architecture and frequency required of the NPU.
-
[67]
[PDF] Hardware Acceleration of LLMs: A comprehensive survey and ... · Sep 5, 2024 · It significantly improves energy efficiency and reduces latency compared to previous FPGA methods. The STA is divided into STA-4 and STA-8 ...
-
[68]
CHIMERA: A 0.92-TOPS, 2.2-TOPS/W Edge AI Accelerator With 2 ... · Jan 25, 2022 · CHIMERA's DNN accelerator is specifically optimized for RRAM and achieves 0.92-TOPS peak performance and 2.2-TOPS/W energy efficiency. We ...
-
[69]
Review Performance, efficiency, and cost analysis of wafer-scale AI ... · This review compares wafer-scale AI accelerators and single-chip GPUs, examining performance, energy efficiency, and cost in high-performance AI applications.
-
[70]
[PDF] Validity of the Single Processor Approach to Achieving Large Scale ... · Amdahl. TECHNICAL LITERATURE. This article was the first publication by Gene Amdahl on what became known as Amdahl's Law. Interestingly, it has no equations.