
Reconfigurable computing

Reconfigurable computing is a computing paradigm that enables the dynamic adaptation of hardware architecture to specific computational tasks, bridging the performance of application-specific integrated circuits (ASICs) with the flexibility of software by reprogramming configurable logic devices, such as field-programmable gate arrays (FPGAs), to implement custom digital circuits. This approach allows for runtime reconfiguration, where hardware functionality can be altered post-manufacturing or even during execution, optimizing for speed, energy efficiency, and adaptability without requiring physical redesign. Key devices include FPGAs, which consist of programmable logic blocks, interconnects, and I/O resources configurable via bitstreams, and coarse-grained reconfigurable architectures (CGRAs) that operate at higher abstraction levels for faster reconfiguration.

The concept traces its origins to the 1960s, with early ideas of restructurable computer systems proposed by researchers like Estrin et al., but it gained practical momentum in the mid-1980s with the commercialization of FPGAs by companies such as Xilinx, evolving from simple programmable array logic (PAL) devices used for glue logic and prototyping. By the 1990s, reconfigurable computing emerged as a distinct discipline, with systems like Garp (1997) and PipeRench (1998) demonstrating hybrid processor-FPGA architectures for high-performance applications, and surveys highlighting its potential for up to 540x speedups over general-purpose processors in selected tasks. Over the past two decades, advancements in FPGA technology, such as the Xilinx Virtex series and Altera Stratix devices, have integrated reconfigurable elements with embedded processors, enabling partial and dynamic reconfiguration to overlap computation and reconfiguration latency.

Reconfigurable computing excels in domains requiring high throughput and low power, such as signal processing, where it achieves significant energy savings (e.g., 35%-70% over microprocessors), and cryptography, where specialized implementations provide order-of-magnitude performance gains. In embedded systems, it supports cyber-physical applications, with reported 4.5x speedups and 50% energy reductions, while dynamic partial reconfiguration enhances adaptability in real-time environments. More recently, since the 2010s, it has become integral to machine learning and edge computing, accelerating inference on FPGAs and CGRAs, achieving significant speedups and energy savings for convolutional neural networks (CNNs) at the edge, and enabling brain-inspired spiking neural networks (SNNs) with accuracies approaching 100% in low-power scenarios like medical implants and wearables. These capabilities position reconfigurable computing as a cornerstone for next-generation systems in edge AI, 5G communications, and autonomous devices, where adaptability to evolving workloads is paramount. As of 2025, it continues to evolve, with advanced FPGA platforms supporting demanding AI workloads in data centers and autonomous systems.

Fundamentals

Definition and Core Principles

Reconfigurable computing is a paradigm that employs programmable logic devices, such as field-programmable gate arrays (FPGAs), to dynamically create custom circuits at runtime, thereby merging the adaptability of software with the performance of dedicated hardware. This approach allows systems to tailor their computational fabric to specific applications, enabling efficient execution of parallel and specialized tasks without the need for physical redesign. At its core, reconfigurable computing operates through the loading of configuration data, often in the form of bitstreams, which define the desired logic and routing within the device. This reconfiguration process implements application-specific circuitry, distinguishing it from application-specific integrated circuits (ASICs), which are fixed post-fabrication and lack post-deployment flexibility, and from general-purpose processors, which execute instructions sequentially and incur higher overhead for parallel computations. The principle hinges on exploiting hardware-level parallelism and customization to achieve superior efficiency for compute-intensive operations while retaining the reprogrammability that allows adaptation to evolving requirements. The basic operational model involves mapping algorithms onto a reconfigurable fabric, where key elements include lookup tables (LUTs) that implement combinational logic by storing truth tables for Boolean functions of multiple inputs, flip-flops for sequential storage and state, and programmable interconnects that route signals between logic blocks. FPGAs serve as the primary enablers of this model, providing a sea of configurable resources organized in an array that can be partially or fully reprogrammed to instantiate custom datapaths and accelerators.
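To make the LUT mechanism concrete, the following minimal Python sketch models a k-input LUT as a 2^k-entry truth table indexed by the input bits; the class name and interface are illustrative choices for this article, not any vendor's API, but the indexing scheme mirrors how FPGA tools encode a Boolean function into configuration memory.

```python
# Minimal illustrative model of a k-input lookup table (LUT).
# A k-input LUT stores 2**k configuration bits; evaluating it means
# indexing that table with the input bits, which is why any Boolean
# function of k variables can be "programmed" into a single LUT.

class LUT:
    def __init__(self, k, truth_table):
        assert len(truth_table) == 2 ** k, "need one bit per input combination"
        self.k = k
        self.table = truth_table  # configuration bits, like a bitstream fragment

    def eval(self, *inputs):
        # Pack the input bits into a table index (input 0 is the LSB).
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.table[index]

# Configure a 3-input LUT as a majority function: output 1 when at
# least two inputs are 1. The table is derived from the function itself.
majority = LUT(3, [1 if bin(i).count("1") >= 2 else 0 for i in range(8)])

assert majority.eval(1, 1, 0) == 1
assert majority.eval(1, 0, 0) == 0
```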

Advantages Over Traditional Architectures

Reconfigurable computing offers significant performance advantages over traditional architectures such as central processing units (CPUs) and graphics processing units (GPUs), particularly for tasks that exhibit parallelism. By customizing logic at design time or runtime, reconfigurable systems can achieve speedups of up to 100x compared to CPUs for parallelizable workloads, leveraging fine-grained parallelism and eliminating the overhead of instruction fetching and decoding. In benchmarks involving compute-intensive operations, such as cryptographic algorithms, reconfigurable architectures have demonstrated improvements of three to four orders of magnitude in execution time relative to software implementations on general-purpose processors. Energy efficiency represents another key benefit, with reconfigurable systems providing 10-100x improvements over CPUs and GPUs for suitable applications due to their ability to eliminate unnecessary computations and optimize data paths. Unlike fixed-function accelerators like GPUs, which excel in regular, data-parallel tasks but struggle with irregular access patterns, reconfigurable architectures adapt to irregular parallelism more effectively, delivering higher throughput and lower power consumption for custom operations. Compared to application-specific integrated circuits (ASICs), reconfigurable computing avoids the high non-recurring engineering (NRE) costs associated with custom design and fabrication, which can exceed millions of dollars and require long development cycles. This reconfigurability enables adaptability to evolving workloads without full hardware redesign, supporting iterative development and deployment in dynamic environments. While these benefits are compelling, reconfigurable systems incur higher initial configuration times, which can introduce overhead during setup. However, this cost is typically amortized over extended runtime flexibility, yielding net gains in overall system efficiency for long-running or multi-task scenarios.
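The amortization argument can be made quantitative with a toy break-even model. The sketch below uses entirely illustrative numbers (the 100 ms configuration cost and per-task times are assumptions, not measurements from the sources) to show when a one-time reconfiguration cost is recouped by per-task speedups.

```python
# Toy amortization model: reconfiguration pays off once the accumulated
# per-task time savings exceed the one-time configuration cost.
import math

def break_even_tasks(t_config, t_cpu, t_fpga):
    """Number of task executions after which total FPGA time
    (configuration + runs) beats running everything on the CPU."""
    saving_per_task = t_cpu - t_fpga
    if saving_per_task <= 0:
        return None  # no per-task speedup, never breaks even
    return math.ceil(t_config / saving_per_task)

# Assumed figures: 100 ms to load a bitstream, 10 ms per task on a CPU,
# 1 ms per task on the configured fabric.
print(break_even_tasks(t_config=0.100, t_cpu=0.010, t_fpga=0.001))  # -> 12
```

Under these assumptions the accelerator wins after only a dozen invocations, which is why long-running or repeatedly invoked kernels are the natural targets for reconfigurable offload.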

Historical Development

Origins and Early Innovations

The origins of reconfigurable computing trace back to the early 1960s, when Gerald Estrin at the University of California, Los Angeles (UCLA) proposed a novel architecture combining a fixed processing unit with a variable structure composed of interconnected computational modules. In his seminal paper, Estrin described a system where the variable portion could be dynamically adapted by altering connections between basic building blocks, such as arithmetic units and storage elements, to tailor hardware to specific computational tasks, thereby bridging the gap between the flexibility of software and the efficiency of custom hardware. This fixed-plus-variable (F+V) model introduced key ideas of hardware adaptability, including node-link structures where computational nodes (processing elements) were linked via programmable interconnects to form task-specific topologies. During the 1970s, early academic prototypes began to explore these concepts in practice, influenced by parallel-processing research. The ILLIAC IV, developed at the University of Illinois and operational by 1972, featured 256 processing elements organized into four reconfigurable arrays derived from the earlier SOLOMON design, allowing interconnections to be adjusted for different array configurations to support massively parallel computations like fluid dynamics simulations. This system demonstrated the potential of reconfigurable elements for high-throughput applications but operated under a SIMD paradigm with limited flexibility in reconfiguration. By the early 1980s, innovations shifted toward more programmable fabrics, exemplified by the systolic arrays proposed by H. T. Kung in 1982, which used pipelined arrays with fixed yet adaptable data flows to optimize regular, compute-intensive tasks, paving the way for hardware that could be tuned for linear algebra and signal processing without full redesign. Academic efforts at the Institute for Defense Analyses' Supercomputing Research Center (SRC) further advanced this with prototypes like the Splash system in 1989, a linear array of 32 FPGAs reconfigurable for systolic applications such as DNA sequence comparison, achieving up to 300 times the performance of conventional processors for matching algorithms. Despite these theoretical and prototypical advances, pre-commercial reconfigurable systems faced significant challenges that hindered widespread adoption until the 1990s. Reconfiguration processes were notoriously slow, often taking seconds to minutes due to reliance on mechanical switches, discrete logic, or early programmable logic devices (PLDs) with limited density and high latency in altering interconnects. Additionally, the complexity of designing and verifying dynamic hardware adaptations strained available tools and fabrication technologies, while power consumption and scalability issues in custom arrays like the ILLIAC IV underscored the gap between academic proofs-of-concept and practical, cost-effective implementations. These limitations kept reconfigurable computing largely confined to specialized research environments, emphasizing adaptability's promise but delaying its integration into general-purpose computing.

Commercialization and Key Milestones

The commercialization of reconfigurable computing began in the early 1990s with the introduction of the Algotronix CHS2X4 in 1991, recognized as the world's first commercial reconfigurable computer system. This board, based on the company's CAL1024 field-programmable gate array (FPGA) with 1,024 programmable cells fabricated in a 1.5 μm double-metal CMOS process, enabled random access to control memory and signal sharing across I/O pins to support arrays of devices. Although it achieved limited commercial success due to high costs and niche applications, the CHS2X4 demonstrated the feasibility of reconfigurable hardware for computing tasks beyond simple logic emulation. In 1993, Xilinx acquired Algotronix, integrating its pioneering FPGA technology into its portfolio and accelerating the shift toward broader industry adoption. This acquisition provided Xilinx with advanced cell array logic designs, contributing to the evolution of SRAM-based FPGAs and marking a pivotal transition from standalone reconfigurable systems to embedded components in mainstream computing. The move underscored the growing recognition of reconfigurable computing's potential for flexibility in digital system design. Key milestones in the 1990s included the development of the Garp architecture at UC Berkeley, which integrated a reconfigurable coprocessor with a MIPS RISC processor on a single chip to enable hybrid computing. Presented in 1997, Garp highlighted the benefits of coupling fine-grained reconfigurable arrays with general-purpose processors for improved performance in data-intensive tasks. Entering the 2000s, PACT's eXtreme Processing Platform (XPP) emerged as a runtime-reconfigurable data processing architecture based on a hierarchical array of coarse-grained processing elements, announced in 2000 and detailed in subsequent publications, targeting parallel dataflow processing for embedded and high-performance applications. Industry shifts during this period were driven by explosive FPGA market growth, with device capacity expanding over 10,000-fold and performance improving by more than 1,000 times over the technology's first three decades, fueled by demand in networking, telecommunications, and computing acceleration. This growth enabled the integration of FPGAs into high-performance computing (HPC) systems, exemplified by the Cray XD1 in 2004, the first commercial HPC cluster to incorporate Xilinx Virtex-II Pro FPGAs directly alongside processors for low-latency reconfigurable acceleration in tasks like sorting. By the late 2000s, institutional support further propelled commercialization, notably through the National Science Foundation's establishment of the Center for High-Performance Reconfigurable Computing (CHREC) in 2007 as an Industry/University Cooperative Research Center. Hosted primarily at the University of Florida with partners including Brigham Young University and Virginia Tech, CHREC focused on advancing FPGA-based systems for HPC, emphasizing scalability, power efficiency, and interoperability to bridge research and practical deployment.

Theoretical Foundations

Classification Frameworks

Classification frameworks in reconfigurable computing provide structured ways to categorize systems based on their resource variability, programming models, and computational paradigms, enabling designers to evaluate trade-offs in flexibility, performance, and efficiency. These frameworks emerged as the field matured, offering conceptual tools to distinguish reconfigurable architectures from traditional fixed hardware or software-based systems. By analyzing dimensions such as hardware mutability and algorithmic adaptability, they highlight how reconfigurability bridges the gap between application-specific integrated circuits (ASICs) and general-purpose processors. One seminal framework is Nick Tredennick's paradigm classification scheme, introduced in the early 1990s and elaborated in his 2003 analysis of reconfigurable systems. This model classifies computing paradigms along two axes: the variability of hardware resources and the variability of algorithms, resulting in four quadrants that represent evolutionary stages from fixed to fully adaptable systems. Fixed resources with fixed algorithms correspond to early historic computers or custom circuits, where no programming is needed because both hardware and logic are immutable. Fixed resources with variable algorithms define the von Neumann paradigm, relying on software to alter behavior on unchanging hardware. Variable resources with fixed algorithms imply custom configurable hardware without algorithmic flexibility, a quadrant that receives less emphasis. The key quadrant for reconfigurable computing features variable resources programmed via configware—bitstreams or configuration data that define hardware structures—and variable algorithms managed by flowware, which schedules data streams or operations on the reconfigured fabric. This scheme underscores configware as the notation for morphware (reconfigurable circuitry like FPGAs) and flowware as the notation for directing data flows, distinguishing reconfigurable systems from software-centric models. The quadrants are summarized below.
| Paradigm | Resources | Algorithms | Programming Source |
|---|---|---|---|
| Early historic computers | Fixed | Fixed | None |
| von Neumann computer | Fixed | Variable | Fixed (software) |
| Reconfigurable computing | Variable | Variable | Variable (configware/flowware) |
Another influential framework is Reiner Hartenstein's Xputer paradigm, proposed in the late 1980s as a non-von Neumann approach to parallel computation. Xputers treat reconfigurable systems as data-stream-driven machines, where execution is controlled by multiple data counters rather than a single program counter, enabling efficient implementation of parallel algorithms without the overhead of instruction fetching. This paradigm emphasizes configuring the hardware first via configware to tailor the datapath, then using flowware to schedule data streams, allowing seamless migration of computational tasks from software to hardware for significant efficiency gains—up to orders of magnitude faster for data-intensive applications like image processing. By viewing reconfigurable fabrics as "Xputers," Hartenstein's model promotes a shift toward morphware-centric designs that exploit spatial parallelism, contrasting with sequential von Neumann execution. Beyond these, general taxonomies of reconfigurability levels provide broader categorizations, often distinguishing fine-grained systems (e.g., bit-level operations in LUT-based FPGAs) from coarse-grained ones (e.g., word-level functional units), as a foundational way to assess architectural flexibility without delving into specific implementations. These frameworks collectively guide system design by illuminating performance-energy trade-offs; for instance, Tredennick's scheme aids in selecting variability levels to balance flexibility against overhead, while the Xputer paradigm informs optimizations for data-stream efficiency, ensuring reconfigurable architectures align with application demands like high-throughput streaming or low-power embedded systems.
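The configware/flowware split can be illustrated with a small software analogy. The sketch below is a deliberately loose model in the spirit of the Xputer idea, not Hartenstein's actual tooling: a one-time "configure" step fixes a datapath, after which "flowware" merely streams data through it with no per-element instruction fetch.

```python
# Illustrative sketch of data-stream-driven execution: the 'configware'
# step fixes the datapath (a chain of operators), and the 'flowware'
# step streams data through it. All names here are illustrative.

def configure(*stages):
    """Compose operators into a fixed datapath (the configware step)."""
    def datapath(stream):
        for stage in stages:
            stream = map(stage, stream)  # lazily chain the stages
        return stream
    return datapath

# Configure a datapath once: scale, offset, clamp.
pipe = configure(lambda x: 3 * x, lambda x: x + 1, lambda x: min(x, 100))

# Flowware: drive a data stream through the configured fabric.
print(list(pipe(range(5))))  # [1, 4, 7, 10, 13]
```

The point of the analogy is that once the structure is set, there is no instruction stream to fetch and decode; only data moves, which is the efficiency argument Hartenstein's model makes for morphware.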

Architectural Paradigms

Reconfigurable computing encompasses several architectural paradigms that model computation in ways distinct from traditional sequential processing, emphasizing parallelism and adaptability to overcome limitations like the von Neumann bottleneck, where data and instructions share a single pathway, leading to inefficiencies in high-throughput applications. These paradigms shift focus toward data movement, pipelining, and asynchronous execution, enabling hardware to reconfigure for specific workloads while maintaining flexibility.

Systolic arrays represent a foundational paradigm for pipelined parallel processing, consisting of a grid of processing elements (PEs) that rhythmically exchange data with neighbors in a systolic manner, akin to the heart's pulsing action. Each PE performs simple operations, such as inner products, while data flows unidirectionally through the array, allowing concurrent computations without global control signals. This design excels in matrix operations, like multiplication or convolution, where traditional sequential systems require O(n^3) steps, but systolic arrays achieve near-linear execution through pipelining, with execution times as low as O(3n) for n x n matrices.

Dataflow architectures provide another key paradigm, emphasizing asynchronous execution driven by data availability rather than a centralized clock or instruction sequence. Programs are represented as directed graphs where nodes (operators) fire only when all input tokens arrive, enabling fine-grained parallelism without control-flow overhead. Unlike von Neumann models, which serialize operations via fetch-execute cycles, dataflow avoids the need for an explicit program counter, supporting dynamic matching of operands in a distributed manner and tolerating variable computation delays through packet-based communication.

Hybrids combining von Neumann and non-von Neumann elements integrate sequential processors with reconfigurable fabrics, such as field-programmable gate arrays (FPGAs), to balance general-purpose flexibility with specialized acceleration. In these systems, a processor core handles control tasks while the reconfigurable component processes data streams in parallel, mitigating sequential bottlenecks by offloading compute-intensive operations to non-sequential paths like systolic or dataflow units. This approach addresses limitations of pure von Neumann architectures, where memory access latencies hinder throughput, by enabling localized data movement that reduces global bus contention.

The evolution of these paradigms traces from static pipelines in the 1980s, exemplified by early systolic designs optimized for fixed VLSI layouts, to dynamic data-driven models in the 1990s that incorporated reconfiguration for adaptive parallelism. Initial systolic arrays focused on regular, predictable data flows for signal processing, but later developments integrated coarse-grained reconfigurable elements, allowing runtime adjustments to handle irregular workloads and enhancing scalability in heterogeneous systems.

Transport-triggered architectures (TTAs) serve as a theoretical model enhancing reconfigurable datapaths by inverting traditional operation-triggered paradigms: data transport explicitly triggers computations via bus moves. In TTAs, function units connect through an orthogonal network of buses and sockets, with programs specifying moves between registers that implicitly invoke operations, reducing register pressure and enabling superpipelining with short cycle times. Compared to operation-triggered systems, TTAs improve hardware utilization by decoupling transport from execution, allowing flexible reconfiguration of the interconnect for diverse applications while minimizing control overhead.
Collectively, these paradigms counter the sequential processing constraints of traditional architectures by prioritizing data locality, parallelism, and modularity; for instance, systolic and dataflow models achieve up to orders-of-magnitude speedups in pipelined tasks over von Neumann equivalents, while hybrids and TTAs provide reconfiguration paths that evolve with application demands without sacrificing programmability.
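The systolic idea, and in particular the O(3n) latency figure for n x n matrices, can be checked with a small cycle-by-cycle simulation. The following Python sketch is a toy model of an output-stationary systolic array (the data skewing and register names are modeling choices for this illustration, not a specific hardware design): rows of A enter from the left, columns of B from the top, each PE multiplies the values passing through it, accumulates its own output element, and forwards the inputs to its right and bottom neighbors.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy output-stationary systolic array: A streams in from the left,
    B from the top, and PE (i, j) accumulates C[i][j] over 3n-2 cycles."""
    n = A.shape[0]
    acc = np.zeros((n, n))     # per-PE accumulator (the output stays put)
    a_reg = np.zeros((n, n))   # values moving rightward
    b_reg = np.zeros((n, n))   # values moving downward
    for t in range(3 * n - 2):            # total wavefront latency: O(3n)
        for i in reversed(range(n)):      # update far PEs first so each PE
            for j in reversed(range(n)):  # still sees last cycle's neighbors
                # Row i of A is skewed by i cycles at the left edge;
                # column j of B is skewed by j cycles at the top edge.
                a_in = a_reg[i][j - 1] if j > 0 else (A[i][t - i] if 0 <= t - i < n else 0)
                b_in = b_reg[i - 1][j] if i > 0 else (B[t - j][j] if 0 <= t - j < n else 0)
                acc[i][j] += a_in * b_in
                a_reg[i][j] = a_in        # forward operands to neighbors
                b_reg[i][j] = b_in
    return acc

A = np.arange(4.0).reshape(2, 2)
B = np.arange(4.0, 8.0).reshape(2, 2)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Tracing the indices shows PE (i, j) sees A[i][t-i-j] and B[t-i-j][j] at time t, so its accumulator sums exactly the inner product for C[i][j]; the last useful cycle is t = 3n-3, matching the O(3n) latency quoted above.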

Architectural Classifications

Granularity and Reconfigurable Elements

Reconfigurable computing systems are categorized by granularity, which refers to the size and level of detail of their programmable units, influencing both flexibility and efficiency. Fine-grained architectures operate at the bit or gate level, enabling the implementation of arbitrary logic functions through small, versatile elements. In contrast, coarse-grained architectures function at the word or functional-unit level, such as 8-bit or 32-bit operations, providing higher-level abstractions suited for datapath-oriented computations. In fine-grained systems, typical reconfigurable elements include look-up tables (LUTs), which serve as the core units by storing truth tables for combinational functions. For instance, a 6-input LUT can realize any Boolean function of up to six variables and is often paired with flip-flops for state storage within configurable logic blocks (CLBs). A CLB typically comprises multiple slices, each containing four LUTs and eight storage elements, allowing dense packing of custom logic. Additionally, digital signal processing (DSP) slices provide specialized fine-grained support for multipliers and accumulators, while memory blocks, such as distributed RAM implemented using LUTs, offer small on-chip storage configurable as 16x1 to 256x1 arrays. These elements enable bit-level manipulations essential for applications requiring irregular or control-intensive logic. Coarse-grained reconfigurable elements, by comparison, consist of larger functional units like arithmetic logic units (ALUs) or multipliers operating on fixed-width data paths, often 16-bit or wider. These are organized into arrays where each processing element (PE) handles word-level operations, such as addition or multiplication, reducing the need for extensive bit-level routing. Memory blocks in coarse-grained fabrics are typically embedded as register files or small SRAMs per PE, supporting dataflow-style computations with higher throughput for numerical tasks. Examples include architectures with 8x8 arrays of 16-bit ALUs, which prioritize efficiency over universal logic flexibility. The choice of granularity involves key trade-offs in versatility, overhead, and performance. Fine-grained elements offer superior flexibility for implementing diverse, custom operations but incur higher configuration complexity and interconnect overhead, as each connection requires individual bit-level programming. Coarse-grained elements, conversely, limit customization to predefined operations but yield smaller configuration footprints and reduced wiring demands, enabling faster reconfiguration and lower power consumption for structured workloads. In terms of area metrics, fine-grained FPGAs often dedicate 50-70% of their area to routing and configuration resources, while coarse-grained arrays achieve 2-10x higher density for arithmetic functions with interconnect consuming under 10-20% of the total area, enhancing efficiency in domain-specific scenarios.
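The configuration-cost contrast between the two granularities can be made concrete with a toy model. The sketch below (the class, opcode table, and bit counts are illustrative assumptions, not any real device's encoding) shows why a coarse-grained PE needs only a few opcode bits to select a word-level function, whereas building the same function from LUTs requires many truth-table bits plus routing configuration.

```python
# Toy contrast of configuration cost: a coarse-grained PE picks one
# word-level operation with ~2 opcode bits, while a fine-grained fabric
# would spend many truth-table bits plus routing bits on the same adder.
import operator

COARSE_OPS = {0: operator.add, 1: operator.sub, 2: operator.mul}

class CoarsePE:
    """Word-level processing element: a few opcode bits pick the function."""
    def __init__(self, opcode):
        self.fn = COARSE_OPS[opcode]   # ~2 configuration bits in this model

    def eval(self, a, b):
        return self.fn(a, b)

# A 16-bit add on this PE costs ~2 configuration bits; synthesizing the
# same adder out of small LUTs would take dozens of LUTs at several
# truth-table bits each, before counting interconnect configuration.
pe = CoarsePE(opcode=0)
print(pe.eval(40, 2))  # 42
```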

Coupling and Interconnection Strategies

In reconfigurable computing systems, host coupling refers to the degree of integration between the reconfigurable hardware and the host processor, which directly affects communication efficiency and system performance. Loose coupling involves connecting the reconfigurable unit as a peripheral device via standard buses such as PCIe, allowing independent operation but introducing high latency due to infrequent data transfers and protocol overheads. Tight coupling treats the reconfigurable hardware as a coprocessor with shared memory or direct register access, enabling more frequent interactions and reducing communication delays while still permitting parallel execution. Full integration embeds reconfigurable logic directly on-chip within the processor, such as in system-on-chip (SoC) designs, providing the lowest latency through seamless incorporation but limiting the scale of reconfigurable resources due to area constraints. Interconnection strategies within reconfigurable fabrics facilitate signal routing among elements, with FPGA architectures commonly employing switchboxes and routing channels. Switchboxes serve as programmable junctions that connect horizontal and vertical channels, allowing flexible signal distribution in island-style layouts, while channels consist of segmented wires of varying lengths to accommodate local and global connections. For larger-scale systems, partial crossbars provide efficient point-to-point connectivity by selectively interconnecting subsets of inputs and outputs, minimizing wiring complexity compared to full crossbars. In advanced reconfigurable SoCs, Network-on-Chip (NoC) architectures replace traditional buses with packet-switched networks of routers and links, supporting dynamic reconfiguration to adapt to varying computational demands. These strategies significantly influence system performance, particularly in terms of bandwidth and latency. Loose coupling often bottlenecks bandwidth to tens of GB/s due to bus contention, whereas tight and full integrations can achieve hundreds of GB/s through dedicated channels, though at the cost of reduced reconfigurability scope. Within the fabric, inefficient routing via overly complex switchboxes can increase critical path delays by up to 20%, while optimized segmentation in routing channels reduces this by balancing local expressivity and global reach. NoC-based interconnects further mitigate contention in multi-core reconfigurable systems by enabling concurrent data flows, potentially lowering end-to-end delays by 30-50% over bus alternatives in high-throughput applications. To address scalability in expansive reconfigurable arrays, such as multi-FPGA setups, hierarchical interconnects organize routing into multi-level structures, with local clusters connected via short wires and global networks using longer segments or overlays. This approach reduces wire lengths and switch counts, improving routability for designs exceeding single-chip capacities and cutting area overhead by 10-25% compared to flat topologies. In NoC hierarchies, reconfigurable routers allow fault isolation and traffic rerouting, enhancing overall system reliability and scalability for ultra-large-scale integrations.
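Why coupling matters can be shown with a simple transfer-cost model. The numbers below are illustrative order-of-magnitude assumptions (microsecond-class latency for a bus-attached card, tens of nanoseconds for on-chip access), not vendor measurements; the structure of the formula is the standard latency-plus-bandwidth cost of a transfer.

```python
# Toy transfer-cost model: total offload time equals per-transfer
# latency plus payload / bandwidth, summed over all transfers.

def offload_time(n_transfers, bytes_per_transfer, latency_s, bandwidth_Bps):
    return n_transfers * (latency_s + bytes_per_transfer / bandwidth_Bps)

payload = 4096  # bytes per transfer

# Loose coupling (bus-attached card): assumed ~5 us latency, 16 GB/s.
loose = offload_time(10_000, payload, latency_s=5e-6, bandwidth_Bps=16e9)
# Tight/on-chip coupling: assumed ~50 ns latency, 100 GB/s.
tight = offload_time(10_000, payload, latency_s=50e-9, bandwidth_Bps=100e9)

print(f"loose: {loose*1e3:.2f} ms, tight: {tight*1e3:.2f} ms")
# -> loose: 52.56 ms, tight: 0.91 ms
```

Small, frequent transfers are dominated by the latency term, which is precisely the regime where tightly coupled and fully integrated designs pay off; large bulk transfers narrow the gap because the bandwidth term dominates.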

Reconfiguration Mechanisms

Static and Dynamic Approaches

Reconfigurable computing systems employ two primary approaches to reconfiguration based on timing and system state: static and dynamic methods. Static reconfiguration involves reloading the entire configuration bitstream of the reconfigurable fabric, such as an FPGA, only when the system is in a halted or otherwise idle state. This approach is typically used for initial deployment or setup phases, where the hardware is configured once at the start of an application and remains fixed throughout execution, limiting adaptability but ensuring stability. In contrast, dynamic reconfiguration enables changes to the configuration during runtime without halting the entire system, allowing the reconfigurable fabric to adapt to varying computational demands. This method supports context-switching between different tasks or functions by swapping bitstreams or portions thereof while active logic continues to operate, facilitating multi-tasking and improved resource utilization in time-varying workloads. Dynamic approaches often overlap with partial reconfiguration techniques, where only specific regions are updated, though the core distinction lies in execution continuity. The static approach offers simplicity and low overhead, as it avoids the need for runtime management of changes, resulting in no additional power consumption or timing disruptions during operation. However, it incurs significant downtime during reconfiguration—often milliseconds to seconds depending on bitstream size—making it unsuitable for systems requiring frequent adaptations. Dynamic reconfiguration, while providing greater responsiveness and the ability to handle larger effective logic capacities through sequential loading, introduces complexities such as potential glitches, increased design effort, and overhead from bitstream transfer and validation. This overhead can include reconfiguration times in the range of milliseconds, mitigated by techniques like configuration prefetching, but it generally demands more sophisticated control logic compared to static methods. Enabling technologies for dynamic reconfiguration include internal configuration ports, such as the Internal Configuration Access Port (ICAP) in Xilinx FPGAs, which provide high-speed access to the configuration memory from within the device itself. ICAP allows bitstream loading at rates up to 400 MB/s, facilitating runtime updates without external intervention and reducing latency for context switches. In multicontext architectures, dynamic reconfiguration can further leverage multiple pre-stored configuration planes for near-instantaneous switching in nanoseconds, enhancing adaptability over single-context static setups.
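These latency figures follow directly from a first-order estimate: reconfiguration time is roughly bitstream size divided by configuration-port throughput. The sketch below uses the 400 MB/s ICAP-class rate quoted above; the example bitstream sizes (9 MB full, 116 KB partial) are taken from the figures cited in the next subsection, and the model deliberately ignores setup and validation overheads.

```python
# First-order reconfiguration latency: size / configuration throughput.

def reconfig_time_ms(bitstream_bytes, port_MBps=400):  # ICAP-class rate
    return bitstream_bytes / (port_MBps * 1e6) * 1e3

print(f"{reconfig_time_ms(9_000_000):.1f} ms")   # ~22.5 ms for a 9 MB full bitstream
print(f"{reconfig_time_ms(116_000):.2f} ms")     # ~0.29 ms for a 116 KB partial bitstream
```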

Partial and Module-Based Reconfiguration

Partial reconfiguration allows the selective reprogramming of subsets of logic resources within a field-programmable gate array (FPGA) while the rest of the device continues to function without interruption. This approach partitions the FPGA into static regions, which house unchanging logic such as memory controllers or communication interfaces, and reconfigurable regions dedicated to modular components that can be updated independently. In practice, partial reconfiguration supports both static variants, where regions are predefined during design time, and dynamic variants, enabling runtime modifications to adapt to varying computational demands. Module-based reconfiguration builds on this by structuring designs around reusable intellectual property (IP) cores that are floorplanned into isolated reconfigurable partitions, ensuring compatibility through standardized interfaces. These modules, often synthesized as separate netlists, allow for the swapping of hardware tasks—such as different algorithms—without redesigning the entire system, promoting reuse and design modularity. Self-reconfiguring modules extend this autonomy by embedding a controller, typically a soft processor core, within the FPGA to manage the loading of new configurations directly from onboard memory or external sources, minimizing reliance on host systems. Implementation of these techniques requires careful hardware partitioning and interfacing. Tools like the Vivado Design Suite facilitate the process by enabling designers to define reconfigurable regions via pblocks, allocate resources, and generate partial bitstreams for each module, supporting hierarchical designs with multiple netlists per partition. To maintain signal integrity across region boundaries, bus macros—pre-placed routing structures—are inserted to isolate reconfigurable areas, preventing glitches or contention during updates; these macros typically consume minimal resources, such as one lookup table (LUT) per signal in modern proxy-based implementations. The primary benefits of partial and module-based reconfiguration include optimized resource utilization and reduced reconfiguration overhead. Partial bitstreams are substantially smaller than full configurations—for example, a module bitstream might occupy only about 1% of the total size (e.g., 116 KB versus 9 MB for a Virtex-6 device)—enabling reconfiguration times in milliseconds rather than seconds and conserving storage and power. However, challenges persist, such as ensuring glitch-free switching through proper reset mechanisms and decoupling logic, as well as mitigating internal fragmentation where unused resources within regions reduce overall utilization. External fragmentation from mismatched module sizes can further complicate placement, though advanced tools like GoAhead aim to address these by supporting flexible styles such as island, slot, or grid layouts.
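The decouple-load-release sequence described above can be summarized in a short control-flow sketch. Everything here is an illustrative model (the class names, the `FakeICAP` stand-in, and the API shape are assumptions for exposition, not a vendor interface), but the ordering mirrors the glitch-avoidance discipline just described: isolate the region's I/O, stream the partial bitstream, then release reset and reconnect.

```python
# Minimal sketch of a module-based partial-reconfiguration flow:
# decouple the region, write the partial bitstream, re-enable the
# region — the static part of the design keeps running throughout.

class ReconfigurableRegion:
    def __init__(self, name):
        self.name = name
        self.module = None
        self.enabled = False

    def load(self, module_name, partial_bitstream, port):
        self.enabled = False              # decouple region I/O (isolation macros)
        port.write(partial_bitstream)     # stream configuration frames
        self.module = module_name
        self.enabled = True               # release reset, reconnect I/O

class FakeICAP:
    """Stand-in for a configuration port; real hardware shifts frames here."""
    def write(self, bitstream):
        pass

region = ReconfigurableRegion("accel0")
region.load("fir_filter", b"\x00" * 116_000, FakeICAP())
print(region.module, region.enabled)  # fir_filter True
```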

Applications

High-Performance and Scientific Computing

Reconfigurable computing plays a pivotal role in high-performance computing (HPC) by enabling FPGA clusters interconnected through high-speed interfaces to accelerate demanding numerical workloads. These setups allow seamless integration of reconfigurable accelerators into existing HPC infrastructures, facilitating offloading of parallelizable tasks from host CPUs. For linear algebra operations, such as matrix multiplications, CPU-FPGA hybrid systems exploit FPGA parallelism and reduce data movement overheads. In Monte Carlo simulations, FPGA-based designs have delivered 270x speedup compared to single-core CPU executions and 110x versus multi-core CPU versions, primarily due to customized pipelined architectures that minimize latency in random number generation and sampling. Scientific applications benefit substantially from tailored reconfigurable accelerators, particularly in genomics and climate modeling, where data-intensive computations dominate. In genomics, FPGA implementations for variant calling and short-read alignment process viral genomes with high throughput, achieving up to 10x faster execution than software equivalents on multi-core CPUs by leveraging domain-specific parallelism in alignment scoring and error correction. For climate modeling, near-memory processing on FPGAs accelerates weather prediction kernels, such as those involving stencil computations, by mitigating memory bottlenecks and yielding 2-4x speedups in global atmospheric simulations relative to CPU baselines. Fast Fourier transform (FFT) implementations on FPGAs exemplify these gains, with 3D FFT accelerators providing 2x overall speedup and up to 4.1x better energy efficiency than multi-threaded CPU libraries like FFTW for scientific visualization tasks. Pioneering reconfigurable systems in the early 2000s demonstrated potential in supercomputing through multi-processor clusters combining general-purpose CPUs with reconfigurable engines for scientific kernels. These platforms targeted energy-efficient scaling for parallel workloads by optimizing custom logic for throughput-oriented tasks. Such designs influenced discussions on energy efficiency in TOP500 rankings, where reconfigurable elements were highlighted for enhancing performance per watt in heterogeneous HPC environments.

Embedded Systems and Edge Computing

Reconfigurable computing plays a pivotal role in embedded systems, where resource constraints demand high efficiency and adaptability. In these environments, reconfigurable hardware such as field-programmable gate arrays (FPGAs) enables dynamic optimization of processing tasks to meet stringent power and performance requirements. For instance, adaptive signal processing in sensors benefits from reconfiguration, allowing algorithms to adjust to varying environmental conditions like noise levels in acoustic or image sensors, thereby improving accuracy without excessive power draw. Similarly, dynamic protocol handling in wireless devices uses reconfigurable logic to switch between communication standards in IoT nodes, ensuring seamless connectivity in battery-limited setups. In edge computing, reconfigurable systems facilitate on-device inference by customizing hardware accelerators for specific models directly at the device level, reducing latency and data transmission to the cloud. This approach is particularly effective for deploying custom neural network accelerators, where partial reconfiguration allows swapping convolutional layers without a full system reset, achieving significant power savings in inference tasks. Coarse-grained reconfigurable arrays (CGRAs) further enhance power efficiency by operating at higher abstraction levels, minimizing reconfiguration overhead and enabling energy-aware adaptations for edge workloads like real-time analytics; fine-grained fabrics complement them by allowing targeted, low-power logic adjustments. Practical examples illustrate these benefits in real-world scenarios. In automotive advanced driver assistance systems (ADAS), reconfigurable platforms process vision data for lane detection and obstacle avoidance, with dynamic reconfiguration enabling adaptation to lighting changes or processing needs within strict power and latency constraints. For unmanned aerial vehicle (UAV) navigation, runtime reconfiguration adjusts flight control algorithms to environmental variables, such as wind gusts or obstacle proximity, using modular hardware to reallocate resources for path planning without interrupting flight stability. These applications highlight the need for short reconfiguration times, often under 1 ms, to maintain performance in safety-critical operations. Hybrid CPU-FPGA system-on-chips (SoCs) address these demands by integrating general-purpose processing with reconfigurable fabric, allowing seamless offloading of compute-intensive tasks while preserving low latency. Such architectures support partial reconfiguration for adaptive workloads, ensuring power efficiency in resource-constrained devices like sensors or wearables, and are central to scaling reconfigurable computing in IoT ecosystems, with vendor platforms like Zynq providing practical implementations. As of 2025, reconfigurable platforms like Versal AI support advanced AI workloads in 6G edge devices, offering up to 10x better energy efficiency for transformer-based models compared to prior generations.

Security and Specialized Domains

Reconfigurable computing plays a critical role in security applications by enabling hardware acceleration of cryptographic algorithms such as the Advanced Encryption Standard (AES). FPGAs provide high-throughput implementations of AES encryption, achieving speeds of up to several gigabits per second while optimizing resource utilization on devices like the Xilinx Virtex series, which outperform software equivalents in power efficiency for secure communications. Additionally, dynamic reconfiguration supports anti-tampering mechanisms by allowing runtime adaptation of hardware configurations to detect and respond to unauthorized modifications, such as through fault detection modules that trigger reconfiguration to isolate compromised regions. In specialized domains, reconfigurable computing excels in radar signal processing, where adaptive architectures handle varying frequency-agile waveforms and perform tasks like fast Fourier transforms (FFT) and constant false alarm rate (CFAR) detection with low latency on platforms integrating RFSoC technology. A notable example is the COPACOBANA machine, introduced in 2006, which comprises 120 low-cost Spartan-3 FPGAs optimized for brute-force cryptanalysis, capable of breaking DES keys in approximately one week at a cost under $10,000, demonstrating the cost-effectiveness of reconfigurable hardware for parallel exhaustive searches. Key benefits in these areas include the isolation of secure modules through partial reconfiguration, which partitions FPGA resources to prevent interference between sensitive operations and enhances system dependability by mitigating single-event upsets without full device downtime. Furthermore, reconfigurable designs improve resistance to side-channel attacks by leveraging dynamic partial reconfiguration to randomize netlists or obfuscate leaked information, reducing the effectiveness of power analysis and electromagnetic attacks on cryptographic implementations. Practical examples illustrate these advantages, such as reconfigurable firewalls deployed in network environments, where FPGA-based packet engines dynamically adapt rules for intrusion detection, achieving throughputs exceeding 10 Gbps while supporting live rule updates. In response to emerging threats, FPGAs accelerate quantum-resistant algorithms from NIST's post-quantum cryptography standardization, including lattice-based schemes, with efficient modular multipliers tailored for 5- to 32-bit operands on Virtex-7 devices to ensure long-term security against quantum adversaries.
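The COPACOBANA claim is easy to sanity-check with back-of-envelope arithmetic: DES has a 2^56 key space, so covering it in about one week across 120 FPGAs implies the per-device search rate computed below (assuming uniform load and a full sweep with no early hit).

```python
# Sanity-check arithmetic for the COPACOBANA figure: 2**56 DES keys
# swept in ~one week across 120 FPGAs.

DES_KEYS = 2 ** 56
SECONDS_PER_WEEK = 7 * 24 * 3600
FPGAS = 120

total_rate = DES_KEYS / SECONDS_PER_WEEK   # keys/s machine-wide
per_fpga = total_rate / FPGAS              # keys/s per Spartan-3

print(f"{total_rate:.2e} keys/s total, {per_fpga:.2e} keys/s per FPGA")
# -> ~1.2e11 keys/s total, ~1.0e9 keys/s per FPGA: each device tests on
#    the order of a billion keys per second via parallel DES pipelines.
```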

Modern Systems and Implementations

FPGA-Based Platforms and Vendors

Field-programmable gate arrays (FPGAs) represent the cornerstone of reconfigurable computing platforms, with AMD and Altera dominating the commercial landscape as of 2025. Following AMD's acquisition of Xilinx in 2022, AMD and Altera together hold the majority of the global FPGA market share, estimated at over 80% as of mid-2025. These vendors provide comprehensive ecosystems that integrate fine-grained logic elements, high-speed interconnects, and specialized accelerators, enabling applications from edge devices to data centers. Their platforms emphasize scalability, power efficiency, and adaptability to emerging workloads such as artificial intelligence (AI) and high-performance computing (HPC). AMD's Versal Adaptive Compute Acceleration Platforms (ACAPs) extend the legacy of Xilinx's Virtex series, evolving from the UltraScale+ architecture—known for its high routing density, system-level integration, and enhanced digital signal processing (DSP) blocks supporting up to 32.75 Gb/s transceivers—to heterogeneous systems with embedded AI Engines for machine learning inference and signal processing. The Versal AI Core series, for instance, delivers breakthrough AI inference acceleration through integrated vector processors and scalar engines, optimized for HPC tasks like stencil-based computations. Additionally, AMD's Zynq UltraScale+ devices incorporate Arm Cortex-A53 and Cortex-R5 cores, facilitating seamless software-hardware integration for embedded and edge applications. In 2025, Versal Series Gen 2 silicon samples became available, further enhancing performance for AI and adaptive applications. These features position Versal ACAPs as versatile platforms for compute-intensive environments, with production shipments continuing to expand in 2025. Altera, which became majority-owned by Silver Lake in September 2025 with Intel retaining 49%, focuses on high-bandwidth memory integration and transceiver performance to support data-centric applications through its Agilex series, particularly the Agilex 7 and Agilex 5 families. The Agilex 7 M-Series introduces a hardened Network-on-Chip (NoC) interface, achieving up to 1 TB/s of memory bandwidth with support for DDR5 and high-bandwidth memory (HBM), the industry's highest for high-end FPGAs. In September 2025, the Agilex 5 D-Series was expanded to scale up to 1.6 million logic elements with enhanced resource ratios, streamlining developer workflows through unified software tools. Stratix 10 SX devices integrate quad-core Arm Cortex-A53 processors, enabling 1.5 GHz operation alongside FPGA fabric for hybrid processing in networking and storage. The FPGA market in 2025 reflects robust growth, valued at approximately USD 11.73 billion and projected to expand at a compound annual growth rate (CAGR) of 10.5% through 2030, driven by demand for reconfigurability in data centers to address the complexity of AI accelerators. This trend underscores FPGAs' role in supplementing GPUs and custom ASICs, offering flexible logic reconfiguration for evolving protocols and workloads without full hardware redesigns.

Coarse-Grained and Hybrid Architectures

Coarse-grained reconfigurable arrays (CGRAs) represent a class of reconfigurable computing architectures that operate at a higher level of abstraction than fine-grained field-programmable gate arrays (FPGAs), utilizing word-level functional units such as adders, multipliers, and shifters to process data in parallel. This design reduces configuration overhead by minimizing the number of bits needed for setup, enabling more efficient mapping of compute-intensive kernels like digital signal processing (DSP) loops in embedded systems. Unlike bit-level reconfiguration, CGRAs interconnect arrays of processing elements (PEs) with configurable routing networks, supporting dataflow-style execution that exploits word-level parallelism without the routing inefficiencies of finer-grained alternatives. Early seminal work in CGRAs includes the MorphoSys architecture, developed in the late 1990s, which integrated a 2D array of 16-bit coarse-grained cells with fine-grained reconfigurable logic and a RISC core for multimedia applications, demonstrating up to 30x speedup over general-purpose processors for tasks like image processing. More modern examples, such as ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) from IMEC, extend this paradigm by coupling a very long instruction word (VLIW) processor with a tightly integrated CGRA tile array, allowing seamless C-programmable mapping of multimedia and DSP workloads with dynamic reconfiguration at cycle boundaries. These systems achieve reconfiguration times on the order of microseconds to milliseconds—far faster than the seconds required for full FPGA reconfiguration—while consuming lower power for arithmetic-dominant tasks due to reduced interconnect complexity and optimized PE utilization. Hybrid architectures combine CGRAs or similar coarse-grained elements with traditional processors and accelerators in system-on-chip (SoC) designs to balance flexibility and performance. For instance, the AMD Versal AI Edge series integrates Arm-based scalar engines, programmable logic, and vector processors with AI Engines—specialized coarse-grained tiles optimized for tensor operations—enabling efficient on-chip acceleration for edge AI inference in applications like autonomous driving and industrial automation. Processing-in-memory (PIM) reconfigurable systems further hybridize by embedding coarse-grained compute units directly within memory arrays, mitigating data movement bottlenecks in memory-bound workloads; recent PIM designs support reconfigurable logic for vector operations, achieving up to 10x bandwidth improvements over conventional von Neumann architectures for data-intensive tasks. As of 2025, CGRAs have gained traction for machine learning (ML) acceleration, particularly at the edge, where power constraints are critical. Implementations like ultra-low-power CGRAs for transformer models report energy efficiencies several times higher than fine-grained FPGAs—e.g., up to 2x better for convolutional neural networks in comparative studies—due to tailored word-level operations that minimize bit manipulations and enhance data reuse in sparse ML kernels. These developments position CGRAs as key enablers for sustainable edge AI, with ongoing research focusing on scalable arrays that deliver 5-10x power savings over FPGA baselines for GEMM-heavy workloads in resource-constrained environments.
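The mapping idea behind CGRAs can be illustrated with a deliberately simplified one-dimensional model. The sketch below is an illustrative toy, not any published CGRA's instruction set: each PE is configured with an opcode and a constant operand, and a value flows left-to-right through the array in one pass, so one small "context" of configuration words programs the entire datapath.

```python
# Toy 1-D CGRA model: each PE holds an (opcode, constant) configuration;
# data flows left-to-right in a single pass. A whole context is a
# handful of words, versus thousands of bits for a LUT-level fabric.

OPS = {"add": lambda a, b: a + b,
       "mul": lambda a, b: a * b,
       "sub": lambda a, b: a - b}

def run_cgra(config, left_input):
    """config: one (opcode, constant) pair per PE in array order."""
    value = left_input
    for opcode, operand in config:
        value = OPS[opcode](value, operand)
    return value

# Map y = (x + 3) * 5 - 2 onto three PEs.
config = [("add", 3), ("mul", 5), ("sub", 2)]
print(run_cgra(config, left_input=4))  # 33
```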

Software Emulation and Prototyping Tools

Software emulation and prototyping tools play a crucial role in reconfigurable computing by enabling developers to simulate and validate designs in virtual environments prior to physical implementation. These tools facilitate cycle-accurate modeling of field-programmable gate arrays (FPGAs) and other reconfigurable architectures, reducing development time and costs associated with prototyping. Emulation techniques often rely on cycle-accurate simulators for register-transfer level (RTL) designs, such as Verilator, which converts Verilog or SystemVerilog code into efficient C++ models for high-speed simulation. Verilator achieves simulation speeds up to 100 times faster than traditional event-driven simulators while maintaining bit-accurate behavior, making it suitable for verifying complex reconfigurable logic. For high-level synthesis (HLS), tools like AMD Vitis HLS provide built-in emulators that simulate C/C++ algorithms on generated RTL, allowing rapid iteration and functional validation without full hardware synthesis. The Vitis HLS emulator supports co-simulation with software testbenches, bridging algorithmic descriptions to hardware behavior. Prototyping tools extend emulation to scalable environments, including cloud-based FPGA instances like Amazon Web Services (AWS) EC2 F1, which allow deployment of custom bitstreams on remote Xilinx UltraScale+ FPGAs for real-time testing and acceleration. These instances support rapid provisioning of hardware emulation clusters, enabling distributed validation of reconfigurable designs without local hardware. Additionally, projects like MiSTer demonstrate FPGA-based emulation of vintage computers and consoles, recreating original hardware timing and interfaces on modern reconfigurable platforms for preservation and research. In development workflows, these tools enable early validation of bitstreams through simulated reconfiguration scenarios, identifying timing and resource issues before fabrication. They also bridge software developers to hardware by providing familiar C/C++ interfaces for accelerator development, lowering the entry barrier for integrating reconfigurable computing into software-centric projects. As of 2025, advancements include AI-assisted tools that accelerate iteration by automating design space exploration and optimization in emulation flows, with vendors reporting up to 10x faster bring-up times for AI workload validation compared to prior generations. Open-source frameworks such as OpenFPGA further democratize prototyping by automating the generation of customizable FPGA architectures from high-level descriptions, supporting agile verification flows.
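What "cycle-accurate" means in this context can be shown with a tiny hand-written model. The Python sketch below is our own illustration of the principle, not Verilator itself (which compiles RTL to C++): the design's registered state advances one clock edge at a time, as a pure function of current state and inputs, so every cycle of the simulated trace corresponds to a cycle of the real hardware.

```python
# Toy cycle-accurate model of a small RTL block: an 8-bit counter with
# synchronous reset and enable. Each call to clock() is one rising edge.

class Counter8:
    def __init__(self):
        self.q = 0  # the register (flip-flop state)

    def clock(self, enable, reset):
        """Advance one clock edge: next state from current state + inputs."""
        if reset:
            self.q = 0
        elif enable:
            self.q = (self.q + 1) & 0xFF  # 8-bit wraparound

dut = Counter8()
trace = []
for cycle in range(300):
    dut.clock(enable=True, reset=(cycle == 0))
    trace.append(dut.q)

assert trace[0] == 0 and trace[1] == 1 and trace[256] == 0  # wraps at 2**8
```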

Programming and System Challenges

Design Methodologies and Languages

Design methodologies for reconfigurable computing encompass a range of flows that enable the mapping of algorithms to hardware, balancing abstraction levels with performance optimization. At the low level, register-transfer level (RTL) design using hardware description languages such as VHDL and Verilog allows precise control over data paths, timing, and resource utilization in field-programmable gate arrays (FPGAs). These languages describe synchronous digital circuits at the register and logic level, facilitating cycle-accurate simulations and synthesis into gate-level netlists for reconfigurable fabrics. For higher abstraction, high-level synthesis (HLS) tools transform algorithmic descriptions in C, C++, or SystemC into RTL implementations, accelerating development by automating microarchitectural decisions like pipelining and resource sharing. A prominent example is Vitis HLS, which takes behavioral C/C++ specifications with optimization directives (e.g., pipelining pragmas for throughput enhancement) and outputs synthesizable Verilog or VHDL, integrating seamlessly into FPGA design suites for co-simulation verification. This flow reduces design time from months to days while targeting metrics like latency and area, though it requires iterative directive tuning for optimal results. Programming languages for reconfigurable systems extend beyond traditional HDLs to support higher-level parallel programming and domain-specific optimizations. OpenCL, an open standard for parallel programming, enables kernel offloading to FPGAs by compiling host-device code into hardware accelerators, abstracting low-level details like memory mapping and reconfiguration. It facilitates portable implementations across CPU-GPU-FPGA platforms, with frameworks like UT-OCL providing support for reconfigurable systems. Similarly, domain-specific languages like Halide target image processing pipelines by decoupling functional algorithms from execution schedules, allowing compilation to FPGA backends for vectorized, memory-efficient hardware. Halide's scheduling primitives, such as tiling and fusion, yield up to 5x performance gains over hand-optimized implementations on GPUs and extend to FPGAs via tools like HeteroHalide for automated accelerator generation. Key methodologies emphasize modularity and reliability to manage complexity in reconfigurable designs. IP-based reuse involves encapsulating pre-verified hardware blocks (e.g., multipliers or communication interfaces) as configurable cores, enabling rapid assembly of systems-on-chip (SoCs) while minimizing redundant verification efforts. This approach leverages standards like IP-XACT for automated integration, supporting partial reconfiguration by allowing runtime swapping of IP modules without a full device reload. Verification combines simulation-based testing, which exercises RTL models against testbenches to cover functional scenarios, with formal methods that mathematically prove properties like deadlock-freedom using model checking or theorem proving. Formal techniques complement simulation by exhaustively exploring state spaces, reducing escaped bugs in safety-critical reconfigurable applications. As of 2025, emerging trends integrate machine learning into synthesis flows for automated optimization. ML-optimized HLS employs large language models (LLMs) as agents for directive generation and design space exploration, iterating on synthesis feedback to enhance metrics like area-delay product by 15% over traditional methods. These approaches use LLM-guided sampling to navigate vast configuration spaces efficiently.
Concurrently, auto-reconfiguration generators automate dynamic partial reconfiguration (PR) for adaptive systems, generating bitstreams on the fly based on workload profiles, as seen in co-design methodologies for accelerators that scale resource allocation without manual intervention. Such tools enable runtime adaptation in edge deployments, reducing energy consumption by optimizing accelerator utilization through PR.
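The directive-tuning loop at the heart of HLS design-space exploration can be sketched abstractly. The cost model below is entirely invented for illustration (unrolling divides latency but multiplies datapath area, with fixed pipeline-fill and control overheads); real flows obtain these numbers from synthesis reports rather than formulas, but the sweep-and-select structure is the same.

```python
# Minimal design-space exploration sketch: sweep an unroll factor and
# keep the configuration with the best area-delay product (ADP).
# The latency/area estimates are an invented toy model, not tool output.

def estimate(unroll, trip_count=1024):
    latency = trip_count // unroll + 10   # cycles: work/unroll + pipeline fill
    area = unroll * 10 + 40               # units: replicated datapath + control
    return latency, area

candidates = [(lat * area, u, lat, area)
              for u in (1, 2, 4, 8, 16, 32)
              for lat, area in [estimate(u)]]

adp, unroll, lat, area = min(candidates)
print(f"best unroll={unroll}: {lat} cycles, area {area}, ADP {adp}")
# -> best unroll=16: 74 cycles, area 200, ADP 14800
```

Even in this toy, the optimum is interior (unroll 16, not 32) because fixed overheads stop the latency gain from paying for the extra area; navigating such trade-offs automatically is exactly what the LLM-guided explorers described above attempt at scale.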

Operating System Integration and Hurdles

Reconfigurable computing systems face significant challenges in integrating with operating systems, primarily due to the lack of standardized application programming interfaces (APIs) for dynamic hardware reconfiguration. Traditional operating systems, such as Linux, are designed for fixed processor architectures and do not natively support the runtime allocation or deallocation of reconfigurable logic resources like those in field-programmable gate arrays (FPGAs). This mismatch requires custom extensions to manage resource partitioning and scheduling, often leading to fragmented support across platforms. For instance, without unified APIs, developers must rely on vendor-specific drivers, complicating portability and increasing development overhead. To address these integration issues, approaches such as virtual machine monitors (VMMs) have been developed to enable FPGA sharing in multi-tenant environments. VMMs, such as the custom hypervisor Ker-ONE, abstract the FPGA fabric into virtual devices, allowing multiple virtual machines to access isolated regions without direct hardware conflicts. These systems facilitate multi-tenancy by partitioning the FPGA into static shells and dynamic roles, supporting partial reconfiguration for efficient sharing. Additionally, runtime environments and kernel drivers provide higher-level abstractions, integrating reconfigurable accelerators as coprocessors within the OS kernel, as seen in frameworks like HybridOS for reconfigurable SoCs. Such methods enhance usability but still demand modifications to the host OS for seamless operation. Key hurdles in OS integration include managing reconfiguration latency and ensuring security in multi-tenant setups. Reconfiguration times, which can range from milliseconds to seconds depending on bitstream size and interface speed, disrupt task scheduling and resource availability, often necessitating prefetching or modular designs to minimize stalls. In multi-tenant scenarios, security risks arise from potential bitstream tampering or side-channel attacks, addressed through encryption mechanisms like AES-128 with "Bring Your Own Keys" (BYOK) schemes to protect bitstreams during provisioning. Validation tools, such as bitstream checkers, further mitigate threats by verifying configurations before loading, though they introduce additional latency. As of 2025, scalability remains a pressing issue for reconfigurable computing in data centers, where the demand for efficient resource pooling conflicts with ease-of-use barriers. High reconfiguration overheads and the need for custom OS modifications hinder widespread adoption, requiring specialized platforms that limit portability. Emerging hypervisors, such as those based on the seL4 microkernel, aim to improve isolation and performance by supporting multicore operation and faster dynamic partial reconfiguration, but challenges in standardizing these for cloud environments persist, slowing integration with hyperscale infrastructures.
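The kind of bookkeeping an OS-level FPGA manager must perform can be sketched in a few lines. The design below is our own illustration (the class, method names, and policy are assumptions, not any real kernel API): tenants are granted isolated reconfigurable regions, and loading a bitstream into a region is only permitted for its owner, mirroring the shell-and-role partitioning described above.

```python
# Illustrative OS-runtime resource manager for reconfigurable regions:
# tenants allocate isolated slots; loads are checked against ownership.

class FpgaRegionManager:
    def __init__(self, num_regions):
        self.owner = [None] * num_regions     # tenant per region, or None

    def allocate(self, tenant):
        for i, owner in enumerate(self.owner):
            if owner is None:
                self.owner[i] = tenant
                return i
        raise RuntimeError("no free reconfigurable region")

    def load(self, tenant, region, bitstream):
        if self.owner[region] != tenant:      # enforce tenant isolation
            raise PermissionError("region owned by another tenant")
        # A real driver would validate and decrypt the bitstream here,
        # then write it through a configuration port before enabling I/O.

    def release(self, tenant, region):
        if self.owner[region] == tenant:
            self.owner[region] = None

mgr = FpgaRegionManager(num_regions=2)
r = mgr.allocate("vm-a")
mgr.load("vm-a", r, b"...partial bitstream...")
mgr.release("vm-a", r)
```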

References

  1. [1]
    [PDF] An Introduction to Reconfigurable Computing
    This paper presents a brief overview of current research in hardware and software systems for reconfigurable computing, as well as techniques that specifically ...
  2. [2]
    [PDF] Reconfigurable Computing: A Survey of Systems and Software
    In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal ...
  3. [3]
    [PDF] Reconfigurable computing: architectures and design methods
    This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable ...
  4. [4]
    [PDF] Reconfigurable Computing for Next-Generation Embedded Systems ...
    ABSTRACT. Reconfigurable computing (RC) has recently become a very important paradigm in scaling up performance, flexibility and energy efficiency of.
  5. [5]
    [PDF] Reconfigurable Digital FPGA Implementations for Neuromorphic ...
    This survey reviews reconfigurable computing on various. FPGA devices for ... From April 2023 to June 2023, he was a Re- searcher with the Department of ...
  6. [6]
  7. [7]
    Elastic computing: a framework for transparent, portable, and ...
    ... speedups of 10x to 100x. Despite numerous compiler and h ... Proceedings of the Workshop on High-Performance Reconfigurable Computing ...
  8. [8]
    The Impact of Adopting Computational Storage in Heterogeneous ...
    Feb 13, 2020 · Compared to CPU system, we found that the modern FPGA system can achieve a 100x ... Published in: 2019 International Conference on ReConFigurable ...
  9. [9]
    Reconfigurable Computing Architectures - CSE CGI Server
    Reconfigurable architectures offer hardware performance and energy efficiency with software flexibility, and can be upgraded and specialized for tasks.Missing: principles | Show results with:principles
  10. [10]
    A Survey of Coarse-Grained Reconfigurable Architecture and Design
    The comprehensive comparison provided in Figure 1 compares CGRAs with ASICs, FPGAs, DSPs, GPUs and CPUs in terms of the energy efficiency, flexibility and ...
  11. [11]
    High-Performance Architecture Using Fast Dynamic Reconfigurable ...
    ... over CPU ... ARC'10: Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications ... 100X increase in energy ...
  12. [12]
    Practical Acceleration of Irregular Applications on Reconfigurable ...
    Oct 18, 2021 · Coarse-grain reconfigurable arrays (CGRAs) can achieve much higher performance and efficiency than general-purpose cores, approaching the ...
  13. [13]
    Leveraging Reconfigurability to Raise Productivity in FPGA ...
FPGAs carry several advantages over ASICs, including reconfigurability and lower NRE costs for mid-to-high volume applications. While there remains a gap ...
  14. [14]
    Survey on FPGA Architecture and Recent Applications - IEEE Xplore
When compared with application specific integrated circuit (ASIC) ... (NRE) costs. The unique property that differentiates it from an ASIC is its reconfiguration.
  15. [15]
    Exploiting Partial Runtime Reconfiguration for High-Performance ...
    However, the RTR feature comes with the cost of high configuration overhead which might negatively impact the overall performance.
  16. [16]
  17. [17]
    Organization of computer systems: the fixed plus variable structure ...
    Organization of computer systems: the fixed plus variable structure computer. Author: Gerald Estrin.
  18. [18]
    [PDF] The ILLIAC IV computer
Summary of the ILLIAC IV. The ILLIAC IV main structure consists of 256 processing elements arranged in four reconfigurable SOLOMON-type arrays of 64.
  19. [19]
    [PDF] Why Systolic Architectures? - Computer Science
Jan 4, 1982 · The basic principle of a systolic architecture, a systolic array in particular, is illustrated in Figure 1. By replacing a single processing ...
  20. [20]
  21. [21]
    Company Background - Algotronix
Origins. Algotronix was originally formed in 1989 to develop a unique FPGA chip for computing applications: the CAL1024. The architecture of this device was ...
  22. [22]
    [PDF] Garp: A MIPS Processor with a Reconfigurable Coprocessor
In this paper we outline a candidate hybrid architecture, which we call Garp, in which the FPGA is recast as a slave computational unit located on the same die ...
  23. [23]
    PACT XPP—A Self-Reconfigurable Data Processing Architecture
    The eXtreme Processing Platform (XPP TM ) is a new runtime-reconfigurable data processing architecture. It is based on a hierarchical array of coarsegrain, ...
  24. [24]
    PACT Unveils The eXtreme Processor Platform - HPCwire
    Oct 13, 2000 · To break the bottleneck, XPP's reconfigurable parallel data flow processor sends high-speed data streams through an array of processing elements ...
  25. [25]
    (PDF) Three ages of FPGAs: A retrospective on the first thirty years ...
    Aug 5, 2025 · Since their introduction, field programmable gate arrays (FPGAs) have grown in capacity by more than a factor of 10 000 and in performance ...
  26. [26]
    Cray XD1 - Wikipedia
Announced on 4 October 2004, the Cray XD1 range incorporate Xilinx Virtex-II Pro FPGAs for application acceleration. With 12 CPUs in a chassis, and up to 12 ...
  27. [27]
    ALABAMA SUPERCOMPUTER AUTHORITY CHOOSES CRAY XD1
    Oct 22, 2004 · The Cray XD1 is providing scientists and engineers with a platform designed from the ground up to meet their HPC challenges, which is an ...
  28. [28]
    New NSF Center Targets Reconfigurable Computing - HPCwire
    Nov 3, 2006 · Advantages from a reconfigurable approach can be realized in terms of performance, power, size, cooling, cost, versatility, scalability, and ...
  29. [29]
    Center for High-Performance Reconfigurable Computing (CHREC)
    May 15, 2017 · In 2007, under the auspices of the Industry/University Cooperative Research Centers (I/URC) program of the National Science Foundation, ...
  30. [30]
    [PDF] A Potential Solution to the von Neumann Bottleneck
One of the ideas submitted as a possible replacement for the von Neumann architecture is the reconfigurable system, otherwise known as morphware, the idea of ...
  31. [31]
    [PDF] SYSTOLIC ARRAYS FOR (VLSI) - Computer Science
H.T. Kung and Charles E. Leiserson, 1979. The hardware demands of the systolic arrays in this paper are readily seen to be ...
  32. [32]
    [PDF] Dennis-Dataflow.pdf - Washington
The first task is to expand the architecture of the elementary machine to incorporate decision capability by implementing deciders, gates and merges.
  33. [33]
    [PDF] Design of Transport Triggered Architectures - Semantic Scholar
    Paper organization. • Concept of transport triggering. • MOVE32INT – a prototype TTA processor. • Automatic generation of arbitrary TTAs. Page 3. Why TTA. • ...
  34. [34]
    [PDF] Configurable Computing: A Survey of Systems and Software
    In this survey we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal ...
  35. [35]
    [PDF] 7 Series FPGAs Configurable Logic Block User Guide (UG474)
    Nov 17, 2014 · Each 7 series FPGA slice contains four LUTs and eight flip-flops; only SLICEMs can use their LUTs as distributed RAM or SRLs. 2. Number of ...
  36. [36]
    Coarse-Grained Reconfigurable Computing with the Versat ... - MDPI
    Mar 12, 2021 · This paper provides an overview of coarse-grained reconfigurable architectures and describes Versat, a Coarse-Grained Reconfigurable Array (CGRA) with self- ...
  37. [37]
    [PDF] FPGA Architecture: Survey and Challenges
FPGAs consist of programmable logic blocks which implement logic functions, programmable routing to interconnect these functions and I/O blocks to make off- ...
  38. [38]
  39. [39]
    FPGA Dynamic and Partial Reconfiguration - ACM Digital Library
    Dynamic and partial reconfiguration are key capabilities of FPGAs, which are reviewed in this survey, along with architectures and applications.
  40. [40]
    [PDF] FPGA Dynamic and Partial Reconfiguration - WRAP: Warwick
    Dynamic and partial reconfiguration are key FPGA capabilities, allowing runtime function changes and modification of parts of the hardware.
  41. [41]
    (PDF) Partial reconfiguration on FPGAs in practice — Tools and ...
    This tutorial gives a survey on state-of-the-art trends on reconfigurable architectures and devices, application specific requirements, and design techniques ...
  42. [42]
    [PDF] Partial Reconfiguration of Xilinx FPGAs - Doulos
    Each configuration will generate one full bitstream and one partial bitstream for each reconfigurable partition/module. Page 4. Partial Reconfiguration of ...
  43. [43]
    [PDF] A self-reconfiguring platform - Eric Keller
Abstract. A self-reconfiguring platform is reported that enables an FPGA to dynamically reconfigure itself under the control of an embedded microprocessor.
  44. [44]
    [PDF] module-based-implementation-of-partial-reconfiguration-in-fpga-for ...
    Module-based partial reconfiguration of FPGAs play important role, it provides possibility for runtime flexibility. It enables hardware tasks to.
  45. [45]
    [PDF] Performance of Partial Reconfiguration in FPGA Systems
The paper is structured as follows: Section 2 has the basics of partial reconfiguration and discusses recent works that include measurement of reconfiguration.
  46. [46]
    [PDF] Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra - arXiv
    Apr 29, 2020 · Abstract—This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra.
  47. [47]
    [PDF] Demonstration of FPGA Acceleration of Monte Carlo Simulation
The FPGA implementation was over 110 times faster than an optimized parallel CPU implementation and over 270 times faster than a single-core CPU implementation.
  48. [48]
    An FPGA Accelerator for Genome Variant Calling - ACM Digital Library
    Sep 1, 2023 · In particular, this accelerator is targeted at virus analysis, which is particularly challenging, compared to human genome analysis, as the ...
  49. [49]
    [2107.08716] Accelerating Weather Prediction using Near-Memory ...
    Jul 19, 2021 · To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth ...
  50. [50]
    Evaluating the Design Space for Offloading 3D FFT Calculations to ...
    Jun 29, 2021 · This paper evaluates offloading 3D FFT to FPGA, finding initial limitations, but with potential for 2x speedup and 3.7x-4.1x lower power ...
  51. [51]
    [PDF] arXiv:1404.4629v2 [cs.AR] 18 Apr 2014
    Apr 18, 2014 · [2010] compare the energy efficiency of a GPU with an FPGA and a single and a multi-core CPU for three throughput computing applications, viz.
  52. [52]
    Good Times for FPGA Enthusiasts - TOP500
    Nov 8, 2016 · The prospect of FPGA-powered supercomputing has never looked brighter. The availability of more performant chips, the maturation of the OpenCL toolchain,
  53. [53]
  54. [54]
  55. [55]
    Dynamic FPGA reconfiguration for scalable embedded artificial ...
    The methodology uses dynamic FPGA reconfiguration to enable runtime customization of CNNs and hardware, enhancing performance and reducing latency.
  56. [56]
    [PDF] Partial Reconfiguration for Energy-Efficient Inference on FPGA - HAL
Sep 18, 2024 · Efficient acceleration of deep convolutional neural networks is currently a major focus in Edge Computing research. This paper presents a ...
  57. [57]
  58. [58]
    (PDF) A reconfigurable embedded vision system for advanced driver ...
    Aug 7, 2025 · Usually, an ADAS is a vision-based tracking system that relies on observing the heading of the vehicle, via a camera that detects lanes, ...
  59. [59]
  60. [60]
    [PDF] Autonomous FPGA Reconfigurability in Embedded Systems - HAL
    Jan 28, 2025 · Our method enables real-time hardware reconfiguration to address defects. For example, when a section of the FPGA becomes non-functional due to ...
  61. [61]
  62. [62]
    Field Programmable Gate Arrays (FPGAs) for Artificial Intelligence (AI)
FPGAs are reconfigurable computing components that can be used to accelerate AI workloads. · FPGAs play an important role in enabling AI at the edge, in the data ...
  63. [63]
    AES Hardware Accelerator on FPGA with Improved Throughput and ...
    Nov 7, 2017 · High-throughput and resource-optimized implementation of 128-bit Advanced Encryption Standard (AES 128-bit), which can be used as an accelerator, is presented ...
  64. [64]
    A system for fault detection and reconfiguration of hardware based ...
    The FPGA can be reconfigured multiple times on-the-fly with several Active Applications (IP-cores). A fault detection module is permanently configured in one of ...
  65. [65]
  66. [66]
    The Future of Radar Technology – Integrating RFSoC with ...
    This paper proposes a radar system design that combines Radio Frequency System-on-Chip (RFSoC) technology with reconfigurable computing.
  67. [67]
    An Isolated Partial Reconfiguration Design Flow for Xilinx FPGAs
    This allows building secure and dependable systems that can use partial reconfiguration to mitigate from single-event upsets (SEUs) and that are more tolerant ...
  68. [68]
    Protecting the FPGA IPs against Higher-order Side Channel Attacks ...
    In this work, we proposed a novel countermeasure which utilizes the Dynamic Partial Reconfiguration (DPR) property of the FPGA devices to obfuscate the leaked ...
  69. [69]
    Fast and reconfigurable packet classification engine in FPGA-based ...
    Packet classification is a fundamental task for network devices such as routers, firewalls, and intrusion detection systems. In this paper we present ...
  70. [70]
    Efficient Reconfigurable Modular Multipliers for Post-Quantum ...
We present two efficient designs for common PQC algorithm q sizes (5–32 bits). These are implemented on the Xilinx Virtex-7 FPGA platform and demonstrate ...
  71. [71]
    [PDF] Coarse Grained Reconfigurable Array (CGRA) - NUS Computing
    In [40], the power consumption of a 4KB configuration memory in a 4x4 CGRA is around 40% of the whole chip power. A spatial CGRA is more energy-efficient than ...
  72. [72]
    [PDF] Coarse-Grained Reconfigurable Array Architectures
Some CGRAs, like ADRES, Silicon Hive, and MorphoSys are fully dynamically reconfigurable: exactly one full reconfiguration takes place for every execution ...
  73. [73]
    (PDF) MorphoSys: A Coarse Grain Reconfigurable Architecture for ...
    Aug 7, 2025 · MorphoSys is a reconfigurable architecture for computation intensive applications. It combines both coarse grain and fine grain ...
  74. [74]
    Architectural Exploration of the ADRES Coarse-Grained ...
    Reconfigurable computational architectures are envisioned to deliver power efficient, high performance, flexible platforms for embedded systems design.
  75. [75]
    AMD Versal AI Edge Series
The Versal AI Edge series delivers high performance, low latency AI inference for intelligence in automated driving, predictive factory and healthcare systems.
  76. [76]
    A survey on processing-in-memory techniques: Advances and ...
    In this survey, we analyze recent studies that explored PIM techniques, summarize the advances made, compare recent PIM architectures, and identify target ...
  77. [77]
    (PDF) A comparative study of FPGA and CGRA technologies in ...
    This paper conducts a comprehensive comparative analysis of FPGA and CGRA for accelerating deep learning workloads.
  78. [78]
    An ultra-low-power CGRA for accelerating Transformers at the edge
    Jul 17, 2025 · This paper introduces an ultra-low-power CGRA designed to accelerate GEMM operations in transformer models for edge applications, using a 4x4 ...
  79. [79]
    [PDF] How do Logic Simulation, Emulation, and FPGA Prototyping work
    Logic simulation mimics and validates digital circuit designs by simulating hardware behavior in a computer, using models written in HDL.
  80. [80]
    1. FPGA Review and Emulation Overview - FPGAEmu - Read the Docs
    Hardware emulation allows these manufacturers to debug their designs in simulated but realistic conditions before undertaking the extreme cost of mass ...
  81. [81]
    AMD Vitis™ HLS
    The AMD Vitis™ HLS tool allows users to easily create complex FPGA algorithms by synthesizing a C/C++ function into RTL. The Vitis HLS tool is tightly ...
  82. [82]
    Vitis High-Level Synthesis User Guide (UG1399) - 2025.1 English
Sep 10, 2025 · Vitis High-Level Synthesis User Guide (UG1399) - 2025.1 English - Describes using the AMD Vitis™ High Level Synthesis tool. - UG1399.
  83. [83]
    EC2 F1 Instances with FPGAs – Now Generally Available
Apr 19, 2017 · We are making the F1 instances generally available in the US East (N. Virginia) Region, with plans to bring them to other regions before too long.
  84. [84]
    MiSTer FPGA Hardware | RetroRGB
    The MiSTer is an open-source project that emulates consoles, computers and arcade boards via FPGA – This is different from software emulation.
  85. [85]
    FPGA Prototyping for Faster Validation and Production
Aug 14, 2024 · FPGA emulation is a critical tool for late-stage validation, offering a detailed and holistic view of how a design will function in real-world ...
  86. [86]
    Faster AI Chip Design Emulation & Prototyping | Synopsys Blog
Mar 20, 2024 · Synopsys ZeBu EP2 provides the fastest emulation platform for AI workloads, making it ideal for software/hardware validation and power/performance analysis.
  87. [87]
    [PDF] Vivado Design Suite User Guide: High-Level Synthesis
May 4, 2021 · The Xilinx® Vivado® High-Level Synthesis (HLS) tool transforms a C specification into a register transfer level (RTL) implementation that ...
  88. [88]
    an OpenCL framework for embedded systems using xilinx FPGAs
    This paper presents UT-OCL, an OpenCL framework for embedded systems using FPGAs. The framework is composed of a hardware system and its necessary software ...
  89. [89]
    Halide – Communications of the ACM
    Jan 1, 2018 · We propose a new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule.
  90. [90]
    HeteroHalide: From Image Processing DSL to Efficient FPGA ...
    Feb 24, 2020 · We propose HeteroHalide, an end-to-end system for compiling Halide programs to FPGA accelerators. This system makes use of both algorithm and scheduling ...
  91. [91]
    [PDF] A Flexible Array of Reusable Run-Time-Reconfigurable IP-Blocks
    Consequently, to enable an efficient design flow, we devise a set of prerequisites to increase the flexibility and reusability of current FPGA-based RTR.
  92. [92]
    [PDF] IP-XACT Extensions for Reconfigurable Computing
    Using IP-XACT, hardware components can be described in a standardized way. This enables automated configuration and integration of IP blocks, aiding hardware.
  93. [93]
    [PDF] Exploring Formal Verification Methodology for FPGA-based Digital ...
    The verification algorithms developed by this work support the analysis of such critical digital components with mathematical reasoning from automated theorem ...
  94. [94]
    High-level Synthesis Directives Design Optimization via Large ...
    Sep 11, 2025 · HLS design flow. A behavioral description, synthesis directives and related constraints are given to the design tool, which enables the tool to ...
  95. [95]
    Dynamic FPGA reconfiguration for scalable embedded artificial ...
    Oct 3, 2025 · Dynamic FPGA reconfiguration for scalable embedded artificial intelligence (AI): A co-design methodology for CNN acceleration. February 2025 ...
  96. [96]
    [PDF] A Survey of System Architectures and Techniques for FPGA ... - arXiv
    In some of the literature, a hypervisor is referred to as an OS, a resource management system/framework [30] [31], a virtual machine monitor (VMM) [32], a run- ...
  97. [97]
    Multi-Tenant Cloud FPGA: A Survey on Security, Trust, and Privacy
    Apr 12, 2025 · PR is a technique where the partial region of the FPGA HW fabric is reconfigured through the configuration memory layer while not interrupting ...
  98. [98]
    [PDF] Hypervisor Mechanisms to Manage FPGA Reconfigurable ... - HAL
    Nov 6, 2018 · Each guest OS is running in an isolated domain named virtual machine, and is managed by an underlying virtual machine monitor (VMM). The VMM ...
  99. [99]
    [PDF] Reconfigurable Computing Hypervisors: State-of-the-Art and Ways ...
Feb 14, 2025 · A detailed survey study of recent publications has been published by [WWG21], which gives an overview of current research, including solutions ...
  100. [100]
    [PDF] Cryptographically Secure Multi-Tenant Provisioning of FPGAs - arXiv
bitstreams would potentially be encrypted in bulk, a symmetric-key encryption algorithm such as AES-128 is the ideal choice in this regard. Note that this ...