C-element

The C-element, also known as the Muller C-element or C-gate, is a fundamental synchronization primitive in asynchronous digital circuit design. It was introduced by David E. Muller in 1955 and first used in the ILLIAC II computer. It produces an output transition only when all of its inputs agree on the same logic level (all high or all low), which enables robust coordination without relying on a global clock.^[1]^[2]^[3] The device exhibits hysteresis: it keeps its current output until the inputs unanimously change, which prevents glitches, oscillations, and unstable intermediate states and provides noise immunity and stability in timing-independent operation.^[2] In asynchronous systems, the C-element acts as a rendezvous or join, supporting delay-insensitive communication by merging multiple request signals into a single coordinated response.^[1]^[2] Commonly implemented in CMOS with series-connected transistors in both the pull-up and pull-down networks, the C-element is a basic building block of many asynchronous architectures, including Muller pipelines, FIFOs, arbiters, and hazard-free control logic.^[1] For a two-input C-element with inputs A and B and output C, the output sets to 1 only if both A and B are 1 while C is 0, resets to 0 only if both are 0 while C is 1, and holds otherwise.^[2] Key applications include synchronizing data paths at pipeline boundaries, supporting two-phase handshake protocols, and enabling modular, low-power designs in systems where clock skew or global timing is impractical.^[1] Because it is efficient in area and delay (C-element-based FIFOs, for example, can achieve a three-gate-delay cycle time), the component has become integral to self-timed circuits, improving reliability in high-performance computing and embedded systems.^[1]^[2]

Introduction

Definition and Purpose

The C-element, also known as the Muller C-element, is a fundamental memory element in asynchronous digital logic with inherent hysteresis. For a two-input version, it produces an output of 1 only when both inputs are 1, an output of 0 only when both inputs are 0, and holds its prior state in all other cases, functioning similarly to an asynchronous set-reset latch.^[4] It therefore retains its state until the inputs reach consensus, which distinguishes it from standard combinational gates.^[5] The core purpose of the C-element is to enable synchronization and hazard-free coordination in asynchronous circuits, where signals from concurrent processes arrive without a global clock. It serves as a rendezvous point, delaying output changes until all inputs align, thereby preventing glitches and supporting speed-independent designs that tolerate arbitrary gate delays.^[4] This synchronization is critical for reliable operation in handshaking protocols, dataflow structures, and self-timed systems, ensuring orderly signal propagation without timing assumptions beyond input agreement.^[1] A key requirement is that its inputs be monotonic: each input transitions in one direction (all rising or all falling) and does not reverse until the output has responded, which guarantees hazard-free behavior in speed-independent contexts.^[4] The C-element is usually drawn as a gate symbol marked with a "C", with inputs A and B and output Q; the feedback that sustains the hysteretic state is internal to the element.^[4] At gate level it is equivalent to a three-input majority gate whose output Q is fed back as its third input, since Q^+ = AB + AQ + BQ.
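The set/reset/hold rule can be checked with a short Python sketch (the function names are illustrative, not from any library). It expresses the next output both as "follow the inputs when they agree, otherwise hold" and as a three-input majority function whose third input is the fed-back output, Q^+ = AB + AQ + BQ, and confirms that the two forms agree on all eight cases.

```python
from itertools import product


def c_element_next(a: int, b: int, q: int) -> int:
    """Next output of a two-input C-element: set on A=B=1, reset on A=B=0, else hold."""
    if a == b:
        return a  # inputs agree: output follows them
    return q      # inputs disagree: keep the previous output


def majority(a: int, b: int, q: int) -> int:
    """Three-input majority with the output Q fed back: AB + AQ + BQ."""
    return (a & b) | (a & q) | (b & q)


# The behavioral rule and the majority-gate form coincide on every case.
for a, b, q in product((0, 1), repeat=3):
    assert c_element_next(a, b, q) == majority(a, b, q)
```

The majority form is why a C-element can be built from a majority gate with feedback, whereas the behavioral form reads directly off the truth table.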

Historical Development

The C-element was developed by David E. Muller in the 1950s during his pioneering work on self-timed systems at the University of Illinois at Urbana-Champaign.^[6] Muller's research focused on asynchronous sequential circuits that could operate reliably without a global clock, addressing challenges in early computing systems.^[7] The element emerged from studies on relay-based asynchronous circuits and techniques for hazard avoidance, where hazards are unintended glitches during logic transitions. Muller first formally specified the C-element in 1955, and it was first implemented in the ILLIAC II supercomputer, which became operational in 1962.^[8] The theoretical foundation for asynchronous switching circuits, using lattice theory to model behavior independent of timing delays, was provided in Muller's 1956 report (published 1959) co-authored with W. S. Bartky.^[6] In the following decades, the C-element gained prominence through its adoption in key asynchronous designs. It was integral to Ivan Sutherland's micropipelines, presented in his Turing Award lecture (published in 1989), which demonstrated efficient dataflow processing using self-timed handshaking protocols. Its use was also advanced by Charles L. Seitz's work on self-timed VLSI at Caltech during the 1970s and 1980s, including his 1979 paper on systems that operate correctly regardless of gate delays, which paved the way for scalable asynchronous integrated circuits. By the 1990s, the C-element had been integrated into hazard-free logic minimization tools, enabling automated synthesis of robust asynchronous controllers that avoid static and dynamic hazards. Since 2000 it has found renewed relevance in low-power designs and globally asynchronous locally synchronous (GALS) architectures, supporting energy-efficient interfaces between synchronous islands in systems-on-chip.^[9]

Functional Specifications

Truth Table

The logical behavior of the C-element, a fundamental synchronization primitive in asynchronous digital circuits, is formally specified by its truth table. This table defines the next output value Q^+ as a function of the inputs A and B, along with the previous output value Q. The C-element asserts Q^+ = 1 only when both inputs are asserted (A = 1, B = 1), deasserts Q^+ = 0 when both are deasserted (A = 0, B = 0), and otherwise retains the prior state (Q^+ = Q) to ensure stability during input disagreement.^[10]

A	B	Q^+
0	0	0
0	1	Q
1	0	Q
1	1	1

This table captures the device's symmetric hysteresis, where the output follows the inputs exclusively upon consensus.^[10] The state diagram of the C-element graphically represents this hysteresis as a bistable system with two stable states (Q = 0 and Q = 1). Transitions occur solely on input agreement: from Q = 0 to Q = 1 when both A and B rise to 1, and from Q = 1 to Q = 0 when both fall to 0. No transitions happen when inputs differ, reinforcing the hold behavior and preventing oscillations.^[10] For reliable operation in asynchronous environments, the inputs to the C-element must exhibit monotonicity, meaning signals should transition monotonically (e.g., from 0 to 1 without intermediate 1-to-0 toggles, or vice versa) to preserve circuit integrity. Violations of this requirement can induce undefined behavior, such as glitches or erroneous state changes, disrupting synchronization.^[11] In contrast to combinational gates like AND or OR, which produce outputs solely based on current inputs without memory, the C-element's state-holding mechanism introduces sequential hysteresis, enabling robust event synchronization rather than immediate logical combination.^[11]
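The state-diagram behavior can be traced with a minimal Python sketch (names are illustrative). It steps a C-element through a handshake-like input sequence and, for contrast, computes what a memoryless AND gate would output for the same inputs.

```python
def c_element_trace(inputs, q0=0):
    """Apply a sequence of (A, B) pairs to a two-input C-element starting
    from output q0, returning the output after each step."""
    q = q0
    outputs = []
    for a, b in inputs:
        if a == b:         # consensus: move to the agreed stable state
            q = a
        outputs.append(q)  # disagreement: remain in the current stable state
    return outputs


# A rises, B rises (set), A falls (hold at 1), B falls (reset).
sequence = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
c_out = c_element_trace(sequence)
and_out = [a & b for a, b in sequence]  # a memoryless AND gate, for contrast
print(c_out)    # [0, 0, 1, 1, 0]
print(and_out)  # [0, 0, 1, 0, 0]
```

At the fourth step the inputs disagree: the AND gate drops to 0 immediately, while the C-element stays in its Q = 1 state until both inputs have fallen.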

Delay Assumptions and Timing Constraints

The operation of the C-element relies on specific delay assumptions to ensure reliable behavior in asynchronous circuits, particularly the inertial delay model for gates. Under this model, each gate has a propagation delay and filters out input pulses shorter than that delay, preventing spurious transitions from noise or glitches. This assumption matters for the C-element because it treats the gate's response both as a transport delay (the time to propagate a signal) and as an inertial characteristic that rejects brief disturbances, so the output stays stable unless the inputs settle in agreement.^[4]^[12] In asynchronous designs containing multiple C-elements, the isochronic fork assumption is needed to prevent race conditions. When a signal forks to several C-elements, the skew between the fork branches is assumed to be negligible compared with the gate delays along the paths, so that once a transition has been observed (acknowledged) on one branch it can be assumed to have arrived on the others as well; otherwise one C-element could act on a transition that has not yet reached another, producing an incorrect output. Violating this assumption introduces timing nondeterminism. It is the one timing constraint accepted in quasi-delay-insensitive (QDI) circuits, which otherwise guarantee correctness for arbitrary (finite but unbounded) gate and wire delays.^[4]
Key timing parameters for the C-element include the propagation delay \tau_{pd}, the time from a valid input change (both inputs agreeing) to the corresponding output transition, and the setup time \tau_{su}, the minimum duration that inputs must remain stable before the output can reliably respond. These parameters ensure that the C-element updates its output only after its inputs have settled. In speed-independent designs \tau_{pd} is assumed finite but unknown (around 2 ns in example CMOS implementations), while \tau_{su} enforces input stability to avoid metastability and is often tied to the inertial delay used for pulse filtering: an output change occurs only if the inputs remain in agreement for at least \tau_{su}.^[4]^[12] When inputs change monotonically, the C-element is inherently robust against static hazards, because its feedback holds the output until both inputs agree, preventing the momentary glitches that can occur in combinational logic. For instance, if the output is high and one input falls while the other stays high, the output holds at 1 instead of dipping low, avoiding a static-1 hazard; the static-0 case, with the output low and one input rising, is symmetric. Dynamic hazards may still arise if inter-gate delays vary significantly and inputs are non-monotonic, potentially causing temporary output glitches during propagation, although the inertial model suppresses short pulses in such cases.^[12]^[4] In hardware description languages such as VHDL and Verilog, C-element timing is modeled with delay annotations (after clauses in VHDL, which are inertial by default unless marked transport, and #delay annotations in Verilog) to reproduce propagation and setup behavior, allowing asynchronous protocols to be verified under assumed delay models. Simulators support inertial delay semantics natively, so delayed assignments filter short events, but isochronic fork approximations (for example, zero wire delay) must be built into the simulation to check for races, and simulation alone cannot prove timing closure without formal methods.^[4]
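The inertial delay model can be illustrated with a discrete-time Python sketch. The sampled-waveform representation and the function name are assumptions made for illustration; this is not the exact semantics of any particular simulator. A change propagates d steps later only if the new value persisted for d consecutive samples, so shorter pulses vanish.

```python
def inertial_delay(samples, d):
    """Discrete-time inertial delay of d >= 1 time steps.

    The output adopts a new input value d steps after the input changes,
    but only if the input held that value for d consecutive samples;
    shorter pulses are rejected. Before t = d the output keeps samples[0].
    """
    if d < 1:
        raise ValueError("delay must be at least one step")
    current = samples[0]
    out = []
    for t in range(len(samples)):
        if t >= d:
            window = samples[t - d:t]  # the d samples ending just before t
            if all(s == window[0] for s in window):
                current = window[0]    # value was stable long enough to pass
        out.append(current)
    return out


short_pulse = [0, 0, 1, 0, 0, 0, 0]
long_pulse = [0, 0, 1, 1, 1, 0, 0, 0]
print(inertial_delay(short_pulse, 2))  # [0, 0, 0, 0, 0, 0, 0]
print(inertial_delay(long_pulse, 2))   # [0, 0, 0, 0, 1, 1, 1, 0]
```

The one-sample glitch is swallowed, while the three-sample pulse passes through shifted by the two-step delay.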

Circuit Implementations

Static Implementations

The static CMOS implementation of the C-element realizes its hysteresis function with complementary logic and no dynamic storage, retaining state through continuous feedback paths. Functionally it combines a set condition, the AND of both inputs, which drives the output high, with a reset condition, the AND of the complemented inputs (equivalently, the NOR of the inputs), which drives it low, plus feedback that holds the state otherwise; the exact transistor count depends on how the feedback is realized.^[4] In the common form, two series NMOS transistors pull an internal node low when both inputs are high, two series PMOS transistors pull it high when both inputs are low, and an output inverter restores the true polarity, so Q rises on consensus to 1 and falls on consensus to 0. When the inputs differ, neither stack conducts and a feedback path (a weak inverter driven from the output, or additional gated transistors) holds the internal node, so there is no direct path from Vdd to ground during the hold state. This structure, used in early asynchronous pipeline designs, avoids contention and short-circuit currents in steady states.^[13] Because no clock is involved, dynamic power is consumed only when inputs switch and static power is limited to leakage, and the output is glitch-free provided inputs transition monotonically as required by the delay assumptions. These properties make static realizations particularly suitable for power-efficient asynchronous systems where robustness to process variations is essential.^[10] Transistor sizing focuses on balancing rise and fall times: PMOS widths are typically about twice those of NMOS transistors to compensate for lower hole mobility, giving symmetric switching without affecting functionality.^[4]
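A switch-level Python sketch can show why series stacks plus a keeper behave as a C-element. It assumes the common topology of two series NMOS and two series PMOS transistors driving an internal node, an output inverter, and a weak feedback keeper. It also assumes ideal switches and a keeper that is always overpowered by a conducting stack; these are simplifications, not a circuit-accurate model.

```python
from itertools import product


def static_c_element_step(a: int, b: int, node: int) -> tuple:
    """One settling step of an idealized static CMOS C-element.

    node is the internal storage node (the complement of the output Q).
    Returns (new_node, q).
    """
    nmos_stack_on = a == 1 and b == 1  # series NMOS conduct only if both gates are high
    pmos_stack_on = a == 0 and b == 0  # series PMOS conduct only if both gates are low
    if nmos_stack_on:
        node = 0  # pull-down stack overpowers the weak keeper
    elif pmos_stack_on:
        node = 1  # pull-up stack overpowers the weak keeper
    # Otherwise both stacks are off and the weak feedback inverter holds node.
    q = 1 - node  # output inverter restores the true polarity
    return node, q


# The switch-level model reproduces the C-element truth table for every case.
for a, b, q_prev in product((0, 1), repeat=3):
    _, q = static_c_element_step(a, b, 1 - q_prev)
    assert q == (a if a == b else q_prev)
```

Because the stacks pull the internal node to the opposite polarity of the desired output, the output inverter is what makes the set condition (both inputs high) produce Q = 1.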

Dynamic and Semistatic Implementations

Dynamic implementations of the C-element store the output state temporarily on node capacitance rather than in a feedback loop. The basic design uses four transistors in the same two series stacks as the static version: two series PMOS transistors charge the storage node when both inputs are low, two series NMOS transistors discharge it when both inputs are high, and an output inverter usually restores the polarity and buffers the node. When the inputs disagree, neither stack conducts and the node floats, holding its previous value on its capacitance. Omitting the feedback gives a smaller area and potentially faster switching, since the stacks never fight a feedback path. However, dynamic C-elements are susceptible to charge sharing between the storage node and intermediate nodes of the stacks, which can erroneously alter the output, and subthreshold leakage gradually discharges a floating node, so the held state is valid only for a limited time unless it is refreshed or restored.^[4]^[14] Semistatic implementations combine dynamic node operation with static feedback to improve reliability, adding keeper transistors that maintain the state against leakage at small power overhead. A weak feedback path, typically a weak inverter driven from the output back onto the storage node, is sized to provide just enough drive to hold the node without significantly impeding transitions. The keeper matters when the inputs disagree and both stacks are off: it holds the node at its last value, while on input agreement the conducting stack overpowers it, preserving hysteresis without full static contention.
Because the node must be held at both logic levels, the keeper needs both a pull-up and a pull-down device. Semistatic C-elements are more robust than pure dynamic versions and less vulnerable to charge sharing thanks to this restoring mechanism, and they switch faster than fully static designs but more slowly than the pure dynamic form, because each transition must overpower the weak feedback.^[4]^[15] Overall, dynamic and semistatic variants trade some of the robustness of fully static designs for efficiency in asynchronous systems, where the delay assumptions ensure that inputs remain stable long enough for the storage node to settle. Performance evaluations in CMOS technologies show dynamic designs achieving up to 20-30% lower latency in pipeline stages than static ones, at the cost of higher susceptibility to process variations and noise. Analyses as of 2024 report low-power benefits for FinFET-based semistatic variants.^[4]^[10]^[16]
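How long a dynamic C-element can hold its state is often estimated to first order as t ≈ C·ΔV / I_leak, assuming a constant leakage current discharging the floating node. The Python sketch below applies that estimate; the node capacitance, tolerable droop, and leakage values are hypothetical numbers chosen only for illustration and are not taken from the sources cited here.

```python
def retention_time(c_node: float, delta_v: float, i_leak: float) -> float:
    """First-order hold time of a floating dynamic node, t = C * dV / I_leak.

    c_node in farads, delta_v (tolerable voltage droop) in volts,
    i_leak (assumed constant leakage current) in amperes; result in seconds.
    """
    if i_leak <= 0:
        raise ValueError("leakage current must be positive")
    return c_node * delta_v / i_leak


# Hypothetical values for illustration only: a 1 fF node that may droop
# by 0.3 V before the output inverter misreads it, with 1 nA of leakage.
t_hold = retention_time(1e-15, 0.3, 1e-9)
print(f"{t_hold * 1e9:.0f} ns")  # 300 ns
```

If the inputs can disagree for longer than this estimate, the held value may be lost, which is what the keeper in a semistatic design prevents.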

Gate-Level and Transistor-Level Designs

Gate-level designs of C-element networks often involve meshing multiple C-elements to form hazard-free asynchronous control logic, where Karnaugh maps are used to minimize the implementation while ensuring no glitches occur during input transitions. This approach allows for the synthesis of complex functions by identifying prime implicants that cover both the functional behavior and required consensus terms to eliminate static hazards, resulting in compact multi-level networks suitable for speed-independent circuits. For instance, a generalized C-element implementation can be derived from a Karnaugh map that incorporates set and reset conditions, enabling efficient combination with standard gates for larger asynchronous modules.^[4] At the transistor level, optimizations focus on reducing transistor count and stack height to improve speed and area efficiency in CMOS realizations. Stacked transistor configurations in static C-elements limit the number of series transistors to two, minimizing voltage drop and capacitance, which reduces propagation delay by up to 20% compared to taller stacks while maintaining robustness against process variations. Pass-transistor logic variants achieve further area savings, with designs using as few as 5 transistors by leveraging transmission gates for input evaluation and a weak keeper for hysteresis, offering 30% smaller footprint than the conventional 6-transistor static version without sacrificing hysteresis functionality. These optimizations are particularly effective in high-density VLSI, where transistor folding allows shared diffusion regions between adjacent devices.^[17] Differential signaling implementations enhance high-speed performance through sense-amplifier-based C-elements, employing cross-coupled inverter pairs to provide strong feedback and rapid resolution of metastable states. 
This topology uses differential inputs to drive a latch-like structure, achieving 1.5-2x faster switching than single-ended static designs at the cost of roughly doubling the transistor count (typically 10-12 transistors), which makes it well suited to pipelined asynchronous systems operating above 1 GHz. Simulations in 65 nm CMOS show that such differential C-elements are less sensitive to supply noise, with setup times under 50 ps.^[18] Quantitative comparisons reveal trade-offs in power-delay product (PDP) across implementations: static C-elements in 65 nm technology achieve a PDP of approximately 10 fJ at 1 GHz, while optimized dynamic variants with pass-transistor elements reduce this to around 5 fJ through lower switching capacitance, at the expense of increased leakage in hold modes. Layout considerations emphasize diffusion sharing between adjacent transistors within the pull-up and pull-down networks to minimize parasitics, potentially cutting parasitic capacitance by 15-25% in standard cell libraries and enabling denser integration in asynchronous VLSI designs. Combined, these techniques yield up to 40% overall area reduction in multi-C-element meshes without compromising timing constraints.^[19]
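The generalized C-element mentioned above is specified by a set condition S and a reset condition R, with next state Q^+ = S + Q·¬R when S and R never hold together. The Python sketch below uses a callable-based interface that is an assumption for illustration. It shows that the ordinary C-element is the special case S = AB, R = ¬A¬B, and includes a hypothetical asymmetric variant in which B gates only the rising transition.

```python
from typing import Callable, Dict

Inputs = Dict[str, int]
Condition = Callable[[Inputs], bool]


def gc_next(set_fn: Condition, reset_fn: Condition, x: Inputs, q: int) -> int:
    """Next state of a generalized C-element: Q+ = S + Q * not(R).

    The set and reset conditions must be mutually exclusive.
    """
    s, r = set_fn(x), reset_fn(x)
    if s and r:
        raise ValueError("set and reset conditions overlap")
    if s:
        return 1
    if r:
        return 0
    return q  # neither condition holds: keep the previous output


def c_set(x: Inputs) -> bool:    # ordinary C-element: S = A*B
    return x["a"] == 1 and x["b"] == 1


def c_reset(x: Inputs) -> bool:  # ordinary C-element: R = not A * not B
    return x["a"] == 0 and x["b"] == 0


def asym_reset(x: Inputs) -> bool:  # hypothetical asymmetric variant: R = not A
    return x["a"] == 0


print(gc_next(c_set, c_reset, {"a": 0, "b": 1}, 1))     # 1 (hold)
print(gc_next(c_set, asym_reset, {"a": 0, "b": 1}, 1))  # 0 (reset on A alone)
```

Writing controllers in terms of S and R keeps the hold behavior explicit, which is what hazard-free covers derived from Karnaugh maps must preserve.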

Non-Transistor-Based Realizations

Non-transistor-based realizations of the C-element use alternative physical mechanisms to achieve the required hysteresis and agreement behavior, enabling applications in emerging computing paradigms beyond traditional CMOS. These implementations often prioritize low power, non-volatility, or compatibility with novel substrates, such as mechanical, resistive, magnetic, quantum-dot, or optical systems. Key examples include nanoelectromechanical (NEM) relays for mechanical switching, memristive devices for resistive feedback, spintronic elements using magnetic tunnel junctions (MTJs) for spin-based state retention, quantum-dot cellular automata (QCA) for charge-based interactions, and photonic structures for light-induced bistability.^[20]^[21]^[22]^[23]^[24] NEM relay implementations use mechanical deflection to create bistable switches that mimic the C-element's hold state without electrical transistors. In these designs, NEM switches replace the feedback loop and pull-down networks of conventional C-elements, achieving hysteresis through physical latching of suspended beams under electrostatic actuation. Simulations show that a 64-bit NEM-based C-element offers over 16× better energy efficiency than CMOS equivalents, with standby power near zero because mechanical switches have no leakage current in their rest states. This approach is particularly suited to asynchronous VLSI, where the mechanical nature provides robustness against radiation, at the cost of slower switching (around 1-10 ns).^[20] Memristor-based C-elements employ resistive switching elements, such as programmable metallization cells (PMCs), to realize the feedback mechanism through variable resistance states that retain history-dependent behavior. A PMC-C-element design integrates a single PMC with a CMOS inverter, where the PMC's electrodeposition process creates asymmetric thresholds for set and reset, keeping the output stable until the inputs reach consensus.
This configuration supports stochastic computing for Bayesian inference, processing bitstream data with probabilistic accuracy exceeding 90% in simulations for simple hypotheses. The non-volatile nature of PMCs enables zero-static-power operation, although the design still relies on a CMOS inverter and is therefore a hybrid rather than a purely non-transistor realization; energy per inference is reduced by up to 50% relative to all-CMOS stochastic gates.^[21] Spintronic realizations use MTJs to exploit magnetic hysteresis for state holding, where the relative magnetization orientation between ferromagnetic layers determines resistance states detectable via tunneling currents. In clockless spintronic NULL convention logic (STENCL), MTJs and domain wall (DW) devices form C-elements by controlling DW motion with spin-transfer torque, requiring critical current densities of approximately 5.2 × 10¹² A/m² for stationary walls and 6.2 × 10¹² A/m² for motion at 20 m/s. This achieves intrinsic bistability without explicit feedback transistors, supporting quasi-delay-insensitive asynchronous circuits with 2× to 20× energy savings (e.g., 0.876 pJ for 32-bit arithmetic) and up to 8× area reduction compared to CMOS, albeit with delays around 35 ns.^[22] QCA-based C-elements operate through Coulombic repulsion between electron configurations in quantum dots, forming cellular arrays that propagate polarization without transistor-mediated charge flow. Designs feature majority gates with clock-zone-shifted feedback loops; a simple 2-input version uses 14 cells across three clock phases to ensure that the output transitions only on input agreement, occupying 0.013 μm² with total energy dissipation of 1.15 × 10⁻² eV. Complex variants duplicate cells for fault tolerance, improving robustness against cell misalignment or missing dots by up to 143% (e.g., 86.63% functionality retention under combined defects), though at higher area (0.029 μm² for 2-input) and energy (3.39 × 10⁻² eV).
These are promising for nanoscale asynchronous logic, with 56% area and 66% energy improvements over prior QCA gates.^[23] Photonic C-elements achieve hysteresis via optical bistability in structures like photonic crystal cavities, where dual-argument phase shifts induce nonlinear refractive index changes for memory-like behavior. In all-optical designs, a central cavity with nonlinear rods exhibits bistable transmission as a function of input power and phase, enabling set-hold-reset dynamics analogous to C-element agreement detection without electrical components. Operating thresholds occur between 2-6 μW/μm², with hysteresis loops supporting switching times below 1 ps, suitable for high-speed optical computing; contrast ratios exceed 10 dB in simulations. This approach avoids transistor heat dissipation, facilitating integration in silicon photonics for ultra-low-power logic.^[24]

Extensions and Generalizations

Multi-Valued Logic Adaptations

The C-element has been generalized to multi-valued logic (MVL) frameworks beyond binary systems, where the output retains its prior state until all inputs converge to the same unambiguous logic level, preventing transitions amid partial or conflicting signals. This adaptation employs threshold detectors to sense agreement on distinct voltage thresholds representing logic states, ensuring hysteresis-like behavior in systems with more than two values. Such extensions maintain the core synchronization properties of the binary C-element while accommodating higher radix logics like ternary or quaternary.^[25]^[26] In ternary adaptations, the C-element operates across three states, often represented in balanced ternary as -1, 0, and +1, corresponding to distinct voltage levels (e.g., -V_{DD}, 0, +V_{DD}). The output updates to the agreed state only upon unanimous input alignment on a definite value, such as all inputs at +1 or all at -1, while holding steady for any disagreement or intermediate combinations; for all inputs at 0, the output may set to 0 if defined as definite, or hold otherwise depending on the design. For a two-input ternary C-element, the behavior is captured in the following truth table:

Input x_1	Input x_2	Output y
-1	-1	-1
-1	0	Hold
-1	+1	Hold
0	0	0
0	+1	Hold
+1	+1	+1

Because the element is symmetric in its inputs, rows with swapped inputs, such as (0, -1), behave identically and are omitted; "Hold" indicates retention of the previous output value.^[26]^[25] Implementations of ternary C-elements in multi-threshold CMOS use additional voltage references to distinguish states, enabling compact circuits with enhanced threshold control via transistor gate biasing. These circuits support asynchronous operation by enforcing input-output hysteresis at multiple levels.^[25]^[26] Ternary and higher MVL C-elements offer higher information density in asynchronous pipelines, carrying more data per wire and reducing interconnect overhead, which is advantageous for AI accelerators and multi-level signal processing where compact representation of probabilistic or analog-like computations is key. However, enforcing monotonicity across more than two states is harder: hazard-free transitions require stricter covering conditions and precise threshold tuning to avoid glitches in multi-level domains.^[27]^[25]
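The ternary truth table can be expressed as a small Python function (a behavioral sketch with hypothetical names, not a circuit model). Because the text notes that agreement on 0 may either set or hold depending on the design, that choice is exposed as a parameter, defaulting to the table's behavior of setting the output to 0.

```python
BALANCED_TERNARY = (-1, 0, 1)


def ternary_c_element(x1: int, x2: int, y_prev: int,
                      zero_is_definite: bool = True) -> int:
    """Two-input balanced-ternary C-element.

    The output takes the common value when both inputs agree on a definite
    level and holds y_prev otherwise. Whether agreement on 0 counts as
    definite is design-dependent, so it is a parameter here.
    """
    for v in (x1, x2, y_prev):
        if v not in BALANCED_TERNARY:
            raise ValueError(f"not a balanced-ternary value: {v}")
    if x1 != x2:
        return y_prev  # disagreement: hold
    if x1 == 0 and not zero_is_definite:
        return y_prev  # agreement on 0 treated as non-definite: hold
    return x1          # agreement on a definite level: follow it


print(ternary_c_element(-1, -1, 1))                        # -1
print(ternary_c_element(0, 1, -1))                         # -1 (hold)
print(ternary_c_element(0, 0, 1))                          # 0
print(ternary_c_element(0, 0, 1, zero_is_definite=False))  # 1 (hold)
```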

Variations for Specific Applications

Specialized variants of the C-element have been developed to address constraints in harsh environments, power-limited scenarios, and complex synchronization needs. Radiation-hardened designs, such as those using dual interlocked storage cell (DICE) techniques or dual redundancy in the feedback loop, enhance resilience against single event upsets (SEUs) caused by cosmic radiation in space applications.^[28] For instance, a dual-redundancy C-element-based flip-flop extension achieves SEU immunity through C-element hysteresis and parallel data paths, occupying about 30 μm² in 65 nm CMOS with improved speed over full triple modular redundancy (TMR) counterparts.^[29] In power-constrained environments like IoT sensors, low-power C-element variants operate in the subthreshold regime, where supply voltages drop below the transistor threshold to minimize energy consumption while preserving functionality.^[30] These adaptations leverage the C-element's inherent glitch-free nature in quasi-delay-insensitive (QDI) designs, making them suitable for battery-operated sensors where average power drops to nanowatts during idle states and leakage power is reduced by up to 97% compared to nominal operation using techniques like power gating.^[9]^[31]

For applications requiring synchronization of more than two signals, scalable multi-input C-elements extend the basic two-input design using a tree structure of intermediate two-input C-elements to handle n > 2 inputs efficiently.^[4] This hierarchical arrangement avoids the area and delay penalties of wide fan-in gates by cascading C-elements in a binary tree, where each level merges outputs from child nodes, ensuring monotonicity and hysteresis across the entire structure with logarithmic depth scaling.^[32] Such implementations reduce propagation delays for large n, as seen in self-timed circuits where a tree-based 8-input C-element achieves up to 20% faster completion detection than flat multi-input alternatives.^[33]

Fault-tolerant C-element designs incorporate built-in self-test (BIST) capabilities and error-correcting feedback loops to detect and recover from transient faults, such as those induced by noise or aging in unreliable computing environments.^[34] These variants use time-redundant sampling combined with C-element decoding to mask errors, where duplicated computations feed into a Muller C-element-based decoder (MCD) that corrects single-bit transients by exploiting the element's hold-last-output property.^[35] BIST features are embedded via scan chains or redundant observers that periodically verify feedback loop integrity, enabling on-line error correction without halting operation and achieving fault coverage exceeding 95% for stuck-at faults.^[36]

A notable case study of these variations appears in asynchronous Network-on-Chip (NoC) routers, where multi-input C-elements synchronize data flits from multiple ports to prevent metastability in globally asynchronous locally synchronous (GALS) systems.^[37] In a 2D mesh topology, tree-structured C-elements merge completion signals from input buffers, enabling wormhole routing with throughputs up to 550 Mflits/s while reducing power by 86% over synchronous counterparts through adaptive idling.^[38] Fault-tolerant enhancements, including TMR in critical C-elements, further ensure data integrity in NoC interconnects, as demonstrated in prototypes tolerating up to 10% link failures without packet loss.^[39]
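Two of the variations described above, tree-structured multi-input C-elements and C-element-based error masking of duplicated signals, can be sketched behaviorally in Python. The class and function names are hypothetical. The tree is evaluated leaves-to-root to its settled value, ignoring gate delays, and it matches a flat n-input C-element only when the inputs obey the monotonic protocol, with no input withdrawing before the output responds. The masking sketch shows that a transient on one copy is held off the output; if the transient coincides with a genuine transition, the output is delayed rather than corrupted.

```python
class CElement:
    """Two-input C-element holding its output between updates."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def update(self, a: int, b: int) -> int:
        if a == b:     # consensus: follow the inputs
            self.q = a
        return self.q  # otherwise hold


class CTree:
    """n-input C-element built as a binary tree of two-input C-elements."""

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("a C-element tree needs at least two inputs")
        self.n = n
        self.levels = []
        width = n
        while width > 1:
            self.levels.append([CElement() for _ in range(width // 2)])
            width = width // 2 + width % 2  # an odd leftover passes straight up

    def update(self, inputs) -> int:
        signals = list(inputs)
        if len(signals) != self.n:
            raise ValueError("wrong number of inputs")
        for level in self.levels:  # settle each level, leaves to root
            merged = [c.update(signals[2 * i], signals[2 * i + 1])
                      for i, c in enumerate(level)]
            if len(signals) % 2:
                merged.append(signals[-1])
            signals = merged
        return signals[0]


def masked_stream(copy_a, copy_b, q0: int = 0):
    """Feed two redundant copies of a signal through one C-element.

    A transient that corrupts only one copy makes the inputs disagree, so the
    output holds its last value instead of passing the error on.
    """
    c = CElement(q0)
    return [c.update(a, b) for a, b in zip(copy_a, copy_b)]


tree = CTree(3)
print([tree.update(v) for v in ([1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 0])])
# [0, 1, 1, 0]

clean = [0, 1, 1, 1, 0]
faulty = [0, 1, 0, 1, 0]             # transient flip at index 2
print(masked_stream(clean, faulty))  # [0, 1, 1, 1, 0]
```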

Applications in Digital Systems

Role in Asynchronous Logic

The C-element serves as a fundamental synchronization primitive in speed-independent asynchronous circuit design, where it acts as a join (rendezvous) in handshake protocols: its output changes only when all inputs agree, so correct operation does not depend on specific gate or wire delays.^[2] This behavior supports the realization of speed-independent and delay-insensitive circuits, such as those specified via signal transition graphs (STGs), where deriving each signal's set and reset functions under the monotonic cover condition and implementing them with C-elements guarantees hazard-free transitions during state changes.^[4] In particular, it coordinates request-acknowledge handshaking between circuit modules by holding its previous output until its inputs agree, which is essential for protocols such as four-phase bundled-data and dual-rail encoding.^[2] A prominent application is the Muller pipeline, a chain of C-elements in which each stage's C-element takes the request from its predecessor and the inverted acknowledge from its successor, so a stage accepts a new event only after its predecessor has produced one and its successor has consumed the previous one, maintaining orderly data flow in delay-insensitive pipelines.^[2] These pipelines, originally proposed by David Muller, use the C-element's state-holding capability to implement two-phase or four-phase handshaking without global timing assumptions, making asynchronous systems composable.^[40] In Sutherland's micropipeline architecture, a Muller pipeline of C-elements forms the control path and works with two-phase event logic such as toggles to bundle data with request events: every transition, rising or falling, signals an event, and the C-elements synchronize requests with acknowledges so that event-controlled latches in the data path capture and pass data in order.^[13] This configuration separates data paths from control, using the C-element's hysteresis to hold states during transitions.
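The Muller pipeline can be sketched in Python under the usual wiring: each stage is a C-element whose inputs are the previous stage's output and the inverted output of the next stage. The sketch assumes two-phase (transition) signalling, an eager producer on the left, and a stalled consumer on the right. All three choices, and the function name, are illustrative assumptions. The pipeline fills until every stage holds one event.

```python
from typing import List


def muller_pipeline_fill(n_stages: int, max_steps: int = 1000) -> List[int]:
    """Fill an n-stage two-phase Muller pipeline against a stalled consumer.

    Stage i is a C-element with inputs c[i-1] (request from the left) and
    NOT c[i+1] (inverted acknowledge from the right). The left environment
    issues a new request as soon as the previous one is acknowledged; the
    right environment never acknowledges. Returns the stage outputs once the
    pipeline stops changing.
    """
    c = [0] * n_stages
    req_in = 0   # request wire from the left environment
    ack_out = 0  # acknowledge from the right environment (stuck at 0)
    for _ in range(max_steps):
        if c[0] == req_in:  # previous request acknowledged: issue a new one
            req_in ^= 1
        # Adjacent stages are never enabled at the same time, so updating
        # every stage from the old state matches some one-at-a-time order.
        new_c = []
        for i in range(n_stages):
            left = req_in if i == 0 else c[i - 1]
            right = 1 - (ack_out if i == n_stages - 1 else c[i + 1])
            new_c.append(left if left == right else c[i])
        if new_c == c:  # no stage can fire: the pipeline is full
            return c
        c = new_c
    raise RuntimeError("pipeline did not settle")


full = muller_pipeline_fill(4)
print(full)  # [0, 1, 0, 1]
```

In the final state neighboring outputs differ and the last stage differs from the unacknowledged right environment, which in two-phase signalling means each of the four stages holds one pending event.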
The C-element also plays a critical role in hazard elimination within self-timed systems, notably in completion detection. For dual-rail data, the two rails of each bit are ORed to indicate that the bit is valid, and the per-bit results are combined by a C-element (or a tree of C-elements) whose output changes only when every bit has become valid or every bit has returned to the spacer state; this prevents glitches from propagating and keeps outputs stable under variable delays.^[4] By filtering static and dynamic hazards, such as unwanted 0-1-0 pulses, the C-element is consistent with inertial delay models of real gate behavior, and because it changes state only after input consensus, completion detectors built from it never signal completion on partially arrived data.^[4] In burst-mode controllers, C-elements sequence state transitions: production rules are mapped to generalized C-elements with set (n-stack) and reset (p-stack) conditions, allowing multiple input changes in a burst while enforcing stability between bursts. Synthesis tools such as MINIMALIST automate this for burst-mode specifications, producing hazard-free implementations.^[41]^[42] Synchronizing through C-elements in this way enables average-case performance improvements over synchronous designs, particularly under variable workloads, by sensing computation completion and propagating signals at data-dependent speeds rather than fixed clock cycles, potentially doubling throughput in event-driven pipelines through dual-edge utilization.^[13]^[43] Recent applications include low-power designs for Internet of Things (IoT) devices and edge artificial intelligence accelerators, leveraging C-elements for energy-efficient asynchronous processing as of 2020.^[44]
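A dual-rail completion detector can be sketched behaviorally in Python, assuming the common encoding in which each bit is a (true rail, false rail) pair, (0, 0) is the spacer, and (1, 1) is illegal. Each bit's rails are ORed to form a valid signal, and an n-input C-element over those signals, modeled here directly rather than as a tree, raises the done signal when all bits are valid. It lowers the signal when all bits have returned to the spacer and holds it in between. The class and function names are hypothetical.

```python
from typing import Sequence, Tuple

DualRail = Tuple[int, int]  # (true rail, false rail); (0, 0) is the spacer


def rail_valid(bit: DualRail) -> int:
    """OR of the two rails: 1 once the bit carries a data value."""
    t, f = bit
    if t and f:
        raise ValueError("illegal dual-rail code (1, 1)")
    return t | f


class CompletionDetector:
    """Per-bit OR gates combined by an n-input C-element."""

    def __init__(self) -> None:
        self.done = 0

    def update(self, word: Sequence[DualRail]) -> int:
        valids = [rail_valid(b) for b in word]
        if all(valids):
            self.done = 1  # every bit valid: signal completion
        elif not any(valids):
            self.done = 0  # every bit back to spacer: signal reset
        return self.done   # mixed: hold (C-element hysteresis)


det = CompletionDetector()
print(det.update([(1, 0), (0, 0)]))  # 0 (second bit not yet valid)
print(det.update([(1, 0), (0, 1)]))  # 1 (all bits valid)
print(det.update([(0, 0), (0, 1)]))  # 1 (partially reset: hold)
print(det.update([(0, 0), (0, 0)]))  # 0 (all bits back to spacer)
```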

Integration in Synchronous and Mixed Systems

The C-element plays a crucial role in bridging asynchronous components within predominantly synchronous environments, enabling robust synchronization across clock domains without introducing hazards. In globally asynchronous locally synchronous (GALS) systems, C-elements are deployed at boundaries between clock domains to implement mesochronous interfaces, where they ensure monotonic signal transitions and prevent race conditions during data transfer between locally clocked modules. This integration allows asynchronous islands to communicate seamlessly with synchronous cores, leveraging the C-element's hysteresis property to hold states until both inputs agree, thus maintaining data integrity in mixed-timing architectures.^[45] Clockless wrappers, which encapsulate asynchronous modules for integration into synchronous system-on-chip (SoC) designs, frequently employ C-elements for handshaking and completion detection. These wrappers use dual-rail completion detection trees incorporating multi-input C-elements to generate acknowledgment signals, facilitating low-latency interfaces that double throughput in pipelined network-on-chip (NoC) links while reducing wire lengths. For instance, in GALS NoC implementations, C-elements in half-buffer pipeline stages with 4-phase quasi-delay-insensitive (QDI) encoding enable elastic data flow, as demonstrated in applications like AES cryptography and OFDM transceivers, where they contribute to a 6% power reduction at 160 MHz operation.^[46] Hybrid designs further exploit C-elements alongside dual-rail encoding to generate glitch-free clocks in mixed systems, combining bundled-data inputs with dual-rail carry chains for constant latency on empty data paths. In value-safe clocking schemes, a ring-oscillator-based generator incorporates a C-element to dynamically stretch clock periods during potential metastability events, ensuring hazard-free operation without relying solely on dual-rail protocols. 
This approach is particularly effective in multi-clock environments, where C-elements synchronize transitions in phase-decoupled components, supporting robust handshake protocols in asynchronous-synchronous hybrids.^[4] Early prototypes from the late 1990s, such as Caltech's MiniMIPS asynchronous microprocessor fabricated in 1999, illustrate practical integration of C-elements in mixed systems, achieving 180 MIPS at 4 W and 3.3 V while interfacing with synchronous memory via micropipeline controls. Modern field-programmable gate array (FPGA) implementations map C-elements onto lookup tables (LUTs), with a single 6-input LUT realizing a 2-input C-element on Xilinx Virtex-6 devices, yielding delays of 0.8 to 1.2 ns and minimal area overhead of one LUT per element. These LUT-based designs enhance reconfigurability for fault-tolerant hybrids, preserving the C-element's synchronization benefits in contemporary SoCs.^[47]^[48] Despite these advantages, integrating C-elements in mixed setups presents challenges, including handling clock jitter from phase variations in multi-clock domains, which can increase pipeline slip by up to 65% in GALS systems even with nominal 0.5% frequency matching. These issues necessitate careful partitioning and macro reuse to balance performance gains with design complexity.^[49]
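To make the LUT mapping concrete, the following Python sketch computes the 8-entry truth table of a 2-input C-element treated as a 3-input function of A, B, and the fed-back output Q. The bit ordering (bit index 4·I2 + 2·I1 + I0) and the pin assignment are assumptions for illustration. Consult the vendor's library documentation for the actual INIT encoding of a given LUT primitive and for how the unused inputs of a 6-input LUT are handled. Routing the feedback through the LUT also creates a combinational loop, which FPGA tools may flag and need to be told to accept.

```python
def lut3_init(fn) -> int:
    """Pack a 3-input Boolean function into an 8-bit truth-table word.

    Bit i holds fn for the input combination whose binary index is i, with the
    assumed input ordering i = 4*I2 + 2*I1 + I0.
    """
    init = 0
    for i in range(8):
        i0, i1, i2 = i & 1, (i >> 1) & 1, (i >> 2) & 1
        if fn(i0, i1, i2):
            init |= 1 << i
    return init


def c_element(a: int, b: int, q: int) -> int:
    """Q+ of a 2-input C-element: the majority of A, B and the fed-back Q."""
    return (a & b) | (a & q) | (b & q)


# Hypothetical pin assignment: A -> I0, B -> I1, fed-back Q -> I2.
print(hex(lut3_init(c_element)))  # 0xe8
```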