Parallel processing
Parallel processing, also known as parallel computing, is the simultaneous use of multiple compute resources, such as central processing units (CPUs), cores, or computers, to solve a single computational problem by dividing it into smaller, concurrent tasks.[1] This approach exploits concurrency to enhance performance, enabling faster execution of complex algorithms in fields like scientific simulation, data analysis, and artificial intelligence.[2]

The concept of parallel processing emerged in the mid-20th century alongside the development of early computers, with foundational work in the 1960s and 1970s focusing on multiprocessor systems to overcome the limitations of single-processor architectures.[3] By the 1980s, advancements in vector processors and shared-memory multiprocessors, such as those from Cray Research, marked a significant evolution, driven by the need for high-performance computing in scientific applications.[4] The 1990s and 2000s saw a shift toward commodity hardware, including clusters of off-the-shelf processors and the rise of multicore chips, fueled by Moore's Law and the demand for scalable systems, with supercomputers achieving performance exceeding 1 exaflop as of 2025.[5][6]

Parallel processing architectures are classified using frameworks like Flynn's taxonomy, which categorizes systems based on instruction and data streams: single instruction, single data (SISD) for sequential computing; single instruction, multiple data (SIMD) for vector operations; multiple instruction, single data (MISD) for fault-tolerant pipelines; and multiple instruction, multiple data (MIMD) for general-purpose parallelism.[7] Common implementations include shared-memory systems, where multiple processors access a unified memory space (e.g., multicore CPUs), and distributed-memory systems, where processors communicate via message passing (e.g., clusters using MPI).[8] Hybrid models, combining both, are prevalent in modern supercomputers and GPUs for tasks requiring massive parallelism.[9]

Key benefits of parallel processing include substantial speedups for embarrassingly parallel workloads, improved scalability for big data and simulations, and enhanced resource utilization in high-performance computing environments.[10] For instance, it enables breakthroughs in climate modeling and genomics by distributing computations across thousands of nodes.[11] However, challenges persist, such as managing synchronization to avoid race conditions, minimizing inter-processor communication overhead, ensuring load balancing, and developing portable software that scales efficiently across heterogeneous hardware.[12][13]
Fundamentals
Definition and core principles
Parallel processing is a computational paradigm that involves the simultaneous execution of multiple processes or threads across multiple processing units to solve problems more efficiently than sequential execution on a single processor. This approach divides a computational task into independent subtasks that can be performed concurrently, leveraging the aggregate computational power of multiple processors to reduce overall execution time. The primary motivations for parallel processing stem from the demands of handling massive datasets, performing real-time simulations, and tackling computationally intensive applications such as scientific modeling and climate forecasting, where single-processor systems fall short in terms of speed and scalability. By distributing workloads, parallel processing enables the analysis of large-scale data volumes that would otherwise exceed the memory or processing capacity of individual machines, and it supports time-critical tasks requiring rapid results.[14]

Central to parallel processing are principles that quantify performance gains, such as speedup and efficiency. Speedup S is defined as the ratio of the execution time of the best sequential algorithm T_s to the execution time of the parallel algorithm T_p on p processors:
S = \frac{T_s}{T_p}.
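As a concrete illustration of this definition, the following sketch (a minimal example, not drawn from the cited sources; the task and problem sizes are arbitrary) times the same set of independent subtasks sequentially and then with a pool of worker processes, and reports the measured speedup. Worker processes are used rather than threads because CPython's global interpreter lock prevents CPU-bound threads from running in parallel.

import math
import time
from multiprocessing import Pool, cpu_count

def heavy_task(n):
    # CPU-bound work; each call is independent of the others,
    # so the calls can run concurrently on separate cores.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    workload = [2_000_000] * 16        # 16 independent subtasks (illustrative sizes)
    p = cpu_count()                    # number of available processors

    start = time.perf_counter()
    sequential = [heavy_task(n) for n in workload]
    T_s = time.perf_counter() - start  # sequential execution time

    start = time.perf_counter()
    with Pool(processes=p) as pool:
        parallel = pool.map(heavy_task, workload)
    T_p = time.perf_counter() - start  # parallel execution time on p processors

    S = T_s / T_p                      # speedup, as defined above
    print(f"p = {p}, T_s = {T_s:.2f} s, T_p = {T_p:.2f} s, S = {S:.2f}")

On a machine with p cores the measured speedup is typically somewhat below p, since process start-up, scheduling, and result collection add overhead that the idealized formula does not capture.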
Efficiency E measures resource utilization as the speedup divided by the number of processors:
E = \frac{S}{p}.
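For illustration with hypothetical timings: if a program takes T_s = 120 seconds sequentially and T_p = 20 seconds on p = 8 processors, then
S = \frac{120}{20} = 6, \qquad E = \frac{6}{8} = 0.75,
meaning the processors are, on average, doing useful work 75% of the time.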
These metrics highlight the ideal case of linear scaling, where S = p and E = 1, though real-world factors such as communication and synchronization overhead typically yield sublinear results. Gustafson's law complements these metrics by focusing on scalable parallelism for fixed-time problems, where the problem size grows with the number of processors; it models the scaled speedup as S = s + p(1 - s), with s the fraction of the parallel execution time spent on serial work, emphasizing that nearly all of the computation can be parallelized in appropriately scaled problems.[15][16] For example, with a serial fraction of s = 0.05 on p = 100 processors, the scaled speedup is about 95.

The degree of achievable parallelism is fundamentally constrained by dependencies within the computation. Data dependencies occur when an operation relies on the output of a preceding operation, enforcing sequential ordering to ensure correctness. Control dependencies arise from conditional branches or loops that dictate alternative execution paths, potentially serializing portions of the code. These dependencies limit the granularity of parallelization, as unresolved conflicts can lead to synchronization overhead or reduced concurrency, lowering overall speedup and efficiency.[17]
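The following sketch (illustrative only; the functions and data are hypothetical) contrasts an independent, element-wise computation, which can be distributed across processes directly, with a running sum whose loop-carried data dependency serializes the loop as written.

from multiprocessing import Pool

def square(x):
    # Each call depends only on its own input: no data dependency,
    # so the calls can be distributed across processors safely.
    return x * x

def running_sum(values):
    # Each iteration reads the total produced by the previous one:
    # a loop-carried data dependency that serializes this loop as written.
    total = 0
    prefix = []
    for v in values:
        total += v            # depends on the previous iteration's result
        prefix.append(total)
    return prefix

if __name__ == "__main__":
    data = list(range(10))
    with Pool(processes=4) as pool:
        squares = pool.map(square, data)  # independent: parallelized across workers
    prefix = running_sum(data)            # dependent: computed sequentially here
    print(squares)
    print(prefix)

Parallel scan (prefix-sum) algorithms can recover concurrency for recurrences like the running sum, but they do so by restructuring the computation rather than by splitting the loop naively, which illustrates how dependencies shape both the granularity and the design of a parallel algorithm.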