
Group testing

Group testing is a statistical and combinatorial problem aimed at identifying a small number of defective or positive items, known as defectives, within a large population of n items by performing tests on carefully chosen subsets, or pools, of these items. Each test reveals whether at least one defective is present in the pool (a positive outcome) or none are (a negative outcome), with the primary goal of minimizing the total number of tests required to pinpoint all defectives exactly, often achieving substantial efficiency gains over testing each item individually when the number of defectives k is much smaller than n (i.e., the sparse regime where k = \Theta(n^\alpha) for \alpha < 1).

The method was first proposed in 1943 by Robert Dorfman to optimize large-scale blood testing for syphilis among U.S. military draftees, introducing a simple two-stage adaptive procedure in which initial group tests are followed by individual retesting of positive pools, which can reduce the expected number of tests by up to a factor related to the defect probability. Early extensions formalized the problem in combinatorial terms, with contributions from Sobel and Groll (1959) on optimal group sizes and from Kautz and Singleton (1964) on disjunct matrix constructions for non-adaptive testing, linking the approach to coding theory.

Group testing encompasses two primary paradigms: adaptive testing, where test outcomes inform subsequent pool selections (e.g., binary splitting algorithms that halve pools iteratively, achieving near-optimal performance with roughly k \log_2(n/k) tests in the noiseless case), and non-adaptive testing, where all pools are predefined and tested in parallel (e.g., using random Bernoulli designs or explicit constructions like superimposed codes, with information-theoretic lower bounds of approximately k \log_2(n/k) tests for exact recovery). Theoretical analysis shows that the minimal number of tests T satisfies T \geq k \log_2(n/k) by counting arguments, with adaptive methods attaining capacity 1 (full efficiency) in the sparse regime, while non-adaptive approaches achieve rates up to about 0.693 for certain designs.

Beyond its origins in medical screening, group testing has found wide applications in hardware and software testing for detecting faulty components, in molecular biology for DNA library screening and sequencing, in communications for multi-access channels and error-correcting codes, in signal processing for sparse signal recovery and compressive sensing, in network tomography for topology inference and fault detection, and in medical diagnostics, including pooled testing for COVID-19. Recent advancements address noisy tests (e.g., with error rates modeled via binary symmetric channels, reducing capacity to 1 - h(\rho), where h is the binary entropy function) and practical constraints like partial recovery or heterogeneous defect probabilities, underscoring its versatility in resource-limited scenarios.

Fundamentals

Core Concepts and Terminology

Group testing is a statistical and combinatorial method for identifying a small number of defective items among a large population of n items by performing tests on subsets, or pools, of these items, rather than testing each one individually. The core objective is to determine up to d defective items efficiently, where d is typically much smaller than n, such as in scenarios involving rare defects. This approach leverages the binary nature of tests to minimize resource use while ensuring accurate identification. In group testing, items are classified as either defective (also called positive) or non-defective (negative). Defective items possess a specific property of interest, such as being contaminated or faulty, and their presence in a tested pool triggers a detectable response; non-defective items lack this property and do not affect the test outcome. Each test on a pool yields a binary outcome: positive if the pool contains at least one defective item, and negative if all items in the pool are non-defective. This qualitative model assumes reliable tests without errors, focusing on the presence or absence of defectives. The efficiency of group testing is measured by the number of tests required to identify all defectives, which is significantly lower than the baseline of n tests needed for individual testing. For instance, when d is small relative to n, the expected number of tests can approach O(d log n), providing substantial savings over exhaustive individual checks. A simple illustration of pooling occurs in medical screening, where blood samples from multiple individuals are combined into a single test for diseases like syphilis; a negative result clears the entire group, while a positive prompts targeted follow-up tests. Group testing strategies may be adaptive, with test designs depending on prior outcomes, or non-adaptive, performing all tests in parallel.

Problem Classifications

Group testing problems are broadly classified into adaptive and non-adaptive variants based on the timing and dependency of tests. In adaptive group testing, test outcomes from previous pools inform the design of subsequent tests, allowing for sequential refinement and potentially fewer total tests in scenarios with feedback availability. This approach contrasts with non-adaptive group testing, where all tests are predefined and can be conducted in parallel, making it suitable for high-throughput applications like large-scale screening but often requiring more tests overall. Problems are further distinguished by their identification objectives, particularly exact versus approximate recovery. Exact identification requires precisely determining the set of all defective items among the population, ensuring no false positives or negatives in the final output. Approximate or threshold variants, however, relax this to goals such as identifying at least one defective, estimating the number of defectives up to a threshold, or tolerating a bounded error rate, which can significantly reduce the testing complexity in sparse regimes. Classifications also arise from assumptions about the number of defectives, denoted as d or k. When d is known exactly, algorithms can be tailored to that fixed count, optimizing for scenarios like exactly k defectives. In contrast, problems with unknown or only bounded d (e.g., d \leq k) demand robust methods that either estimate d or perform without prior knowledge, often at a slight efficiency cost. Additional variations account for complications in test outcomes. The inhibitor model introduces a third class of items—inhibitors—that can mask the presence of defectives in a pool, yielding a negative result even if defectives are present, as first formalized in molecular biology contexts. Noisy multi-defective settings extend the standard model by incorporating errors, such as false positives or negatives, where multiple defectives may interact adversarially or probabilistically in the outcome. Extensions to quantitative group testing differ from the classical qualitative framework by producing non-binary outcomes, such as the exact count or concentration of defectives in a pool, enabling more informative measurements for recovery. Qualitative group testing, the foundational binary-positive/negative model, focuses solely on presence or absence, prioritizing simplicity in detection.

Historical Development

Origins and Early Applications

Group testing was first proposed by economist Robert Dorfman in 1943 as a method to efficiently screen large populations for syphilis among U.S. Army inductees during World War II, where the prevalence of the disease was low, making individual testing inefficient and costly. The approach addressed the need to test millions of recruits with limited laboratory resources, leveraging the rarity of positives to minimize the total number of serological tests required. Early applications focused on military and public health contexts, particularly pooling blood samples to detect syphilitic antigen via the Wassermann test. This pooling strategy was implemented successfully by the U.S. military to screen draftees, reducing the burden on testing facilities while maintaining detection accuracy for defective (infected) individuals in large groups. Dorfman's two-stage model divided the population into groups of equal size, testing each pooled sample first; negative pools were cleared, while positive pools underwent individual retesting to identify defectives. The optimal group size was derived to minimize the expected number of tests and is approximately the inverse square root of the defect probability (group size \approx 1/\sqrt{p} for low prevalence p). This method achieved substantial savings, reducing tests by roughly 80% at a defect rate of 1% and by even more at lower prevalence. Initial progress in the 1940s and 1950s included refinements to Dorfman's procedure, such as A. Sterrett's 1957 modification, in which items in a positive group are tested individually only until the first defective is found, after which the remaining items are re-pooled; this handles groups with multiple defectives more efficiently. These developments emphasized practical optimizations for group sizes under varying prevalence rates in screening scenarios.
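
The arithmetic behind these savings is easy to reproduce. The following minimal Python sketch (function names and parameters are illustrative, not taken from Dorfman's paper) computes the expected number of tests per item under the two-stage procedure and brute-forces the optimal group size:

```python
def dorfman_tests_per_item(k: int, p: float) -> float:
    """Expected tests per item under Dorfman's two-stage scheme:
    one pooled test per group of k items, plus k individual retests
    whenever the pool is positive (probability 1 - (1-p)^k)."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def optimal_group_size(p: float, k_max: int = 200) -> int:
    """Brute-force the group size minimizing expected tests per item;
    close to 1/sqrt(p) for small prevalence p."""
    return min(range(2, k_max + 1), key=lambda k: dorfman_tests_per_item(k, p))

p = 0.01
k = optimal_group_size(p)
print(k, round(dorfman_tests_per_item(k, p), 3))
# prints: 11 0.196 -> roughly 80% fewer tests than individual testing
```

At 0.1% prevalence the same search returns a group size of about 32 and expected tests near 0.063 per item, i.e. savings above 90%, consistent with the figures above.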

Emergence of Combinatorial Methods

The development of combinatorial methods in group testing during the 1950s and 1960s marked a significant theoretical shift, drawing inspiration from coding theory to address non-adaptive testing scenarios. This era saw the formalization of non-adaptive schemes, where all tests are predetermined, allowing for parallel execution and applications beyond medical screening. Central to these advancements was the introduction of superimposed codes and disjunct matrices as core combinatorial tools for non-adaptive group testing. Superimposed codes are binary codes where the logical OR of any collection of up to d codewords can be uniquely decoded to identify the constituent codewords, facilitating the detection of multiple defectives through pooled tests. Disjunct matrices, binary matrices where the union of the supports of any set of at most d columns does not contain the support of any other column, ensure that up to d defectives can be uniquely identified from test outcomes. These structures provided rigorous guarantees for exact identification in predefined test pools, transforming group testing into a branch of design theory. A pivotal contribution came from Alfréd Rényi's 1961 work on separating systems, which established existence results for families of sets capable of distinguishing arbitrary subsets of elements—essential for isolating defectives in combinatorial designs. Such systems enabled the construction of test matrices that separate potential defective sets, laying the groundwork for efficient non-adaptive protocols. Building on the early pooling methods of wartime applications, these combinatorial innovations allowed for scalable, exact identification strategies that prioritized theoretical optimality over probabilistic assumptions. A notable milestone in the 1960s was the adaptation of these methods to computer science, particularly fault detection in digital systems, where disjunct matrices helped pinpoint erroneous components in networks and processors with minimal queries.

Modern Probabilistic and Algorithmic Advances

The rise of probabilistic group testing in the 1980s introduced models that assume defective items follow a random distribution, such as i.i.d. Bernoulli priors, shifting focus from worst-case combinatorial guarantees to expected performance metrics like the average number of tests required. This approach incorporated random defective sets to analyze efficiency under uncertainty, enabling bounds on expected test counts in sparse regimes where the number of defectives k is small relative to the population size n. Seminal reviews of the Russian literature from this era highlighted nonadaptive strategies with small error probabilities, optimizing designs like Bernoulli matrices with inclusion probability p = \nu / k to achieve near-optimal expected tests T = O(k \log n). These developments extended earlier deterministic methods by emphasizing probabilistic decoding, improving scalability for large populations. A key contribution bridging adaptive and combinatorial paradigms was F.K. Hwang's generalized binary splitting algorithm, introduced in 1972 and refined over the following decade, which achieves an information-theoretic rate of 1 for sparsity levels \alpha \in [0, 1), requiring approximately T = k \log_2(n/k) + O(k) tests in expectation. This algorithm adaptively refines pools based on test outcomes, providing a foundation for hybrid strategies that balance computational efficiency and error rates, influencing subsequent extensions for random defectives. In the 1990s and 2000s, advances in algorithmic decoders emphasized computational complexity, with the COMP (Combinatorial Orthogonal Matching Pursuit) algorithm emerging as a landmark nonadaptive method. Formally introduced in 2011 but building on complexity studies from the prior decade, COMP decodes test outcomes with a simple conservative rule akin to greedy pursuit in sparse recovery, achieving a nonzero rate of R_\text{COMP} = 0.531(1 - \alpha) across the sparse regime, with T = O(k \log n) tests. This work, driven by analyses of decoding tractability, marked a shift toward practical, polynomial-time algorithms for large-scale problems. The 2000s saw group testing linked to compressed sensing, framing defective identification as sparse binary recovery from linear measurements, where test matrices enable reconstruction via \ell_1-minimization or greedy methods. This connection, formalized in works like Boolean compressed sensing, allowed group testing to leverage sensing guarantees for noisy, nonadaptive settings, achieving recovery with O(k \log(n/k)) measurements under restricted isometry properties. Recent trends up to 2025 integrate machine learning for dynamic testing and big-data scalability, using techniques like gradient boosting to model heterogeneous risks and optimize pool formation in real time. For instance, ML-driven strategies employ correlated risk predictions to enhance frequent-testing efficiency, reducing expected tests by up to 50% in simulated large populations while adapting to evolving defectives. These approaches, often combining learned risk models with traditional decoders, enable handling of non-i.i.d. priors and massive datasets, as seen in probabilistic analyses achieving near-optimal rates via learned priors.

Mathematical Foundations

Formal Problem Models

In group testing, the standard combinatorial model considers a set of n items, labeled by the index set [n] = \{1, 2, \dots, n\}, among which there are at most d defective items, where d \ll n. The defective items form a subset D \subseteq [n] with |D| \leq d, and the goal is to identify D exactly using the minimum number of tests. Each test is defined by a subset S \subseteq [n], and the outcome of the test is binary: positive (1) if S intersects the defective set (i.e., S \cap D \neq \emptyset), and negative (0) otherwise. This setup corresponds to disjunctive testing, where the test outcome reflects the union coverage of defectives in the pool. The indicator vector x \in \{0,1\}^n represents the defectives, with x_i = 1 if item i \in D and x_i = 0 otherwise. Group testing procedures are classified as adaptive or non-adaptive based on how tests are selected. In the adaptive model, tests are performed sequentially, with each subsequent test subset chosen based on the outcomes of previous tests. This allows for dynamic refinement of the search space. In contrast, the non-adaptive model requires all tests to be predefined in advance, represented by a fixed testing matrix A \in \{0,1\}^{m \times n}, where m is the number of tests, and A_{j,i} = 1 if item i is included in test j. The outcome vector y \in \{0,1\}^m is then given by the disjunctive (Boolean OR) product y = A \circ x, where y_j = 1 if at least one defective is in test j, and 0 otherwise; exact recovery involves solving for D (or equivalently x) from y. These models assume noise-free qualitative tests, meaning outcomes are perfectly reliable and binary, with no false positives or negatives. Extensions to noisy settings incorporate error probabilities (e.g., false negatives or positives with fixed rates), while quantitative models allow test outcomes to reflect the exact number of defectives in a pool rather than just their presence.
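
To make the model concrete, here is a minimal NumPy simulation of the non-adaptive OR model; the random design, sizes, and defective set are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 12, 6, 2

A = rng.integers(0, 2, size=(m, n))            # A[j, i] = 1 iff item i is in test j
x = np.zeros(n, dtype=int)
x[rng.choice(n, size=d, replace=False)] = 1    # indicator vector of the defective set D

y = (A @ x > 0).astype(int)                    # disjunctive outcomes: y_j = 1 iff test j hits D
print(np.flatnonzero(x), y)
```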

Information-Theoretic Bounds

In group testing, information-theoretic bounds provide fundamental lower limits on the number of tests t required to identify up to d defectives among n items in the noiseless setting. These bounds arise from the principle that each binary test outcome yields at most 1 bit of information, and the testing procedure must distinguish among all possible defective configurations, including the case of zero defectives. Specifically, there are \sum_{k=0}^d \binom{n}{k} possible configurations to distinguish, leading to the lower bound t \geq \log_2 \left( \sum_{k=0}^d \binom{n}{k} \right). For large n and fixed d, the sum is dominated by the largest term, so \sum_{k=0}^d \binom{n}{k} \approx \binom{n}{d}, and using Stirling's approximation, \log_2 \binom{n}{d} \approx d \log_2 (n/d). This yields the asymptotic bound t = \Omega(d \log (n/d)), which holds for both adaptive and non-adaptive protocols in the noiseless case. The derivation follows from Fano's inequality applied to the mutual information between the defective set and the test outcomes, ensuring that the uncertainty about the configuration cannot be driven to zero without sufficiently many tests. These bounds are tight in the adaptive setting, where algorithms such as binary splitting achieve performance within an additive O(d) term of the lower bound for small d, attaining an information-theoretic rate of 1. In contrast, non-adaptive methods, while also subject to the same \Omega(d \log (n/d)) lower bound, require a multiplicative constant greater than 1 in practice, with achievable rates up to approximately 0.693 for certain sparse regimes due to the fixed testing matrix structure.
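
The counting bound is simple enough to evaluate directly; this snippet (illustrative) compares it with individual testing:

```python
from math import comb, log2, ceil

def counting_lower_bound(n: int, d: int) -> int:
    """t >= log2(sum_{k<=d} C(n,k)): each binary test yields at most one
    bit, and every defective configuration must map to a distinct outcome."""
    return ceil(log2(sum(comb(n, k) for k in range(d + 1))))

print(counting_lower_bound(1000, 1))   # 10, versus 1000 individual tests
print(counting_lower_bound(1000, 10))  # 78, slightly above d*log2(n/d) = 10*log2(100) ≈ 66
```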

Combinatorial Bounds and Constructions

In combinatorial group testing, a key concept for ensuring guaranteed recovery of up to d defectives among n items is the d-disjunct matrix, defined as a t \times n binary matrix where the union (Boolean sum) of any d columns does not contain another column. This property implies that the supports of the columns corresponding to defectives are sufficiently separated, allowing unique identification of up to d defectives without ambiguity. Seminal work established that such matrices provide a combinatorial guarantee for non-adaptive testing, contrasting with probabilistic methods by offering worst-case determinacy. Combinatorial lower bounds on the minimal number of tests t for a d-disjunct matrix yield the asymptotic bound t = \Omega\left( \frac{d^2 \log n}{\log d} \right), reflecting the need to distinguish \binom{n}{d} possible defective sets while avoiding overlaps in unions of d columns. These combinatorial bounds exceed the information-theoretic baseline of \Omega(d \log (n/d)) in the non-adaptive setting, highlighting the overhead for deterministic guarantees. Upper bounds on t are achieved through explicit and random constructions of d-disjunct matrices, with t = O(d^2 \log n) tests sufficing for recovery. Random constructions, such as Bernoulli sampling with appropriate probabilities, yield d-disjunct matrices with high probability using t \approx d^2 \log n / \log d tests, leveraging the probabilistic method to bound union overlaps. Explicit constructions draw from superimposed codes, introduced as nonrandom binary codes where no codeword is contained in the superposition (union) of d others, enabling d-disjunct matrices via disjunctive code properties. The Kautz–Singleton construction, using concatenated constant-weight codes, achieves t = O(d^2 \log_d^2 n) and is foundational for scalable non-adaptive schemes. Further constructions employ separating hash families, which generalize hash functions to ensure that for any d+1 distinct items, there exists a hash (test) separating one from the others, yielding d-disjunct matrices with t = O(d^2 \log n). These families provide efficient explicit builds via finite geometries or algebraic methods, improving over random sampling in computational efficiency. A related key property is that of (d, t)-separating systems, binary matrices in which, for any set of d columns and any other single column, some row distinguishes the single column from the union of the d; this ensures identification even under bounded errors. Such systems underpin advanced disjunct constructions, with the minimal number of rows scaling as O(d \log n), facilitating robust group testing variants.
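
For small instances the d-disjunct property can be verified by brute force, and a random Bernoulli design illustrates the t = O(d^2 \log n) regime. A sketch (sizes chosen for quick execution, constants not optimized):

```python
import numpy as np
from itertools import combinations

def is_disjunct(A, d):
    """Brute-force check of the d-disjunct property: no column lies in
    the Boolean union of any d other columns (exponential; toy sizes only)."""
    m, n = A.shape
    for cols in combinations(range(n), d):
        union = A[:, list(cols)].max(axis=1)
        for j in range(n):
            if j not in cols and np.all(A[:, j] <= union):
                return False
    return True

# Random Bernoulli design with inclusion probability ~ 1/(d+1): with
# t = O(d^2 log n) rows it is d-disjunct with high probability.
rng = np.random.default_rng(1)
n, d, t = 30, 2, 100
A = (rng.random((t, n)) < 1.0 / (d + 1)).astype(int)
print(is_disjunct(A, d))  # True with high probability at these sizes
```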

Adaptive Algorithms

Binary Splitting

Binary splitting is an adaptive group testing algorithm that identifies defective items among a population of n items by iteratively dividing the search space in half and testing subsets to narrow down the location of defectives. Introduced by Sobel and Groll in their seminal 1959 paper, the method assumes binary test outcomes (positive if at least one defective is present, negative otherwise) and proceeds sequentially based on results. For the case of exactly one known defective (d=1), the procedure begins with the full set of n items. A subset of size \lfloor n/2 \rfloor is formed and tested. If the test is positive, the algorithm recurses on this subset; if negative, it recurses on the remaining items. This halving continues until a singleton set is reached, confirming the defective item. No initial test of the full set is needed if d=1 is known, as the presence is assumed. The expected number of tests required is \log_2 n, matching the information-theoretic lower bound of \log_2 n bits needed to identify one defective among n items. In the worst case, the algorithm requires \lceil \log_2 n \rceil tests. To extend binary splitting to multiple defectives (d > 1), the algorithm adjusts the splitting ratio based on an estimate of the number of defectives in the current group, often using a 1:2 or similar ratio instead of strict halving to optimize for higher defect concentrations. The process identifies one defective at a time: after locating and removing a defective via halving, the algorithm restarts on the remaining items, updating the defective count estimate as needed. This sequential approach achieves an average of approximately d \log_2 (n/d) + d tests, which is near-optimal and within a small constant factor of the information-theoretic lower bound for moderate d. The worst-case number of tests can still reach n when defectives are numerous, in which case individual testing cannot be improved upon. A simple example illustrates the procedure for d=1 and n=8 items labeled 1 through 8. First, test items 1-4: if positive, recurse on 1-4 (test 1-2 next); if negative, recurse on 5-8. Continuing halvings (e.g., positive on 1-2 leads to testing item 1), the defective is identified in exactly 3 tests, as \log_2 8 = 3. This deterministic path highlights the efficiency for power-of-two sizes, though rounding applies for arbitrary n.
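
A direct implementation of the halving procedure for a single defective, with `test` standing in for whatever physical or laboratory test is available (a sketch; the oracle here is simulated):

```python
def binary_split(items, test):
    """Halving search for a single defective: `test(pool)` returns True
    iff the pool contains the defective. Uses ceil(log2 n) tests."""
    while len(items) > 1:
        half = items[:len(items) // 2]
        items = half if test(half) else items[len(items) // 2:]
    return items[0]

defective = 6
oracle = lambda pool: defective in pool          # simulated lab test
print(binary_split(list(range(1, 9)), oracle))   # 6, found in 3 tests for n = 8
```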

Generalized Binary Splitting

Generalized binary splitting extends basic binary splitting by adjusting subgroup sizes to account for the expected number of defectives in the current group, enabling more efficient identification of multiple defectives. In this approach, a group of size s is not simply divided into halves; instead, a carefully chosen subgroup of size k is tested, with the remaining s - k items handled recursively based on the test outcome. This adjustment optimizes the procedure for scenarios where the number of defectives d > 1, reducing unnecessary tests compared to uniform halving. Hwang's generalization, introduced in 1972, determines optimal split ratios using probability distributions to minimize the expected or worst-case number of tests. For a group of size s with an upper bound d on the number of defectives (or expected d under a prior distribution), the size k is selected as the largest power of 2 such that k \leq (s - d + 1)/d, ensuring the tested subgroup is likely to contain approximately one defective while balancing search depths. In probabilistic settings with defective probability p per item, the procedure uses prior outcomes to guide split sizes. The full procedure proceeds as follows: test the subgroup of size k. If positive, apply binary splitting to isolate one defective (requiring approximately \log_2 k additional tests), remove the identified defective and confirmed non-defectives, and recurse on the remaining items with the bound updated to d - 1. If negative, discard the subgroup and recurse on the remaining s - k items with d unchanged. This continues until all defectives are identified or the group is confirmed defect-free. When d is unknown, an upper bound m (e.g., m_\beta such that the probability of at most m defectives is at least \beta, like 0.99) is used to select splits conservatively. In terms of performance, generalized binary splitting achieves the information-theoretic lower bound asymptotically, requiring approximately t \approx d \log_2 (n/d) + O(d) tests for n items and d defectives, where the O(d) term captures overhead from isolations. This is within a constant factor of the bound d \log_2 (n/d) + d \log_2 e, and the worst-case number of tests exceeds the combinatorial lower bound by at most d - 1. Compared to basic splitting, which effectively requires closer to d \log_2 n tests for multiple defectives, generalized binary splitting reduces the total by 20-30% when d > 1, particularly in moderate d regimes, by better distributing the search effort.
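
The procedure translates into a short program. The sketch below follows the description above, with one simplification: items cleared during the inner halving are kept rather than discarded, whereas Hwang's original also removes them.

```python
from math import floor, log2

def generalized_binary_splitting(items, d, test):
    """Sketch of Hwang's generalized binary splitting (1972).
    `test(pool)` returns True iff the pool contains at least one
    defective; `d` is an upper bound on the defectives remaining."""
    items, defectives = list(items), []
    while d > 0 and items:
        if len(items) <= 2 * d - 2:        # few items left: test one by one
            defectives += [i for i in items if test([i])]
            return defectives
        k = 2 ** floor(log2((len(items) - d + 1) / d))
        group = items[:k]
        if not test(group):                # whole subgroup cleared at once
            items = items[k:]
        else:                              # halve to isolate one defective
            while len(group) > 1:
                half = group[:len(group) // 2]
                group = half if test(half) else group[len(group) // 2:]
            defectives.append(group[0])
            items.remove(group[0])
            d -= 1
    return defectives

secret = {3, 17}
found = generalized_binary_splitting(range(32), 2,
                                     lambda pool: any(i in secret for i in pool))
print(sorted(found))  # [3, 17]
```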

Other Adaptive Strategies

In generalized group testing, adaptive strategies extend beyond binary outcomes by forming test groups of size greater than two, selected to maximize information gain about the defective items' identities. These methods model the test outcome as a probabilistic function of the number of defectives in the pool, allowing for more flexible pooling that balances test efficiency and diagnostic accuracy. For instance, the Adaptive Bayesian Group Testing (ABGT) approach designs each subsequent test by choosing a measurement vector that maximizes the expected update in posterior probabilities, effectively prioritizing groups that yield the highest uncertainty reduction. Sequential algorithms in adaptive group testing iteratively refine item statuses using partial test results to update and re-pool suspects, contrasting with one-shot non-adaptive approaches. A prominent example is Bayesian sequential experimental design for noisy group testing, which treats the identification problem as an active learning task and selects pools to minimize the expected posterior entropy after each observation. This process continues until the residual uncertainty falls below a threshold, enabling efficient recovery even when tests have error rates up to 0.5. Such sequential updates ensure that subsequent tests focus on unresolved items, reducing the total number of tests required compared to static designs. Probabilistic adaptive testing leverages Bayesian inference to dynamically choose the next test that maximizes entropy reduction, incorporating prior beliefs about defectives and updating posteriors based on outcomes. In this framework, the utility of a potential group is quantified by the mutual information between the test result and the latent defective set, guiding the selection of pools that most sharply narrow the posterior distribution over possible configurations. These methods are particularly effective in sparse regimes, where the number of defectives is much smaller than the population size, as they adaptively target high-uncertainty regions. For example, the posterior after a test on group G_t is updated via Bayes' rule, with the next group G_{t+1} chosen to maximize I(X; Y_{t+1} \mid Y_{1:t}), where X is the defective set and Y the observations. A practical illustration of multi-stage adaptive strategies is the extension of Dorfman's original two-stage procedure to additional rounds with re-pooling, where positive pools from prior stages are subdivided and retested based on intermediate results. In multi-stage Dorfman testing, initial large pools are screened, followed by adaptive subdivision of positives into smaller groups, repeating until individual statuses are resolved. Simulations show that three-stage schemes can test up to seven times more individuals than individual testing with the same test budget, assuming low prevalence (e.g., 1-5% defectives), while maintaining high sensitivity. These alternative adaptive strategies often combine elements of matrix-based designs and probabilistic selection, achieving performance near the information-theoretic lower bounds from combinatorial analysis—typically requiring O(d \log n) tests for d defectives in a population of size n—but at the cost of increased computational overhead for posterior updates and group optimization.
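
A toy version of the information-gain rule, for a population small enough to enumerate the posterior exactly (noiseless tests, uniform prior over sets of size exactly d, randomly sampled candidate pools; all names and parameters are illustrative):

```python
import itertools, math, random

random.seed(0)
n, d = 10, 2
truth = {2, 7}
test = lambda pool: bool(set(pool) & truth)          # noiseless OR oracle

# Uniform prior over all candidate defective sets of size exactly d.
candidates = [set(c) for c in itertools.combinations(range(n), d)]

tests_used = 0
while len(candidates) > 1:
    # Noiseless case: the mutual information of a pool equals the entropy
    # of its predicted outcome, so favor pools with P(positive) near 1/2.
    pools = [random.sample(range(n), k) for k in range(1, n) for _ in range(5)]
    def gain(pool):
        p = sum(1 for c in candidates if c & set(pool)) / len(candidates)
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    pool = max(pools, key=gain)
    outcome = test(pool)
    tests_used += 1
    candidates = [c for c in candidates if bool(c & set(pool)) == outcome]

print(sorted(candidates[0]), tests_used)  # recovers {2, 7} in about log2(C(10,2)) ≈ 6 tests
```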

Non-Adaptive Algorithms

Matrix-Based Representations

In non-adaptive group testing, the testing procedure is represented by an m \times n binary testing matrix A, where m denotes the number of tests and n the number of items to be screened. Each entry a_{ij} = 1 if item j is included in test i, and 0 otherwise; the rows of A thus specify the composition of each pooled test. This matrix formulation encapsulates the entire testing strategy upfront, enabling all tests to be prepared and executed simultaneously without sequential dependencies. Given a set of at most d defectives among the n items, the outcome of the tests is captured by a binary outcome vector y \in \{0,1\}^m, where y = A x and x \in \{0,1\}^n is the indicator vector of defectives (with \|x\|_0 \leq d). In the standard disjunctive model of group testing, the i-th entry y_i is the Boolean OR of the entries \{x_j : a_{ij} = 1\}, yielding y_i = 1 if at least one defective is present in test i, and 0 otherwise. An alternative quantitative model uses modular arithmetic, such as y_i = \sum_j a_{ij} x_j \mod 2, which aligns with certain coding-theoretic interpretations but is less common in classical combinatorial settings. For unique identification of the defective set from y, the matrix A must ensure that distinct possible indicator vectors x and x' (each with at most d ones) produce distinct outcomes y and y'. This property is formalized by requiring A to be d-separable: the unions of any two distinct d-subsets of columns must differ in at least one row. A stronger condition is that A be d-disjunct, meaning no column is contained in the union of any d other columns; this not only guarantees separability but also enables efficient decoding by ensuring that non-defective items do not mimic defectives in positive tests. The d-disjunct property was introduced in the context of superimposed codes for non-adaptive identification. The non-adaptive matrix representation offers significant parallelization advantages, as all m tests can be conducted concurrently, which is particularly beneficial under physical or temporal constraints such as limited laboratory throughput or communication limitations in multiaccess channels. Combinatorial bounds ensure that suitable d-disjunct or d-separable matrices exist with m = O(d^2 \log n) rows, balancing identification reliability with testing efficiency. A simple example illustrates the matrix for d=1 (identifying a single defective) with n=7 items using m=3 tests: the columns of A are the seven nonzero vectors in \{0,1\}^3, ensuring each possible defective produces a unique nonzero outcome vector y. This construction, akin to the parity-check matrix of a Hamming code with distinct columns, guarantees identification since no two columns coincide.
Test \ Item    1   2   3   4   5   6   7
Test 1         0   0   0   1   1   1   1
Test 2         0   1   1   0   0   1   1
Test 3         1   0   1   0   1   0   1
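
In code, decoding a single defective from this matrix amounts to reading the outcome vector as a binary number (a sketch of the table above):

```python
import numpy as np

# Column j (1-indexed) is the 3-bit binary representation of j,
# with test 1 as the most significant bit.
A = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

defective = 5
y = A[:, defective - 1]                      # outcomes of the three pooled tests
decoded = int("".join(map(str, y)), 2)       # read y as a binary number
print(y, decoded)                            # [1 0 1] 5: y *is* the defective's index
```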

Decoding Algorithms

In non-adaptive group testing, decoding algorithms aim to reconstruct the set of defective items from the binary outcomes of pooled tests, represented as y = M x where M is the T \times n testing matrix, x is the n-dimensional 0-1 vector indicating defectives (with \|x\|_0 \leq d), and y is the T-dimensional outcome vector under the combinatorial (OR) model. These algorithms exploit properties of the testing matrix, such as disjunctness, to ensure reliable recovery in polynomial time. Two core methods, Combinatorial Orthogonal Matching Pursuit (COMP) and Definite Defectives (DD), provide guarantees for exact recovery when the matrix satisfies appropriate combinatorial conditions, with COMP offering broader applicability at the cost of potential false positives. COMP, named as a combinatorial analog of orthogonal matching pursuit from compressed sensing, uses a simple conservative rule: every item that appears in at least one negative test is marked non-defective, and all remaining items are declared defective. Because a defective item can never appear in a negative test, this rule produces no false negatives, though it may include false positives when a non-defective item appears only in tests that are masked by defectives. COMP guarantees exact recovery if the testing matrix is d-disjunct, meaning no column is contained in the union of any d other columns, a property that holds for matrices with O(d² log n) rows. Definite Defectives (DD) focuses on items definitively responsible for positive tests, outputting only items that are provably defective given the outcomes. The procedure first identifies the possible defectives (PD) as the items appearing in no negative test (exactly COMP's output), then declares an item i in PD a definite defective if there exists a positive test t containing i and no other PD item; all such definite defectives form the output set, with the remaining PD items deemed non-defective. This approach avoids false positives by requiring unambiguous evidence but may miss defectives if no isolating tests exist. DD guarantees exact recovery for strongly disjunct matrices, whose separation conditions on column supports are stricter than d-disjunctness and demand more rows for construction, often approaching O(d² log n / log d). Both COMP and DD are deterministic, polynomial-time algorithms whose running time is dominated by scans of the testing matrix, making them efficient for large n. COMP is more robust to noise, succeeding with positive rates under symmetric or asymmetric error models using O(d log(n/d)) tests when paired with robust designs, whereas DD requires stronger matrix properties and performs less reliably in noisy settings without adaptations. These methods underpin many practical non-adaptive schemes, trading off recovery guarantees against computational simplicity.
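
Both decoders are a few lines each; the sketch below (random Bernoulli design, illustrative parameters) implements exactly the rules just described:

```python
import numpy as np

def comp(A, y):
    """COMP: items appearing in any negative test are non-defective;
    everything else is declared defective (no false negatives)."""
    cleared = np.flatnonzero(A[y == 0].any(axis=0))
    return set(range(A.shape[1])) - set(cleared)

def dd(A, y):
    """DD: among COMP's possible defectives (PD), declare only items
    that appear alone (within PD) in some positive test."""
    pd = comp(A, y)
    definite = set()
    for t in np.flatnonzero(y):
        hits = pd & set(np.flatnonzero(A[t]))
        if len(hits) == 1:
            definite |= hits
    return definite

rng = np.random.default_rng(7)
n, d, T = 40, 3, 36
A = (rng.random((T, n)) < 1.0 / d).astype(int)
x = np.zeros(n, dtype=int)
x[rng.choice(n, d, replace=False)] = 1
y = (A @ x > 0).astype(int)
print(sorted(np.flatnonzero(x)), sorted(comp(A, y)), sorted(dd(A, y)))
```

On every instance, DD's output is contained in the true defective set, which is in turn contained in COMP's output; with enough tests all three coincide.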

Advanced Constructions and Variants

One advanced construction for non-adaptive group testing generates explicit d-disjunct testing matrices from evaluations of polynomials over finite fields. In this approach, n items are arranged in a d-dimensional grid over a finite field \mathbb{F}_q, where q is a prime power and n \approx q^d; pools are then formed by selecting items whose coordinates satisfy equations derived from polynomials of degree at most d-1, such as y = a x^{d-1} + \cdots + b for coordinates (x, y), ensuring the matrix's rows correspond to these polynomial-based subsets. This method guarantees exact recovery of up to d defectives through combinatorial decoding, with the number of tests scaling as t = O(d^2 \log n). Another variant is Sequential COMP (SCOMP), a decoding procedure for fixed non-adaptive testing matrices that extends the COMP algorithm by iteratively building a satisfying set of defectives in a greedy manner, starting from the definite defectives and adding items consistent with all test outcomes. Unlike standard COMP, which declares defective every item not appearing in any negative pool, SCOMP sequentially adds candidates that explain otherwise unexplained positive tests, keeping the estimated set small while preserving consistency, thereby reducing false positives and achieving higher decoding success probabilities in sparse regimes. For random matrices with m tests, SCOMP succeeds with high probability when the number of defectives k satisfies k = O\left(\frac{m}{\log m}\right), approaching information-theoretic limits more closely than COMP. Additional variants incorporate machine learning to learn optimized testing matrices tailored to data-driven priors, enabling approximate recovery where a small fraction of misclassifications is tolerated. For instance, models can predict item positivity probabilities from features like symptoms or demographics, informing the design of pooling matrices that group low-risk items while isolating high-risk ones, reducing the expected number of tests compared to uniform pooling at low prevalence. Random coding schemes, generating matrices with i.i.d. entries, further support approximate recovery by leveraging concentration bounds, succeeding when m = \Theta(k \log(n/k)) for k defectives, often decoded via efficient local search or related greedy algorithms.
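
A compact SCOMP sketch, following the greedy set-cover interpretation described above (self-contained, with the DD step inlined; parameters illustrative):

```python
import numpy as np

def scomp(A, y):
    """SCOMP sketch: start from DD's definite defectives, then greedily
    add possible defectives until every positive test is explained."""
    pd = set(np.flatnonzero(~A[y == 0].any(axis=0)))   # COMP's possible defectives
    positives = list(np.flatnonzero(y))
    est = set()
    for t in positives:                                 # DD step: unique explanations
        hits = pd & set(np.flatnonzero(A[t]))
        if len(hits) == 1:
            est |= hits
    unexplained = [t for t in positives if not any(A[t, j] for j in est)]
    while unexplained:                                  # greedy set-cover additions
        best = max(pd - est, key=lambda j: sum(A[t, j] for t in unexplained))
        est.add(best)
        unexplained = [t for t in unexplained if not A[t, best]]
    return est

rng = np.random.default_rng(3)
n, d, T = 40, 3, 30
A = (rng.random((T, n)) < 1.0 / d).astype(int)
x = np.zeros(n, dtype=int)
x[rng.choice(n, d, replace=False)] = 1
y = (A @ x > 0).astype(int)
print(sorted(np.flatnonzero(x)), sorted(scomp(A, y)))
```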

Applications

Communication and Multiaccess Channels

Group testing has found significant applications in communications, particularly in identifying active users within multiaccess channels where multiple transmitters share a common medium. In these scenarios, active users correspond to "defectives," and pooled signaling—where signals from multiple users are combined and tested collectively—enables efficient detection of collisions or active transmissions without requiring individual channel assignments. This approach is especially valuable in sparse regimes, where only a small fraction of potential users are active, reducing the overhead of user identification compared to exhaustive probing. The connection between group testing and communication channels traces back to the 1970s, when researchers began linking combinatorial group testing to channel coding bounds and superimposed codes. Early work established that non-adaptive group testing designs could model the decoding of superimposed signals in multiaccess environments, providing bounds on the minimal number of tests needed to identify up to k active users among n potential ones. For instance, Malyutov's contributions in the late 1970s demonstrated how information-theoretic limits from channel coding could inform group testing efficiency in noisy multiaccess settings. In ALOHA-like protocols, such as slotted ALOHA, group testing addresses collision resolution by treating colliding transmissions as pooled tests that reveal the presence of multiple active users. Non-adaptive testing schemes allow users to transmit signature sequences in a superimposed manner, enabling the receiver to decode the set of active participants from the aggregate signal. A representative example is the use of superimposed codes to detect and identify colliding users in wireless networks, where each user broadcasts a unique binary codeword, and the receiver observes the Boolean OR of the active codewords to reconstruct the user set. This method outperforms traditional retransmission-based collision avoidance in low-activity scenarios by minimizing the number of slots wasted on unresolved collisions. Superimposed codes, introduced by Kautz and Singleton in 1964, form the foundational structure for these applications, particularly in code-division multiple access (CDMA) systems. These codes ensure that the Boolean OR (or superposition) of any small number of codewords can be uniquely decoded, mirroring disjunctive testing in group testing. In CDMA, they facilitate user identification by allowing multiple users to transmit over the same frequency band using orthogonal or near-orthogonal signatures, with the receiver employing decoding algorithms to separate active signals. Berger et al. further solidified this link in 1984, showing how group testing algorithms enhance random multiple-access efficiency by optimizing the trade-off between the number of tests and error probability. The primary benefit of group testing in these contexts is the substantial reduction in identification overhead for sparse user scenarios, where the number of active users k is much smaller than the total n, achieving test complexities on the order of O(k \log n) rather than O(n). This efficiency is realized through matrix-based non-adaptive representations, where rows correspond to test outcomes and columns to user signatures. Such designs have influenced modern protocols in 5G and massive machine-type communications, ensuring robust performance under channel noise models like addition or dilution.

Biological and Diagnostic Testing

Group testing originated in biological applications with Robert Dorfman's 1943 proposal to pool blood samples from military recruits for syphilis screening, reducing the number of tests needed in low-prevalence scenarios by first testing groups and retesting individuals only if the pool is positive. This Dorfman-style adaptive pooling, with group sizes optimized for the prevalence (typically 5-20 samples), achieved up to 90% savings over individual testing when infection rates were below 1%. The method was later extended to pathogen screening in blood donations, where pooled nucleic acid testing (NAT) in minipools of up to 16 samples became standard in low-prevalence settings to detect viral nucleic acids while minimizing costs and supply use, as per FDA approvals. During the COVID-19 pandemic, non-adaptive group testing saw widespread adoption through FDA-authorized multiplex assays for SARS-CoV-2 detection, enabling pooled screening of 5 to 384 samples in formats like 96- or 384-well plates to scale capacity in asymptomatic populations. For instance, the TaqPath COVID-19 Pooling Kit allowed qualitative RT-PCR on pools of up to 5 nasopharyngeal swabs, with positive pools followed by individual retesting, conserving reagents amid global shortages and testing millions in low-prevalence environments like universities and workplaces. These designs maintained high specificity (typically >99%, meeting FDA criteria of ≥95% for pooled testing) but required careful pool sizing to avoid false negatives from dilution. Quantitative extensions of group testing incorporate continuous measurements such as viral load, using Dorfman pooling or matrix-based designs to estimate individual concentrations from pool signals via algorithms that solve linear systems from overlapping pools. In HIV monitoring, marker-assisted mini-pooling with algorithm (mMPA) reduced viral-load assays by 40-50% in resource-limited settings by leveraging routinely collected marker data to resolve positives without full retesting. Key challenges in biological pooled testing include sample contamination during mixing, which can introduce false positives, and reduced sensitivity in low-prevalence settings due to signal dilution, where pool sizes exceeding 10 may drop detection rates below 90% for low-viral-load cases. Adaptive retesting of positives addresses some issues but increases turnaround time compared to non-adaptive formats. By 2025, CRISPR-based pooled diagnostics advanced multiplexed detection of pathogens, using Cas12a/Cas13a systems with bead arrays to detect and differentiate respiratory viruses in pooled samples at attomolar sensitivity, enabling simultaneous screening without amplification in point-of-care settings.

Sparse Signal Recovery and Machine Learning

Group testing provides a foundational framework for sparse signal recovery in compressive sensing, where the goal is to identify a small number of defective items (or non-zero entries) from a large set using efficient measurements. In this context, group tests correspond to linear measurements applied to \ell_0-sparse vectors, which have at most d non-zero entries out of n possible positions, representing the sparse signal. The measurement matrix, often binary, aggregates subsets of items, and the outcomes reveal information about the support of the sparse vector. Recovery is achieved through decoders like \ell_1 minimization, which solves a linear program to reconstruct the exact support under conditions such as the restricted isometry property, adapting classical techniques to the Boolean, combinatorial setting of group testing. In anomaly detection applications, group testing facilitates localization in large datasets by treating outliers or defective data points as sparse anomalies within a high-dimensional space. Non-adaptive queries, performed in a single batch, enable efficient identification of rare events, such as unusual patterns in data streams or hierarchical structures, where tests on subsets help localize anomalies with minimal queries. For instance, in data stream monitoring, group testing algorithms like confidence-bound random walks on tree hierarchies detect targets (anomalies) with false alarm rates below 1/2, achieving near-optimal query complexity of O(\log n + \log(1/\epsilon)). Similarly, group testing inspires parallel feature selection in machine learning tasks, where defective features are identified via pooled evaluations to reduce dimensionality while preserving predictive accuracy. Integration with deep learning enhances group testing through neural networks, particularly for adaptive test selection and decoding in high-dimensional settings. Learned models can approximate optimal decoders for sparse recovery, unfolding iterative algorithms like approximate message passing into network layers to directly map measurements to the sparse support, outperforming traditional \ell_1 methods in speed and accuracy for decoding tasks. In adaptive scenarios, reinforcement learning or neural policies select subsequent tests based on prior outcomes, optimizing for sparse recovery in dynamic environments. A key example arises in genome-wide association studies (GWAS), where DNA pooling treats single nucleotide polymorphisms (SNPs) as potential "defectives" associated with traits; by pooling samples from responder and non-responder groups, group testing identifies risk variants like rs4910623 in the OR52B4 gene linked to anti-VEGF treatment response in age-related macular degeneration, reducing costs while validating associations. These approaches achieve information-theoretically optimal measurement complexity, requiring O(d \log(n/d)) measurements for exact recovery of d defectives in non-adaptive settings, matching the lower bounds derived from combinatorial and information-theoretic analyses.
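
The \ell_1-style decoder specializes neatly to the Boolean model: negative tests pin their items to zero and positive tests demand at least unit mass, giving a small linear program. A sketch using SciPy (parameters illustrative; rounding at 0.5 is a heuristic, and exact recovery is typical but not guaranteed at these sizes):

```python
import numpy as np
from scipy.optimize import linprog

def lp_decode(A, y):
    """LP relaxation of Boolean group testing: minimize sum(x) subject to
    A_neg @ x == 0, A_pos @ x >= 1, 0 <= x <= 1, then round."""
    neg, pos = A[y == 0], A[y == 1]
    res = linprog(c=np.ones(A.shape[1]),
                  A_ub=-pos, b_ub=-np.ones(len(pos)),   # pos @ x >= 1
                  A_eq=neg, b_eq=np.zeros(len(neg)),    # neg @ x == 0
                  bounds=[(0, 1)] * A.shape[1])
    return np.flatnonzero(res.x > 0.5)

rng = np.random.default_rng(5)
n, d, T = 50, 3, 35
A = (rng.random((T, n)) < 1.0 / d).astype(int)
x = np.zeros(n, dtype=int)
x[rng.choice(n, d, replace=False)] = 1
y = (A @ x > 0).astype(int)
print(np.flatnonzero(x), lp_decode(A, y))  # typically identical in this sparse regime
```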

Data Forensics and Security

In data forensics, group testing enables the efficient identification of tampered files or pixels in images by pooling cryptographic hash checks on subsets of blocks. This approach leverages combinatorial group testing (CGT) constructions, such as shifted transversal designs, to compute and store a reduced set of hash values for large datasets like hard disk sectors, significantly lowering storage requirements while pinpointing defective (altered) elements. For instance, on a 55 GB disk with up to one tampered sector, this method stores only a small fraction of the hashes required by sector-by-sector hashing and identifies modifications with guaranteed accuracy comparable to brute-force hashing, in execution times under 17,000 seconds. In security applications, group testing facilitates key predistribution in wireless sensor networks by modeling compromised nodes as defectives within a combinatorial framework. Transversal designs and resolvable balanced incomplete block designs are employed to pre-load keys into sensor nodes, ensuring network connectivity and resilience against node capture; if up to t nodes are compromised, the scheme isolates them as defectives using group tests on key overlaps, evicting threats without disrupting the overall topology. This deterministic approach achieves optimal local connectivity while minimizing key storage per node to O(√(n log n)) for n nodes. A representative example of non-adaptive group testing in this domain is its use for software bug localization in codebases, where error locating arrays derived from covering arrays test subsets of code components to identify faulty interactions without sequential feedback. In binary-alphabet systems (pass/fail outcomes), non-adaptive constructions guarantee localization of up to d=2 bugs among k components using O(d^2 log k) tests, covering all t-way interactions and outperforming prior disjunct matrix methods in efficiency for large k (e.g., k ≥ 1,000). Decoding algorithms then extract the defective set from the test matrix outcomes, enabling parallel execution suitable for large-scale codebases. Key challenges arise in adversarial noise models, where defectives (anomalies) can actively hide by inducing false test outcomes, limiting the fraction of tolerable errors to O(1/d) for reliable recovery. Under such models, where up to a constant fraction δ < 1 of measurements may be adversarially flipped (e.g., O(m/d) false negatives), exact defective identification becomes impossible, necessitating approximate recovery schemes that include the true support but allow O(d) false positives. Constructions using randomness extractors achieve this with m = O(d log n) non-adaptive tests, but the hiding behavior increases the minimal test complexity by factors depending on noise bounds. Recent work in 2023 has extended group testing to blockchain auditing, applying adaptive variants to detect transaction anomalies via malicious node identification in sharded permissioned networks. The RecAGT scheme uses shard testable codes and adaptive group testing to isolate up to f malicious nodes among m shards with T ≤ log(1 - P(H=0))/ρ + 2√((n - m - 1)f) tests, reducing communication overhead from O(n²b) to O(b) bits per transaction block and enabling efficient auditing of tampered or withheld transactions. This addresses adversarial settings in blockchain by verifying node identities through digital signatures before pooling tests, with simulations confirming low false positive rates and scalability for enterprise deployments.
