Group testing
Group testing is a statistical and combinatorial inference problem aimed at identifying a small number of defective or positive items, known as defectives, within a large population of n items by performing tests on carefully chosen subsets, or pools, of these items. Each test reveals whether at least one defective is present in the pool (a positive outcome) or none are (a negative outcome), with the primary goal of minimizing the total number of tests required to pinpoint all defectives exactly, often achieving substantial efficiency gains over testing each item individually when the number of defectives k is much smaller than n (i.e., the sparse regime where k = \Theta(n^\alpha) for \alpha < 1).[1]

The method was first proposed in 1943 by Robert Dorfman to optimize large-scale blood testing for syphilis among U.S. military draftees, introducing a simple two-stage adaptive procedure where initial group tests are followed by individual retesting of positive pools, which can reduce the expected number of tests by up to a factor related to the defect probability.[2] Early extensions formalized the problem in combinatorial terms, with contributions from Sobel and Groll (1959) on optimal group sizes and from Kautz and Singleton (1964) on disjunct matrix constructions for non-adaptive testing, linking the approach to coding theory.[1]

Group testing encompasses two primary paradigms: adaptive testing, where test outcomes inform subsequent pool selections (e.g., binary splitting algorithms that halve pools iteratively, achieving near-optimal performance with roughly k \log_2(n/k) tests in the noiseless case), and non-adaptive testing, where all pools are predefined and tested in parallel (e.g., using random Bernoulli designs or explicit constructions like superimposed codes, with information-theoretic lower bounds of approximately k \log_2(n/k) tests for exact recovery).[1] Theoretical analysis reveals that the minimal number of tests T satisfies T \geq k \log_2(n/k) from counting arguments, with adaptive methods attaining capacity 1 (full efficiency) in the sparse regime, while non-adaptive approaches achieve rates up to about 0.693 for certain designs.[1]

Beyond its origins in medical screening, group testing has found wide applications in quality control for detecting faulty components, molecular biology such as DNA library screening and sequencing, communications for multi-access channels and error-correcting codes, data science in sparse signal recovery and compressive sensing, and networking for topology inference and fault detection, including pooled testing for COVID-19.[1][3] Recent advancements address noisy tests (e.g., with error rates modeled via binary symmetric channels, reducing capacity to 1 - h(\rho), where h is the binary entropy function) and practical constraints like partial recovery or heterogeneous defect probabilities, underscoring its versatility in resource-limited scenarios.[1]

Fundamentals
Core Concepts and Terminology
Group testing is a statistical and combinatorial method for identifying a small number of defective items among a large population of n items by performing tests on subsets, or pools, of these items, rather than testing each one individually.[4] The core objective is to determine up to d defective items efficiently, where d is typically much smaller than n, such as in scenarios involving rare defects.[4] This approach leverages the binary nature of tests to minimize resource use while ensuring accurate identification.

In group testing, items are classified as either defective (also called positive) or non-defective (negative). Defective items possess a specific property of interest, such as being contaminated or faulty, and their presence in a tested pool triggers a detectable response; non-defective items lack this property and do not affect the test outcome.[4] Each test on a pool yields a binary outcome: positive if the pool contains at least one defective item, and negative if all items in the pool are non-defective.[4] This qualitative model assumes reliable tests without errors, focusing on the presence or absence of defectives.

The efficiency of group testing is measured by the number of tests required to identify all defectives, which is significantly lower than the baseline of n tests needed for individual testing.[4] For instance, when d is small relative to n, the expected number of tests can be reduced to O(d \log n), providing substantial savings over exhaustive individual checks.[4] A simple illustration of pooling occurs in medical screening, where blood samples from multiple individuals are combined into a single test for diseases like syphilis; a negative result clears the entire group, while a positive prompts targeted follow-up tests.[4] Group testing strategies may be adaptive, with test designs depending on prior outcomes, or non-adaptive, performing all tests in parallel.[4]

Problem Classifications
Group testing problems are broadly classified into adaptive and non-adaptive variants based on the timing and dependency of tests. In adaptive group testing, test outcomes from previous pools inform the design of subsequent tests, allowing for sequential refinement and potentially fewer total tests in scenarios with feedback availability.[4] This approach contrasts with non-adaptive group testing, where all tests are predefined and can be conducted in parallel, making it suitable for high-throughput applications like large-scale screening but often requiring more tests overall.[4]

Problems are further distinguished by their identification objectives, particularly exact versus approximate recovery. Exact identification requires precisely determining the set of all defective items among the population, ensuring no false positives or negatives in the final output.[4] Approximate or threshold variants, however, relax this to goals such as identifying at least one defective, estimating the number of defectives up to a threshold, or tolerating a bounded error rate, which can significantly reduce the testing complexity in sparse regimes.[4]

Classifications also arise from assumptions about the number of defectives, denoted as d or k. When d is known exactly, algorithms can be tailored to that fixed count, optimizing for scenarios like exactly k defectives.[4] In contrast, problems with unknown or only bounded d (e.g., d \leq k) demand robust methods that either estimate d or perform without prior knowledge, often at a slight efficiency cost.[4]

Additional variations account for complications in test outcomes. The inhibitor model introduces a third class of items (inhibitors) that can mask the presence of defectives in a pool, yielding a negative result even if defectives are present, as first formalized in molecular biology contexts. Noisy multi-defective settings extend the standard model by incorporating errors, such as false positives or negatives, where multiple defectives may interact adversarially or probabilistically in the outcome.[4] Extensions to quantitative group testing differ from the classical qualitative framework by producing non-binary outcomes, such as the exact count or concentration of defectives in a pool, enabling more informative measurements for recovery.[4] Qualitative group testing, the foundational binary positive/negative model, focuses solely on presence or absence, prioritizing simplicity in detection.[4]

Historical Development
Origins and Early Applications
Group testing was first proposed by economist Robert Dorfman in 1943 as a method to efficiently screen large populations for syphilis among U.S. Army inductees during World War II, where the prevalence of the disease was low, making individual testing inefficient and costly.[2] The approach addressed the need to test millions of recruits with limited laboratory resources, leveraging the rarity of positives to minimize the total number of serological tests required.[5]

Early applications focused on military and public health contexts, particularly pooling blood samples to detect syphilitic antigen via the Wassermann test. This pooling strategy was implemented successfully by the U.S. military to screen draftees, reducing the burden on testing facilities while maintaining detection accuracy for defective (infected) individuals in large groups.[2][6]

Dorfman's two-stage model divided the population into groups of equal size, testing each pooled sample first; negative pools were cleared, while positive pools underwent individual retesting to identify defectives. The optimal group size was derived to minimize the expected number of tests and, for low prevalence p, is approximately the reciprocal of the square root of the defect probability, about 1/\sqrt{p}. This method achieved substantial savings, with roughly an 80% reduction in tests when defect rates were around 1%.[2]

Initial progress in the 1940s and 1950s included refinements to Dorfman's procedure, such as A. Sterrett's 1957 modification, which improved efficiency by stopping individual testing in a positive group upon finding the first defective, thus addressing cases with multiple defectives more effectively. These developments emphasized practical optimizations for group sizes under varying prevalence rates in screening scenarios.[7]
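The trade-off in Dorfman's scheme can be made concrete with a short calculation. The sketch below is an illustration rather than a reproduction of Dorfman's original tables: it evaluates the expected number of tests per item, 1/s + 1 - (1 - p)^s, for pool size s and prevalence p, and searches for the pool size minimizing it (the function names are illustrative).

```python
def dorfman_expected_tests_per_item(p: float, s: int) -> float:
    """Expected tests per item in Dorfman two-stage testing: one pooled test
    per group of size s, plus s individual retests whenever the pool is positive."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s

def best_group_size(p: float, max_s: int = 100) -> int:
    """Pool size minimizing the expected tests per item (brute-force search)."""
    return min(range(2, max_s + 1), key=lambda s: dorfman_expected_tests_per_item(p, s))

p = 0.01                                   # 1% prevalence
s = best_group_size(p)                     # close to 1/sqrt(p)
print(s, round(dorfman_expected_tests_per_item(p, s), 3))   # 11 0.196
```

For 1% prevalence the optimal pool size is 11 and the expected cost is about 0.2 tests per person, i.e., roughly an 80% saving over individual testing.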
Emergence of Combinatorial Methods

The development of combinatorial methods in group testing during the 1950s and 1960s marked a significant theoretical shift, drawing inspiration from coding theory to address non-adaptive testing scenarios. This era saw the formalization of non-adaptive schemes, where all tests are predetermined, allowing for parallel execution and applications beyond medical screening.

Central to these advancements was the introduction of superimposed codes and disjunct matrices as core combinatorial tools for non-adaptive group testing. Superimposed codes are binary codes where the logical OR of any collection of codewords can be uniquely decoded to identify the constituent codewords, facilitating the detection of multiple defectives through pooled tests.[8] Disjunct matrices, binary matrices in which the union of the supports of any set of at most d columns does not contain the support of any other column, ensure that up to d defectives can be uniquely identified from test outcomes.[9] These structures provided rigorous guarantees for exact identification in predefined test pools, transforming group testing into a branch of design theory.

A pivotal contribution came from Alfréd Rényi's 1961 work on separating systems, which established existence results for families of sets capable of distinguishing arbitrary subsets of elements, essential for isolating defectives in combinatorial designs.[10] Such systems enabled the construction of test matrices that separate potential defective sets, laying the groundwork for efficient non-adaptive protocols. Building on the early pooling methods developed for wartime screening, these combinatorial innovations allowed for scalable, exact identification strategies that prioritized theoretical optimality over probabilistic assumptions.

A notable milestone in the 1960s was the adaptation of these methods to computer science, particularly fault detection in digital systems, where disjunct matrices helped pinpoint erroneous components in networks and processors with minimal queries.[11]

Modern Probabilistic and Algorithmic Advances
The rise of probabilistic group testing in the 1980s introduced models that assume defective items follow a random distribution, such as i.i.d. Bernoulli priors, shifting focus from worst-case combinatorial guarantees to expected performance metrics like the average number of tests required. This approach incorporated random defective sets to analyze efficiency under uncertainty, enabling bounds on expected test counts in sparse regimes where the number of defectives k is small relative to the population size n. Seminal reviews of the Russian literature from this era highlighted nonadaptive strategies with small error probabilities, optimizing designs like Bernoulli matrices with inclusion probability p = \nu / k to achieve near-optimal expected test counts of T = O(k \log n). These developments extended earlier deterministic methods by emphasizing probabilistic decoding, improving scalability for large populations.

A key contribution bridging adaptive and combinatorial paradigms was F.K. Hwang's generalized binary splitting algorithm, introduced in 1972, which achieves an information-theoretic rate of 1 for sparsity levels \alpha \in [0, 1), requiring approximately T = k \log_2(n/k) + O(k) tests.[12] This algorithm adaptively refines pools based on test outcomes, providing a foundation for hybrid strategies that balance computational efficiency and error rates, influencing subsequent extensions for random defectives.[13]

In the 1990s and 2000s, advances in algorithmic decoders emphasized computational complexity, with the Combinatorial Orthogonal Matching Pursuit (COMP) algorithm emerging as a landmark nonadaptive method. Introduced in 2011 but building on complexity studies from the prior decades, COMP uses a simple elimination rule: every item that appears in a negative test is declared non-defective, and all remaining items are declared defective. It achieves a nonzero rate of R_\text{COMP} = 0.531(1 - \alpha) in the sparse regime k = \Theta(n^\alpha), with T = O(k \log n) tests. This work, driven by analyses of decoding tractability, marked a shift toward practical, polynomial-time algorithms for large-scale problems.

The 2000s saw group testing linked to compressed sensing, framing defective identification as sparse binary recovery from linear measurements, where test matrices enable reconstruction via \ell_1-minimization or greedy methods.[14] This connection, formalized in works like Boolean compressed sensing, allowed group testing to leverage sensing guarantees for noisy, nonadaptive settings, achieving recovery with O(k \log(n/k)) measurements under restricted isometry properties.[14]

Recent trends up to 2025 integrate machine learning for dynamic testing and big-data scalability, using techniques like gradient boosting to model heterogeneous risks and optimize pool formation in real time.[15] For instance, ML-driven strategies employ correlated risk predictions to enhance frequent testing efficiency, reducing expected tests by up to 50% in simulated large populations while adapting to evolving defectives.[16] These approaches, often combining reinforcement learning with traditional decoders, enable handling of non-i.i.d. priors and massive datasets, as seen in probabilistic analyses achieving near-optimal rates via learned priors.
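A minimal sketch of the COMP elimination rule under the noiseless Boolean model is given below; the Bernoulli design, sizes, and seed are illustrative choices rather than parameters from the works cited above.

```python
import numpy as np

def comp_decode(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """COMP rule: any item appearing in a negative test is non-defective;
    all remaining items are declared defective.
    A is the (m, n) 0/1 test design, y the (m,) 0/1 outcome vector."""
    appears_in_negative = A[y == 0].sum(axis=0) > 0
    return ~appears_in_negative              # boolean mask of declared defectives

rng = np.random.default_rng(0)
n, k, m = 30, 2, 18
A = (rng.random((m, n)) < 1.0 / k).astype(int)   # Bernoulli(1/k) test design
x = np.zeros(n, dtype=bool); x[[3, 17]] = True   # true defectives
y = (A @ x > 0).astype(int)                      # Boolean OR outcomes
# COMP never misses a defective in the noiseless model, but may add false positives.
print(np.flatnonzero(comp_decode(A, y)))         # a superset of [3, 17]
```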
Mathematical Foundations

Formal Problem Models
In group testing, the standard combinatorial model considers a set of n items, labeled by the index set [n] = \{1, 2, \dots, n\}, among which there are at most d defective items, where d \ll n. The defective items form a subset D \subseteq [n] with |D| \leq d, and the goal is to identify D exactly using the minimum number of tests. Each test is defined by a subset S \subseteq [n], and the outcome of the test is binary: positive (1) if S intersects the defective set (i.e., S \cap D \neq \emptyset), and negative (0) otherwise. This setup corresponds to disjunctive testing, where the test outcome reflects whether the pool covers any defective.[17] The indicator vector x \in \{0,1\}^n represents the defectives, with x_i = 1 if item i \in D and x_i = 0 otherwise.

Group testing procedures are classified as adaptive or non-adaptive based on how tests are selected. In the adaptive model, tests are performed sequentially, with each subsequent test subset chosen based on the outcomes of previous tests. This allows for dynamic refinement of the search space. In contrast, the non-adaptive model requires all tests to be specified in advance, represented by a fixed testing matrix A \in \{0,1\}^{m \times n}, where m is the number of tests and A_{j,i} = 1 if item i is included in test j. The outcome vector y \in \{0,1\}^m is then given by the disjunctive (Boolean OR) product y = A \circ x, where y_j = 1 if at least one defective is in test j, and 0 otherwise; exact recovery involves solving for D (or equivalently x) from y.[17]

These models assume noise-free qualitative tests, meaning outcomes are perfectly reliable and binary, with no false positives or negatives. Extensions to noisy settings incorporate error probabilities (e.g., false negatives or positives with fixed rates), while quantitative models allow test outcomes to reflect the exact number of defectives in a pool rather than just their presence.[17]
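As a toy illustration of this model and of what exact recovery from y means, the following sketch enumerates every defective set of size at most d that is consistent with the observed outcomes; the pool family is a small arbitrary example chosen for illustration, not a construction from the cited literature.

```python
from itertools import combinations

def outcomes(pools, defectives):
    """Noiseless disjunctive model: a pool tests positive iff it meets the defective set."""
    D = set(defectives)
    return [int(bool(set(pool) & D)) for pool in pools]

def consistent_sets(pools, y, n, d):
    """Exact recovery by brute force: every candidate D with |D| <= d explaining y."""
    return [set(D) for r in range(d + 1)
            for D in combinations(range(n), r)
            if outcomes(pools, D) == y]

# n = 5 items (0-indexed), at most d = 1 defective, m = 3 predefined pools.
pools = [{0, 1, 2}, {2, 3}, {1, 3, 4}]
y = outcomes(pools, {3})                      # observed outcomes: [0, 1, 1]
print(consistent_sets(pools, y, n=5, d=1))    # [{3}] -- recovery is unique here
```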
Information-Theoretic Bounds

In group testing, information-theoretic bounds provide fundamental lower limits on the number of tests t required to identify up to d defectives among n items in the noiseless setting. These bounds arise from the principle that each binary test outcome yields at most 1 bit of information, and the testing procedure must distinguish among all possible defective configurations, including the case of zero defectives. Specifically, there are \sum_{k=0}^d \binom{n}{k} possible defective configurations to distinguish, leading to the lower bound t \geq \log_2 \left( \sum_{k=0}^d \binom{n}{k} \right).[18]

For large n and fixed d, the sum is dominated by the largest term, so \sum_{k=0}^d \binom{n}{k} \approx \binom{n}{d}, and using Stirling's approximation, \log_2 \binom{n}{d} \approx d \log_2 (n/d). This yields the asymptotic bound t = \Omega(d \log (n/d)), which holds for both adaptive and non-adaptive protocols in the noiseless case. The derivation can also be phrased via Fano's inequality applied to the mutual information between the defective set and the test outcomes: t binary outcomes carry at most t bits of information, so reliable identification requires t to be at least the entropy of the defective configuration.[18][19]

These bounds are tight in the adaptive setting, where algorithms such as binary splitting achieve performance within an additive O(d) term of the lower bound for small d, attaining an information-theoretic rate of 1. In contrast, non-adaptive methods, while also subject to the same \Omega(d \log (n/d)) lower bound, require a multiplicative constant greater than 1 in practice, with achievable rates up to approximately 0.693 for certain sparse regimes due to the fixed testing matrix structure.[18][19]
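The counting bound is straightforward to evaluate directly; the short sketch below (an illustration, with illustrative names) computes \lceil \log_2 \sum_{k \leq d} \binom{n}{k} \rceil for given n and d.

```python
from math import comb, ceil, log2

def counting_lower_bound(n: int, d: int) -> int:
    """Minimum number of binary tests needed to distinguish every defective
    set of size at most d among n items."""
    configurations = sum(comb(n, k) for k in range(d + 1))
    return ceil(log2(configurations))

print(counting_lower_bound(1000, 10))   # 78; compare d * log2(n/d) ~ 66
```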
Combinatorial Bounds and Constructions

In combinatorial group testing, a key concept for ensuring guaranteed recovery of up to d defectives among n items is the d-disjunct matrix, defined as a t \times n binary matrix in which the union (Boolean sum) of any d columns does not contain any other column. This property implies that no non-defective column can be masked by the union of the defective columns, allowing unique identification of any defective set of size at most d. Seminal work established that such matrices provide a combinatorial guarantee for non-adaptive testing, contrasting with probabilistic methods by offering worst-case determinacy.[20][21]

Combinatorial lower bounds on the minimal number of tests t for a d-disjunct matrix yield the asymptotic bound t = \Omega\left( \frac{d^2 \log n}{\log d} \right), reflecting the need to distinguish \binom{n}{d} possible defective sets while avoiding overlaps in unions of d columns. These combinatorial bounds exceed the information-theoretic baseline of \Omega(d \log (n/d)) in the non-adaptive setting, highlighting the overhead for deterministic guarantees.[22][23]

Upper bounds on t are achieved through explicit and random constructions of d-disjunct matrices, with t = O(d^2 \log n) tests sufficing for recovery. Random constructions, such as Bernoulli sampling with appropriately chosen inclusion probabilities, yield d-disjunct matrices with high probability using t = O(d^2 \log n) tests, with standard probabilistic arguments bounding the chance that any column is covered by the union of d others. Explicit constructions draw from superimposed codes, introduced as nonrandom binary codes where no codeword is contained in the superposition (union) of d others, enabling d-disjunct matrices via disjunctive code properties. The Kautz-Singleton construction, which concatenates Reed-Solomon outer codes with unit-weight binary inner codes to obtain constant-weight codewords, achieves t = O(d^2 \log_d^2 n) and is foundational for scalable non-adaptive schemes.[24][22]

Further constructions employ separating hash families, which generalize hash functions to ensure that for any d+1 distinct items, there exists a hash (test) separating one from the others, yielding d-disjunct matrices with t = O(d^2 \log n). These families provide efficient explicit builds via finite geometries or algebraic methods, improving over random sampling in computational efficiency. A related property is that of (d, t)-separating systems, binary matrices in which, for any set of d columns and any other column, some row distinguishes the single column from the union of the d; this ensures identification even under bounded errors and underpins more advanced disjunct constructions used in robust group testing variants.[25][26]
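As a concrete, deliberately tiny illustration, the sketch below brute-force checks the d-disjunct property and applies it to a random Bernoulli design; the sizes and inclusion probability are illustrative assumptions, and at this scale the design uses more tests than items, since the O(d^2 \log n) savings only pay off for much larger n (where brute-force verification is no longer feasible).

```python
import numpy as np
from itertools import combinations

def is_d_disjunct(A: np.ndarray, d: int) -> bool:
    """Brute-force check that no column of A is contained in the Boolean
    union of any d other columns (only feasible for small matrices)."""
    t, n = A.shape
    for j in range(n):
        for S in combinations([i for i in range(n) if i != j], d):
            union = A[:, list(S)].max(axis=1)
            if np.all(A[:, j] <= union):   # column j is masked by the union
                return False
    return True

# Random Bernoulli design with inclusion probability ~ 1/(d+1); with
# t = O(d^2 log n) rows such a matrix is d-disjunct with high probability.
rng = np.random.default_rng(1)
n, d, t = 30, 2, 100
A = (rng.random((t, n)) < 1.0 / (d + 1)).astype(int)
print(is_d_disjunct(A, d))                 # True with high probability at these sizes
```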
Adaptive Algorithms

Binary Splitting
Binary splitting is an adaptive group testing algorithm that identifies defective items among a population of n items by iteratively dividing the search space in half and testing subsets to narrow down the location of defectives. Introduced by Sobel and Groll in their seminal 1959 paper, the method assumes binary test outcomes (positive if at least one defective is present, negative otherwise) and proceeds sequentially based on results.

For the case of exactly one known defective (d=1), the procedure begins with the full set of n items. A subset of size \lfloor n/2 \rfloor is formed and tested. If the test is positive, the algorithm recurses on this subset; if negative, it recurses on the remaining items. This halving continues until a singleton set is reached, confirming the defective item. No initial test of the full set is needed if d=1 is known, since a defective is guaranteed to be present. The procedure requires at most \lceil \log_2 n \rceil tests, matching the information-theoretic lower bound of \log_2 n bits needed to identify one defective among n items.

To extend binary splitting to multiple defectives (d > 1), the algorithm adjusts the splitting ratio based on an estimate of the number of defectives in the current group, often using a 1:2 or similar ratio instead of strict halving to account for higher defect concentrations. The process identifies one defective at a time: after locating and removing a defective via halving, the algorithm restarts on the remaining items, updating the defective count estimate as needed. This sequential approach requires on average approximately d \log_2 (n/d) + d tests, which is near-optimal and within a small constant factor of the information-theoretic bound for moderate d. The worst-case number of tests can still approach n, occurring when defectives are numerous and many small groups or singletons must be verified.

A simple example illustrates the procedure for d=1 and n=8 items labeled 1 through 8. First, test items 1-4: if positive, recurse on 1-4 (test 1-2 next); if negative, recurse on 5-8. Continuing the halvings (e.g., a positive on 1-2 leads to testing item 1), the defective is identified in exactly 3 tests, since \log_2 8 = 3. This deterministic path highlights the efficiency for power-of-two sizes, though rounding applies for arbitrary n.
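A minimal sketch of the single-defective halving procedure is shown below; the function name and the test oracle are illustrative.

```python
def binary_splitting_single(items, is_positive):
    """Locate a single defective by adaptive halving. `is_positive(pool)`
    performs one group test; about ceil(log2(len(items))) tests are used."""
    candidates, tests = list(items), 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        tests += 1
        candidates = half if is_positive(half) else candidates[len(half):]
    return candidates[0], tests

# 8 items labelled 1..8 with item 6 defective: found in exactly 3 tests.
found, used = binary_splitting_single(range(1, 9), lambda pool: 6 in pool)
print(found, used)   # 6 3
```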
Generalized Binary Splitting

Generalized binary splitting extends the basic binary splitting algorithm by adjusting subgroup sizes to account for the expected number of defectives in the current group, enabling more efficient identification of multiple defectives.[27] In this approach, a group of size s is not simply divided into halves; instead, a carefully chosen subgroup of size k is tested, with the remaining s - k items handled recursively based on the test outcome. This adjustment optimizes the procedure for scenarios where the number of defectives d > 1, reducing unnecessary tests compared to uniform halving.[27]

Hwang's generalization, introduced in 1972, determines optimal split ratios using binomial probability distributions to minimize the expected or worst-case number of tests.[27] For a group of size s with an upper bound d on the number of defectives (or expected d under a prior), the subgroup size k is selected as the largest power of 2 such that k \leq (s - d + 1)/d, ensuring the tested subgroup is likely to contain approximately one defective while balancing recursion depths.[27] In probabilistic settings with prior defective probability p per item, the procedure uses binomial outcomes to guide split sizes.[28]

The full procedure proceeds as follows: test the subgroup of size k. If positive, apply binary splitting to isolate one defective (requiring approximately \log_2 k additional tests), remove the identified defective and the items confirmed non-defective, and recurse on the remaining items with the bound updated to d - 1. If negative, discard the subgroup and recurse on the remaining s - k items with d unchanged. This continues until all defectives are identified or the group is confirmed defect-free. When d is unknown, an upper bound m (e.g., m_\beta such that the probability of at most m defectives is at least \beta, say 0.99) is used to select splits conservatively.[27]

In terms of performance, the algorithm approaches the information-theoretic lower bound, requiring approximately t \approx d \log_2 (n/d) + O(d) tests for n items and d defectives, where the O(d) term captures the overhead of the isolation steps.[28] This is within a constant factor of the entropy bound d \log_2 (n/d) + d \log_2 e, and the worst-case number of tests exceeds the combinatorial lower bound by at most d - 1.[27] Compared to basic binary splitting, which effectively requires closer to d \log_2 n tests for multiple defectives, generalized binary splitting reduces the total by roughly 20-30% when d > 1, particularly for moderate d, by better distributing the search effort.[28]
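The sketch below is a simplified rendering of this procedure rather than Hwang's exact 1972 specification: it applies the split-size rule k = 2^{\lfloor \log_2((s - d + 1)/d) \rfloor}, falls back to individual testing when few candidates remain, and returns the unconfirmed part of a positive subgroup to the pool; the function name and test oracle are illustrative.

```python
from math import floor, log2

def generalized_binary_splitting(items, d, is_positive):
    """Simplified sketch of Hwang-style generalized binary splitting.
    `d` is an upper bound on the defectives remaining among `items`;
    `is_positive(pool)` performs one group test."""
    items, defectives = list(items), []
    while items and d > 0:
        if len(items) <= 2 * d - 2:                  # few candidates: test individually
            defectives += [i for i in items if is_positive([i])]
            return defectives
        alpha = floor(log2((len(items) - d + 1) / d))
        group, rest = items[: 2 ** alpha], items[2 ** alpha:]
        if not is_positive(group):
            items = rest                             # whole subgroup cleared, d unchanged
            continue
        unresolved = []                              # binary-search the positive subgroup
        while len(group) > 1:
            half, other = group[: len(group) // 2], group[len(group) // 2:]
            if is_positive(half):
                unresolved = other + unresolved      # not yet proven clear
                group = half
            else:
                group = other                        # `half` proven clear, drop it
        defectives.append(group[0])
        items = unresolved + rest                    # unconfirmed items return to the pool
        d -= 1
    return defectives

truth = {2, 9, 40}
print(sorted(generalized_binary_splitting(
    range(50), 3, lambda pool: bool(set(pool) & truth))))   # [2, 9, 40]
```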
Other Adaptive Strategies

In generalized group testing, adaptive strategies go beyond simple halving by forming test groups of arbitrary size, selected to maximize the information gained about the defective items' identities. These methods model the test outcome as a probabilistic function of the number of defectives in the pool, allowing for more flexible pooling that balances test efficiency and diagnostic accuracy. For instance, the Adaptive Bayesian Group Testing (ABGT) algorithm designs each subsequent test by choosing a measurement vector that maximizes the expected update in posterior probabilities, effectively prioritizing groups that yield the highest entropy reduction.[29]

Sequential algorithms in adaptive group testing iteratively refine item statuses using partial test results to update and re-pool suspects, contrasting with one-shot non-adaptive approaches. A prominent example is Bayesian sequential experimental design for noisy group testing, which treats the identification problem as an active learning task and selects pools to minimize the expected posterior entropy after each observation. This process continues until the uncertainty falls below a threshold, enabling efficient recovery even when tests have error rates approaching 0.5. Such sequential updates ensure that subsequent tests focus on unresolved items, reducing the total number of tests required compared to static designs.

Probabilistic adaptive testing leverages Bayesian inference to dynamically choose the next test that maximizes entropy reduction, incorporating prior beliefs about defectives and updating posteriors based on outcomes. In this framework, the utility of a potential group is quantified by the mutual information between the test result and the latent defective set, guiding the selection of pools that most sharply narrow the probability distribution over possible configurations. These methods are particularly effective in sparse regimes, where the number of defectives is much smaller than the population size, as they adaptively target high-uncertainty regions. For example, the posterior after a test on group G_t is updated via Bayes' rule, with the next group G_{t+1} chosen to maximize I(X; Y_{t+1} \mid Y_{1:t}), where X is the defective set and Y the observations.[30]

A practical illustration of multi-stage adaptive strategies is the extension of Dorfman's original two-stage procedure to additional rounds with re-pooling, where positive pools from earlier stages are subdivided and retested based on intermediate results. In multi-stage Dorfman testing, initial large pools (e.g., of size 16) are screened, followed by adaptive subdivision of positive pools into smaller groups, repeating until individual statuses are resolved. Simulations show that three-stage schemes can test up to seven times more individuals than individual testing with the same test budget, assuming low prevalence (e.g., 1-5% defectives), while maintaining high sensitivity.[31]

These alternative adaptive strategies often combine elements of matrix-based designs and probabilistic selection, achieving performance near the information-theoretic lower bounds from combinatorial analysis (typically O(d \log n) tests for d defectives in a population of size n), but at the cost of increased computational overhead for real-time posterior updates and group optimization.[32]
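A minimal sketch of such a posterior-update loop for a tiny population is given below; it keeps an explicit posterior over all candidate defective sets and, because the simulated tests are noiseless, maximizing the mutual information reduces to choosing a pool whose outcome is as close as possible to a 50/50 split under the current posterior. The names, sizes, and random candidate-pool sampling are illustrative assumptions.

```python
import itertools, random

def adaptive_bayes_group_testing(n, d, is_positive, rounds=20, seed=0):
    """Sketch of Bayesian adaptive pool selection with noiseless tests:
    maintain a posterior over all defective sets of size <= d and, at each
    step, test the candidate pool whose outcome is least predictable."""
    rng = random.Random(seed)
    hypotheses = [frozenset(c) for r in range(d + 1)
                  for c in itertools.combinations(range(n), r)]
    for _ in range(rounds):
        if len(hypotheses) == 1:
            break
        best_pool, best_score = None, 2.0
        for _ in range(50):                      # score random candidate pools
            pool = frozenset(rng.sample(range(n), rng.randint(1, n // 2)))
            p_pos = sum(1 for h in hypotheses if h & pool) / len(hypotheses)
            if abs(p_pos - 0.5) < best_score:
                best_pool, best_score = pool, abs(p_pos - 0.5)
        outcome = is_positive(best_pool)
        # Bayes update under a uniform prior: keep only consistent hypotheses.
        hypotheses = [h for h in hypotheses if bool(h & best_pool) == outcome]
    return hypotheses

truth = {1, 7}
print(adaptive_bayes_group_testing(10, 2, lambda pool: bool(pool & truth)))
# typically collapses to [frozenset({1, 7})]
```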
Non-Adaptive Algorithms

Matrix-Based Representations
In non-adaptive group testing, the testing procedure is represented by an m \times n binary matrix A, where m denotes the number of tests and n the number of items to be screened. Each entry a_{ij} = 1 if item j is included in test i, and 0 otherwise; the rows of A thus specify the composition of each pooled test. This matrix formulation encapsulates the entire testing strategy upfront, enabling all tests to be prepared and executed simultaneously without sequential dependencies.

Given a set of at most d defectives among the n items, the outcome of the tests is captured by a binary outcome vector y \in \{0,1\}^m, where y = A \circ x and x \in \{0,1\}^n is the indicator vector of defectives (with \|x\|_0 \leq d). In the standard disjunctive model of group testing, the i-th entry y_i is the Boolean OR of the entries \{x_j : a_{ij} = 1\}, yielding y_i = 1 if at least one defective is present in test i, and 0 otherwise. An alternative model uses modular arithmetic, such as y_i = \sum_j a_{ij} x_j \mod 2, which aligns with certain coding-theoretic interpretations but is less common in classical combinatorial settings.

For unique identification of the defective set from y, the matrix A must ensure that distinct possible indicator vectors x and x' (each with at most d ones) produce distinct outcomes y and y'. This property is formalized by requiring A to be d-separable: the unions of any two distinct d-subsets of columns must differ in at least one row. A stronger condition is that A be d-disjunct, meaning no column is contained in the union of any d other columns; this not only guarantees separability but also enables efficient decoding by ensuring that non-defective items do not mimic defectives in positive tests. The d-disjunct property was introduced in the context of superimposed codes for non-adaptive identification.

The non-adaptive matrix representation offers significant parallelization advantages, as all m tests can be conducted concurrently, which is particularly beneficial under physical or temporal constraints such as limited laboratory throughput or communication bandwidth limitations in multiaccess channels. Combinatorial bounds ensure that suitable d-disjunct or d-separable matrices exist with m = O(d^2 \log n) rows, balancing identification reliability with testing efficiency.

A simple example illustrates the matrix for d=1 (identifying a single defective) with n=7 items using m=3 tests: the columns of A are the seven nonzero binary vectors in \{0,1\}^3, ensuring each possible defective produces a unique nonzero outcome vector y. This construction, akin to a binary code with distinct codewords, guarantees identification since no two columns coincide; a short decoding sketch follows the table below.

| Test/Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 2 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
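A minimal sketch of this design and its decoding (an illustration of the construction described above): column j of A is the binary representation of j, so with a single defective the outcome vector, read as a binary number, is exactly the defective's label.

```python
import numpy as np

m, n = 3, 7
# Column j (1-indexed) holds the binary representation of j, MSB in row 1.
A = np.array([[(j >> (m - 1 - i)) & 1 for j in range(1, n + 1)] for i in range(m)])

defective = 5                                   # single defective item (d = 1)
x = np.zeros(n, dtype=int); x[defective - 1] = 1
y = (A @ x > 0).astype(int)                     # Boolean OR outcomes of the 3 tests

decoded = int("".join(map(str, y)), 2)          # y equals the defective's column
print(y, decoded)                               # [1 0 1] 5
```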