Signal detection theory (SDT), also known as detection theory, is a mathematical and statistical framework developed to model and analyze decision-making under uncertainty, such as distinguishing a weak signal from background noise in perceptual, cognitive, or sensory tasks.[1] It quantifies an observer's ability to detect the presence or absence of a stimulus by separating sensitivity (the capacity to discriminate signal from noise, often measured as d') from response bias (the tendency to favor one response over another, quantified by metrics like the criterion c or the likelihood ratio β).[2] Core to SDT is the assumption that internal responses to stimuli follow probability distributions, typically Gaussian with equal variance, leading to four possible outcomes in binary detection tasks: hits (correctly detecting a signal), misses (failing to detect a signal), false alarms (reporting a signal when it is absent), and correct rejections (correctly identifying noise).[3]

The theory originated in the mid-20th century from advancements in radar and communication engineering during World War II, where engineers such as W. W. Peterson and T. G. Birdsall addressed the challenge of detecting faint echoes amid electronic noise, formalizing the approach in the 1950s.[4] Its roots trace further back to 19th-century psychophysics, with Gustav Fechner's 1860 work on threshold detection using Gaussian error theory in two-alternative forced-choice tasks, later extended by L. L. Thurstone in 1927 for scaling subjective judgments.[4] In 1966, David M. Green and John A. Swets published the seminal book Signal Detection Theory and Psychophysics, establishing SDT as a cornerstone of experimental psychology and providing tools such as receiver operating characteristic (ROC) curves, which visualize the trade-off between hits and false alarms across decision criteria and thereby separate sensitivity from bias.[1]

SDT has broad applications across disciplines, including sensory perception (e.g., auditory tone detection in noise), neuroscience (modeling neural responses in animals such as ferrets), medical diagnostics (evaluating imaging accuracy), and eyewitness memory and forensic science, where it helps disentangle true discriminability from conservative or liberal biases.[1] In behavioral ecology, it is used to analyze animal decision-making, such as predator detection, by fitting models to hit and false-alarm rates.[1] Key advancements include extensions to unequal-variance models and integration with Bayesian decision theory, enhancing its utility in modern machine learning for signal-processing tasks.[4]
Fundamentals
Definition and basic principles
Detection theory, also known as signal detection theory (SDT), provides a framework for understanding decision-making under uncertainty, particularly in tasks where an observer must determine whether a stimulus (signal) is present amid background interference (noise).[5] In this setup, two competing hypotheses are considered: H₀, representing noise alone, and H₁, representing signal plus noise. The observer's decision is based on an observation y, which could be a sensory percept or an instrumental measurement, leading to a binary response of "yes" (signal present) or "no" (signal absent).[6]

The basic principles of detection theory emphasize the inherent trade-off between maximizing detection accuracy (sensitivity) and minimizing erroneous positives (false alarms), as decisions are influenced by the observer's criterion for responding. This criterion can shift based on task demands or payoffs, allowing the same sensory evidence to yield different outcomes. An ideal observer represents the theoretical optimum, making decisions that maximize correct responses by fully utilizing knowledge of the signal and noise distributions, and serves as a benchmark for real-world performance.[5][6]

Detection tasks are probabilistic rather than deterministic because noise, arising from external sources such as environmental variability or internal factors such as neural fluctuations, introduces overlap between the signal and noise distributions, making perfect discrimination impossible.[6] This variability leads to four possible outcomes, organized in a confusion matrix for binary detection:

- Hit: the signal is present and the observer responds "yes".
- Miss: the signal is present and the observer responds "no".
- False alarm: the signal is absent and the observer responds "yes".
- Correct rejection: the signal is absent and the observer responds "no".
For example, in a simple auditory task where a tone may or may not be embedded in white noise, a hit occurs when the observer correctly identifies the tone's presence, while a false alarm happens when noise alone is mistaken for the tone.[5] These outcomes quantify performance, with measures like sensitivity (d') and bias (c or β) derived from hit and false alarm rates to assess discriminability and decision thresholds, often visualized via receiver operating characteristic (ROC) curves.[6]
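As a minimal illustration of how these outcomes are tallied in practice, the following Python sketch counts hits, misses, false alarms, and correct rejections from a list of hypothetical (signal present, response) trial pairs and converts them to hit and false alarm rates; the function name and the toy session data are illustrative, not from any standard library.

```python
def tally_outcomes(trials):
    """Count the four SDT outcomes from (signal_present, said_yes) trial pairs."""
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for signal_present, said_yes in trials:
        if signal_present and said_yes:
            counts["hit"] += 1
        elif signal_present:
            counts["miss"] += 1
        elif said_yes:
            counts["false_alarm"] += 1
        else:
            counts["correct_rejection"] += 1
    n_signal = counts["hit"] + counts["miss"]
    n_noise = counts["false_alarm"] + counts["correct_rejection"]
    hit_rate = counts["hit"] / n_signal if n_signal else float("nan")
    fa_rate = counts["false_alarm"] / n_noise if n_noise else float("nan")
    return counts, hit_rate, fa_rate

# Hypothetical tone-in-noise session: (tone presented?, observer said "yes"?)
session = [(True, True), (True, False), (False, True),
           (False, False), (True, True), (False, False)]
print(tally_outcomes(session))
```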
Historical development
The origins of detection theory trace back to 19th-century psychophysics, where researchers sought to quantify the relationship between physical stimuli and perceptual sensations. Ernst Weber established the concept of just-noticeable differences in the 1830s, proposing that the smallest detectable change in a stimulus is proportional to the stimulus's magnitude, laying the groundwork for understanding perceptual thresholds.[7] Gustav Fechner built on this in 1860 with Elements of Psychophysics, formalizing psychophysics through probabilistic models and introducing the two-alternative forced-choice task, which used Gaussian distributions to model sensory variability and marked the transition toward treating perception as a detection process amid noise.[4]

During World War II, detection theory emerged in practical applications for radar and sonar systems, where operators needed to distinguish aircraft signals from noise under uncertainty. In 1943, D. O. North at RCA Laboratories developed early statistical frameworks for optimizing radar detection, emphasizing ideal observer performance in noisy environments.[8] This wartime work spurred post-war advancements; by 1953, Peterson and Birdsall introduced the theory of signal detectability at the University of Michigan, incorporating receiver operating characteristic (ROC) analysis to evaluate performance independently of decision biases.[4] In 1954, Peterson, Birdsall, and Fox published a seminal paper formalizing the statistical foundations, while Tanner and Swets advanced its application to human observers, highlighting adjustable decision criteria.[9]

The 1960s saw the formalization of signal detection theory (SDT) for psychological contexts, with Wilson Tanner and John Swets contributing key experiments on auditory detection that demonstrated how sensitivity measures like d' separate perceptual ability from response bias.[4] David Green and John Swets' 1966 book, Signal Detection Theory and Psychophysics, synthesized these ideas into a comprehensive framework, becoming a seminal text that integrated probabilistic modeling with empirical psychophysics and influenced subsequent research across disciplines.[10]

By the 1970s, SDT had expanded into cognitive psychology, applied to memory recognition and decision-making tasks to model internal noise and criterion shifts.[11] The 1980s and 1990s witnessed its integration into neuroscience, where single-unit recordings validated neural noise assumptions, and into machine learning, where ROC curves became standard for classifier evaluation.[4] Since 2000, SDT has driven applications in artificial intelligence and big-data analysis, informing anomaly detection in large datasets and adaptive systems.[12]
Core Concepts
Signal, noise, and discriminability
In signal detection theory, a signal refers to a meaningful stimulus or informative pattern that an observer must detect, such as a faint auditory tone or a brief visual flash, while noise denotes the random, irrelevant variability or background interference that obscures the signal, like ambient sounds or neural fluctuations.[2] This framework models detection as the observer's internal response to either noise alone or signal-plus-noise, assuming an additive Gaussian noise model in which the signal shifts the mean of the underlying noise distribution without altering its variance, a common idealization in psychophysical tasks.[13]

Discriminability quantifies how well an observer can separate signal-plus-noise responses from noise-alone responses, with the sensitivity index d' serving as the primary measure, defined as the standardized distance between the means of these two distributions in units of standard deviation.[2] Intuitively, d' is calculated as the difference between the z-scores of the hit rate (correctly detecting the signal) and the false alarm rate (incorrectly detecting noise as signal), such that higher values indicate greater separation and better performance; for instance, d' = 0 reflects chance-level discriminability, while d' > 2 suggests strong detectability.[13] This metric isolates perceptual sensitivity from decision biases, allowing direct comparison across experiments.[14]

Receiver operating characteristic (ROC) curves provide a comprehensive visualization of discriminability by plotting the true positive rate (sensitivity, or hit rate) against the false positive rate (1 - specificity, or false alarm rate) as the decision criterion varies.[15] The curve's shape reflects overall performance, with a diagonal line indicating random guessing and a curve bowing toward the upper-left corner showing superior discriminability. The area under the ROC curve (AUC) summarizes this as a single scalar from 0 to 1, where AUC = 0.5 denotes no better than chance performance and AUC = 1 represents perfect separation of signal from noise.[15]

Several factors influence discriminability in detection tasks, including signal strength, which increases d' by widening the separation between distributions, and noise variance, which decreases d' by promoting overlap between signal-plus-noise and noise-alone responses.[13] Observer variability, such as differences in attention or sensory acuity, further modulates these effects; for example, in visual detection, a brighter target light elevates d' by enhancing signal strength against dark-adapted noise, while in auditory detection, even a louder tone amid white noise yields reduced discriminability if internal neural noise variance is high.[16]
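The sensitivity calculation described above can be written down directly; the short sketch below, assuming the equal-variance Gaussian model, converts a hit rate and false alarm rate into d' and the corresponding ROC area using only the Python standard library (the function name is illustrative).

```python
from math import sqrt
from statistics import NormalDist

def dprime_and_auc(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: sensitivity d' and the implied ROC area."""
    z = NormalDist().inv_cdf                    # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)          # d' = z(H) - z(FA)
    auc = NormalDist().cdf(d_prime / sqrt(2))   # AUC = Phi(d' / sqrt(2)) under equal variance
    return d_prime, auc

print(dprime_and_auc(0.84, 0.16))   # roughly d' = 2.0, AUC = 0.92
```

Hit or false-alarm rates of exactly 0 or 1 must be adjusted (for example, with a log-linear correction) before the inverse normal transform is applied.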
Decision criteria and bias
In signal detection theory, the decision criterion represents an adjustable threshold that observers apply to their internal evidence to determine whether a stimulus contains a signal or is merely noise. This threshold governs the trade-off between hits (correctly identifying signals) and false alarms (incorrectly identifying noise as signals), such that shifting the criterion toward a more conservative position reduces both hits and false alarms, while a liberal shift increases them.[5]

Two primary measures quantify response bias under the equal-variance Gaussian assumption: the criterion index c, defined as the location of the decision criterion relative to the midpoint between the noise and signal-plus-noise distributions, and the likelihood ratio \beta, which is the ratio of the likelihoods of the observation under the signal-plus-noise hypothesis to the noise-only hypothesis at the criterion point. Specifically, c = -\frac{1}{2} (z(H) + z(F)), where z(H) and z(F) are the inverse normal transforms of the hit rate and false alarm rate, respectively; positive values of c indicate a conservative bias (favoring the noise hypothesis, H_0), while negative values reflect a liberal bias. Similarly, \beta > 1 signifies a conservative bias favoring H_0, \beta < 1 a liberal bias favoring the signal hypothesis H_1, and \beta = 1 neutrality.[5]

Bias is influenced by external factors such as payoffs for decision outcomes and prior probabilities of signal presence. Payoffs are formalized in a matrix specifying utilities for hits, correct rejections, misses, and false alarms; for instance, in a symmetric payoff scheme, correct responses might yield +1 utility while errors yield -1, but asymmetric payoffs (e.g., a high cost for false alarms in medical diagnostics) shift the optimal criterion toward conservatism to maximize expected utility. Prior probabilities also adjust the criterion via Bayes' rule, with low signal priors promoting conservatism even under neutral payoffs.[17][5]

Under the equal-variance assumption, bias interacts with discriminability in receiver operating characteristic (ROC) space, where shifts in the criterion trace points along a single ROC curve without altering its shape or the underlying sensitivity measure d'; this demonstrates the independence of response bias from perceptual sensitivity, as bias adjustments reflect strategic choices rather than changes in signal-noise separability.[5]
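A sketch of these bias measures, again under the equal-variance Gaussian assumption: the first function returns c and \beta from hit and false alarm rates (using the identity \beta = exp(d'·c)), and the second computes the likelihood-ratio threshold that maximizes expected payoff for given priors and a hypothetical payoff matrix; the function names and payoff convention are illustrative.

```python
from math import exp
from statistics import NormalDist

def bias_measures(hit_rate, fa_rate):
    """Equal-variance SDT bias measures: criterion c and likelihood ratio beta."""
    z = NormalDist().inv_cdf
    zH, zF = z(hit_rate), z(fa_rate)
    c = -0.5 * (zH + zF)            # c > 0: conservative, c < 0: liberal
    d_prime = zH - zF
    beta = exp(d_prime * c)         # likelihood ratio at the criterion location
    return c, beta

def optimal_beta(p_signal, cost_fa=1.0, cost_miss=1.0, value_hit=0.0, value_cr=0.0):
    """Likelihood-ratio criterion maximizing expected payoff (hypothetical payoff entries)."""
    p_noise = 1.0 - p_signal
    return (p_noise / p_signal) * ((value_cr + cost_fa) / (value_hit + cost_miss))

print(bias_measures(0.84, 0.30))      # liberal observer: c < 0, beta < 1
print(optimal_beta(p_signal=0.25))    # rare signals push the optimal beta above 1
```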
Applications in Psychology
Psychophysics and perception
Signal detection theory (SDT) has revolutionized psychophysics by providing a framework to quantify sensory sensitivity and decision biases separately, moving beyond classical threshold models that conflated detection ability with response tendencies.[5] In psychophysical experiments, SDT is applied to measure absolute thresholds (the minimum stimulus intensity detectable against noise) and difference limens (the smallest detectable change between stimuli) by analyzing hit rates and false alarms across varying signal strengths. The method of constant stimuli, adapted within SDT, presents signal trials at fixed intensities interspersed with noise-only trials, allowing estimation of the psychometric function that reflects discriminability without assuming a strict threshold.[18]

In perceptual decision-making, SDT elucidates how observers detect targets amid distractions, informing processes like visual search, where attention modulates sensitivity to faint stimuli such as dots embedded in dynamic noise.[19] For instance, in auditory tasks, SDT models tone detection under masking noise, revealing how perceptual sensitivity improves with signal-to-noise ratio.[5] The theory also applies to multisensory integration, where combined cues from vision and audition enhance overall discriminability, as seen in cue-combination experiments yielding superadditive benefits beyond unimodal performance.[20]

Experimental paradigms in SDT psychophysics typically employ yes/no tasks, where observers report stimulus presence or absence, or forced-choice tasks, such as two-alternative forced choice (2AFC), requiring selection of the interval containing the signal to minimize bias effects. Confidence ratings extend these designs, producing rating-scale data that generate receiver operating characteristic (ROC) curves for assessing discriminability across criterion shifts, as illustrated in the sketch at the end of this subsection.[21] Trial outcomes are fitted to SDT models using maximum likelihood estimation to derive parameters like d', a bias-free measure of sensitivity, and c, the decision criterion, enabling precise characterization of perceptual limits.

Neural correlates of SDT processes involve primary and secondary sensory cortices, where neuronal firing rates encode signal strength and noise, correlating with behavioral discriminability in tasks like visual contrast detection. For example, activity in early visual areas (V1) scales with perceptual reports of faint targets, supporting SDT's prediction that detection arises from probabilistic neural evidence accumulation.[22]
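The rating-scale procedure mentioned above lends itself to a simple empirical construction: the sketch below, with an invented 6-point confidence scale, cumulates hit and false alarm rates from the strictest criterion outward to trace an ROC and estimates its area by the trapezoidal rule; the data and function name are hypothetical.

```python
import numpy as np

def rating_roc(signal_ratings, noise_ratings, n_levels=6):
    """Empirical ROC from confidence ratings (1 = sure noise ... n_levels = sure signal)."""
    signal_ratings = np.asarray(signal_ratings)
    noise_ratings = np.asarray(noise_ratings)
    hits, fas = [0.0], [0.0]                      # start at the (0, 0) corner
    for criterion in range(n_levels, 0, -1):      # strict -> lenient criteria
        hits.append(float(np.mean(signal_ratings >= criterion)))
        fas.append(float(np.mean(noise_ratings >= criterion)))
    # Trapezoidal area under the empirical ROC
    auc = sum(0.5 * (hits[i] + hits[i - 1]) * (fas[i] - fas[i - 1])
              for i in range(1, len(hits)))
    return fas, hits, auc

# Hypothetical ratings from signal trials and noise-only trials
sig = [6, 5, 5, 4, 6, 3, 2, 5, 4, 6]
noi = [1, 2, 3, 1, 2, 4, 1, 3, 2, 1]
print(rating_roc(sig, noi))
```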
Clinical and cognitive assessments
Detection theory provides a robust framework for clinical and cognitive assessments by disentangling sensory sensitivity from response biases, allowing clinicians to identify perceptual impairments more precisely than traditional threshold methods. In these contexts, measures like discriminability (d') quantify an individual's ability to distinguish signals from noise, while the criterion (c) or bias (\beta) captures decision-making tendencies, such as conservatism in reporting stimuli. This approach is particularly valuable for evaluating deviations in patient populations, where biases may inflate or mask true deficits.

In audiology, signal detection theory enhances hearing assessments by adjusting for response bias in tasks where patients detect tones amid noise, as seen in SDT-based audiograms that reveal hidden hearing loss through reduced d' despite passing standard pure-tone tests. For example, models predict perceptual consequences of synaptic damage in the cochlea, showing how noise exposure lowers sensitivity without altering thresholds.[23] Similarly, in multitasking scenarios like detecting auditory signals during visual tasks, SDT indices demonstrate preserved d' but liberalized criteria under divided attention, informing diagnostics for attention-related hearing complaints.[24]

Ophthalmology applies detection theory to visual field analysis, using perimetry tests to generate ROC curves that assess glaucoma progression independently of patient bias. Signal-to-noise ratios from these tasks compare test sensitivities, with SDT enabling bias-corrected estimates of defect depth in conditions like retinitis pigmentosa.[25] This method highlights how instructional sets influence criterion placement, ensuring more reliable discriminability measures for early intervention.[26]

In cognitive psychology, detection theory evaluates memory tasks by modeling recognition as signal detection, where d' indicates memory strength separate from guessing biases; this is critical for neuropsychological batteries assessing amnesia or dementia.[27] Vigilance paradigms further reveal sustained attention lapses, with the decrement manifesting as declining d' over time due to resource depletion.[28] Fatigue exacerbates these effects by reducing d' and shifting criteria toward conservatism, while pharmacological agents like noradrenaline agonists enhance perceptual sensitivity in visual tasks.[29]

Standardized tools leverage these principles, such as the Auditory Detection Task, which presents words in noise to compute d' and bias, validated for distinguishing clinical groups like those with auditory processing disorders from healthy controls.[30] ROC analyses from detection theory assess eyewitness reliability in forensics by plotting hit rates against false alarms in lineup simulations, revealing liberal biases under stress.[31] For ADHD diagnosis, ROC curves optimize symptom cutoffs, improving validity by balancing sensitivity and specificity in continuous performance tests.[32]

Clinical case studies underscore detection theory's diagnostic power; in schizophrenia, patients with auditory hallucinations show reduced d' in noise-masked tone detection, suggesting sensory rather than decisional origins for symptoms.[33] Aging impairs visual motion discriminability, with elderly individuals exhibiting lower d' than young adults, an effect additive to schizophrenia's deficits in random-dot kinematograms.[34] Interventions, including bias-recalibration training, target criterion shifts in disorders like tinnitus, where SDT-guided choices between treatments enhance decision utility and perceptual recalibration.[35]
Applications in Engineering and Signal Processing
Radar, sonar, and communications
Detection theory plays a central role in radar and sonar systems, where the primary challenge is distinguishing targets from environmental clutter and noise in real-time scenarios. In radar, electromagnetic pulses are transmitted to detect aircraft, missiles, or ships amid sea clutter or atmospheric interference, while sonar employs acoustic waves for underwater target detection, such as submarines in ocean reverberation. These systems rely on statistical decision-making to set detection thresholds that balance the probability of detection (Pd) against the probability of false alarm (Pfa), ensuring reliable performance in dynamic military environments. Seminal work during World War II laid the foundation, with Allied radar developments like the British Chain Home system enabling early aircraft detection, which evolved into more sophisticated signal processing for target discrimination.[36]

To maintain consistent performance against varying clutter levels, constant false alarm rate (CFAR) processors adaptively adjust detection thresholds based on local noise estimates, preventing excessive false alarms in non-homogeneous environments. Common CFAR variants, such as cell-averaging CFAR (CA-CFAR), estimate the noise power from surrounding range cells and scale it by a factor to set the threshold, achieving a fixed Pfa (e.g., 10^{-6}) while maximizing Pd; a sketch of this procedure appears at the end of this subsection. In modern phased-array radars, like those used in air defense systems, CFAR integrates with beamforming to handle multiple targets simultaneously, improving detection in high-clutter scenarios such as urban or littoral zones for sonar applications. This adaptive approach, rooted in post-WWII advancements, has transitioned from analog implementations to digital processors, enhancing robustness in systems like the AN/SPY-1 radar.[37][38][39]

In digital communications, detection theory underpins symbol detection in noisy channels, where receivers decide on transmitted symbols (e.g., in QAM or PSK modulation) corrupted by additive white Gaussian noise (AWGN). The bit error rate (BER) serves as a key metric, decreasing exponentially with increasing signal-to-noise ratio (SNR); for binary phase-shift keying (BPSK), BER ≈ (1/2) erfc(√SNR), illustrating how higher SNR enhances discriminability.[40][41] Error-correcting codes, such as convolutional or turbo codes, leverage detection metrics like mutual information to optimize decoding, reducing BER below uncoded levels (for instance, achieving BER < 10^{-5} at SNR ≈ 2 dB with rate-1/2 codes in fading channels). These principles guide system design for reliable data transmission in wireless networks and satellite links.

System design in radar and sonar emphasizes techniques like matched filtering, which correlates the received signal with a time-reversed replica of the transmitted pulse to maximize SNR at the decision point, yielding an output SNR of up to 2E/N_0 (where E is the signal energy and N_0 is the noise spectral density). This optimal linear filter enhances pulse-detection sensitivity, particularly for weak echoes. Additionally, integration time (the duration over which multiple pulses are coherently or non-coherently summed) affects overall sensitivity; longer integration (e.g., 10-100 pulses) can improve SNR by up to 10 log_{10}(N) dB for non-fluctuating targets, though it trades off against Doppler resolution in moving scenarios. In sonar, matched filtering combined with longer integration times boosts detection ranges in low-frequency active systems.[42][43]

Performance evaluation in these systems often employs receiver operating characteristic (ROC) curves, plotting Pd versus Pfa to tune thresholds and compare detector efficacy across operating conditions. Originating from WWII radar signal analysis, ROC curves enable engineers to select bias levels that achieve desired trade-offs, such as Pd > 0.9 at Pfa = 10^{-6}, guiding optimizations in both radar and communication receivers. The shift from analog to digital detectors in the 1970s, driven by advances in integrated circuits and DSP chips like the TMS320, revolutionized these fields by enabling programmable CFAR and matched filters, reducing processing losses and improving adaptability in real-time applications.[44][45]
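To make the CA-CFAR idea referenced above concrete, the following sketch slides a window over a simulated range profile, estimates local noise power from training cells on either side of guard cells, and scales that estimate by the multiplier that yields the requested Pfa for exponentially distributed noise power; the parameter values and toy scene are illustrative only.

```python
import numpy as np

def ca_cfar(power, num_train=16, num_guard=2, pfa=1e-4):
    """Cell-averaging CFAR detector (sketch).

    power     : 1-D array of squared-magnitude samples (one range profile)
    num_train : training cells on each side of the cell under test
    num_guard : guard cells on each side, excluded from the noise estimate
    pfa       : desired probability of false alarm
    Returns a boolean array of detections (edge cells left undetected).
    """
    n = len(power)
    n_train_total = 2 * num_train
    # Threshold multiplier for exponentially distributed noise power
    alpha = n_train_total * (pfa ** (-1.0 / n_train_total) - 1.0)
    detections = np.zeros(n, dtype=bool)
    for i in range(num_train + num_guard, n - num_train - num_guard):
        lead = power[i - num_guard - num_train : i - num_guard]
        lag = power[i + num_guard + 1 : i + num_guard + 1 + num_train]
        noise_est = (lead.sum() + lag.sum()) / n_train_total
        detections[i] = power[i] > alpha * noise_est
    return detections

# Toy scene: exponential clutter with two injected targets
rng = np.random.default_rng(1)
profile = rng.exponential(1.0, 1000)
profile[300] += 30.0
profile[700] += 20.0
print(np.flatnonzero(ca_cfar(profile)))
```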
Medical imaging and diagnostics
Detection theory plays a pivotal role in evaluating the performance of medical imaging modalities for lesion detection, particularly in MRI and CT scans where subtle abnormalities must be distinguished from background noise. In these contexts, receiver operating characteristic (ROC) analysis quantifies radiologist accuracy by plotting true positive rates against false positive rates, providing a measure of discriminability independent of decision bias. For instance, studies on pulmonary nodule detection in CT scans have shown that experienced radiologists achieve area under the ROC curve (AUC) values around 0.90, reflecting high sensitivity to small lesions amid anatomical noise.[46] When multiple abnormalities are present, free-response ROC (FROC) curves extend this framework by accounting for localization and multiple detections, evaluating the trade-off between true lesion identifications and false marks per image. FROC analysis has been applied to breast lesion detection in mammography, where it reveals performance limitations in dense tissue, with typical figures of merit indicating that radiologists detect about 80% of cancers while marking fewer than 1 false positive per image.[47]

In diagnostic applications beyond imaging, detection theory informs signal identification in electrophysiological recordings such as the ECG for arrhythmias and the EEG for seizures. For ECG analysis, the theory models arrhythmia detection as distinguishing irregular rhythms (signal) from normal sinus patterns (noise), with ROC curves assessing algorithm and clinician performance; early applications using hidden Markov models achieved detection sensitivities exceeding 90% for ventricular arrhythmias in benchmark datasets.[48] Similarly, in EEG seizure detection, signal detection metrics evaluate the separation of ictal bursts from interictal activity, where sonified EEG training has improved observer hit rates from 0.50 to 0.64 while reducing response bias, as measured by criterion shifts in signal detection theory.[49] Bias in screening tests, such as mammography, arises from decision criteria favoring high sensitivity over specificity to minimize missed cancers, leading to elevated recall rates; detection theory quantifies this via the bias parameter.[50]

Compressed sensing, developed in the mid-2000s by Candès, Romberg, and Tao, enhances detection in medical imaging by enabling signal recovery from underdetermined, sparse measurements, crucial for accelerating MRI scans without aliasing artifacts. This technique reconstructs images via l1-norm minimization, promoting sparsity in transform domains like wavelets, allowing detection of lesions with 4- to 8-fold undersampling while preserving diagnostic quality; clinical trials in dynamic cardiac MRI have demonstrated equivalent lesion conspicuity to full sampling, with reconstruction times under 30 seconds.[51] In sparse medical data, such as compressed k-space in MRI, it facilitates efficient abnormality detection by solving optimization problems that recover signals sampled below Nyquist rates, as formalized in the stable recovery theorem.[52]

Challenges in these applications include observer variability, which detection theory addresses through repeated ROC assessments revealing intra- and inter-radiologist agreement coefficients as low as 0.60 for subtle CT lesions, underscoring the need for standardized criteria to mitigate bias and improve reproducibility.[53] AI-assisted detection mitigates these issues by augmenting human performance, with convolutional neural networks boosting AUC from 0.82 (radiologists alone) to 0.89 in the assessment of indeterminate pulmonary nodules on chest CT, enabling more consistent signal-noise discrimination across varying expertise levels.[54]
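The l1-minimization step at the heart of compressed sensing can be sketched with a basic iterative soft-thresholding algorithm (ISTA) on a toy 1-D problem, recovering a sparse vector from far fewer random linear measurements than unknowns. The problem sizes, regularization weight, and iteration count are arbitrary illustrations, not values from any clinical reconstruction pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: recover a k-sparse signal x from m < n linear measurements y = A x
n, m, k = 200, 80, 8
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)
A = rng.normal(0, 1 / np.sqrt(m), (m, n))
y = A @ x_true

# ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1
lam = 0.05
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
x = np.zeros(n)
for _ in range(500):
    z = x - (A.T @ (A @ x - y)) / L    # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```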
Mathematical Foundations
Bayesian and maximum a posteriori detection
In detection theory, the Bayesian framework provides a probabilistic approach to hypothesis testing by incorporating prior knowledge about the likelihood of competing hypotheses. Consider two mutually exclusive and exhaustive hypotheses: H_0 (absence of signal) and H_1 (presence of signal), with prior probabilities P(H_0) and P(H_1) = 1 - P(H_0). Given an observation y, the posterior probabilities are derived from Bayes' theorem: P(H_1|y) = \frac{p(y|H_1) P(H_1)}{p(y)} and P(H_0|y) = \frac{p(y|H_0) P(H_0)}{p(y)}, where p(y|H_i) is the likelihood of y under H_i and p(y) is the marginal density of y. The posterior odds ratio, which quantifies the evidence for H_1 over H_0 given y, is then P(H_1|y)/P(H_0|y) = \frac{p(y|H_1)}{p(y|H_0)} \cdot \frac{P(H_1)}{P(H_0)}, with the first factor known as the likelihood ratio \Lambda(y).

The maximum a posteriori (MAP) detection rule selects the hypothesis that maximizes the posterior probability, deciding in favor of H_1 if P(H_1|y) > P(H_0|y). This is equivalent to thresholding the likelihood ratio: \Lambda(y) > \frac{P(H_0)}{P(H_1)}. To derive this, note that the overall probability of error is P_e = P(\hat{H}_0|H_1)P(H_1) + P(\hat{H}_1|H_0)P(H_0), where \hat{H}_i denotes deciding H_i. Minimizing P_e leads to partitioning the observation space so that decisions minimize the contributions from false alarms and misses, yielding the MAP rule as the minimum probability of error (minimum P_e) detector when costs are equal.

More generally, the Bayes criterion extends the MAP rule to account for unequal costs associated with decision errors. Define C_{ij} as the cost of deciding H_i when H_j is true, with C_{00} and C_{11} typically zero for correct decisions. The expected cost (Bayes risk) is minimized by deciding H_1 if P(H_1|y) (C_{01} - C_{11}) > P(H_0|y) (C_{10} - C_{00}), or equivalently,

\Lambda(y) = \frac{p(y|H_1)}{p(y|H_0)} > \frac{P(H_0)}{P(H_1)} \cdot \frac{C_{10} - C_{00}}{C_{01} - C_{11}}.

This threshold adjusts the decision boundary based on priors and costs, optimizing the total risk rather than the error probability alone.

For binary hypothesis testing with discrete observations, consider a simple example where y \in \{0, 1\}, with p(y=1|H_0) = \epsilon under H_0 (noise only, \epsilon small) and p(y=1|H_1) = 1 - \delta under H_1 (signal plus noise, \delta small). The likelihood ratio for y = 1 is \Lambda(1) = \frac{1 - \delta}{\epsilon}. If priors are equal and costs symmetric, the MAP rule decides H_1 for y = 1 if \frac{1 - \delta}{\epsilon} > 1, which holds for typical small \epsilon and \delta, connecting directly to the minimum P_e detector that favors the more probable hypothesis given the observation.
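A minimal sketch of the Bayes/MAP rule derived above: the decision function thresholds the likelihood ratio at (P(H_0)/P(H_1)) \cdot (C_{10}/C_{01}) (taking C_{00} = C_{11} = 0), and the toy discrete channel reproduces the y \in \{0, 1\} example from the text; the function names and parameter values are illustrative.

```python
def map_decide(y, p_y_given_h1, p_y_given_h0, prior_h1=0.5,
               cost_fa=1.0, cost_miss=1.0):
    """Bayes/MAP decision for one observation: decide H1 if the likelihood ratio
    exceeds (P(H0)/P(H1)) * (C10 - C00)/(C01 - C11) with C00 = C11 = 0."""
    lr = p_y_given_h1(y) / p_y_given_h0(y)
    threshold = ((1 - prior_h1) / prior_h1) * (cost_fa / cost_miss)
    return lr > threshold, lr, threshold

# Discrete toy channel from the text: y in {0, 1}
eps, delta = 0.05, 0.05
p1 = lambda y: (1 - delta) if y == 1 else delta      # p(y | H1)
p0 = lambda y: eps if y == 1 else (1 - eps)          # p(y | H0)
print(map_decide(1, p1, p0))                          # equal priors: decide H1
print(map_decide(1, p1, p0, prior_h1=0.01))           # a rare signal can flip the decision
```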
Neyman-Pearson framework
The Neyman-Pearson framework constitutes a foundational frequentist approach in detection theory for binary hypothesis testing, where the primary objective is to construct tests that control the probability of false alarms while maximizing the probability of correct detections. Developed by Jerzy Neyman and Egon Pearson, this paradigm shifts the focus from point estimation to decision-making under uncertainty, treating hypotheses as fixed states of nature rather than probabilistic events. It prioritizes the long-run performance of tests over repeated sampling, ensuring reliable error control in applications requiring strict limits on erroneous declarations of signal presence.

Central to the framework are the concepts of type I and type II errors. The type I error rate, denoted α, represents the probability of incorrectly rejecting the null hypothesis H₀ (typically, no signal is present) when it is true, often termed the false alarm probability. The type II error rate, β, is the probability of failing to reject H₀ when the alternative hypothesis H₁ (signal is present) holds true, known as the miss probability. The framework seeks to minimize β subject to a pre-specified upper bound on α, thereby maximizing the detection power, defined as 1 - β.

The Neyman-Pearson lemma provides the theoretical foundation for optimal tests under simple hypotheses, where both H₀ and H₁ fully specify the underlying distributions. It asserts that, among all tests with significance level α, the likelihood ratio test (LRT) achieves the highest power. The LRT rejects H₀ in favor of H₁ if the likelihood ratio exceeds a threshold η chosen to satisfy the size constraint:

\Lambda(y) = \frac{p(y \mid H_1)}{p(y \mid H_0)} > \eta,

where p denotes the probability density or mass function and y is the observed data. The test is randomized in boundary cases to meet α exactly when the distribution is discrete.

The power function of the test, which plots 1 - β as a function of α, underpins the receiver operating characteristic (ROC) curve. By varying η across possible values, one generates pairs (α, 1 - β), tracing the ROC as a parametric curve in the unit square. The ROC summarizes the inherent discriminability between H₀ and H₁, with the curve's concavity and position above the diagonal indicating test efficacy; the area under the ROC quantifies overall performance but is secondary to pointwise power in the Neyman-Pearson sense.

For composite hypotheses, where H₁ encompasses a range of alternatives, the framework extends to uniformly most powerful (UMP) tests, which maintain maximum power across the entire alternative space for a fixed α. UMP tests exist for one-sided alternatives when the distribution family admits a monotone likelihood ratio (MLR) in a sufficient statistic T(y), meaning Λ(y) increases with T(y). In such cases, the test rejects H₀ if T(y) > c, where c sets the size to α. The Karlin-Rubin theorem guarantees that this LRT form is UMP for exponential families and other MLR classes, enabling broad applicability without recomputing thresholds for each alternative point.

Despite its optimality guarantees, the Neyman-Pearson framework has notable limitations as a frequentist method. It does not incorporate prior probabilities on the hypotheses, focusing solely on error rates under repeated sampling rather than updating beliefs based on data, which contrasts with Bayesian approaches that integrate priors for posterior decision-making.
Additionally, in scenarios involving multiple simultaneous tests, the framework requires adjustments to control the overall false alarm rate, such as the Bonferroni correction, which divides α by the number of tests to maintain family-wise error control at the desired level.
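For the simple Gaussian shift-in-mean problem, the Neyman-Pearson test reduces to a threshold on the observation itself, since the likelihood ratio is monotone in y; the sketch below computes the threshold that meets a given false alarm rate α and the resulting power, using only the standard library (the function name is illustrative).

```python
from statistics import NormalDist

def np_detector(mu, sigma, alpha):
    """Neyman-Pearson test for N(0, sigma^2) vs N(mu, sigma^2), simple hypotheses.

    Returns the threshold on y achieving false-alarm rate alpha, and the power 1 - beta.
    """
    threshold = sigma * NormalDist().inv_cdf(1 - alpha)   # P(y > threshold | H0) = alpha
    power = 1 - NormalDist(mu, sigma).cdf(threshold)       # P(y > threshold | H1)
    return threshold, power

for a in (0.10, 0.01, 0.001):
    print(a, np_detector(mu=2.0, sigma=1.0, alpha=a))   # tighter alpha lowers the power
```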
Models with normal distributions
In detection theory, models assuming normal (Gaussian) distributions for noise and signal-plus-noise are foundational, particularly under the equal-variance assumption where noise alone follows \mathcal{N}(0, \sigma^2) and signal-plus-noise follows \mathcal{N}(\mu, \sigma^2), with \mu > 0 representing the signal strength.[6] This setup posits that sensory evidence arises from additive Gaussian noise, enabling closed-form expressions for performance metrics and decision rules.[55] The equal-variance case simplifies analysis because the distributions differ only in their means, reflecting scenarios where adding the signal shifts the mean without altering variability.[6]

A key sensitivity measure in these models is d' = \mu / \sigma, which quantifies the separability of the two distributions in standardized units.[6] Empirically, d' is estimated as d' = z(H) - z(FA), where H is the hit rate, FA is the false alarm rate, and z(\cdot) is the inverse cumulative distribution function of the standard normal.[55] The receiver operating characteristic (ROC) curve under equal variances plots hit rate against false alarm rate as a concave function given by H = \Phi(z(FA) + d'), where \Phi is the standard normal CDF.[6] Bias is captured by the criterion c = -\frac{1}{2} [z(H) + z(FA)], which locates the decision threshold relative to the midpoint between the distributions; c = 0 indicates unbiased responding.[13]

The likelihood ratio test for Gaussian observations derives from the Neyman-Pearson lemma, yielding \Lambda(y) = \frac{p(y \mid \text{signal})}{p(y \mid \text{noise})} = \exp\left( \frac{y \mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right), compared to a threshold determined by the desired error rates.[6] Decision-making proceeds by accepting the signal hypothesis if \Lambda(y) > \eta, equivalent to a linear threshold on y owing to the monotonicity of the exponential.[55]

Extensions to unequal variances relax this assumption, modeling noise as \mathcal{N}(0, \sigma_n^2) and signal-plus-noise as \mathcal{N}(\mu, \sigma_s^2) with \sigma_s \neq \sigma_n, which is common in perceptual tasks where signals inflate variability.[6] Here, d' generalizes to d_e' = \frac{\mu}{\sqrt{(\sigma_n^2 + \sigma_s^2)/2}} or similar adjustments, but ROC curves become asymmetric and curved on normal-deviate axes, complicating bias measures like c, which now depend on the variance ratio.[55] The likelihood ratio becomes \Lambda(y) = \frac{\sigma_n}{\sigma_s} \exp\left( \frac{y \mu}{\sigma_s^2} - \frac{\mu^2}{2\sigma_s^2} + \frac{y^2}{2} \left( \frac{1}{\sigma_n^2} - \frac{1}{\sigma_s^2} \right) \right), resulting in a quadratic decision boundary.[6]

For sequential testing under Gaussian observations, the sequential probability ratio test (SPRT) accumulates log-likelihood ratios from independent samples until crossing bounds set by the error rates, minimizing the expected sample size.[56] In the Gaussian case, each observation contributes additively to the log-ratio, yielding a random walk with drift proportional to \mu / \sigma^2, linking the SPRT to drift-diffusion models in detection tasks.[56]

Detection in correlated Gaussian noise extends the model to multivariate observations \mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) under noise and \mathcal{N}(\mathbf{s}, \mathbf{R}) under signal, where \mathbf{R} is the covariance matrix.[57] The optimal likelihood ratio involves the quadratic form \Lambda(\mathbf{y}) = \exp\left( \mathbf{y}^T \mathbf{R}^{-1} \mathbf{s} - \frac{1}{2} \mathbf{s}^T \mathbf{R}^{-1} \mathbf{s} \right), reducible to a whitened linear detector via eigendecomposition of \mathbf{R}.[57] This accounts for spatial or temporal dependencies, improving performance over ignoring correlations.[58]
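A brief sketch of the correlated-noise detector described above: the statistic \mathbf{y}^T \mathbf{R}^{-1} \mathbf{s} is computed by solving a linear system rather than forming an explicit inverse, and the toy example uses an assumed AR(1) covariance and an arbitrary sinusoidal template to contrast noise-only and signal-plus-noise observations.

```python
import numpy as np

def whitened_matched_filter(y, s, R):
    """Detection statistic y^T R^{-1} s for a known signal s in correlated Gaussian noise.

    Equivalent to whitening y and s and applying an ordinary matched filter; the
    decision threshold would be set from the desired false-alarm rate.
    """
    w = np.linalg.solve(R, s)       # R^{-1} s without forming the explicit inverse
    return float(y @ w)

rng = np.random.default_rng(2)
n = 64
s = np.sin(2 * np.pi * 0.1 * np.arange(n))             # known signal template
# Assumed AR(1) covariance as a stand-in for correlated noise
rho = 0.8
R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(R)

noise_only = L @ rng.normal(size=n)
signal_plus_noise = s + L @ rng.normal(size=n)
print(whitened_matched_filter(noise_only, s, R),
      whitened_matched_filter(signal_plus_noise, s, R))
```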