Probit
The probit model, also known as probit regression, is a statistical method used to analyze binary outcome variables by modeling the probability of a positive outcome as a function of predictor variables through the cumulative distribution function of the standard normal distribution.[1] In this framework, the inverse standard normal cumulative distribution function, termed the probit link, transforms a linear predictor into a probability bounded between 0 and 1, assuming an underlying latent normal variable that determines the observed binary response.[2]

Originating in the field of bioassay and toxicology, the probit model was introduced by Chester I. Bliss in 1934 as a tool to quantify dose-response relationships, such as the proportion of organisms affected by varying concentrations of a toxic agent.[3] Bliss coined the term "probit" from "probability unit," building on earlier work by J.H. Gaddum to linearize sigmoid-shaped response curves for easier statistical analysis.[4] This approach facilitated maximum likelihood estimation and hypothesis testing, marking a shift from graphical methods to parametric modeling in quantitative biology.[4]

The probit model has since become widely applied in econometrics, social sciences, and medicine for scenarios involving dichotomous choices, such as labor force participation, medical treatment efficacy, and voting behavior.[5] Unlike the closely related logit model, which employs the logistic cumulative distribution function, the probit assumes normality in the error term of the latent variable, leading to slightly steeper probability transitions near 0.5 but similar overall predictions in most practical settings.[5] Estimation typically involves maximum likelihood methods, with extensions like random-effects probit accommodating panel data or clustered observations in longitudinal studies.[6]
Definition and History
Definition of the Probit Function
The probit function, denoted \operatorname{probit}(p) or \Phi^{-1}(p), is defined as the inverse of the cumulative distribution function (CDF) \Phi of the standard normal distribution N(0,1). It transforms a probability p \in (0,1) into the corresponding quantile z on the real line such that \Phi(z) = p. This mapping allows the probit to convert bounded probabilities into unbounded z-scores, which are useful for linearizing sigmoid response curves in statistical analysis. For instance, \operatorname{probit}(0.5) = 0, reflecting the mean of the standard normal distribution, and \operatorname{probit}(0.975) \approx 1.96, which marks the approximate upper limit of the central 95% probability interval.

Equivalently, the probit function can be expressed in terms of the inverse error function: \operatorname{probit}(p) = \sqrt{2} \cdot \operatorname{erf}^{-1}(2p - 1), where \operatorname{erf}^{-1} denotes the inverse error function, leveraging the known relation between the normal CDF and the error function, \operatorname{erf}(z) = 2\Phi(\sqrt{2}z) - 1. In probabilistic interpretation, the probit function associates cumulative probabilities with normal quantiles, enabling standardization of data in models that assume underlying normal latent variables. The term "probit" originated with Chester Ittner Bliss, who defined it as 5 plus the normal deviate to ensure positive values for early computational tables, though the contemporary form omits this shift in favor of the direct inverse CDF.[7]
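The equivalence of the two expressions can be checked numerically. The short sketch below is illustrative only; it assumes SciPy's erfinv and norm.ppf routines and reproduces the values probit(0.5) = 0 and probit(0.975) ≈ 1.96 quoted above.

```python
# Minimal sketch (assumes SciPy is installed): the probit via the inverse
# error function, probit(p) = sqrt(2) * erfinv(2p - 1), compared with the
# direct inverse standard normal CDF, norm.ppf(p).
import numpy as np
from scipy.special import erfinv
from scipy.stats import norm

def probit(p):
    """Inverse standard normal CDF expressed through the inverse error function."""
    return np.sqrt(2.0) * erfinv(2.0 * p - 1.0)

for p in (0.5, 0.975, 0.025):
    # Both routes agree to floating-point precision.
    print(p, probit(p), norm.ppf(p))
# probit(0.5) = 0.0, probit(0.975) is approximately 1.96, probit(0.025) approximately -1.96
```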
Historical Development
The concept of the probit transformation emerged in the early 1930s as a method to linearize sigmoid dose-response curves in biological assays, particularly for quantal responses such as survival or mortality. In 1933, John H. Gaddum proposed using the inverse of the cumulative normal distribution function to model such relationships in his report on methods for biological assays depending on quantal responses, providing an early foundation for handling binary outcomes in toxicology and pharmacology.[8] This approach was formalized and popularized by Chester Ittner Bliss in 1934, who introduced the term "probit"—a contraction of "probability unit"—in his seminal paper analyzing pesticide effectiveness on insects. Bliss defined the probit as the normal equivalent deviate plus 5 to avoid negative values, applying it to transform empirical proportions of affected subjects into a scale suitable for linear regression in dose-response studies. His work, rooted in bioassays for agricultural and toxicological applications, marked the inception of probit analysis as a standard tool for estimating median effective doses.[9]

David J. Finney significantly advanced the methodology in his 1947 book Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve, which provided comprehensive tables, maximum likelihood estimation procedures, and extensions to multi-dose designs. Finney's contributions standardized probit methods, shifting emphasis toward rigorous statistical inference while retaining Bliss's framework; a second edition in 1962 and a third edition in 1971 incorporated computational refinements but preserved the core approach. Their collaboration, beginning in the late 1940s, further refined estimation techniques for quantal data.[10]

Following World War II, probit analysis gained traction in econometrics and broader statistics during the 1950s, integrating into regression frameworks for modeling binary choices and probabilities beyond bioassays. This period saw its adaptation for economic applications, such as labor market decisions and consumer behavior, leveraging the normal distribution's interpretability. By the mid-20th century, probit had become a cornerstone of limited dependent variable modeling, with key milestones including its incorporation into statistical software and textbooks. Theoretical developments remained stable after Finney's 1971 edition, with no major shifts in the probit paradigm, though computational advances in the 1980s enhanced practicality. Notably, Michael J. Wichura's 1988 algorithm provided a high-precision method for evaluating the inverse normal cumulative distribution function, enabling efficient probit computations in numerical software and facilitating the transition to unshifted probits in modern implementations.
Mathematical Properties
Functional Form and Symmetries
The probit function, denoted as \operatorname{probit}(p) = \Phi^{-1}(p), where \Phi is the cumulative distribution function (CDF) of the standard normal distribution N(0,1), derives its functional form directly from the inversion of this CDF. The standard normal probability density function (PDF) is even, \phi(-x) = \phi(x), which implies the key symmetry property \Phi(-x) = 1 - \Phi(x) for all x \in \mathbb{R}. Consequently, applying the inverse yields the probit symmetry \operatorname{probit}(1 - p) = -\operatorname{probit}(p) for p \in (0,1).[11][12] This symmetry establishes the probit function as odd with respect to the midpoint p = 0.5, satisfying \operatorname{probit}(1 - p) + \operatorname{probit}(p) = 0. At p = 0.5, \operatorname{probit}(0.5) = 0, anchoring the function's antisymmetry around this point. In statistical modeling, this odd property facilitates balanced interpretation of binary outcomes, where deviations from 0.5 in probability correspond to symmetric positive and negative shifts in the underlying latent normal variable, promoting equitable treatment of complementary events such as success and failure.[12]

A probit-specific relation arises in the ratio of the standard normal PDF to the CDF, \phi(z)/\Phi(z) with z = \operatorname{probit}(p), which serves as the reverse hazard rate for the normal distribution and informs hazard-like interpretations in probit-based survival or selection analyses. The derivative of the probit function underscores this connection: \frac{d}{dp} \operatorname{probit}(p) = \frac{1}{\phi(\operatorname{probit}(p))}. Near the boundaries, the function displays unbounded asymptotic behavior: as p \to 0^+, \operatorname{probit}(p) \to -\infty, and as p \to 1^-, \operatorname{probit}(p) \to +\infty, reflecting the infinite tails of the normal distribution.[13][12]
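Both the odd symmetry and the derivative identity are easy to verify numerically. The following sketch assumes SciPy (norm.ppf and norm.pdf) and uses a finite-difference check; it is illustrative rather than part of any standard implementation.

```python
# Numerical check (assumes SciPy) of two properties stated above:
# odd symmetry probit(1 - p) = -probit(p), and the derivative
# d/dp probit(p) = 1 / phi(probit(p)).
import numpy as np
from scipy.stats import norm

p = np.array([0.1, 0.25, 0.5, 0.8, 0.99])

# Odd symmetry about p = 0.5
assert np.allclose(norm.ppf(1.0 - p), -norm.ppf(p))

# Derivative: central finite difference versus 1 / phi(probit(p))
h = 1e-6
numeric = (norm.ppf(p + h) - norm.ppf(p - h)) / (2.0 * h)
analytic = 1.0 / norm.pdf(norm.ppf(p))
assert np.allclose(numeric, analytic, rtol=1e-5)
```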
Relation to the Normal Distribution
The probit function serves as the quantile function of the standard normal distribution, defined such that \operatorname{probit}(p) = z where P(Z \leq z) = p and Z \sim N(0,1). This formulation positions the probit as the inverse of the cumulative distribution function (CDF) \Phi(z), directly linking it to z-scores that quantify deviations from the mean in units of standard deviation. In this context, applying the probit transformation standardizes probabilities to a normal scale, facilitating comparisons across distributions and enabling the interpretation of p as the area under the standard normal curve up to z. This connection underpins the probit's utility in statistical modeling, where it maps bounded probabilities to the unbounded real line while preserving the properties of normality.

A precise mathematical relation exists between the probit and the error function, a fundamental special function in probability theory. The error function is given by \operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} \, dt, and its inverse allows expression of the probit as \operatorname{probit}(p) = \sqrt{2} \cdot \operatorname{erf}^{-1}(2p - 1). This derivation follows from the identity \Phi(z) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right); inverting both sides yields the probit in terms of the inverse error function, providing an analytical bridge to tabulated values and computational routines for the normal quantile.

In generalized linear models for binary outcomes, the probit link function leverages this normal connection by assuming a latent continuous variable y^* = \mathbf{x}\boldsymbol{\beta} + \epsilon with \epsilon \sim N(0,1), where the observed binary response is y = 1 if y^* > 0 and y = 0 otherwise. The probability P(y=1 \mid \mathbf{x}) = \Phi(\mathbf{x}\boldsymbol{\beta}) thus maps the linear predictor to the probability scale under latent normal errors. This setup extends linear regression principles to dichotomous data while maintaining the normalizing properties of the standard normal.

The selection of the normal CDF for the probit over alternatives like the logistic rests on theoretical foundations rooted in symmetry and the central limit theorem. The normal distribution's symmetry ensures balanced tail behavior, aligning with assumptions of equitable error distributions in latent models. Furthermore, when binary outcomes result from aggregating numerous independent small effects—such as in random utility maximization—the central limit theorem justifies approximating the latent error as normal, as the sum of such effects converges to normality under mild conditions. This rationale supports the probit's prominence in bioassay, econometrics, and choice modeling, where aggregation is common.
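The latent-variable construction behind the probit link can be illustrated by simulation. The sketch below uses hypothetical coefficient and covariate values (beta and x are not from the text) and assumes NumPy/SciPy; it simply checks that the simulated share of positive outcomes matches Phi(x'beta).

```python
# Illustrative simulation (hypothetical beta and x, assumes NumPy/SciPy) of the
# latent-variable view: y* = x'beta + eps with eps ~ N(0, 1), y = 1 if y* > 0,
# so that P(y = 1 | x) = Phi(x'beta).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta = np.array([0.5, -1.0])     # hypothetical coefficients
x = np.array([1.0, 0.3])         # intercept term plus one covariate value

n = 200_000
eps = rng.standard_normal(n)
y_star = x @ beta + eps          # latent continuous response
y = (y_star > 0).astype(int)     # observed binary outcome

print(y.mean())                  # empirical P(y = 1 | x)
print(norm.cdf(x @ beta))        # theoretical Phi(x'beta), about 0.579 here
```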
Computation Methods
Numerical Algorithms
The probit function, defined as the inverse of the standard normal cumulative distribution function \Phi^{-1}(p), has no closed-form expression and relies on numerical inversion techniques for evaluation.[14] These methods typically involve solving the root-finding problem \Phi(z) - p = 0 iteratively, starting from an initial guess for z based on the probability p. Common approaches include the Newton-Raphson method, which updates the estimate as z_{n+1} = z_n - \frac{\Phi(z_n) - p}{\phi(z_n)}, where \phi is the standard normal density, and Halley's method, a higher-order variant that incorporates the second derivative to achieve cubic convergence: writing f(z) = \Phi(z) - p, with f'(z) = \phi(z) and f''(z) = -z\,\phi(z), the update is z_{n+1} = z_n - \frac{2 f(z_n)\, f'(z_n)}{2 f'(z_n)^2 - f(z_n)\, f''(z_n)}. Both methods converge rapidly near p = 0.5 but require careful initial approximations for tail probabilities to avoid slow convergence or overflow.[15]

A widely adopted high-precision algorithm is AS 241 by Wichura (1988), which computes the inverse for p \in (0,1) using a minimax rational approximation in the central region combined with asymptotic expansions and continued fractions in the tails, achieving an absolute error bound of less than 10^{-15} across the range.[14] For efficient approximations, rational function series derived from Chebyshev polynomials are employed; since \Phi^{-1}(p) = \sqrt{2} \operatorname{erf}^{-1}(2p - 1), these provide relative errors below 10^{-7} in double precision for most practical p. Continued fractions offer another approximation strategy, particularly effective near p = 0.5, by expanding the inverse as a series that converges uniformly in the moderate tails.

In modern software, these algorithms are implemented for seamless computation. The R function qnorm(p) in the base stats package uses Wichura's AS 241 algorithm, returning \Phi^{-1}(p) with machine-precision accuracy and handling edge cases by yielding -\infty for p = 0 and +\infty for p = 1.[16] Similarly, Python's SciPy library provides scipy.stats.norm.ppf(p) as the percent point function, relying on optimized C implementations of rational approximations and iterative refinement for the inverse CDF, with limits to \pm\infty at the boundaries.

Accuracy is constrained by floating-point precision (typically 15-16 decimal digits in double precision), beyond which underflow or overflow issues arise in the tails. Historically, prior to electronic computers, probit values were derived from manually computed tables, such as the extensive working probits and weights tabulated by Finney and Stevens (1948) using mechanical calculators for bioassay applications.[17][18]
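A bare-bones version of the Newton-Raphson inversion described above can be written in a few lines. This is only a sketch (assuming SciPy for \Phi and \phi); production routines such as R's qnorm rely on Wichura's rational approximations rather than iteration.

```python
# Sketch of Newton-Raphson inversion of the standard normal CDF (assumes SciPy).
# Not a production routine: libraries use closed-form rational approximations
# such as AS 241 instead of iterating.
from scipy.stats import norm

def probit_newton(p, z0=0.0, tol=1e-12, max_iter=50):
    """Solve Phi(z) - p = 0 via the update z <- z - (Phi(z) - p) / phi(z)."""
    z = z0
    for _ in range(max_iter):
        step = (norm.cdf(z) - p) / norm.pdf(z)
        z -= step
        if abs(step) < tol:
            break
    return z

print(probit_newton(0.975), norm.ppf(0.975))   # both approximately 1.959964
```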
Differential Equation Formulation
The probit function w(p), defined as the inverse of the standard normal cumulative distribution function \Phi, satisfies the first-order nonlinear ordinary differential equation (ODE) \frac{dw}{dp} = \frac{1}{\phi(w)}, where \phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) is the probability density function of the standard normal distribution.[19] This ODE follows from the defining relation p = \Phi(w(p)) upon differentiation with respect to p, since \frac{dp}{dw} = \phi(w). The initial condition is w(0.5) = 0, reflecting the symmetry of the normal distribution where \Phi(0) = 0.5.[19]

A power series solution to this ODE can be obtained via Taylor expansion centered at p = 0.5: w(p) = \sum_{k=0}^{\infty} c_k (p - 0.5)^k, with c_0 = 0 and subsequent coefficients c_k determined recursively by substituting the series into the ODE and matching powers of (p - 0.5). The recursion leverages the structure of \phi(w) to compute higher-order terms efficiently, enabling high-order accuracy; for example, up to 20 terms yield precision on the order of 10^{-20}.[19]

This ODE-based power series approach circumvents the need for root-finding iterations typical in numerical inversion of \Phi, making it particularly useful for symbolic computations or repeated evaluations where p varies. If direct numerical integration of the ODE is required, standard solvers can be applied, though the series often provides superior efficiency for the probit due to the expansion point at the median. The formulation of such ODEs and their power series solutions for quantile functions, including the probit, was developed in the context of quantile mechanics by Steinbrecher and Shaw.[19]
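For comparison with the series approach, the ODE can also be integrated directly with a standard solver. The sketch below assumes SciPy's solve_ivp and simply checks the integrated value against norm.ppf; as noted above, the power series is usually the more efficient route.

```python
# Hedged sketch (assumes SciPy): integrate dw/dp = 1/phi(w) with w(0.5) = 0
# and compare the endpoint with the library quantile function.
from scipy.integrate import solve_ivp
from scipy.stats import norm

def rhs(p, w):
    # dw/dp = 1 / phi(w), phi being the standard normal density
    return 1.0 / norm.pdf(w)

sol = solve_ivp(rhs, t_span=(0.5, 0.975), y0=[0.0], rtol=1e-10, atol=1e-12)
print(sol.y[0, -1], norm.ppf(0.975))   # both approximately 1.959964
```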
Applications
Probit Regression Models
Probit regression models are used to analyze binary outcome variables, where the probability that the outcome equals 1, given covariates, is modeled using the cumulative distribution function of the standard normal distribution.[20] The model assumes an underlying latent variable Y^* = \mathbf{X}\boldsymbol{\beta} + \varepsilon, where \varepsilon \sim N(0,1), and the observed binary variable Y = 1 if Y^* > 0 and Y = 0 otherwise, leading to P(Y=1|\mathbf{X}) = \Phi(\mathbf{X}\boldsymbol{\beta}), with \Phi denoting the standard normal CDF.[20] Equivalently, the probit link function is g(p) = \Phi^{-1}(p) = \mathbf{X}\boldsymbol{\beta}, transforming the probability p to the linear predictor.[20]

Estimation of the probit model parameters \boldsymbol{\beta} is typically performed via maximum likelihood, maximizing the log-likelihood \ell(\boldsymbol{\beta}) = \sum_{i=1}^n \left[ y_i \log \Phi(\mathbf{X}_i \boldsymbol{\beta}) + (1 - y_i) \log (1 - \Phi(\mathbf{X}_i \boldsymbol{\beta})) \right], or equivalently the likelihood L = \prod_{i=1}^n \left[ \Phi(\mathbf{X}_i \boldsymbol{\beta}) \right]^{y_i} \left[ 1 - \Phi(\mathbf{X}_i \boldsymbol{\beta}) \right]^{1-y_i}.[20] Numerical optimization methods such as Newton-Raphson or BFGS are employed because no closed-form solution exists, with the score function and Hessian facilitating iterative convergence.[20]

The coefficients \boldsymbol{\beta} represent changes on the latent variable scale, so interpretation focuses on marginal effects, given by \frac{\partial P(Y=1|\mathbf{X})}{\partial X_j} = \phi(\mathbf{X}\boldsymbol{\beta}) \beta_j, where \phi is the standard normal PDF; these effects vary across observations, unlike the constant marginal effects in linear probability models.[20] The model's normalization of the error variance to 1 introduces scale identification issues, as \boldsymbol{\beta} estimates are identified only up to this fixed variance, precluding direct comparisons of effect magnitudes across models without rescaling.[20]

Probit models offer advantages over ordinary least squares for binary outcomes by avoiding predicted probabilities outside [0,1] and the heteroscedasticity inherent in linear approximations of nonlinear relationships.[20] Their adoption in econometrics surged after the 1950s, building on early applications like probit analysis in bioassay, with influential work extending to economic choice modeling. In modern applications, probit regression is widely used in labor economics to model participation decisions, such as female labor force entry, where covariates like education and wages predict binary employment status.[21] Implementation is supported in software such as Stata's probit command for maximum likelihood fitting and R's glm function with family=binomial(link="probit") for generalized linear modeling.[22]
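The estimation and marginal-effect formulas above can be exercised directly. The following sketch fits a probit by maximizing the stated log-likelihood on simulated data (the data, variable names, and starting values are hypothetical, and NumPy/SciPy are assumed); in practice one would use the Stata or R commands mentioned above or a dedicated package.

```python
# Minimal probit maximum-likelihood sketch on simulated data (assumes
# NumPy/SciPy). Maximizes the log-likelihood given above and reports
# average marginal effects phi(X beta) * beta_j.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + covariate
beta_true = np.array([0.2, 0.8])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

def neg_loglik(beta):
    # -log L = -sum[ y log Phi(X beta) + (1 - y) log(1 - Phi(X beta)) ]
    eta = X @ beta
    return -np.sum(y * norm.logcdf(eta) + (1.0 - y) * norm.logcdf(-eta))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
beta_hat = res.x
ame = np.mean(norm.pdf(X @ beta_hat)) * beta_hat  # average marginal effects
print(beta_hat, ame)
```

Here 1 - \Phi(\eta) is evaluated as \Phi(-\eta) via norm.logcdf(-eta) for numerical stability; the simulated dataset stands in for the labor-force examples cited above.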