Segmented regression

Segmented regression, also known as piecewise regression or broken-stick regression, is a statistical method that models the relationship between a response variable and one or more explanatory variables by fitting separate linear (or nonlinear) functions to distinct segments of the data, divided by one or more breakpoints where the slope or intercept changes. This approach captures structural changes or thresholds in the data, such as shifts in trends due to interventions or natural discontinuities, making it particularly useful for datasets in which the relationship is not uniform across the range of the explanatory variable.

The technique originated in the late 1950s with Richard E. Quandt's work on estimating switching regressions, which laid the foundation for modeling piecewise linear relationships in economic data. Over time, segmented regression has evolved to include iterative estimation methods, such as the one proposed by Muggeo in 2003, which simultaneously estimate breakpoint locations and segment parameters using least squares or maximum likelihood. Key features include the ability to impose continuity constraints at breakpoints (so that adjacent segments join without a jump) or to allow discontinuities, as well as options for estimating confidence intervals around breakpoints via bootstrapping and for testing for a breakpoint with procedures such as the Davies test.

In practice, segmented regression is widely applied in health services research for interrupted time series analyses (evaluating policy interventions such as smoking regulations), in ecology for detecting thresholds in species responses to environmental gradients, and in materials science for modeling phase transitions in material properties. Model selection often relies on criteria like the Bayesian information criterion (BIC) to compare fits with varying numbers of segments, favoring parsimony and guarding against overfitting. Despite its flexibility, challenges include sensitivity to initial parameter guesses and the risk of multiple local optima, which modern implementations address through restarting algorithms.

Definition and Fundamentals

Overview and Terminology

Segmented regression is a statistical modeling technique in which the relationship between a dependent variable and one or more independent variables is represented by piecewise functions that are joined at specific transition points known as breakpoints or change points. These models allow the functional form of the relationship to differ across segments of the data, capturing abrupt changes in the pattern that a single functional form might not adequately describe. Alternative terms for segmented regression include piecewise regression, broken-stick regression, broken-line regression, and threshold regression, reflecting variations in emphasis on the discontinuous or regime-shifting nature of the model.

As a prerequisite, segmented regression builds upon standard linear regression, where the relationship is assumed to be a single straight line across all data points, and extends it by permitting multiple linear (or sometimes nonlinear) segments to better fit heterogeneous data structures. This approach is particularly appropriate where the effect of the independent variable on the dependent variable shifts abruptly, such as in dose-response analyses in toxicology and epidemiology, where risk may plateau or accelerate beyond a certain threshold, or in evaluating policy interventions through interrupted time series studies, where pre- and post-intervention trends differ markedly.

Key components of segmented regression include the segments themselves (distinct functional pieces, often linear), breakpoints (the points where segments connect and slopes may change), segment-specific slopes (rates of change within each piece), and intercepts (starting values for each segment).

Compared to polynomial regression, which uses a single higher-degree polynomial to approximate nonlinearity across the entire dataset, segmented regression can reduce the risk of overfitting by using simpler linear pieces within defined intervals. It also provides more interpretable thresholds: breakpoints correspond directly to meaningful changes in the underlying process rather than to abstract polynomial coefficients.

Historical Context

Segmented regression, also known as piecewise linear regression, originated in the mid-20th century as a statistical technique to address non-linear relationships by dividing the domain of the independent variable into segments with distinct linear models, particularly for capturing threshold or breakpoint effects. Early developments focused on estimation challenges when breakpoints were unknown, with Richard E. Quandt's 1958 work on estimating parameters in switching regressions providing an initial foundation in econometric applications. Gregory C. Chow's 1960 test for equality of coefficients between two linear regimes then formalized hypothesis testing for potential breaks in econometric models. Building on this recognition of structural change in regression, Derek J. Hudson's 1966 paper provided a seminal algorithm for fitting segmented curves by minimizing residuals across joined submodels.

In econometrics, the technique evolved alongside time series analysis to model regime shifts in economic data, such as policy-induced changes in growth trends. Subsequent decades marked a transition from manual graphical identification to computational methods, with advances in optimization algorithms facilitating broader adoption in statistical software. A key milestone came in 2003 with Vito M. R. Muggeo's proposal of an efficient iterative procedure for estimating unknown breakpoints in generalized linear models, which directly inspired the development of the influential R package 'segmented' and revitalized practical use across disciplines.

Post-2000, segmented regression gained prominence in health services research through its integration with interrupted time series designs for evaluating interventions, as exemplified by Wagner et al.'s 2002 framework for analyzing medication use trends before and after policy changes. Since 2010, this application has expanded significantly, supporting rigorous assessments of treatment thresholds and intervention impacts, while the method's spread has continued from biological growth modeling to economic structural analyses and beyond.

Model Formulation

Basic Linear Model with One Breakpoint

The basic linear model with one breakpoint, also known as a two-segment piecewise linear regression, models the relationship between a response variable Y and a predictor X as two connected linear segments separated by a threshold value \psi, known as the breakpoint. This formulation assumes continuity at the breakpoint, ensuring no abrupt jump in the response function, and allows the slope to differ before and after \psi. The model can be expressed in a compact form using the positive part function:
Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i - \psi)_+ + \epsilon_i,
where (X_i - \psi)_+ = \max(X_i - \psi, 0), \beta_0 is the intercept, \beta_1 is the slope for X \leq \psi, \beta_2 is the change in slope after the breakpoint, \psi is the unknown breakpoint, and \epsilon_i are the random errors. Equivalently, the model specifies:
  • For X_i \leq \psi: E(Y_i) = \beta_0 + \beta_1 X_i
  • For X_i > \psi: E(Y_i) = \beta_0 + \beta_1 X_i + \beta_2 (X_i - \psi) = \beta_0 + (\beta_1 + \beta_2) X_i - \beta_2 \psi
This parameterization enforces continuity at \psi, as both segments yield the same value, \beta_0 + \beta_1 \psi, there. For discontinuous models, an additional term \alpha \cdot I(X_i > \psi) is included to allow a jump in the intercept at the breakpoint, where \alpha represents the size of the discontinuity and I(\cdot) is the indicator function.

The parameters \beta_1 and \beta_1 + \beta_2 represent the pre- and post-breakpoint slopes, respectively, capturing a shift in the relationship between Y and X. A non-zero \beta_2 indicates a change in the rate of increase or decrease across the breakpoint, while \psi identifies the value of X at which this change occurs, such as a transition in an economic or biological process. For instance, if \beta_1 > 0 and \beta_2 > 0, the positive association between Y and X strengthens after \psi; conversely, a negative \beta_2 indicates a weakening of that association, or a reversal when \beta_1 + \beta_2 < 0. The intercept \beta_0 sets the baseline level at X = 0, provided that X = 0 falls within the pre-breakpoint regime.

Key assumptions underlying this model mirror those of ordinary linear regression within each segment: linearity of the relationship between Y and X holds separately for X \leq \psi and X > \psi, the errors \epsilon_i are independent and identically distributed as N(0, \sigma^2) (homoscedasticity and normality), and the breakpoint \psi lies within the range of observed X values. These assumptions ensure unbiased and efficient estimation via ordinary least squares when \psi is known. When \psi is unknown, the model becomes nonlinear in \psi and requires specialized fitting approaches: the structural change at \psi turns estimation into a nonlinear least squares problem, solved by minimizing the residual sum of squares, while the fitted model remains interpretable as a piecewise linear function.

Graphically, the fitted model appears as two straight lines joined at the point (\psi, \beta_0 + \beta_1 \psi), with the pre-breakpoint line extending leftward and the post-breakpoint line rightward, visually highlighting the regime shift for diagnostic purposes.
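
The compact form above can be evaluated directly. The following is a minimal Python sketch, with made-up parameter values and a hypothetical helper name (`segmented_mean`), that computes the mean response and checks numerically that the two segment formulas agree at the breakpoint; the optional `alpha` argument corresponds to the jump term of the discontinuous variant.

```python
def segmented_mean(x, beta0, beta1, beta2, psi, alpha=0.0):
    """Mean response of the one-breakpoint model at x.

    alpha is the optional jump added for x > psi; alpha = 0 gives the
    continuous model.
    """
    hinge = max(x - psi, 0.0)          # positive part (x - psi)_+
    jump = alpha if x > psi else 0.0   # alpha * I(x > psi)
    return beta0 + beta1 * x + beta2 * hinge + jump


# Illustrative (made-up) parameter values.
beta0, beta1, beta2, psi = 1.0, 0.5, 1.5, 4.0

left_slope = beta1            # slope for x <= psi
right_slope = beta1 + beta2   # slope for x > psi

# Value from the compact form and from the expanded right-hand segment formula,
# both evaluated at the breakpoint.
value_at_psi = segmented_mean(psi, beta0, beta1, beta2, psi)
right_formula_at_psi = beta0 + (beta1 + beta2) * psi - beta2 * psi

print(left_slope, right_slope, value_at_psi, right_formula_at_psi)
```

Writing the model with the hinge term rather than with two separate lines means continuity never has to be imposed as an extra constraint; it holds for any parameter values.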

Generalizations and Extensions

Segmented regression extends beyond the basic single-breakpoint model to accommodate multiple breakpoints, allowing for k segments defined by k-1 ordered breakpoints \psi_1 < \psi_2 < \dots < \psi_{k-1}. In this formulation, the response is modeled as a continuous piecewise linear function where each segment has its own intercept and slope, often expressed using the positive part operator to chain the segments:
Y_i = \beta_0 + \beta_1 x_i + \sum_{j=1}^{k-1} \gamma_j (x_i - \psi_j)_+ + \epsilon_i,
where (x_i - \psi_j)_+ = \max(0, x_i - \psi_j) and \gamma_j represents the change in slope at the j-th breakpoint, so that the slope in segment m is \beta_1 + \gamma_1 + \dots + \gamma_{m-1}. This extension is particularly useful for capturing multiple structural changes in the relationship between variables, such as in economic time series with several policy shifts.

Further generalizations incorporate non-linear functions within individual segments, transforming the model into Y = f(x) for x \leq \psi and Y = g(x) for x > \psi, where f and g may be polynomials, exponentials, or other non-linear forms. For multiple breakpoints, this extends to a sequence of such functions across segments, enabling flexible modeling of complex relationships like growth curves in biology that exhibit distinct phases.

Similarly, segmented regression integrates with generalized linear models (GLMs) for non-normal responses, such as binary or count data, by applying the piecewise structure to the linear predictor on the scale of the link function. In segmented logistic regression, for instance, the log-odds are piecewise linear in the covariate, so the breakpoint marks a change in how the probability responds. Segmented Poisson regression follows analogously for count data, with the log rate changing slope at transition points.

To mitigate abrupt changes, smooth transitions can replace sharp breakpoints, using splines or smoothed indicator functions (such as logistic or kernel-weighted transitions) that gradually blend adjacent segments. Constraints enhance model interpretability and stability, including enforcing continuity by equating function values at breakpoints (matching the endpoints of adjacent segments) or imposing monotonicity through non-negative slope restrictions. Fixed breakpoints can also be specified a priori based on domain knowledge, reducing the estimation burden. For discontinuous models with multiple breakpoints, additional jump terms can be included at each \psi_j using indicator functions.

These extensions introduce challenges, particularly reduced identifiability as the number of segments grows, since sparse data within segments can lead to unstable estimates or non-convergence in optimization. Overlapping confidence intervals for nearby breakpoints exacerbate this, requiring careful initialization and validation to ensure reliable estimates.
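
To make the link between the hinge parameterization and the "own intercept and slope" of each segment concrete, here is a small Python sketch that converts (\beta_0, \beta_1, \gamma_j, \psi_j) into one explicit line per segment. The function name `segment_lines` and the numeric values are illustrative assumptions, not part of any package.

```python
def segment_lines(beta0, beta1, gammas, psis):
    """Return [(intercept, slope), ...] for each segment of the continuous model.

    gammas[j] is the slope change at breakpoint psis[j].
    """
    if len(gammas) != len(psis):
        raise ValueError("need one slope change per breakpoint")
    if any(a >= b for a, b in zip(psis, psis[1:])):
        raise ValueError("breakpoints must be strictly increasing")

    intercept, slope = beta0, beta1
    lines = [(intercept, slope)]
    for gamma, psi in zip(gammas, psis):
        # Beyond psi the term gamma * (x - psi) is active:
        # the slope grows by gamma and the intercept shifts by -gamma * psi.
        slope += gamma
        intercept -= gamma * psi
        lines.append((intercept, slope))
    return lines


# Three segments with breakpoints at 3 and 7 (made-up values).
lines = segment_lines(2.0, 1.0, gammas=[-0.5, 2.0], psis=[3.0, 7.0])
for intercept, slope in lines:
    print(intercept, slope)
```

Adjacent lines returned by this function always meet at the shared breakpoint, which is the continuity property built into the hinge form.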

Estimation Techniques

Breakpoint Identification Methods

Breakpoint identification in segmented regression involves algorithms designed to locate the transition points, or breakpoints, where the relationship between the predictor and response variables changes. These methods assume a parametric structure, such as a linear model with one or more segments, and focus on estimating the breakpoint \psi by minimizing a criterion like the residual sum of squares (RSS) or through probabilistic inference. Common approaches include grid search, iterative optimization, Bayesian inference, and non-parametric techniques, each balancing computational efficiency, accuracy, and assumptions about the data. Selection among candidate models often incorporates information criteria to avoid overfitting.

Grid search evaluates a finite set of candidate values for \psi across the range of the predictor variable, fitting the segmented model for each and selecting the one that minimizes the RSS. This exhaustive approach is computationally straightforward for a single breakpoint and, because it searches the parameter space directly without relying on initial guesses, finds the least squares optimum up to the resolution of the grid. For two-segment linear models, maximum likelihood estimation of the breakpoint location can be carried out with a grid-search-type algorithm. However, grid search becomes inefficient for multiple breakpoints or large datasets, as the number of evaluations grows rapidly with the grid resolution and the number of breakpoints.

Iterative optimization methods start with an initial estimate of \psi, often obtained from a preliminary test like the Davies test, and refine it through numerical procedures such as Newton-Raphson or gradient descent to minimize the objective function. These algorithms treat the breakpoint as a parameter in a nonlinear regression framework, iteratively updating \psi and the segment-specific coefficients until convergence. A widely adopted iterative procedure linearizes the segmented relationship around an initial breakpoint guess, enabling ordinary least squares updates in each step, and has been shown to converge reliably when starting values are reasonable. The Davies test, which assesses the presence of a breakpoint by evaluating a test statistic over a set of possible locations, serves as an effective initializer by identifying regions of potential change without assuming a specific \psi. Despite their efficiency, iterative methods can converge to local minima if the initial guess is poor, particularly in models with multiple breakpoints.

Bayesian approaches model the breakpoint as a random variable with a prior distribution, typically uniform over the data range, and estimate its posterior via Markov chain Monte Carlo (MCMC) sampling, incorporating uncertainty in both \psi and the regression parameters. This framework allows for flexible priors on the number and location of breakpoints, enabling posterior inference on their positions through Gibbs sampling or reversible jump MCMC. Hierarchical Bayesian models have been particularly useful for changepoint problems in time series, treating segments as a mixture and updating breakpoints conditionally on the data likelihood. Bayesian methods excel in providing credible intervals for \psi and handling multiple changepoints probabilistically, though they require substantial computational resources for MCMC convergence.

Non-parametric methods, such as kernel-based or specialized changepoint algorithms, detect breakpoints without assuming a specific form for the segments, making them suitable for initial screening in segmented regression setups. The Pruned Exact Linear Time (PELT) algorithm, for instance, identifies multiple changepoints by minimizing a penalized cost function over possible partitions, pruning candidate changepoints that can no longer be optimal so that, under suitable conditions, the computational cost grows linearly with the sample size. PELT adapts to regression contexts by using segment-specific likelihoods or residual sums of squares as costs, and it is reported to be more accurate than binary segmentation at comparable computational cost for large samples. These methods are valuable for exploratory analysis, where the underlying segment structure is unknown, but may require subsequent parametric fitting for precise inference in segmented models.

Criteria for selecting the optimal breakpoint or number of breakpoints often involve information-theoretic penalties like the Akaike information criterion (AIC) or Bayesian information criterion (BIC), which balance model fit against complexity by adding terms proportional to the number of parameters, including breakpoints. In segmented regression, BIC is frequently preferred for its stronger penalty on extra segments, promoting parsimony, and has been implemented to select breakpoint counts by evaluating models up to a maximum K. These criteria help mitigate overfitting, especially when comparing grid or iterative results across multiple candidate \psi values.

Limitations of breakpoint identification include bias toward the center of the data in small samples, where estimates of \psi can be imprecise or inconsistent due to insufficient observations per segment. Confidence interval estimates for breakpoints are reliable only in large samples or when the segmented relationship is clearly defined, as small-sample variability leads to wide intervals. Additionally, the objective landscape often features multiple local minima, complicating convergence in iterative methods and necessitating robust starting strategies.

Recent advances as of 2025 include robust estimation techniques addressing heteroscedasticity, such as the sliding setsquare method, which optimizes breakpoints with the help of parabola fitting and is combined with a sliding approach for modeling variance structures, reducing the influence of outliers. Additionally, segmented linear regression trees (SLRT) improve change point detection through recursive partitioning with linear leaf models and conditional Kendall's τ for split selection, offering better predictive accuracy than traditional grid search in simulations. Quantile regression extensions enhance segmented models by estimating conditional quantiles per segment, improving robustness to non-normal errors and heteroscedasticity, as implemented in packages like segmented and quantreg.
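
The grid search and BIC comparison described in this section can be sketched in a few lines of standard-library Python. This is an illustrative toy, not the procedure of any particular package: the data are synthetic, the helper names (`solve`, `ols`, `grid_search`, `gaussian_bic`) are hypothetical, and the parameter counts used for BIC (3 for the straight line, 5 for the one-breakpoint model, counting the error variance and the breakpoint itself) are an assumed convention.

```python
import math


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ols(rows, y):
    """Least squares through the normal equations; returns (coef, rss)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, rss


def grid_search(x, y, candidates):
    """Return (psi, coef, rss) for the candidate breakpoint with the smallest RSS."""
    best = None
    for psi in candidates:
        rows = [[1.0, xi, max(xi - psi, 0.0)] for xi in x]
        coef, rss = ols(rows, y)
        if best is None or rss < best[2]:
            best = (psi, coef, rss)
    return best


def gaussian_bic(rss, n, n_params):
    """BIC for a Gaussian model, up to an additive constant."""
    return n * math.log(rss / n) + n_params * math.log(n)


# Synthetic data: true breakpoint at 10, slope 0.5 before and 2.0 after,
# plus a small deterministic wiggle standing in for noise.
x = [0.5 * i for i in range(41)]
y = [1.0 + 0.5 * xi + 1.5 * max(xi - 10.0, 0.0) + 0.05 * math.sin(3.0 * i)
     for i, xi in enumerate(x)]

candidates = [2.0 + 0.25 * j for j in range(65)]  # 2.0, 2.25, ..., 18.0
psi_hat, coef_hat, rss_seg = grid_search(x, y, candidates)
_, rss_lin = ols([[1.0, xi] for xi in x], y)

n = len(x)
bic_lin = gaussian_bic(rss_lin, n, 3)  # intercept, slope, error variance
bic_seg = gaussian_bic(rss_seg, n, 5)  # adds slope change and breakpoint
print(psi_hat, bic_lin, bic_seg)
```

Keeping the candidate grid away from the extremes of the data avoids fits in which one segment contains too few points to determine its slope.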

Parameter Fitting Procedures

In segmented regression, parameter fitting typically involves minimizing the sum of squared residuals across all segments once potential breakpoints have been identified. The objective function is formulated as the least squares criterion:
RSS(\boldsymbol{\beta}, \boldsymbol{\psi}) = \sum_{i=1}^n \left( y_i - \sum_{k=1}^K I_{k,i} (\beta_{0k} + \beta_{1k} x_i) \right)^2,
where I_{k,i} is an indicator that equals 1 when \psi_{k-1} < x_i \leq \psi_k and 0 otherwise, \boldsymbol{\beta} collects the intercepts and slopes of the K segments, and \boldsymbol{\psi} denotes the breakpoints \psi_1, \dots, \psi_{K-1}, with the conventions \psi_0 = -\infty and \psi_K = \infty. This non-linear optimization problem is solved using methods like the Gauss-Newton algorithm, which iteratively linearizes the model and updates parameters by solving linear least squares subproblems until convergence.

A common approach is conditional least squares fitting, where breakpoints are fixed at provisional values (e.g., from grid search or Davies' test) and a model that is linear in the remaining parameters is fit. The process then iterates by updating the breakpoint locations from the fitted coefficients. In a single-breakpoint model with provisional value \tilde{\psi}, the working model adds two constructed covariates, U = (x - \tilde{\psi})_+ and V = -I(x > \tilde{\psi}), with coefficients \hat{\beta}_2 (the slope change) and \hat{\gamma} (an intercept adjustment that measures the gap between the two fitted lines at \tilde{\psi}). The new breakpoint is \hat{\psi} = \tilde{\psi} + \hat{\gamma}/\hat{\beta}_2, and the procedure repeats until \hat{\gamma} approaches zero, indicating stability. This method, implemented in tools like the R package segmented, accommodates multiple breakpoints by adding one pair of piecewise terms per breakpoint and provides an approximate covariance matrix for the parameters at convergence.

Constraints such as continuity at breakpoints (ensuring segments join without a jump, e.g., \beta_{01} + \beta_{11} \psi_1 = \beta_{02} + \beta_{12} \psi_1) or non-negativity of slopes can be enforced by reparameterizing the model or using penalized objectives, preventing unrealistic discontinuities or negative trends in applications like growth modeling.

For generalized linear models (GLMs) extended to segmented structures, maximum likelihood estimation replaces ordinary least squares, adapting the iteratively reweighted least squares (IRLS) algorithm to the piecewise linear predictor and the variance function of the response distribution. The log-likelihood is maximized iteratively: provisional breakpoints condition the working model, weights are updated, and parameters are refined until the score equations stabilize. This approach handles non-normal responses, such as counts in epidemiological data, while maintaining segment continuity through the parameterization of the linear predictor.

Uncertainty in fitted parameters, including slopes and intercepts, is often quantified using the bootstrap, where residuals are resampled within segments (or observation pairs are resampled under dependence) to generate replicate datasets, refit the model, and compute empirical standard errors or confidence intervals from the empirical distribution of the estimates. This non-parametric method is particularly useful when asymptotic approximations fail due to small sample sizes per segment. However, the objective function's non-convexity, which arises from the piecewise nature of the model, can lead to local minima. This is addressed by initializing the optimization from multiple starting points (e.g., a grid of provisional breakpoints) and selecting the fit with the smallest RSS, with bootstrap restarting enhancing robustness in flat regions of the likelihood.
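
The breakpoint update \hat{\psi} = \tilde{\psi} + \hat{\gamma}/\hat{\beta}_2 can be illustrated with a short standard-library Python sketch. It assumes the working covariates are (x - \tilde{\psi})_+ and -I(x > \tilde{\psi}), uses noise-free toy data with a true breakpoint at 10.5, and relies on hypothetical helper names (`solve`, `ols`, `muggeo_fit`); it is a sketch of the idea rather than the code of the segmented package.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ols(rows, y):
    """Least squares coefficients through the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def muggeo_fit(x, y, psi_start, max_iter=50, tol=1e-8):
    """Iterate the linearized breakpoint update for a single breakpoint."""
    lo, hi = min(x), max(x)
    psi = psi_start
    for _ in range(max_iter):
        # Working model: intercept, x, U = (x - psi)_+, V = -I(x > psi).
        rows = [[1.0, xi, max(xi - psi, 0.0), -1.0 if xi > psi else 0.0]
                for xi in x]
        b0, b1, b2, gamma = ols(rows, y)
        new_psi = psi + gamma / b2
        if not lo < new_psi < hi:
            raise RuntimeError("breakpoint left the data range; try another start")
        if abs(new_psi - psi) < tol:
            return new_psi, (b0, b1, b2)
        psi = new_psi
    raise RuntimeError("no convergence within max_iter iterations")


# Noise-free toy data: slope 0.5 up to the breakpoint 10.5, slope 2.0 after it.
x = [float(i) for i in range(21)]
y = [1.0 + 0.5 * xi + 1.5 * max(xi - 10.5, 0.0) for xi in x]

psi_hat, (b0_hat, b1_hat, b2_hat) = muggeo_fit(x, y, psi_start=8.0)
print(psi_hat, b0_hat, b1_hat, b2_hat)
```

With noisy data the same loop applies, but a poor starting value can send the update outside the data range or to a local optimum, which is why the sketch raises an error instead of continuing silently.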

Statistical Inference

Hypothesis Testing for Segments

In segmented regression, hypothesis testing is essential to assess whether the data support the presence of breakpoints and associated slope changes, distinguishing the model from a simpler single-regression alternative. These tests evaluate the null hypothesis of no segmentation against alternatives where structural changes occur at one or more points, often accounting for the non-standard distribution arising from unknown breakpoint locations. Common approaches include likelihood-based, F-based, and supremum tests, each suited to different assumptions about model parameters and sample characteristics.

The likelihood ratio test (LRT) compares the fitted segmented model to a single linear regression by examining the difference in log-likelihoods. Under the null hypothesis of no breakpoint, the test statistic is defined as -2 (\log L_0 - \log L_1), where L_0 is the likelihood of the null model (single line) and L_1 is the likelihood of the alternative segmented model. When the breakpoint is fixed in advance, this statistic asymptotically follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the models. When the breakpoint is estimated, it is not identified under the null hypothesis, so the profiled likelihood must be handled with care and the null distribution may deviate from the chi-squared reference. This test is particularly useful for confirming overall model improvement after breakpoint estimation.

For testing a specific slope change after the breakpoint, the F-test assesses whether the additional parameter representing the change is zero. Consider a basic segmented model Y = \beta_0 + \beta_1 X + \beta_2 (X - \psi)_+ + \epsilon, where \psi is the breakpoint and (X - \psi)_+ = \max(0, X - \psi); the null hypothesis is H_0: \beta_2 = 0, implying no slope change after \psi. The F-statistic is computed as F = \frac{(RSS_0 - RSS_1)/1}{RSS_1/(n - p)}, where RSS_0 and RSS_1 are the residual sums of squares under the null and alternative, n is the sample size, and p is the number of parameters in the full model; under H_0, it follows an F-distribution with (1, n-p) degrees of freedom. The reference distribution is exact for a fixed \psi and only approximate when \psi has been estimated from the same data. The test is commonly applied in contexts like interrupted time series, where the breakpoint is known, to detect intervention effects on trends.

Davies' test addresses the challenge of detecting an unknown breakpoint by constructing a supremum-based statistic over possible locations. For a linear model with a potential change in slope, the test evaluates the Wald statistic for the slope difference, S(\psi_k) = \hat{\beta}_2(\psi_k) / \operatorname{se}(\hat{\beta}_2(\psi_k)), over a grid of candidate breakpoint locations \psi_k and takes M = \max_k S(\psi_k). The p-value is approximated using Davies' upper bound, \Phi(-M) + V \exp(-M^2/2) / \sqrt{8\pi}, where V = \sum_k |S(\psi_k) - S(\psi_{k-1})| is the total variation of the statistics across the grid and \Phi is the standard normal cdf; for a two-sided alternative, M is computed from |S(\psi_k)| and the bound is doubled. This provides a conservative upper bound under the null of no breakpoint. The approach accounts for the breakpoint being a nuisance parameter that is present only under the alternative and is recommended when the change point is not specified a priori, though it assumes normality of errors.

Permutation tests provide a nonparametric alternative for robust inference, especially in small samples where asymptotic approximations fail. These tests generate the null distribution by randomly permuting residuals or response values while preserving the design structure, then recomputing the test statistic (e.g., LRT or F-statistic) across permutations to obtain an empirical p-value. In segmented regression, permutations are adapted to respect the ordered covariate space, making them suitable for testing breakpoint existence or slope differences when normality is questionable or segments are unbalanced. This method enhances reliability in small-sample scenarios, such as those with fewer than 20 observations per segment.

The power of tests like the LRT and the F-test depends on the magnitude of the slope change relative to the error variance, the distance of the breakpoint from the edges of the data, and the sample sizes in each segment. Imbalanced segments generally reduce power, and simulation-based analyses (e.g., Monte Carlo methods) are recommended to assess performance under specific conditions. Davies' test, being conservative, often requires larger samples for comparable power to other tests.

When testing multiple potential breakpoints, adjustments for multiple testing are necessary to control the family-wise error rate, as unadjusted procedures inflate Type I errors. Common methods include the Bonferroni correction, which divides the significance level by the number of tests (e.g., \alpha / m for m breakpoints), or false discovery rate (FDR) procedures like Benjamini-Hochberg for less stringent control in exploratory settings. In changepoint contexts, these adjustments are integrated into sequential testing frameworks to select the number of segments without excessive conservatism, though they can reduce power for detecting true changes. Recent analyses also emphasize the importance of parametrization in segmented models; for instance, in interrupted time series, the coefficient on the slope change term may represent the difference from the baseline slope or the post-intervention slope directly, which affects inference.
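
As a numerical illustration of two of the statistics discussed in this section, the following standard-library Python sketch computes the F-statistic from two residual sums of squares and the Davies-type upper bound on the p-value from a sequence of Wald statistics. The function names are hypothetical, the inputs are made up, and the p-value bound is the textbook form \Phi(-M) + V \exp(-M^2/2)/\sqrt{8\pi}, doubled for a two-sided alternative; it is not claimed to reproduce any package's output exactly.

```python
import math


def f_statistic(rss0, rss1, n, p, q=1):
    """F statistic for q extra parameters; p counts parameters in the full model."""
    return ((rss0 - rss1) / q) / (rss1 / (n - p))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def davies_pvalue(stats, two_sided=True):
    """Upper bound on the p-value from Wald statistics on an ordered grid."""
    if two_sided:
        m = max(abs(s) for s in stats)
    else:
        m = max(stats)
    # Total variation of the statistic along the grid of candidate breakpoints.
    v = sum(abs(b - a) for a, b in zip(stats, stats[1:]))
    p = normal_cdf(-m) + v * math.exp(-0.5 * m * m) / math.sqrt(8.0 * math.pi)
    if two_sided:
        p *= 2.0
    return min(p, 1.0)


# Made-up inputs for illustration.
f_value = f_statistic(rss0=120.0, rss1=80.0, n=43, p=3)
p_one_sided = davies_pvalue([1.0, 3.0, 2.0], two_sided=False)
p_two_sided = davies_pvalue([1.0, 3.0, 2.0], two_sided=True)
print(f_value, p_one_sided, p_two_sided)
```

The second term of the bound grows with the total variation V, so a statistic that fluctuates strongly across the grid is penalized more heavily than one that varies smoothly.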

No-Effect Range Determination

In segmented regression, the no-effect range refers to the interval surrounding the estimated breakpoint \psi within which the slope difference between segments is statistically undetectable, indicating no significant change in the relationship between the predictor and response variables. This concept is particularly relevant in models where one segment assumes a zero slope, representing a region of no observable effect. Such ranges are often constructed using Fieller's theorem to derive confidence intervals for the breakpoint or through confidence bands that assess slope constancy.

The construction of no-effect ranges typically begins with estimating the confidence interval for the breakpoint \psi, followed by evaluating slope constancy within adjacent sub-ranges to delineate the boundaries where effects are negligible. In frequentist approaches, Fieller's theorem solves a quadratic equation derived from the ratio of correlated normal variables to obtain the interval limits, so that the range accounts for parameter covariances; the interval is bounded only when the denominator of the ratio differs significantly from zero. For instance, in threshold models, the lower segment slope is fixed at zero, and the no-effect range extends below \psi for as long as the confidence interval of the pre-breakpoint slope includes zero, confirmed via testing procedures like likelihood ratio tests on sub-range data. Confidence bands around the fitted segments further refine this by identifying overlap regions where predicted responses do not differ significantly.

Bayesian methods provide credible intervals for no-effect ranges by incorporating prior distributions and posterior sampling to quantify uncertainty in probabilistic terms. In piecewise linear models, Markov chain Monte Carlo (MCMC) simulations generate posterior distributions for \psi and the segment slopes; a 95% credible interval for \psi is then obtained from the 2.5th and 97.5th percentiles of the samples, with no-effect regions identified where the posterior credible interval for the slope includes zero. This approach allows for flexible priors on breakpoints and slopes, enhancing robustness in sparse data scenarios.

In toxicology, no-effect ranges from segmented regression are used to identify safe exposure levels below which no adverse effects occur, modeling dose-response relationships with a zero-slope pre-breakpoint segment. For example, threshold linear models estimate the no-observed-effect concentration (NOEC) or a comparable threshold as the breakpoint \psi, below which responses remain at baseline, aiding regulatory risk assessments.

An approximate no-effect range can be formulated as \psi \pm \delta, where \delta is derived from the variance of the \psi estimate, such as \delta = k \sqrt{\widehat{\text{Var}}(\hat{\psi})}, with k \approx 2 for approximate 95% coverage based on the asymptotic normality of the estimator. More precise bounds use Fieller's method. For a breakpoint estimated as a ratio t = a / b_L of two coefficient estimates, where b_L is the slope estimate in the denominator, let V_{11} and V_{22} be the estimated variances of a and b_L and V_{12} their covariance (the elements of the inverse cross-product matrix scaled by \hat{\sigma}^2), and define
g = \frac{t_{\alpha/2, n-3}^2 V_{22}}{b_L^2}.
When g < 1, the limits are
\frac{1}{1-g} \left[ t - \frac{g V_{12}}{V_{22}} \pm \frac{t_{\alpha/2, n-3}}{|b_L|} \sqrt{ V_{11} - 2 t V_{12} + t^2 V_{22} - g \left( V_{11} - \frac{V_{12}^2}{V_{22}} \right) } \right],
where t_{\alpha/2, n-3} is the critical value of the t distribution with n-3 degrees of freedom. These ranges facilitate policy-making by defining empirically supported "safe" zones, such as maximum permissible exposure limits in environmental regulations, where exceedance probabilities are minimized based on the interval's lower bound.
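
Fieller-type limits for a ratio of two estimates can be computed as in the sketch below. It assumes the standard statement of Fieller's theorem for a ratio a/b with estimated variances v11 and v22, covariance v12, and a user-supplied critical value; the function name and the numbers are hypothetical. The final lines check that each limit solves the defining quadratic (a - \rho b)^2 = t_{crit}^2 (v_{11} - 2\rho v_{12} + \rho^2 v_{22}).

```python
import math


def fieller_interval(a, b, v11, v12, v22, t_crit):
    """Fieller confidence limits for the ratio a / b.

    v11, v22 are the estimated variances of a and b; v12 is their covariance.
    """
    g = t_crit ** 2 * v22 / b ** 2
    if g >= 1.0:
        # The denominator is not significantly different from zero.
        raise ValueError("interval is unbounded because g >= 1")
    ratio = a / b
    inner = (v11 - 2.0 * ratio * v12 + ratio ** 2 * v22
             - g * (v11 - v12 ** 2 / v22))
    half_width = (t_crit / abs(b)) * math.sqrt(inner)
    centre = ratio - g * v12 / v22
    return (centre - half_width) / (1.0 - g), (centre + half_width) / (1.0 - g)


def quadratic_gap(rho, a, b, v11, v12, v22, t_crit):
    """Zero exactly when rho lies on the boundary of the Fieller set."""
    return (a - rho * b) ** 2 - t_crit ** 2 * (v11 - 2.0 * rho * v12 + rho ** 2 * v22)


# Made-up estimates: ratio 10 / 2 = 5 with modest uncertainty.
args = dict(a=10.0, b=2.0, v11=0.04, v12=0.005, v22=0.01, t_crit=2.0)
lower, upper = fieller_interval(**args)
print(lower, upper)
print(quadratic_gap(lower, **args), quadratic_gap(upper, **args))
```

Unlike the symmetric \psi \pm \delta approximation, the resulting interval is generally asymmetric around the point estimate, which reflects the skewed sampling distribution of a ratio.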

Applications and Examples

Interrupted Time Series Analysis

Interrupted time series (ITS) analysis applies segmented regression to evaluate the impact of interventions on outcomes measured over time, dividing the data into pre- and post-intervention segments where time serves as the explanatory variable. The intervention time acts as the breakpoint, allowing estimation of changes in both the level (immediate shift) and trend (slope change) of the outcome following the intervention. This framework is particularly valuable in public health and policy evaluation, where randomized controlled trials are often infeasible, providing a quasi-experimental approach to inferring causal effects by controlling for underlying temporal patterns.

In ITS models using segmented regression, the outcome Y_t at time t is typically specified as:
Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - T_0) D_t + \epsilon_t,
where T_0 is the intervention time, D_t is a step function indicator for the intervention (0 before the intervention and 1 afterward), \beta_0 is the pre-intervention intercept, \beta_1 captures the pre-intervention trend, \beta_2 represents the immediate level change at the intervention, and \beta_3 indicates the post-intervention trend change. If the interaction is instead coded as t \times D_t, \beta_3 is unchanged but \beta_2 becomes the gap between the two fitted lines extrapolated to t = 0 rather than the level change at T_0.

Time series data often exhibit autocorrelation in residuals, which can bias standard errors; this is addressed by incorporating autoregressive (AR) terms into the model or by applying robust standard errors, such as Newey-West estimators that account for both autocorrelation and heteroskedasticity. The advantages of this approach include its ability to control for secular trends and, when additional terms are included, seasonality, enhancing the validity of intervention effect estimates in observational settings.

Recent developments emphasize careful interpretation of coefficients in ITS segmented models, particularly distinguishing between two common parametrizations: one focusing on intercept differences (Bernal et al.) and another on immediate effects (Wagner et al.), with guidelines recommending selection based on the research question to avoid misinterpretation of the level change term. However, limitations persist, including the assumption that no concurrent events influence the outcome and sensitivity to unaddressed autocorrelation, which may require sensitivity analyses or alternative models like ARIMA for validation.
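
The coding of the ITS covariates is easy to get wrong, so a small Python sketch may help. It assumes one particular convention, labeled here as an assumption: the step indicator switches on at the intervention time t0, and the trend-change covariate counts time elapsed since t0. Under that coding the third coefficient is the level change at t0 and the fourth is the change in slope. The function names and coefficient values are hypothetical.

```python
def its_design(times, t0):
    """Design rows [1, time, step, time since intervention] for an ITS model."""
    rows = []
    for t in times:
        step = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), step, step * (t - t0)])
    return rows


def its_mean(row, beta):
    """Mean outcome for one design row and a coefficient vector."""
    return sum(b * v for b, v in zip(beta, row))


# 24 monthly observations with the intervention taking effect at month 13.
times = list(range(1, 25))
t0 = 13
rows = its_design(times, t0)

# Made-up coefficients: baseline 10, pre-trend +0.5 per month,
# level change -3 at the intervention, trend change -0.2 per month.
beta = [10.0, 0.5, -3.0, -0.2]

observed_at_t0 = its_mean(rows[12], beta)              # month 13, post-intervention
counterfactual_at_t0 = its_mean([1.0, 13.0, 0.0, 0.0], beta)
level_change = observed_at_t0 - counterfactual_at_t0
post_slope = its_mean(rows[13], beta) - its_mean(rows[12], beta)
print(level_change, post_slope)
```

If the last covariate were coded as time multiplied by the step indicator instead, the same fitted lines would be obtained, but the coefficient on the step would no longer equal the level change at t0.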

Environmental and Biological Examples

In environmental applications, segmented regression has been applied to model the relationship between river pollutant concentrations and discharge rates, revealing breakpoints where high-flow conditions alter pollutant mobilization and dilution dynamics. For instance, Bayesian segmented models of concentration-discharge relationships in one river basin identified breakpoints that typically occur at high discharge levels (e.g., above 10-20 m³/s, depending on the river), marking a shift from dilution-dominated low-flow regimes to mobilization-enhanced high-flow regimes that increase pollutant export. This approach highlights environmental tipping points, such as when discharge exceeds a critical threshold and pollutant loading rises more steeply, and it informs strategies to mitigate runoff. Early applications in the 1970s, including analyses of river data following 1976 dam constructions, demonstrated similar slope changes in flow-related parameters, underscoring the method's utility in detecting hydrological shifts.

In biological contexts, particularly toxicology and enzyme kinetics, segmented regression captures dose-response relationships with distinct breakpoints, such as thresholds for toxicity onset or substrate saturation. In toxicology, segmented models estimate no-effect thresholds, below which the response does not change with dose. In enzyme kinetics, the breakpoint plays a role loosely analogous to the Michaelis constant (K_m), the substrate concentration at half-maximal velocity: at concentrations well above K_m (for example, for an enzyme with K_m ≈ 5 mM) the response approaches a plateau as the enzyme saturates, shifting from roughly proportional to nearly constant activity.

To illustrate fitting, consider a hypothetical dataset of enzyme activity (Y) versus substrate concentration (X), comprising 20 observations with X ranging from 0 to 10 units. Segmented regression estimates the breakpoint ψ ≈ 5, yielding a pre-breakpoint slope of 0.2 (a gradual linear increase up to the breakpoint) and a post-breakpoint slope of 1.1 (an enhanced response above the breakpoint; a saturating system would instead show a post-breakpoint slope near zero), with the estimates chosen to minimize the residual sum of squares via iterative search algorithms. The fitted model splits the data into two lines that meet at ψ, providing a piecewise linear approximation to the nonlinear relationship; visually, the pre-ψ segment rises gradually through the lower X points, while the post-ψ segment steepens through the higher X points, and evenly distributed residuals support the adequacy of the fit.

Such interpretations treat biological thresholds as saturation points akin to K_m and guide toxicological risk assessments by quantifying safe exposure limits, while environmental breakpoints signal tipping points for pollution control. Real-world applications include water quality analyses using SegReg software, which has been used for breakpoint detection in river pollutant trends, such as total nitrogen concentrations over time in agricultural watersheds, confirming change-points that align with land-use impacts.
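
The hypothetical fit described above can be sketched with a simple grid search over candidate breakpoints. This is a minimal illustration using only the Python standard library, not the algorithm of any particular package: the helper names (ols, fit_broken_stick), the simulated data (intercept 1.0, slopes 0.2 and 1.1, breakpoint 5, noise level 0.05) and the candidate grid are all assumptions made for the example.

```python
import random


def ols(columns, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(columns[i], y))]
        for i in range(k)
    ]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))  # partial pivoting
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit_broken_stick(x, y, candidates):
    """Continuous two-segment fit: y = b0 + b1*x + b2*max(x - psi, 0), psi chosen by grid search."""
    best = None
    for psi in candidates:
        hinge = [max(xi - psi, 0.0) for xi in x]
        b0, b1, b2 = ols([[1.0] * len(x), x, hinge], y)
        rss = sum((yi - (b0 + b1 * xi + b2 * hi)) ** 2
                  for xi, yi, hi in zip(x, y, hinge))
        if best is None or rss < best["rss"]:
            best = {"psi": psi, "intercept": b0, "slope_before": b1,
                    "slope_after": b1 + b2, "rss": rss}
    return best


# Simulated stand-in for the hypothetical dataset: 20 observations, X from 0 to 10.
rng = random.Random(0)
x = [10 * i / 19 for i in range(20)]
y = [1.0 + 0.2 * xi + 0.9 * max(xi - 5.0, 0.0) + rng.gauss(0, 0.05) for xi in x]

# Interior candidates only, so that both segments contain observations.
candidates = [1.0 + 0.05 * k for k in range(161)]  # 1.0, 1.05, ..., 9.0
fit = fit_broken_stick(x, y, candidates)
print("breakpoint:", round(fit["psi"], 2))
print("slopes:", round(fit["slope_before"], 2), round(fit["slope_after"], 2))
```

For a fixed candidate breakpoint the model is linear in its coefficients, so each step is an ordinary least-squares fit; the grid search trades speed for robustness to poor starting values.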

Computational Tools

R-Based Implementations

The segmented package, developed by Vito M. R. Muggeo, is a primary tool in R for fitting regression models with piecewise linear relationships, allowing one or more covariates to have segmented effects alongside standard linear terms. Its central function, segmented(), is a generic that updates an initial lm() or glm() fit by estimating breakpoints, with methods such as segmented.lm() for linear models and segmented.glm() for generalized linear models (GLMs). The package accommodates multiple breakpoints for the same variable without imposing a limit on their number, and it extends to mixed-effects models with random breakpoints through segmented.lme(). Version 2.1-4, released on February 28, 2025, supports time series data through methods for sequential observations such as stepmented.ts and integration with ARIMA models.

The strucchange package complements segmented regression by focusing on testing and dating structural breaks in linear regression models, using methods such as the generalized fluctuation test and Chow's test to detect change points before estimation. Functions such as breakpoints() identify potential structural shifts, and the confidence intervals for the break dates can inform the initial guesses required by the segmented package, so the two can be combined into a workflow from detection to estimation.

A basic example of fitting a two-segment linear model with the segmented package involves loading the library and applying segmented() to an initial linear model:
r
library(segmented)
# Assume data with response y and predictor x
lm_model <- lm(y ~ x)
out <- segmented(lm_model, seg.Z = ~x, psi = list(x = 10))  # Initial guess for breakpoint at x=10
summary(out)
This code estimates the breakpoint iteratively and outputs the segmented coefficients, with seg.Z specifying the variable(s) to segment.

The segmented package includes features for inference, such as confidence intervals for breakpoints and slopes (confint() and slope()), bootstrap restarting to make estimation less sensitive to starting values, plotting tools to visualize fitted segments (plot.segmented()), and model selection via information criteria such as AIC to choose the number of segments. Similarly, strucchange offers significance tests for structural stability and graphical diagnostics for fluctuation processes. While powerful for linear and GLM-based segmented regression, the packages are primarily oriented toward these frameworks, requiring custom extensions or integration with other tools such as nlme for nonlinear or highly specialized cases. Detailed guidance on advanced options, including multiple segments and mixed models, is available in Muggeo's vignettes, accessible via vignette("segmented") in R.
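
As a sketch of what the iterative estimation is doing, the continuous two-segment model and the linearization used in Muggeo (2003) can be written as follows. The symbols β0, β1, δ, γ and the current guess ψ̃ are notation chosen here for illustration; software output may label these quantities differently.

```latex
% Continuous two-segment ("broken-stick") mean function
\mathbb{E}[y \mid x] = \beta_0 + \beta_1 x + \delta\,(x - \psi)_+ ,
\qquad (x - \psi)_+ = \max(x - \psi,\ 0)

% First-order Taylor expansion of the hinge term around a current guess \tilde\psi
(x - \psi)_+ \approx (x - \tilde\psi)_+ - (\psi - \tilde\psi)\, I(x > \tilde\psi)

% Working linear model fitted at each iteration, with \gamma = \delta\,(\psi - \tilde\psi)
\mathbb{E}[y \mid x] \approx \beta_0 + \beta_1 x + \delta\,(x - \tilde\psi)_+ - \gamma\, I(x > \tilde\psi)

% Breakpoint update
\hat\psi = \tilde\psi + \hat\gamma / \hat\delta
```

The slope is β1 below ψ and β1 + δ above it. Each iteration is an ordinary linear fit, and the procedure stops when the estimate of γ is close to zero, that is, when the breakpoint estimate no longer changes.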

Implementations in Other Languages

In Python, segmented regression is implemented through dedicated packages that facilitate piecewise linear modeling. The piecewise-regression package employs an iterative algorithm based on a Taylor expansion to simultaneously estimate breakpoints and fit line segments, incorporating bootstrap restarting to avoid local optima and providing confidence intervals via standard errors. It supports hypothesis testing using the Davies test and model comparison via the Bayesian information criterion (BIC), with customizable visualizations for fitted models. This implementation draws on the approach of Muggeo (2003) for automated breakpoint detection. Another tool, the segmented package, focuses on parametric changepoint characterization with support for connected linear models under different link functions, limited initially to at most two segments and one predictor. It includes a Bayesian variant that uses Markov chain Monte Carlo (MCMC) via the emcee sampler for flexible segment counts and a demonstration class for log-linear models.

MATLAB provides segmented regression capabilities through user-contributed toolboxes and built-in nonlinear fitting procedures. The Piecewise Linear Model toolbox extends two-segment least-squares fitting to n segments, estimating coefficients, breakpoints, and R-squared values by minimizing residuals across intersecting line segments. It builds on the method described in Bogartz (1968) for intersecting segments and supports verbose output for diagnostic purposes. Additionally, MATLAB's nlinfit function in the Statistics and Machine Learning Toolbox enables custom segmented models by defining piecewise equations as nonlinear functions, allowing estimation of parameters such as slopes and transition points via least-squares optimization.

In Julia, the LinearSegmentation.jl package offers algorithms for fitting piecewise linear functions with automatic breakpoint identification. It includes three methods: a sliding window approach for initial segmentation based on fit thresholds such as R-squared or RMSE, a top-down recursive splitting approach for hierarchical partitioning, and a shortest-path dynamic programming technique using graph-based A* search for globally optimal segmentation. Parameters such as minimum segment length and overlap control ensure robust fits, with outputs returned as index arrays defining segment boundaries. This implementation is inspired by dynamic programming strategies in related packages such as dpseg.

Statistical software such as SAS and Stata also supports segmented regression through procedural commands for nonlinear and piecewise modeling. In SAS, PROC NLIN fits segmented models by specifying piecewise functional forms, such as two connected linear segments with a shared join point, using nonlinear least squares to estimate parameters such as intercepts, slopes, and transition points. For time-series applications, PROC AUTOREG extends this to autocorrelation-adjusted segmented regression. In Stata, the nl command performs nonlinear least-squares estimation for user-defined piecewise equations, while mkspline generates piecewise linear splines with knots at specified points, enabling regression of outcomes on segmented predictors. The user-written itsa command further applies segmented regression to interrupted time-series analysis, parameterizing level and trend changes at intervention points.
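
The level-and-trend-change parameterization used in interrupted time-series analysis can be sketched as a design matrix with a known breakpoint. The following is a minimal standard-library Python illustration, not the implementation of itsa, mkspline, or any SAS procedure: the helper names (ols, its_design), the simulated series, and its true coefficients are assumptions made for the example, and autocorrelation is deliberately ignored.

```python
import random


def ols(columns, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(columns[i], y))]
        for i in range(k)
    ]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))  # partial pivoting
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def its_design(times, intervention_time):
    """Columns: intercept, time, post-intervention indicator, time since intervention."""
    intercept = [1.0] * len(times)
    trend = [float(t) for t in times]
    post = [1.0 if t >= intervention_time else 0.0 for t in times]    # level change
    since = [max(float(t - intervention_time), 0.0) for t in times]   # trend change (hinge)
    return [intercept, trend, post, since]


# Simulated series (assumed values): 24 periods, intervention at period 12,
# baseline 50, pre-intervention trend +0.5, level change -8, trend change -0.7.
rng = random.Random(42)
times = list(range(24))
T0 = 12
outcome = [
    50.0 + 0.5 * t + (-8.0 if t >= T0 else 0.0) - 0.7 * max(t - T0, 0) + rng.gauss(0, 0.3)
    for t in times
]

b0, b_trend, b_level, b_trend_change = ols(its_design(times, T0), outcome)
print("pre-intervention trend:", round(b_trend, 2))
print("level change:", round(b_level, 2))
print("trend change:", round(b_trend_change, 2))
```

Because the intervention time is fixed in advance, the model is linear and needs no breakpoint search; the hinge column plays the same role as a linear spline knot placed at the intervention point. An analysis of real data would also need to handle serial correlation, which this sketch does not attempt.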

References

  1. [1]
    Interpretation of coefficients in segmented regression for interrupted ...
    Apr 16, 2025 · Segmented regression, also known as piecewise regression or broken-stick regression, is a method in regression analysis in which a series of ...
  2. [2]
    [PDF] piecewise-regression (aka segmented regression) in Python
    Dec 2, 2021 · Piecewise regression (also known as segmented regression, broken-line regression, or breakpoint analysis) fits a linear regression model ...
  3. [3]
    [PDF] Parameter Estimation in Linear-Linear Segmented Regression
    Segmented regression is a type of nonlinear regression that allows differing functional forms to be fit over different ranges of the explanatory variable.
  4. [4]
  5. [5]
  6. [6]
    [PDF] segmented: An R Package to Fit Regression Models with Broken ...
    Segmented or broken-line models are regression models where the relationships between the response and one or more explanatory variables are ...
  7. [7]
    chngpt: threshold regression model estimation and inference
    Oct 16, 2017 · Threshold regression models are a diverse set of non ... Segmented regression model · Regression kink · Jump-type · Change point.
  8. [8]
    Fitting Segmented Curves Whose Join Points Have to Be Estimated
    Apr 10, 2012 · The overall least squares solution is found when a complete curve to be fitted consists of two or more submodels, and these have to be joined at points whose ...
  9. [9]
    Tests of Equality Between Sets of Coefficients in Two Linear ... - jstor
    Econometrica, Vol. 28, No. 3 (July 1960). Tests of Equality Between Sets of Coefficients in Two Linear Regressions, by Gregory C. Chow. Having estimated a linear ...
  10. [10]
    [PDF] Structural Breaks in Time Series∗ - Boston University
    Abstract. This chapter covers methodological issues related to estimation, testing and computation for models involving structural changes.
  11. [11]
    Segmented regression analysis of interrupted time series studies in ...
    Segmented regression analysis is a powerful statistical method for estimating intervention effects in interrupted time series studies.
  12. [12]
    8.8 - Piecewise Linear Regression Models | STAT 501
    We discuss what is called piecewise linear regression models here because they utilize interaction terms containing dummy variables.
  13. [13]
    Fitting Segmented Regression Models by Grid Search - jstor
    A grid-search method is described for fitting segmented regression curves with unknown transition points, suitable for a wider class of models.
  14. [14]
    [1311.6392] A Comprehensive Approach to Universal Piecewise ...
    Nov 25, 2013 · A Comprehensive Approach to Universal Piecewise Nonlinear Regression Based on Trees. Authors:N. Denizcan Vanli, Suleyman S. Kozat.
  15. [15]
    Estimating regression models with unknown break‐points - Muggeo
    Sep 8, 2003 · This paper deals with fitting piecewise terms in regression models where one or more break-points are true parameters of the model.
  16. [16]
    Inference in Two-Phase Regression - Taylor & Francis Online
    Procedures are outlined for obtaining maximum likelihood estimates and likelihood confidence regions in the intersecting two-phase linear regression model.
  17. [17]
    (PDF) Segmented mixed models with random changepoints
    Aug 7, 2025 · We present a simple and effective iterative procedure to estimate segmented mixed models in a likelihood based framework.
  18. [18]
    The Log Likelihood Ratio in Segmented Regression - Project Euclid
    "The Log Likelihood Ratio in Segmented Regression." Ann. Statist. 3 (1) 84 - 97, January, 1975. https://doi.org/10.1214/aos/1176343000. Information. Published ...
  19. [19]
    The use of segmented regression in analysing interrupted time ...
    Jun 19, 2014 · Segmented regression analysis is the recommended approach for analysing data from an interrupted time series study.
  20. [20]
    [PDF] SELECTING THE NUMBER OF CHANGE-POINTS IN SEGMENTED ...
    Abstract: Segmented line regression has been used in many applications, and the problem of estimating the number of change-points in segmented line ...
  21. [21]
    SELECTING THE NUMBER OF CHANGE-POINTS IN SEGMENTED ...
    For k0 = 1, Hinkley (1971) studied asymptotic behavior of the maximum likelihood estimators under normal theory, and provided the asymptotic variances of the ...
  22. [22]
    (PDF) Power analysis in segmented regression - ResearchGate
    Nov 3, 2021 · This paper deals with power analysis in segmented regression, namely estimation of sample size or power level when the study data being ...
  23. [23]
    Statistical Power of Piecewise Regression Analyses of Single-Case ...
    Jul 5, 2022 · Our study addresses the interplay between the test power of piecewise regression analysis and important design specifications of single-case research designs.
  24. [24]
    Multiple Change-Point Detection: A Selective Overview - Project Euclid
    In particular, we present a strategy to gather and aggregate local information for change-point detection that has become the cornerstone of several emerging ...
  25. [25]
    Summary of Threshold Point Models Using Segmented Regression
  26. [26]
    [PDF] Confidence Intervals on the Join Point in Segmented Regression
    Comparisons are made to a previously derived method based on Fieller's theorem, using a simulated and an observed ... Fitting segmented regression models by grid ...
  27. [27]
    Comparing statistical analyses to estimate thresholds in ecotoxicology
    Apr 8, 2020 · The first slope was set to zero, so the model has three parameters and it assumes that there is no effect before the threshold. Note that ...
  28. [28]
    A Bayesian piecewise linear model for the detection of breakpoints ...
    In this paper, we defined a piecewise-linear regression model which can detect two thresholds in the relationships via changes in slopes.
  29. [29]
    Guidance on aneugenicity assessment - 2021 - EFSA Journal - Wiley
    Aug 5, 2021 · The piecewise linear regression analysis (or breakpoint regression) ... no-effect concentration of 85 ng/ml was identified from in vitro ...
  30. [30]
    Interrupted time series regression for the evaluation of public health ...
    Jun 8, 2016 · In this tutorial we use a worked example to demonstrate a robust approach to ITS analysis using segmented regression. We begin by describing the ...
  32. [32]
  33. [33]
    Dependence of Guaiacol Peroxidase Activity and Lipid Peroxidation ...
    Jun 10, 2015 · Therefore, segmented regression (piecewise regression) was used, that is, this dose–response dependence was divided into 2 parts, and an ...
  34. [34]
    Nonlinear Regression Modelling: A Primer with Applications and ...
    Mar 15, 2024 · The nonlinear model discussed here is a segmented regression function model (also called a broken-stick, piecewise, or change-point model).
  35. [35]
    [PDF] Water quality trend and change-point analyses using integration of ...
    The newly developed integrated LWPR-SegReg approach is not only limited to the assessment of trends and change-points of water quality parameters but also has a ...
  36. [36]
    SegReg : calculator for segmented (piecewise) linear regression (in ...
    SegReg performs segmented (piecewise) linear regression (in splines) This calculator is totally free for download.
  37. [37]
    CRAN: Package segmented
    Feb 28, 2025 · The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc ...
  38. [38]
    [PDF] segmented: Regression Models with Break-Points / Change-Points ...
    Feb 28, 2025 · Description Fitting regression models where, in addition to possible linear terms, one or more covari- ates have segmented (i.e., broken-line ...
  39. [39]
    [PDF] strucchange: Testing, Monitoring, and Dating Structural Changes
    Kleiber (2002), strucchange: An R Package for Testing for Structural Change in Linear Regression Models.
  40. [40]
  41. [41]
    segmented - PyPI
    segmented is a Python toolbox for performing segmented regression, with an initial focus on characterizing parametric changepoints.
  42. [42]
    Piecewise linear model - File Exchange - MATLAB Central
    Piecewiselm(x,y,n) performs n-segmented linear regression. This is an extension of two-segmented linear regression described in Bogartz (1968).
  43. [43]
    stelmo/LinearSegmentation.jl: Linear segmented regression - GitHub
    This is a small package that performs linear segmented regression: fitting piecewise linear functions to data, and simultaneously estimating the best ...
  44. [44]
  45. [45]
  46. [46]
    How can I run a piecewise regression in Stata? - OARC Stats - UCLA
    How can I run a piecewise regression in Stata? | Stata FAQ · Try 1: Separate regressions · Try 2: Separate regression with age centered at 14 · Try 3: Combined ...