
Semiparametric regression

Semiparametric regression refers to a class of statistical models that combine finite-dimensional parametric components with infinite-dimensional nonparametric elements to flexibly capture relationships between a response variable and predictors, balancing the interpretability of parametric forms with the adaptability of nonparametric approaches. These models address limitations of purely parametric regression, such as linear or logistic models, which assume specific functional forms that may not hold, and of fully nonparametric methods, which impose minimal structure but often suffer from the curse of dimensionality and reduced precision. By estimating key parameters of interest efficiently while allowing nonparametric smoothing for nuisance functions, semiparametric regression enables robust inference in diverse applications, including econometrics and biostatistics. A canonical example of semiparametric regression is the partially linear model, formulated as Y = X'\beta + g(Z) + \epsilon, where \beta is a finite-dimensional parameter vector capturing linear effects, g(Z) is an unknown nonparametric function of covariates Z, and \epsilon is an error term. This model, pioneered by Robinson in 1988, achieves root-n-consistent and asymptotically normal estimation of \beta through nonparametric residualization techniques, such as kernel-based subtraction, without requiring full specification of g. Extensions include varying coefficient models, where coefficients depend nonparametrically on additional variables, and single-index models, which reduce dimensionality by projecting predictors onto a parametric index before nonparametric smoothing of the link function. Estimation in semiparametric regression typically involves multi-stage procedures: first, nonparametric estimation of the flexible components using methods like kernel smoothing or splines, followed by refinement of the parameters of interest. Popular techniques include backfitting algorithms for additive models and penalized splines, which enforce smoothness via penalties akin to mixed-effects models, facilitating computational tractability and automatic smoothing-parameter selection.
These approaches ensure asymptotic normality under mild conditions, such as smoothness of the nonparametric functions, and support inference via consistent variance estimators. Semiparametric models have found wide application in handling complex data structures, such as longitudinal studies, dose-response curves in pharmacokinetics, and economic analyses with endogenous regressors. For instance, in biostatistics, they model nonlinear trends in survival data while parametrically estimating treatment effects; in econometrics, they accommodate unobserved heterogeneity through nonparametric components. Ongoing developments focus on high-dimensional settings, variable selection, and integration with machine learning, enhancing their relevance in big data contexts.

Fundamentals

Definition and characteristics

Semiparametric regression encompasses statistical models that integrate finite-dimensional parametric components, such as linear terms for specific predictors, with infinite-dimensional nonparametric components, like smooth functions of other covariates, to describe the relationship between response variables and predictors. This hybrid approach avoids the rigid functional form assumptions of fully parametric models while preserving interpretability and efficiency in estimating parameters of interest. A defining characteristic of semiparametric regression is the presence of a finite-dimensional parameter of interest—often structural coefficients like slopes—alongside infinite-dimensional parameters, such as unknown distributions or flexible functional forms, which are estimated nonparametrically. These models achieve root-n-consistent and asymptotically normal estimators for the parametric components, offering efficiency gains over purely nonparametric methods by reducing variance without imposing overly restrictive assumptions. Additionally, they provide robustness to misspecification in the nonparametric parts, as the parametric estimates remain reliable even if the nonparametric components are approximated flexibly. In general form, semiparametric regression can be expressed as Y = g(X, Z) + \epsilon, where Y is the response, X enters parametrically (e.g., via X'\beta with finite-dimensional \beta), Z enters nonparametrically (e.g., via m(Z) as a smooth function), g(\cdot) combines these elements, and \epsilon satisfies E(\epsilon \mid X, Z) = 0. This structure allows targeted inference on \beta while flexibly capturing nonlinear effects in m(Z). An illustrative example of this hybrid nature is the additive model, a special case where the nonparametric component decomposes into a sum of univariate smooth functions, such as E(Y \mid X) = \alpha + \sum_{j=1}^p f_j(X_j), balancing interpretability with flexibility for multiple predictors.
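As a concrete sketch of the additive special case, the backfitting idea can be illustrated in a few lines of Python. This is a minimal illustration, not a production implementation: the Nadaraya-Watson smoother, the simulated component functions, and the bandwidth are assumptions chosen for the example.

```python
import numpy as np

def nw_smooth(x, r, h):
    # Nadaraya-Watson smooth of values r against covariate x (Gaussian kernel)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w * r[None, :]).sum(axis=1) / w.sum(axis=1)

def backfit_additive(X, y, h=0.15, n_iter=20):
    """Backfitting for E(Y|X) = alpha + sum_j f_j(X_j), evaluated at the data."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))                    # component fits at the sample points
    for _ in range(n_iter):
        for j in range(p):
            partial = y - alpha - f.sum(axis=1) + f[:, j]  # partial residuals
            f[:, j] = nw_smooth(X[:, j], partial, h)
            f[:, j] -= f[:, j].mean()       # center each f_j for identifiability
    return alpha, f

# Simulated example with two centered additive components
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
f1 = np.sin(np.pi * X[:, 0])
f2 = X[:, 1] ** 2 - np.mean(X[:, 1] ** 2)
y = 1.0 + f1 + f2 + 0.1 * rng.normal(size=n)

alpha_hat, f_hat = backfit_additive(X, y)
fitted = alpha_hat + f_hat.sum(axis=1)
```

Centering each estimated component resolves the additive model's identifiability issue, since constants can otherwise be shifted freely between \alpha and the f_j.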

Historical development

Semiparametric regression emerged in the fields of statistics and econometrics during the late 1970s and early 1980s as a methodological response to the limitations of fully parametric models, which risked misspecification, and fully nonparametric approaches, which suffered from the curse of dimensionality in high-dimensional data. Early developments drew on foundational ideas in smoothing techniques, including Grace Wahba's work on smoothing splines in the 1980s, which provided a theoretical basis for flexible nonparametric components within structured models. A pivotal contribution came from Peter Robinson in 1988, who introduced root-n-consistent estimation for partially linear models, bridging parametric linearity in some covariates with nonparametric smoothing for others, thus establishing a cornerstone for semiparametric efficiency. The 1990s marked a period of rapid advancement through rigorous asymptotic theory and efficient estimators for diverse model classes. Peter Bickel and colleagues' 1993 monograph formalized efficient and adaptive estimation frameworks for semiparametric models, deriving semiparametric efficiency bounds and influencing subsequent methods across statistics and econometrics. For single-index models, Hidehiko Ichimura developed semiparametric least squares in 1993, achieving root-n consistency while relaxing distributional assumptions, and Roger Klein and Richard Spady proposed an efficient estimator for binary response variants, attaining the semiparametric efficiency bound. Concurrently, Trevor Hastie and Robert Tibshirani introduced varying-coefficient models in 1993, extending generalized additive models to allow coefficients to vary smoothly as functions of covariates, enhancing flexibility in applications. By the 2000s, semiparametric regression evolved from ad-hoc implementations to a mature framework integrated with modern computational techniques, such as boosting and penalized splines, facilitating scalable computation and mixed-model representations for large datasets.
This shift emphasized practical implementation in high-dimensional settings while preserving theoretical guarantees, with influential contributions from researchers like Robinson, Ichimura, Klein, Spady, and Wahba shaping the field's trajectory toward broader interdisciplinary adoption.

Comparisons with Other Approaches

Parametric regression

Parametric regression models specify the relationship between the response variable and predictors through a fixed functional form defined by a finite number of parameters. In these models, the conditional mean of the response Y is given by E(Y | X) = f(X; \beta), where f is a known function and \beta is a vector of unknown parameters to be estimated from data. A classic example is the linear regression model, Y = X\beta + \epsilon, where X includes an intercept and predictors, and \epsilon represents random errors. Key assumptions underlying parametric regression include linearity in parameters (for linear models), independence of observations, homoscedasticity (constant variance of errors), and normality of errors for valid inference. These assumptions enable estimation via methods such as ordinary least squares (OLS) for linear models or maximum likelihood for generalized cases. Under OLS, the parameter estimator is \hat{\beta} = (X^T X)^{-1} X^T Y, which minimizes the sum of squared residuals and provides unbiased estimates when assumptions hold. Violation of these assumptions, such as non-constant variance or non-normal errors, can invalidate inference, though diagnostics like residual plots help assess model fit. Parametric models offer advantages in interpretability, as parameters directly quantify effects (e.g., \beta_j as the change in Y per unit increase in predictor j), and computational efficiency, requiring fewer data points for reliable estimates compared to more flexible approaches. Inference is straightforward, with well-established asymptotic properties for hypothesis testing and confidence intervals. However, these models are sensitive to misspecification of the functional form, leading to biased estimates and poor predictive performance, particularly with nonlinear relationships or high-dimensional data where the assumed structure may not capture underlying complexities.
Common examples include multiple linear regression for continuous outcomes, which extends to several predictors while maintaining the same estimation framework, and logistic regression for binary outcomes, where the logit link models P(Y=1 | X) = \frac{1}{1 + e^{-X\beta}} via maximum likelihood. Semiparametric regression addresses some limitations of fully parametric models by incorporating nonparametric components for added flexibility, as detailed in the fundamentals section.
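The OLS formula \hat{\beta} = (X^T X)^{-1} X^T Y can be checked numerically on simulated data; the coefficients and noise level below are illustrative assumptions.

```python
import numpy as np

# Simulate Y = 2 + 3x + eps and recover the parameters with the
# closed-form OLS estimator beta_hat = (X'X)^{-1} X'Y.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])            # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # solves the normal equations
residuals = y - X @ beta_hat
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse; with an intercept in the design, the fitted residuals average exactly zero.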

Nonparametric regression

Nonparametric regression encompasses statistical models that estimate the conditional mean of a response variable given predictors without imposing a predefined functional form on the underlying regression function. Specifically, it aims to approximate m(x) = E(Y \mid X = x) directly from data using flexible, data-driven techniques such as kernel smoothing or spline-based methods, allowing the function to adapt to the observed patterns without assuming linearity or other rigid structures. Key methods in nonparametric regression include local polynomial regression and kernel smoothing, where a bandwidth parameter h controls the degree of local averaging. A foundational approach is the Nadaraya-Watson estimator, formulated as \hat{m}(x) = \frac{\sum_{i=1}^n K\left( \frac{x_i - x}{h} \right) y_i}{\sum_{i=1}^n K\left( \frac{x_i - x}{h} \right)}, which weights observations by a kernel function K based on their proximity to x, providing a smoothed estimate of the regression function. These techniques offer high flexibility in capturing complex, nonlinear relationships in the data, avoiding the bias that arises from misspecifying a parametric form. However, nonparametric regression suffers from the curse of dimensionality, where convergence rates slow dramatically as the number of predictors increases, often requiring exponentially more data for reliable estimation in high dimensions. Additionally, it is computationally intensive due to the need for local computations across the data and poses challenges for inference, as the lack of parametric structure complicates deriving standard errors or confidence intervals. Representative examples include LOESS (locally estimated scatterplot smoothing), which extends weighted local fitting to handle robustness against outliers, and Gaussian process regression, which models the regression function as a draw from a Gaussian process prior to enable probabilistic predictions. Semiparametric regression mitigates some limitations of fully nonparametric models by incorporating parametric components for improved efficiency and inference, particularly in higher dimensions.
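The Nadaraya-Watson estimator above translates directly into code. A minimal sketch with a Gaussian kernel on simulated data; the target function sin(x), sample size, and bandwidth are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson estimator m_hat(x) with a Gaussian kernel K."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

# Recover m(x) = sin(x) from noisy observations with no parametric form assumed
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 2 * np.pi, n)
y = np.sin(x) + 0.2 * rng.normal(size=n)

grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)   # interior grid limits boundary bias
m_hat = nadaraya_watson(grid, x, y, h=0.3)
```

Evaluating on an interior grid sidesteps the well-known boundary bias of kernel smoothers, where one-sided averaging distorts the estimate near the edges of the support.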

Core Model Classes

Partially linear models

Partially linear models represent a foundational class of semiparametric regression frameworks, combining a parametric linear component with a nonparametric component to capture both structured and flexible relationships in the data. The model is typically specified as Y = X\beta + g(Z) + \varepsilon, where Y is the response variable, X is a vector of covariates with finite-dimensional parameter \beta, g(Z) is an unknown smooth function of the covariate Z, and \varepsilon is an error term satisfying E(\varepsilon \mid X, Z) = 0. This allows researchers to impose linearity on variables of primary interest while permitting arbitrary nonlinearity in others, balancing interpretability and flexibility. Identification of the model relies on conditional mean independence of the error term and a rank condition ensuring that the parametric component is not fully explained by the nonparametric one, specifically that E[(X - E(X \mid Z))(X - E(X \mid Z))'] is positive definite. This partial linearity enables the separation of linear effects in X from nonlinear effects in Z, facilitating efficient estimation of \beta without fully specifying g. If g(Z) happens to be linear, the model reduces to a fully linear specification, recovering ordinary least squares efficiency under standard conditions. Additionally, by parametrizing key covariates linearly, the approach mitigates the curse of dimensionality that plagues fully nonparametric methods, particularly when the dimension of Z is moderate. A representative application appears in labor economics, where the model has been used to analyze wage determination as \log(\text{Wage}) = \beta \cdot \text{Education} + g(\text{Experience}) + \varepsilon, treating education's effect as linear while allowing experience's influence to follow an unknown nonlinear pattern. Partially linear models serve as a special case of more general additive models, where the nonparametric component is restricted to a single function rather than sums over multiple covariates.
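Robinson's double-residual idea can be sketched directly: smooth both X and Y on Z, regress residual on residual to obtain \hat{\beta}, then smooth Y - X\hat{\beta} on Z to recover g. The kernel smoother, data-generating process, and bandwidth below are illustrative assumptions.

```python
import numpy as np

def nw(z_eval, z, v, h):
    # Nadaraya-Watson smooth of values v against covariate z (Gaussian kernel)
    w = np.exp(-0.5 * ((z_eval[:, None] - z[None, :]) / h) ** 2)
    return (w * v[None, :]).sum(axis=1) / w.sum(axis=1)

# Simulate Y = 1.5*X + sin(Z) + eps, with X correlated with Z
rng = np.random.default_rng(7)
n = 600
z = rng.uniform(-2, 2, n)
x = 0.5 * z + rng.normal(size=n)
y = 1.5 * x + np.sin(z) + 0.3 * rng.normal(size=n)

h = 0.3
x_res = x - nw(z, z, x, h)                 # X - E[X|Z]
y_res = y - nw(z, z, y, h)                 # Y - E[Y|Z]
beta_hat = np.sum(x_res * y_res) / np.sum(x_res ** 2)   # OLS on the residuals
g_hat = nw(z, z, y - x * beta_hat, h)      # recover g(Z) after removing X*beta
```

Because X is correlated with Z here, naively regressing Y on X alone would give a biased slope; residualizing on Z first removes that confounding, which is the essence of the rank condition above.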

Single-index models

Single-index models constitute a fundamental class of semiparametric regression frameworks in which the conditional mean of the response variable depends on the predictors solely through a one-dimensional linear index, combined with an unspecified nonparametric link function. This structure enables flexible modeling of potentially nonlinear relationships while imposing sufficient restrictions to ensure practical estimation. Seminal developments in this area include early work on estimating index coefficients via average derivatives and subsequent advancements in least-squares and maximum-likelihood approaches for both continuous and discrete outcomes. The canonical formulation for a continuous response Y is given by Y = g(\alpha^\top X) + \varepsilon, where X is a vector of predictors, \alpha is an unknown parameter vector defining the index direction, g(\cdot) is an unknown link function, and \varepsilon satisfies E(\varepsilon \mid X) = 0. For binary outcomes, the model specifies the success probability as P(Y=1 \mid X) = G(\alpha^\top X), with G(\cdot) a nonparametric link function. These formulations capture the essence of single-index models by reducing the dimensionality of the nonparametric component to a univariate problem. Identification of \alpha is achieved only up to scale, as multiplying \alpha by a constant and adjusting g or G accordingly yields an equivalent model; a common normalization is \|\alpha\| = 1 to resolve this indeterminacy. The primary purpose of single-index models is to circumvent the curse of dimensionality that plagues fully nonparametric regression with high-dimensional predictors, where estimation efficiency deteriorates rapidly as the number of covariates increases. By restricting the dependence to a single index, these models permit reliable nonparametric estimation of the link function along one dimension, balancing interpretability from the parametric index with the flexibility of nonparametric methods.
If the link function is assumed parametric—such as logistic for binary responses—the model nests within the class of generalized linear models, allowing for specification tests of linearity by comparing the semiparametric fit to a fully parametric linear alternative. This nesting property enhances the models' utility for inference on the direction of influence while relaxing strong functional form assumptions. A prominent application arises in econometric binary choice models, where individual utility or participation decisions depend on an unobserved index of observable covariates, such as demographic characteristics and market wages, through an unknown monotonic link. For instance, estimating labor force participation probabilities using semiparametric estimators on demographic factors leverages the single-index structure to avoid misspecification from assumed parametric forms like probit or logit, while still recovering the relative importance of covariates via the index parameter \alpha. Such models have been instrumental in empirical studies of economic behavior, providing robust insights into decision-making processes without over-reliance on distributional assumptions.
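The semiparametric least squares idea can be sketched in two dimensions, where the normalization \|\alpha\| = 1 reduces the index direction to a single angle found by grid search over a leave-one-out kernel fit of the link. The tanh link, sample size, bandwidth, and grid are illustrative assumptions.

```python
import numpy as np

def loo_nw_sse(index, y, h):
    """Leave-one-out Nadaraya-Watson sum of squared errors of y on the index."""
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)                   # exclude each point from its own fit
    fit = (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
    return np.sum((y - fit) ** 2)

# Simulate Y = tanh(alpha'X) + eps with ||alpha|| = 1
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))
alpha_true = np.array([np.cos(0.6), np.sin(0.6)])
y = np.tanh(X @ alpha_true) + 0.1 * rng.normal(size=n)

# Scan candidate directions alpha(theta) = (cos theta, sin theta)
thetas = np.linspace(0, np.pi, 181)
sse = [loo_nw_sse(X @ np.array([np.cos(t), np.sin(t)]), y, h=0.3) for t in thetas]
theta_hat = thetas[np.argmin(sse)]
alpha_hat = np.array([np.cos(theta_hat), np.sin(theta_hat)])
```

The leave-one-out fit is essential: without it, the criterion would be minimized by undersmoothed interpolation rather than by the correct index direction.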

Varying coefficient models

Varying coefficient models represent a class of semiparametric regression frameworks where the regression coefficients are permitted to vary smoothly as nonparametric functions of an additional covariate, typically denoted as U. The general form of the model is given by Y = \sum_{j=1}^p \beta_j(U) X_j + \epsilon, where Y is the response variable, X = (X_1, \dots, X_p)^T are the predictor variables, \beta_j(U) are smooth, unknown functions capturing the varying effects, and \epsilon is a mean-zero error term with finite variance. This structure blends the interpretability of linear models with the flexibility of nonparametric methods, allowing the linear relationships between predictors and the response to adapt locally across levels of the modulating variable U. The interpretation of these models emphasizes how the coefficients \beta_j(U) evolve with U, thereby nonparametrically accommodating interactions between the predictors X_j and the modulator U without assuming a specific functional form for the variation. For instance, at different values of U, the influence of each X_j on Y can strengthen, weaken, or even change sign, providing insights into heterogeneous effects that fixed-coefficient models cannot capture. This approach generalizes traditional linear models by relaxing the constancy assumption on coefficients while maintaining a parametric linear form conditional on U. A key property is its ability to encompass partially linear models as a special case, where one or more \beta_j(U) are constant (i.e., independent of U), thus bridging parametric and fully nonparametric specifications. Estimation of the varying coefficients \beta_j(U) is commonly achieved through local smoothing methods, which approximate each \beta_j locally around a point u in the support of U using a linear Taylor expansion. This involves minimizing a kernel-weighted least squares objective, such as \sum_i K_h(U_i - u) [Y_i - \{\beta(u) + \beta'(u)(U_i - u)\}^T X_i]^2, where K_h is a kernel function with bandwidth h, yielding local linear estimators for both the coefficient functions and their slopes.
Such techniques enable the reconstruction of smooth coefficient functions while leveraging kernel smoothing for the nonparametric components (as detailed in the estimation section). An illustrative application arises in growth curve analysis, where regression slopes representing growth rates vary with age or time as the modulator U. For example, in modeling economic growth across countries, the coefficients on macroeconomic predictors such as population can be expressed as smooth functions of a development index U, revealing how these effects differ between developing and mature economies. This setup allows for the detection of dynamic patterns in longitudinal or panel data, such as evolving relationships in growth studies where slopes for nutritional factors change over age groups.
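The kernel-weighted least squares objective above can be sketched with a local constant (rather than local linear) fit for brevity: at each evaluation point u, a weighted least squares problem is solved with weights K_h(U_i - u). The varying slope sin(2*pi*u), sample size, and bandwidth are illustrative assumptions.

```python
import numpy as np

def varying_coef_fit(u_eval, U, X, y, h):
    """Estimate beta(u) by kernel-weighted least squares at each point u."""
    betas = []
    for u in u_eval:
        w = np.exp(-0.5 * ((U - u) / h) ** 2)      # kernel weights K_h(U_i - u)
        Xw = X * w[:, None]
        betas.append(np.linalg.solve(Xw.T @ X, Xw.T @ y))
    return np.array(betas)

# Y = beta0 + beta1(U)*X1 + eps, with beta0 = 1 constant and beta1(u) = sin(2*pi*u)
rng = np.random.default_rng(5)
n = 800
U = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + np.sin(2 * np.pi * U) * X[:, 1] + 0.2 * rng.normal(size=n)

grid = np.linspace(0.1, 0.9, 9)
B = varying_coef_fit(grid, U, X, y, h=0.08)        # B[:, 0] ~ beta0, B[:, 1] ~ beta1(u)
```

Here the recovered intercept is roughly constant while the recovered slope traces the sine curve, illustrating how one coefficient can be effectively parametric while another varies—exactly the bridge to partially linear models noted above.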

Estimation and Inference

Kernel smoothing techniques

Kernel smoothing techniques provide a flexible approach to estimating the nonparametric components in semiparametric regression models by computing weighted local averages of the data. These methods rely on a kernel function K(\cdot), which assigns weights to observations based on their proximity to the point of evaluation, and a bandwidth h that controls the degree of smoothing—smaller values of h yield more flexible but noisier estimates, while larger values produce smoother approximations. This nonparametric estimation is particularly suited for the unknown functions in semiparametric models, allowing the parametric parts to be estimated efficiently without strong assumptions on the form of the nonparametric components. In partially linear models, kernel smoothing is applied through a two-step procedure to separate the parametric and nonparametric effects. First, the covariates associated with the parametric component (X) are regressed nonparametrically on those linked to the nonparametric part (Z) to obtain residuals, effectively partialling out the influence of Z on X. Then, ordinary least squares is used on the residuals of Y and X to estimate the coefficients \hat{\beta}, followed by kernel smoothing of the residuals Y - X\hat{\beta} against Z to estimate the nonparametric function g(\cdot). This approach, introduced by Robinson, ensures root-n consistency for \hat{\beta} while accommodating the nonparametric estimation of g. For single-index models, kernel smoothing is employed after estimating the index direction \alpha, often via average derivative methods or semiparametric least squares, to recover the link function. Once \hat{\alpha} is obtained, the projected data W = X\hat{\alpha} are used in a univariate kernel regression to estimate the unknown link function g(W), minimizing a least squares objective that weights observations locally around evaluation points. Ichimura's semiparametric least squares estimator integrates this kernel step to achieve root-n consistency and asymptotic normality for the index coefficients up to scale.
Bandwidth selection in kernel smoothing for semiparametric models typically involves cross-validation or plug-in rules to minimize mean squared error by balancing bias and variance. Cross-validation assesses candidate bandwidths by leaving out each observation and predicting it using the smoother fitted on the rest, selecting the h that minimizes the average prediction error. A common kernel estimator for the univariate nonparametric component takes the form \hat{g}(z) = \frac{\sum_{i=1}^n K\left( \frac{Z_i - z}{h} \right) (Y_i - X_i \hat{\beta})}{\sum_{i=1}^n K\left( \frac{Z_i - z}{h} \right)}, a Nadaraya-Watson average of the residuals adjusted for the parametric fit. Plug-in methods estimate the optimal h by approximating the unknown regression function and its derivatives from pilot estimates. These selectors adapt to the data's structure, ensuring reliable smoothing across model classes. The primary advantage of kernel smoothing in semiparametric regression lies in its ability to consistently estimate the nonparametric components without compromising the efficiency of the parametric estimates, which achieve the same convergence rates as if the nonparametric parts were known. This separation preserves the \sqrt{n}-consistency and asymptotic normality of the parametric estimators, making kernel methods a cornerstone for inference in models like partially linear and single-index structures.
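Leave-one-out cross-validation for bandwidth selection can be sketched directly: each candidate h is scored by the average squared error of predicting each observation from a smoother fitted without it. The simulated regression function and the bandwidth grid are illustrative assumptions.

```python
import numpy as np

def loo_cv_score(x, y, h):
    """Leave-one-out CV score for a Gaussian-kernel Nadaraya-Watson smoother."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)                   # predict each point without itself
    fit = (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
    return np.mean((y - fit) ** 2)

rng = np.random.default_rng(11)
n = 300
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

bandwidths = np.linspace(0.01, 0.5, 50)
scores = [loo_cv_score(x, y, h) for h in bandwidths]
h_cv = bandwidths[np.argmin(scores)]           # data-driven bandwidth choice
```

Very small bandwidths are penalized through high prediction variance and very large ones through smoothing bias, so the CV curve is typically U-shaped with an interior minimum.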

Profile likelihood methods

Profile likelihood methods in semiparametric regression involve concentrating out the nonparametric parameters to derive a profiled objective function focused on the parametric components, which is then maximized to obtain efficient estimates. This approach reduces the infinite-dimensional optimization problem to a finite-dimensional one by solving for the nonparametric part that maximizes the likelihood for each fixed value of the parametric parameters, thereby facilitating inference on the parameters of interest. The resulting profile likelihood behaves asymptotically like a standard likelihood, enabling the use of conventional likelihood-based tools such as confidence intervals and hypothesis tests. In generalized semiparametric models of the form Y_i \sim F(\mu_i), where \mu_i = X_i \beta + g(Z_i) and g is an unknown nonparametric function, the profile log-likelihood for \beta is given by l(\beta) = \sup_g \sum_i \log f(Y_i \mid X_i \beta + g(Z_i)), which is obtained by maximizing over g for fixed \beta. This profiled objective allows for joint estimation of parametric and nonparametric components while emphasizing the efficiency of the parametric estimator. Such methods are particularly useful in models where the nonparametric component acts as a nuisance parameter that must be accounted for without fully specifying its form. For single-index models, Ichimura's semiparametric least squares (SLS) method provides a specific implementation by minimizing \sum_i [Y_i - m(\alpha^T X_i)]^2, where m is estimated nonparametrically (e.g., via kernel regression) conditional on the index \alpha^T X_i, subject to a normalization constraint such as \alpha_1 = 1 to resolve the scale indeterminacy. This profiling approach yields a \sqrt{n}-consistent estimator for \alpha in continuous response settings.
In the context of binary single-index models, the Klein-Spady estimator extends profile likelihood principles by estimating the conditional choice probabilities nonparametrically with kernel methods and maximizing the resulting profiled quasi-likelihood to achieve semiparametric efficiency. This method profiles out the unknown link function while ensuring the estimator attains the information bound for the index parameters under correct model specification. Overall, profile likelihood methods deliver estimators that achieve semiparametric efficiency bounds when the model is correctly specified, providing a robust framework for inference in diverse semiparametric settings.
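The profiling idea can be sketched for a Gaussian partially linear model, where the profile likelihood reduces to a profiled sum of squares: for each fixed beta, g is replaced by the kernel smooth of the partial residuals Y - X*beta on Z, and the profiled criterion is minimized over beta (here by grid search). The data-generating process, bandwidth, and grid are illustrative assumptions.

```python
import numpy as np

# Simulate Y = 2*X + cos(2*pi*Z) + eps, with X correlated with Z
rng = np.random.default_rng(9)
n = 400
z = rng.uniform(0, 1, n)
x = z + rng.normal(size=n)
y = 2.0 * x + np.cos(2 * np.pi * z) + 0.3 * rng.normal(size=n)

# Linear smoother matrix S mapping any vector to its kernel smooth on Z
h = 0.1
W = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
S = W / W.sum(axis=1, keepdims=True)

def profiled_sse(beta):
    r = y - beta * x              # partial residuals for this candidate beta
    g_hat = S @ r                 # nonparametric part profiled out by smoothing
    return np.sum((r - g_hat) ** 2)

betas = np.linspace(1.0, 3.0, 201)
beta_hat = betas[np.argmin([profiled_sse(b) for b in betas])]
```

Because the smoother is linear, this profiled criterion is quadratic in beta, and its minimizer coincides with regressing (I - S)Y on (I - S)X—making explicit the connection between profiling and the double-residual construction.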

Asymptotic theory

In semiparametric regression models, the asymptotic behavior of estimators differs markedly between the parametric and nonparametric components. The parametric parameters, such as the coefficient vector \beta, achieve \sqrt{n}-consistency under suitable conditions that decouple their estimation from the nonparametric nuisance parameters. In contrast, the nonparametric components, often estimated via kernel smoothing with a second-order kernel assuming twice-differentiable functions, converge at slower rates, typically O_p(n^{-2/5}) in the L_2 norm (equivalently, mean squared error of order n^{-4/5}), reflecting the curse of dimensionality inherent in nonparametric estimation. The semiparametric efficiency bound delineates the minimal asymptotic variance attainable for estimating \beta. This bound arises from the efficient information matrix, obtained by projecting the parametric score function onto the orthogonal complement of the nuisance tangent space, which captures all possible nonparametric variations. The resulting lower bound on the asymptotic variance is \operatorname{Var}(\hat{\beta}) \geq I(\beta)^{-1}, where I(\beta) denotes the efficient information matrix; for the partially linear model Y = X'\beta + g(Z) + \epsilon, this yields I(\beta) = \sigma^{-2} E[(X - E[X|Z])(X - E[X|Z])'], with \sigma^2 = \operatorname{Var}(\epsilon). Asymptotic inference for \beta relies on the central limit theorem, yielding \sqrt{n} (\hat{\beta} - \beta) \overset{d}{\to} N(0, V), where V is the efficient variance estimated through influence functions derived from the efficient score. For the nonparametric components, inference often employs bootstrap methods to approximate their distributions, accommodating the slower convergence rates. Key theoretical results hinge on Neyman orthogonality, which ensures that the first-order bias from nonparametric estimation errors does not affect the \sqrt{n}-consistency of \hat{\beta}, provided the estimating equations are insensitive to local perturbations in the nuisance parameters.
Valid inference further requires undersmoothing the nonparametric estimates—choosing bandwidths smaller than optimal so that the smoothing bias is of order o_p(n^{-1/2}). Challenges in achieving these asymptotics include managing bias in the nonparametric estimates, addressed through techniques like higher-order kernels that reduce bias from O(h^2) to O(h^k) for k > 2, albeit at the cost of increased variance. Such methods preserve the parametric rate while enhancing overall efficiency, though they demand careful tuning to balance trade-offs.
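A small Monte Carlo check makes the \sqrt{n} theory concrete for the partially linear model: in the design below, \operatorname{Var}(\epsilon) = 0.25 and \operatorname{Var}(X - E[X|Z]) = 1, so the efficiency bound for the asymptotic variance of \sqrt{n}(\hat{\beta} - \beta) is 0.25, and the empirical variance across replications should be close to it. The design, bandwidth, and replication count are illustrative assumptions.

```python
import numpy as np

def robinson_beta(z, x, y, h):
    """Double-residual (Robinson) estimator of beta in Y = X*beta + g(Z) + eps."""
    w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    s = w / w.sum(axis=1, keepdims=True)        # linear kernel smoother matrix
    xr, yr = x - s @ x, y - s @ y               # residuals from smoothing on Z
    return np.sum(xr * yr) / np.sum(xr ** 2)

rng = np.random.default_rng(21)
n, reps, beta = 400, 200, 1.5
draws = []
for _ in range(reps):
    z = rng.uniform(0, 1, n)
    x = z + rng.normal(size=n)                  # X - E[X|Z] has variance 1
    y = beta * x + np.sin(2 * np.pi * z) + 0.5 * rng.normal(size=n)
    draws.append(np.sqrt(n) * (robinson_beta(z, x, y, h=0.1) - beta))

# Efficiency bound for this design: sigma^2 / Var(X - E[X|Z]) = 0.25 / 1
empirical_var = np.var(draws)
```

That the centered, \sqrt{n}-scaled draws have roughly mean zero and variance near the bound illustrates both the parametric rate and the Neyman-orthogonality claim that first-order smoothing errors do not contaminate \hat{\beta}.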

Applications

Econometric examples

Semiparametric regression models have been widely applied in labor economics to estimate the returns to education while accounting for nonlinear effects of other covariates, such as labor market experience, on wages. In partially linear models, the return to education is parametrically specified as a linear term, while the influence of experience is captured nonparametrically to avoid functional form misspecification. This approach enables consistent estimation of the education coefficient under mild conditions, as demonstrated in early theoretical work that laid the foundation for such applications. For instance, empirical analyses of household survey wage data have shown concave experience profiles that peak around mid-career. In demand analysis, single-index models serve as a tool for dimension reduction in discrete choice problems, projecting multiple covariates onto a single index to model choice probabilities semiparametrically. A prominent example is the estimation of transportation mode choices, where unobserved heterogeneity affects decisions between bus and car usage. Using data from expenditure surveys, researchers estimated the probability of bus usage as a nonparametric function of an index comprising income, travel times, and distances, revealing that increases in relative bus travel time reduce usage probability, while accounting for flexible heterogeneity improves fit over parametric models. This method handles high-dimensional data efficiently without assuming a specific link function. Varying coefficient models in semiparametric regression allow treatment effects to vary smoothly with covariates, providing a flexible framework for policy evaluation where impacts differ by demographics. In assessing job training programs, these models estimate how the effect of training on post-training earnings varies with age or prior earnings, using kernel-based local regression to capture interactions nonparametrically.
For example, analyses of randomized experiments, such as the National JTPA Study, indicate heterogeneous policy responses, with positive effects for some subgroups like adult women and zero or negative effects for youth or certain male groups, highlighting variations that parametric models overlook. This approach aids in targeting interventions to subgroups with positive returns. A key advantage of semiparametric regression in econometrics is its ability to address endogeneity through nonparametric controls for confounding factors, mitigating bias from omitted variables or measurement error without fully specifying the conditional distribution. In panel settings with fixed effects, these models incorporate individual-specific intercepts nonparametrically while estimating slopes, allowing for consistent estimation even when regressors correlate with unobservables. Applications to dynamic labor supply models demonstrate that such controls reduce bias in elasticity estimates compared to linear fixed effects, enhancing reliability in estimated policy impacts. As a case study, Ichimura's semiparametric least squares method has been employed in auction models to analyze bidder behavior, projecting bids onto a single index that includes private values and covariates while nonparametrically estimating the distribution of valuations. In first-price sealed-bid auctions, this approach estimates bidder risk parameters and strategic adjustments, such as bid shading relative to value. Such applications inform auction design in procurement and sales by quantifying behavioral responses to reserve prices.

Biostatistical uses

In biostatistics, semiparametric regression plays a crucial role in survival analysis, where the Cox proportional hazards model serves as a foundational semiparametric approach by specifying the baseline hazard function nonparametrically while parameterizing the effects of covariates. This model enables flexible estimation of hazard ratios without assuming a specific distribution for the baseline hazard, making it robust to misspecification in survival times observed under right censoring. Extensions incorporate varying coefficient models to capture time-dependent effects, allowing covariate influences to evolve dynamically, such as in studies of treatment efficacy over time in clinical trials. For longitudinal data, semiparametric methods like partially linear mixed models facilitate the analysis of growth trajectories by combining parametric fixed effects with nonparametric components for random curves, accommodating within-subject correlations and irregular measurement times. These models are particularly useful in tracking outcomes in repeated measures studies, such as disease progression, where the nonparametric element captures nonlinear patterns without imposing rigid functional forms. In dose-response modeling for toxicity studies, single-index semiparametric models reduce multidimensional covariates into a single effective index while smoothing the response surface, enabling estimation of nonlinear relationships between exposure levels and adverse outcomes like toxicity thresholds. This approach is valuable in toxicological research, where it helps identify safe exposure limits by flexibly modeling monotonic or sigmoidal response curves without full parametric constraints. Seminal applications include Hastie and Tibshirani's varying coefficient models in epidemiology, used to assess how risk factors, such as behavioral risks, vary by age or time in large-scale data like the Italian PASSI surveillance system. These models have been applied to quantify age-varying effects on health outcomes, providing insights into population-level trends.
Implementation in practice often relies on software like R's gam package, which supports fitting generalized additive models, a class encompassing many semiparametric regressions, for routine analysis of complex health datasets. A key benefit of semiparametric regression in these biostatistical contexts is its ability to handle censoring in survival data and clustering in longitudinal studies without requiring fully parametric specifications for baseline hazards or correlation structures, thereby enhancing robustness and interpretability in medical research. Recent applications as of 2023 include semiparametric models for analyzing treatment efficacy with time-varying effects in longitudinal biostatistical studies, allowing nonparametric baseline hazards while estimating treatment effects.
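The fitting engine behind additive models is the backfitting algorithm: each component function is updated in turn by smoothing the partial residuals that remove the current estimates of the other components. The following self-contained Python sketch illustrates the algorithm on simulated data with a from-scratch Nadaraya-Watson smoother; it is a minimal illustration, not the gam package's actual implementation, which uses more refined smoothers and local scoring for non-Gaussian responses.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated additive-model data: y = sin(x1) + x2^2 + noise.
n = 400
x1 = rng.uniform(-2.0, 2.0, n)
x2 = rng.uniform(-2.0, 2.0, n)
y = np.sin(x1) + x2 ** 2 + 0.2 * rng.normal(size=n)

def smooth(x, r, h=0.25):
    """Nadaraya-Watson smooth of partial residuals r against covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

# Backfitting: cycle through components, each time smoothing the partial
# residuals obtained by subtracting the other components' current estimates.
alpha = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):
    f1 = smooth(x1, y - alpha - f2)
    f1 -= f1.mean()  # center each component for identifiability
    f2 = smooth(x2, y - alpha - f1)
    f2 -= f2.mean()

fitted = alpha + f1 + f2
```

After convergence, `f1` and `f2` approximate the centered component functions, and the fitted values reproduce the response up to the noise level, all without specifying a parametric form for either component.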

References

  1. [1]
    [PDF] 7 Semiparametric Methods and Partially Linear Regression
    The seminal papers are Carroll (1982, Annals of Statistics) and Robinson (1987,. Econometrica). The setting is a linear regression yi. = X/i + ei. E(ei j Xi) ...
  2. [2]
    Semiparametric Regression - David Ruppert, M. P. Wand, R. J. Carroll
    Semiparametric regression is concerned with the flexible incorporation of non-linear functional relationships in regression analyses.
  3. [3]
    Semiparametric Regression
    Book description​​ Semiparametric regression is concerned with the flexible incorporation of non-linear functional relationships in regression analyses. Any ...Missing: influential | Show results with:influential
  4. [4]
    [PDF] Root-N-Consistent Semiparametric Regression
    P. M. ROBINSON. STONE, C. J. (1981): "Admissible Selection of an Accurate and Parsimonious Normal Linear. Regression Model," Annals of Statistics, 9, 475-485.
  5. [5]
    [PDF] ESTIMATION OF SEMIPARAMETRIC MODELS*
    A semiparametric model for observational data combines a parametric form for some component of the data generating process (usually the behavioral relation.
  6. [6]
    [PDF] Semiparametric Statistics - Columbia University
    Apr 4, 2018 · A semiparametric model involves both parametric and nonparametric components, focusing on estimating a finite-dimensional parameter.
  7. [7]
  8. [8]
  9. [9]
    [PDF] semiparametric estimation :...
    In econometrics, most of the attention to semiparametric methods dates from the late 1970s and early 1980s, which saw the development of parametric models for ...
  10. [10]
    Root-N-Consistent Semiparametric Regression
    Jul 1, 1988 · While the paper focuses on the simplest interesing setting of multiple regression with independent observationsextensions to other econometric ...
  11. [11]
    Efficient and Adaptive Estimation for Semiparametric Models
    Free delivery 14-day returnsMay 8, 1998 · This book is about estimation in situations where we believe we have enough knowledge to model some features of the data parametrically.
  12. [12]
    Semiparametric least squares (SLS) and weighted SLS estimation of ...
    For the class of single-index models, I construct a semiparametric estimator of coefficients up to a multiplicative constant that exhibits 1 √ n ...
  13. [13]
    An Efficient Semiparametric Estimator for Binary Response Models
    be characterized by an index. The estimator is shown to be consistent, asymptotically normally distributed, and to achieve the semiparametric efficiency ...
  14. [14]
    Varying‐Coefficient Models - Hastie - 1993 - Wiley Online Library
    We explore a class of regression and generalized regression models in which the coefficients are allowed to vary as smooth functions of other variables.
  15. [15]
    Semiparametric regression during 2003–2007 - PubMed Central - NIH
    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and ...Missing: characteristics | Show results with:characteristics
  16. [16]
    (PDF) Parametric versus Semi and Nonparametric Regression Models
    In this article, differences between models, common methods of estimation, robust estimation, and applications are introduced.Missing: limitations | Show results with:limitations
  17. [17]
    5.3 - The Multiple Linear Regression Model | STAT 501
    The multiple linear regression model to represent non-linear relationships between the response variable and the predictor variables.Missing: limitations | Show results with:limitations
  18. [18]
    [PDF] OLS: Estimation and Standard Errors - MIT OpenCourseWare
    The model: y = Xβ +ε where y and ε are column vectors of length n (the number of observations), X is a matrix of dimensions n by k (k is the.
  19. [19]
    [PDF] Nonparametric Regression 1 Introduction - Statistics & Data Science
    Kernel estimators and local polynomial estimator are examples of linear smoothers. Definition: An estimator bm of m is a linear smoother if, for each x, there ...
  20. [20]
    [PDF] 3 Nonparametric Regression
    3.1 Nadaraya-Watson Regression. Let the data be (yi;Xi) where yi is real ... In general, the kernel regression estimator takes this form, where k(u) is ...
  21. [21]
    E. A. Nadaraya, “On Estimating Regression”, Teor. Veroyatnost. i ...
    Document Type: Article. Language: Russian. Citation: E. A. Nadaraya, “On Estimating Regression”, Teor. Veroyatnost. i Primenen., 9:1 (1964), 157–159; Theory ...
  22. [22]
    Robust Locally Weighted Regression and Smoothing Scatterplots
    Apr 5, 2012 · Robust Locally Weighted Regression and Smoothing Scatterplots. William S. Cleveland Bell Telephone Laboratories, Murray Hill, NJ, 07974, USA.Missing: original | Show results with:original
  23. [23]
    [PDF] Nonparametric Regression With Gaussian Processes - Brown CS
    This is a generalization of the previous equation; to work with it we need to be able to define priors over the infinite space of functions y.
  24. [24]
    Root-N-Consistent Semiparametric Regression - jstor
    P. M. ROBINSON. STONE, C. J. (1981): "Admissible Selection of an Accurate and Parsimonious Normal Linear. Regression Model," Annals of Statistics, 9, 475-485.
  25. [25]
    [PDF] High Dimensional Inference in Partially Linear Models
    This data set contains the wage information of 534 workers and their years of experience, education, living region, gender, race, occupation and marriage.
  26. [26]
    Estimation and Variable Selection for Semiparametric Additive ... - NIH
    Semiparametric additive partial linear models, containing both linear and nonlinear additive components, are more flexible compared to linear models, ...
  27. [27]
    Single-Index Models | SpringerLink
    Semiparametric Methods in Econometrics; Chapter. Single-Index Models. Chapter. pp 5–53; Cite this chapter. Download book PDF · Semiparametric Methods in ...
  28. [28]
    [PDF] 8 Semiparametric Single Index Models
    In his PhD thesis, Ichimura proposed a semiparametric estimator, published later in the Journal of Econometrics (1993). Ichimura suggested replacing g with the ...
  29. [29]
  30. [30]
    Kernel Smoothing in Semiparametric Regression - IntechOpen
    The scope of this chapter is to provide estimation techniques for the nonparametric regression function, including kernel smoothing, spline smoothing and local ...
  31. [31]
    Bandwidth selection through cross-validation for semi-parametric ...
    We study bandwidth selection for a class of semi-parametric models. The proper choice of optimal bandwidth minimizes the prediction errors of the model.
  32. [32]
    On Profile Likelihood: Journal of the American Statistical Association
    We show that semiparametric profile likelihoods, where the nuisance parameter has been profiled out, behave like ordinary likelihoods in that they have a ...
  33. [33]
    Efficiency of profile likelihood in semi-parametric models
    Mar 31, 2010 · Profile likelihood is a popular method of estimation in the presence of an infinite-dimensional nuisance parameter, as the method reduces ...
  34. [34]
    Semiparametric Efficiency Bounds - jstor
    Semiparametric models are those where the functional form of some components is unknown. Effi bounds are of fundamental importance for such models.
  35. [35]
    SEMIPARAMETRIC INFERENCE IN A PARTIAL LINEAR MODEL
    In this paper an asymptotically efficient estimator of β is constructed solely under mild smoothness assumptions on the unknown η, f and g, thereby removing the ...
  36. [36]
    [PDF] Double/Debiased Machine Learning for Treatment and Structural ...
    Neyman orthogonality is a joint property of the score ψ(W; θ, η), the true parameter value η0, the parameter set T, and the distribution of W. It is not ...
  37. [37]
    large sample theory for semiparametric regression models with two ...
    Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by.Missing: seminal | Show results with:seminal
  38. [38]
    [PDF] Estimating Marginal and Average Returns to Education
    Nov 3, 2006 · 14 Combining the model for S with the model for Y implies a partially linear model for the conditional expectation of Y :
  39. [39]
    [PDF] a praise for varying-coefficient models in causal analysis - RERO DOC
    Apr 18, 2015 · We have explicitly shown the inadequacy of traditional econometric inference to estimate an average effect when the coefficients are endogenous.
  40. [40]
    [PDF] Endogeneity in Nonparametric and Semiparametric Regression ...
    Abstract. This paper considers the nonparametric and semiparametric methods for estimating regression models with continuous endogenous regressors.
  41. [41]
    [PDF] Series Estimation of Partially Linear Panel Data Models with Fixed ...
    This paper considers the problem of estimating a partially linear semipara- metric fixed effects panel data model with possible endogeneity. Using the.
  42. [42]
    Econometrics of Auctions and Nonlinear Pricing - Annual Reviews
    Aug 2, 2019 · This review surveys the growing literature on the econometrics of first-price sealed-bid auctions and nonlinear pricing.
  43. [43]
    Semi-parametric regression model for survival data - PubMed Central
    Cox proportional hazards model is a semi-parametric model that leaves its baseline hazard function unspecified. The rationale to use Cox proportional hazards ...
  44. [44]
    [PDF] Statistical Methods in Medical Research - Trevor Hastie
    Of particular interest in the proportional hazards setting is the varying coefficient model of Hastie and Tibshirani,15 in which the parameter effects can ...
  45. [45]
    Analysis of Longitudinal Data with Semiparametric Estimation ... - NIH
    This paper uses a semiparametric varying-coefficient partially linear model to analyze longitudinal data, focusing on estimating covariance functions.
  46. [46]
    [PDF] Analysis of Longitudinal Data With Semiparametric Estimation of ...
    A semiparametric varying coefficient partially linear model for longitudinal data is introduced, and an estimation procedure for model coefficients using a ...
  47. [47]
    [PDF] Semiparametric profile likelihood estimation for continuous ...
    For the subset of observations greater than zero, we fit a semiparametric single-index model [26], implemented ... Threshold dose-response models in toxicology.
  48. [48]
    Application of the varying coefficient model to the behaviour risk ...
    May 13, 2015 · In this case, the coefficients are a function of time. This model is referred to by Hastie and Tibshirani [27] as a varying coefficient model ...
  49. [49]
    [PDF] Package 'gam' - Generalized Additive Models - CRAN
    gam is used to fit generalized additive models, specified by giving a symbolic description of the additive predictor and a description of the error distribution ...