
Parametric model

A parametric model is a family of probability distributions indexed by a finite-dimensional parameter \theta \in \Theta \subset \mathbb{R}^k, where each distribution P_\theta is uniquely associated with a specific value of the parameter, often expressed through a density function p(x; \theta) with respect to a dominating measure. These models assume that the underlying data-generating process can be fully described by a fixed number of parameters, enabling the specification of the entire distribution once the parameters are estimated. In statistics and machine learning, parametric models are characterized by their reliance on distributional assumptions, such as approximate normality for many common forms, which allows for efficient estimation methods like maximum likelihood estimation (MLE). Under regularity conditions, such as smoothness of the log-density and identifiability of the parameter, maximum likelihood estimators are consistent and asymptotically normal, facilitating hypothesis testing and confidence interval construction via likelihood ratio statistics that follow a \chi^2 distribution. Advantages include computational efficiency and statistical power when assumptions hold, as fewer parameters lead to more precise estimates with smaller sample sizes compared to nonparametric alternatives. However, these models can be inflexible, potentially leading to biased results or misleading inference (e.g., incorrect p-values) if the true distribution deviates from the assumed form, such as in cases of skewness or outliers. Common examples of parametric models include the normal distribution N(\mu, \sigma^2), parameterized by mean \mu and variance \sigma^2; the Poisson distribution for count data, with a rate parameter whose maximum likelihood estimator is the sample mean; and the Bernoulli distribution for binary outcomes, often part of broader exponential families. In contrast to nonparametric models, which make no fixed assumptions about the distribution's shape and require larger samples for reliability, parametric approaches are preferred for normally distributed data or when prior knowledge justifies the parameterization, though nonparametric methods are more robust for small or non-normal samples. The conceptual foundations of parametric modeling trace back to early probabilistic work, including Jacob Bernoulli's contributions in the early 1700s, as highlighted in historical analyses of statistical development.

Core Concepts

Definition

A parametric model is a statistical model in which the probability distributions for the observed data are fully specified by a fixed, finite number of parameters, irrespective of the amount of data available. This approach assumes that the underlying data-generating process belongs to a predefined family of distributions, where the characteristics of the distribution are captured entirely by these parameters. Formally, the data are modeled as arising from a family of probability distributions \{P_\theta : \theta \in \Theta\}, where \Theta is a fixed, finite-dimensional parameter space, typically an open subset of \mathbb{R}^k for some k. The parameter \theta indexes the distributions, allowing the model's complexity to remain constant and independent of the sample size n. This fixed-dimensional structure enables efficient inference by concentrating estimation efforts on the low-dimensional space \Theta. In contrast to models whose representational capacity expands with the dataset, such as nonparametric approaches, parametric models impose a predetermined functional form, limiting flexibility but simplifying estimation and interpretation. This distinction highlights the paradigm's reliance on strong distributional assumptions to achieve statistical efficiency. The development of parametric statistical inference has roots dating back to the eighteenth century, with significant advancements in the early twentieth century through the foundational contributions of Ronald A. Fisher and his contemporaries on likelihood-based methods for parameter estimation and hypothesis testing. These works established the framework for treating parameters as fixed entities within specified distributional families, laying the groundwork for modern parametric inference.
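
The following sketch, assuming Python with NumPy and SciPy, illustrates the idea of a family \{P_\theta : \theta \in \Theta\} indexed by a finite-dimensional parameter, using the two-parameter normal family as an example; the particular parameter values and evaluation points are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# A parametric family {P_theta : theta in Theta}, illustrated with the
# normal family indexed by theta = (mu, sigma); the dimension of theta
# is fixed at 2 no matter how many observations are later collected.
def density(x, theta):
    mu, sigma = theta          # finite-dimensional parameter vector
    return stats.norm.pdf(x, loc=mu, scale=sigma)

theta = (0.0, 1.0)             # one member of the family, P_(0, 1)
x = np.linspace(-3, 3, 7)
print(density(x, theta))       # density of that member at a few points
```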

Mathematical Formulation

In a parametric model, the data-generating process is formalized through a family of probability distributions indexed by a finite-dimensional vector. Specifically, let y = (y_1, \dots, y_n) denote a sample of n observations, each drawn independently from a distribution with probability density or mass function f(y_i \mid \theta), where \theta = (\theta_1, \dots, \theta_p)^\top \in \Theta \subseteq \mathbb{R}^p is the parameter vector belonging to a fixed-dimensional parameter space \Theta. The joint distribution of the sample is then given by p(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta), assuming independence, which parameterizes the entire likelihood of the data under the model. The likelihood function, central to parametric modeling, is defined as L(\theta; y) = \prod_{i=1}^n f(y_i \mid \theta), viewed as a function of \theta for fixed observed y. This formulation encapsulates how the model specifies the probability of the observed data as a function of the parameters, enabling inference by maximizing or analyzing L(\theta; y) to estimate \theta or derive properties of the model. In this notation, \theta represents the unknown parameters to be inferred, while y denotes the realized data; estimation of \theta can proceed via point estimates (yielding a single value) or interval estimates (providing a range with associated confidence). A broad class of parametric models belongs to the exponential family, which admits a canonical parameterization for analytical tractability. The general form for a density in this family is f(y \mid \theta) = \exp\left[ \eta(\theta)^\top T(y) - A(\theta) + B(y) \right], where \eta(\theta) is the natural parameter (a function of \theta), T(y) is the sufficient statistic, A(\theta) is the log-partition function ensuring normalization, and B(y) is a base measure term. This structure highlights the parametric nature by confining variation to the finite-dimensional \theta, facilitating derivations in likelihood-based inference.
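
A short sketch of the likelihood calculation may make this concrete; it assumes Python with NumPy and SciPy, a simulated normal sample, and a few arbitrary candidate values of \theta.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)   # iid sample; true theta = (2, 1.5)

def log_likelihood(theta, y):
    """ell(theta; y) = sum_i log f(y_i | theta) under independence."""
    mu, sigma = theta
    return np.sum(stats.norm.logpdf(y, loc=mu, scale=sigma))

# The likelihood is a function of theta with the observed data y held fixed:
for theta in [(0.0, 1.0), (2.0, 1.5), (2.0, 3.0)]:
    print(theta, log_likelihood(theta, y))
```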

Examples

Statistical Examples

One prominent example of a parametric model is the normal distribution, which models continuous data assuming a bell-shaped curve defined by two parameters: the mean μ, representing the center of the distribution, and the variance σ², capturing the spread around the mean. This distribution underpins many statistical procedures, such as the t-test for comparing means of small samples from normally distributed populations, originally developed by William Sealy Gosset (writing as "Student") in 1908, and analysis of variance (ANOVA) for assessing differences across multiple group means under normality assumptions, as formalized by Ronald Fisher in the 1920s. Another classic parametric model is the Poisson distribution, suitable for count data where events occur independently at a constant average rate λ, the single parameter denoting the expected number of occurrences in a fixed interval. It is particularly applied in modeling rare events, such as the number of arrivals in a queue or defects in a manufacturing process, where the probability of zero events is e^{-λ} and higher counts become increasingly unlikely when λ is small. In regression analysis, the linear regression model exemplifies a parametric approach by assuming the response y relates to predictors X through a fixed-dimensional coefficient vector β, expressed as y = Xβ + ε, where ε follows a normal distribution with mean 0 and covariance σ²I, enabling estimation of β via least squares. This formulation, first detailed by Adrien-Marie Legendre in 1805 for orbital predictions, assumes a linear functional form known a priori. The binomial distribution serves as a parametric model for binary outcomes, parameterized by n, the fixed number of independent trials, and p, the success probability per trial, with the probability mass function giving the likelihood of exactly k successes as \binom{n}{k} p^k (1-p)^{n-k}. It is commonly used in applications such as polling or quality control, where the sample proportion \hat{p} = k/n provides an unbiased estimator of p. These models, including the normal, Poisson, binomial, and related distributions, assume the functional form is known a priori, which facilitates exact inference, such as confidence intervals and hypothesis tests relying on sufficient statistics, when sample sizes are adequate.
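
The point estimators mentioned above (the sample mean for the Poisson rate, the sample proportion for the binomial, and least squares for the linear model) can be illustrated with a brief sketch; the simulated data, parameter values, and use of NumPy are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Poisson(lambda): the maximum likelihood estimator of the rate is the sample mean.
counts = rng.poisson(lam=3.2, size=500)
lam_hat = counts.mean()

# Binomial(n, p): the sample proportion is an unbiased estimator of p.
k, n = rng.binomial(n=50, p=0.3), 50
p_hat = k / n

# Linear regression y = X beta + eps: least squares estimate of beta.
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=200)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(lam_hat, p_hat, beta_hat)
```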

Machine Learning Examples

In machine learning, parametric models represent a class of algorithms whose expressiveness is determined by a fixed set of learnable parameters, allowing for scalable training on large datasets without the model complexity growing with data volume. These models are particularly suited for predictive tasks such as classification and clustering, where the parameter space remains constant regardless of input dimensionality. Logistic regression serves as a foundational parametric model for binary classification, modeling the probability of a positive outcome as P(y=1|x) = \sigma(x^T \beta), where \sigma(z) = \frac{1}{1 + e^{-z}} is the logistic (sigmoid) function and \beta is the fixed-dimensional vector of parameters to be estimated. This approach assumes a linear relationship in the log-odds space, enabling efficient computation for high-dimensional features in applications like spam detection and medical diagnosis. Gaussian mixture models (GMMs) extend parametric modeling to density estimation and clustering by representing data as a finite weighted mixture of multivariate Gaussian distributions, parameterized by component means \mu_k, covariance matrices \Sigma_k, and mixing coefficients \pi_k for k = 1, \dots, K, where K is predefined. GMMs are widely applied in unsupervised learning tasks, such as speaker identification and image segmentation, due to their ability to capture multimodal data distributions with a compact parameter set. Linear support vector machines (SVMs) formulate classification as finding an optimal separating hyperplane defined by w \cdot x + b = 0, where w and b are fixed-dimensional parameters that maximize the margin between classes while minimizing classification errors. This parametric structure excels in high-dimensional spaces, such as text categorization, by focusing on support vectors to define the decision boundary without requiring the full dataset during inference. Neural networks with fixed architectures, such as multilayer perceptrons, operate as parametric models by predetermining the number of layers, neurons, and connections, resulting in a fixed set of weights and biases that are optimized during training. Unlike non-parametric methods, this fixed parameterization allows for rapid evaluation and deployment in tasks like image recognition, where the model complexity does not scale with data size. In practice, the fixed parameter count of these models facilitates efficient training through gradient-based optimization, as exemplified in frameworks like TensorFlow, introduced in 2015, which support distributed computation for large-scale parametric learning.
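
A brief sketch, assuming Python with scikit-learn and simulated data, illustrates how the parameter count of these models is fixed by the chosen architecture (feature dimension, number of mixture components) rather than by the sample size.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

# Logistic regression: P(y=1|x) = sigmoid(x^T beta + b); the parameter
# count (4 coefficients + 1 intercept) is fixed by the feature dimension.
clf = LogisticRegression().fit(X, y)
print(clf.coef_.size + clf.intercept_.size)   # -> 5

# Gaussian mixture with K=3 components: the parameters are the K means,
# K covariance matrices, and K mixing weights, fixed once K is chosen.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.means_.shape, gmm.covariances_.shape, gmm.weights_.shape)
```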

Properties and Assumptions

Key Properties

Parametric models are characterized by their finite-dimensional parameter space, where the distribution of the data is fully specified by a fixed, finite number of parameters, regardless of the sample size. This finite-dimensionality promotes parsimony, as the model's complexity remains constant and does not scale with the volume of data, allowing for simpler representations that capture essential patterns without unnecessary elaboration. A key advantage of parametric models is their interpretability, as the parameters often carry direct statistical or physical meaning. For instance, in linear regression models, the coefficients represent effect sizes, quantifying the change in the response variable associated with a unit change in a predictor while holding others constant. This interpretability facilitates understanding of underlying relationships and aids in scientific inference. The fixed parameter structure also enables computational efficiency, particularly for large datasets, by permitting closed-form solutions or rapid optimization algorithms. In cases like ordinary least squares for linear models or maximum likelihood for many exponential family distributions, parameters can be computed directly without iterative numerical methods, reducing both time and resource demands. Under regularity conditions, parametric models support asymptotically efficient estimators, such as the maximum likelihood estimator, which achieve the Cramér-Rao lower bound on variance as the sample size grows. This bound provides the minimal possible variance for unbiased estimators, ensuring optimal precision in large-sample settings. Parametric models are identifiable when the mapping from the parameter vector to the induced probability distribution is one-to-one, guaranteeing that distinct parameter values produce distinct distributions and allowing unique recovery of the true parameters from observed data.
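
The Cramér-Rao property can be checked numerically; the following sketch, assuming NumPy and a simulated Bernoulli experiment with arbitrary values of p and n, compares the Monte Carlo variance of the sample proportion with the bound p(1-p)/n.

```python
import numpy as np

# Cramér-Rao illustration for Bernoulli(p): the Fisher information per
# observation is I(p) = 1 / (p (1 - p)), so the variance bound for an
# unbiased estimator of p from n observations is p (1 - p) / n.
# The sample proportion attains this bound.
rng = np.random.default_rng(3)
p, n, reps = 0.3, 200, 20_000

p_hats = rng.binomial(n, p, size=reps) / n      # sample proportions
empirical_var = p_hats.var()
crlb = p * (1 - p) / n

print(empirical_var, crlb)    # the two numbers should be close
```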

Underlying Assumptions

Parametric models fundamentally rely on the assumption that the chosen parametric family accurately represents the underlying data-generating process, meaning the functional form specified by the model correctly captures the true relationship between variables. This correct specification is essential for the validity of inferences drawn from the model, as deviations from the true form can lead to biased results. For instance, in regression contexts, assuming a linear relationship when the true process is nonlinear constitutes a violation of this assumption. A core prerequisite for many parametric models is that observations are independent and identically distributed (i.i.d.), implying that each data point is drawn independently from the same distribution without systematic dependencies or variations across samples. This i.i.d. condition underpins the theoretical guarantees for estimation and inference in parametric frameworks, ensuring that sample estimates converge to the true parameters as the sample size increases. Violations, such as autocorrelation in time series data, undermine the model's reliability. For inference in parametric models, particularly via methods like maximum likelihood, several regularity conditions must hold to ensure asymptotic properties such as consistency and asymptotic normality of estimators. These include the differentiability of the log-likelihood function with respect to the parameters and the existence of finite moments for the score function, which facilitate the application of central limit theorems and delta methods. Without these conditions, estimators may not achieve their desirable theoretical behaviors. The absence of model misspecification is another critical assumption, where any deviation, such as an incorrect distributional form or omitted variables, can result in inconsistent estimates that fail to converge to the true values even as the sample size grows. In linear regression models, for example, assuming homoscedastic errors when heteroscedasticity is present still yields unbiased but inefficient estimates of the coefficients, while invalidating standard error calculations and hypothesis tests. Such misspecifications highlight the sensitivity of parametric approaches to unmodeled heterogeneity in variance. Violations of these underlying assumptions can be assessed through goodness-of-fit tests, such as the chi-squared test developed by Karl Pearson in 1900, which evaluates whether observed data frequencies align with those expected under the parametric model. This test provides a quantitative measure to detect discrepancies in the functional form or distributional assumptions, enabling researchers to refine or reject the model accordingly.
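
A goodness-of-fit check along these lines can be sketched as follows, assuming SciPy and a simulated Poisson sample; the binning of counts and the degrees-of-freedom correction for the estimated rate are choices of this example rather than prescriptions.

```python
import numpy as np
from scipy import stats

# Pearson chi-squared goodness-of-fit check of a fitted Poisson model.
rng = np.random.default_rng(4)
y = rng.poisson(lam=2.0, size=400)
lam_hat = y.mean()                               # fitted Poisson rate

bins = [0, 1, 2, 3, 4]                           # counts 0, 1, 2, 3 and ">= 4"
observed = np.array([np.sum(y == b) for b in bins[:-1]] + [np.sum(y >= 4)])
probs = np.append(stats.poisson.pmf(bins[:-1], lam_hat),
                  1 - stats.poisson.cdf(3, lam_hat))
expected = probs * y.size

# ddof=1 accounts for the one parameter (lambda) estimated from the data.
stat, pvalue = stats.chisquare(observed, f_exp=expected, ddof=1)
print(stat, pvalue)
```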

Estimation and Inference

Parameter Estimation Techniques

Parameter estimation in parametric models involves determining the values of the model parameters \theta that best fit the observed data, typically by optimizing a criterion derived from the model's likelihood or moments. These techniques assume the form of the distribution or functional relationship is known, allowing the parameters to be inferred from samples. Common methods include maximum likelihood estimation, the method of moments, least squares, and Bayesian approaches, each with distinct theoretical foundations and computational properties. Maximum likelihood estimation (MLE) seeks the parameter value \hat{\theta} that maximizes the likelihood function L(\theta; y) for observed data y, formally defined as \hat{\theta} = \arg\max_{\theta} L(\theta; y), or equivalently, the log-likelihood \ell(\theta; y) = \log L(\theta; y). Introduced by Ronald Fisher in 1922, MLE provides a general framework for estimation across parametric families by selecting parameters that make the data most probable under the model. Under standard regularity conditions, such as differentiability of the log-likelihood and identifiability of \theta, the MLE is consistent, meaning \hat{\theta} \to \theta_0 in probability as the sample size n \to \infty, and asymptotically efficient, achieving the Cramér-Rao lower bound for the variance of unbiased estimators. Additionally, \sqrt{n}(\hat{\theta} - \theta_0) converges in distribution to a normal random variable with mean zero and variance equal to the inverse Fisher information matrix, enabling asymptotic normality for large-sample inference. A key property of MLE is invariance: if \hat{\theta} is the MLE of \theta, then g(\hat{\theta}) is the MLE of g(\theta) for any one-to-one function g. For complex models like mixtures, where direct maximization is intractable, the expectation-maximization (EM) algorithm, proposed by Dempster, Laird, and Rubin in 1977, iteratively computes MLEs by alternating between an expectation (E) step and a maximization (M) step applied to a surrogate likelihood, converging to a local maximum under mild conditions. The method of moments equates sample moments to their population counterparts to solve for parameters, offering a straightforward, non-iterative approach suitable for distributions with explicit moment formulas. Developed by Karl Pearson in 1894, it uses the first k sample moments \hat{m}_j = n^{-1} \sum_{i=1}^n y_i^j to match the theoretical moments m_j(\theta), yielding equations solved for the k-dimensional \theta. For the normal distribution N(\mu, \sigma^2), the first two moments give \hat{\mu} = \bar{y} (the sample mean) and \hat{\sigma}^2 = n^{-1} \sum_{i=1}^n (y_i - \bar{y})^2 (the uncorrected variance estimator, dividing by n rather than n - 1). While computationally simple, method-of-moments estimators are generally less efficient than MLE but remain consistent provided the relevant moments exist. In linear regression models of the form y_i = x_i^T \beta + \epsilon_i, where the \epsilon_i are independent errors, least squares estimation minimizes the sum of squared residuals \sum_{i=1}^n (y_i - x_i^T \beta)^2 to obtain \hat{\beta}, which coincides with the MLE under Gaussian errors. Attributed to Adrien-Marie Legendre's 1805 publication and independently derived by Carl Friedrich Gauss around 1795 for astronomical applications, this method yields the closed-form solution \hat{\beta} = (X^T X)^{-1} X^T y for design matrix X, assuming full column rank. Least squares is equivariant under affine transformations of the predictors and efficient in the Gaussian case, forming the basis for generalized least squares in heteroscedastic settings.
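
The sketch below, assuming NumPy and SciPy with a simulated gamma sample, contrasts maximum likelihood (via numerical optimization) with the method of moments for the same two-parameter model; the starting values and choice of optimizer are illustrative assumptions.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
y = rng.gamma(shape=2.0, scale=1.5, size=500)

# Maximum likelihood: numerically maximize the log-likelihood (here by
# minimizing its negative) over the gamma parameters (shape, scale).
def neg_log_lik(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -np.sum(stats.gamma.logpdf(y, a=shape, scale=scale))

mle = optimize.minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead").x

# Method of moments: match the first two moments of the gamma distribution,
# mean = shape * scale and variance = shape * scale**2.
m, v = y.mean(), y.var()
mom = np.array([m**2 / v, v / m])

print(mle, mom)   # both estimates should be near (2.0, 1.5)
```
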
Bayesian estimation treats parameters as random variables, computing the posterior distribution \pi(\theta | y) \propto L(\theta; y) \pi(\theta), where \pi(\theta) is a prior distribution reflecting prior beliefs or regularization, and point estimates like the posterior mean or mode are derived therefrom. Rooted in Thomas Bayes' theorem, published in 1763, modern applications emphasize conjugate priors for tractable posteriors, such as the normal-inverse-gamma prior for normal models, providing shrinkage toward prior means to mitigate overfitting in small samples. Unlike frequentist methods, Bayesian estimates incorporate uncertainty via the full posterior, with Markov chain Monte Carlo methods enabling computation for high-dimensional \theta.
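
A conjugate example may clarify the mechanics: assuming SciPy and an arbitrary Beta prior, the sketch below updates a Beta-Bernoulli model and compares the posterior mean with the sample proportion.

```python
import numpy as np
from scipy import stats

# Conjugate Beta-Bernoulli sketch: with a Beta(a0, b0) prior on the success
# probability and k successes in n trials, the posterior is
# Beta(a0 + k, b0 + n - k); the posterior mean shrinks the sample
# proportion toward the prior mean.
rng = np.random.default_rng(6)
a0, b0 = 2.0, 2.0                   # prior hyperparameters (an assumption here)
data = rng.binomial(1, 0.7, size=25)
k, n = data.sum(), data.size

posterior = stats.beta(a0 + k, b0 + n - k)
print(k / n,                        # frequentist point estimate
      posterior.mean(),             # Bayesian posterior mean
      posterior.interval(0.95))     # 95% equal-tailed credible interval
```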

Inference Procedures

In parametric models, inference procedures allow for drawing conclusions about the unknown parameters \theta based on the estimated values \hat{\theta}, quantifying uncertainty and testing hypotheses under the assumed model structure. One key method is the construction of confidence intervals for \theta, which leverage the asymptotic normality of maximum likelihood estimators (MLEs). Specifically, under regularity conditions, \sqrt{n} (\hat{\theta} - \theta) \xrightarrow{d} \mathcal{N}(0, I(\theta)^{-1}), where n is the sample size and I(\theta) is the Fisher information matrix; this implies that an approximate (1 - \alpha) confidence interval for a scalar \theta is given by \hat{\theta} \pm z_{\alpha/2} \sqrt{I(\hat{\theta})^{-1}/n}, with z_{\alpha/2} the standard normal quantile. Hypothesis testing in parametric models often employs test statistics derived from the likelihood. The likelihood ratio test (LRT) compares nested models by computing the statistic -2 \log \Lambda = 2 [\ell(\hat{\theta}) - \ell(\hat{\theta}_0)], where \ell denotes the log-likelihood, \hat{\theta} the unrestricted MLE, and \hat{\theta}_0 the MLE under the null hypothesis; under the null, this statistic asymptotically follows a \chi^2_p distribution with p equal to the difference in parameter dimensions. Similarly, the Wald test assesses the null H_0: \theta = \theta_0 using the quadratic form n (\hat{\theta} - \theta_0)^T I(\hat{\theta}) (\hat{\theta} - \theta_0), which also converges in distribution to \chi^2_p under H_0, providing a direct measure of deviation scaled by the information matrix. In a Bayesian framework for parametric models, inference proceeds by deriving the posterior p(\theta | y) \propto p(y | \theta) p(\theta), from which credible intervals are obtained as quantiles of the posterior, such as the central (1 - \alpha) interval [\theta_{\alpha/2}, \theta_{1 - \alpha/2}] where P(\theta_{\alpha/2} \leq \theta \leq \theta_{1 - \alpha/2} | y) = 1 - \alpha. For complex posteriors, Markov chain Monte Carlo (MCMC) methods generate samples to approximate these intervals empirically, enabling inference even when conjugate priors are unavailable. Parametric inference distinguishes itself by permitting exact distributional results in certain cases, such as the Student's t distribution for the sample mean under normality assumptions, which provides precise confidence intervals and tests without relying on large-sample approximations, in contrast to non-parametric approaches like the bootstrap that typically require simulation.
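
Both a Wald-type interval and a likelihood ratio test can be sketched for a single Poisson rate, as below; the simulated data, the null value, and the use of SciPy are assumptions of the example.

```python
import numpy as np
from scipy import stats

# Wald interval and likelihood ratio test for a Poisson rate (a sketch).
# For Poisson(lambda), the MLE is the sample mean and the per-observation
# Fisher information is 1 / lambda, so Var(lam_hat) is roughly lam_hat / n.
rng = np.random.default_rng(7)
y = rng.poisson(lam=4.0, size=150)
n, lam_hat = y.size, y.mean()

z = stats.norm.ppf(0.975)
wald_ci = (lam_hat - z * np.sqrt(lam_hat / n),
           lam_hat + z * np.sqrt(lam_hat / n))

def log_lik(lam):
    return np.sum(stats.poisson.logpmf(y, lam))

lam0 = 3.5                                    # null hypothesis H0: lambda = 3.5
lrt = 2 * (log_lik(lam_hat) - log_lik(lam0))  # the statistic -2 log Lambda
p_value = stats.chi2.sf(lrt, df=1)            # asymptotic chi-squared reference

print(wald_ci, lrt, p_value)
```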

Comparisons and Extensions

With Non-Parametric Models

Non-parametric models are those in which the underlying functional form is not specified in advance, and the effective number of parameters grows with the sample size n, allowing the model to adapt flexibly to the data; a classic example is kernel density estimation, where the density is estimated directly from the observations without assuming a specific distributional form. In contrast, parametric models impose a fixed, finite-dimensional structure on the data-generating process, which introduces the risk of misspecification if the chosen form does not match reality, whereas non-parametric approaches avoid such rigid assumptions but demand larger datasets for reliable estimation and often yield results that are more challenging to interpret due to their complexity. The trade-offs between the two paradigms are evident in their performance characteristics: parametric models excel in scenarios with limited data when their assumptions align with the true process, achieving fast convergence rates on the order of n^{-1/2}, while non-parametric models shine in abundant data regimes without strong priors, such as using k-nearest neighbors for classification instead of logistic regression, where the former captures nonlinear patterns more readily at the cost of computational intensity. However, non-parametric models circumvent parametric assumptions at the expense of vulnerability to the curse of dimensionality, where estimation accuracy deteriorates rapidly as the input dimension increases because the effective sample size per dimension shrinks, as demonstrated in Stone's analysis of optimal minimax convergence rates for nonparametric regression, which scale as n^{-m/(2m+d)} with smoothness m and dimension d. Selection between parametric and non-parametric models depends on the context: parametric forms are favored for their interpretability and efficiency in low-dimensional problems with plausible assumptions, whereas non-parametric methods are ideal for exploratory analyses in higher dimensions or when the underlying distribution is unknown and flexibility is essential.
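
The contrast can be made concrete with a small sketch, assuming SciPy and a simulated sample from a non-normal distribution: a two-parameter normal fit is compared with a kernel density estimate whose complexity grows with the data.

```python
import numpy as np
from scipy import stats

# Parametric vs non-parametric density estimation on the same sample:
# the normal fit uses 2 parameters regardless of n, while the kernel
# density estimate keeps all n points and adapts its shape to the data.
rng = np.random.default_rng(8)
y = rng.standard_t(df=5, size=1000)           # true density is not normal

mu_hat, sigma_hat = y.mean(), y.std()         # parametric (normal) fit
kde = stats.gaussian_kde(y)                   # non-parametric fit

grid = np.linspace(-4, 4, 5)
print(stats.norm.pdf(grid, mu_hat, sigma_hat))
print(kde(grid))
```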

With Semi-Parametric Models

Semi-parametric models integrate finite-dimensional parametric components with infinite-dimensional nonparametric elements, providing a framework that balances structure and flexibility in statistical modeling. This structure allows for partial specification of the data-generating process while leaving other aspects unspecified, distinguishing them from fully parametric models that require complete functional form assumptions. A representative example is the partially linear model, given by
y = x^T \beta + g(z) + \epsilon,
where \beta is the finite-dimensional parametric component, g is an arbitrary nonparametric function, and \epsilon denotes the error term. In contrast to parametric models, which fully specify the functional form for maximal efficiency under correct assumptions, semi-parametric models leave the nonparametric parts unspecified to enhance robustness to misspecification, albeit with reduced efficiency relative to a well-specified parametric alternative. A key application arises in survival analysis through the Cox proportional hazards model, where the effects of covariates enter parametrically via regression coefficients, while the baseline hazard function is treated nonparametrically.
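
One common estimation idea for the partially linear model, a Robinson-type double-residual approach, can be sketched with a deliberately crude binned smoother standing in for a proper nonparametric estimator; the simulated g, the bin count, and the smoother itself are assumptions of this illustration.

```python
import numpy as np

# Minimal sketch of the partially linear model y = x*beta + g(z) + eps,
# estimated with a double-residual idea: smooth both y and x against z
# with a crude binned average, then regress the residuals on each other.
rng = np.random.default_rng(9)
n = 2000
z = rng.uniform(0, 1, n)
x = rng.normal(size=n) + z                    # x may depend on z
g = np.sin(2 * np.pi * z)                     # unknown nonparametric part
y = 1.5 * x + g + rng.normal(scale=0.5, size=n)

def binned_mean(values, z, n_bins=20):
    """Rough nonparametric estimate of E[values | z] via bin averages."""
    bins = np.clip((z * n_bins).astype(int), 0, n_bins - 1)
    means = np.array([values[bins == b].mean() for b in range(n_bins)])
    return means[bins]

y_res = y - binned_mean(y, z)                 # y minus its smooth in z
x_res = x - binned_mean(x, z)                 # x minus its smooth in z
beta_hat = np.sum(x_res * y_res) / np.sum(x_res**2)
print(beta_hat)                               # should be close to 1.5
```
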
Parametric models can be embedded as submodels within semi-parametric frameworks to enable testing of parametric restrictions, leveraging efficient score methods that project the parametric score onto the orthogonal complement of the tangent space of the nonparametric nuisance component. The semi-parametric efficiency bounds, developed by Bickel, Klaassen, Ritov, and Wellner in 1993, establish the asymptotic lower variance limits for such estimators, bridging the efficiency gains of parametric models with the flexibility of nonparametric components.
