
Survival analysis

Survival analysis is a branch of statistics focused on the analysis of time-to-event data, where the primary outcome is the time until a specific event occurs, such as death, disease onset, or system failure. This methodology accounts for incomplete observations through censoring, where the exact event time is unknown for some subjects, typically due to study termination or loss to follow-up. Central to survival analysis are the survival function, which estimates the probability of surviving beyond a given time point, and the hazard function, representing the instantaneous risk of the event at that time conditional on survival up to then. Key challenges in survival analysis arise from censored data and the need to compare survival distributions across groups or assess the impact of covariates. Right censoring is the most common form, occurring when subjects are still event-free at the study's end; standard methods assume censoring is non-informative to avoid bias. Nonparametric methods like the Kaplan-Meier estimator provide a step-function approximation of the survival function and are widely used for descriptive purposes, while the log-rank test enables hypothesis testing to compare survival curves between groups. For modeling relationships with predictors, semiparametric approaches such as the Cox proportional hazards model are standard, assuming that hazard ratios remain constant over time and allowing adjustment for multiple factors like age or treatment effects. Originally developed in the mid-20th century for actuarial and medical applications, survival analysis has expanded to diverse fields including reliability engineering, economics (e.g., time to re-employment), and the social sciences. The Kaplan-Meier method, introduced in 1958, marked a foundational advancement for handling censored data nonparametrically. In medicine, it is essential for clinical trials to estimate treatment efficacy and prognosis, often visualized through survival plots that highlight differences in event-free probabilities. Modern extensions incorporate machine learning and handle complex censoring types, but core techniques remain robust for time-to-event studies.

Introduction

Overview and Importance

Survival analysis is a branch of statistics focused on the study of time until the occurrence of a specified event, such as death, failure, or recovery, where the data often include incomplete observations known as censoring. This approach enables researchers to model and infer the distribution of these times-to-event while accounting for the fact that not all subjects may experience the event within the observation period. The importance of survival analysis lies in its ability to handle real-world scenarios where events do not occur uniformly or completely within the study timeframe, making it indispensable across diverse disciplines. In medicine, it is widely applied to evaluate survival times following treatments or diagnoses, such as assessing outcomes in clinical trials for chronic diseases. In engineering, it supports reliability analysis by analyzing component failure times to improve design and maintenance strategies. Similarly, in economics, it examines durations such as the length of unemployment spells or the time until customer churn to inform policy and business decisions. Standard statistical methods, such as calculating means or proportions, often fail with time-to-event data because they cannot properly incorporate censored observations, leading to biased estimates that understate true event times. For instance, in a study tracking time to remission among cancer patients, some individuals may still be in remission at the study's end or drop out early, providing only partial information; ignoring this censoring would discard valuable data and distort survival probabilities.

Historical Development

The roots of survival analysis trace back to 17th-century actuarial science, where early efforts focused on quantifying mortality patterns through life tables. In 1662, John Graunt published Natural and Political Observations Made upon the Bills of Mortality, presenting the first known life table derived from empirical data on births and deaths in London parishes, which estimated survival probabilities across age groups and laid foundational principles for demographic analysis. This work marked a shift from anecdotal observations to systematic, data-driven mortality assessment, influencing subsequent actuarial practices. Building on Graunt's innovations, Edmond Halley refined life table methodology in 1693 with his analysis of birth and death records from Breslau (now Wrocław, Poland), producing one of the earliest complete life tables, which estimated the number of survivors from a birth cohort to various ages. Halley's table, published in the Philosophical Transactions of the Royal Society, enabled practical applications such as the pricing of annuities and highlighted the variability in survival rates, prompting later refinements in the 18th and 19th centuries by demographers such as Benjamin Gompertz, who introduced parametric models for mortality trends. The 20th century saw survival analysis evolve into a rigorous statistical discipline, addressing censored data and estimation challenges. In 1926, Major Greenwood derived a variance formula for life table survival probabilities, providing a method to quantify uncertainty in actuarial estimates that became a cornerstone of non-parametric inference. This was extended in 1958 by Edward L. Kaplan and Paul Meier, who introduced the Kaplan-Meier estimator, a non-parametric product-limit method for estimating the survival function from incomplete observations that is widely adopted in medical and reliability studies. Edmund Gehan further advanced comparative techniques in 1965 with a generalized Wilcoxon test for singly censored samples, enabling robust testing of differences between survival distributions. A pivotal milestone occurred in 1972 when David Cox proposed the proportional hazards model, a semi-parametric regression framework that relates covariates to the hazard function without specifying its baseline form, revolutionizing the analysis of survival data in clinical trials and epidemiology. Post-2000 developments expanded survival analysis through Bayesian approaches, which incorporate prior knowledge for flexible inference in complex models, and machine learning integrations after 2010, including neural networks for high-dimensional survival prediction that handle non-linear effects and large datasets more effectively than traditional methods. In the 2020s, the field has shifted toward computational methods that scale analyses to large biomedical datasets, enhancing predictive accuracy and interpretability.

Core Concepts

In survival analysis, the survival function, denoted S(t), represents the probability that a random variable T, which denotes the time until the occurrence of an event of interest, exceeds a given time t \geq 0. Thus, S(t) = P(T > t). This function is non-increasing and right-continuous, with S(0) = 1 and \lim_{t \to \infty} S(t) = 0 under typical assumptions for positive survival times. The survival function relates directly to the cumulative distribution function (CDF) F(t) = P(T \leq t), such that F(t) = 1 - S(t). For continuous survival times, the probability density function (PDF) is derived as f(t) = -\frac{d}{dt} S(t), which describes the instantaneous rate of event occurrence at time t. These relationships provide the foundational probability framework for modeling time-to-event data. The expected lifetime, or mean survival time, is obtained by integrating the survival function over all possible times: E[T] = \int_0^\infty S(t) \, dt. This integral quantifies the average duration until the event, assuming the integral converges. Quantiles of the survival distribution offer interpretable summaries, such as the median survival time, defined as the value t_{0.5} where S(t_{0.5}) = 0.5, representing the time by which half of the population is expected to experience the event. Higher or lower quantiles can similarly characterize the distribution's spread. Formulations of the survival function differ between continuous and discrete time settings. In the continuous case, S(t) is differentiable almost everywhere, linking to the hazard function via h(t) = -\frac{d}{dt} \log S(t). In discrete time, where events occur at distinct time points, the survival function is expressed as the product S(t) = \prod_{u=1}^t (1 - h(u)), with h(u) denoting the discrete hazard at time u. Common parametric families for the survival distribution include the exponential, Weibull, and log-normal distributions, each characterized by shape and scale parameters that capture different hazard behaviors. The exponential distribution assumes a constant hazard, with S(t) = e^{-\lambda t}, where \lambda > 0 is the rate parameter. The Weibull distribution generalizes this, allowing increasing, decreasing, or constant hazards via S(t) = e^{-(t/\alpha)^\beta}, where \alpha > 0 is the scale parameter and \beta > 0 is the shape parameter (with \beta = 1 reducing to the exponential). The log-normal distribution models survival times whose logarithms are normally distributed, with S(t) = 1 - \Phi\left( \frac{\log t - \mu}{\sigma} \right), where \Phi is the standard normal CDF, \mu is the location parameter (the mean of \log T), and \sigma > 0 is the scale parameter. These distributions are widely used due to their flexibility in fitting empirical survival patterns across applications like reliability engineering and medical research.
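As a concrete illustration of these relationships, the short Python sketch below (assuming NumPy and SciPy are available) evaluates S(t), F(t), and f(t) for a Weibull distribution with illustrative shape and scale values, and checks numerically that the mean survival time equals the integral of S(t):

# A minimal numerical sketch (NumPy/SciPy assumed) of the relationships among
# S(t), F(t), f(t), E[T], and the median for a Weibull distribution.
import numpy as np
from scipy import stats
from scipy.integrate import quad

shape, scale = 1.5, 10.0               # illustrative Weibull beta (shape) and alpha (scale)
dist = stats.weibull_min(c=shape, scale=scale)

t = 5.0
S = dist.sf(t)                         # survival function S(t) = 1 - F(t)
F = dist.cdf(t)                        # CDF F(t)
f = dist.pdf(t)                        # density f(t) = -dS/dt
print(S + F)                           # 1.0, since S(t) = 1 - F(t)

# Mean survival time as the integral of S(t) over [0, infinity)
mean_from_S, _ = quad(dist.sf, 0, np.inf)
print(mean_from_S, dist.mean())        # both approximately 9.03 for these parameters

# Median survival time: the t at which S(t) = 0.5, i.e. scale * log(2)**(1/shape)
print(dist.ppf(0.5))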

Hazard Function and Cumulative Hazard

The hazard function, denoted h(t), represents the instantaneous rate at which events occur at time t, given survival up to that time. Formally, it is defined as h(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t}, where T is the survival time random variable. This limit expression captures the conditional probability of the event happening in a small interval following t, divided by the interval length, as the interval approaches zero. Equivalently, h(t) can be expressed in terms of the probability density function f(t) and the survival function S(t) as h(t) = f(t) / S(t), where S(t) = P(T > t) is the probability of surviving beyond time t. The cumulative hazard function, H(t), accumulates the hazard over time up to t and is given by the integral H(t) = \int_0^t h(u) \, du. This function relates directly to the survival function through the identity H(t) = -\log S(t), which implies S(t) = \exp(-H(t)). The relationship between the density, hazard, and survival functions follows as f(t) = h(t) S(t), since f(t) = -dS(t)/dt and substituting S(t) = \exp(-H(t)) yields the derivative form h(t) = -\frac{d}{dt} \log S(t). These interconnections highlight how the hazard provides a dynamic view of risk, contrasting with the static probability encoded in S(t). In interpretation, the hazard function h(t) is often termed the force of mortality in demographic contexts or the failure rate in reliability engineering, quantifying the intensity of the event risk at each instant conditional on prior survival. A key assumption in many survival models is that of proportional hazards, where the hazard for individuals with covariates X takes the multiplicative form h(t \mid X) = h_0(t) \exp(\beta' X), with h_0(t) as the baseline hazard for a reference group (e.g., when X = 0). This posits that covariates act to scale the underlying time-dependent hazard shape by a constant factor, independent of time. Hazard functions exhibit varying shapes depending on the underlying survival distribution. For the exponential distribution, the hazard is constant, h(t) = \lambda, reflecting memoryless event timing where risk does not change with survival duration. In contrast, the Weibull distribution produces an increasing hazard when the shape parameter \beta > 1, as h(t) = \lambda \beta (\lambda t)^{\beta - 1}, modeling scenarios like aging processes where failure risk accelerates over time.
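The following minimal Python sketch (NumPy and SciPy assumed, parameter values illustrative only) contrasts the constant exponential hazard with an increasing Weibull hazard and numerically verifies the identities h(t) = f(t)/S(t) and S(t) = exp(-H(t)):

# Sketch: hazard and cumulative hazard identities for the exponential (constant
# hazard) and the Weibull (increasing hazard when the shape beta > 1).
import numpy as np
from scipy import stats

t = np.linspace(0.1, 20, 5)

expo = stats.expon(scale=1 / 0.2)            # exponential with rate lambda = 0.2
h_expo = expo.pdf(t) / expo.sf(t)            # h(t) = f(t) / S(t)
print(h_expo)                                # constant 0.2 at every t

weib = stats.weibull_min(c=2.0, scale=10.0)  # Weibull with shape beta = 2
h_weib = weib.pdf(t) / weib.sf(t)
print(h_weib)                                # increases with t (aging-type risk)

# Cumulative hazard H(t) = -log S(t), so S(t) = exp(-H(t))
H_weib = -np.log(weib.sf(t))
print(np.allclose(np.exp(-H_weib), weib.sf(t)))   # True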

Censoring Mechanisms

In survival analysis, censoring refers to the incomplete observation of event times due to the study design or external factors, which complicates the estimation of survival distributions and requires specialized statistical methods to avoid bias. Censoring arises because subjects may exit the study before the event of interest occurs, or the event may happen outside the observation window, leading to partial information about their event times. Unlike complete-data scenarios, ignoring censoring can result in biased estimates of survival probabilities and underestimated variances, as it treats censored observations as events or failures, distorting the risk set and inflating the apparent incidence of events. For instance, in clinical trials, failing to account for censoring might overestimate treatment effects by excluding longer event times from censored subjects. Right censoring is the most prevalent type, occurring when the event has not been observed by the end of the follow-up period or due to subject withdrawal. Type I right censoring happens at a fixed time, where all remaining subjects are censored regardless of their event status, common in prospective studies with predefined durations. In contrast, Type II right censoring involves censoring at a fixed number of events, often used in reliability testing where observation stops after a set number of failures, though it is less common in biomedical contexts due to ethical concerns. Dropout due to loss to follow-up also induces right censoring, assuming it is unrelated to the event risk. Left censoring occurs when the event has already happened before the observation period begins, providing only an upper bound on the event time. This is typical in cross-sectional studies or retrospective analyses where entry into the study follows the event, such as diagnosing chronic conditions after onset. Interval censoring extends this idea, where the event time is known only to fall within a specific interval between two observation points, rather than an exact time or bound. For example, in periodic health screenings, the event might be detected between visits without pinpointing the exact occurrence. Truncation differs from censoring in that it entirely excludes subjects whose event times fall outside the observation window, rather than including partial information. Left truncation, for instance, removes individuals who experienced the event before study entry, potentially biasing the sample toward longer survivors if not adjusted for, as seen in registry data where only post-enrollment cases are captured. Right truncation similarly omits events occurring after the study cutoff. Unlike censoring, which retains subjects in the risk set until their censoring time, truncation alters the population representation entirely. A key assumption underlying most survival analysis methods is non-informative censoring, where the censoring mechanism is independent of the event time given the covariates, ensuring that censored subjects have the same future event risk as non-censored ones. Violation of this assumption, such as when censoring correlates with poorer prognosis (e.g., withdrawal due to worsening health), introduces informative censoring, leading to biased estimates and invalid inferences. Methods like the Kaplan-Meier estimator address right censoring by incorporating censored observations into the risk set without treating them as events, thereby providing unbiased survival curve estimates under the non-informative assumption.
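The simulation sketch below (NumPy and the lifelines library assumed; data are simulated, not from any study) illustrates why ignoring right censoring biases estimates downward: the naive mean of the recorded times understates the true mean, while the Kaplan-Meier estimator, which keeps censored subjects in the risk set, recovers the median well:

# Sketch: naive summaries of right-censored times are biased downward, while the
# Kaplan-Meier estimator uses the censoring information properly.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(1)
n = 2000
true_times = rng.exponential(scale=10.0, size=n)     # true event times, mean 10
censor_times = rng.uniform(0, 15, size=n)            # non-informative right censoring
observed = np.minimum(true_times, censor_times)      # what is actually recorded
event = true_times <= censor_times                   # True = event observed, False = censored

print(observed.mean())          # naive mean of recorded times: well below 10
print(true_times.mean())        # approximately 10

kmf = KaplanMeierFitter().fit(observed, event_observed=event)
print(kmf.median_survival_time_)  # close to the true median, 10 * log(2) = 6.9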

Estimation Techniques

Non-parametric Methods

Non-parametric methods in survival analysis provide distribution-free approaches to estimate survival functions and compare groups without assuming a specific underlying distribution for survival times. These techniques are particularly valuable when the form of the survival distribution is unknown or complex, allowing for flexible estimation in the presence of censoring. The primary tools include estimators for the survival and cumulative hazard functions, as well as tests for comparing survival experiences across groups. The Kaplan-Meier estimator is a cornerstone non-parametric method for estimating the survival function from right-censored data. It constructs the estimator as a product-limit formula: for ordered distinct event times t_1 < t_2 < \cdots < t_k, the estimated survival function is given by \hat{S}(t) = \prod_{t_i \leq t} \frac{n_i - d_i}{n_i}, where n_i is the number of individuals at risk just prior to time t_i, and d_i is the number of events observed at t_i. This estimator is approximately unbiased and consistent under standard conditions, providing a step function that decreases only at observed event times. To assess its variability, Greenwood's formula offers an estimate of the variance: \hat{\sigma}^2(t) = \hat{S}^2(t) \sum_{t_i \leq t} \frac{d_i}{n_i (n_i - d_i)}, which approximates the asymptotic variance and is used to construct confidence intervals, such as those based on the normal approximation to the log-cumulative hazard. This variance estimator conditions on the observed censoring pattern and performs well even with moderate sample sizes. (Greenwood's original work on variance estimation for life tables predates the Kaplan-Meier method but was adapted for its use.) Complementing the Kaplan-Meier estimator, the Nelson-Aalen estimator provides a non-parametric estimate of the cumulative hazard function, defined as \hat{H}(t) = \sum_{t_i \leq t} \frac{d_i}{n_i}. This estimator accumulates the incremental hazards at each event time and can be used to derive an alternative survival estimate via the relationship \hat{S}(t) = \exp(-\hat{H}(t)) in the continuous case, though the product-limit form is preferred for discrete data to avoid bias. The Nelson-Aalen approach is asymptotically equivalent to the Kaplan-Meier estimator under the exponential transformation and is particularly useful for modeling hazard processes directly. Its variance can be estimated using \sum_{t_i \leq t} \frac{d_i}{n_i^2}. For comparing survival curves between two or more groups, the log-rank test is a widely used non-parametric hypothesis test that assesses whether there are differences in survival distributions. The test statistic is constructed by comparing observed events O_j in group j to expected events E_j under the null hypothesis of identical hazards across groups, and it approximately follows a chi-squared distribution with degrees of freedom equal to the number of groups minus one. At each event time, the expected contribution is E_j = n_{j,i} \cdot \frac{d_i}{n_i}, where n_{j,i} is the number at risk in group j just before t_i and d_i and n_i are the total events and total at risk. The overall statistic is \sum_j (O_j - E_j)^2 / \mathrm{Var}(O_j - E_j), providing a sensitive test for detecting differences, especially under proportional hazards. A classic application of these methods is the analysis of survival data from patients with acute myelogenous leukemia (AML), as studied in early chemotherapy trials. Consider a simplified life table from the 6-mercaptopurine (6-MP) arm of a cohort of 21 AML patients, where survival times are right-censored for some individuals; the data track remission duration post-treatment (9 events, 12 censored observations).
The table below illustrates the construction of the Kaplan-Meier estimator, showing event times, the number at risk (n_i), the number of deaths (d_i), and the estimated survival probability:
Time (weeks) | n_i | d_i | Survival probability
6            | 21  | 3   | 0.857
7            | 17  | 1   | 0.807
10           | 15  | 1   | 0.753
13           | 12  | 1   | 0.690
16           | 11  | 1   | 0.628
22           | 7   | 1   | 0.538
23           | 6   | 1   | 0.448
Here, the Kaplan-Meier curve drops stepwise at event times, yielding a 1-year survival estimate around 0.45, highlighting the method's ability to handle censoring while estimating the survival function directly from the data. This example demonstrates how non-parametric estimators reveal treatment effects without distributional assumptions. These methods rely on key assumptions, including independent censoring, where censoring times are independent of event times given covariates, ensuring the estimators remain unbiased. They are valid for right-censoring but require adjustments for ties in event times, often handled by averaging or redistribution. Limitations include reduced efficiency with heavy censoring or small samples, where variance estimates may be unstable, and they do not directly incorporate covariates beyond stratification. Recent extensions, such as non-parametric mixture models for clustered survival data, address dependencies in correlated observations (e.g., familial or longitudinal clusters) by integrating clustering with survival estimation, improving robustness in modern applications like multi-center trials post-2020.
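The Python sketch below (lifelines assumed) computes the Kaplan-Meier and Nelson-Aalen estimates for remission times consistent with the table above; the event and censoring times follow the classic Freireich 6-MP data that this table reflects:

# Kaplan-Meier and Nelson-Aalen estimation on the 6-MP remission data.
from lifelines import KaplanMeierFitter, NelsonAalenFitter

# Remission times (weeks) for the 6-MP arm; 1 = relapse observed, 0 = censored.
times = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
event = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

kmf = KaplanMeierFitter().fit(times, event_observed=event, label="6-MP")
print(kmf.survival_function_)        # step function: approx. 0.857 at t=6, 0.807 at t=7, ...

naf = NelsonAalenFitter().fit(times, event_observed=event)
print(naf.cumulative_hazard_)        # running sum of d_i / n_i at each event time

# A log-rank comparison against a second group would use:
# from lifelines.statistics import logrank_test
# logrank_test(times, times_ctrl, event_observed_A=event, event_observed_B=event_ctrl)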

Parametric Methods

Parametric methods in survival analysis assume that the survival times follow a specific probability distribution, such as the exponential, Weibull, log-normal, or log-logistic, allowing for the estimation of parameters that fully characterize the distribution. This assumption enables more precise inference compared to non-parametric approaches when the chosen distribution is appropriate, as it incorporates distributional knowledge to improve efficiency. The likelihood function for parametric models accounts for censoring by distinguishing between observed events and censored observations. For a sample of n independent survival times t_i, where \delta_i = 1 if the event is observed at t_i and \delta_i = 0 if censored at t_i, the likelihood is constructed as L(\theta) = \prod_{i=1}^n f(t_i \mid \theta)^{\delta_i} S(t_i \mid \theta)^{1 - \delta_i}, where f(t \mid \theta) is the probability density function and S(t \mid \theta) is the survival function, both parameterized by \theta. This form ensures that censored observations contribute only through their survival probability up to the censoring time, while uncensored ones contribute via the density at the event time. Maximum likelihood estimation (MLE) maximizes this likelihood (or its logarithm) to obtain estimates of \theta. For the exponential distribution with rate \lambda (constant hazard), the MLE has the closed form \hat{\lambda} = \sum \delta_i / \sum t_i, but for distributions such as the Weibull (monotonic hazard with flexible shape), the log-likelihood is not analytically tractable and requires numerical optimization methods like Newton-Raphson or EM algorithms, since the shape \alpha and scale \beta parameters are coupled. Confidence intervals for the parameters can be derived using the profile likelihood method, which profiles out nuisance parameters to focus on the parameter of interest. The profile log-likelihood for a parameter \theta_j is obtained by maximizing the full log-likelihood over the other parameters for fixed values of \theta_j, and the 95% confidence interval consists of values where the profile log-likelihood drops by no more than 1.92 units from its maximum. This approach provides intervals with better coverage properties than Wald intervals, especially for small samples or when parameters are near a boundary. Parametric methods offer gains in statistical efficiency over non-parametric ones, such as the Kaplan-Meier estimator, when the assumed distribution is correct, as they use the full parametric form to borrow strength across the data. However, if the model is misspecified, parametric estimates can be biased and lead to invalid inferences, whereas non-parametric methods remain consistent under fewer assumptions. In discrete-time settings, where survival times are grouped into intervals, parametric models can parameterize the hazard within each interval using logistic-like forms, such as the complementary log-log link for proportional hazards or the logit link for direct probability modeling. These models treat the data as a series of binary outcomes (event or survival in each interval, conditional on prior survival), enabling the use of generalized linear model frameworks for estimation. As an illustrative example, consider fitting a Weibull model to survival data from the E1684 melanoma clinical trial, which tracked recurrence-free survival in patients with high-risk melanoma treated with interferon alfa-2b versus observation.
Using MLE, the estimated Weibull shape parameter \hat{\alpha} \approx 0.8 indicates a hazard that decreases over time, while a negative treatment coefficient (approximately -0.25 on the log scale) suggests improved survival with treatment; profile likelihood intervals support the difference.
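To make the censored likelihood concrete, the sketch below (NumPy and SciPy assumed) maximizes the Weibull log-likelihood corresponding to L(\theta) above on simulated right-censored data; it does not use the E1684 trial values, and the parameter choices are illustrative:

# Censored maximum likelihood for a Weibull model, mirroring
# L(theta) = prod f(t)^delta * S(t)^(1-delta) from the text.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
t_true = 10.0 * rng.weibull(1.3, size=n)          # Weibull(shape=1.3, scale=10)
c = rng.uniform(0, 20, size=n)                    # right-censoring times
t = np.minimum(t_true, c)
delta = (t_true <= c).astype(float)

def neg_log_lik(params):
    log_scale, log_shape = params
    a, b = np.exp(log_scale), np.exp(log_shape)   # alpha (scale), beta (shape)
    z = (t / a) ** b
    log_f = np.log(b / a) + (b - 1) * np.log(t / a) - z   # log f(t)
    log_S = -z                                            # log S(t) = -(t/alpha)^beta
    return -np.sum(delta * log_f + (1 - delta) * log_S)

fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="BFGS")
print(np.exp(fit.x))   # estimated (scale, shape), close to (10, 1.3)

# lifelines offers the same fit directly:
# from lifelines import WeibullFitter; WeibullFitter().fit(t, delta)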

Goodness-of-Fit Assessment

Goodness-of-fit assessment in survival analysis evaluates how well a fitted model captures the underlying data-generating process, ensuring assumptions hold and predictions are reliable. This involves graphical diagnostics, formal statistical tests, and model comparison criteria to detect departures from model assumptions such as proportionality or correct functional form. These methods are essential after estimation to validate non-parametric, parametric, or semi-parametric models like the Cox proportional hazards model, where poor fit can lead to biased inferences. Graphical diagnostics provide visual insights into model adequacy. Cox-Snell residuals, defined as r_i = \hat{H}(t_i, \hat{\theta}) for observed event times t_i under the estimated cumulative hazard \hat{H}, should follow a unit exponential distribution if the model fits well; deviations are assessed via Q-Q plots comparing empirical quantiles of \exp(-r_i) against theoretical exponentials. Martingale residuals, given by r_i^M = \delta_i - \hat{\Lambda}(t_i), where \delta_i is the censoring indicator and \hat{\Lambda} the estimated cumulative baseline hazard, are plotted against covariates to check linearity in the log-hazard; smooth curves deviating from zero indicate non-linearity. These residuals, introduced for semi-parametric models, help identify functional misspecification. Formal tests quantify specific assumption violations. The Grambsch-Therneau test assesses the proportional hazards assumption by examining correlations between scaled Schoenfeld residuals and time, using a chi-squared statistic to detect non-proportionality; significant p-values suggest time-dependent effects. For nested parametric models, likelihood ratio tests compare maximized log-likelihoods to evaluate whether added parameters improve fit significantly, following a chi-squared distribution under the null. These tests are widely implemented in software for post-estimation diagnostics. Model selection criteria balance fit and complexity. The Akaike Information Criterion (AIC), computed as -2\ell + 2p where \ell is the log-likelihood and p the number of parameters, penalizes overparameterization in parametric survival models; lower AIC indicates better relative fit. The Bayesian Information Criterion (BIC), -2\ell + p \log(n) with sample size n, imposes a stronger penalty for larger datasets, favoring parsimony. Extensions like the corrected AIC (AICc) address small-sample bias in survival contexts. Assumption checks extend to influential observations. DFBETA residuals measure the change in coefficient estimates when omitting the i-th observation, scaled by the standard error; values exceeding 2/\sqrt{n} flag influential points that disproportionately affect the model. These are computed for each covariate in Cox regression to ensure robustness. Modern validation techniques incorporate resampling for predictive performance. Cross-validation, particularly k-fold variants adapted for censoring via partial likelihood on held-out data, estimates out-of-sample concordance or Brier scores to assess generalization, outperforming in-sample metrics for high-dimensional survival data. Bootstrap procedures, such as bias-corrected resampling from the 2010s, generate confidence intervals for survival curves by drawing samples with replacement and averaging predictions, providing robust uncertainty quantification under right-censoring. These approaches mitigate overfitting in complex models.
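A brief sketch of these diagnostics in practice (assuming the lifelines library and its bundled Rossi recidivism dataset) compares AIC between exponential and Weibull fits and runs a scaled-Schoenfeld-residual test of proportional hazards in the spirit of the Grambsch-Therneau approach:

# Model comparison via AIC and a proportional hazards check with lifelines.
from lifelines import ExponentialFitter, WeibullFitter, CoxPHFitter
from lifelines.datasets import load_rossi
from lifelines.statistics import proportional_hazard_test

rossi = load_rossi()                      # 'week' = follow-up time, 'arrest' = event

exp_fit = ExponentialFitter().fit(rossi["week"], rossi["arrest"])
wei_fit = WeibullFitter().fit(rossi["week"], rossi["arrest"])
print(exp_fit.AIC_, wei_fit.AIC_)         # lower AIC indicates the better-fitting family

cph = CoxPHFitter().fit(rossi, duration_col="week", event_col="arrest")
results = proportional_hazard_test(cph, rossi, time_transform="rank")
results.print_summary()                   # small p-values flag non-proportional covariates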

Regression Models

Cox Proportional Hazards Model

The Cox proportional hazards model, introduced by David Cox in 1972, is a cornerstone semi-parametric approach in survival analysis for investigating the relationship between covariates and the hazard rate while leaving the baseline hazard unspecified. It assumes that the effects of covariates are multiplicative on the hazard and constant over time, known as the proportional hazards (PH) assumption. The model expresses the hazard function for an individual with covariates \mathbf{X} as h(t \mid \mathbf{X}) = h_0(t) \exp(\boldsymbol{\beta}^\top \mathbf{X}), where h_0(t) is the unspecified baseline hazard function at time t, \boldsymbol{\beta} is the vector of regression coefficients, and \exp(\boldsymbol{\beta}^\top \mathbf{X}) represents the relative hazard or risk score. This formulation allows estimation of covariate effects without parametric assumptions on the distribution of survival times, making it widely applicable in fields like medicine and engineering. Estimation of \boldsymbol{\beta} relies on maximizing the partial likelihood, which conditions on the observed event times and risk sets, eliminating the baseline hazard from the optimization. For ordered event times t_1 < t_2 < \cdots < t_n with associated covariates \mathbf{X}_i, the partial likelihood is L(\boldsymbol{\beta}) = \prod_{i=1}^n \frac{\exp(\boldsymbol{\beta}^\top \mathbf{X}_i)}{\sum_{j \in R(t_i)} \exp(\boldsymbol{\beta}^\top \mathbf{X}_j)}, where R(t_i) denotes the risk set of individuals at risk just prior to t_i. The maximum partial likelihood estimator \hat{\boldsymbol{\beta}} is typically obtained via the Newton-Raphson algorithm, an iterative numerical method that updates estimates based on the score function (first derivative of the log-partial likelihood) and the observed information matrix (negative second derivative). Once \hat{\boldsymbol{\beta}} is found, the baseline hazard h_0(t) or cumulative baseline hazard H_0(t) can be estimated non-parametrically using approximations that account for ties in event times, such as Breslow's method, which treats tied events as occurring simultaneously and weights the risk set accordingly, or Efron's method, which adjusts for the probability of multiple events at the same time to reduce bias. Under the PH assumption, the coefficients \boldsymbol{\beta} are interpreted through hazard ratios: \exp(\hat{\beta}_k) quantifies the multiplicative change in hazard for a one-unit increase in covariate X_k, holding other covariates constant, and this ratio remains time-independent. For instance, in an analysis of 17,600 primary cutaneous melanoma patients, tumor thickness (Breslow depth) and ulceration status were identified as the most significant prognostic factors, with survival rates decreasing markedly with increasing thickness. Inference for \boldsymbol{\beta} includes 95% confidence intervals, constructed via the asymptotic normality \hat{\boldsymbol{\beta}} \approx N(\boldsymbol{\beta}, \mathbf{I}(\hat{\boldsymbol{\beta}})^{-1}), where \mathbf{I}(\hat{\boldsymbol{\beta}}) is the observed information matrix from the partial likelihood, or through profile likelihood methods that maximize the likelihood over nuisance parameters for each component of \boldsymbol{\beta}. In high-dimensional settings, such as genomics where the number of covariates exceeds the sample size, the standard Cox model is extended via penalized regression techniques like ridge (L2) penalization to stabilize estimates and mitigate overfitting by shrinking coefficients toward zero.
This integration has become prominent in 2020s genomic studies for survival prediction, as seen in analyses of tumor profiling data where ridge-penalized Cox models improve prognostic accuracy by handling thousands of gene expression features while preserving predictive performance.
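The sketch below (lifelines and its bundled Rossi dataset assumed) fits a standard Cox model, reports hazard ratios with confidence intervals, and then refits with a ridge penalty of the kind described above; the penalty value is illustrative:

# Cox proportional hazards fit and a ridge-penalized variant with lifelines.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()                                  # columns include week, arrest, age, fin, prio, ...

cph = CoxPHFitter()
cph.fit(rossi, duration_col="week", event_col="arrest")
cph.print_summary()                                   # coefficients, exp(coef) hazard ratios, 95% CIs
print(cph.hazard_ratios_)                             # exp(coef) per one-unit covariate increase

# Ridge-penalized Cox (l1_ratio=0 gives a pure L2 penalty), useful when p is large:
cph_ridge = CoxPHFitter(penalizer=0.1, l1_ratio=0.0)
cph_ridge.fit(rossi, duration_col="week", event_col="arrest")
print(cph_ridge.params_)                              # coefficients shrunk toward zero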

Accelerated Failure Time Models

Accelerated failure time (AFT) models provide an alternative regression framework in survival analysis, directly modeling the effect of covariates on the time to an event rather than on the hazard rate. Unlike proportional hazards models, AFT models assume that covariates act multiplicatively to accelerate or decelerate the life course of an individual, effectively scaling the time axis of the underlying survival distribution. This approach originated in reliability engineering for accelerated life testing and has been adapted for medical and biological applications. The core formulation of the AFT model is given by \log T_i = \mu + \beta^\top X_i + \sigma \epsilon_i, where T_i > 0 is the survival time for the i-th individual, \mu is the intercept, \beta is the vector of regression coefficients, X_i is the vector of covariates, \sigma > 0 is a scale parameter, and \epsilon_i is an error term with a known distribution (e.g., the extreme value distribution for the Weibull model or the standard logistic distribution for the log-logistic model). The survival function then takes the form S(t \mid X) = S_0(t \exp(-\beta^\top X)), where S_0 is the baseline survival function corresponding to the distribution of \epsilon_i. Interpretation of the coefficients in AFT models focuses on time ratios rather than hazard ratios. Specifically, \exp(\beta_j) represents the multiplicative factor by which the survival time changes for a one-unit increase in covariate X_j, holding other covariates constant; values greater than 1 indicate deceleration (longer survival times), while values less than 1 indicate acceleration (shorter survival times). For instance, if \exp(\beta_j) = 1.2, the expected survival time increases by 20% per unit increase in X_j. This direct interpretation on the time scale makes AFT models particularly intuitive for understanding covariate effects in contexts where time acceleration is meaningful. Estimation in parametric AFT models is typically performed via maximum likelihood, assuming a specific error distribution such as log-logistic or Weibull, which allows closed-form expressions for the likelihood under right censoring. For semi-parametric AFT models, where the error distribution is unspecified, methods such as the Buckley-James estimator or rank-based estimating equations are used, providing consistent estimates without distributional assumptions; these approaches handle censoring by working with the ranks or residuals of the observed times. Certain parametric AFT models are mathematically equivalent to specific proportional hazards models. For example, the Weibull AFT model is equivalent to the Weibull proportional hazards model, as both arise from the same underlying extreme value error distribution, allowing the same data to be analyzed under either framework with identical inferences on covariate effects. This equivalence does not hold for other distributions, such as the log-logistic, highlighting the AFT framework's distinct flexibility. An illustrative example of AFT modeling involves survival analysis in acute leukemia patients, such as those with acute lymphoblastic leukemia (ALL). In a cohort of 206 ALL patients, a Weibull AFT model was fitted to assess the impact of relapse (a treatment-related covariate) on overall survival time. The estimated time ratio for relapse was 0.082 (95% confidence interval: 0.039–0.17, p < 0.001), indicating that patients experiencing relapse had survival times approximately 12 times shorter than those without, demonstrating the model's ability to quantify treatment effects directly on time to event. AFT models offer several advantages, including their straightforward interpretation of covariate effects as time ratios, which is especially useful in reliability engineering for predicting component lifetimes under stress factors.
They are robust to violations of the proportional hazards assumption and provide a natural framework for analyzing how interventions alter the pace of progression to failure or death.
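For comparison with the Cox sketch above, the following lifelines-based example (Rossi dataset assumed) fits a Weibull AFT model, where exponentiated coefficients are read as time ratios rather than hazard ratios:

# Weibull accelerated failure time model with lifelines.
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()
aft = WeibullAFTFitter()
aft.fit(rossi, duration_col="week", event_col="arrest")
aft.print_summary()
# For a covariate with exp(coef) = 1.2, the expected time to arrest is stretched by
# 20% per one-unit increase; values below 1 indicate accelerated (shorter) times.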

Extensions to Standard Regression Models

Extensions to standard regression models in survival analysis address limitations of the basic Cox proportional hazards and accelerated failure time models, such as violations of proportionality assumptions, unobserved heterogeneity, and the presence of multiple event types or complex trajectories. These modifications enable more flexible modeling of time-to-event data in scenarios where covariates evolve over time, baseline hazards differ across groups, or dependencies exist due to clustering or competing outcomes. By incorporating such extensions, analysts can better capture real-world complexities in fields like clinical trials and epidemiology, improving the validity of inferences about hazard ratios and survival probabilities. Time-dependent covariates extend the Cox model to situations where predictor values change over follow-up time, allowing the hazard function to be expressed as h(t \mid X(t)) = h_0(t) \exp(\beta' X(t)), where X(t) denotes the covariate vector at time t. This formulation accommodates dynamic factors, such as evolving exposures or physiological markers, by updating the risk set at each event time to reflect current covariate values. Estimation proceeds via partial likelihood maximization, treating the data as multiple intervals per subject to handle changes, which requires careful data structuring to avoid bias from measurement errors or induced dependencies. The stratified Cox model relaxes the assumption of a common baseline hazard by estimating separate baseline hazards h_{0k}(t) for each stratum k, while maintaining a shared coefficient vector \beta across strata: h_k(t \mid X) = h_{0k}(t) \exp(\beta' X). This approach is particularly useful when a grouping variable, such as treatment arm or genetic subtype, violates the proportional hazards assumption, enabling unbiased estimation of covariate effects without forcing a single baseline hazard. Partial likelihoods are computed independently within strata and multiplied, preserving the model's efficiency for large datasets. Frailty models introduce unobserved heterogeneity through multiplicative random effects, modifying the hazard to h(t \mid X, u) = u \, h_0(t) \exp(\beta' X), where u is the frailty, often assumed to follow a gamma distribution with mean 1 and variance \theta to account for clustering, such as in familial or recurrent-event data. Shared frailty models apply a common u to correlated subjects, inducing dependence and explaining decreasing population-level hazards over time due to survivor selection of low-frailty individuals. Parameter estimation typically uses penalized likelihood or EM algorithms, with the gamma distribution chosen for its conjugacy and interpretability in capturing unobserved heterogeneity. In the presence of competing risks, where multiple mutually exclusive events can occur, cause-specific hazards model the instantaneous risk of each event type j as h_j(t \mid X) = h_{0j}(t) \exp(\beta_j' X), treating other events as censoring to estimate covariate effects on each event-specific risk. This approach, rooted in the multivariate Cox framework, allows separate regressions for each cause but requires combining cumulative incidence functions for overall probabilities. Alternatively, the Fine-Gray subdistribution hazard model directly regresses the cumulative incidence of the event of interest, accounting for competing events by reweighting at-risk subjects: h_{\text{sub},j}(t \mid X) = h_{0,\text{sub},j}(t) \exp(\beta_j' X). This method provides direct estimates of covariate effects on absolute risks, with estimation via weighted partial likelihoods, and is preferred when interest lies in cumulative incidence rather than cause-specific hazards.
Multi-state models extend these frameworks to track transitions across health states, such as in the illness-death model, which includes healthy, ill, and dead states to analyze progression and recovery in longitudinal data. These models estimate transition-specific hazards for each possible move between states, enabling prediction of state-occupancy probabilities and accommodating time-dependent covariates or frailties, with recent applications in chronic disease research. Recent implementations integrate relative survival to distinguish disease-related from background mortality, enhancing applicability to population-based studies. Diagnostics for these extensions often rely on Schoenfeld residuals, adapted to test time-dependence by correlating scaled residuals with time or a function thereof; significant trends indicate non-proportionality, prompting stratification or time-varying coefficients. For frailty and competing risks models, generalized residuals assess fit, while graphical checks of smoothed residuals against time help verify modeling assumptions for multi-state transitions.
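As a small illustration of these extensions, the sketch below (lifelines and its Rossi data assumed) fits a stratified Cox model with a separate baseline hazard per stratum; the choice of 'wexp' as the stratifying variable is purely illustrative. Time-varying covariates would instead use lifelines' CoxTimeVaryingFitter on data expanded into (start, stop] intervals:

# Stratified Cox sketch: separate baseline hazards per level of 'wexp' (illustrative
# stratifying variable), shared regression coefficients for the other covariates.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()                      # week = follow-up time, arrest = event indicator

cph = CoxPHFitter()
cph.fit(rossi, duration_col="week", event_col="arrest", strata=["wexp"])
cph.print_summary()                       # coefficients shared across strata; no coefficient
                                          # is estimated for the stratifying variable itself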

Advanced Methods

Tree-Structured and Ensemble Approaches

Tree-structured methods extend recursive partitioning techniques, such as classification and regression trees, to censored time-to-event data, enabling the identification of prognostic subgroups without assuming linear covariate effects or proportional hazards. In survival trees, the covariate space is recursively partitioned into binary splits based on covariate thresholds, creating increasingly homogeneous child nodes with respect to survival outcomes. This process begins at the root node containing all observations and continues until a stopping criterion, such as a minimum node size, is reached. The resulting tree stratifies patients into risk groups defined by terminal nodes, where survival distributions are summarized non-parametrically using Kaplan-Meier estimates. The core of survival tree construction lies in the splitting criterion, which evaluates potential splits by maximizing differences in survival curves between child nodes. A common approach uses the log-rank statistic to assess the divergence between Kaplan-Meier curves in the resulting subgroups, selecting the split that yields the largest value, indicating the most significant separation in survival times. This criterion, adapted from non-parametric testing, ensures that splits prioritize prognostic separation while accounting for censoring. Early formulations of this method, such as those proposed by Gordon and Olshen, formalized the integration of tree partitioning with censored data. To mitigate overfitting in fully grown trees, pruning techniques are applied post-construction. These involve evaluating a sequence of subtrees by balancing fit to the data against complexity, often using cost-complexity measures that penalize additional splits. The optimal pruned tree is selected based on cross-validation or related criteria, preserving predictive accuracy while simplifying interpretability. For prediction, new observations traverse the tree to a terminal node, where the Kaplan-Meier estimate from that node provides individualized risk estimates. An illustrative application of survival trees appears in the analysis of the primary biliary cirrhosis (PBC) dataset, comprising 312 patients with 12 covariates. A fitted tree with 12 terminal nodes stratified patients into distinct prognostic groups based on key laboratory and clinical variables, with Kaplan-Meier curves revealing significant survival differences across nodes, such as median survival times ranging from over 10 years in low-risk groups to under 2 years in high-risk ones. This example demonstrates the method's utility in clinical prognostication by uncovering non-linear interactions, such as the combined effect of liver function markers. Ensemble approaches, particularly random survival forests (RSF), address the instability of single trees by aggregating multiple survival trees through bagging. In RSF, numerous trees are grown on bootstrap samples of the data, with random subsets of covariates considered at each split to enhance diversity and reduce correlation among trees. The ensemble prediction averages the terminal-node estimates across trees, yielding a more stable cumulative hazard estimate via the Nelson-Aalen form. Variable importance is assessed using out-of-bag (OOB) error rates, where permutations of covariates in OOB samples quantify their contribution to predictive performance. Ishwaran et al. introduced RSF as a robust extension for censored data, showing improved accuracy over single trees in simulations and real datasets. These tree-structured and ensemble methods excel at capturing complex interactions and non-linear covariate effects, relaxing the proportional hazards assumption inherent in models like the Cox regression, and providing interpretable visualizations of risk stratification.
However, single survival trees suffer from high variance and sensitivity to small data perturbations, necessitating ensembles like RSF for reliable inference, though at the cost of reduced direct interpretability.
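The sketch below (scikit-survival assumed, with simulated data) grows a random survival forest and scores it with Harrell's concordance index; the hyperparameter values are illustrative:

# Random survival forest on simulated right-censored data.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.normal(size=(n, p))
risk = np.exp(0.8 * X[:, 0] - 0.5 * X[:, 1])          # non-trivial covariate effects
t_event = rng.exponential(scale=10.0 / risk)
t_cens = rng.uniform(0, 15, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

y = Surv.from_arrays(event=event, time=time)          # structured array (event, time)

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15, random_state=0)
rsf.fit(X, y)
print(rsf.score(X, y))     # Harrell's concordance index; the ensemble averages
                           # Nelson-Aalen-type cumulative hazards across trees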

Machine Learning and Deep Learning Models

Machine learning and deep learning models have extended survival analysis to handle high-dimensional data, non-linear interactions, and complex patterns that challenge traditional parametric and semi-parametric approaches. These models leverage neural networks to learn flexible representations from covariates, such as genomic profiles or imaging features, enabling more accurate risk prediction in domains like oncology and critical care. By integrating end-to-end learning, they surpass the linear assumptions of models like the Cox proportional hazards, while still accommodating right-censoring through specialized loss functions or likelihood adaptations. A foundational deep learning adaptation of the Cox model is DeepSurv, which replaces the linear predictor \beta^\top x with the output of a deep neural network in the model's partial likelihood. This allows the network to capture non-linear covariate effects and interactions, improving prognostic performance on datasets with thousands of features. Evaluated on breast cancer data, DeepSurv demonstrated superior concordance indices compared to standard Cox regression, highlighting its utility for personalized treatment recommendations. For discrete-time survival analysis, deep models employ recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to predict time-binned hazards, often extending to competing risks. DeepHit, for instance, uses a multi-task network to directly estimate the joint distribution of event times and types across discrete intervals, trained via a combination of negative log-likelihood and ranking losses to handle censoring. This approach excels in scenarios with irregular follow-up times, as shown in simulations and real-world applications where it outperformed parametric discrete-time models in predictive accuracy. RNN-based variants, such as RNN-SURV, further incorporate sequential covariates by processing time-varying inputs through gated recurrent units, enhancing predictions for longitudinal data. Continuous-time deep models address the limitations of time discretization using neural ordinary differential equations (Neural ODEs), which parameterize hazard functions as solutions to differential equations for smooth, flexible survival curves. In multi-state settings, Neural ODEs model transitions between states by learning continuous-time dynamics, enabling interpretable intensity functions without assuming proportional hazards. Applied to electronic health records, these models achieved lower integrated Brier scores than discrete-time alternatives, underscoring their scalability for high-dimensional inputs. Censoring in deep survival models is managed through techniques like ranking losses, which optimize pairwise comparisons of predicted risks to maximize the concordance index without requiring full event observation, or inverse probability of censoring weighting (IPCW), which reweights observed events to account for informative censoring patterns. Ranking losses, often in pairwise or listwise formulations, are integrated into the training objective alongside likelihood terms, as in extensions of DeepSurv for non-proportional hazards. IPCW, meanwhile, adjusts the empirical risk by estimating the censoring distribution, preserving unbiasedness in high-censoring regimes common to clinical trials. These methods ensure robust generalization, with empirical studies showing improved calibration even at censoring rates around 50%. An illustrative application involves deep learning on The Cancer Genome Atlas (TCGA) data for personalized survival prediction across multiple cancer types.
Using a Cox-based deep network on multi-omics features from over 10,000 patients, researchers achieved risk predictions with C-indices exceeding 0.70 for several cohorts, outperforming clinical nomograms by integrating genomic and histopathological data for risk stratification. This demonstrates the potential of deep models to uncover subtype-specific survival patterns in heterogeneous cancers. Recent advances in the 2020s include transformer-based models for sequential data in survival analysis, which use self-attention mechanisms to capture long-range dependencies in time-varying covariates like patient histories. Models like the Clinical Transformer estimate patient-specific survival distributions by processing multimodal inputs, yielding superior performance on ICU datasets with dynamic risks. Additionally, federated learning frameworks enable privacy-preserving survival modeling across institutions, aggregating model updates without sharing raw data, as in federated Cox adaptations that maintain privacy guarantees while achieving comparable accuracy to centralized training on distributed cancer registries.
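To show the core idea behind DeepSurv-style training without relying on any particular survival package, the PyTorch sketch below replaces the Cox linear predictor with a small neural network and minimizes a negative partial log-likelihood (Breslow handling of ties) on simulated data; the architecture and censoring scheme are illustrative only:

# DeepSurv-style training sketch: neural risk score + Cox partial likelihood loss.
import torch
import torch.nn as nn

def cox_ph_loss(risk_scores, time, event):
    # Sort by descending time so the risk set of subject i is the prefix [0..i].
    order = torch.argsort(time, descending=True)
    risk, evt = risk_scores[order].squeeze(-1), event[order]
    log_cum_risk = torch.logcumsumexp(risk, dim=0)      # log sum of exp(risk) over the risk set
    return -torch.sum((risk - log_cum_risk) * evt) / evt.sum()

torch.manual_seed(0)
n, p = 500, 10
X = torch.randn(n, p)
true_risk = 0.7 * X[:, 0] - 0.4 * X[:, 1]
time = torch.distributions.Exponential(torch.exp(true_risk)).sample()
event = (torch.rand(n) < 0.8).float()                   # crude, purely illustrative censoring

net = nn.Sequential(nn.Linear(p, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = cox_ph_loss(net(X), time, event)
    loss.backward()
    opt.step()
print(loss.item())   # decreasing loss indicates the network is learning the risk ordering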

Discrete-Time and Frailty Models

Discrete-time survival models are designed for data where time is observed in discrete intervals, such as grouped durations or studies with periodic observations. These models adapt the continuous-time hazard function to discrete settings, where the discrete hazard h(t \mid X) represents the conditional probability of the event occurring at discrete time t given survival up to t. A common formulation for the discrete hazard is h(t \mid X) = 1 - \exp(-\lambda_t \exp(\beta' X)), which arises from grouping continuous-time proportional hazards over intervals and accommodates covariates X through a proportional hazards structure. Estimation in discrete-time models often employs generalized linear models with specific link functions to relate the hazard to covariates. The logistic link, \log\left(\frac{h(t \mid X)}{1 - h(t \mid X)}\right) = \alpha_t + \beta' X, models the hazard odds directly, as in repeated logistic regressions across intervals. Alternatively, the complementary log-log (cloglog) link, \log(-\log(1 - h(t \mid X))) = \log(\lambda_t) + \beta' X, corresponds to a grouped continuous-time proportional hazards model and handles right-skewed hazards effectively. For the cloglog form, a partial-likelihood approach analogous to the Cox model can be used, treating time intervals as strata and avoiding full specification of the baseline terms \lambda_t, which is particularly advantageous for studies with unbalanced observation times. Frailty models address unobserved heterogeneity in survival data by incorporating a random effect, or frailty, that multiplies the baseline hazard and varies across individuals or clusters. The conditional hazard is specified as h(t \mid X, b) = b \, h_0(t) \exp(\beta' X), where b is the frailty term with mean 1, allowing for overdispersion and dependence in clustered events. The frailty is typically assumed to follow a gamma distribution with mean 1 and variance \theta, which keeps the marginal model tractable and facilitates analytical results in some cases. Inference for frailty parameters in these models often relies on the EM algorithm, which treats the frailties as missing data and iterates between expectation steps (updating frailty distributions) and maximization steps (optimizing fixed effects and baseline hazards). Penalized likelihood methods provide an alternative, especially for gamma frailties, by reformulating the model as a penalized partial likelihood to estimate \theta and \beta simultaneously without explicit frailty simulation. These approaches enable robust estimation of the frailty variance, with tests on \theta assessing the null hypothesis of no heterogeneity. Both discrete-time and frailty models find applications in analyzing clustered survival data, such as recurrent events in clinical trials or correlated failure times within families, where standard Cox models fail to account for within-cluster dependence. In economics, discrete-time models are widely used for duration analysis in labor markets, while post-2015 developments have integrated them into deep learning frameworks, such as survival-adapted neural networks for sequential discrete data. Despite their utility, discrete-time models remain underexplored relative to continuous-time alternatives in mainstream biostatistics, though they offer interpretability for grouped or high-dimensional data.
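The sketch below (pandas and scikit-learn assumed) shows the person-period construction behind discrete-time hazard modeling: each subject contributes one row per interval survived, and a logistic regression with interval dummies plays the role of the \alpha_t baseline terms. The data are simulated and the setup is deliberately minimal:

# Discrete-time hazard model as a logistic regression on person-period data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)                          # a single covariate
time = rng.integers(1, 9, size=n)               # event/censoring interval, 1..8
event = rng.random(n) < 0.7                     # True = event in that interval, False = censored

# Expand each subject into one row per interval survived; the outcome is 1 only in
# the final interval and only if the event occurred there.
rows = []
for i in range(n):
    for t in range(1, time[i] + 1):
        rows.append({"interval": t, "x": x[i], "y": int(event[i] and t == time[i])})
pp = pd.DataFrame(rows)

# Interval dummies play the role of the baseline terms alpha_t; 'x' enters as beta'X.
design = pd.get_dummies(pp["interval"], prefix="t").assign(x=pp["x"])
model = LogisticRegression(max_iter=1000).fit(design, pp["y"])
print(dict(zip(design.columns, model.coef_[0])))   # per-interval baseline terms and beta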

Applications and Implementation

Key Applications Across Disciplines

Survival analysis plays a pivotal role in biomedical research, particularly in clinical trials evaluating drug efficacy for diseases like cancer. In oncology, it is commonly applied to measure time-to-progression or overall survival, allowing researchers to compare treatment arms while accounting for censored data from patients lost to follow-up or still alive at study end. For instance, phase 3 trials often use Kaplan-Meier estimates and log-rank tests to assess improvements in overall survival, where fewer than one-third of studies have shown gains of two months or more. In engineering, survival analysis supports reliability assessment by modeling machine failure times and predicting warranty claims. It enables estimation of failure probabilities over time for components like automobile parts or industrial equipment, using distributions such as the Weibull to forecast the number of expected failures within warranty periods. A data-driven approach incorporating non-parametric models has been used to address data maturation in warranty forecasting, improving predictions for heavy-duty diesel engines by identifying failure patterns from historical claims. Economic applications of survival analysis focus on duration modeling for events like unemployment spells and customer churn. In labor economics, it examines the time until re-employment, incorporating covariates such as education and economic conditions to estimate hazard rates via models like the Cox proportional hazards. For customer churn in marketing, competing risks frameworks model multiple exit reasons, such as voluntary cancellation or involuntary termination, to predict customer lifetimes and inform retention strategies based on factors like usage patterns. In the social sciences, survival analysis investigates time-to-event outcomes in areas like family demography, including marriage duration and dissolution risks. Studies often apply competing risks models to account for events like divorce versus widowhood, revealing how factors such as age at marriage and income influence survival probabilities. For example, research on newlywed couples in urban settings has shown that income significantly affects marriage stability, with lower-income groups facing higher dissolution hazards within the first few years. Emerging applications extend to conservation biology, where survival analysis quantifies extinction risks by estimating mean time to extinction from uncertain observational data. Survival-theory models help predict population persistence under climate pressures, incorporating covariates like habitat conditions and temperature changes to forecast extinction probabilities for vulnerable species. In the context of the COVID-19 pandemic during the 2020s, survival analysis has been widely used to evaluate outcomes such as time to recovery or mortality, stratified by age, sex, and comorbidities; studies in diverse populations, including hospitalized cohorts, highlighted higher mortality hazards among older males. A notable example involves acute myeloid leukemia (AML) treatment comparisons, where survival analysis via the log-rank test demonstrates differences in overall survival between patient groups. In a cohort of 98 AML patients, Kaplan-Meier curves and log-rank tests revealed significant survival disparities by cytogenetic risk categories, with favorable-risk patients achieving a median survival of 97.7 months compared to 9.2 months for the intermediate/poor-risk group (including adverse-risk patients), underscoring the method's utility in guiding therapeutic decisions.

Software Tools and Computational Resources

Survival analysis benefits from a wide array of software tools that implement core estimation methods such as Kaplan-Meier curves and proportional hazards models, alongside advanced techniques like ensembles and deep learning integrations. In R, the survival package serves as the foundational toolkit, offering functions for fitting non-parametric, parametric, and semi-parametric models while handling censored data efficiently. The survminer package complements this by providing visualization tools, including Kaplan-Meier plots and forest plots for hazard ratios, to aid in interpreting results. For ensemble methods, randomForestSRC implements random survival forests, enabling prediction with high-dimensional data through bagging and variable importance measures. In Python, the lifelines library offers a comprehensive suite for survival analysis, supporting non-parametric (Kaplan-Meier), semi-parametric (Cox proportional hazards), and parametric models such as the Weibull and log-normal distributions, with built-in plotting and statistical testing capabilities. scikit-survival extends this by integrating survival models with scikit-learn's ecosystem, allowing seamless use of cross-validation, pipelines, and algorithms like random survival forests and Coxnet for penalized Cox regression. Commercial software includes SAS's PROC PHREG procedure, which fits Cox models and extensions like time-dependent covariates, optimized for large-scale data processing in enterprise environments. Stata's stcox command similarly supports Cox regression with robust standard errors and stratification, commonly used in econometric and biostatistical workflows. MATLAB's Statistics and Machine Learning Toolbox provides functions such as fitcox for Cox regression and ecdf for empirical survivor functions, facilitating integration with numerical computing tasks. Computational challenges in survival analysis arise particularly with large datasets and high-dimensional covariates, where standard estimation can suffer from overfitting or slow convergence; penalized approaches, such as those using L1/L2 regularization in Coxnet, address this by selecting relevant features while maintaining predictive accuracy. Parallelization techniques, as implemented in randomForestSRC, mitigate runtime issues for ensemble methods on massive data by distributing tree construction across cores. For high-dimensional settings, deep learning frameworks handle such data effectively, though they require substantial computational resources for training. Online resources enhance accessibility, with the CRAN Task View for Survival Analysis curating over 100 packages and providing installation guidance via the ctv package for streamlined workflows. Tutorials for deep learning implementations, such as the DeepHazard framework, offer code examples for neural network-based survival prediction. As of 2025, TensorFlow Probability supports Bayesian survival modeling through probabilistic distributions and variational inference, enabling uncertainty quantification with flexible error structures. Cloud-based platforms facilitate scalable survival analysis on large datasets, offering built-in algorithms for model training and deployment on multimodal data.
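As an example of the penalized approaches mentioned above, the scikit-survival sketch below fits an elastic-net-penalized Cox model (Coxnet) to simulated high-dimensional data; the penalty settings are illustrative:

# Elastic-net-penalized Cox regression (Coxnet) for p >> n settings.
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.util import Surv

rng = np.random.default_rng(0)
n, p = 200, 500                                     # far more covariates than subjects
X = rng.normal(size=(n, p))
risk = np.exp(0.9 * X[:, 0] - 0.6 * X[:, 1])        # only two truly informative features
t_event = rng.exponential(scale=5.0 / risk)
t_cens = rng.uniform(0, 8, size=n)
y = Surv.from_arrays(event=t_event <= t_cens, time=np.minimum(t_event, t_cens))

coxnet = CoxnetSurvivalAnalysis(l1_ratio=0.9, alpha_min_ratio=0.01)
coxnet.fit(X, y)
coefs = coxnet.coef_[:, -1]                         # coefficients at the smallest penalty in the path
print(np.flatnonzero(coefs)[:10])                   # sparse support, ideally including features 0 and 1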

References

  1. [1]
    Survival Analysis Part I: Basic concepts and first analyses - PMC - NIH
    Survival analysis is a collection of statistical procedures for data analysis where the outcome variable of interest is time until an event occurs.
  2. [2]
    [PDF] What is Survival Analysis? - Cornell Statistical Consulting Unit
    1 Introduction. Survival analysis refers to a set of statistical methods for analyzing the time until an event occurs in a population.Missing: definition | Show results with:definition
  3. [3]
    Biostatistics Series Module 9: Survival Analysis - PMC - NIH
    Survival analysis is concerned with “time to event” data. Conventionally, it dealt with cancer death as the event in question, but it can handle any event ...
  4. [4]
    Survival Analysis - StatPearls - NCBI Bookshelf
    Survival analysis examines the time before an individual or group experiences the event or outcome of interest or until censored.
  5. [5]
    Survival Data Analysis | College of Public Health
    The problem of analyzing time to event data arises in many applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and ...
  6. [6]
    [PDF] PREFACE TO SURVIVAL ANALYSIS - Medical College of Wisconsin
    Examples of such data arise in diverse fields such as medicine, biology, public health, epidemiology, engineering, economics and demography. While the ...
  7. [7]
    Time-To-Event (TTE) Data Analysis | Columbia Public Health
    These tests compare observed and expected number of events at each time point across groups, under the null hypothesis that the survival functions are equal ...
  8. [8]
    [PDF] an analysis of survival data when hazards are not
    An example of that would be a test subject dropping out of an experiment before the remission of their disease. A common goal in survival analysis is viewing ...
  9. [9]
    Who was Captain John Graunt? | The Actuary
    Sep 8, 2020 · In 1662 Graunt published the first known mortality table to be based on real data, survival rates being produced at ten-year age intervals.
  10. [10]
    VI. An estimate of the degrees of the mortality of mankind; drawn ...
    An estimate of the degrees of the mortality of mankind; drawn from curious tables of the births and funerals at the city of Breslaw.
  11. [11]
    Halley's life table (1693) - SpringerLink
    In 1693 the famous English astronomer Edmond Halley studied the birth and death records of the city of Breslau, which had been transmitted to the Royal ...
  12. [12]
    [PDF] Greenwood's formula - UC Berkeley Statistics
    Greenwood's formula puts a standard error on the Kaplan-Meier estimator using the delta method, with Var(K̂) ≈ K̂² Σₜ (1 − p̂ₜ) / (Nₜ p̂ₜ).
  13. [13]
    Nonparametric Estimation from Incomplete Observations - jstor
    In lifetesting, medical follow-up, and other fields the observation of the time of occurrence of the event of interest (called a death) may be.
  14. [14]
    Regression Models and Life‐Tables - Cox - 1972
    The hazard function (age-specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an ...
  15. [15]
    Informed Bayesian survival analysis
    Sep 10, 2022 · We provide an overview of Bayesian estimation, hypothesis testing, and model-averaging and illustrate how they benefit parametric survival analysis.
  16. [16]
    Deep learning for survival analysis: a review | Artificial Intelligence ...
    Feb 19, 2024 · In this paper we provide a comprehensive review of currently available DL-based survival methods, addressing theoretical dimensions, such as model class and NN ...
  17. [17]
    Survivor Function - an overview | ScienceDirect Topics
    The survivor function, or survival probability, is the probability of not experiencing an event before a specific time, represented as S(t) = Pr(T > t).
  18. [18]
    Statistics review 12: Survival analysis | Critical Care | Full Text
    Sep 6, 2004 · The survival function S(t) is defined as the probability of surviving at least to time t. The hazard function h(t) is the conditional ...
  19. [19]
    [PDF] Survival Distributions, Hazard Functions, Cumulative Hazards
    The probability density function (pdf) and cumulative distribution function (cdf) are most commonly used to characterize the distribution of any random variable.
  20. [20]
    [PDF] Methods of the Survival Analysis
    Survival analysis covers the field where F(x) is the cumulative distribution function. The survival function describes ...
  21. [21]
    Life expectancy: what does it measure? - PMC - NIH
    Jul 22, 2020 · The survival function is then used to calculate the average length of life, the LE, as LE = ∫₀^∞ l(x) dx. LE corresponds ...
  22. [22]
    Comparing survival curves based on medians
    Mar 16, 2016 · We propose a new statistical method for comparing survival curves based on their medians. The new method can be easily implemented and applied to censored ...
  23. [23]
    Survival prediction models: an introduction to discrete-time ... - NIH
    Jul 26, 2022 · We present a guide for developing survival prediction models using discrete-time methods and assessing their predictive performance.
  24. [24]
    Parametric Models in Survival Analysis - Lawless - 2005
    Jul 15, 2005 · This entry reviews common parametric models such as the exponential, Weibull, and lognormal distributions, and associated methods of inference.
  25. [25]
    5 The Parametric Models | Survival Analysis - Oxford Academic
    The types of parametric distributions employed in survival analysis include exponential, Weibull ... log-normal, and log-logistic distributions. And the ...
  26. [26]
    Parametric Distributions for Survival and Reliability Analyses ... - MDPI
    Oct 21, 2022 · We fitted the exponential, Weibull, lognormal, and Pareto type I distributions in order to show the different shapes of the survival functions.
  27. [27]
    (PDF) Parametric Distributions for Survival and Reliability Analyses ...
    Oct 17, 2022 · In this paper, we comprehensively review the historical backgrounds and statistical properties of a number of parametric distributions used in survival and ...
  28. [28]
    [PDF] Regression Models and Life-Tables - DR Cox
    Mar 5, 2004 · The present paper is largely concerned with the extension of the results of Kaplan and Meier to the comparison of life tables and more generally ...
  29. [29]
    Survival Analysis and Interpretation of Time-to-Event Data
    Right censoring is the most common type of censoring in survival studies ... Bias due to left truncation and left censoring in longitudinal studies of ...
  30. [30]
    Censoring in survival analysis: Potential for bias - PMC - NIH
    Censoring in survival analysis should be “non-informative,” ie participants who drop out of the study should do so due to reasons unrelated to the study.
  31. [31]
    Handling Censoring and Censored Data in Survival Analysis: A ...
    Sep 24, 2021 · Revelation was that censoring has the potential of biasing results and reducing the statistical power of analyses if not handled with the ...
  32. [32]
    [PDF] 1 Definitions and Censoring
    Generally we deal with right censoring & sometimes left truncation. Two types of independent right censoring: Type I : completely random dropout (eg emigration) ...
  33. [33]
    [PDF] Survival Analysis: Left-Truncated Data Introduction
    Useful data will be excluded when data is censored but not accounted for, and biases can be introduced when data is truncated. Since censoring and truncation ...
  34. [34]
    [PDF] Likelihood construction - MyWeb
    Interval censoring. Likelihood with left/right censored data. ... Left truncation is actually quite common outside of survival analysis ...
  35. [35]
    Interval censoring - PMC - NIH
    For the analysis of interval-censored data, we will first discuss nonparametric estimation of a survival function as well as a hazard function in Section 3.
  36. [36]
    Biases in Electronic Health Records Data for Generating Real-World ...
    Absence of information for a period of time with information available both before and after that period is referred to as interval censoring. The movement of ...
  37. [37]
    [PDF] A package for survival analysis in R
    The profile likelihood confidence limits for the age coefficient are the ... for example, where the profile likelihood interval is much more reliable.
  38. [38]
    Exponential and Weibull Survival Analysis - SAS Help Center
    Oct 28, 2020 · This example shows you how to use PROC MCMC to analyze the treatment effect for the E1684 melanoma clinical trial data. These data were ...
  39. [39]
    Survival prediction models: an introduction to discrete-time modeling
    Jul 26, 2022 · We present a guide for developing survival prediction models using discrete-time methods and assessing their predictive performance
  40. [40]
    [PDF] Building and Checking Survival Models - David Rocke
    We use Cox-Snell residuals to test for goodness of fit. We use martingale residuals to look for non-linearity. We can also look at dfbeta for influence.
  41. [41]
  42. [42]
    Proportional hazards tests and diagnostics based on weighted ...
    Proportional hazards tests and diagnostics based on weighted residuals. Patricia M. Grambsch.
  43. [43]
    Improved AIC Selection Strategy for Survival Analysis - PMC - NIH
    In this paper, we extend the AICc selection procedure of Hurvich and Tsai to survival models to improve the traditional AIC for small sample sizes.
  44. [44]
    Influence diagnostics for the Cox proportional hazards regression ...
    This article investigates the performance of several residuals for the Cox proportional hazards regression model to diagnose the influential observations.
  45. [45]
    cross-validation in survival analysis - Wiley Online Library
    Cross-validation in survival analysis uses a cross-validated log-likelihood to measure how well each observation can be predicted using others, assessing ...
  46. [46]
    Bootstrapping the out-of-sample predictions for efficient and ... - NIH
    We name the method Bootstrap Bias Corrected with Dropping CV (BBCD-CV) that is both efficient and provides accurate performance estimates.
  47. [47]
    Estimation of Parameters in Cox's Proportional Hazard Model
    The numerical Newton-Raphson approach used to optimize the partial likelihood function in Cox's proportional hazard model does not converge when there are ...
  48. [48]
    Covariance Analysis of Censored Survival Data - jstor
    The use of regression models for making covariance adjustments in the comparison of survival curves is illustrated by application to a clinical trial of ...
  49. [49]
    The Efficiency of Cox's Likelihood Function for Censored Data - jstor
    D.R. Cox has suggested a simple method for the regression analysis of censored data. We carry out an information calculation which shows that Cox's method ...
  50. [50]
    Prognostic Factors Analysis of 17600 Melanoma Patients
    Factors predicting melanoma-specific survival rates were analyzed using the Cox proportional hazards regression model. Follow-up survival data for 5 years or ...
  51. [51]
    Cox's Regression Model for Counting Processes: A Large Sample ...
    The Cox regression model for censored survival data specifies that covariates have a proportional effect on the hazard function of the life-time distribution ...
  52. [52]
    Penalized Cox regression analysis in the high-dimensional and low ...
    We propose in this paper to use the LARS algorithm to obtain the solutions for the Cox model with L1 penalty in the setting of very high-dimensional covariates ...
  53. [53]
    Prognosis of lasso-like penalized Cox models with tumor profiling ...
    Oct 5, 2022 · Although there is no selection, the ridge regression has shown promise and reliability for survival prediction using high-dimensional microarray ...
  54. [54]
    Accelerated failure time models for reliability data analysis
    In this paper, a unified treatment of the accelerated failure time model is outlined for the standard reliability distributions (Weibull, log-normal, inverse ...
  55. [55]
    The accelerated failure time model: A useful alternative to the cox ...
    The accelerated failure time model has an intuitive physical interpretation and would be a useful alternative to the Cox model in survival analysis.
  56. [56]
    The Statistical Analysis of Failure Time Data - Wiley Online Library
    The Statistical Analysis of Failure Time Data. Author(s): John D. Kalbfleisch, Ross L. Prentice, First published:26 August 2002.
  57. [57]
  58. [58]
    Cox proportional hazard versus accelerated failure time models
    ... lymphoblastic leukemia patients: Cox proportional hazard versus accelerated failure time models ... sample size within strata) but also prevents the ...
  59. [59]
    [PDF] TIME-DEPENDENT COVARIATES IN THE COX PROPORTIONAL ...
    The Cox proportional-hazards regression model has achieved widespread use in the analysis of time-to-event data with censoring and covariates.
  60. [60]
    The Impact of Heterogeneity in Individual Frailty on the ... - jstor
    paper (Vaupel et al., 1979). Figure 2 compares cohort mortality rates with mortality rates for individuals of standard frailty (i.e., z = 1), at three.
  61. [61]
    Frailty models for survival data | Lifetime Data Analysis
    A frailty model is a random effects model for time variables, where the random effect (the frailty) has a multiplicative effect on the hazard.
  62. [62]
    A Proportional Hazards Model for the Subdistribution of a ...
    This article proposes a semiparametric proportional hazards model for the subdistribution, using partial likelihood and weighting techniques to estimate ...
  63. [63]
    Tree-structured survival analysis - PubMed - NIH
    Tree-structured survival analysis. Cancer Treat Rep. 1985 Oct;69(10):1065-9. L Gordon, R A Olshen. PMID: 4042086. In this note, tree ...
  64. [64]
    Random survival forests - Project Euclid
    We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing ...
  65. [65]
    DeepSurv: personalized treatment recommender system using a ...
    Feb 26, 2018 · We introduce DeepSurv, a Cox proportional hazards deep neural network and state-of-the-art survival method for modeling interactions between a patient's ...
  66. [66]
    [PDF] DeepHit: A Deep Learning Approach to Survival Analysis with ...
    This paper presents a novel approach, DeepHit, to the analysis of survival data. DeepHit trains a neural network to learn the estimated joint distribution ...
  67. [67]
    [PDF] RNN-SURV: a Deep Recurrent Model for Survival Analysis
    In this paper we present a new recurrent neural network model for personalized survival analysis called rnn-surv. Our model is able to exploit censored data to ...
  68. [68]
    A General Framework for Survival Analysis and Multi-State Modelling
    Jun 8, 2020 · We propose the use of neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
  69. [69]
    A deep survival analysis method based on ranking - ScienceDirect
    In this paper, we propose an innovative loss function that is defined as the sum of an extended mean squared error loss and a pairwise ranking loss based on ...
  70. [70]
    [PDF] Review Article Handling Censoring and Censored Data in Survival ...
    Sep 15, 2021 · These methods include censored network estimation; conditional mean imputation method; inverse probability of censoring weighting; maximum ...
  71. [71]
    Interpretable deep learning for improving cancer patient survival ...
    Jul 13, 2023 · In this work, we performed interpretable deep learning on cancer data from The Cancer Genome Atlas (TCGA) involving 33 primary tumor types.
  72. [72]
    Transformer-Based Deep Survival Analysis
    In this work, we propose a new Transformer-based survival model which estimates the patient-specific survival distribution. Our contributions are twofold.
  73. [73]
    Privacy-Preserving Federated Survival Support Vector Machines for ...
    Mar 29, 2024 · This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers.
  74. [74]
    The impact of heterogeneity in individual frailty on the ... - PubMed
    The results imply that standard methods overestimate current life expectancy and potential gains in life expectancy from health and safety interventions.
  75. [75]
    [PDF] Penalized Survival Models and Frailty - Mayo Clinic
    Jun 5, 2000 · In particular, we demonstrate that if the frailty has a gamma distribution, then the shared frailty model can be written exactly as a ...
  76. [76]
  77. [77]
    Overall Survival in Phase 3 Clinical Trials and the Surveillance ... - NIH
    May 24, 2022 · In this systematic review of 150 phase 3 clinical trials, fewer than one-third of the trials resulted in improvement of overall survival by 2 months or more.
  78. [78]
    Data‐driven framework for warranty claims forecasting with an ...
    Sep 25, 2023 · In this research, we propose a data-driven framework for warranty forecasting that (i) addresses warranty data maturation, (ii) considers non-parametric models.
  79. [79]
    (PDF) USING SURVIVAL ANALYSIS IN ECONOMICS - ResearchGate
    In this paper, the concepts of survival analysis have been used to model the factors affecting the duration of unemployment. To investigate the ...
  80. [80]
    [PDF] Modeling Customer Lifetimes with Multiple Causes of Churn
    Sep 3, 2010 · In this article, we draw both on research that has examined the underlying factors that contribute to customer churn and on research on ...
  81. [81]
    Marriage survival in new married couples: A competing risks ... - NIH
    Aug 17, 2022 · This study aimed at utilizing competing risks survival analysis to investigate the marriage survival of new couples in Tabriz.
  82. [82]
    Using survival theory models to quantify extinctions - ScienceDirect
    In this study, we introduce the use of “survival theory” to estimate the mean time to extinction (MTE) of a species given uncertain observational records in ...
  83. [83]
    Survival Analysis of Patients With COVID-19 in India by ...
    May 6, 2021 · The aim of this study was to conduct a survival analysis to establish the variability in survivorship of patients with COVID-19 in India by age group and sex ...
  84. [84]
    Acute myeloid leukemia: survival analysis of patients at a ... - NIH
    Methods. The overall survival and disease-free survival were statistically evaluated using the Kaplan–Meier method, the log-rank test and multivariate ...
  85. [85]
    lifelines — lifelines 0.30.0 documentation
  86. [86]
    scikit-survival — scikit-survival 0.25.0
  87. [87]
    What Is Survival Analysis? - MATLAB & Simulink
  88. [88]
    High-Dimensional Survival Analysis: Methods and Applications - PMC
    We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine ...
  89. [89]
    Deep Learning-Based Survival Analysis for High-Dimensional ...
    In this paper, we evaluated the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets.
  90. [90]
    CRAN Task View: Survival Analysis
  91. [91]
    [PDF] DeepHazard: A Tensorflow/Keras and PyTorch- Compatible Deep ...
    Representing a versatile and user-friendly framework, DeepHazard enables researchers to easily develop, customize, and deploy deep learning models for survival ...
  92. [92]
    TensorFlow Probability
  93. [93]
    Predict lung cancer survival status using multimodal data on ...
    Nov 8, 2022 · We're announcing a new solution on lung cancer survival prediction in Amazon SageMaker JumpStart, based on the blog posts Building Scalable ...