April 22, 2026
In this article, Wang and Wang (2026) studied regression analysis of arbitrarily censored and left-truncated data under a popular semiparametric proportional odds (PO) model. They developed a new estimation approach based on an expectation-maximization (EM) algorithm, built on a novel data augmentation involving exponential and multinomial latent variables. The proposed method has been incorporated into the R package regPOspline for public use. Survival data are often subject to left truncation in addition to censoring. The PO model specifies a proportional relationship between the odds of failure at different covariate values, and it implies that the hazard ratio between different covariate values converges to 1 as time progresses. In their approach, the baseline odds function in the PO model is approximated with integrated splines, which reduces the number of unknown parameters involved in the baseline odds function while maintaining adequate modeling flexibility.
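The two properties of the PO model described above (a constant odds ratio and a hazard ratio that drifts toward 1) can be checked numerically. The following is a minimal sketch, assuming a single covariate, a hypothetical baseline odds function t^1.5, and a hypothetical coefficient of 0.8; none of these values come from the paper.

```python
import math

# Hypothetical baseline odds function (an assumption for illustration only):
# any nondecreasing function equal to 0 at t = 0 would serve.
def baseline_odds(t):
    return t ** 1.5

def baseline_odds_derivative(t):
    return 1.5 * t ** 0.5

def survival(t, x, beta):
    # PO model: odds of failure by time t equal baseline_odds(t) * exp(x * beta).
    return 1.0 / (1.0 + baseline_odds(t) * math.exp(x * beta))

def failure_odds(t, x, beta):
    s = survival(t, x, beta)
    return (1.0 - s) / s

def hazard(t, x, beta):
    # h(t | x) = -d/dt log S(t | x) under the PO model.
    w = math.exp(x * beta)
    return baseline_odds_derivative(t) * w / (1.0 + baseline_odds(t) * w)

beta = 0.8  # hypothetical regression coefficient
times = [0.01, 1.0, 10.0, 100.0, 1000.0]

# Compare covariate value x = 1 against x = 0 at each time point.
odds_ratios = [failure_odds(t, 1.0, beta) / failure_odds(t, 0.0, beta) for t in times]
hazard_ratios = [hazard(t, 1.0, beta) / hazard(t, 0.0, beta) for t in times]

for t, o, h in zip(times, odds_ratios, hazard_ratios):
    print(f"t={t:8.2f}  odds ratio={o:.4f}  hazard ratio={h:.4f}")
```

In this sketch the odds ratio stays at exp(0.8) for every time point, while the hazard ratio starts near exp(0.8) and shrinks toward 1 as the baseline odds grow.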
Their approach assumes a non-informative censoring mechanism, in which the failure time is conditionally independent of the observational process given the covariates. They modeled the derivative of the baseline odds function using monotone splines (Ramsay, 1988). These are I-spline basis functions, nonnegative piecewise polynomials determined by the degree and knot placement, which together control the smoothness and flexibility of the splines. They suggested using AIC or BIC to select the optimal degree and number of knots.
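To make the spline construction concrete, here is a minimal sketch of the textbook I-spline basis from Ramsay (1988): each I-spline is the integral of an M-spline, so a combination with nonnegative coefficients is nondecreasing. This is not the code used in regPOspline; the knots, order, and coefficients are hypothetical, and the integral is approximated with a simple midpoint rule rather than a closed form.

```python
def m_spline(x, i, k, t):
    """M-spline i of order k (degree k - 1) on knot sequence t, via Ramsay's recursion."""
    if k == 1:
        if t[i] <= x < t[i + 1]:
            return 1.0 / (t[i + 1] - t[i])
        return 0.0
    width = t[i + k] - t[i]
    if width == 0.0:
        return 0.0  # degenerate support from repeated boundary knots
    left = (x - t[i]) * m_spline(x, i, k - 1, t)
    right = (t[i + k] - x) * m_spline(x, i + 1, k - 1, t)
    return k * (left + right) / ((k - 1) * width)

def i_spline(x, i, k, t, steps=1000):
    """I-spline: integral of M-spline i from the lower boundary to x (midpoint rule)."""
    lower = t[0]
    h = (x - lower) / steps
    return h * sum(m_spline(lower + (j + 0.5) * h, i, k, t) for j in range(steps))

# Hypothetical setup: order 3 (quadratic pieces), three interior knots on [0, 4].
order = 3
lower, upper = 0.0, 4.0
interior_knots = [1.0, 2.0, 3.0]
knots = [lower] * order + interior_knots + [upper] * order
n_basis = len(knots) - order

# Hypothetical nonnegative coefficients, one per basis function.
coefs = [0.2, 0.0, 0.5, 0.1, 0.3, 0.4]

def monotone_curve(x):
    # Nonnegative coefficients times nondecreasing basis functions give a nondecreasing curve.
    return sum(c * i_spline(x, i, order, knots) for i, c in enumerate(coefs))

grid = [0.5 * j for j in range(9)]  # 0.0, 0.5, ..., 4.0
values = [monotone_curve(x) for x in grid]
for x, v in zip(grid, values):
    print(f"x={x:.1f}  curve={v:.4f}")
```

Changing the order or the number of interior knots changes the number of basis functions, which is the trade-off between flexibility and parameter count that a criterion such as AIC or BIC would arbitrate.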
The first stage of their data augmentation takes advantage of the fact that the survival function under the PO model can be written as the marginal survival function from a frailty proportional hazards (PH) model whose frailty follows an exponential distribution (McMahan et al. 2013; Murphy et al. 1997; Shen 1998). The resulting augmented likelihood was then used to develop their EM algorithm.
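The frailty representation can be verified by simulation: averaging the conditional PH survival exp(-frailty * a) over a unit-rate exponential frailty gives 1 / (1 + a), which is the PO survival function when a is the baseline odds times exp(linear predictor). The sketch below is a Monte Carlo check under that assumption of a unit-rate exponential; the input values are hypothetical and the function names are illustrative only.

```python
import math
import random

def po_survival(baseline_value, linear_predictor):
    # Survival under the PO model: 1 / (1 + baseline odds * exp(x' beta)).
    return 1.0 / (1.0 + baseline_value * math.exp(linear_predictor))

def frailty_marginal_survival(baseline_value, linear_predictor, n_draws=200_000, seed=1):
    # Conditional on the frailty, survival follows a PH form: exp(-frailty * a).
    # Averaging over frailty ~ Exponential(rate 1) gives the marginal survival.
    rng = random.Random(seed)
    a = baseline_value * math.exp(linear_predictor)
    total = 0.0
    for _ in range(n_draws):
        frailty = rng.expovariate(1.0)
        total += math.exp(-frailty * a)
    return total / n_draws

# Hypothetical (baseline value, linear predictor) pairs.
settings = [(0.5, 0.0), (1.0, 0.8), (3.0, -0.5)]
comparisons = [
    (po_survival(b, lp), frailty_marginal_survival(b, lp)) for b, lp in settings
]
for (b, lp), (exact, simulated) in zip(settings, comparisons):
    print(f"baseline={b}, lp={lp}: PO={exact:.4f}  frailty average={simulated:.4f}")
```

The two columns agree up to Monte Carlo error, which is why treating the frailty as a latent variable leaves the observed-data model unchanged.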
The EM formulation also allowed the variance estimates to be evaluated directly, which is another appealing feature of their approach. Using these variance estimates, Wald inference can be used to assess the covariate effects. In addition, the outputs from the EM algorithm readily enabled estimation of other quantities, such as the survival functions for subgroups with different covariate values.
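As a reminder of what Wald inference involves once a coefficient estimate and its standard error are available, here is a minimal sketch. The estimate and standard error are hypothetical numbers, not results from the paper, and the helper name `wald_summary` is illustrative.

```python
import math
from statistics import NormalDist

def wald_summary(estimate, std_error, level=0.95):
    # Wald z statistic for the null hypothesis that the coefficient is zero.
    z = estimate / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Symmetric confidence interval on the coefficient scale.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, ci

beta_hat, se_hat = 0.5, 0.2  # hypothetical estimate and standard error
z, p_value, (lo, hi) = wald_summary(beta_hat, se_hat)

# Under a PO model, exp(beta) is an odds ratio, so the interval can be exponentiated.
odds_ratio_ci = (math.exp(lo), math.exp(hi))

print(f"z = {z:.3f}, p = {p_value:.4f}")
print(f"95% CI for beta: ({lo:.3f}, {hi:.3f})")
print(f"95% CI for odds ratio: ({odds_ratio_ci[0]:.3f}, {odds_ratio_ci[1]:.3f})")
```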
They conducted two sets of simulations. The first set considered left-truncated and arbitrarily censored data, while the second focused on arbitrarily censored data only. They concluded that the simulations showed excellent estimation performance. They also applied the method to a real dataset and calculated the Brier score to assess predictive performance, since the growing influence of machine learning hinges on predictive assessment, especially out-of-sample evaluation. Their current method does not accommodate time-varying covariates, so its applicability in longitudinal settings is limited. Possible violations of the non-informative censoring assumption could also be problematic.
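For readers unfamiliar with the Brier score mentioned above, the sketch below computes it at a single time horizon as the mean squared difference between the observed survival status and the predicted survival probability. It assumes fully observed event times and ignores censoring and truncation, so it is a simplification and not necessarily the version used in the paper; the data are hypothetical.

```python
def brier_score(event_times, predicted_survival, horizon):
    # Observed status at the horizon: 1 if the subject is still event-free, else 0.
    # Assumes every event time is fully observed (no censoring adjustment).
    squared_errors = [
        ((1.0 if t > horizon else 0.0) - s) ** 2
        for t, s in zip(event_times, predicted_survival)
    ]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical held-out subjects and model-predicted survival probabilities at t = 4.
event_times = [2.0, 5.0, 7.0, 1.0]
predicted_survival = [0.3, 0.8, 0.9, 0.2]

score = brier_score(event_times, predicted_survival, horizon=4.0)
print(f"Brier score at t = 4: {score:.3f}")
```

Lower values indicate better calibrated and more accurate predictions; a model that always predicts 0.5 would score 0.25.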
Written by,
Usha Govindarajulu
Keywords: survival, regression, proportional odds, censoring, left-truncation
References:
McMahan, C. S., L. Wang, and J. M. Tebbs. 2013. “Regression Analysis for Current Status Data Using the EM Algorithm.” Statistics in Medicine 32: 4452–4466.
Murphy, S. A., A. J. Rossini, and A. W. van der Vaart. 1997. “Maximum Likelihood Estimation in the Proportional Odds Model.” Journal of the American Statistical Association 92: 968–976.
Shen, X. 1998. “Proportional Odds Regression and Sieve Maximum Likelihood Estimation.” Biometrika 85: 165–177.
Wang, L., and L. Wang. 2026. “Regression Analysis of Arbitrarily Censored and Left-Truncated Data Under the Proportional Odds Model.” Biometrical Journal.
https://doi.org/10.1002/bimj.70132