
January 14, 2026

Evaluating the performance of a prediction model for time-to-event outcomes is difficult when observations are censored, and interval censoring and competing risks present additional challenges. Yang et al. (2026) proposed two methods to deal with interval censoring, a model-based approach and an inverse probability of censoring weighting (IPCW) approach, focusing on three key time-dependent metrics: area under the receiver operating characteristic curve (AUC), Brier score, and expected predictive cross-entropy (EPCE).

They defined the progression-specific sensitivity for a threshold value, c, as the probability that the predicted risk is greater than or equal to c for patients with cancer progression in the interval of interest, [t, t + Δt]. The specificity, which measures the ability to identify negative instances, is defined as the probability that the predicted progression-specific risk is lower than c for patients who “survive” the interval of interest. In the presence of interval censoring, however, it is often unclear whether a patient is a case, a control, or had the event before the interval of interest. They therefore proposed two methods to deal with this uncertainty when estimating the progression-specific sensitivity and specificity: the model-based approach and IPCW.
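As a point of reference, the two definitions above can be sketched for the idealized situation in which every patient's status over the interval of interest is fully known (no censoring). The function name `sens_spec` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def sens_spec(risk, is_case, c):
    """Empirical sensitivity and specificity at threshold c.

    Assumes case/control status is known for everyone, which is exactly
    what interval censoring prevents in practice.
    """
    risk = np.asarray(risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Sensitivity: share of cases whose predicted risk is >= c
    sensitivity = np.mean(risk[is_case] >= c)
    # Specificity: share of controls whose predicted risk is < c
    specificity = np.mean(risk[~is_case] < c)
    return sensitivity, specificity

# Hypothetical predicted risks and known statuses for five patients
risk = [0.9, 0.7, 0.4, 0.2, 0.1]
is_case = [True, True, False, True, False]
sens, spec = sens_spec(risk, is_case, c=0.5)
```

Sweeping c over a grid of thresholds and integrating sensitivity against 1 - specificity would give the corresponding AUC.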

In the model-based approach, all subjects at risk at time t are considered when calculating the sensitivity, but they contribute with different weights. The weight for patient i in the test set is the estimated probability of experiencing cancer progression during the interval of interest, derived from the cumulative incidence function estimated by the model. In the IPCW approach, one uses only the subset of patients for whom the event is known to have occurred in the interval of interest, that is, the absolute cases, and weights them so that they also represent the patients who were censored before experiencing the primary event of interest. These weights are the inverse of the probability of not being censored before time t, obtained using the Kaplan-Meier (KM) estimator.
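The two weighting ideas can be illustrated with a minimal sketch. This is a simplification under stated assumptions, not the authors' implementation: the model-based weights `p_case` are taken as given (in the paper they come from the fitted model's cumulative incidence function), the KM censoring estimate assumes no tied times, and all function names and data are hypothetical.

```python
import numpy as np

def weighted_sensitivity(risk, p_case, c):
    """Model-based style sensitivity: every at-risk subject contributes,
    weighted by an estimated probability of being a case (assumed given)."""
    risk = np.asarray(risk, dtype=float)
    p_case = np.asarray(p_case, dtype=float)
    return np.sum(p_case * (risk >= c)) / np.sum(p_case)

def km_censoring_survival(time, censored, t):
    """Kaplan-Meier estimate of P(not censored by time t), treating
    censoring as the 'event'. Assumes no tied observation times."""
    time = np.asarray(time, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    surv = 1.0
    for u in np.unique(time[censored & (time <= t)]):
        at_risk = np.sum(time >= u)            # still under observation at u
        n_cens = np.sum(censored & (time == u))
        surv *= 1.0 - n_cens / at_risk
    return surv

# Hypothetical model-based example
sens_mb = weighted_sensitivity(risk=[0.8, 0.6, 0.3],
                               p_case=[0.5, 0.2, 0.3], c=0.5)

# Hypothetical IPCW example: weight = 1 / P(not censored by t)
G = km_censoring_survival(time=[1, 2, 3, 4, 5],
                          censored=[False, True, False, True, False],
                          t=4.5)
ipcw_weight = 1.0 / G
```

In the IPCW sketch, each known case would be multiplied by `ipcw_weight` so that it also stands in for comparable patients lost to censoring.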

The Brier score combines discrimination and calibration by quantifying how close the predicted probabilities are to the actual binary outcomes; a lower score indicates better model performance. It can be calculated with both the model-based and the IPCW approach, using the same subsets of patients and weights as for the AUC. Finally, the expected predictive cross-entropy (EPCE) is an evaluation metric from information theory that describes the expected value of the cross-entropy between the true and predicted risk distributions (Commenges et al., 2012). In a real data application, the model-based and IPCW approaches yielded rather different estimates of AUC and Brier score. The authors then conducted simulations, using longitudinal information from the real data, to compare these metrics and to test whether model misspecification might lead to overestimation of the model’s predictive performance.
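The Brier score described above can be sketched as a (possibly weighted) mean squared difference between predicted probabilities and binary outcomes. This is a generic illustration; the exact weighted estimators in the paper may differ, and the function name, weights, and data here are hypothetical.

```python
import numpy as np

def brier_score(pred, outcome, weights=None):
    """Weighted mean of squared differences between predicted
    probabilities and 0/1 outcomes. Lower is better."""
    pred = np.asarray(pred, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    if weights is None:
        weights = np.ones_like(pred)   # unweighted case
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * (outcome - pred) ** 2) / np.sum(weights)

# Hypothetical predictions and outcomes for three patients
pred = [0.9, 0.2, 0.6]
outcome = [1, 0, 0]
bs_unweighted = brier_score(pred, outcome)
# Hypothetical weights, e.g. standing in for IPCW-style weights
bs_weighted = brier_score(pred, outcome, weights=[2.0, 1.0, 1.0])
```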

Their simulation study showed that the IPCW estimates typically have greater variability but are more robust to model misspecification. The model-based approach was more sensitive to model misspecification but more robust to the frequency of the examinations (which gives rise to the interval censoring). With sparser biopsy schedules, not only does the uncertainty about the true event time become larger but, as a result, fewer patients are included in the IPCW approach, since it is less likely that the risk interval between a negative and a positive biopsy is fully contained in the interval of interest. In most scenarios, especially for the Brier score and when bias and variability are combined in the root mean squared error (RMSE), the model-based approach outperformed the IPCW approach. The EPCE can only be calculated with the model-based approach. Even with interval censoring and competing risks, the model-based EPCE can recover the reference EPCE (i.e., the value in the ideal scenario without any censoring).
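To make the role of the RMSE concrete, the sketch below shows how it summarizes both systematic error and variability of an estimator across simulation replicates, via the identity RMSE² = bias² + variance. The replicate values and the reference value are hypothetical, not results from the paper.

```python
import numpy as np

def bias_variance_rmse(estimates, reference):
    """Summarize simulation replicates of a metric against a reference value."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - reference
    variance = estimates.var()  # population variance (ddof=0)
    rmse = np.sqrt(np.mean((estimates - reference) ** 2))
    return bias, variance, rmse

# Hypothetical AUC estimates from three replicates, reference AUC of 0.72
estimates = [0.70, 0.74, 0.78]
bias, variance, rmse = bias_variance_rmse(estimates, reference=0.72)
```

An estimator with larger variability can therefore have a worse RMSE even when it is less biased, which is the trade-off at play between the two approaches.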

The model-based approach appeared to handle more deviations than the IPCW approach, such as informative censoring and timing of examinations not dependent on any covariate information or previously observed repeated measurements. The authors therefore concluded that the model-based approach is a reliable method, in terms of bias and variability, for evaluating prediction models with time-varying covariates, competing risks, and interval censoring. This is especially true when interval censoring obscures whether a subject is a case or a control, information that is essential in the IPCW approach.

Written by,

Usha Govindarajulu

Keywords:  time-dependent predictive accuracy, interval censoring, competing risks, Brier score

References:

Commenges, D., B. Liquet, and C. Proust-Lima. 2012. “Choice of Prognostic Estimators in Joint Models by Estimating Differences of Expected Conditional Kullback–Leibler Risks.” Biometrics 68, no. 2: 380–387. https://doi.org/10.1111/j.1541-0420.2012.01753.x.

Yang, Z., D. Rizopoulos, L. F. Newcomb, and N. S. Erler. 2026. “Time-Dependent Predictive Accuracy Metrics in the Context of Interval Censoring and Competing Risks.” Biometrical Journal. https://doi.org/10.1002/bimj.70108.

https://onlinelibrary.wiley.com/cms/asset/ebc0d69f-bdf9-462c-af06-35d85d8a5919/bimj70108-fig-0002-m.jpg