
October 23, 2024

In the Cox proportional hazards model, a lasso penalty for variable selection (or another penalty) has typically been applied to the partial likelihood. Hohberg and Groll (2024) instead add the lasso penalty to the full likelihood. As the authors pointed out, although most existing R routines are built on the partial likelihood, the full likelihood offers several advantages. First, the baseline hazard can be modeled explicitly, for example with a basis-function approach such as B-splines (see, e.g., Eilers and Marx 1996). Second, the full likelihood model extends easily to a wide class of frailty distributions, including random intercepts and random slopes. Third, time-varying covariates can be incorporated naturally.
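To make the first advantage concrete, here is a minimal Python sketch (assuming NumPy) of a full log-likelihood in which the log-baseline hazard is an explicit B-spline expansion, log λ0(t) = Σk αk Bk(t). For right-censored data with event indicator δi, the full log-likelihood is Σi [δi (log λ0(ti) + xi'β) − Λ0(ti) exp(xi'β)], where Λ0 is the cumulative baseline hazard, so the baseline hazard enters explicitly rather than being profiled out. The helper names (clamped_knots, bspline_basis, full_loglik), the cubic degree, the equally spaced knots, and the trapezoid approximation of Λ0 are illustrative assumptions, not the coxFL or coxlasso implementation.

```python
import numpy as np


def clamped_knots(lo, hi, n_interior, degree=3):
    """Equally spaced knots on [lo, hi], with the end knots repeated."""
    inner = np.linspace(lo, hi, n_interior + 2)
    return np.concatenate([np.repeat(lo, degree), inner, np.repeat(hi, degree)])


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_k, t_{k+1}).
    B = np.zeros((x.size, t.size - 1))
    for k in range(t.size - 1):
        B[:, k] = (t[k] <= x) & (x < t[k + 1])
    # Close the last non-empty interval so that x == t[-1] is also covered.
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, t.size - 1 - d))
        for k in range(t.size - 1 - d):
            left, right = t[k + d] - t[k], t[k + d + 1] - t[k + 1]
            if left > 0:
                B_next[:, k] += (x - t[k]) / left * B[:, k]
            if right > 0:
                B_next[:, k] += (t[k + d + 1] - x) / right * B[:, k + 1]
        B = B_next
    return B  # shape (len(x), len(knots) - degree - 1)


def full_loglik(alpha, beta, time, event, X, knots, degree=3, n_grid=200):
    """Full Cox log-likelihood with log baseline hazard B(t) @ alpha."""
    eta = X @ beta
    log_h0 = bspline_basis(time, knots, degree) @ alpha
    # Cumulative baseline hazard: trapezoid rule on a grid, then interpolation.
    grid = np.linspace(0.0, time.max(), n_grid)
    h0 = np.exp(bspline_basis(grid, knots, degree) @ alpha)
    cum = np.concatenate([[0.0], np.cumsum(np.diff(grid) * (h0[1:] + h0[:-1]) / 2)])
    Lambda0 = np.interp(time, grid, cum)
    return np.sum(event * (log_h0 + eta) - Lambda0 * np.exp(eta))


# Usage on simulated right-censored data (all settings are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
time = rng.exponential(1.0, size=50)
event = (rng.uniform(size=50) < 0.7).astype(float)
knots = clamped_knots(0.0, time.max(), n_interior=4)
alpha = np.full(len(knots) - 4, np.log(0.5))   # 8 cubic basis coefficients
print(full_loglik(alpha, np.array([0.3, -0.2]), time, event, X, knots))
```

Because clamped B-splines sum to one on the knot range, setting every αk to log 0.5 gives a constant baseline hazard of 0.5, so the value reduces to the exponential-model log-likelihood, which is a convenient sanity check.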

The authors argued that the partial likelihood ignores the intervals without failures, which may still influence survival outcomes, so information can be lost and estimates can be biased. They developed an R function, coxFL, that implements the unregularized Cox full likelihood approach and allows for time-varying covariates, frailties, and time-varying coefficients. Their main contribution, however, is a second function, coxlasso, which adds lasso penalization and supports the same features.

They wrote the likelihood in terms of an individual predictor, ηij, for the jth individual in the ith cluster. The random-effects part of this predictor follows a Gaussian distribution, which corresponds to a log-normal frailty on the grouping structure. The likelihood was maximized with the penalized quasi-likelihood approach of Breslow and Clayton (1993), which uses a Laplace approximation to the integral over the random effects, so that the random effects enter the objective as a quadratic penalty term. The baseline hazard was modeled as a smooth function using B-splines, with spline coefficients estimated for both the baseline hazard and the time-varying effects. The covariate effects were penalized with a lasso, and estimation was carried out by an iterative Newton-Raphson algorithm.
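The estimation strategy described above can be sketched in the same spirit. The Python/NumPy code below is a minimal illustration under simplifying assumptions, not the coxlasso algorithm. The log-baseline hazard uses the two-term basis α0 + α1 t (a Gompertz form) instead of B-splines to keep the block short. Each cluster gets a Gaussian random intercept bi (a log-normal frailty) whose variance σ² is held fixed, so the random effects enter only through the quadratic penalty bi²/(2σ²) of the penalized quasi-likelihood objective; a full PQL fit would also update σ². The lasso term λ|βj| is replaced by the smooth surrogate λ√(βj² + ε) and handled with a local quadratic approximation, so each Newton-Raphson step solves a ridge-type linear system. The function name fit_penalized_frailty and every tuning value are hypothetical.

```python
import numpy as np


def fit_penalized_frailty(time, event, X, cluster, lam=20.0, sigma2=0.25,
                          n_quad=30, eps=1e-4, max_iter=200, tol=1e-8):
    """Newton-Raphson for a penalized full log-likelihood (illustrative sketch).

    Log-hazard of subject n in cluster i at time s:
        alpha0 + alpha1 * s + x_n @ beta + b_i
    Penalties: b_i**2 / (2 * sigma2) (Gaussian random intercept, sigma2 fixed)
    and lam * sum(sqrt(beta_j**2 + eps)), a smooth stand-in for the lasso.
    """
    p = X.shape[1]
    m = cluster.max() + 1
    q = 2                                            # baseline coefficients
    C = np.eye(m)[cluster]                           # one-hot cluster indicators
    basis = lambda s: np.column_stack([np.ones_like(s), s])
    # Design at the observed times (contributes only for events).
    Z_ev = np.hstack([basis(time), X, C])
    # Trapezoid quadrature for the cumulative hazard on [0, time_n].
    u = np.linspace(0.0, 1.0, n_quad)
    wu = np.full(n_quad, 1.0 / (n_quad - 1))
    wu[[0, -1]] /= 2
    S = time[:, None] * u[None, :]
    W = (time[:, None] * wu[None, :]).ravel()
    Z_q = np.hstack([basis(S.ravel()), np.repeat(X, n_quad, axis=0),
                     np.repeat(C, n_quad, axis=0)])

    def objective(theta):
        beta, b = theta[q:q + p], theta[q + p:]
        loglik = event @ (Z_ev @ theta) - W @ np.exp(Z_q @ theta)
        return (loglik - b @ b / (2 * sigma2)
                - lam * np.sum(np.sqrt(beta ** 2 + eps)))

    theta = np.zeros(q + p + m)
    theta[0] = np.log(event.sum() / time.sum())      # constant-hazard start
    for _ in range(max_iter):
        mu = W * np.exp(Z_q @ theta)
        grad = Z_ev.T @ event - Z_q.T @ mu           # score of the log-likelihood
        neg_hess = Z_q.T @ (Z_q * mu[:, None])       # observed information
        # Local quadratic approximation: penalties act as a diagonal ridge.
        pen = np.zeros(q + p + m)
        pen[q:q + p] = lam / np.sqrt(theta[q:q + p] ** 2 + eps)
        pen[q + p:] = 1.0 / sigma2
        step = np.linalg.solve(neg_hess + np.diag(pen), grad - pen * theta)
        # Step halving keeps the penalized objective from decreasing.
        old, t = objective(theta), 1.0
        while objective(theta + t * step) < old and t > 1e-10:
            t /= 2
        theta = theta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta[:q], theta[q:q + p], theta[q + p:]


# Usage on simulated clustered data (all settings are arbitrary).
rng = np.random.default_rng(1)
m, per = 20, 10
cluster = np.repeat(np.arange(m), per)
X = rng.normal(size=(m * per, 4))
b_true = rng.normal(0.0, 0.5, size=m)
rate = 0.1 * np.exp(X @ np.array([1.0, -0.8, 0.0, 0.0]) + b_true[cluster])
T = rng.exponential(1.0 / rate)
cens = rng.uniform(0.0, 30.0, size=m * per)
time, event = np.minimum(T, cens), (T <= cens).astype(float)
alpha, beta, b = fit_penalized_frailty(time, event, X, cluster)
print(np.round(beta, 3))
```

The local quadratic approximation only shrinks coefficients toward zero rather than setting them exactly to zero, so real variable selection would need a final thresholding step or a genuinely non-smooth solver; in this simulated example the two noise coefficients should come out much smaller in magnitude than the two true effects.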

They conducted simulations comparing their method with the predominant survival packages coxph, coxme, penalized, and glmnet. For the random effects, only coxph and coxme were suitable comparators, but neither performs variable selection, and coxme does not allow the baseline hazard to be extracted, so the baseline hazard estimated by coxlasso could not be compared with it. The variable selection part could therefore only be compared with glmnet and penalized. In all respects, coxlasso performed best overall, providing the most robust estimates. It also performed well in two real-data applications, one on genetic lung cancer data and one on breastfeeding data.

In essence, their work proposed a flexible regularized Cox frailty model based on the full likelihood. This framework allowed them to estimate the smooth baseline hazard directly via P-splines and to include time-varying covariates and effects. The smoothing was carried out through a mixed-model representation of the spline coefficients, and the covariates were regularized using a lasso penalty with adaptive weights, with categorical variables penalized using a group lasso. All of this is implemented in their function coxlasso. They recommend it for high-dimensional settings with time-varying covariates, a cluster structure, and/or a need to select the important variables of interest.
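The penalty structure summarized above can also be written down directly. The Python/NumPy sketch below computes (i) adaptive weights from an initial estimate, (ii) a group lasso penalty in which a metric covariate forms its own group (reducing to an ordinary adaptive lasso term) while the dummy columns of a categorical variable share one group, and (iii) a second-order difference penalty on spline coefficients in the spirit of P-splines (Eilers and Marx 1996). The weight exponent gamma, the √(group size) scaling, the function names, and the use of a direct difference penalty (rather than the mixed-model representation the authors use) are assumptions made for illustration.

```python
import numpy as np


def adaptive_weights(beta_init, groups, gamma=1.0):
    """Weight 1 / ||beta_init_g||**gamma for each group g (inf if the group is zero)."""
    groups = np.asarray(groups)
    weights = {}
    for g in np.unique(groups):
        norm = np.linalg.norm(beta_init[groups == g])
        weights[g] = np.inf if norm == 0 else norm ** (-gamma)
    return weights


def adaptive_group_lasso_penalty(beta, groups, weights, lam):
    """lam * sum_g w_g * sqrt(size_g) * ||beta_g||_2.

    A group of size one is an ordinary (adaptive) lasso term lam * w_j * |beta_j|;
    the dummy columns of one categorical variable share a group, so they are
    kept or dropped together.
    """
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        b_g = beta[groups == g]
        norm = np.linalg.norm(b_g)
        if norm > 0:  # avoids inf * 0 when the initial estimate was zero
            total += weights[g] * np.sqrt(b_g.size) * norm
    return lam * total


def pspline_penalty(alpha, kappa, order=2):
    """kappa * ||D alpha||^2, with D the order-th difference matrix (P-spline style)."""
    D = np.diff(np.eye(alpha.size), n=order, axis=0)
    return kappa * np.sum((D @ alpha) ** 2)


# Two metric covariates plus a four-level factor coded as three dummy columns.
groups = [0, 1, 2, 2, 2]
beta_init = np.array([0.8, 0.05, 0.4, -0.2, 0.1])
w = adaptive_weights(beta_init, groups)
print(adaptive_group_lasso_penalty(beta_init, groups, w, lam=2.0))
print(pspline_penalty(np.arange(8.0), kappa=10.0))
```

With gamma = 1 and the penalty evaluated at the initial estimate itself, each nonzero group contributes √(group size), which makes the first result easy to check by hand; the linear coefficient vector passed to pspline_penalty receives a zero penalty because the second differences of a straight line vanish.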


Written by,

Usha Govindarajulu, PhD


Keywords: survival, Cox model, full likelihood, frailty, time-varying, lasso, penalty, splines

References

Eilers, P. H. C., and B. D. Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science 11, no. 2: 89–121.

Hohberg, M., and A. Groll. 2024. “A Flexible Adaptive Lasso Cox Frailty Model Based on the Full Likelihood.” Biometrical Journal. https://doi.org/10.1002/bimj.202300020
