December 9, 2021

The new Omicron variant is causing quite a stir around the world. Most of the data analysis conducted so far has been preliminary, and further planned analyses will take weeks to clarify the potential severity of the variant and its effect on efficacy.

A preprint from South Africa has been a primary source of data analysis and results on this new variant. The authors conducted a retrospective analysis of routine epidemiological surveillance data on SARS-CoV-2 with specimen receipt dates between 04 March 2020 and 27 November 2021, collected through South Africa’s National Notifiable Medical Conditions Surveillance System.

Their statistics section is very confusing in that they don’t actually state what models they used; we are led to infer them from various terms they dropped. They do not actually state how they decided to calculate the probability of infection. They use a reinfection hazard coefficient, lambda. They state that they used a model fitted to observed reinfection incidence through 30 September 2020 or 28 February 2021, assuming the data are negative binomially distributed with a given mean. They then fitted lambda and the inverse of the negative binomial dispersion parameter to the data using a Metropolis-Hastings Markov chain Monte Carlo (MCMC) estimation procedure implemented in the R statistical programming language. Next they discuss convergence of the model, but never once state which model. Even if they are using Bayesian techniques, there is still a model to be stated here, and they do not state it. Once they had fitted the joint posterior distribution, they were able to simulate a time series for time to reinfection.
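To make the ingredients named above concrete, here is a minimal sketch of what fitting a hazard coefficient and a negative binomial dispersion parameter by Metropolis-Hastings could look like. This is not the preprint's model, which is exactly what is left unstated. The mean structure (expected reinfections = lambda × exposure), the weak normal priors on the log scale, the function names, and the data are all assumptions made up for illustration, and it is written in Python rather than R.

```python
import math
import random


def nb_logpmf(y, mu, k):
    """Log-probability of count y under a negative binomial with mean mu and dispersion k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))


def log_posterior(log_lam, log_k, counts, exposure):
    """Log-likelihood plus weak N(0, 5^2) priors on log(lambda) and log(k) (assumed priors)."""
    lam, k = math.exp(log_lam), math.exp(log_k)
    # Assumed mean structure: expected reinfections = lambda * exposure.
    loglik = sum(nb_logpmf(y, lam * e, k) for y, e in zip(counts, exposure))
    log_prior = -0.5 * ((log_lam / 5.0) ** 2 + (log_k / 5.0) ** 2)
    return loglik + log_prior


def metropolis_hastings(counts, exposure, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis-Hastings on (log lambda, log k); returns (lambda, k) draws."""
    rng = random.Random(seed)
    log_lam, log_k = math.log(0.01), 0.0  # arbitrary starting values
    current = log_posterior(log_lam, log_k, counts, exposure)
    samples = []
    for _ in range(n_iter):
        prop_lam = log_lam + rng.gauss(0.0, step)
        prop_k = log_k + rng.gauss(0.0, step)
        proposed = log_posterior(prop_lam, prop_k, counts, exposure)
        # Symmetric proposal, so the acceptance ratio is just the posterior ratio.
        # 1 - random() lies in (0, 1], which keeps the log finite.
        if math.log(1.0 - rng.random()) < proposed - current:
            log_lam, log_k, current = prop_lam, prop_k, proposed
        samples.append((math.exp(log_lam), math.exp(log_k)))
    return samples


# Made-up daily reinfection counts and exposure (people at risk); not real data.
counts = [2, 5, 3, 8, 6, 4]
exposure = [200, 450, 310, 790, 640, 380]

draws = metropolis_hastings(counts, exposure)
kept = draws[1000:]  # discard burn-in
posterior_mean_lambda = sum(lam for lam, _ in kept) / len(kept)
print(f"posterior mean of lambda: {posterior_mean_lambda:.4f}")
```

Even in a toy like this, the model has to be written down before anything can be fitted: the line defining the mean is the model statement, and it is the piece I could not find in the preprint.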

In a second part, they allowed time-varying hazards and calculated various probabilities of being susceptible based on assumed probabilities of reinfection among different types of people. They then say they used a generalized linear mixed model (GLMM) with a Poisson distribution and a log link function to obtain relative hazards of infection. Finally, some clarity on the modeling is offered. What is confusing is their loose language about relative hazards. We generally think of a hazard the way it is defined in survival analysis, but they do not seem to be employing any survival model based on hazard rates, so this too was a source of mystery about their analyses.
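As a rough illustration of why a Poisson model with a log link produces something that reads like a "relative hazard", here is a small sketch. It is a plain Poisson GLM with an exposure offset, fitted by Newton-Raphson, and it deliberately omits the random effects that would make it a GLMM. The function name, the single binary covariate, and the data are hypothetical.

```python
import math


def fit_poisson_log_link(counts, exposure, x, n_iter=25):
    """Newton-Raphson fit of log(mu_i) = log(exposure_i) + b0 + b1 * x_i."""
    b0 = math.log(sum(counts) / sum(exposure))  # start at the pooled rate
    b1 = 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up counts: x = 0 for primary infections, x = 1 for reinfections.
counts = [10, 12, 30, 33]
exposure = [1000, 1200, 1000, 1100]
x = [0, 0, 1, 1]

b0, b1 = fit_poisson_log_link(counts, exposure, x)
rate_ratio = math.exp(b1)  # exponentiated coefficient = ratio of rates
print(f"rate ratio (x=1 vs x=0): {rate_ratio:.3f}")
```

Because of the log link, the exponentiated coefficient is a ratio of event rates per unit of exposure. That is a rate ratio, which can reasonably be described as a relative hazard only under assumptions about how the rates relate to hazards, and that is the step I would have liked to see spelled out.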

It would seem they did not have a statistician write this language, and they possibly pieced the write-up together secondhand by scanning what their chosen R functions produced. In summary, they used Bayesian methods, a negative binomial distribution, and time-varying hazards in a GLMM framework.

According to their results, the recent spread of the Omicron variant has been associated with a decrease in the hazard coefficient for primary infection and an increase in the reinfection hazard coefficient. The estimated hazard ratio for reinfection versus primary infection for the period from 1 November 2021 to 27 November 2021 versus wave 1 was 2.39 (CI95: 1.88–3.11). Again, I am not sure which model this came from, as they never stated that they used survival analysis beyond discussing modeling the hazard of infection. Their analyses also suggested that, after a 2nd reinfection, a 3rd one was probably related to the Omicron variant. Omicron appears to be causing more reinfection than the Delta and Beta variants, and also appears to evade immunity from primary infection in the population they studied. However, this is all still new, and this paper has not yet been peer reviewed. More is coming on the horizon.
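One quick check a reader can run on a reported ratio and interval is to look at them on the log scale, where intervals from log-link models are typically built. The sketch below uses only the published numbers, 2.39 (1.88–3.11). It assumes, hypothetically, a symmetric normal-approximation interval with z = 1.96, which may not be how the authors obtained theirs. The function names are my own.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def implied_log_se(lower, upper, z=Z_95):
    """Standard error on the log scale implied by a symmetric interval (lower, upper)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def ratio_ci(log_ratio, se, z=Z_95):
    """Ratio and interval obtained by exponentiating log_ratio +/- z * se."""
    return (math.exp(log_ratio),
            math.exp(log_ratio - z * se),
            math.exp(log_ratio + z * se))


# Values reported in the preprint.
estimate, lower, upper = 2.39, 1.88, 3.11

se = implied_log_se(lower, upper)
# The geometric midpoint of the interval is where a symmetric log-scale interval is centered.
midpoint = math.exp((math.log(lower) + math.log(upper)) / 2.0)

print(f"implied log-scale SE: {se:.3f}")
print(f"geometric midpoint of the interval: {midpoint:.2f} (reported estimate: {estimate})")
print("interval rebuilt around the reported estimate:", ratio_ci(math.log(estimate), se))
```

The geometric midpoint sits close to, but not exactly on, the reported estimate. That would be consistent with an interval taken from posterior quantiles rather than from a Wald-type formula, but this is only a guess from the outside.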

Written by,

Usha Govindarajulu


COVID-19, Omicron, variant, hazard ratio, GLMM, biostatistics, Usha Govindarajulu


Pulliam JRC*, Schalkwyk CV, Govender N, von Gottberg A, Cohen C, Groome MJ, Dushoff J, Mlisana K, Moultrie H. (2021) “Increased risk of SARS-CoV-2 reinfection associated with emergence of the Omicron variant in South Africa”. medRxiv. doi: