Forecasting outbreaks of the SARS-CoV-2 Omicron variant with Tajima's D

On the 26th of November, 2021, the World Health Organization declared the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant B.1.1.529, Omicron, to be a variant of concern. The variant had unusual features and unique mutations, which suggested that therapeutic monoclonal antibodies might be less effective against it. Although investigations are ongoing, there is a possibility that the new variant could have an impact on viral transmission or the severity of illness.

Study: Tajima D test accurately forecasts Omicron / COVID-19 outbreak. Image Credit: Adao/Shutterstock

Detecting the variant early and accurately is therefore of paramount importance. In a new study published on the medRxiv* preprint server, scientists reported that the Tajima’s D test statistic, with a threshold value of -2.50, is a valid and accurate predictor of new coronavirus disease 2019 (COVID-19) outbreaks.


The Omicron variant was first reported in Gauteng province, South Africa, on November 9, 2021. Preliminary evidence from genotyping tests suggested that Omicron may have been circulating in South Africa for some time before that. A major concern is that the variant carries more than 30 mutations in the spike protein, some of which were previously identified in the Delta or Alpha variants and linked to heightened infectivity and the ability to evade infection-blocking antibodies. The likelihood of higher transmission rates led many countries to respond quickly with strict measures.

The phylogenetic and evolutionary status of the Omicron variant needs to be studied in greater detail. In the current study, scientists investigated phylogenetic relationships and selection pressure by analyzing the 131 Omicron variant sequences available in the GISAID database. The sequences came from 10 countries and regions: Austria, Australia, Belgium, Botswana, Canada, China, Hong Kong, Italy, South Africa, and the UK.

Key findings

Researchers found the insertion mutation Ins214EPE in the spike protein of the Omicron variant. At least six major subgroups were identified among the 131 Omicron sequences after rooting the tree with an outgroup sequence, SARS-CoV-2 WIV04 from Wuhan, China. Of these sequences, 86.3% contained an insertion of nine nucleotides (GAGCCAGAA) between nucleotides 22204 and 22205, which led to the insertion of three amino acids in the N-terminal domain (NTD) of the spike protein.
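As a quick check on the "EPE" in the mutation name, the nine inserted nucleotides can be translated codon by codon. The sketch below is illustrative only: it assumes the insertion is read in frame from its first base, and the codon table is deliberately limited to the three codons that appear in the reported sequence (standard genetic code). The function name `translate` is a placeholder, not something from the study.

```python
# Partial codon table (standard genetic code), limited to the codons
# present in the reported insertion. Assumption: the insert is read in frame.
CODON_TO_AA = {
    "GAG": "E",  # glutamate
    "CCA": "P",  # proline
    "GAA": "E",  # glutamate
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame nucleotide string into one-letter amino acids."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


insertion = "GAGCCAGAA"
peptide = translate(insertion)
print(peptide)
```

Nine nucleotides give three codons, which is why the insertion adds exactly three amino acids to the NTD.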

Previous research showed that most ins214 motifs, which insert 3-4 amino acids, were rare in sequences from other SARS-CoV-2 lineages. The amino acid identity at ins214 varies across SARS-CoV-2 and SC2r-CoV lineages, but the insertion size is conserved in Omicron. Whether Ins214EPE affects spike protein function or the immune response is a question for future research.

The selection pressure on the Omicron variant was first analyzed with multiple neutrality tests. Seventy-five mutation sites were identified, with a nucleotide diversity of 0.00008, which is significantly lower than the values seen in the earlier Delta outbreaks in the UK and Australia. Tajima's D, which compares nucleotide diversity with the total amount of polymorphism, was then calculated; the values obtained were negative and significantly different from zero. This result indicated an excess of low-frequency nucleotide variants and suggested that strong selection and demographic expansion were operating on the Omicron variant. Taken together, the low nucleotide diversity and negative Tajima's D values suggested that the Omicron variant had possibly been spreading within the population for weeks before the current samples were collected.
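To make the comparison between nucleotide diversity and total polymorphism concrete, here is a minimal sketch of the classic Tajima's D calculation (Tajima, 1989) on a toy alignment. It is not the study's pipeline: the function name `tajimas_d` and the five six-base sequences are invented for illustration, and the code assumes gap-free, equal-length sequences with no ambiguous bases.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return (pi, S, D) for a list of equal-length aligned sequences.

    pi: mean number of pairwise differences
    S:  number of segregating (polymorphic) sites
    D:  Tajima's D statistic
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: alignment columns containing more than one distinct base.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no polymorphism; D is undefined")

    # pi: average number of differences over all pairs of sequences.
    pair_diffs = [sum(a != b for a, b in zip(s1, s2))
                  for s1, s2 in combinations(sequences, 2)]
    pi = sum(pair_diffs) / len(pair_diffs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # D: difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / sqrt(variance)
    return pi, seg_sites, d


# Toy alignment (made up): every variant is a singleton, i.e. rare.
toy = ["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA"]
pi, seg_sites, d = tajimas_d(toy)
print(pi, seg_sites, d)
```

Because every variant in the toy alignment occurs in only one sequence, the pairwise diversity is small relative to the number of polymorphic sites and D comes out negative, the same qualitative pattern the study reports for Omicron. Dividing `pi` by the alignment length gives a per-site nucleotide diversity.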

One caveat is that the effects of selection pressure and demographic expansion are difficult to tell apart, which limits the usefulness of Tajima's D on its own. To address this issue, the scientists included additional tests in the analysis, such as the normalized DH test and Zeng’s E test. The researchers had previously demonstrated the limitations of using inter-species divergence to analyze selection pressure on SARS-CoV-2. In the current study, they therefore tested for purifying selection using a modified Tajima’s D statistic instead of the dN/dS (ω) test. The results suggested that purifying selection led to constraints on the neutral mutations at non-synonymous sites of the spike gene of the Omicron variant.


The researchers had previously observed that one to three weeks after the Tajima's D value fell below -2.50, Delta variant outbreaks emerged in India and the UK, and a Lambda variant outbreak emerged in South America. This led them to propose that Tajima's D, with a cut-off value of -2.50, could be a strong predictor of new SARS-CoV-2 outbreaks. The sample size in the current study was small (N=131), but a strongly negative Tajima's D value was nevertheless detected in the Omicron variant. The findings were also alarming because they suggested that Omicron had been spreading for some time before the current outbreak was detected. In sum, the study argued that rapid genomic sequence surveillance is crucial and that Tajima's D tests should be included to predict future outbreaks in different populations.
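The proposed rule amounts to a simple threshold check on a time series of Tajima's D values. The sketch below shows one way such a check could look. Only the -2.50 cut-off comes from the study; the function name `first_alert` and the weekly values are invented for illustration and are not real surveillance data.

```python
THRESHOLD = -2.50  # cut-off proposed in the study


def first_alert(weekly_d, threshold=THRESHOLD):
    """Return the label of the first period whose D falls below the threshold.

    weekly_d is a list of (label, d_value) pairs in chronological order.
    Returns None if the threshold is never crossed.
    """
    for label, d_value in weekly_d:
        if d_value < threshold:
            return label
    return None


# Made-up values for illustration only.
toy_series = [
    ("week 1", -1.20),
    ("week 2", -2.10),
    ("week 3", -2.62),
    ("week 4", -2.75),
]
alert = first_alert(toy_series)
print(alert)
```

In this toy series the alert fires in week 3. Under the study's observation, an outbreak would then be expected roughly one to three weeks later.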

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Gemma Wilson

