Large-scale analysis of SARS-CoV-2 spike protein mutants identifies subclonal variants indicative of within-host viral diversity

Since the first reported coronavirus disease 2019 (COVID-19) case in December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome has continued to mutate and evolve. Some mutations in the SARS-CoV-2 genome might influence the development of effective intervention strategies.

Study: Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the need for continuous screening of virus isolates. Image Credit: Dotted Yeti/Shutterstock

Coronaviruses are genetically stable compared to other RNA viruses due to their inherent 3′ to 5′ exoribonuclease activity. However, mutations of critical residues in the receptor-binding domain (RBD) of the spike protein (S protein) might enhance viral transmissibility. Furthermore, mutations in the RBD might interfere with the efficacy of vaccines and treatments targeting the S protein.

In a recent study published in PLOS ONE, researchers from Germany investigated a vast set of SARS-CoV-2 genome assemblies and next-generation sequencing (NGS) data, collected across the globe, to detect non-synonymous S protein mutations and to evaluate their effect on potential antibody binding sites and known T cell epitopes.


The S protein consists of the S1 subunit at the N-terminus and the S2 subunit at the C-terminus. The RBD in the S1 subunit engages human angiotensin-converting enzyme 2 (ACE2), the virus's entry receptor on the host cell surface. Subsequently, the S2 subunit helps fuse the viral envelope with the host cell membrane.

The genomic region encoding the RBD is highly conserved, making it an attractive vaccine target as it elicits high-quality, protective antibodies. The N501Y mutation in the RBD has been shown to enhance viral infectivity by improving the binding between SARS-CoV-2 and the human ACE2 receptor.

What did the researchers do?

The team based their investigation on more than a million SARS-CoV-2 genomic assemblies and 30,806 NGS datasets.

Genomic assemblies downloaded from the Global Initiative on Sharing Avian Influenza Data (GISAID) database in April 2021 were aligned pairwise to the SARS-CoV-2 Wuhan-Hu-1 reference genomic sequence (MN908947.3).

The team additionally downloaded all NGS data available for SARS-CoV-2 in June 2021 from the European Nucleotide Archive (ENA). The human genome sequence was filtered out, and data were aligned to the reference MN908947.3. Variants in the S gene sequence were retrieved from the alignment files.

To identify high-confidence subclonal mutations, the team selected NGS variants with a sequencing depth of at least 30 reads.
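The depth filter described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the `Variant` record, its field names, and the sample values are hypothetical, and only the threshold of 30 reads comes from the text.

```python
from dataclasses import dataclass

MIN_DEPTH = 30  # threshold stated in the article: at least 30 reads


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one S gene variant call."""
    mutation: str   # amino acid change, e.g. "D614G"
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the mutant allele


def high_confidence(variants, min_depth=MIN_DEPTH):
    """Keep only variant calls covered by at least `min_depth` reads."""
    return [v for v in variants if v.depth >= min_depth]


# Made-up calls for illustration only.
calls = [
    Variant("D614G", depth=120, alt_reads=118),
    Variant("S477N", depth=12, alt_reads=5),
    Variant("P681H", depth=30, alt_reads=9),
]
kept = high_confidence(calls)
print([v.mutation for v in kept])  # ['D614G', 'P681H']
```

The comparison is inclusive, so a call covered by exactly 30 reads passes the filter.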

What did the researchers find?

The team discovered that only 2.5% of virus sequences contained the wild-type (WT) S protein. Mutant viruses exhibited only a few mutations in the S protein, with fewer than ten mutations for all but 4,193 sequences.

However, the mean and median number of mutations increased over time from December 2019 (mean: 0.14, median: 0) to April 2021 (mean: 7.2, median: 7).
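A per-month summary of this kind could be computed roughly as follows. The sketch assumes each sample is reduced to a (collection month, mutation count) pair; the `samples` list is invented for illustration and does not reproduce the study's figures.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (collection month, number of S protein mutations) pairs.
samples = [
    ("2019-12", 0), ("2019-12", 0), ("2019-12", 1),
    ("2021-04", 6), ("2021-04", 7), ("2021-04", 9),
]

# Group the mutation counts by collection month.
counts_by_month = defaultdict(list)
for month, n_mutations in samples:
    counts_by_month[month].append(n_mutations)

# Mean and median number of mutations per sample, in chronological order.
summary = {
    month: (mean(counts), median(counts))
    for month, counts in sorted(counts_by_month.items())
}
for month, (avg, med) in summary.items():
    print(f"{month}: mean={avg:.2f}, median={med}")
```

Reporting both statistics is useful because a handful of heavily mutated sequences can pull the mean up while the median stays put.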

In total, the team detected 5,472 distinct non-synonymous mutations in the S protein. Only 22.4% of the mutations in the assembly and NGS data sets were singular events. The rest of them were recurrently distributed throughout the whole S protein.
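The split between singular and recurrent mutations can be illustrated with a simple count. The sketch below assumes that "singular" means a mutation seen in exactly one sample and "recurrent" means it was seen in more than one; the sample names and mutation sets are hypothetical.

```python
from collections import Counter

# Hypothetical per-sample sets of non-synonymous S protein mutations.
sample_mutations = {
    "sample_a": {"D614G", "P681H"},
    "sample_b": {"D614G", "S477N"},
    "sample_c": {"D614G", "A222V", "P681H"},
}

# Number of samples in which each distinct mutation appears.
occurrences = Counter(
    mutation
    for mutations in sample_mutations.values()
    for mutation in mutations
)

singular = sorted(m for m, n in occurrences.items() if n == 1)
recurrent = sorted(m for m, n in occurrences.items() if n > 1)
singular_share = len(singular) / len(occurrences)

print("singular:", singular)
print("recurrent:", recurrent)
print(f"share of singular mutations: {singular_share:.1%}")
```

Because each sample holds a set, a mutation is counted at most once per sample, so the counter reflects the number of samples rather than the number of reads.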

The most common recurrent mutation in both the genome assemblies and the NGS data sets was D614G, located just outside the RBD, followed by the N501Y mutation, located within the RBD. In the region encoding the RBD, the team detected 852 mutations (646 recurrent) in the assembly sequences and 259 mutations (105 recurrent) in the NGS data sets.
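Whether a mutation falls inside the RBD can be read off its position. The sketch below parses labels such as D614G and checks them against an assumed RBD range of residues 319 to 541, a commonly cited boundary that the study itself may define differently; the helper names are hypothetical.

```python
import re

# Assumption: RBD spans residues 319-541 of the S protein (commonly cited
# range; the study may use different boundaries).
RBD_START, RBD_END = 319, 541

# One-letter reference residue, position, one-letter mutant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'D614G' into (reference, position, mutant)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def in_rbd(label):
    """Return True if the mutated position lies within the assumed RBD range."""
    _, pos, _ = parse_mutation(label)
    return RBD_START <= pos <= RBD_END


for label in ("D614G", "S477N", "P681H"):
    print(label, "in RBD" if in_rbd(label) else "outside RBD")
```

Under this assumed range, position 614 lies outside the RBD and position 477 inside it, which matches how the article describes D614G and S477N.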

P681H/D614G was the most common pair of co-occurring mutations, detected in 345,808 samples from the combined assembly and NGS data sets. Another frequently co-occurring pair was P681H/T716I, observed in 324,269 samples.
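Counting co-occurring pairs amounts to tallying every pair of mutations that appears together in a sample. The following is a minimal sketch of that idea, not the method used in the study; the `samples` list is made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-sample sets of S protein mutations.
samples = [
    {"D614G", "P681H", "T716I"},
    {"D614G", "P681H"},
    {"D614G", "S477N"},
]

pair_counts = Counter()
for mutations in samples:
    # Sorting first makes each pair order-independent, so ("D614G", "P681H")
    # and ("P681H", "D614G") are counted as the same pair.
    pair_counts.update(combinations(sorted(mutations), 2))

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)  # ('D614G', 'P681H') 2
```

The number of pairs grows quadratically with the number of mutations per sample, which stays manageable here because most sequences carry fewer than ten S protein mutations.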

Subclonal S protein mutations, which indicate either co-infection with multiple SARS-CoV-2 strains or intra-host evolution of the virus, were detected in 2.59% of the NGS data sets. Most of these subclonal events were recurrent.


Researchers from Germany identified an overall low mutation burden in the SARS-CoV-2 S protein. However, the mean and median number of mutations per sample increased over time.

“We identified around 99.1% of the samples with a D614G variant, which supports a previous theory of an increasing frequency of the D614G variant in the global pandemic”, say Sahin and colleagues.

The S477N mutation potentially affects RBD stability and strengthens binding to the human ACE2 receptor. The team found a frequent co-occurrence of S477N with D614G. According to a previous report, this combination has been estimated to spread even more rapidly than the D614G mutant alone, highlighting the need to constantly track new SARS-CoV-2 variants and their transmission patterns to inform preventive and therapeutic strategies.

The team also suggests that subclonal variants, indicative of viral diversity within the host, may prevent complete clearance after treatment, resulting in resistant strains.

“Recurrent mutations in subclonal variants might point to co-infection with multiple strains. Sample-specific variants in turn might rather indicate that the mutation occurred after infection within the host.”

Journal reference:
  • Schrörs, B. et al. (2021) “Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the need for continuous screening of virus isolates”, PLOS ONE, 16(9), p. e0249254. doi: 10.1371/journal.pone.0249254.


Gemma Wilson

Gemma is a journalism graduate with a keen interest in covering business news, specifically startups. She has a keen eye for technologies and has predicted quite a few successful startups over the last couple of years.
