The evolutionary features and roles of single nucleotide variants and charged amino acid mutations in influenza outbreaks during NPI period

Huang, Zhong-Zhou; Tan, Jing; Huang, Ping; Li, Bai-Sheng; Guo, Qing; Liang, Li-Jun

doi:10.1038/s41598-024-71349-8

Article
Open access
Published: 03 September 2024

Volume 14, article number 20418, (2024)

Scientific Reports

Zhong-Zhou Huang^1,2,3,
Jing Tan^3,5,6,
Ping Huang^2,3,4,5,
Bai-Sheng Li^3,4,5,
Qing Guo¹ &
…
Li-Jun Liang^3,4

Abstract

The epidemic and outbreaks of influenza B Victoria lineage (Bv) during 2019–2022 prompted an analysis of the relationships among genetic variation, epitopes, charged amino acids and Bv outbreaks. Based on the National Influenza Surveillance Network (NISN), 72 Bv strains isolated during 2019–2022 were selected by spatio-temporal sampling and then sequenced. Using Compare Means, Correlate and Cluster procedures, the data were analyzed for single nucleotide variants (SNVs), amino acids (AAs), epitopes, evolutionary rate (ER), Shannon entropy value (SV), charged amino acids and outbreaks. After the emergence of COVID-19, non-pharmaceutical interventions (NPIs) reduced long-distance transmission, and only Bv outbreaks occurred. The HA genes of the 2021–2022 strains fell into the same subset and were distinct from those of the 2019–2020 strains (P < 0.001). The G → A transition had the highest ratio among nucleotide substitutions, but the C → A and T → A transversions made the most significant contribution to the outbreaks, while an increase in amino acid mutations with polar, acidic and basic signatures played a key role in the Bv epidemic in 2021–2022. ER and SV were positively correlated in both HA genes (R = 0.690) and NA genes (R = 0.711); however, the number of mutations in the HA genes was 1.59 times that in the NA genes (2.15/1.36) from the beginning of 2020 to 2022. The positively selected sites 174, 199, 214 and 563 in HA genes and sites 73 and 384 in NA genes were evolutionarily selected in the 2021–2022 influenza outbreaks. Overall, the prevalent factors related to the 2021–2022 influenza outbreaks included epidemic timing, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P). The preference of amino acid mutations for charge/pH could influence the epidemic/outbreak trends of infectious diseases. This setting provided a good model for studying the evolution of infectious disease pathogens. By further exploring virology, genetics, bioinformatics and outbreak information, this study might facilitate understanding of their deep interaction mechanisms in the spread of infectious diseases.

Introduction

Influenza viruses are the infectious agents of acute upper respiratory tract infections and have caused several global pandemics. According to their antigenic characteristics, human influenza viruses are divided into four types: A, B, C and D. The genes and antigens of influenza A virus are prone to mutation, often leading to epidemics or pandemics, whereas the genes of influenza B virus vary relatively little and usually cause local outbreaks^1,2. In recent years, influenza type B has attracted global attention: according to a report summarizing 47 studies, the hospital stay for type B (6.7 d) was longer than that for type A (6.5 d), and the clinical fatality rate (CFR) of people aged 50 years could reach 2.5% (95% CI 0.7%–7.6%, P < 0.001)³.

Influenza viruses are characterized by two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA). Mutations in the HA and NA amino acid sequences are the basis of antigenic shift and drift⁴. Because the viral polymerase lacks proofreading activity, influenza virus genes mutate frequently without genetic correction, resulting in 1–2% annual divergence of influenza strains⁵. Genetic evolution can be described by genome structure mutations and nucleotide substitutions: genome structure mutations include insertions/deletions (indels) and inversions, while nucleotide (Nt) substitutions in the form of single nucleotide variants (SNVs) include both transitions (Ts) and transversions (Tv)⁶. Both transitions (purine ↔ purine or pyrimidine ↔ pyrimidine) and transversions (purine ↔ pyrimidine) in SNVs have been explored for their impact on the evolution of nucleotide composition in humans and animals⁷. A longstanding approach for analyzing amino acids (AAs) and proteins is to compare the rates of nonsynonymous (dN) and synonymous (dS) substitutions at each site; these dN/dS ratios can be calculated by counting mutations or by using phylogenetic substitution models⁸.

Influenza epidemics arise from genetic and antigenic mutations, in which epitopes containing certain key amino acids play a crucial role in antigenicity⁹. The D₆₁₄G substitution in SARS-CoV-2, which replaces the acidic amino acid aspartic acid with glycine, enhances pathogenic infectivity¹⁰. Furthermore, the conserved protective epitopes of HA are essential to the design of a universal influenza vaccine and new targeted therapeutic agents, especially small proteins and peptides¹¹. Thus, the amino acids that constitute the antigenic domain might carry different weights in antigenicity because of their molecular structure, hydrophilicity and charge properties.

Globally, three seasonal influenza strains (H3N2/H1N1/B) had been cocirculating across continents, but under the non-pharmaceutical interventions (NPIs) introduced in the spring of 2020, only influenza B Victoria lineage (Bv) was dominant in China from March 2020 to March 2021¹². NPIs have been implemented worldwide, including travel restrictions, face masks, social distancing, public education on prevention measures, and school closures¹³. Under the influence of NPIs, influenza outbreaks and epidemics in southern China have become a model for investigating infectious disease transmission in an ideal closed-loop environment. Here, based on Bv epidemic information and genetic sequences, we analyzed the data using multiple statistical approaches based on spatio-temporal connectivity to evaluate Ts/Tv substitutions, charged amino acid effects and outbreak-related variables, and to further explore the internal connections between genetic evolution and outbreaks.

Methods

Surveillance and gene sequence

This study was based on the National Influenza Surveillance Network (NISN, https://10.249.6.18:8881/cdc/)¹². Influenza surveillance is performed on the basis of the National Influenza Surveillance Program (2017 edition)¹⁴. An influenza-like illness (ILI) case is defined as a case with body temperature ≥ 38 °C accompanied by either cough or sore throat, but without molecular detection; an influenza case is an ILI case that tested positive for influenza virus nucleic acid. An influenza outbreak is defined as the occurrence of 10 or more ILI cases in the same school, childcare institution, or other collective unit within a week.

A total of 72 Guangdong (GD) strains were selected by spatio-temporal sampling from March 2019 to April 2022, comprising 16, 9, 34 and 13 strains in the four successive years (no strains were isolated during April–December 2020). Global strains were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) or GISAID (https://platform.epicov.org/epi3/frontend), including four vaccine strains (VSs) recommended by the World Health Organization (WHO) and five characteristic regional strains. By region, the 72 strains (42 from outbreaks and 30 from hospital sentinels) came from Chaozhou (5), Dongguan (2), Foshan (1), Guangzhou (5), Heyuan (4), Huizhou (5), Jiangmen (3), Maoming (6), Meizhou (3), Qingyuan (8), Shantou (2), Shanwei (2), Shaoguan (10), Shenzhen (6), Yunfu (2), Zhanjiang (4), Zhongshan (2) and Zhuhai (2) (Supplementary Table S1a,b). A set of primers for Bv strains was designed and synthesized based on the Bv strains isolated during 2016–2018 (Supplementary Table S2). The 72 strains in this study were extracted, amplified and sequenced¹⁵, and the genetic fragments were then merged [GenBank accession PP989545-PP989616 (HA) and PQ037033-PQ037104 (NA)].

Genetic data processing

Nucleotide

The nucleotide (Nt) sequences of the open reading frames (ORFs) were aligned with Clustal W, and phylogenetic trees of the HA and NA genes (Figure S1) were constructed with the Neighbor-Joining (NJ) method in MEGA 11.0.13¹⁶. Nucleotide variation rates were calculated across stages and years against a single VS sequence as the benchmark, and the differences were compared statistically using One-Way ANOVA. Grounded in the neutral theory of molecular evolution, the binary coalescent tree is the dual backward representation of the continuous-forward-time diffusion model of genetic drift. In species phylogeny and epidemiology, the tree structure is often used to compare different models of evolution or to fit model parameters¹⁷. Analysis of Molecular Variance (AMOVA) is a widely used method that employs variance to study the hierarchical genetic structure of populations, in which the nucleotide diversity (D) is calculated as the mean pairwise genetic distance¹⁷; the statistical algorithm for genetic variability is D = [π − S/(0.5 − log(fmin))]/S. The pairwise dN/dS estimates were calculated for the coding regions¹⁹.

Based on the SNVs, the sequence data were grouped into 12 substitution types (4 × 3), comprising transitions, G(C) ↔ A(T), and transversions, G(C) ↔ C/T (A/G). The substitution ratio was calculated as R_i = M_i/N_i, where R is the ratio, the subscript i denotes the specific nucleotide, and M and N are the mutated Nt numbers per specific purine or pyrimidine²⁰.
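The grouping into 12 directed substitutions and the Ts/Tv split can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the two short sequences in the usage note are toy strings, and the denominator N_i is assumed here to be the number of reference bases of the source nucleotide, which is only one possible reading of the ratio defined above.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Return 'Ts' (purine<->purine or pyrimidine<->pyrimidine) or 'Tv'."""
    bases = PURINES | PYRIMIDINES
    if ref == alt or ref not in bases or alt not in bases:
        raise ValueError(f"not a single nucleotide substitution: {ref}>{alt}")
    same_family = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Ts" if same_family else "Tv"


def substitution_counts(reference, sample):
    """Count each directed substitution (e.g. 'G>A') between two aligned sequences."""
    counts = Counter()
    for r, s in zip(reference.upper(), sample.upper()):
        # Gaps and ambiguous bases are skipped (an assumption of this sketch).
        if r != s and r in "ACGT" and s in "ACGT":
            counts[f"{r}>{s}"] += 1
    return counts


def ts_tv_totals(counts):
    """Sum the directed substitution counts into (transitions, transversions)."""
    ts = sum(n for sub, n in counts.items() if classify_snv(sub[0], sub[2]) == "Ts")
    tv = sum(n for sub, n in counts.items() if classify_snv(sub[0], sub[2]) == "Tv")
    return ts, tv


def substitution_ratios(reference, sample):
    """R_i = M_i / N_i, with N_i assumed to be the count of the source base in the reference."""
    base_totals = Counter(b for b in reference.upper() if b in "ACGT")
    counts = substitution_counts(reference, sample)
    return {sub: n / base_totals[sub[0]] for sub, n in counts.items()}
```

For example, `substitution_counts("AAGGCCTT", "GAGACCTA")` records one A>G, one G>A and one T>A change, which `ts_tv_totals` splits into two transitions and one transversion.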

The strains were sampled from both outbreaks (Ob) and hospital sentinels (HS), and the sampling dates were classified into Stage 1 (S₁, 2019 → February 2020) and Stage 2 (S₂, March 2020 → April 2022), a division based on the subsequent cluster analysis.

Amino acid

The amino acid (AA) sequences were also aligned with Clustal W. The AA mutations were analyzed and then classified into four groups, hydrophobic (H), polar (P), acidic (A) and basic (B) amino acids, according to their charge features (Supplementary Table S3). The Shannon entropy value at each AA site was then calculated¹⁸. The Shannon entropy equation is as follows, where L is the list of all possible amino acids in all the sequences and P_k(i) is the probability of finding the k-th amino acid at position i¹⁹.

$$ H\left( i \right) \, = - \sum\limits_{k \in L} {{P_k}(i) \times {{\log }_2}{P_k}(i)} $$
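As a minimal sketch of this per-site calculation, the Python below computes H(i) for every column of an aligned set of amino acid sequences. The function names are hypothetical, and treating a gap character as just another symbol is an assumption of the sketch, not something stated in the text.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """H = -sum_k P_k * log2(P_k) over the residues observed in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_per_site(alignment):
    """Return the entropy of each position for equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [shannon_entropy(column) for column in zip(*alignment)]
```

A fully conserved site gives H = 0, and a site split evenly between two residues gives H = 1 bit, so higher values mark more variable positions.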

Evolutionary and cluster analysis

The rate of molecular evolution is measured by the number of Nt sequence mutations per unit time. Cumulative mutations across the whole gene region can be used to estimate positive selection (or selective pressure) by calculating the dN/dS substitution ratio (ω)²¹. Sequence data, including Nt and AA, were analyzed using the χ² test for categorical data and One-Way ANOVA for comparing means. The relationships among the variables related to influenza genetics and outbreaks were analyzed with the TwoStep and Hierarchical Cluster procedures²². Based on Schwarz's Bayesian Criterion (BIC), the cluster model was scored as −2L_m + m × ln n, where L_m is the maximized log-likelihood, m is the number of model parameters and n is the sample size; the model with the minimum value was selected.
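The BIC expression above can be written as a short Python sketch. This shows only the generic formula and the minimum-value rule; it does not reproduce how SPSS evaluates the criterion inside its TwoStep procedure, and the function names and candidate labels are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Schwarz's Bayesian Criterion: -2 * L_m + m * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_model(candidates, n_samples):
    """Return the label of the candidate with the smallest BIC.

    `candidates` maps a label to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_samples))
```

Because the penalty term grows with both m and ln n, a model with more clusters is preferred only if it raises the log-likelihood enough to offset its extra parameters.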

Statistical evaluation

The data were processed using WPS spreadsheets and SPSS 23.0 (SPSS Inc., Chicago, IL), and descriptive statistics were reported as Mean ± SD (normal distribution) or Median (P₂₅, P₇₅) (skewed distribution)¹⁷.

Comparisons of means and nonparametric tests were considered statistically significant at a two-sided P value < 0.05, while evolutionary selection was assessed at a P value < 0.10. A correlation was considered significant when the R value was > 0.50. The statistical logic and procedures of the other approaches are explained where needed.

Results

Nucleotide homology and evolution

Most of the HA and NA ORF sequences were 1749 bp and 1401 bp long, respectively. Genetic alignment showed that, relative to the 1758-bp HA ORF of Bri/60/08 (2008 VS), Nt 529–537 (aaaaacgac) were deleted in the HA genes of all 2019–2022 strains in this study except GD/1557/19 (1758 bp).

The homologies of the HA and NA genes of the GD strains were compared with those of four VSs (Table 1). The results were as follows: (1) the highest identities in the HA gene occurred in 2020 (99.80 ± 0.07, 2019 VS) and in 2022 (99.23 ± 0.22, 2021 VS) (P < 0.001); (2) relative to the HA gene of the 2021 VS, the HA genes of the GD strains during 2021–2022 were classified in subset 2, which differed from the results for the other three VSs; (3) the identity of the NA genes with the 2021 VS was higher in 2021 (98.57 ± 0.15) than in 2022 (98.43 ± 0.13) (P = 0.012), which indicated that both the HA and NA genes of the 2022 strains had evolved further than the 2021 VS.

Table 1 Homologies of HA/NA genes (ORF) between GD strains and vaccine strains (VSs).

The trees of the HA and NA genes of the 72 influenza Bv strains isolated in Guangdong (Supplementary Fig. S1) had the following characteristics: (1) the HA genes from 2019 to 2020 were closer to that of the 2019 VS (Was/2/19), and those in 2021–2022 were identical to that of the 2021 VS (Aus/1359417/21); (2) the NA gene of the 2019 VS (Was/2/19) was closely related to those of the 2019–2020 GD strains, and of the 2019–2022 strains as a whole, but the NA genes of the other two VSs (Col/6/17 and Aus/1359417/21) were genetically different from those of the 2019–2022 GD strains, which suggested that the WHO-recommended vaccine strains relied mainly on the homology of HA genes rather than NA genes.

Transition and transversion in SNV

The numbers and ratios of purine and pyrimidine mutations in each HA and NA gene were calculated against the reference Bri/60/08 (Table 2). The results showed that (1) Ts mutations were mostly more numerous than Tv mutations, with three increasing (A → G↑, C → T↑ and G → A↑) and one decreasing (T → C↓) across the two stages in HA genes, and one increasing (G → A↑) in NA genes; (2) changes in Tv mutations were usually more significant than those in Ts mutations, including two increases (C → A↑ and T → A↑) in HA genes across the two stages, while NA genes showed one increase and one decrease (A → T↑ and T → A↓); (3) Tv mutations in HA genes (C → A↑↑ and T → A↑↑) might have contributed to the influenza outbreaks after NPIs; (4) the nucleotide variation ratios of both HA and NA genes ranked in the following order: G → A, A → G, C → T, T → C, A → T, T → A, A → C, G → T, T → G and C → G (no G → C substitution).

Table 2 Mutations of Ts/Tv on HA and NA genes during two stages.

Evolutionary selection

Based on the dN/dS substitutions in the codons, evolutionary selection in both HA and NA genes was analyzed using two approaches, FUBAR and MEME (Table 3). FUBAR identified positive sites 199, 214 and 563 in HA genes, whereas MEME identified sites 174, 214 and 563 (P < 0.10). The two methods were assessed statistically using the Bayes Factor and the MEME LogL, respectively. The positive sites in NA genes included sites 73 and 384 with FUBAR and only site 73 with MEME (P < 0.10) (Table 3, Supplementary Table S4). The positive selection suggested that these amino acid sites were under enormous external pressure. FUBAR also identified many negatively selected sites (Fig. 1).

Table 3 Evolutionary selection on genes of influenza viruses.

Evolution comparison

When the two evolutionary indicators, the evolutionary rate (ER) and the Shannon entropy value (SV), were compared, ERs and SVs were significantly correlated in HA genes (R_Pearson = 0.690) and also in NA genes (R_Pearson = 0.711), which indicated that the two indicators were consistent within the same gene (Fig. 2). The difference between the two indicators may be that the ER reflects all nucleotide mutations (dS and dN), whereas the SV mainly reflects the amino acid sites affected by dN mutations.

Based on Tajima's Neutrality Test, evolutionary selection was present (Supplementary Table S5). Tajima's D test is based on neutrality: a positive D indicates an excess of common alleles and a negative D an excess of rare alleles. The results were as follows: (1) S_HA (195) and π_HA (0.0109) were greater than S_NA (176) and π_NA (0.0095), respectively, but p_sNA (0.1257) and Θ_NA (0.0253) were greater than p_sHA (0.1116) and Θ_HA (0.0225), respectively, which suggested that the dN was larger in HA than in NA and the dS was larger in NA than in HA; (2) both HA and NA showed a higher rate of synonymous evolution than nonsynonymous evolution, while D_HA (−1.7668) was larger than D_NA (−2.1355) because HA is mainly an antigenic domain and functional region.
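For readers who want to see how a D statistic of this kind is obtained, the sketch below implements the standard Tajima (1989) formula in Python from the number of sequences, the number of segregating sites S and the mean number of pairwise differences k. It is offered as an illustration under that assumption; the text does not specify the software settings behind Supplementary Table S5, and the function names are hypothetical.

```python
import math
from itertools import combinations


def segregating_sites_and_k(alignment):
    """Return (S, k): segregating sites and mean pairwise differences of aligned sequences."""
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    pairs = list(combinations(alignment, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return s, diffs / len(pairs)


def tajimas_d(n, s, k):
    """Tajima's D for n sequences, s segregating sites and mean pairwise difference k."""
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two estimators of diversity (k and S / a1).
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

D is zero when the two diversity estimates agree and becomes negative when pairwise diversity is low relative to the number of segregating sites, which corresponds to the excess of rare alleles described above.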

Genetic factors related to outbreak

For all HA gene mutations and outbreak-influencing factors, the TwoStep cluster was used to explore the relationships among all the variables mentioned above, especially the association of influenza outbreak events with genetic mutations. The findings of the TwoStep cluster were as follows: (1) three clusters were formed mainly on the basis of sampling dates, Ts and Tv, where Cluster 1 was completely separated from Clusters 2 and 3 and ended in February 2020; (2) the separation of Cluster 1 from Clusters 2 and 3 was driven by AA sites 148 (B → P), 165 (P → B), 199 (P → A), 212 (P → A) and 563 (B → P), while the separation of Cluster 2 from Clusters 1 and 3 was driven by site 256 (P ↔ H); (3) the separation of Cluster 3 from Clusters 1 and 2 was driven by AA sites 137 (B → P) and 142 (B → P), and Cluster 3 started in the second half of 2021. From this analysis, the role of genetic evolution and mutation (amino acid mutations with a preference for charge/pH) in the development of the epidemic and outbreaks could be preliminarily identified (Supplementary Tables S4 and S6).

For all continuous and categorical variables in HA genes and disease outbreaks, the K-Means Cluster was used to further analyze the relationships among all variables (Table 4). The One-Way ANOVA of the K-Means Cluster (cases labeled by strain name) showed the following (only significant results are listed): (1) there were two clusters, Cluster 1 in 2019–2020 and Cluster 2 in 2021–2022 (P < 0.001); (2) three variables (Tv, Ts and Tv/Ts) were statistically different (P < 0.001); (3) AA sites P137, P148, P199, P212, P214 and P563 were very significantly different (P < 0.001), as was P256 (P = 0.002), but P188 (P = 0.302) and Outbreak (P = 0.089) were not significantly different between the two clusters. Overall, the factors that determined the 2021–2022 influenza outbreak included epidemic time, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P) (Table 4, Supplementary Table S6).

Table 4 Genetic variables related to the outbreaks analyzed by One-Way ANOVA.

Discussion

The B/Victoria lineage dates back to the 1988–1989 season, when two distinct antigenic variants of influenza B virus co-circulated, the B/Victoria and B/Yamagata lineages (Bv/By), with the reference strains B/Victoria/2/87 and B/Yamagata/16/88, respectively. The evolutionary dynamics of influenza B virus are complex and have been characterized by nucleotide insertions and deletions (indels) in the hemagglutinin (HA) gene and extensive reassortment events within and between the Bv and By lineages²³. In this study, only strain GD/1557/2019, which carried the ₅₂₉AAAAACGAC₅₃₇ insertion in the HA gene, resembled the vaccine strain B/Bri/60/08 and differed from the other strains. Relative to the vaccine strain Aus/1359417/21, the HA genes of the Bv strains circulating in 2022 had the highest homology and differed from those in other years (99.23 ± 0.22, F_2022/Others = 74.78, P < 0.001). Some of the influenza strains in the present study were isolated at the beginning of the NPI period (2020) and were in fact a continuation of the 2019 epidemic and outbreaks. Moreover, from the start of NPIs (2020) to the end of April 2022, only influenza Bv outbreaks (no H1N1 and no H3N2) occurred in southern China.

This study analyzed nucleotides (molecular cluster, transition/transversion, evolutionary rate), amino acids (AA substitution, entropy, evolutionary selection, epitope), genes (HA/NA) and prevalence (epidemic/outbreak, different dates), together with the relationships among them in Bv outbreaks. Although SNVs occur at random, the results are significant for the direction of biological evolution. In the present study, the mutations were highly biased toward specific substitutions; for example, the probability of a GC transversion was one in 200,000 (it occurred only once) since 2008 (reference strain), whereas the probability of an AG transition was 1–2 per thousand, far higher than that of the GC transversion. The evolutionary rates in this study ranked in the order G → A, A → G, C → T, T → C, A → T, T → A, A → C, G → T, T → G and C → G, with the highest rate 10,000 times faster than the lowest rate. By comparison, a study of SARS-CoV-2 during the first months of pandemic spread found that the frequencies of both G → U and C → U substitutions increased, which suggested that the substitution spectrum of SARS-CoV-2 was determined by an interplay of factors, including intrinsic biases of the replication process, avoidance of CpG dinucleotides and other constraints exerted by the new host²⁴.

In this study, mutations in the epitope domains, including epitope A (120 loop, 137/142/144/199), B (150 loop, 165) and D (190 helix, 212/214), had high evolutionary rates, partially consistent with previous research²³. The epidemic and outbreaks in southern China resulted from mutations in HA genes, which accumulated 1.59 times (2.15/1.36) faster than those in NA genes. More specifically, the outbreaks here were associated with mutations in HA gene epitopes A, B and D. By comparison, during the epidemic in Germany in 2016–2020²⁵, a total of 13 substitutions were fixed over time (numbering in HA1 of Bri/60/08), including five in the 120-loop (R₁₁₆H, I₁₁₇V, N₁₂₁T, K₁₂₉N/D, K₁₃₆E), two in the domain surrounding the 120-loop (K₄₈E, N₇₅K), one in the 150-loop (V₁₄₆I), two in the 160-loop (E₁₆₄D, N₁₆₅K) and one in the 190-helix (S₁₉₇N).

Amino acids have been extensively studied as components of epitopes, and epitopes in infectious diseases are relevant to epidemics, treatment, vaccines and so on²⁶. The ionizing properties of amino acids are associated with their charge capacity and, furthermore, with pathogen adhesion and entry and with the molecular interaction between antigen and antibody, which involves multiply charged ion signals in amino acids²⁷. Focusing on the epitope domain in this study, mutations at three polar amino acid sites (P₁₃₇B/P₁₉₉A/P₂₁₂A) occurred between 2019–2020 and 2021–2022 (P < 0.01) and affected the antigenicity of the epitope regions.

Estimates of selectively evolving sites based on dN/dS substitutions in codons are subject to some error. Sites 214 and 563 in the HA genes and site 73 in the NA genes were positive in this study by both approaches (FUBAR/MEME; P < 0.10)²⁸. This suggested that site 214 in HA genes, an AA in epitope D (H₂₁₄P), triggered Bv outbreaks and was also a positively selected site under enormous external pressure in evolution.

Entropy is also commonly used to evaluate evolution²⁹; here evolution was evaluated by both ER and SV, which were significantly correlated. The estimated rates of nucleotide substitution of HA and NA in the Bv lineage were 2.05 × 10⁻³ s/s/y and 2.01 × 10⁻³ s/s/y, respectively²³, while in this study R_HA was lower than R_NA (R_HA = 0.690/R_NA = 0.711), which suggested that, compared with NA, amino acid variation on HA was more active than nucleotide variation. At the same time, D_HA (−1.7668) and D_NA (−2.1355) in this study showed that HA genes (especially the five epitopes in the HA1 region) were prone to variation; in other words, NA genes were more likely to undergo synonymous rather than nonsynonymous evolution.

The key role of charged amino acids has been widely studied in infectious diseases³⁰. The setting here was a good model of the evolution of infectious disease pathogens (NPIs, Bv outbreaks only, less long-distance transmission). In this study, the transition from the first to the second stage of the Bv outbreak involved three polar AAs (N₁₆₅K, P → B; G₁₉₉E/K, P → A/B; N₂₁₂E, P → A), which changed from polar AAs into basic, acidic/basic and acidic AAs, respectively; the second half of 2021 in the second stage (Cluster 3, Table S6) involved two substitutions into polar AAs (H₁₃₇Q/K/N, B → P; A₁₄₂T, H → P), which changed from basic and hydrophobic AAs, respectively. This suggested that the charge/pH preference of amino acid mutations was closely related to (consistent with) the development trend of the outbreak. There are some similar reports, but with different research perspectives³¹. SNV adaptation is thus likely to have been associated with the diversification of influenza viruses across the external environment and to have promoted their survival in extreme conditions³². A genetic approach combined with potential epidemiological linkage enabled us to match data with previous reports on outbreaks or transmission chains, which may benefit public health actions³³.
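The class changes quoted for individual substitutions (for example N₁₆₅K, G₁₉₉E, N₂₁₂E and A₁₄₂T) can be looked up with a small Python sketch. The grouping table below is an assumption: it follows a common textbook assignment and is chosen to agree with the examples in the text, but the authors' own scheme is in Supplementary Table S3, which is not reproduced here, and residues such as G, C, Y and P are assigned differently in some schemes. The function name is hypothetical.

```python
import re

# Assumed grouping into hydrophobic (H), polar (P), acidic (A) and basic (B) residues.
AA_CLASS = {}
AA_CLASS.update({aa: "H" for aa in "AVLIMFWP"})
AA_CLASS.update({aa: "P" for aa in "GSTNQYC"})
AA_CLASS.update({aa: "A" for aa in "DE"})
AA_CLASS.update({aa: "B" for aa in "KRH"})

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def class_change(mutation):
    """Map a substitution such as 'N165K' to (site, class before, class after)."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation}")
    old, site, new = match.groups()
    return int(site), AA_CLASS[old], AA_CLASS[new]
```

Under this assumed table, `class_change("N165K")` returns `(165, "P", "B")` and `class_change("A142T")` returns `(142, "H", "P")`.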

Conclusions

With the advent of COVID-19, the influenza epidemic under NPIs followed a closed, time-limited pattern with only Bv outbreaks. The HA genes of Bv strains isolated in 2022 evolved further than the vaccine strain isolated in 2021 (Bv/Aus/1359417/21). The G → A transition had the highest ratio among nucleotide substitutions, but the C → A and T → A transversions made the most significant contribution to the outbreaks. The epitope domain mutations occurred in epitope A (AA 137/142/144/199), B (AA 165) and D (AA 212/214). Amino acid mutations with polar, acidic and basic features were key factors in the 2021 Bv epidemic, in which these mutational features altered the molecular structure, charge properties and molecular affinity of the epitope region. The amino acid sites 174, 199, 214 and 563 in HA genes and sites 73 and 384 in NA genes were identified as positively selected sites under evolutionary pressure. The prevalent factors related to the 2021–2022 influenza outbreak included epidemic timing, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P). The preference of amino acid mutations for charge/pH could influence the trend of the epidemic/outbreak. Further exploratory studies employing mathematical and bioinformatics approaches based on clinical, public health, vaccine research and genetic information may facilitate understanding of the deep interaction mechanisms of infectious disease transmission.

Data availability

All generated sequence data were deposited in the NCBI GenBank database using the accession numbers PP989545-PP989616 (HA) and PQ037033-PQ037104 (NA).

Abbreviations

AA: Amino acid
AMOVA: Analysis of molecular variance
BF: Bayes factor
BIC: Schwarz's Bayesian criterion
Bv: Influenza B Victoria-lineage
By: Influenza B Yamagata-lineage
CFR: Clinical fatality rate
dN: Nonsynonymous
dS: Synonymous
ER: Evolutionary rate
GD: Guangdong
HA: Hemagglutinin
H/P/A/B: Hydrophobic amino acid/polar amino acid/acidic amino acid/basic amino acid
HS: Hospital sentinel
ILI: Influenza-like illness
Indel: Insertion and deletion
NISN: National influenza surveillance network
NA: Neuraminidase
NPI: Non-pharmaceutical intervention
Nt: Nucleotide
Ob: Outbreak
ORF: Open reading frame
SNV: Single nucleotide variant
SV: Shannon entropy value
Ts: Transition
Tv: Transversion
VS: Vaccine strain

References

Han, A. X., de Jong, S. P. J. & Russell, C. A. Co-evolution of immunity and seasonal influenza viruses. Nat. Rev. Microbiol. 21(12), 805–817. https://doi.org/10.1038/s41579-023-00945-8 (2023).
Skelton, R. M. & Huber, V. C. Comparing influenza virus biology for understanding influenza D virus. Viruses. 14(5), 1036. https://doi.org/10.3390/v14051036 (2022).
Pormohammad, A. et al. Comparison of influenza type A and B with COVID-19: A global systematic review and meta-analysis on clinical, laboratory and radiographic findings. Rev. Med. Virol. 31(3), e2179. https://doi.org/10.1002/rmv.2179 (2021).
Huang, Z. Z. et al. Charged amino acid variability related to N-glycosylation and epitopes in A/H3N2 influenza: Hemagglutinin and neuraminidase. PLoS One. 12(7), e0178231. https://doi.org/10.1371/journal.pone.0178231 (2017).
Kamlangdee, A. et al. Broad protection against avian influenza virus by using a modified vaccinia Ankara virus expressing a mosaic hemagglutinin gene. J. Virol. 88, 13300–13309. https://doi.org/10.1128/JVI.01532-14 (2014).
Tu, X. et al. Spontaneous mutation rates and spectra of respiratory-deficient yeast. Biomolecules. 13(3), 501. https://doi.org/10.3390/biom13030501 (2023).
Bergman, J. & Schierup, M. H. Population dynamics of GC-changing mutations in humans and great apes. Genetics. 218(3), iyab083. https://doi.org/10.1093/genetics/iyab083 (2021).
Bloom, J. D. & Neher, R. A. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evol. 9(2), vead055. https://doi.org/10.1093/ve/vead055 (2023).
Huang, P. et al. Highly conserved antigenic epitope regions of hemagglutinin and neuraminidase genes between 2009 H1N1 and seasonal H1N1 influenza: Vaccine considerations. J. Transl. Med. 11(1), 47. https://doi.org/10.1186/1479-5876-11-47 (2013).
Hou, Y. J. et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science. 370(6523), 1464–1468. https://doi.org/10.1126/science.abe8499 (2020).
Pon, R. et al. Masking terminal neo-epitopes of linear peptides through glycosylation favours immune responses towards core epitopes producing parental protein bound antibodies. Sci. Rep. 10(1), 18497. https://doi.org/10.1038/s41598-020-75754-7 (2020).
Tan, J. et al. Changes in influenza activities impacted by NPI based on 4-Year surveillance in China: Epidemic patterns and trends. J. Epidemiol. Glob. Health. 13(3), 539–546. https://doi.org/10.1007/s44197-023-00134-z (2023).
Zhang, X. et al. Assessing the impact of COVID-19 interventions on influenza-like illness in Beijing and Hong Kong: An observational and modeling study. Infect. Dis. Poverty. 12(1), 11. https://doi.org/10.1186/s40249-023-01061-8 (2023).
The National Influenza Surveillance Program (2017 edition). https://ivdc.chinacdc.cn/cnic/zyzx/jcfa/201709/t20170927153830.htm (accessed 21 Jun 2024).
Huang, P. et al. Phylogenetic, molecular and drug-sensitivity analysis of HA and NA genes of human H3N2 influenza A viruses in Guangdong, China, 2007–2011. Epidemiol. Infect. 141(5), 1061–1069. https://doi.org/10.1017/S0950268812001318 (2013).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38(7), 3022–3027. https://doi.org/10.1093/molbev/msab120 (2021).
Sun, Z. Q. & Xu, Y. Y. Medical Statistics 5th edn, 55–77 (People's Medical Publishing House, 2020). ISBN: 978-7117-30385-9.
Karlin, E. F. A comparison of entropic diversity and variance in the study of population structure. Entropy (Basel). 25(3), 492. https://doi.org/10.3390/e25030492 (2023).
Mullick, B. et al. Understanding mutation hotspots for the SARS-CoV-2 spike protein using Shannon Entropy and K-means clustering. Comput. Biol. Med. 138, 104915. https://doi.org/10.1016/j.compbiomed.2021.104915 (2021).
Rozhoňová, H. et al. SECEDO: SNV-based subclone detection using ultra-low coverage single-cell DNA sequencing. Bioinformatics. 38(18), 4293–4300. https://doi.org/10.1093/bioinformatics/btac510 (2022).
Yi, K. et al. Mutational spectrum of SARS-CoV-2 during the global pandemic. Exp. Mol. Med. 53(8), 1229–1237. https://doi.org/10.1038/s12276-021-00658-z (2021).
Lima, R. E. et al. Mathematical modeling and multivariate analysis applied earliest soybean harvest associated drying and storage conditions and influences on physicochemical grain quality. Sci. Rep. 11(1), 23287. https://doi.org/10.1038/s41598-021-02724-y (2021).
Rosu, M. E. et al. Substitutions near the HA receptor binding site explain the origin and major antigenic change of the B/Victoria and B/Yamagata lineages. Proc. Natl. Acad. Sci. U. S. A. 119(42), e2211616119. https://doi.org/10.1073/pnas.2211616119 (2022).
Forni, D. et al. The substitution spectra of coronavirus genomes. Brief Bioinform. 23(1), bbab382. https://doi.org/10.1093/bib/bbab382 (2022).
Heider, A. et al. Molecular characterization and evolution dynamics of influenza B viruses circulating in Germany from season 1996/1997 to 2019/2020. Virus Res. 322, 198926. https://doi.org/10.1016/j.virusres.2022.198926 (2022).
Desta, I. T. et al. The ClusPro AbEMap web server for the prediction of antibody epitopes. Nat. Protoc. 18(6), 1814–1840. https://doi.org/10.1038/s41596-023-00826-7 (2023).
Yefremova, Y. et al. Intact transition epitope mapping (ITEM). J. Am. Soc. Mass. Spectrom. 28(8), 1612–1622. https://doi.org/10.1007/s13361-017-1654-7 (2017).
Zhang, M. et al. Complete genome analysis of echovirus 30 strains isolated from hand-foot-and-mouth disease in Yunnan province, China. Virol. J. 20(1), 215. https://doi.org/10.1186/s12985-023-02179-9 (2023).
Chen, Q. Y. et al. Analysis of entire hepatitis B virus genomes reveals reversion of mutations to wild type in natural infection, a 15 year follow-up study. Infect. Genet. Evol. 97, 105184. https://doi.org/10.1016/j.meegid.2021.105184 (2022).
Chitray, M. et al. Symmetrical arrangement of positively charged residues around the 5-fold axes of SAT type foot-and-mouth disease virus enhances cell culture of field viruses. PLoS Pathog. 16(9), e1008828. https://doi.org/10.1371/journal.ppat.1008828 (2020).
Ding, D. et al. Protein design using structure-based residue preferences. Nat. Commun. 15(1), 1639. https://doi.org/10.1038/s41467-024-45621-4 (2024).
Noll, D. et al. Positive selection over the mitochondrial genome and its role in the diversification of gentoo penguins in response to adaptation in isolation. Sci. Rep. 12(1), 3767. https://doi.org/10.1038/s41598-022-07562-0 (2022).
Pinto, M. et al. Neisseria gonorrhoeae clustering to reveal major European whole-genome-sequencing-based genogroups in association with antimicrobial resistance. Microb. Genom. 7(2), 000481. https://doi.org/10.1099/mgen.0.000481 (2021).

Acknowledgements

We gratefully acknowledge the colleagues from the Guangdong Influenza Surveillance Network (GDISN). This work was financially supported by the Guangzhou Scientific and Technological Project (201904010286), the Guangdong Natural Science Foundation (2016A030313775) and the National Natural Science Foundation of China (30972757).

Author information

Authors and Affiliations

Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, 510120, China
Zhong-Zhou Huang & Qing Guo
School of Public Health, Sun Yat-Sen University, Guangzhou, 510080, China
Zhong-Zhou Huang & Ping Huang
Workstation for Emerging Infectious Disease Control and Prevention, Guangdong Center for Disease Control and Prevention, Guangzhou, 511430, China
Zhong-Zhou Huang, Jing Tan, Ping Huang, Bai-Sheng Li & Li-Jun Liang
Guangdong Key Laboratory of Pathogen Detection for Emerging Infectious Disease Response, Guangdong Center for Disease Control and Prevention, Guangzhou, 511430, China
Ping Huang, Bai-Sheng Li & Li-Jun Liang
School of Public Health, Southern Medical University, Guangzhou, 510515, China
Jing Tan, Ping Huang & Bai-Sheng Li
School of Public Health, Southwest Medical University, Luzhou, 646000, China
Jing Tan

Contributions

P.H. and Z.-Z.H. conceived the study. J.T. and Z.-Z.H. performed the sequencing. L.-J.L., Z.-Z.H. and P.H. collected the data. J.T., L.-J.L., Z.-Z.H. and P.H. analyzed the data, interpreted the results and drafted the manuscript. P.H., Z.-Z.H., B.-S.L. and Q.G. revised and edited the intellectual content of the article.

Corresponding author

Correspondence to Ping Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, ZZ., Tan, J., Huang, P. et al. The evolutionary features and roles of single nucleotide variants and charged amino acid mutations in influenza outbreaks during NPI period. Sci Rep 14, 20418 (2024). https://doi.org/10.1038/s41598-024-71349-8

Received: 19 December 2023
Accepted: 27 August 2024
Published: 03 September 2024
DOI: https://doi.org/10.1038/s41598-024-71349-8
Springer Nature Limited
