Abstract
The epidemic and outbreaks of influenza B Victoria lineage (Bv) during 2019–2022 prompted an analysis of the relationship between genetic variation, epitopes, charged amino acids and Bv outbreaks. Based on the National Influenza Surveillance Network (NISN), 72 Bv strains isolated during 2019–2022 were selected by spatio-temporal sampling and then sequenced. Using the Compare Means, Correlate and Cluster procedures, we analyzed the single nucleotide variants (SNVs), amino acids (AAs), epitopes, evolutionary rate (ER), Shannon entropy value (SV), charged amino acids and outbreak data. After the emergence of COVID-19, non-pharmaceutical interventions (NPIs) reduced long-distance transmission, and only Bv outbreaks occurred. The HA genes of the 2021–2022 strains fell into the same subset but were distinct from those of the 2019–2020 strains (P < 0.001). Among nucleotide substitutions, the G → A transition had the highest ratio, but the C → A and T → A transversions made the most significant contribution to the outbreaks, while the increase in amino acid mutations with polar, acidic and basic signatures played a key role in the Bv epidemic in 2021–2022. ER and SV were positively correlated in both the HA genes (R = 0.690) and the NA genes (R = 0.711); however, the number of mutations in the HA genes was 1.59 times that in the NA genes (2.15/1.36) from the beginning of 2020 to 2022. Sites 174, 199, 214 and 563 in the HA genes and sites 73 and 384 in the NA genes were positively selected in the 2021–2022 influenza outbreaks. Overall, the prevalent factors related to the 2021–2022 influenza outbreaks included epidemic timing, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P). The preference of amino acid mutations for charge/pH could influence the epidemic/outbreak trends of infectious diseases. The setting studied here provides a good model of the evolution of infectious disease pathogens.
By combining virology, genetics, bioinformatics and outbreak information, this study might facilitate further understanding of their deep interaction mechanisms in the spread of infectious diseases.
Introduction
Influenza viruses are the infectious agents of acute upper respiratory tract infections and have caused several global pandemics. According to their antigenic characteristics, human influenza viruses are divided into four types, A, B, C and D. The genes and antigens of influenza A virus are prone to mutation, often leading to an epidemic or pandemic, whereas influenza B virus shows relatively little genetic variation and usually causes local outbreaks [1,2]. In recent years, influenza type B has attracted global attention: according to a report summarizing 47 studies, hospitalization was longer for type B (6.7 d) than for type A (6.5 d), and the clinical fatality rate (CFR) of people aged 50 years could reach 2.5% (95% CI 0.7%–7.6%, P < 0.001) [3].
Influenza viruses are characterized by two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA). Amino acid mutations in HA and NA are the basis of antigenic shift and drift [4]. Because their polymerase lacks proofreading activity, influenza virus genes mutate frequently without correction, resulting in 1–2% annual divergence of influenza strains [5]. Genetic evolution is reflected in genome structure mutations and nucleotide substitutions: structural mutations include insertions/deletions (indels) and inversions, while nucleotide (Nt) substitutions, i.e. single nucleotide variants (SNVs), include both transitions (Ts) and transversions (Tv) [6]. The impact of purine ↔ purine or pyrimidine ↔ pyrimidine transitions and of pyrimidine ↔ purine transversions on the evolution of nucleotide composition has been explored in humans and animals [7]. A longstanding approach for analyzing amino acids (AAs) and proteins is to compare the rates of nonsynonymous (dN) and synonymous (dS) substitutions at each site, where these dN/dS ratios can be calculated by counting mutations or by using phylogenetic substitution models [8].
Influenza epidemics arise from genetic and antigenic mutations, and epitopes containing certain key amino acids play a crucial role in antigenicity [9]. The D614G substitution in SARS-CoV-2, which replaces the acidic amino acid aspartic acid with glycine, enhances infectivity [10]. Furthermore, the conserved protective epitopes of HA are essential to the design of a universal influenza vaccine and of new targeted therapeutic agents, especially small proteins and peptides [11]. Thus, the amino acids that constitute an antigenic domain may carry different weights in antigenicity, depending on their molecular structure, hydrophilicity and charge.
Globally, three seasonal influenza strains (H3N2/H1N1/B) co-circulated across continents, but under the non-pharmaceutical interventions (NPIs) introduced in the spring of 2020, only the influenza B Victoria lineage (Bv) was dominant in China from March 2020 to March 2021 [12]. NPIs have been implemented worldwide, including travel restrictions, face masks, social distancing, public education on prevention measures, and school closures [13]. Owing to the NPIs, influenza outbreaks and epidemics in southern China became a model for investigating infectious disease transmission in a nearly ideal closed-loop environment. Here, based on Bv epidemic information and genetic sequences, we analyzed the data with multiple statistical approaches based on spatio-temporal connectivity to evaluate Ts/Tv substitutions, charged amino acid effects and outbreak-related variables, and to further explore the connections between genetic evolution and outbreaks.
Methods
Surveillance and gene sequence
This study was based on the National Influenza Surveillance Network (NISN, https://10.249.6.18:8881/cdc/) [12]. Influenza surveillance is performed on the basis of the National Influenza Surveillance Program (2017 edition) [14]. An influenza-like illness (ILI) case is defined as a case with body temperature ≥ 38 °C accompanied by either cough or sore throat, without molecular detection; an influenza case is an ILI case that tests positive for influenza virus nucleic acid. An influenza outbreak is defined as the occurrence of 10 or more ILI cases in the same school, childcare institution or other collective unit within a week.
A total of 72 Guangdong (GD) strains were selected by spatio-temporal sampling from March 2019 to April 2022, comprising 16, 9, 34 and 13 strains in the four successive years (no strains were isolated during Apr–Dec 2020). Global strains were downloaded from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) or GISAID (https://platform.epicov.org/epi3/frontend), including four vaccine strains (VSs) recommended by the World Health Organization (WHO) and five characteristic regional strains. By region, the 72 strains (42 from outbreaks and 30 from hospital sentinels) came from Chaozhou (5), Dongguan (2), Foshan (1), Guangzhou (5), Heyuan (4), Huizhou (5), Jiangmen (3), Maoming (6), Meizhou (3), Qingyuan (8), Shantou (2), Shanwei (2), Shaoguan (10), Shenzhen (6), Yunfu (2), Zhanjiang (4), Zhongshan (2) and Zhuhai (2) (Supplementary Table S1a,b). A set of primers for Bv strains was designed and synthesized on the basis of the Bv strains isolated during 2016–2018 (Supplementary Table S2). The 72 strains in this study were extracted, amplified and sequenced [15], and the genetic fragments were then merged [GenBank accession PP989545-PP989616 (HA) and PQ037033-PQ037104 (NA)].
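The case and outbreak definitions above translate into a simple rule. The following is a minimal sketch, not the surveillance network's actual implementation: the `Case` record, its field names and the reading of "within a week" as a sliding window of 7 onset days are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Case:
    """Hypothetical case record; field names are illustrative only."""
    unit: str              # school, childcare institution or other collective unit
    onset: date
    temperature_c: float
    cough: bool
    sore_throat: bool


def is_ili(case):
    """ILI: body temperature >= 38 C with either cough or sore throat."""
    return case.temperature_c >= 38.0 and (case.cough or case.sore_throat)


def has_outbreak(cases, unit, threshold=10, window_days=7):
    """True if `threshold` or more ILI cases in `unit` fall inside one window.

    Assumption: "within a week" means onset dates spanning fewer than
    `window_days` days.
    """
    onsets = sorted(c.onset for c in cases if c.unit == unit and is_ili(c))
    start = 0
    for end in range(len(onsets)):
        # Shrink the window until it spans fewer than window_days days.
        while onsets[end] - onsets[start] >= timedelta(days=window_days):
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

For example, ten ILI cases with onsets spread over five days in one school would be flagged, whereas ten cases spread over three weeks would not.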
Genetic data processing
Nucleotide
The nucleotide (Nt) sequences of the open reading frames (ORFs) were aligned with Clustal W, and phylogenetic trees of the HA and NA genes (Figure S1) were built with the Neighbor-Joining (NJ) method in MEGA 11.0.13 [16]. Nucleotide variation rates were calculated across stages and years using a single VS sequence as the benchmark, and the differences were compared using One-Way ANOVA. Grounded in the neutral theory of molecular evolution, the binary coalescent tree is the dual backward representation of the continuous-forward-time diffusion model of genetic drift. In species phylogeny and epidemiology, the tree structure is often used to compare different models of evolution or to fit model parameters [17]. Analysis of Molecular Variance (AMOVA) is a widely used method that employs variance to study the hierarchical genetic structure of populations, where the nucleotide diversity (D) is calculated as the mean pairwise genetic distance [17]; the statistical algorithm for genetic variability is as follows: D = [π − S/(0.5 − log(fmin))]/S. The pairwise dN/dS estimates were calculated for the coding regions [19].
Based on the SNVs, the sequenced data were grouped into 12 substitution types (4 × 3), comprising transitions, G(C) ↔ A(T), and transversions, G(C) ↔ C/T (A/G). The substitution ratio is calculated as Ri = Mi/Ni, where R is the ratio, the subscript i denotes the specific nucleotide, and both M and N are the mutation Nt numbers per specific purine or pyrimidine [20].
The strains were sampled from both outbreaks (Ob) and hospital sentinels (HS). Sampling dates were classified into Stage 1 (S1, 2019 → February 2020) and Stage 2 (S2, March 2020 → April 2022), a division derived from the subsequent statistical calculation (Cluster).
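As an illustration of the benchmark comparison described above, the following minimal sketch computes the percent identity of an aligned strain sequence against a single reference (vaccine strain) sequence and summarizes a group of strains as Mean ± SD. It is not the MEGA procedure used in the study; the function names and the choice to skip gapped alignment columns are assumptions.

```python
import statistics


def percent_identity(query, reference):
    """Percent of identical bases over alignment columns without gaps."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == "-" or r == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identity_summary(queries, reference):
    """Mean and sample SD of identity for a group of strains (e.g. one year)."""
    values = [percent_identity(q, reference) for q in queries]
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), sd
```

Grouping strains by year and calling `identity_summary` once per vaccine strain would give a table of the same shape as the homology comparison reported in the Results.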
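The 12-way substitution classification can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the ratio is computed under the assumption that Mi is the number of substitutions of one type and Ni is the number of occurrences of the original base in the reference, which is one plausible reading of the definition above.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref_base, alt_base):
    """Return 'Ts' for purine<->purine or pyrimidine<->pyrimidine, else 'Tv'."""
    if ref_base == alt_base:
        raise ValueError("not a substitution")
    pair = {ref_base, alt_base}
    return "Ts" if pair <= PURINES or pair <= PYRIMIDINES else "Tv"


def substitution_counts(reference, sample):
    """Count each of the 12 possible (ref, alt) substitutions in an alignment."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for r, s in zip(reference.upper(), sample.upper()):
        if r in "ACGT" and s in "ACGT" and r != s:
            counts[(r, s)] += 1
    return counts


def substitution_ratios(reference, sample):
    """R_i = M_i / N_i, with N_i taken as the count of the original base
    in the reference (an assumption about the intended denominator)."""
    base_totals = Counter(b for b in reference.upper() if b in "ACGT")
    counts = substitution_counts(reference, sample)
    return {pair: m / base_totals[pair[0]] for pair, m in counts.items()}


def ts_tv_totals(counts):
    """Aggregate substitution counts into transition and transversion totals."""
    totals = Counter()
    for (r, s), m in counts.items():
        totals[classify(r, s)] += m
    return totals["Ts"], totals["Tv"]
```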
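The stage and source labels can be assigned mechanically once the boundary is fixed. The sketch below assumes the S1/S2 boundary is 1 March 2020, as stated above, and uses hypothetical function names; it only illustrates the bookkeeping, not the clustering that produced the boundary.

```python
from collections import Counter
from datetime import date

STAGE_2_START = date(2020, 3, 1)  # S2 runs from March 2020 onward


def stage_of(sampling_date):
    """Label a sampling date as Stage 1 (S1) or Stage 2 (S2)."""
    return "S1" if sampling_date < STAGE_2_START else "S2"


def tally(samples):
    """Count strains by (stage, source); source is 'Ob' or 'HS'."""
    return Counter((stage_of(d), source) for d, source in samples)
```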
Amino acid
The amino acid (AA) sequences were also aligned with Clustal W. The AA mutations were analyzed and then classified into four groups, hydrophobic (H), polar (P), acidic (A) and basic (B) amino acids, according to their charge features (Supplementary Table S3). The Shannon entropy value of each AA site was then calculated [18]. The Shannon entropy at position i is

$$H(i) = -\sum_{k \in L} P_k(i)\,\log P_k(i)$$

where L is the list of all possible amino acids in all the sequences and Pk(i) is the probability of finding the kth amino acid at that position [19].
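The per-site Shannon entropy described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the logarithm base (2 by default) and the exclusion of gap or unknown symbols ("-" and "X") are assumptions, since the text does not specify them.

```python
import math
from collections import Counter


def site_entropy(column, base=2.0):
    """Shannon entropy of one alignment column of amino acid letters."""
    residues = [aa for aa in column if aa not in "-X"]  # assumption: skip gaps/unknowns
    if not residues:
        return 0.0
    total = len(residues)
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / total          # P_k(i): frequency of residue k at this site
        entropy -= p * math.log(p, base)
    return entropy


def alignment_entropy(sequences, base=2.0):
    """Entropy for every column of equally long, aligned AA sequences."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [site_entropy(col, base) for col in zip(*sequences)]
```

A fully conserved site has entropy 0, and a site split evenly between two residues has entropy 1 bit, so higher values flag more variable positions.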
Evolutionary and cluster analysis
The rate of molecular evolution is measured by the number of Nt sequence mutations per unit time. Cumulative mutations across the whole gene region can be used to estimate positive selection (or selective pressure) by calculating the substitution ratio dN/dS (ω) [21]. Sequence data, including Nt and AA data, were analyzed using the χ2 test for categorical data and One-Way ANOVA for comparing means. With the TwoStep and Hierarchical Cluster procedures, the relationships among the variables related to influenza genetics and outbreaks were analyzed [22]. Based on Schwarz's Bayesian Criterion (BIC), the cluster model is scored as $\mathrm{BIC} = -2L_m + m \ln n$, where $L_m$ is the maximized log-likelihood, m is the number of model parameters and n is the sample size; the model with the minimum value is selected.
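The BIC-based model choice can be illustrated with a short sketch. The function names are hypothetical and the candidate log-likelihoods in the usage note are made-up numbers, not results from this study; SPSS computes these quantities internally.

```python
import math


def bic(max_log_likelihood, n_parameters, n_samples):
    """Schwarz's Bayesian Criterion: -2 * Lm + m * ln(n)."""
    return -2.0 * max_log_likelihood + n_parameters * math.log(n_samples)


def best_model(candidates, n_samples):
    """Pick the candidate with the smallest BIC.

    `candidates` maps a model label to (maximized log-likelihood, parameter count).
    """
    return min(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], n_samples),
    )
```

For instance, with hypothetical candidates `{"2 clusters": (-120.0, 4), "3 clusters": (-110.0, 6)}` and n = 72, the three-cluster model would be preferred because its gain in log-likelihood outweighs the penalty for two extra parameters.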
Statistical evaluation
The data were processed using WPS spreadsheets and SPSS 23.0 (SPSS Inc., Chicago, IL), and descriptive statistics are reported as Mean ± SD (normal distribution) or Median (P25, P75) (skewed distribution) [17].
Comparisons of means and nonparametric tests were considered statistically significant at a two-sided P value < 0.05, while evolutionary selection was assessed at P < 0.10. A correlation was considered significant at R > 0.50. The statistical logic and processes of the other approaches are explained where needed.
Results
Nucleotide homology and evolution
Most of the HA and NA ORF sequences were 1749 bp and 1401 bp long, respectively. In the alignment, the HA ORF of Bri/60/08 (2008 VS) was 1758 bp, and the segment Nt529–537 (aaaaacgac) was deleted in the HA genes of all 2019–2022 strains in this study except GD/1557/19 (1758 bp).
The homologies of the HA and NA genes of the GD strains were compared with those of four VSs (Table 1). The results were as follows: (1) the highest identities in the HA gene occurred in 2020 (99.80 ± 0.07, 2019 VS) and in 2022 (99.23 ± 0.22, 2021 VS) (P < 0.001); (2) relative to the HA gene of the 2021 VS, the HA genes of the GD strains during 2021–2022 were classified in subset 2, which differed from the results for the other three VSs; (3) the identity of the NA genes with the 2021 VS was higher in 2021 (98.57 ± 0.15) than in 2022 (98.43 ± 0.13) (P = 0.012), which indicated that the 2022 strains had evolved further than the 2021 VS in both the HA and NA genes.
The HA and NA trees of the 72 influenza Bv strains isolated in Guangdong (Supplementary Fig. S1) had the following characteristics: (1) the HA genes from 2019 to 2020 were closer to the 2019 VS (Was/2/19), and those from 2021–2022 clustered with the 2021 VS (Aus/1359417/21); (2) the NA gene of the 2019 VS (Was/2/19) was closely related to those of the 2019–2020 GD strains and, more broadly, the 2019–2022 strains, whereas those of the other two VSs (Col/6/17 and Aus/1359417/21) were genetically different from those of the 2019–2022 GD strains, which suggested that the WHO-recommended vaccine strains were chosen mainly on the homology of HA genes rather than NA genes.
Transition and transversion in SNV
The number and ratio of purine and pyrimidine mutations in each HA and NA gene were calculated relative to the reference Bri/60/08 (Table 2). The results showed that (1) Ts mutations were mostly more numerous than Tv mutations, with three increasing (A → G↑, C → T↑ and G → A↑) and one decreasing (T → C↓) across the two stages in HA genes, and one increasing (G → A↑) in NA genes; (2) the changes in Tv mutations were usually more significant than those in Ts mutations, including two increases (C → A↑ and T → A↑) in HA genes across the two stages, while NA genes showed one increase and one decrease (A → T↑ and T → A↓); (3) Tv mutations in HA genes might have contributed to the influenza outbreaks after NPI (C → A↑↑ and T → A↑↑); (4) the nucleotide variation ratios of both HA and NA genes were in the following order: G → A, A → G, C → T, T → C, A → T, T → A, A → C, G → T, T → G and C → G (no G → C substitution).
Evolutionary selection
Based on the dN/dS substitutions in the codons, evolutionary selection in both HA and NA genes was analyzed using two approaches, FUBAR and MEME (Table 3). In HA genes, FUBAR identified positive sites 199, 214 and 563, whereas MEME identified sites 174, 214 and 563 (P < 0.10). Statistical assessment relied on the Bayes Factor and the MEME LogL. In NA genes, the positive sites were 73 and 384 with FUBAR and only site 73 with MEME (P < 0.10) (Table 3, Supplementary Table S4). Positive selection suggested that these amino acid sites were under enormous external pressure. FUBAR also identified many negative sites (Fig. 1).
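The agreement between the two methods can be made explicit with set operations on the site lists reported above. This is only a bookkeeping sketch; the variable and function names are hypothetical, and the site numbers are those quoted in the text.

```python
# Positively selected sites reported in the text (P < 0.10).
FUBAR_HA = {199, 214, 563}
MEME_HA = {174, 214, 563}
FUBAR_NA = {73, 384}
MEME_NA = {73}


def compare_methods(fubar_sites, meme_sites):
    """Split sites into those found by both methods and by only one."""
    return {
        "both": sorted(fubar_sites & meme_sites),
        "fubar_only": sorted(fubar_sites - meme_sites),
        "meme_only": sorted(meme_sites - fubar_sites),
    }


ha_summary = compare_methods(FUBAR_HA, MEME_HA)
na_summary = compare_methods(FUBAR_NA, MEME_NA)
```

Sites supported by both methods (HA 214 and 563, NA 73) are the ones a reader would usually treat as the most robust candidates.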
Evolution comparison
Comparing the two evolutionary indicators, the evolutionary rate (ER) and the Shannon entropy value (SV), ER and SV were significantly correlated in HA genes (RPearson = 0.690) and also in NA genes (RPearson = 0.711), which indicated that ER and SV were consistent within the same gene (Fig. 2). The difference between the two indicators may be that ER covers all nucleotide mutations (dS and dN), whereas SV mainly reflects the amino acid sites affected by dN mutations.
Based on Tajima's Neutrality Test, evolutionary selection was present (Supplementary Table S5). Tajima's D test is based on neutrality: a positive value indicates an excess of common alleles and a negative value an excess of rare alleles. The results were as follows: (1) both SHA (195) and πHA (0.0109) were greater than SNA (176) and πNA (0.0095), respectively, but both psNA (0.1257) and ΘNA (0.0253) were greater than psHA (0.1116) and ΘHA (0.0225), respectively, which suggested that dN was larger in HA than in NA and dS was larger in NA than in HA; (2) both HA and NA showed a higher rate of synonymous than nonsynonymous evolution, while DHA (− 1.7668) was larger than DNA (− 2.1355) because HA is mainly an antigenic domain and functional region.
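The correlation between the two indicators is an ordinary Pearson coefficient over per-site (or per-strain) pairs of values. A minimal pure-Python sketch is shown below; the function names are hypothetical, and the 0.50 cut-off is the significance threshold stated in the Methods.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def is_correlated(r, threshold=0.50):
    """Apply the study's rule that R > 0.50 counts as a significant correlation."""
    return r > threshold
```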
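For readers who want to see how the quantities in this paragraph fit together, the sketch below computes Watterson's Θ per site and Tajima's D from the number of sequences, the number of segregating sites S, the per-site nucleotide diversity π and the number of sites, using the standard formulas of Tajima's test. Two inputs are assumptions not stated in the text: a sample of 81 sequences (72 GD strains plus the nine reference strains) and site counts of 1747 (HA) and 1400 (NA), inferred here from S/ps. With these assumptions the sketch gives values close to the reported Θ and D, but it is not the MEGA computation itself.

```python
import math


def _harmonic_terms(n):
    """a1 and a2 from Tajima's test for a sample of n sequences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta_per_site(n_sequences, segregating_sites, n_sites):
    """Theta = p_s / a1, where p_s = S / number of sites."""
    a1, _ = _harmonic_terms(n_sequences)
    return segregating_sites / n_sites / a1


def tajima_d(n_sequences, segregating_sites, pi_per_site, n_sites):
    """Tajima's D from summary statistics."""
    n, s = n_sequences, segregating_sites
    a1, a2 = _harmonic_terms(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = pi_per_site * n_sites            # mean pairwise differences per sequence
    d = k - s / a1                       # difference between the two estimators
    return d / math.sqrt(e1 * s + e2 * s * (s - 1))


# Assumed inputs (see lead-in): n = 81 sequences; 1747 HA sites, 1400 NA sites.
d_ha = tajima_d(81, 195, 0.0109, 1747)
d_na = tajima_d(81, 176, 0.0095, 1400)
```

Both values come out negative, in line with the excess of rare alleles described above.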
Genetic factors related to outbreak
For all HA gene mutations and outbreak-related factors, the TwoStep cluster was used to explore the relationships among the variables mentioned above, especially the association of influenza outbreak events with genetic mutations. The findings were as follows: (1) three clusters were formed mainly on the basis of sampling dates, Ts and Tv; Cluster 1 was completely separated from Clusters 2 and 3 and ended in February 2020; (2) the separation of Cluster 1 from Clusters 2 and 3 was driven by AA sites 148 (B → P), 165 (P → B), 199 (P → A), 212 (P → A) and 563 (B → P), while the separation of Cluster 2 from Clusters 1 and 3 was driven by site 256 (P ↔ H); (3) the separation of Cluster 3 from Clusters 1 and 2 was driven by AA sites 137 (B → P) and 142 (B → P), and Cluster 3 started in the second half of 2021. From this analysis, the role of genetic evolution and mutation (amino acid mutations with a charge/pH preference) in the development of the epidemic and outbreaks could be preliminarily identified (Supplementary Tables S4 and S6).
For all continuous and categorical variables in HA genes and disease outbreaks, the K-Means Cluster was used to further analyze the relationships among the variables (Table 4). The One-Way ANOVA of the K-Means Cluster (cases labeled by strain name) showed the following (only the significant results are shown): (1) there were two clusters, Cluster 1 in 2019–2020 and Cluster 2 in 2021–2022 (P < 0.001); (2) three variables (Tv, Ts and Tv/Ts) differed significantly (P < 0.001); (3) AA sites P137, P148, P199, P212, P214 and P563 differed very significantly (P < 0.001), as did P256 (P = 0.002), but P188 (P = 0.302) and Outbreak (P = 0.089) did not differ significantly between the two clusters. Overall, the factors that determined the 2021–2022 influenza outbreak included epidemic time, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P) (Table 4, Supplementary Table S6).
Discussion
The B/Victoria lineage stems from the 1988–1989 season, when two distinct antigenic variants of influenza B virus co-circulated, the B/Victoria and B/Yamagata lineages (Bv/By), with the reference strains B/Victoria/2/87 and B/Yamagata/16/88, respectively. The evolutionary dynamics of influenza B virus are complex and have been characterized by nucleotide insertions and deletions (indels) in the hemagglutinin (HA) gene and extensive reassortment events within and between the Bv and By lineages [23]. In this study, only strain GD/1557/2019 carried the 529AAAAACGAC537 insertion in the HA gene, like the vaccine strain B/Bri/60/08 and unlike the other strains. Relative to the vaccine strain Aus/1359417/21, the Bv strains circulating in 2022 had the highest HA gene homology and differed from those in other years (99.23 ± 0.22, F2022/Others = 74.78, P < 0.001). Some of the influenza strains in the present study were isolated at the beginning of the NPIs (2020) and were in fact a continuation of the 2019 epidemic and outbreaks. Moreover, from the start of the NPIs (2020) to the end of April 2022, only influenza Bv outbreaks (no H1N1 and no H3N2) occurred in southern China.
This study analyzed nucleotides (molecular cluster, transition/transversion, evolutionary rate), amino acids (AA substitution, entropy, evolutionary selection, epitope), genes (HA/NA) and prevalence (epidemic/outbreak, different dates) in Bv outbreaks, as well as the relationships among them. Although SNVs occur at random, the results are significant for the direction of biological evolution. In the present study, the mutations were highly biased toward specific substitutions: for example, the probability of a GC transversion was one in 200,000 (only once) since 2008 (reference strain), whereas the probability of an AG transition was 1–2 per thousand. The evolutionary rates in this study were, in order, G → A, A → G, C → T, T → C, A → T, T → A, A → C, G → T, T → G and C → G, with the highest rate 10,000 times faster than the lowest. By comparison, in a study of SARS-CoV-2 spread during the first months of the pandemic, the frequency of both G → U and C → U substitutions increased, which suggested that the substitution spectrum of SARS-CoV-2 was determined by an interplay of factors, including intrinsic biases of the replication process, avoidance of CpG dinucleotides and other constraints exerted by the new host [24].
In this study, the epitope domain mutations, including epitope A (120 loop, 137/142/144/199), B (150 loop, 165) and D (190 helix, 212/214), had high evolutionary rates, partially consistent with previous research [23]. The epidemic and outbreaks in southern China resulted from mutations in HA genes, which occurred 1.59 times (2.15/1.36) as fast as those in NA genes. More specifically, the outbreaks here were associated with mutations in HA gene epitopes A, B and D. By comparison, in the epidemic in Germany during 2016–2020 [25], a total of 13 substitutions were fixed over time (numbering in HA1 of Bri/60/08), including five in the 120-loop (R116H, I117V, N121T, K129N/D, K136E), two in the domain surrounding the 120-loop (K48E, N75K), one in the 150-loop (V146I), two in the 160-loop (E164D, N165K) and one in the 190-helix (S197N).
Amino acids have been extensively studied as components of epitopes, and epitopes in infectious diseases are relevant to epidemics, treatment, vaccines and so on [26]. The ionizing properties of amino acids are associated with their charge capacity and, furthermore, with pathogen adhesion and entry and with the molecular interaction between antigen and antibody; this interaction involves the multiply charged ion signals of amino acids [27]. Focusing on the epitope domain in this study, mutations at three polar amino acid sites (P137B/P199A/P212A) occurred from 2019–2020 to 2021–2022 (P < 0.01), which affected the antigenicity of the epitope regions.
Because it is based on dN/dS substitutions in the codon, the evaluation of selectively evolving sites carries some error. Sites 214 and 563 in the HA genes and site 73 in the NA genes were positive sites identified by both approaches (FUBAR/MEME; P < 0.10) [28]. This suggested that site 214 in the HA genes, an AA in epitope D (H214P), both triggered Bv outbreaks and was a positively selected site under enormous external pressure in evolution.
Entropy is also commonly used to evaluate evolution [29]; here evolution was evaluated by both ER and SV, which were significantly correlated. The estimated nucleotide substitution rates of HA and NA in the Bv lineage were 2.05 × 10−3 s/s/y and 2.01 × 10−3 s/s/y, respectively [23], while RHA in this study was lower than RNA (RHA = 0.690/RNA = 0.711), which suggested that, compared with NA, amino acid variation in HA was more active than nucleotide variation. At the same time, both DHA (− 1.7668) and DNA (− 2.1355) in this study showed that HA genes (especially the five epitopes in the HA1 region) were prone to variation; in other words, NA genes were more likely to undergo synonymous than nonsynonymous evolution.
The key role of charged amino acids has been widely studied in infectious diseases [30]. The setting here (NPIs, Bv outbreaks only, less long-distance transmission) provided a good model of the evolution of infectious disease pathogens. In this study, the transition from the first to the second stage of the Bv outbreak involved three polar AAs (N165K, P → B; G199E/K, P → A/B; N212E, P → A), which were substituted by basic, acidic/basic and acidic AAs, respectively; the second half of 2021 in the second stage (Cluster 3, Table S6) involved two substitutions toward polar AAs (H137Q/K/N, B → P; A142T, H → P), from basic and hydrophobic AAs, respectively. This suggested that the charge/pH preference of amino acid mutations was closely related to (consistent with) the development trend of the outbreak. There are some similar reports, but from different research perspectives [31]. SNV adaptation is thus likely to have been associated with the diversification of influenza virus across external environments and to have promoted its survival under extreme conditions [32]. A genetic approach combined with potential epidemiological linkage enabled us to match data with previous reports on outbreaks or transmission chains, which may benefit public health actions [33].
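The class labels attached to substitutions such as N212E (P → A) can be reproduced with a small lookup. The grouping below is an assumption: it is a common textbook partition of the 20 amino acids into hydrophobic (H), polar (P), acidic (A) and basic (B) residues, chosen because it agrees with the labels quoted in the text; the authoritative grouping is the one in Supplementary Table S3, which is not reproduced here. The function name is hypothetical.

```python
import re

# Assumed H/P/A/B grouping (see lead-in); not taken from Supplementary Table S3.
_GROUPS = {
    "H": "AVLIPFWM",   # hydrophobic
    "P": "GSTCYNQ",    # polar, uncharged
    "A": "DE",         # acidic
    "B": "KRH",        # basic
}
AA_CLASS = {aa: group for group, residues in _GROUPS.items() for aa in residues}


def class_change(substitution):
    """Parse e.g. 'N165K' into (site, class of old residue, class of new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"cannot parse substitution: {substitution!r}")
    old, site, new = match.group(1), int(match.group(2)), match.group(3)
    return site, AA_CLASS[old], AA_CLASS[new]
```

For example, `class_change("N212E")` returns `(212, "P", "A")`, a polar residue replaced by an acidic one.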
Conclusions
With the advent of COVID-19, the influenza epidemic under NPIs showed a closed, time-limited pattern with only Bv outbreaks. The HA genes of Bv strains isolated in 2022 had evolved further than the vaccine strain isolated in 2021 (Bv/Aus/1359417/21). Among nucleotide substitutions, the G → A transition had the highest ratio, but the C → A and T → A transversions made the most significant contribution to the outbreaks. The epitope domain mutations occurred in epitopes A (AA 137/142/144/199), B (AA 165) and D (AA 212/214). Amino acid mutations with polar, acidic and basic features were key factors in the 2021 Bv epidemic, as these mutational features alter the molecular structure, charge properties and molecular affinity of the epitope region. Amino acid sites 174, 199, 214 and 563 in HA genes and sites 73 and 384 in NA genes were positively selected, indicating that they were under evolutionary pressure. The prevalent factors related to the 2021–2022 influenza outbreak included epidemic timing, Tv, Ts, Tv/Ts, P137 (B → P), P148 (B → P), P199 (P → A), P212 (P → A), P214 (H → P) and P563 (B → P). The preference of amino acid mutations for charge/pH could influence the trend of the epidemic/outbreak. Further exploratory studies employing mathematical and bioinformatics approaches based on clinical, public health, vaccine research and genetic information may facilitate further understanding of the deep interaction mechanisms of infectious disease transmission.
Data availability
All generated sequence data were deposited in the NCBI GenBank database using the accession numbers PP989545-PP989616 (HA) and PQ037033-PQ037104 (NA).
Abbreviations
- AA: Amino acid
- AMOVA: Analysis of molecular variance
- BF: Bayes factor
- BIC: Schwarz's Bayesian criterion
- Bv: Influenza B Victoria-lineage
- By: Influenza B Yamagata-lineage
- CFR: Clinical fatality rate
- dN: Nonsynonymous
- dS: Synonymous
- ER: Evolutionary rate
- GD: Guangdong
- HA: Hemagglutinin
- H/P/A/B: Hydrophobic amino acid/polar amino acid/acidic amino acid/basic amino acid
- HS: Hospital sentinel
- ILI: Influenza-like illness
- Indel: Insertion and deletion
- NISN: National influenza surveillance network
- NA: Neuraminidase
- NPI: Non-pharmaceutical intervention
- Nt: Nucleotide
- Ob: Outbreak
- ORF: Open reading frame
- SNV: Single nucleotide variant
- SV: Shannon entropy value
- Ts: Transition
- Tv: Transversion
- VS: Vaccine strain
References
Han, A. X., de Jong, S. P. J. & Russell, C. A. Co-evolution of immunity and seasonal influenza viruses. Nat. Rev. Microbiol. 21(12), 805–817. https://doi.org/10.1038/s41579-023-00945-8 (2023).
Skelton, R. M. & Huber, V. C. Comparing influenza virus biology for understanding influenza D virus. Viruses. 14(5), 1036. https://doi.org/10.3390/v14051036 (2022).
Pormohammad, A. et al. Comparison of influenza type A and B with COVID-19: A global systematic review and meta-analysis on clinical, laboratory and radiographic findings. Rev. Med. Virol. 31(3), e2179. https://doi.org/10.1002/rmv.2179 (2021).
Huang, Z. Z. et al. Charged amino acid variability related to N-glycosylation and epitopes in A/H3N2 influenza: Hemagglutinin and neuraminidase. PLoS One. 12(7), e0178231. https://doi.org/10.1371/journal.pone.0178231 (2017).
Kamlangdee, A. et al. Broad protection against avian influenza virus by using a modified vaccinia Ankara virus expressing a mosaic hemagglutinin gene. J. Virol. 88, 13300–13309. https://doi.org/10.1128/JVI.01532-14 (2014).
Tu, X. et al. Spontaneous mutation rates and spectra of respiratory-deficient yeast. Biomolecules. 13(3), 501. https://doi.org/10.3390/biom13030501 (2023).
Bergman, J. & Schierup, M. H. Population dynamics of GC-changing mutations in humans and great apes. Genetics. 218(3), iyab083. https://doi.org/10.1093/genetics/iyab083 (2021).
Bloom, J. D. & Neher, R. A. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evol. 9(2), vead055. https://doi.org/10.1093/ve/vead055 (2023).
Huang, P. et al. Highly conserved antigenic epitope regions of hemagglutinin and neuraminidase genes between 2009 H1N1 and seasonal H1N1 influenza: Vaccine considerations. J. Transl. Med. 11(1), 47. https://doi.org/10.1186/1479-5876-11-47 (2013).
Hou, Y. J. et al. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo. Science. 370(6523), 1464–1468. https://doi.org/10.1126/science.abe8499 (2020).
Pon, R. et al. Masking terminal neo-epitopes of linear peptides through glycosylation favours immune responses towards core epitopes producing parental protein bound antibodies. Sci. Rep. 10(1), 18497. https://doi.org/10.1038/s41598-020-75754-7 (2020).
Tan, J. et al. Changes in influenza activities impacted by NPI based on 4-Year surveillance in China: Epidemic patterns and trends. J. Epidemiol. Glob. Health. 13(3), 539–546. https://doi.org/10.1007/s44197-023-00134-z (2023).
Zhang, X. et al. Assessing the impact of COVID-19 interventions on influenza-like illness in Beijing and Hong Kong: An observational and modeling study. Infect. Dis. Poverty. 12(1), 11. https://doi.org/10.1186/s40249-023-01061-8 (2023).
The National Influenza Surveillance Program (2017 edition). https://ivdc.chinacdc.cn/cnic/zyzx/jcfa/201709/t20170927153830.htm (accessed 21 Jun 2024).
Acknowledgements
We gratefully acknowledge our colleagues at the Guangdong Influenza Surveillance Network (GDISN). This work was financially supported by the Guangzhou Scientific and Technological Project (201904010286), the Guangdong Natural Science Foundation (2016A030313775) and the National Natural Science Foundation of China (30972757).
Author information
Contributions
P.H. and Z.-Z.H. conceived the study. J.T. and Z.-Z.H. performed the sequencing. L.-J.L., Z.-Z.H. and P.H. collected the data. J.T., L.-J.L., Z.-Z.H. and P.H. analyzed the data, interpreted the results and drafted the manuscript. P.H., Z.-Z.H., B.-S.L. and Q.G. revised and edited the intellectual content of the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, ZZ., Tan, J., Huang, P. et al. The evolutionary features and roles of single nucleotide variants and charged amino acid mutations in influenza outbreaks during NPI period. Sci Rep 14, 20418 (2024). https://doi.org/10.1038/s41598-024-71349-8