Introduction

Norovirus has been recognized as the leading cause of acute nonbacterial gastroenteritis outbreaks worldwide1. Human noroviruses are classified into at least five genogroups (GI, GII, GIV, GVIII and GIX) which are further subdivided into 35 genotypes2,3. The norovirus genome consists of a 7.5 kb single-stranded and positive-polarity RNA segment encoding three open reading frames (ORFs). ORF1 encodes non-structural proteins including the viral RNA-dependent RNA polymerase (RdRp) and ORF2 and ORF3 encode structural proteins VP1 and VP2, respectively4. VP1 is composed of shell (S) and protruding (P) domains and the P domain contains both the antigenic sites as well as histo-blood group antigen (HBGA) binding sites5,6.

The epidemiology of norovirus is strongly influenced by norovirus evolution through recombination or accumulation of mutations7. Recombination often occurs at the ORF1/ORF2 junction that leads to new combinations of capsid and RdRp types, further increasing genetic diversity8. These new recombinant strains might have increased fitness and transmissibility over their parental strains9. The same capsid genotype can be associated with different RdRp genotypes, which may offer a temporary selective advantage through altering the efficiency of virus replication2. To better understand epidemiologic and genotypic trends of evolving norovirus recombinant strains in the field, we examined and analyzed norovirus outbreak data and strains collected between January 2015 and December 2018 in Jiangsu China. Our analysis showed that recombinant strains increased significantly in norovirus outbreaks between 2015 and 2018 and the GII.2[P16] recombinant strains were responsible for most outbreaks. Recombination appeared to be main force driving norovirus evolution in the field in the recent years.

Results

Epidemiological features

A total of 213 norovirus outbreaks with 3,951 patients were reported to the Jiangsu CDC from January 2015 to December 2018. Of the 213 outbreaks, 19 (8.9%) occurred in 2015, 9 (4.2%) in 2016, 92 (43.2%) in 2017 and 93 (43.7%) in 2018; 43 (20.8%) were reported in kindergartens, 109 (51.2%) in primary schools, 38 (17.8%) in middle schools, 11 (5.1%) in secondary schools and 11 (5.1%) in other settings; 68 (31.9%) occurred in spring, 5 (2.4%) in summer, 85 (39.9%) in autumn and 55 (25.8%) in winter; 2181 (55.2%) cases were males and 1770 (44.8%) were females. Most outbreaks occurred in the period of season transitions, such as from autumn to winter (November and December) and from winter to spring (February and March). Peaks of culminative outbreaks were observed in March and November whereas no outbreak occurred in July and August, likely due to summer recesses for schools. There were many fewer outbreaks in 2015 and 2016 with the fewest reported in 2016 when cases were reported only in March, October, and December. However, rapid increase of outbreaks in number occurred since February 2017 with most cases reported in that spring. Interestingly in 2018, the cases were fewer in spring and the major peaks of outbreaks occurred in early and late autumn (from October to November). Thus, even though the trend remained similar, the outbreaks in number and peak time differed greatly each year from 2015 through 2018 (Fig. 1a). In addition, there were 7 genogroup I norovirus outbreaks that occurred during this period but were not included in this analysis due to failure to sequencing their RdRp genotypes.

Figure 1
figure 1

Norovirus outbreaks in Jiangsu, China, 2015–2018. (a) Laboratory confirmed number of monthly norovirus outbreaks. (b) Distribution of norovirus genotypes detected in norovirus outbreaks.

Geographic distribution of the outbreaks is shown in Fig. 2. About 170 (79.8%) outbreaks occurred in four prefecture-level cities in the southwest (Nanjing, Wuxi, Changzhou, and Yangzhou) regions. In contrast, 37 (17.4%) outbreaks were reported in the east regions and only 6 (2.8%) occurred in three cities (Xuzhou, Suqian, and Huai’an) in the northwest regions (Fig. 2).

Figure 2
figure 2

Spatial distribution of norovirus outbreaks in Jiangsu, China, 2015–2018. The outbreaks were shown by geographic locations. The number of the outbreaks was reflected with various colors.

Characteristics of the norovirus outbreaks by genotype

Among 213 outbreaks caused by genogroup II norovirus, 197 (92.5%) were genotyped and 16 (7.5%) were not (GII.NT). Based on the RdRp and VP1 sequences, eight genotypes were identified throughout the study period, which included GII.2[P16] (144, 67.6%), GII.3[P12] (21, 9.9%), GII.6[P7] (5, 2.3%), GII.14[P7] (4, 1.9%), GII.4 Sydney[P31] (3, 1.4%), GII.1[P33] (1, 0.5%), GII.2[P2] (3, 1.4%), and GII.17[P17] (16, 7.5%) (Fig. 1b).

To characterize the norovirus outbreaks caused by recombinant strains, eight genotypes were regrouped based on GII.R (Recombinant) or GII.Non-R (Non-recombinant) strains. Six GII.R strains were identified and comprised of GII.2[P16], GII.3[P12], GII.6[P7], GII.14[P7], GII.4 Sydney[P31], and GII.1[P33]. The GII.Non-R strains included the genotypes of GII.2[P2] and GII.17[P17]. Prevalence of GII.R-caused outbreaks increased significantly in recent years from 3.9% (7/178) in 2015 to 48.3% (86/178) in 2017 and GII.R was responsible for at least 178 (83.6%) of 213 norovirus-positive outbreaks with a peak in 2017 and 2018 (Table 1). The number of patients in outbreaks caused by GII.R was 3,181 (80.5%) with a median of 19.5 patients per outbreak.

Table 1 Analysis of characteristics associated with GII.Recombinant, GII.Non-Rec and GII.NT outbreaks from 2015 to 2018 in Jiangsu, China.

As shown in Table 1, GII.R strains were the dominant epidemic strains across all settings. Other than the GII.R strains, GII.Non-R and GII.NT strains had similar prevalence rates in kindergartens and primary schools, but GII.Non-R had higher prevalence rates in middle schools than GII.NT. GII.Non-R strains were also the dominant epidemic ones in other settings. Most norovirus outbreaks occurred in primary schools. GII.R strains were responsible for 94 of 109 (86.2%) of norovirus outbreaks in primary schools (Table 1), while GII.Non-R and GII.NT strains were responsible for only 6 (5.5%) and 9 (8.3%) norovirus outbreaks, respectively. Seasonally, GII.R strains were the main genotypes in all seasons with a peak detection rate in autumn, while the peak for GII.Non-R strains were in spring. Of the 3,951 norovirus-positive cases, the number of male cases is higher than that of female cases in each group, although the difference appeared not statistically significant.

Molecular phylogenetic characteristics of recombinant noroviruses

To characterize the potential recombination events of the GII.R strains, a region of 1095 bp in the ORF1/ORF2 junction of the viral genome was amplified by a nested PCR. The sequences were typed by using the calicivirus typing tool (https://norovirus.ng.philab.cdc.gov). The phylogenetic tree was constructed based on partial RdRp gene (750 bp) and capsid gene (365 bp) using the Maximum Likelihood method (Fig. 3a,b). As shown in Fig. 3, six strains had discordant capsid and polymerase genotypes and were considered intergenotype recombinant strains.

Figure 3
figure 3

Phylogenetic analyses of the recombinant strain sequences based on partial RdRp and full-length capsid regions (VP1). (a) Phylogenetic tree of a 750 bp region of RdRp. (b) Phylogenetic tree of complete VP1. The trees were constructed using the Maximum Likelihood analysis and the evolutionary distances were computed using the Kimura 2-parameterþG method available in MEGA 7.0. Bootstrap values (>70%) are shown as percentages derived from 1,000 samplings at the nodes of the tree. The scale bars represent the number of nucleotide substitutions per site. The new norovirus strains reported in this study are indicated with solid black diamonds.

Since the length of the amplified RdRp fragments from the six recombinant strains was 750 bp long and the corresponding ORF1/2 overlapping regions were 731 to 750 bp, the recombination breakpoints would be near the ORF1/2 junction for all six strains as indicated by the SimPlot analysis (Fig. 4). In fact, the recombination breakpoints were identified at positions varying from nucleotides 641 to 761, corresponding to the nucleotides positioned at 5009 to 5111 in the whole viral genome, localized in the ORF1 region for four strains (GII.2[P16], GII.3[P12], GII.6[P7], and GII.14[P7]) and in the ORF2 for the other two strains (GII.4 Sydney[P31] and GII.1[P33]).

Figure 4
figure 4

The SimPlot analysis of the recombinant strain sequences. SimPlot was constructed using a Simplot software version 3.5 with a slide window width of 200 bp and a step size of 20 bp. At each position of the window, the query sequence was compared to each of the reference strains. The X-axis indicates the nucleotide positions in the multiple alignments of the NoV sequences; and the Y-axis indicates nucleotide identities (%) between the query sequence and the NoV reference strains.

Phylogeography of GII.14[P7] genotypes

Of the six recombinant strains, GII.14[P7] was further analyzed because, unlike other strains, it was a rare genotype which did not have an RdRp genotype that belongs to any known RdRp genotypes. According to the phylogenetic analysis, sequences of GII.14[P7] were grouped into four major clusters based on their RdRp genes (Fig. 5).

Figure 5
figure 5

Phylogenetic analyses of the GII.14[P7] sequences based on partial RdRp and VP1. (a) Phylogenetic tree of a 313 bp region of RdRp. (b) Phylogenetic tree of a 282 bp region of VP1. The trees were constructed using the Maximum Likelihood analysis and the evolutionary distances were computed using the Kimura 2-parameterþG method available in MEGA 7.0. Bootstrap values (>70%) are shown as percentages derived from 1,000 samplings at the nodes of the tree. The scale bars represent the number of nucleotide substitutions per site. The new norovirus strains reported in this study are indicated with solid black diamond.

In detail, as for the RdRp gene, the GII.14[P7] variants identified in the period from 2006 to 2008 fell in Cluster II. The GII.6[P7] variants originating from 2012 to 2016 and from 2002 to 2012 fell in Cluster I and III, respectively. The GII.7[P7] variants and the GII.14[P7] variants from 2013 to 2017 belonged to Cluster IV (Fig. 5a). The GII.14[P7] variant reported here from Jiangsu province was in Cluster IV based on the RdRp trees. As for the VP1 gene, several clusters were observed in the tree, but they did not obtain enough bootstrap support (showed bootstrap support of <70%). The Jiangsu variant was in the same lineage with variants from 2016 to 2017 (Fig. 5b).

Even though the complete VP1 gene of the GII.14[P7] variant in this study has been sequenced, further analysis with VP1 was limited because only a few complete VP1 genes of the GII.14[P7] variants, which were also reported previously within a short period of time, were available in GenBank. On the other hand, three HBGA binding sites of all known GII.14 strains remained conserved, while several amino acid mutations in the predicted conformational epitopes were found10 as shown in Table 2. However, a single amino acid change (aa373, D-N) found peripheral to the HBGA-binding site II, which is also located in the predicted conformational epitopes, may have an important effect on the viral antigenicity.

Table 2 The sequence analysis of HBGA binding sites and predicted conformational epitopes of GII.14 VP1 protein.

Discussion

Recombination of human noroviruses is an important mechanism to generate genetic diversity and recombinant strains are frequently detected, particularly between pandemic peaks11. From January 2015 to December 2018, norovirus outbreaks caused by recombinant strains increased rapidly in number with a peak in 2017 and 2018. Most outbreaks were attributed to the emergence of GII.2[P16] variants, which were responsible for 67.6% (144/213) of all norovirus outbreaks. During the winter of 2016–2017, the GII.2[P16] strain suddenly emerged and rapidly became the predominant genotype throughout mainland China and Japan12,13. During the same period, however, GII.4 Sydney[P16] was the predominant strain in the United States, Australia, and New Zealand11,14. GII.P16 polymerases have also been found to recombine with GII.3 and GII.13 capsids, but the P16 polymerase sequence associated with GII.2 capsids is almost identical to the P16 sequences that harbor the GII.4 Sydney capsids11,14,15. Emergence of GII.P16 strains indicates that the viral RNA polymerase confirms that ORF1 sequences play a more important role in predominance of certain but not all emerging recombinant genotypes.

To understand how recombination occurred among norovirus, more analyses have been carried out on GII.4 and GII.3 strains for rare occurrence of recombination events in the past. The GII.4 norovirus had been the predominantly detected variant worldwide since 1995. Its capsid protein continuously underwent epochal evolution by emergence of one antigenically distinct GII.4 strain approximately every 3–5 years8,14. However, this trend in antigenic evolution did not continue with the GII.4 Sydney[P16] viruses that emerged in 2015. The antigenic domains in the capsid did not evolve with any amino acid changes between the GII.4 Sydney[P16] strains and GII.4 Sydney[P31] strains, although the VP1 sequences of GII.4 Sydney[P16] strains formed a new cluster11,14,16. In contrast, GII.3 strains evolved earlier through recombination, which became common genotypes in sporadic infection17 and ranked only second to the annual GII.4 epidemic strains in China18. In this study, GII.3 was also the second genotype in number causing the outbreaks. Since 2000, most GII.3 noroviruses have become recombinant strains, which possessed a non-GII.3 RdRp genotype. The common types of polymerase recombinants with GII.3 were GII.P12, GII.P16, and GII.P21 (formerly termed GII.Pb)11,19. The recombinant strains increased circulation, suggesting that recombination may have contributed to viral immune escape or conferred higher virological fitness14 for maintaining the fitted strains or genotypes in human population.

Schools were the main sites for outbreaks, especially in primary schools, which was proportionally higher than the others significantly. There was no outbreak in July and August due to schools’ summer recesses, similar to those previously reported in Shanghai, China, where most outbreaks occurred in kindergartens (48.3%) and primary schools (45.0%). In contrast in Australia, Europe, and the United States, most outbreaks occurred in long-term care facilities, followed by hospitals or restaurants, while outbreaks in schools accounted for only a small fraction. The GII.4 viruses, identified as the most predominant genotypes, were more common in outbreaks in health-care facilities compared to other genotypes9,14,16.

Noroviruses are classified into genogroups and genotypes based on amino acid homology in the RdRp and VP1 proteins and some genotypes consist of various subclusters (such as GII.3, GII.4 and GII.6). However, no specific criteria have been applied to classify GII.14[P7] strains within variant types20,21. In this study, we subdivided the GII.14[P7] strains into several clusters according to the GII.P7 and GII.14 reference strains. Generally one VP1 genotype of noroviruses combine with one or more polymerase genotypes and the resultant strains usually contain the original polymerase genotypes. For example, the GII.3 VP1 genotype could combine with several RdRp genotypes, such as GII.3[P3], GII.3[P12], GII.3[P16], and GII.3[P21], and the resultant recombinant strains usually possess the original GII.P3 RdRp genotypes. However, the VP1 of GII.14 genotype seems to combine only with GII.P7 RdRp genotype, which results in recombinant strains without possessing the original GII.P14 RdRp genotype. On the contrary, the RdRp of GII.P7 could recombine with other VP1 genotypes, such as GII.6[P7] variants, in addition to GII.1422. Although we could find GII.P6 genotypes in RdRp region, no GII.P14 genotypes has ever been found. Our data also show that there were several substitutions in the predicted conformational epitopes compared with the ancestral strain, and one of them was peripheral to the HBGA-binding site II. These results suggest that the recombinant strain was evolving slowly and continuously to achieve long-term fitness and stability in the population.

In summary, this study leverages data from two surveillance systems (EPHEIM and NOSS) to provide a comprehensive analysis of norovirus recombinant strains from both the laboratory and epidemiologic perspectives. The results showed that the proportion of recombinant strains increased significantly in the norovirus outbreaks between 2015 and 2018 in Jiangsu, while the GII.2[P16] recombinant strains were accountable for the majority of the outbreaks. Although antigenic drift and recombination are regarded as the main mechanisms for norovirus evolution, constantly increasing proportion of recombinant strains seems to suggest that viruses are more likely to evolve via recombination recently. On the other hand, we have to bear in mind the possibility that more recombinant strains are detected nowadays probably due to improved detection protocols. The current standard for genotyping includes polymerase and capsid genotypes, for example, while in the past years only the capsid or polymerase region was typed by many laboratories. Retyping the strains collected in the past with the current protocol should be able to deal with this concern unequivocally. Increased surveillance for early identification of potential pandemic variants would provide warning to public health sectors so that they could formulate effective preventive and control measures in time.

Methods

Sample collection and ethics statement

Two systems, the Emergent Public Health Event Information Management System (EPHEIM) and the norovirus Outbreak Surveillance System (NOSS), had been used to report noroviruses through outbreak-based surveillance in Jiangsu province. An outbreak was defined as to have at least 20 cases within one week or 5 cases within three days with symptoms including vomiting and/or diarrhea. Patient samples positive for norovirus were submitted to the laboratory of Jiangsu provincial Center for Disease Control and Prevention (CDC) for further analysis. To characterize temporal and spatial distribution of outbreaks, hierarchical mapping was carried out with ArcGIS software (version 10.0; ESRI, Redlands, CA). This study was approved by the Institutional Review Board of Jiangsu CDC with the approval protocol No. JSCDCLWLL2019002. Written informed consent was obtained according to the guidelines of the National Ethics Regulation Committee.

Norovirus genotyping

Norovirus-positive samples were genotyped in both the ORF1 (RdRp) and ORF2 (capsid VP1) regions. A region of 1,095 bp in the ORF1/ORF2 junction of the viral genome was obtained by RT-PCR using a semi-nested specific primer set as previously described23. The genotypes were determined by using the norovirus automated genotyping tool (http://www.rivm.nl/mpf/norovirus/typingtool) and human calicivirus typing tool (https://norovirus.ng.philab.cdc.gov).

The complete VP1 genomic fragments (1.7 kb) of the six recombinant strains were amplified with a semi-nested PCR GII-specific primer set (COG-2F/VN3T20 in the first-round PCR and G2SKF/VN3T20 for the second-round PCR) as previously described23. Next, the ORF1/ORF2 junction fragment and the complete VP1 genomic fragment were PCR-ligated through splicing into a 2.4 kb genomic fragment which contained a complete capsid sequence and partial RdRp sequence. All PCR products were purified and subsequently sent to the Sangon Biotech (Shanghai, China) Company for Sanger sequencing.

Sequences analysis

All nucleotide and amino acid (aa) sequence alignments were performed using Bioedit and MEGA 7.0 software24. Phylogenetic trees were constructed using the Maximum Likelihood algorithm with 1,000 bootstrap replicates and a Kimura2-parameter model in MEGA 7.0 with norovirus reference sequences obtained from the GenBank database. Nucleotide sequences obtained from clinical samples were deposited in GenBank under the accession numbers from MK614059 to MK614064.

In order to verify the recombination event, the 2.4 kb genomic fragments, which were constructed by PCR as mentioned earlier and contained the ORF1/ORF2 junction region, were analyzed along with the reference strains obtained from GenBank by using a Simplot software v.3.5.1. The SimPlot analysis was performed by setting the window width and the step size to 200 bp and 20 bp, respectively.

Statistical analysis

Categorical data were presented as frequencies with percentages. Case numbers were presented as the median and interquartile range (IQR). For categorical data, differences among groups were examined using the chi-square test or Fisher’s exact probability test. For continuous data, Kruskal-Wallis Test was used to determine differences among groups. p < 0.05 was considered to indicate a statistically significant difference.