Background

Shiga-like toxin-producing Escherichia coli (STEC), also called verotoxin-producing E. coli, is a major pathogenic group of E. coli that causes bloody diarrhea and hemolytic uremic syndrome (HUS); enterohemorrhagic E. coli (EHEC) is one subgroup of STEC [1]. The gene(s) encoding the Shiga-like toxin (Stx) are carried by a lambdoid phage, and the most frequently isolated serotypes of Shiga-like toxin-producing EHEC are O157, O104, O26, O111, and O145 [1-3]. E. coli is a common member of the normal flora of the large intestine, but strains sometimes acquire pathogenic genes from other bacteria or bacteriophages. Indeed, there are several cases in which strains regarded as non-pathogenic, or STEC of unknown serotype, caused disease with symptoms similar to those caused by known STEC strains [4-6]. The causative organism of the 2011 German outbreak, the largest STEC outbreak [7,8], was E. coli O104:H4, an enteroaggregative E. coli (EAEC) harboring the Stx prophage [9]. The major virulence factor of EHEC is the Shiga-like toxin, an exotoxin that causes cellular toxicity. Another feature of EHEC is intimin, an outer-membrane adhesion protein encoded by the locus of enterocyte effacement (LEE) island [10]. The major virulence factor of EAEC is aggregative adherence fimbriae, which mediate bacterial adherence and form a ‘stacked brick wall’ structure on host cells [11]. This EHEC/EAEC hybrid strain also acquired plasmid-encoded antibiotic resistance genes and exhibited strong virulence [12]. In South Korea, two case reports in 2002 and 2006 described HUS caused by E. coli of serotype O8 in a 16-year-old man [13] and of serotype O104:H4 in a 29-year-old woman [14], respectively. Moreover, in 2012, we reported the genome sequences and virulence gene analyses of EHEC strains isolated in Korea [2,3,15]. To reveal the genomic features of STEC in Korea, we sequenced a dozen E. coli strains isolated from diarrhea patients in Korea between 2001 and 2011.
Among them, two Shiga-like toxin-producing E. coli strains belonging to the same group were selected for genome analysis. In this study, we report the genomes of two E. coli strains, NCCP15655 and NCCP15656, which were isolated from the feces of a female patient and a male patient with diarrhea in South Korea in 2003. In both strains, the gene encoding Shiga-like toxin was detected, but the serotypes were not determined experimentally. Through genome analysis of these two isolates, we report a case of pathogenic E. coli strains that carry two types of Shiga-like toxin genes in a single genome whose structure is most similar to that of non-EHEC strains.

Methods

Bacteria and DNA isolation

In 2003, two E. coli strains were isolated from stool samples of a female patient and a male patient with diarrhea in Korea. To test for the presence of the Shiga-like toxin genes (stx1 and stx2), the two strains were subjected to PCR with primers specific to stx1 (F′-CGTACGGGGATGCAGATAAATCGC and R′-CAGTCATTACATAAGAACGCCCAC) and stx2 (F′-GTTCTGCGTTTTGTCACTGTCAC and R′-GTCGCCAGTTATCTGACATTCTGG). The two strains were deposited at the National Culture Collection for Pathogens of the Korea National Institute of Health (KNIH) under accession numbers NCCP15655 (from the female patient) and NCCP15656 (from the male patient). Genomic DNA was extracted using chemical and enzymatic methods as described in Molecular Cloning: A Laboratory Manual [16].
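As a quick sanity check on the primer sequences listed above, a minimal Python sketch can report each primer's length and G + C fraction and produce its reverse complement. The dictionary labels and function names are illustrative and are not part of the original protocol.

```python
# Primer sequences copied from the text; the dictionary keys are illustrative labels.
PRIMERS = {
    "stx1_F": "CGTACGGGGATGCAGATAAATCGC",
    "stx1_R": "CAGTCATTACATAAGAACGCCCAC",
    "stx2_F": "GTTCTGCGTTTTGTCACTGTCAC",
    "stx2_R": "GTCGCCAGTTATCTGACATTCTGG",
}

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 3), reverse_complement(seq))
```

The reverse complement of each reverse primer is the form that would be searched for on the same strand as the forward primer when locating the expected amplicon in an assembled contig.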

Genome sequencing, assembly and annotation

A Genome Analyzer IIx (Illumina-Solexa platform) at the Biomedical Genomics Research Center of the Korea Research Institute of Bioscience and Biotechnology was used for genome sequencing. From 500-bp paired-end libraries, 22,525,438 high-quality reads (233-fold coverage) were generated for NCCP15655 and 27,858,714 high-quality reads (235-fold coverage) for NCCP15656. Sequence trimming and de novo assembly were performed using CLC Genomics Workbench version 5.1 (CLC bio, Inc.), and scaffolding was carried out with SSPACE [17]. Automatic gap filling was performed using IMAGE [18], and manual gap filling was performed using CLC Genomics Workbench. Structural gene prediction was performed using Glimmer 3 [19], and functional annotation was performed using blastp against the MicroScope database [20] of E. coli and Shigella species. We then performed automatic annotation using the RAST server [21] and compared it with the annotation from the MicroScope database for more accurate functional assignment. We also performed an additional blastp search against the subsystem database of the RAST server for gene categorization.
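Fold coverage such as that quoted above is conventionally estimated as the total number of sequenced bases divided by the genome size. The sketch below shows only that arithmetic. The read length is not stated in the text, so the value in the example call is purely hypothetical, and the reported figures may have been derived differently (for example, from a read mapping report).

```python
def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Estimate fold coverage as total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def implied_read_length(n_reads, coverage, genome_size_bp):
    """Invert the same relationship to get the mean read length it implies."""
    return coverage * genome_size_bp / n_reads


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the formula.
    print(fold_coverage(n_reads=1_000_000, read_length_bp=100, genome_size_bp=5_000_000))
```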

Gene clustering and phylogenetic tree construction

The core gene set of 71 genomes (60 E. coli strains, 10 Shigella strains, and 1 Escherichia fergusonii strain) was identified using OrthoMCL (version 2.0.3) [22] with thresholds of e-value ≤ 1E-5, identity ≥ 85%, and coverage ≥ 80% [23]. Duplicated genes were excluded from the core gene set, and the remaining 1,273 core genes were used for phylogenetic tree construction. The amino-acid sequences of each core gene were aligned with MUSCLE (version 3.6) [24], concatenated, and converted to PHYLIP format. A maximum-likelihood tree was constructed using PhyML (version 2.4.5) [25] with the JTT evolutionary model [26].
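The selection logic described above can be sketched as two steps: keep only similarity hits that meet the three thresholds, then keep only ortholog clusters that contain exactly one gene from every genome. This is an illustrative sketch and not the OrthoMCL implementation. The `Hit` record, the definition of coverage as aligned length over query length, and the cluster data layout are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one pairwise protein similarity hit."""
    query: str
    subject: str
    evalue: float
    identity_pct: float
    aligned_len: int
    query_len: int

    @property
    def coverage_pct(self):
        # Assumption: coverage is aligned length relative to query length.
        return 100.0 * self.aligned_len / self.query_len


def passes_thresholds(hit, max_evalue=1e-5, min_identity=85.0, min_coverage=80.0):
    """Apply the e-value, identity, and coverage cut-offs quoted in the text."""
    return (
        hit.evalue <= max_evalue
        and hit.identity_pct >= min_identity
        and hit.coverage_pct >= min_coverage
    )


def single_copy_core(clusters, genomes):
    """Return IDs of clusters with exactly one member gene in every genome.

    `clusters` maps a cluster ID to a list of (genome, gene) pairs.
    Clusters with duplicated genes in any genome are excluded.
    """
    wanted = set(genomes)
    core = []
    for cluster_id, members in clusters.items():
        counts = Counter(genome for genome, _gene in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            core.append(cluster_id)
    return core
```

With this layout, a cluster that has two genes from one genome is dropped even if every genome is represented, which mirrors the exclusion of duplicated genes.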

Other computational analysis

Average nucleotide identity values based on BLAST (ANIb) [27] were calculated with JSpecies [28] using ANI calculation parameters of identity ≥ 30% and coverage ≥ 70%. Clustered regularly interspaced short palindromic repeats (CRISPRs) were detected with CRISPRFinder (http://crispr.u-psud.fr/Server/). Homology searches were conducted using the BLAST software. Serotype analysis was performed using SerotypeFinder (ver. 1.0) on the Center for Genomic Epidemiology server (https://cge.cbs.dtu.dk/services/). Subtype analysis of the stx genes was conducted with the sequence-based protocol [29].
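The role of the two ANIb thresholds above can be illustrated with a small sketch: fragments of the query genome are compared against the reference, and only fragments whose best hit meets both the identity and coverage cut-offs contribute to the mean. This is a simplified illustration and not the JSpecies implementation. Genome fragmentation and the BLAST search are not shown, and the input tuple layout is an assumption.

```python
def anib(fragment_hits, min_identity=30.0, min_coverage=70.0):
    """Mean percent identity over fragments that pass both cut-offs.

    `fragment_hits` is a list of (identity_pct, aligned_len, fragment_len)
    tuples, one per query fragment's best hit (hypothetical layout).
    Returns None when no fragment passes.
    """
    kept = [
        identity
        for identity, aligned_len, fragment_len in fragment_hits
        if identity >= min_identity
        and 100.0 * aligned_len / fragment_len >= min_coverage
    ]
    if not kept:
        return None
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the study.
    hits = [(99.0, 1000, 1020), (98.0, 1020, 1020), (95.0, 500, 1020)]
    print(anib(hits))
```

Because the value depends on which genome is fragmented, the calculation is usually run in both directions for each pair of genomes.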

Quality assurance

Genome sequencing was conducted using a single bacterial isolate, and possible contamination was checked using CLC Genomics Workbench during de novo assembly, mapping of reads to contigs, and generation of the detailed mapping report. Contamination by other genomes can be detected by examining the distribution of coverage levels in the detailed mapping report and by inspecting the read alignments for correct paired-end distances.
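One simple way to screen a coverage distribution of the kind described above is to flag contigs whose mean coverage is far from the median of all contigs. The sketch below is not the CLC Genomics Workbench procedure. The function name, the input layout, and the 0.5x and 2x cut-offs are hypothetical choices made for illustration.

```python
from statistics import median


def flag_coverage_outliers(contig_coverage, low=0.5, high=2.0):
    """Return names of contigs with mean coverage far from the median.

    `contig_coverage` maps contig name to mean read depth. A contig is
    flagged when its depth is below `low` times or above `high` times
    the median depth (both factors are illustrative defaults).
    """
    med = median(contig_coverage.values())
    return sorted(
        name
        for name, depth in contig_coverage.items()
        if depth < low * med or depth > high * med
    )
```

A flagged contig is only a candidate for inspection, since multi-copy plasmids and repeat regions can legitimately show higher depth than the chromosome.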

Initial findings

Genome structure

The draft genomes of Escherichia coli NCCP15655 and NCCP15656 consist of five contigs and 15 contigs, respectively. The five contigs of NCCP15655 total 4,965,708 bp (50.86% G + C content), and 4,970 coding sequences (CDSs), seven ribosomal RNA operons, and 97 tRNAs were predicted. The 15 contigs of NCCP15656 total 4,925,312 bp (50.93% G + C content), and 4,919 CDSs, seven ribosomal RNA operons, and 92 tRNAs were predicted. NCCP15655 and NCCP15656 each have two CRISPRs, which consist of direct repeat sequences and seven spacer sequences. Spacers 5 and 6 in CRISPR 1 and spacer 7 in CRISPR 2 had no homology with sequences in the GenBank database.

Phylogenetic relationship and comparison with closely related strains

A phylogenomic tree was constructed using 1,273 core genes of NCCP15655, NCCP15656, and the completely sequenced strains of the Escherichia/Shigella group. The tree showed that NCCP15655 and NCCP15656 belong to group B1 and form a sister clade with strain E24377A, an enterotoxigenic E. coli (ETEC) (Figure 1). ANIb values between NCCP15655/NCCP15656 and the other strains belonging to the B1 group ranged from 98.27 to 99.08 (Table 1). NCCP15655 and NCCP15656 are Shiga-like toxin-producing E. coli, yet they form a sister clade with the ETEC strain E24377A despite having the highest ANI values with non-pathogenic strains. We therefore compared the genomic features of NCCP15655/NCCP15656 and E24377A using subsystem classification. Despite the high genome similarity and phylogenetic proximity, there are distinct differences between NCCP15655/NCCP15656 and E24377A in the proportions of subsystem-assigned genes. The proportions of genes belonging to the subsystem categories “phages, prophages, transposable elements, plasmids” and “virulence, disease and defense” are higher in NCCP15655 and NCCP15656 than in E24377A (Figure 2 and Table 2). Within “phages, prophages, transposable elements, plasmids”, the numbers of genes in the sub-categories ‘phages, prophages’ and ‘bacteriophage structural proteins’ are higher in NCCP15655 and NCCP15656 than in E24377A; within “virulence, disease and defense”, the same is true for the sub-categories ‘resistance to antibiotics and toxic compounds’, ‘adhesion’, and ‘type III, type IV, type VI, ESAT secretion systems’. In the genomes of NCCP15655 and NCCP15656, the genes belonging to the sub-categories ‘phages, prophages’ and ‘bacteriophage structural proteins’ include those of the Stx phage, and the genes belonging to the sub-category ‘type III, type IV, type VI, ESAT secretion systems’ encode conjugative plasmid-related proteins. A conjugative plasmid in NCCP15655 and NCCP15656 harbors the hlyABCD genes, which encode a hemolysin.

Figure 1

Phylogenetic relationship among genome-sequenced E. coli and Shigella strains. The phylogenetic tree was generated by PhyML with amino-acid sequences of 1,273 core genes from completely sequenced E. coli and Shigella strains. Each color indicates the phylogenetic group of E. coli (red, A; yellow, B1; black, Shigella; blue, E; purple, D; green, B2). Bootstrap values (percentages of 1,000 replications) greater than 50% are shown at each node. Escherichia fergusonii ATCC 35469 was used as the outgroup. The scale bar represents 0.001 nucleotide substitutions per site.

Table 1 Average nucleotide identity values based on BLAST between the completely sequenced members of the E. coli B1 group
Figure 2

Comparison of the subsystem categories. Comparison results of the subsystem-assigned genes among NCCP15655, NCCP15656, and E24377A. (A) Relative abundance of the subsystem-assigned genes. A, Carbohydrates; B, Clustering-based subsystems; C, Amino acids and derivatives; D, Cell wall and capsule; E, Phages, prophages, transposable elements, plasmids; F, Virulence, disease and defense; I, Membrane transport; J, Protein metabolism; K, Cofactors, vitamins, prosthetic groups, pigments; L, Stress response; M, DNA metabolism; N, Respiration; O, Nucleosides and nucleotides; P, Regulation and cell signaling; Q, RNA metabolism; R, Motility and chemotaxis; S, Nitrogen metabolism; T, Fatty acids, lipids, and isoprenoids; U, Miscellaneous; V, Metabolism of aromatic compounds; W, Phosphorus metabolism; X, Cell division and cell cycle; Y, Iron acquisition and metabolism; Z, Sulfur metabolism; AA, Potassium metabolism; AB, Secondary metabolism; AC, Dormancy and sporulation. (B) Number of CDSs assigned to the sub-category of “Phages, prophages, transposable elements, plasmids”. E-1, Phages, prophages; E-5, Bacteriophage structural proteins; E-3, Bacteriophage integration/excision/lysogeny; E-4, Phage host interactions; E-6, Superinfection exclusion; E-2, Transposable elements. (C) Number of CDSs assigned to the sub-category of “Virulence, disease and defense”. F-1, Resistance to antibiotics and toxic compounds; F-2, Adhesion; F-3, Type III, type IV, type VI, ESAT secretion systems; F-4, Invasion and intracellular resistance; F-5, Fimbriae of the chaperone/usher assembly pathway; F-6, Bacteriocins, ribosomally synthesized antibacterial peptides; F-7, Toxins and superantigens. Bars: black, NCCP15655; gray, NCCP15656; blue, E24377A.

Table 2 Number of the subsystem-assigned CDSs

Interestingly, although the two strains were isolated independently from different individuals, they are remarkably similar. Serotype prediction based on the wzt and wzm genes for the O-antigen and the fliC gene for the H-antigen indicated that both NCCP15655 and NCCP15656 are O8:H49. Moreover, at the genomic level, the two strains are highly similar, with ANIb values between them ranging from 99.98 to 99.99 (Table 1). Based on these relationships, we postulate that they might share a very recent common ancestor, if they are not clonal.

Shiga-like toxin and virulence genes

In the NCCP15655 and NCCP15656 genomes, genes encoding Shiga toxin type 1 (Stx1) and Shiga toxin type 2 (Stx2) were detected. The Stx1 subunit A is composed of 315 amino acids and subunit B is composed of 89 amino acids. In the NCCP15655 genome, the stx1 genes were detected within a prophage region, and their products have 100% amino-acid identity with the Shiga toxin of Shigella dysenteriae Sd197. The Stx2 subunit A is composed of 319 amino acids and subunit B is composed of 89 amino acids. The stx2 genes were detected in another prophage region, which is located at the end of the contig. The stx2 gene is very similar to that of E. coli strain 11128, which has stx1 genes (Figure 3). Subtype analysis of the stx genes indicated that stx1 is stx1a and stx2 is stx2a in both strains. Unlike in typical EHEC strains, the LEE island was not detected in the genomes of NCCP15655 and NCCP15656, but genes encoding type 1 fimbriae biosynthesis proteins, adhesin AidA, fimbriae-like adhesin SfmA/H, and CFA/I fimbrial minor adhesin were detected. In both strains, genes encoding type IV pilus biosynthesis proteins, enteropathogenic E. coli secreted protein C, which is a serine protease that causes epithelial damage [30], and hemolysin were detected in the final contigs designated as plasmid, and the type 1 fimbriae operon was identified in the chromosome.

Figure 3

Clustering analysis of subunit A of Shiga toxin type 1 and type 2. Unrooted trees based on the nucleotide sequences of Shiga toxin subunit A were constructed using the neighbor-joining method with the Jukes-Cantor model. Bootstrap values (percentages of 1,000 replications) greater than 50% are shown at each node. The scale bar represents 0.005 nucleotide substitutions per site. Yellow, E. coli B1 group; sky blue, E. coli E group; black, unknown. (A) Shiga toxin type 1, (B) Shiga toxin type 2.
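For readers unfamiliar with the Jukes-Cantor model named in the legend above, the correction converts the observed proportion of differing sites p between two aligned nucleotide sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below computes only this pairwise distance. The function names are illustrative, gap handling is deliberately simplified, and the neighbor-joining step itself is not shown.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Assumes equal-length, pre-aligned sequences. Columns containing a
    gap ('-') in either sequence are skipped (a simplifying assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site.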

Future directions

The Stx phage carrying the Shiga toxin genes and the LEE island harboring the type III secretion system are the major features of EHEC strains [31]. The genomes of NCCP15655 and NCCP15656 encode the Shiga-like toxin but lack genes related to the LEE island. However, they acquired a plasmid encoding hemolysin and enteropathogenic E. coli secreted protein C. NCCP15655 and NCCP15656 acquired these virulence genes through horizontal gene transfer and caused diarrhea in humans. In E24377A, a gene encoding a heat-labile toxin, which is a major virulence factor of ETEC, is located on a plasmid, but this gene was not detected in NCCP15655 and NCCP15656. These findings suggest that, in certain environments, bacterial strains can obtain virulence factors through the acquisition of a virulence gene-harboring plasmid or phage and cause disease. This report provides another example of pathogenic E. coli strains that have acquired virulence genes through plasmids and phages. These genomes will be useful examples for further study of the acquisition and spread of virulence genes in E. coli.

Availability of supporting data

The Whole Genome Shotgun projects of NCCP15655 and NCCP15656 have been deposited at GenBank under the accessions ATLW00000000 and ATLX00000000, respectively.