Background

The Gram-negative Helicobacter pylori is an obligate, microaerophilic and fastidious bacterium that frequently colonizes the gastrointestinal mucosa of humans [1]. This opportunistic pathogen is known to colonize a significant portion of the world’s population, with regional prevalence ranging from 24.4% in Oceania to 63.4% in Latin America and the Caribbean and 79.1% in Africa [2, 3]. A recent systematic meta-analysis shows a global decline in the prevalence of H. pylori from 58.2% (1980 – 1990) to 43.1% (2011 – 2022); however, more than 40% of the world population is estimated to still be infected with H. pylori [4]. The wide variation that exists in H. pylori prevalence between regions is in part attributed to prevailing social and economic conditions [3, 5, 6]. H. pylori can colonize the stomach for decades without causing any detectable symptoms [1]. Nonetheless, colonization can progress to adverse complications. Infection with H. pylori is the most common cause of chronic gastritis, which can lead to more severe gastroduodenal pathologies such as peptic ulcer, mucosa-associated lymphoid tissue lymphoma, and gastric cancer [1]. H. pylori is the only bacterium currently classified as a Group 1 carcinogen by the International Agency for Research on Cancer (IARC) of the World Health Organization (WHO) because it is the causative agent of 80% of all stomach cancers [7]. The US Centers for Disease Control and Prevention estimates that people with chronic H. pylori infection are at greater risk of developing stomach cancers when compared to uninfected people [8].

Treatment regimens for H. pylori infections often combine a strong acid suppressant (proton pump inhibitor [9]) and antibiotics such as clarithromycin and amoxicillin [10, 11]. Metronidazole has also been used as an alternative [12]. However, antimicrobial resistance is the most important factor in treatment failure of H. pylori infection. Worldwide, rates of resistance against these three primary antimicrobial agents have increased dramatically, reaching 20 – 30% [13, 14] and threatening first-line eradication therapy for H. pylori. As such, the WHO recommends greater efforts for research and development on novel therapeutics, screening and prevention strategies against H. pylori infections [7]. Effective eradication of H. pylori requires vigilance in implementing antimicrobial susceptibility testing, resistance surveillance, and antibiotic stewardship.

H. pylori displays extensive genetic diversity [15, 16] and a characteristic population structure that has been shaped by its intimate co-evolution with its human hosts over the past ~100,000 years [17,18,19]. This has resulted in distinct geographically partitioned subpopulations that parallel the evolutionary history of human ethnic groups [15, 16]. Its highly adaptive nature is due in part to its natural competence for DNA uptake and recombination [20, 21]. Understanding the genetic heterogeneity in H. pylori between geographical regions is critical to inform future health policy on prevention and management that is most effective at local and country-wide scales.

In this study, we used in vitro antimicrobial susceptibility testing and whole genome sequencing to characterize the virulence and antimicrobial resistance features and mobile genetic elements of five H. pylori isolates sampled from gastritis patients in a specialized medical center in Pereira, Colombia. Gastritis, or inflammation of the stomach lining, can be classified into different types depending on the severity [22]. The isolates in our study came from patients with chronic gastritis, antral erosive gastritis, and superficial gastritis. To place the Pereira isolates in the broader geographical context, we also carried out comparative genomics with publicly available genomes derived from other parts of Colombia and with the global population. This work contributes to the growing body of knowledge of H. pylori pathogenesis that will help inform current efforts of devising effective treatment and prevention strategies, especially in countries that are poorly represented in global genomic surveillance studies of bacterial pathogens.

Methods

Bacterial sampling and isolation

The study included 20 H. pylori isolates randomly selected from a previously published surveillance study carried out in 2015 [23]. Isolates were derived from dyspeptic patients in a specialized medical center in the city of Pereira, Colombia. One isolate was obtained from each patient. The surveillance project was approved by the Bioethics Committee of the Universidad Tecnológica de Pereira (UTP), which approved the informed consent before the start of the surveillance. No inclusion or exclusion criteria of patients were used. All individuals participated voluntarily and signed the informed consent. H. pylori was cultured from gastric biopsy samples of the antrum and body. Samples were stored in brain–heart infusion (BHI) broth supplemented with 20% glycerol and antibiotics (vancomycin 10 mg/l, polymyxin B 0.33 mg/l, bacitracin 1.07 mg/l, and amphotericin B 5 mg/l). The biopsy specimens were mixed with sterile saline solution and macerated with a homogenizer (Deltaware Pellet Pestle). A total of 100 µl of each macerated suspension was plated onto tryptic soy agar (TSA; Oxoid or Merck) supplemented with sheep blood (7%), isovitalex (0.5%), and the same antibiotics at the same concentrations used in the BHI broth medium during transport. Plates were incubated under microaerophilic conditions (5% O2, 10% CO2 and 85% N2) at 37 °C for 5 – 14 days. Colonies were confirmed with Gram staining, biochemical tests (positive urease, catalase and oxidase tests), and ureC-PCR.

Antimicrobial susceptibility testing

H. pylori isolates obtained from the primary culture were subcultured on non-selective TSA (Oxoid or Merck) with 7% sheep blood and 0.5% isovitalex (BBL) under microaerophilic conditions at 37 °C for 48 h. An E-test (AB BIODISK North American Inc., Piscataway, New Jersey, USA) for metronidazole, clarithromycin, amoxicillin, levofloxacin, rifampin, and tetracycline was carried out. Suspensions from pure 48-h subcultures were prepared in Brucella broth supplemented with 0.5% isovitalex and inoculum turbidity was adjusted to McFarland 3.0 standard. Thereafter, they were inoculated onto TSA plates supplemented with sheep blood (7%) and isovitalex (0.5%) and without antibiotics. E-test strips were placed and the plates were incubated under microaerophilic conditions at 37 °C for 72 h. We measured the minimum inhibitory concentrations (MICs) and used H. pylori strain ATCC 43504 as a control. The MIC for clarithromycin was interpreted based on the Clinical & Laboratory Standards Institute (CLSI) breakpoint (> 1.0 mg/l resistant) [24]. We used EUCAST breakpoints for the antibiotics amoxicillin (> 0.125 mg/l), levofloxacin (> 1 mg/l), tetracycline (> 1 mg/l), rifampin (> 1 mg/l), and metronidazole (> 8 mg/l) [25].
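The breakpoint interpretation described above can be illustrated with a minimal Python sketch. The breakpoint values are those stated in the text; the function names and the example MIC values are hypothetical and serve only to show how an MIC above the breakpoint is called resistant.

```python
# Breakpoints in mg/l as stated in the text (CLSI for clarithromycin,
# EUCAST for the others). An isolate is called resistant when MIC > breakpoint.
BREAKPOINTS_MG_L = {
    "clarithromycin": 1.0,
    "amoxicillin": 0.125,
    "levofloxacin": 1.0,
    "tetracycline": 1.0,
    "rifampin": 1.0,
    "metronidazole": 8.0,
}


def interpret_mic(drug, mic_mg_l):
    """Return 'R' when the MIC exceeds the breakpoint, otherwise 'S'."""
    threshold = BREAKPOINTS_MG_L[drug.lower()]
    return "R" if mic_mg_l > threshold else "S"


def interpret_profile(mics):
    """Interpret a dict of {drug: MIC in mg/l} for one isolate."""
    return {drug: interpret_mic(drug, mic) for drug, mic in mics.items()}


if __name__ == "__main__":
    # Hypothetical MIC values, not taken from the study.
    example = {"metronidazole": 256.0, "clarithromycin": 0.016, "amoxicillin": 0.125}
    print(interpret_profile(example))
```

Because the comparison is strict, an MIC equal to the breakpoint is reported as susceptible, which matches the "> breakpoint" convention used in the text.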

DNA extraction and whole genome sequencing

Genomic DNA was extracted from 48-h-old confluent cells obtained from pure cultures using QIAamp DNA Mini and Blood Mini kit following the manufacturer’s guidelines (QIAGEN, Hilden, Germany). A Qubit fluorometer (Invitrogen, Grand Island, New York, USA) was used to measure DNA concentration. Sequencing was carried out at SeqCoast Genomics (Portsmouth, New Hampshire, USA). Briefly, DNA libraries were prepared for whole genome sequencing using the Illumina DNA Prep tagmentation kit and unique dual indexes. DNA samples were sequenced as multiplexed libraries on the Illumina NextSeq 2000 platform operated per the manufacturer’s instructions. Sequencing was done using a 300-cycle flow cell kit to produce 2 × 150 bp paired reads. A 1–2% PhiX control was spiked into the run to support optimal base calling. Read demultiplexing, read trimming, and run analytics were performed using DRAGEN v.3.10.12, an on-board analysis software on the NextSeq2000. Quality of raw sequence data was assessed using FastQC to ascertain that > 85% of bases were higher than the Q30 Phred quality score.

Assembly, quality check, and annotation of draft genomes

Short reads generated by Illumina sequencing were assembled de novo into contiguous sequences using shovill v1.1.0 (https://github.com/tseemann/shovill). Adapter sequences were trimmed by enabling the --trim flag in shovill. In addition to adapter trimming, shovill subsamples reads down to a depth of 150X, corrects sequencing errors, and assembles using SPAdes v.3.14.1 [26]. Sequence quality of draft assemblies was assessed using QUAST v.5.0.2 [27] and CheckM v.1.1.3 [28] (Supplementary Table S1). Genomes with < 90% completeness or > 5% contamination, as recommended by CheckM [28], were excluded from downstream analyses. In all, five genomes from Pereira passed the sequence quality criteria (Supplementary Table S1).
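As a rough illustration of the assembly quality criteria, the following Python sketch computes N50 from a list of contig lengths and applies the completeness and contamination thresholds stated above. It is not the QUAST or CheckM implementation; the function names are hypothetical and the inputs are assumed to have been parsed from those tools' reports.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative size, counted from the
    longest contig downward, first reaches half of the total assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths is empty")


def passes_qc(completeness_pct, contamination_pct,
              min_completeness=90.0, max_contamination=5.0):
    """Keep a genome only if it meets both thresholds used in the text."""
    return (completeness_pct >= min_completeness
            and contamination_pct <= max_contamination)


if __name__ == "__main__":
    # Hypothetical contig lengths and CheckM values for illustration.
    print(n50([40, 30, 20, 10]))
    print(passes_qc(99.2, 0.4))
```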

The mash sketch function (https://mash.readthedocs.io/en/latest/) on mash v.1.1 [29] was used on all available reference H. pylori genomes in the National Center for Biotechnology Information (NCBI) database (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Helicobacter_pylori/) to retrieve the closest matching reference genome. The results of the mash distance calculation showed that reference genome strain MT5135 (NZ_CP071982.1) was the most closely related to our five Pereira genomes. Using fastANI v.1.32 [30], we compared our genomes to reference MT5135 (NZ_CP071982.1) by calculating the genome-wide Average Nucleotide Identity (ANI) values. An ANI threshold of ≥ 95% was used to confirm that the five genomes belong to the same species [30]. Genomes were also confirmed on the type strain genome server of the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures GmbH (https://tygs.dsmz.de). The draft assemblies were annotated using Prokka v.1.14.6 [31] and Bakta v.1.4.0 [32].
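To make the two similarity measures concrete, the sketch below shows how a Mash distance can be derived from a Jaccard index between two k-mer sketches, using the published Mash formula D = -(1/k) ln(2j / (1 + j)), and how the ≥ 95% ANI cutoff is applied. The k-mer size of 21, the cap of the distance at 1, and all function names are assumptions made for this illustration rather than settings reported in the study.

```python
import math


def mash_distance(jaccard, k=21):
    """Mash distance from a Jaccard index j: D = -(1/k) * ln(2j / (1 + j)).

    k = 21 is an assumed k-mer size. A Jaccard index of 0 is reported as the
    maximum distance of 1, and the result is capped at 1 in this sketch.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard index must lie between 0 and 1")
    if jaccard == 0.0:
        return 1.0
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, distance))


def closest_reference(distances):
    """Return the reference name with the smallest distance to the query."""
    return min(distances, key=distances.get)


def same_species(ani_pct, threshold=95.0):
    """Apply the ANI cutoff used in the text (>= 95%)."""
    return ani_pct >= threshold


if __name__ == "__main__":
    # Hypothetical distances to two unnamed references.
    print(closest_reference({"ref_A": 0.031, "ref_B": 0.012}))
    print(mash_distance(0.5))
```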

In silico sequence typing and identification of mobile genetic elements and genes encoding antimicrobial resistance, virulence, restriction enzymes, transposases, and phages

Sequence types (ST) were determined using allelic variation in seven housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI, yphC) [33] and compared to the multi-locus sequence typing (MLST) database for H. pylori in pubMLST (https://pubmlst.org/organisms/helicobacter-pylori) [34]. This was done by scanning the contigs of draft assemblies using mlst v.2.19.0 (https://github.com/tseemann/mlst). Genomes with novel combinations of the seven multi-locus alleles were submitted to the H. pylori pubMLST database for curation and ST assignment [34].
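The logic of ST assignment can be sketched as a lookup of the seven-locus allele profile in a table of known profiles. The snippet below is a simplified illustration, not the mlst tool itself; the allele numbers and the ST label in the example are hypothetical.

```python
# The seven H. pylori MLST loci named in the text, in a fixed order.
MLST_LOCI = ("atpA", "efp", "mutY", "ppa", "trpC", "ureI", "yphC")


def assign_st(profile, known_profiles):
    """Look up an allele profile in a table of known profiles.

    profile: dict mapping each locus to its allele number.
    known_profiles: dict mapping a 7-tuple of allele numbers to an ST label.
    Returns the ST label, or None when the combination is novel.
    """
    missing = [locus for locus in MLST_LOCI if locus not in profile]
    if missing:
        raise ValueError("missing loci: " + ", ".join(missing))
    key = tuple(profile[locus] for locus in MLST_LOCI)
    return known_profiles.get(key)


if __name__ == "__main__":
    # Hypothetical allele numbers and ST label, for illustration only.
    table = {(1, 2, 3, 4, 5, 6, 7): "ST-example"}
    query = dict(zip(MLST_LOCI, (1, 2, 3, 4, 5, 6, 7)))
    print(assign_st(query, table))
```

A return value of None corresponds to the case described above, in which a novel allele combination is submitted to pubMLST for curation.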

We determined the presence, types, and abundance of specific genetic elements in the H. pylori genomes. Sequences of genes involved in restriction-modification (RM) were retrieved from the REBASE database (http://rebase.neb.com/rebase/) [35] and used as a database in ABRicate v.1.0.1 (https://github.com/tseemann/abricate) to screen our draft assemblies for the presence of RM genes. To determine the presence of antimicrobial resistance genes in our genomes, two tools were used: AMRFinder-Plus v.3.10.23 implemented in NCBI with its accompanying NCBI-compiled antimicrobial resistance database [36], and ABRicate v.1.0.1 together with the Comprehensive Antibiotic Resistance Database [37]. The Virulence Factor Database [38] implemented within ABRicate v.1.0.1 was used to screen the genome assemblies for virulence-associated genes. Next, the draft genomes were screened for the presence of EPIYA (Glu-Pro-Ile-Tyr-Ala) motifs in the Bakta [32]-translated sequences of the CagA toxin. The sequences were extracted using samtools v.1.15.1 [39]. We aligned the amino acid sequences using CLUSTALW [40] implemented in MEGA11 [41]. The alignments were inspected for the presence of the EPIYA motifs EPIYA-A (EPIYAKVNKKK(A/T)GQ), EPIYA-B (EPIY(A/T)QVAKKVNAKI), EPIYA-C (EPIYATIDDLGGP), and EPIYA-D (EPIYATIDFDEANQAG) [42]. The presence of transposases in our genomes was determined using ISEScan v.1.7.2.3 [43]. ISEScan contains a built-in HMMER v.3.3.2 [44], which uses probabilistic profile hidden Markov models (profile HMMs) to search our translated draft genomes against a database of transposase profiles. We used the ISEScan default settings of < 1e-5 and ≥ 60% for e-value and sequence coverage, respectively. These thresholds are often used in IS detection (for example, [45, 46]). Lastly, our genomes were examined for the presence of phage elements using VirSorter2 [47]. The lengths of the individual phage DNA segments detected in each genome were summed to give the total length of phage DNA in that genome.
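The EPIYA segment definitions listed above translate directly into regular expressions. The following Python sketch scans a CagA protein sequence for exact matches to the four segment consensus sequences and reports them in order of occurrence. It is an illustration only: in the study the motifs were identified by inspecting alignments, the function name is hypothetical, and naturally occurring variants that differ from the consensus would be missed by exact matching.

```python
import re

# Segment consensus sequences as given in the text; (A/T) becomes [AT].
EPIYA_PATTERNS = {
    "A": re.compile(r"EPIYAKVNKKK[AT]GQ"),
    "B": re.compile(r"EPIY[AT]QVAKKVNAKI"),
    "C": re.compile(r"EPIYATIDDLGGP"),
    "D": re.compile(r"EPIYATIDFDEANQAG"),
}


def epiya_segments(protein_seq):
    """Return the EPIYA segment types found, ordered by position (e.g. 'ABC')."""
    seq = protein_seq.upper()
    hits = []
    for label, pattern in EPIYA_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), label))
    return "".join(label for _, label in sorted(hits))


if __name__ == "__main__":
    # Hypothetical toy sequence built from the consensus segments.
    toy = "MTNET" + "EPIYAKVNKKKAGQ" + "VASPE" + "EPIYTQVAKKVNAKI" + "DRLNQ" + "EPIYATIDDLGGP"
    print(epiya_segments(toy))
```

Counting the occurrences of "C" in the returned string gives the number of EPIYA-C repeats.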

All publicly available H. pylori genomes derived from human sources in Colombia were retrieved from the EnteroBase H. pylori database (available as of June 2024). Recent genomes submitted to the pubMLST database (https://pubmlst.org) from 2019 onwards (available as of June 2024) were also included. Altogether, the Colombian dataset included the five Pereira genomes and 259 genomes from six other regions sampled between 2000 – 2022. We compared the five Pereira genomes that we sequenced with genomes from other parts of Colombia for the presence of the genetic elements described above.

Phylogenetic tree reconstruction

To provide a broader phylogenetic and geographical context for the Pereira genomes, all the publicly available H. pylori genome assemblies derived from human sources worldwide and associated metadata from the Enterobase H. pylori database (n = 3,285 genomes; downloaded in June 2024) [48] and pubMLST, as described earlier, were used to create a phylogeny. These genomes were assembled and annotated using the same methods as those used for the five Pereira genomes sequenced in this study. Using the combined 3,285 downloaded genomes and the five Pereira genomes, split k-mer distances were calculated using Split-Kmer Analysis (SKA) (https://github.com/simonrharris/SKA). Split k-mers are pairs of k-mers in a sequence that are separated by a single base. The use of SKA for epidemiological purposes has been previously validated and benefits from speed and lower computational requirements [49, 50]. The reference-free alignment generated from the split k-mer files was used as input in IQ-TREE v.2.1.4 [51] to build a maximum likelihood phylogeny, implementing the general time reversible (GTR; [52]) + Gamma model of nucleotide substitution. Branch support was assessed using 1,000 replicates of the built-in ultrafast bootstrap approximation (UFBoot) [53]. The phylogenetic tree was visualized and rooted at the midpoint using Figtree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) and annotated using the Interactive Tree of Life (iTOL) v6.8.1 [54]. The SKA alignment and phylogenetic tree were used as input to partition the genomes into sequence clusters of genetically similar individuals using the Bayesian hierarchical clustering algorithm fastBAPS v.1.0.8 [55]. FastBAPS implements a rapid initial agglomerative clustering of the aligned sequences based on a pairwise single nucleotide polymorphism distance matrix, followed by Bayesian model selection to decide which clusters should be merged at each step of hierarchical clustering [55].
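The idea behind split k-mers can be shown with a short Python sketch. It assumes the SKA convention in which two flanking k-mers surround a single middle base, so that two genomes sharing the same flanking pair but differing in the middle base reveal a single nucleotide difference. This is a simplified illustration and not the SKA implementation: it ignores reverse complements and repeated flanking pairs, and the function names and toy sequences are hypothetical.

```python
def split_kmers(sequence, k):
    """Yield (flanks, middle_base) for every window of 2k + 1 bases.

    flanks is the left k-mer joined to the right k-mer; windows containing
    characters other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    window = 2 * k + 1
    for i in range(len(seq) - window + 1):
        left = seq[i:i + k]
        middle = seq[i + k]
        right = seq[i + k + 1:i + window]
        if set(left + middle + right) <= set("ACGT"):
            yield left + right, middle


def split_kmer_differences(seq_a, seq_b, k):
    """Flanking pairs shared by both sequences whose middle bases differ."""
    a = dict(split_kmers(seq_a, k))
    b = dict(split_kmers(seq_b, k))
    return {flanks: (a[flanks], b[flanks])
            for flanks in a.keys() & b.keys()
            if a[flanks] != b[flanks]}


if __name__ == "__main__":
    # Two toy sequences that differ at one position (A versus G).
    print(split_kmer_differences("GATTACAGT", "GATTGCAGT", k=2))
```

In practice k is much larger than in this toy example, which makes each flanking pair effectively unique within a genome.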

Statistical analysis

All statistical tests were carried out using the stat_compare_means function in the ggpubr v.0.4.0 package in RStudio v.2022.02.1+461 [56]. We used the Wilcoxon signed rank test to compare the number of virulence genes, genes encoding restriction-modification enzymes, and relative phage genome sizes in H. pylori isolated from different regions in Colombia. Results were considered significant when p < 0.05.
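For readers who wish to reproduce a comparable test outside R, the sketch below implements a two-sample Wilcoxon comparison in plain Python. Because the regions contribute different numbers of genomes, the groups are independent, so the sketch assumes the unpaired (rank-sum, Mann-Whitney) form, which is what stat_compare_means runs by default through wilcox.test. The p-value uses a tie-corrected normal approximation without continuity correction, so it will not exactly match the values R reports for small samples. The function names and example counts are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical per-genome gene counts for two regions.
    u, p = rank_sum_test([88, 90, 95, 117, 117], [117, 118, 118, 119, 119, 119])
    print(u, p)
```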

Results

Antimicrobial susceptibility profiles, genome characteristics, and sequence types of Colombian H. pylori

Five H. pylori isolates were obtained from unique patients diagnosed with varying levels of gastritis in a specialized center in Pereira, Colombia in 2015. Three of the isolates were recovered from patients with chronic gastritis (RPe_4, RPe_33, RPe_46), one isolate from a patient with antral erosive gastritis (RPe_43), and one isolate from a patient with superficial gastritis (RPe_23) (Supplementary Table S1). CLSI and EUCAST breakpoint interpretation of MICs from antimicrobial susceptibility tests revealed the isolates to be resistant to at least one of the six antimicrobials tested (Fig. 1a and Supplementary Table S2). Four isolates were resistant to metronidazole, whereas two isolates each were resistant to clarithromycin and levofloxacin. Rifampin resistance was observed in only one isolate, RPe_43. In contrast, all isolates were susceptible to tetracycline and amoxicillin.

Fig. 1

Antimicrobial susceptibility profile and genomic features of the five gastritis isolates in this study. a Results of the antimicrobial susceptibility tests against six antimicrobial agents. Acronyms of antimicrobial agents: MTZ – Metronidazole; CLA – Clarithromycin; LEV – Levofloxacin; TET – Tetracycline; AMO – Amoxicillin; RIF – Rifampin. b Number and types of genes associated with Types I and II Restriction Modification systems. c Types of Insertion Sequence (IS) families and IS clusters

Genome assemblies of the five isolates yielded a range of 30 – 232 contigs per genome, largest contig per genome ranging from 113,302 – 250,354 bp, total genome length of 1,552,597 – 1,789,322 bp, N50 ranging from 27,500 – 162,261 bp, and number of protein-coding genes per genome ranging from 1,397 – 1,496 (Supplementary Table S1). We sought to determine if these five isolates were related to known H. pylori clonal lineages based on the allele configuration of their seven housekeeping genes [33]. MLST results showed that all five genomes represented novel STs and were subsequently assigned ST designations on the PubMLST H. pylori database [34]. These novel STs were ST4353 (RPe_4), ST4354 (RPe_23), ST4355 (RPe_33 and RPe_46), and ST4356 (RPe_43) (Fig. 1a and Supplementary Table S2).

To place the five Pereira genomes in the broader Colombian population, the Colombian dataset was broadened to include all publicly available H. pylori genome sequences from Colombia derived from human samples (n = 259 genomes) for a total of 264 genomes. A total of 163 known STs was detected in this larger Colombian population (Supplementary Table S3). The most prevalent STs were ST3809 (10 genomes), ST415 (8 genomes), and ST3806, ST3807, ST2811, ST3816, and ST3945 having four genomes each. Two STs were represented by three genomes each, 47 STs were represented by two genomes each, and 106 STs were represented by one genome each. Twenty genomes had unknown ST information.

A search for known genetic mechanisms of antimicrobial resistance in the five Pereira genomes revealed the presence of the chromosomally located gene hp1181, which encodes an efflux pump, in all five genomes. However, its presence does not translate to an antibiotic resistant phenotype in H. pylori [57]. This gene was also detected in all other genomes from Colombia (n = 259 genomes; Supplementary Table S4). A small fraction of genomes carried the quinolone resistance gene qnrB5 (four genomes: two from Bogota and two from unknown locations) and the beta-lactam resistance gene blaTEM-181 (seven genomes from Nariño).

Distribution of virulence determinants and restriction modification systems

We examined the five genomes we sequenced for the presence of virulence-related genetic determinants. A total of 120 virulence-related genetic factors were detected in our genomes (range: 88 – 117 genes), of which 85 are present in all five genomes (Supplementary Table S5). Among the major virulence factors detected were those associated with adherence, effector delivery system (type IV secretion system [T4SS]), motility (flagella), exotoxin (vacA), immune modulation (lipopolysaccharide [LPS], outer inflammatory protein, Lewis antigen), and stress survival (urease). Eleven LPS-associated genes, which function in innate immune response and evasion [58], were detected in all five genomes. These include HP_RS07005, gluE, gluP, kdtB, lpxB, wbcJ, wbpB, rfaC, rfaJ, rfbD, and rfbM.

Two isolates, RPe_4 (ST4353) and RPe_43 (ST4356), carried similar virulence-related genes. Both harbored the cytotoxin associated gene A (cagA), which is implicated in gastric carcinoma in humans [59]. The gene product CagA is introduced into the gastric epithelial cells through the T4SS carried on a pathogenicity island [60]. The T4SS genes were present in only these two genomes. The activity of the CagA toxin is regulated by tyrosine phosphorylation of the C-terminal EPIYA (Glu-Pro-Ile-Tyr-Ala) motifs inside host cells [61, 62]. There is considerable variation in the EPIYA motifs and such variation is known to influence the clinical outcome of the disease [61,62,63]. The two genomes RPe_4 (ST4353) and RPe_43 (ST4356) harbored the variants EPIYA-A, EPIYA-B and EPIYA-C (Supplementary Figure S1), which is consistent with EPIYA patterns reported in H. pylori in Europe, the Americas, and Australia [61]. Importantly, there were no repeats of EPIYA-C in genomes from this study. The presence of fewer EPIYA-C repeats is associated with reduced risk of gastric cancer [62, 63]. Both genomes also carried the vacuolating cytotoxin gene (vacA), which is responsible for vacuolation, induction of apoptosis, and disruption of cellular pathways [64]. When we compared the Pereira genomes with genomes of H. pylori from other regions in Colombia (only those represented by ≥ 5 genomes), no significant difference was observed (p > 0.05, Wilcoxon signed rank test; Supplementary Figure S2) in the number of virulence genes per genome, except between Pereira and Tumaco (range = 117 – 119 virulence genes, p = 0.0032).

We examined the five genomes for the presence of genes involved in RM systems. RM enzymes play a critical role in bacterial defense against invasion by foreign DNA such as bacteriophages and conjugative plasmids [65]. Type I and II RMs were detected in all five genomes (Fig. 1b, Supplementary Table S6 and Table S7). For the type I RMs, either four (RPe_23, RPe_33 and RPe_46) or five (RPe_4 and RPe_43) genes encoding RM enzymes were identified per genome. Type I methyltransferases (M.Hpy99XX and M.Hpy99XV) and restriction endonuclease (Hpy99XVI) were present in all five genomes (Supplementary Table S6). Type II RM systems were also abundant in our five genomes. We identified genes encoding a total of 61 RM enzymes (Supplementary Table S7). The enzymes comprised 37, 17 and seven unique methyltransferases, restriction enzymes, and restriction enzyme/methyltransferases, respectively. The number of genes encoding RM enzymes present per genome ranged from 28 (RPe_46) to 34 (RPe_23). Among the Type II enzymes, four methyltransferases (M.HpyUM037VIII, M.Hpy99III, M.Hpy99XII, M.Hpy99XI) and six restriction endonucleases (Hpy299IX, Hpy8I, HpyCH4V, HpyGI, HpyC1I, HpyCH4I) were detected in all five genomes. The only Type III RM enzyme detected was a methyltransferase (M.Hpy99XXI) carried in the RPe_43 genome (Supplementary Table S8). No Type IV RM gene was detected in any of the five genomes. When genomes from Pereira were compared with genomes of isolates from other regions in Colombia (only those represented by ≥ 5 genomes), no significant difference was observed in the number of type I and type II RM genes per genome (p > 0.05, Wilcoxon signed rank test; Supplementary Figure S3, Tables S7 and S8). Type III RMs were detected in only a small fraction (16.21%; 42/259 genomes) of genomes from other regions in Colombia (Supplementary Table S8).

Mobile genetic elements in Colombian H. pylori genomes

We sought to characterize the presence of different mobile genetic elements (MGE) that may contribute to DNA mobility and genome plasticity in H. pylori. A total of eight (range = 1 – 3 per genome) full and linear phage DNA were detected across the five genomes (Supplementary Table S9). The highest number of phage DNA detected was three and two in RPe_43 and RPe_23, respectively, whereas one phage DNA was detected in each of the other three genomes. All phage DNA were single stranded and linear, with the exception of two double stranded DNA segments detected in the RPe_43 genome. The total sizes of phage DNA as a proportion of the genome ranged from 0.1% (1,585 bp in RPe_33) to 0.6% (9,884 bp in RPe_43). The total length of phage DNA per genome from Pereira versus other regions in Colombia (represented by ≥ 5 genomes harboring phage DNA) showed no significant difference (p > 0.05, Wilcoxon signed rank test; Supplementary Figure S4).

Next, we searched for the presence of insertion sequences (IS), which are transposable elements that encode transposases that catalyze intra- or inter-genome mobility [43]. The five genomes from this study harbored IS elements belonging to the IS21 (IS21_259 IS cluster) and IS200/IS605 (IS200/IS605_96 and IS200/IS605_384 clusters) families (Fig. 1c and Supplementary Table S10). IS elements belonging to the IS1595 (IS1595 cluster) family were present only in genomes RPe_4 and RPe_43. No plasmid replicon was detected in any of the five H. pylori genomes sequenced in this study.

Phylogenetic relationship of Colombian H. pylori with the global population

To place our five Pereira genomes in the broader geographical and phylogenetic context, we determined their evolutionary relationship with 3,285 publicly available genomes sampled from 60 countries (Fig. 2a,b and Supplementary Table S3). The majority of the genomes were collected from China (n = 629), Germany (n = 500), Australia (n = 328) and Colombia (n = 264, including the five genomes from this study), representing 52.3% of the entire dataset (Fig. 2a). Genomes from this global dataset were sampled between 1983 and 2022.

Fig. 2

Population structure and phylogenetic relationships of the global H. pylori population. a Maximum likelihood phylogenetic tree built using split k-mer distances. The dataset includes the five Pereira genomes sequenced in this study (marked by red stars) and 3,285 publicly available genomes. The tree is rooted at the midpoint. The scale bar represents the number of nucleotide substitutions per site. The branches are shaded to delineate the five dominant sequence clusters determined by fastBAPS. Outer rings (innermost to outermost) show the continent, country of origin, and year of sampling. b Continent and country-wide distribution of H. pylori genomes belonging to fastBAPS sequence clusters. Bar charts show the number of genomes of the five dominant sequence clusters. For clarity, only countries with ≥ 10 genomes are shown. Sequence clusters and countries with few genomes were grouped together in the category “Others”

Analysis of MLST profiles showed 643 unique STs in 1,116 genomes, whereas no ST information was available for 66% (n = 2,174) of the genomes. We used the Bayesian clustering algorithm implemented in fastBAPS [55] to delineate genomes into genetically similar phylogenetic sequence clusters. A total of 25 sequence clusters were observed in the global dataset. The most dominant were sequence clusters 5 (n = 1,156 genomes), 18 (n = 1,134 genomes), 2 (n = 252 genomes), 11 (n = 142 genomes), and 6 (n = 113 genomes), which altogether represented 85% of the global dataset (Fig. 2a). The five Pereira genomes sequenced in this study were found in different locations on the tree, but all are part of sequence cluster 18. The Pereira genomes were closely related to other genomes from Europe, North America, and South America, including the genomes from other parts of Colombia. The different sequence clusters appear to be differentially distributed across continents (Fig. 2b). For example, sequence cluster 18 was common in Africa (n = 111/150 genomes, 74%), Europe (n = 226/880 genomes, 25.68%), North America (n = 362/442 genomes, 81.90%), and South America (n = 310/345 genomes, 89.85%). In contrast, sequence cluster 5 was common in Asia (n = 963/1092 genomes, 88.18%) and sequence cluster 11 in Oceania (n = 123/337 genomes, 36.49%). Overall, the five Pereira genomes sequenced in this study exhibited close genetic similarity with other genomes from Colombia as well as other members of sequence cluster 18.
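The dataset proportions quoted in this section follow from simple counts. The short Python sketch below recomputes two of them from the numbers given in the text (3,285 public genomes plus the five Pereira genomes); the variable and function names are hypothetical.

```python
# Counts taken from the text.
TOTAL_GENOMES = 3285 + 5  # public genomes plus the five Pereira genomes

# Sizes of the five dominant fastBAPS sequence clusters.
dominant_clusters = {5: 1156, 18: 1134, 2: 252, 11: 142, 6: 113}

# Genomes from the four most heavily sampled countries.
top_countries = {"China": 629, "Germany": 500, "Australia": 328, "Colombia": 264}


def share(count, total):
    """Percentage of the total represented by count."""
    return 100.0 * count / total


dominant_share = share(sum(dominant_clusters.values()), TOTAL_GENOMES)
top_country_share = share(sum(top_countries.values()), TOTAL_GENOMES)

if __name__ == "__main__":
    print(round(dominant_share, 1), round(top_country_share, 1))
```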

Discussion

H. pylori infection is particularly concerning in Colombia, where > 80% of the population is estimated to be infected with H. pylori [66] and virulent genotypes reach proportions of > 90% [67]. High levels of H. pylori contamination have also been reported in water from the Bogotá River and domestic wastewater treatment plants, which poses a grave risk to communities who re-use the water [68]. Age-standardized incidence and mortality rates of stomach cancer in Colombia are among the highest in Central and South American countries (25.3 and 17.8 per 100,000 people, respectively) [69]. H. pylori has been implicated in the progression of gastric precancerous lesions [70, 71]. It is also associated with gastric cancer and dysplasia in Colombian patients [72]. Continuous and vigilant monitoring of H. pylori genotypes is therefore critical to reduce the public health and clinical burden of H. pylori infection in the country.

Here, we report the antimicrobial susceptibility profiles and genome content of five isolates from patients diagnosed with varying severity of gastritis in Pereira, Colombia. The five isolates represent four novel STs. Our analysis augments current knowledge about the circulating genotypes in the country. Although our small Pereira dataset does not reflect the country-wide prevalence of antimicrobial resistance phenotypes, it presents valuable data that can be incorporated into current resistance surveillance and drug therapy efforts within the city of Pereira. Comparison of genomic features such as the frequency of virulence genes, RM enzymes, and phage DNA between our Pereira dataset and other regions in Colombia revealed no significant differences apart from the number of virulence genes between Pereira and Tumaco, suggesting a lack of geographical structure in the gene pool of human-derived H. pylori across the country. These results suggest that only a few clonal lineages and specific clinically relevant genes are circulating in different parts of Colombia, which may reflect uniformity in selective pressures throughout the country. A more in-depth investigation of social attributes and antimicrobial usage within Colombia will clarify the factors contributing to the population genomic structure of H. pylori in the country.

More extensive sampling and comparative genomic analysis of H. pylori isolates from different gastritis types will be particularly informative in elucidating the bacterial genetic basis of gastritis. Variants of the cytotoxin genes cagA and vacA have been previously proposed as risk markers in patients with premalignant gastric lesions [64, 73, 74]. Clinical manifestations of gastritis can range from mild cases to severe presentations associated with significant morbidity [22], and sequence variation in cytotoxin and other virulence genes may potentially be used to differentiate and predict gastritis types. Our finding of the Western type cagA EPIYA-ABC is consistent with previous reports [42, 61]. However, fewer EPIYA-C repeats were observed, which has previously been reported to indicate reduced oncogenic risk [63]. Accurate diagnosis of oncogenic risks and implementation of appropriate treatment regimens that will be most effective for distinct clinical presentations are therefore necessary. Our results also provide an impetus for population-level genome surveillance efforts of gastritis-associated H. pylori in Colombia.

H. pylori utilizes a large arsenal of colonization and virulence factors to mediate the interaction with the human gastric niche and facilitate the onset of disease. Our results are consistent with previous reports on the high genetic diversity and virulence determinants of clinical H. pylori from other South American countries [75,76,77,78]. Genes associated with antimicrobial resistance, RM systems, and virulence, including cytotoxin genes, were identified. The observed prevalence of the efflux pump gene hp1181 in Colombian H. pylori is consistent with the results of a recent study of H. pylori in other locations [79]. The gene expression of hp1181, and possibly post-transcriptional regulation, has been reported to be associated with the resistant phenotype in multidrug resistant H. pylori [57]. Carriage of key virulence factors can affect the severity of gastroduodenal diseases and damage to the gastric epithelium by H. pylori [64]. The bacterial adoption of one or more mechanisms associated with adhesion, invasion, and toxin production is required to produce disease in the host [1]. In our study, cagA and vacA were detected in two strains that were resistant to metronidazole and to either clarithromycin or rifampin. The convergence of antimicrobial resistance phenotypes with the toxin determinants related to gastric lesions is troubling and these strains need to be closely monitored over the long term.

In addition to their pathogenicity attributes, H. pylori possess RM systems, which protect their DNA against adulteration by foreign DNA. These defense mechanisms consist of restriction endonucleases, which recognize and cleave DNA, and methyltransferases, which methylate the DNA sequences recognized by the restriction endonucleases [80, 81]. Depending on co-factor requirements, the structure of recognition sequences, organization, and sequence specificity, RM systems may be classified into types I – IV [80, 81]. Our finding that the five genomes carry few mobile genetic elements was not surprising, because RM systems act as a barrier to the introduction of extrachromosomal DNA in H. pylori [65, 82]. Our results are consistent with previous studies showing that the majority of RM systems in H. pylori are type II [83,84,85]. Furthermore, our results agree with previous reports of strain-specific patterns in the number and variety of RM systems in H. pylori [84, 85], as no two strains in our study possessed exactly the same complement of RM genes in their genomes. This strain-specific repertoire of RM genes is suspected to play a role in the virulence of H. pylori [65, 80]. Future work is needed to explore the under-appreciated contributions of RM systems and to determine whether the strain-specific RM distribution is associated with pathogenicity and clinical presentations, which would be beneficial in developing novel ways to treat specific H. pylori infections.
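
The observation that no two strains share the same complement of RM genes amounts to a pairwise comparison of gene sets. The following is a minimal sketch of such a comparison, not the analysis performed in this study. The strain names and gene labels are hypothetical placeholders, and the Jaccard index is introduced here only as one convenient way to quantify overlap.

```python
from itertools import combinations

# Hypothetical placeholder data: RM gene labels detected in each genome.
rm_complements = {
    "strain_1": {"rm_a", "rm_b", "rm_c"},
    "strain_2": {"rm_a", "rm_b", "rm_d"},
    "strain_3": {"rm_a", "rm_c"},
    "strain_4": {"rm_b", "rm_c", "rm_d", "rm_e"},
    "strain_5": {"rm_a", "rm_e"},
}


def identical_pairs(complements):
    """Return strain pairs whose RM gene sets are exactly the same."""
    return [
        (a, b)
        for a, b in combinations(sorted(complements), 2)
        if complements[a] == complements[b]
    ]


def jaccard(x, y):
    """Shared genes divided by all genes seen in either strain."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# An empty list means every strain has a unique RM complement.
print(identical_pairs(rm_complements))

for a, b in combinations(sorted(rm_complements), 2):
    print(a, b, round(jaccard(rm_complements[a], rm_complements[b]), 2))
```

With real annotations in place of the placeholder sets, an empty result from `identical_pairs` would correspond to the strain-specific pattern described above, and the pairwise Jaccard values would show how much of the RM repertoire is shared between strains.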

Whole genome sequencing of clinical isolates from poorly represented countries such as Colombia is integral to understanding the global population structure, mechanisms of infection and pathogenicity, and disease burden of the opportunistic pathogen H. pylori, and to detecting outbreaks. Our results on the phylogeny of Colombian and global H. pylori genomes are consistent with previous studies that show a close relationship with genomes from other countries as well as the evolution of independent clades within Colombia [86,87,88]. Our study offers valuable insights into the similarities between the Pereira genomes and those from other regions in Colombia, which to our knowledge have not been reported before. Hence, our study should be considered an important baseline census of the standing genomic and lineage diversity of H. pylori in different parts of the country. Our findings will inform current epidemiological efforts and spur countrywide efforts toward a more systematic sampling scheme of H. pylori from a variety of clinical presentations, carriage, therapeutic interventions, hospitals, and environments in Colombia.

Conclusions

We present the draft genome sequences of five H. pylori isolates from patients diagnosed with gastritis in Pereira, Colombia. These genomes represent four novel STs. Their genome sequences reveal a variety of determinants of antimicrobial resistance and virulence, which can be mobilized by phages and IS elements. In the context of the Colombian and global H. pylori populations, the five isolates from Pereira were closely related to lineages circulating in other parts of Colombia.