Introduction

Establishment of the gastrointestinal (GI) microbiome in infants is widely recognized as a fundamental developmental process with lasting health effects1,2. Microbial colonization of the GI tract begins during the birthing process, and then passes through multiple stages3,4 before reaching a stable adult-like state around three years of age5. Each developmental stage represents a huge reservoir of microbial genes with potential functions in symbiotic as well as antagonistic host interactions, including genes involved in antibiotic drug resistance, virulence, and the transfer of genetic material6.

The spread of antibiotic resistance is one of the major challenges in global public health, with infections caused by antibiotic-resistant bacteria becoming a leading cause of death worldwide7. Antimicrobial substances, and their corresponding resistance genes, are of ancient origin8,9, with antibiotic resistance genes (ARGs) found in pristine and isolated environments like permafrost10, secluded caves11, and the remote Antarctic12. ARGs have even been detected in feces of individuals from isolated Amazon tribes13, and in the stomach contents of a 5000-year-old mummified Tyrolean14. These examples demonstrate that antimicrobial compounds, and the resistance mechanisms to defeat them, are an integral part of microbial evolutionary history.

It should come as no surprise then, that the human GI microbiome, one of the most densely populated microbial ecosystems known to science, acts as a reservoir for antibiotic resistance genes (ARGs), the so-called human GI resistome. Compared with human adults, the infant GI microbiome tends to have high ARG carriage15,16. Furthermore, the infant microbiome often has high abundances of pathobionts, microbes that are part of the normal gut community but with a relatively high virulence potential, such as Escherichia coli, Klebsiella pneumoniae, and Staphylococcus aureus. These species are of highest concern for ARG surveillance according to the World Health Organization’s Global Antimicrobial Resistance and Use Surveillance System (GLASS) 2022 report17.

In addition to acting as an ARG reservoir, the density of the GI microbial community can promote the spread of ARGs to pathogenic bacteria through mechanisms of horizontal gene transfer (HGT). There is evidence of extensive intra-individual HGT in the human GI microbiome, especially in industrialized populations18. Similar to ARGs, mobile genetic elements (MGEs) are also enriched in the infant GI microbiome15,19, and ARGs associated with MGEs have a heightened risk of transfer to pathogens20. However, there seems to be a bias in transfer of ARGs depending on the resistance mechanism21.

Here, we present a detailed map of the GI resistome over the first year of life in the BabyBiome cohort of born-to-term infants delivered in the Oslo area of Norway. As we have previously reported, community dynamics in the infant GI microbiome are both complex and individual3. In our previous study, fecal samples were collected from 12 infants on a near daily basis over the first year of life (2684 samples in total) and analyzed using 16S rRNA gene amplicon sequencing for characterizing taxonomic shifts throughout the colonization period. For the present study, 149 samples were selected in order to span the first year of life at monthly intervals and subjected to deep metagenomic sequencing. The resulting metagenomes were analyzed for ARG content against the Comprehensive Antibiotic Resistance Database (CARD) by both assembly-free mapping and metagenome assembled genome (MAG) reconstruction and annotation. In addition to the extensive catalogue of ARGs that we observed in the infant microbiomes, we uncovered longitudinal patterns, such as significant decreases over time for most major classes of antibiotic resistance. Furthermore, our dense longitudinal sampling scheme allowed us to assign ARGs to specific taxa by correlating contemporaneous ARG abundances and population dynamics. Our results demonstrate that, in general, ARGs tend to track specific bacterial lineages. We also describe the dynamics of known virulence factors (VFs) and show that these genes display highly similar dynamics and taxonomic associations as ARGs. Finally, we report genetic linkages between ARGs and MGEs, demonstrating temporal trends and preferential associations between specific MGE and ARG categories.

Methods

Sample processing and sequencing

Sample collection, storage and processing were described in de Muinck and Trosvik3. Briefly, 12 infants were sampled on a near-daily basis over the first year of life and designated ID1–12, with ID4 and ID5 being siblings and ID10 and ID11 dizygotic twins. A total of 149 samples were selected for deep shotgun sequencing in order to reflect monthly intervals over the first year of life for each of the 12 infants (Fig. S1). The criteria for choosing samples for shotgun sequencing were as follows. For each individual, we took the days of the first and last samples and generated a sequence of target days spanning that period in increments of 30 days (the target sampling interval). We then identified the actual samples that were closest in time to the target days. If the day of the last selected sample was more than 10 days removed from the actual final day of sampling, the final day of sampling was added to the sequence. In instances where the available sample material from a preferred day was deemed insufficient, the sample closest in time to that day was selected instead. Shotgun sequencing libraries were prepared with the Kapa Hyper library kit (Roche) and sequenced on an Illumina NovaSeq 6000 instrument using a full S4 flow cell in 150 bp paired-end (PE) mode.
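The selection rule described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name, the tie-breaking rule (the earlier day wins when two samples are equally close to a target) and the omission of the fallback for insufficient sample material are all assumptions made for illustration.

```python
def select_monthly_samples(sample_days, step=30, max_gap=10):
    """Pick the sampled days closest to a grid of target days.

    sample_days: days after birth on which a sample exists for one infant.
    step:        target sampling interval in days (30 in the text).
    max_gap:     if the last selected day is more than this many days
                 before the final sampling day, the final day is added.
    """
    days = sorted(set(sample_days))
    first, last = days[0], days[-1]
    selected = []
    for target in range(first, last + 1, step):
        # Closest actual sampling day; ties go to the earlier day (assumption).
        closest = min(days, key=lambda d: (abs(d - target), d))
        if closest not in selected:
            selected.append(closest)
    if last - selected[-1] > max_gap:
        selected.append(last)
    return selected
```

For a hypothetical infant sampled on days 1, 2, 3, 29, 33, 60, 62, 95 and 120, the target days are 1, 31, 61 and 91, and the final sampling day is appended because it lies more than 10 days after the last selected sample.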

Sequence data processing

Raw sequence reads were trimmed and filtered using Trimmomatic v0.3922. Reads of host origin were identified by alignment to the human genome sequence GRCh38.p13 using Bowtie2 v2.4.223 and removed using SAMtools v1.1124.

Read-based taxonomic assignment was done using MetaPhlAn 325. Assembly-free analysis of ARG content in metagenomes was done using ShortBRED v0.9.426 and the Comprehensive Antibiotic Resistance Database (CARD) v3.2.6, specifically the protein homolog model within that database27. The total mapping rate to CARD was computed using the ‘flagstat’ function in SAMtools v1.1124.

Reads were assembled into contigs using MEGAHIT v1.2.928. Reads were mapped back to contigs using BWA v0.7.1729 and the resulting SAM files were sorted by coordinate and converted to binary format (BAM) using SAMtools. Contigs were binned using CONCOCT v1.1.030. Bins were inspected for completeness, contamination and strain heterogeneity using CheckM v1.1.331, which was also used to compute relative MAG coverage. MAGs with less than 50% estimated completeness were discarded, as were MAGs with estimated contamination higher than 10%, unless they also showed a strain heterogeneity of at least 50%, indicating that the MAGs were composites of two or more closely related strains. Taxonomic classification of MAGs was done with GTDB-Tk v1.732. MAG ARG annotation was carried out using rgi v6.0.1 with the full CARD database27 and the --low_quality setting in order to capture partial gene sequences on the edges of contigs.
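The MAG quality filter stated above reduces to a simple predicate. The following Python sketch is illustrative only; the function name and argument names are hypothetical, and the three inputs are assumed to be the percentage values reported by CheckM.

```python
def keep_mag(completeness, contamination, strain_heterogeneity):
    """Return True if a MAG passes the quality filter described in the text.

    All arguments are percentages (0-100).
    """
    # Discard MAGs with less than 50% estimated completeness.
    if completeness < 50:
        return False
    # Discard MAGs with more than 10% contamination, unless strain
    # heterogeneity is at least 50% (composite of closely related strains).
    if contamination > 10 and strain_heterogeneity < 50:
        return False
    return True
```

Under this rule, a MAG with 90.7% completeness, 11.1% contamination and 73.4% strain heterogeneity is retained, whereas one with 80% completeness, 15% contamination and 20% strain heterogeneity is discarded.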

For identification of virulence factors we used BWA v0.7.1729 to align sequence reads to the virulence factor database core DNA sequence set (VFDB; downloaded August 8 2023)33. Prior to alignment, redundancy was removed from the database by sequence clustering using the cluster_fast function in VSEARCH v2.22.134 with the --centroids argument and --id set to 0.99. Alignment coverage was computed with the ‘coverage’ function in SAMtools, and VF markers with a mean coverage of > 90% were considered hits. Mean coverage values were normalized by dividing by library sizes.

In order to look for potential linkages between ARGs and mobile genetic elements (MGEs), all final contigs were annotated with CARD using rgi v6.0.1. The resulting ORFs found by the rgi Prodigal module35 were further annotated for MGEs with the MobileOG database v1.636 using DIAMOND v11.3.037. CARD and MobileOG annotations with at least 80% fragment coverage and identity were considered hits. MGE annotations were classified using the MobileOG script getElementClassifications.R into the categories Plasmid, Phage, Integrative Element (IGE), Conjugative Element (CE) and Insertion Sequence (IS). Coverage on contigs was computed using the ‘coverage’ function in SAMtools, and mean coverage values were normalized by dividing by library sizes.

Statistical analysis

All statistical analyses were carried out with R38. PERMANOVA and non-metric multidimensional scaling (NMDS) were carried out with the functions ‘adonis2’ and ‘metaMDS’ in the vegan 2.5.6 package. GAM modelling was done with the ‘gam’ function in the mgcv package39. Correlation tests were done with the ‘cor.test’ function in the stats package. Benjamini–Hochberg false discovery rate correction was done with the ‘p.adjust’ function with method set to ‘BH’. Random forest models were computed with the ‘randomForest’ function in the randomForest v4.7-1.1 package, using 501 trees, while model significance was computed with the ‘rf.significance’ function in the rfUtilities v2.1-5 package with 999 permutations. PHATE dimensionality reduction was done with the ‘phate’ function in the phateR package v1.0.740, with the gamma parameter set to 0. For testing for time trends in the data, we used the Spearman correlation test in order to be conservative and not make assumptions about the underlying data distribution, which is poorly defined. For matching pairs of time series, we used Pearson’s correlations for increased sensitivity.
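For readers unfamiliar with the Benjamini–Hochberg procedure, the adjustment can be sketched in a few lines of Python. This is a plain re-implementation of the standard step-up rule for illustration, not the R code used in the study; the function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Scale by m / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the four hypothetical p-values 0.01, 0.04, 0.03 and 0.005, the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02.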

Ethics approval and consent to participate

This study was carried out in accordance with relevant guidelines and regulations. All methods and protocols were approved by the Regional Ethics Committee of South-East Norway, REK application 2014/656 ID 24485. Informed consent was obtained from the legal guardians of all the participants in the study.

Results

Core/pan resistome and main ARG structuring factors

Samples from the twelve infants in the cohort were selected in order to represent the temporal distribution of microbiome taxonomic profiles during the first year of life (Fig. S1). Shotgun sequencing resulted in an average number of DNA sequence reads per sample of 121,121,684 (± 32,477,513 s.d.). By and large, there was good correspondence between the taxonomic classification obtained from 16S rRNA amplicon sequencing3 and shotgun sequencing, with a total mean Pearson’s correlation of 0.79 for the four main phyla (Bacteroidetes, Firmicutes, Proteobacteria and Actinobacteria). There were, however, some substantial inconsistencies depending on both the taxon and the individual (Table S1), and there was a tendency for amplicon sequencing to be biased towards Firmicutes and Proteobacteria and against Bacteroidetes and Actinobacteria (Table S2). A total of 1682 ARGs were identified using the read-based approach with ShortBRED, with a mean carriage of 140 ± 27 ARGs in the 12 infants, and a mean of 0.27% (range 0.06–1.30%; Table S3) of reads mapped to CARD. Of the 260 uniquely identified ARGs, 47 were found in all 12 infants, and 47 were unique to a single child (Fig. 1a, Table S4). The core resistome was dominated by efflux systems, particularly systems associated with the TolC outer membrane channel. For example, the AcrAB, EmrAB and mdtABC efflux complexes all depend on TolC for function and are typically associated with E. coli. On the other hand, ARGs unique to a single individual functioned predominantly through drug inactivation. Both in terms of the number of ARGs identified and the overall mapping rate, efflux systems were the most prevalent type of resistance mechanism (Fig. 1b and c). However, while the mapping rate was five times higher for efflux systems relative to the second most prevalent category (drug inactivation), the corresponding ratio for the total number of unique ARGs was only 1.2, demonstrating that many ARGs are associated with non-dominant taxa.

Figure 1

The core resistome is dominated by efflux systems and shows an overall time trend over the first year of life. (a) Pan and core resistome. The colored stacked bar segments represent the number of genes in each mechanistic category. Five genes (out of 260) had a dual mechanistic classification, and those genes were counted to both categories (e.g. both efflux and permeability), so that the height of a bar is not in all cases an entirely accurate reflection of the actual gene count. In category 1 the number of genes is actually 47 but comes out as 49 in the chart. In categories 9 and 10 the actual numbers are 22 and 18, but come out as 23 and 19, respectively. The size of the core genome is 47, but comes out as 48. (b) Efflux mechanisms ranked by mapping rate. (c) Efflux mechanisms ranked by the number of genes classified to each category. (d) NMDS model of ARG profiles in all samples. Dots are color coded according to day of sampling after birth, indicated on the colored bar at the bottom left, with dark blue indicating samples collected close to the day of birth and dark red samples collected close to the first birthday. Figure S2 shows the same plot with the dots color coded to indicate the identity of each infant.

The most important structuring factor was the individual (R2 = 0.21, p < 0.001, PERMANOVA, Fig. S2) when looking at ARG carriage in the full cohort. However, there was also a highly significant overall temporal progression in ARG composition, both when looking at the data set as a whole (R2 = 0.05, p < 0.001, Fig. 1d), and when looking at each of the twelve infants individually (p < 0.01 for all tests). These results were also corroborated with random forest models. A regression model with sampling day since birth as response variable resulted in an R2 of 0.66 (p < 0.001), while a classification model with child identity as the response variable resulted in a classification success rate of 80.5% (p < 0.001). Interestingly, the classification model was unsuccessful in discriminating the individuals comprising the twin pair, ID10 and ID11, with a combined error rate of 54%. As described previously3, ID10 underwent an 11-day period of treatment with multiple antibiotics from the age of 22 to 33 days, due to a Streptococcus infection. As shown in that study, this did not have major lasting impacts on the taxonomic composition of the gut microbiome or subsequent colonization by Streptococcus spp. Here, we found that, while the mapping rate to CARD was slightly elevated in ID10 relative to ID11, the difference was not statistically significant (p = 0.35, Wilcoxon rank sum test excluding samples taken before antibiotic treatment). Together, these results indicate uniqueness of ARG carriage amidst overall general temporal trends in gut resistome composition.

ARG carriage declines over the first year of life

Overall, resistance was most prevalent against beta-lactam antibiotics (Fig. 2). This was also the case at the individual level in eleven of the twelve infants (Fig. S3). Tetracycline was the predominant drug resistance class for ID2, while beta-lactam was only the fifth most prevalent drug class in this individual. When splitting beta-lactam into sub-classes (carbapenem, penam, penem, cephalosporin, cephamycin and monobactam), the most abundant resistance class in seven of the infants was fluoroquinolone. Genes for tetracycline resistance were dominant in four infants, while cephalosporin resistance was the most prevalent class in ID12 (Fig. S4). Resistance to sulphonamides and glycopeptides was rare, and not detected at all in several infants.

Figure 2

Beta-lactam antibiotics were the most predominant drug class to which resistance was observed. The bars indicate the main classes of antibiotics ranked according to normalized abundance (RPKM = reads per kilobase of reference sequence per million sample reads). The beta-lactam class is the sum of the sub-classes carbapenem, penam, penem, cephalosporin, cephamycin and monobactam.
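The RPKM normalization defined in the caption can be written out as a short calculation. The Python sketch below is an illustration under the caption's definition only; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(mapped_reads, reference_length_bp, total_sample_reads):
    """Reads per kilobase of reference sequence per million sample reads."""
    kilobases = reference_length_bp / 1_000
    millions_of_reads = total_sample_reads / 1_000_000
    return mapped_reads / (kilobases * millions_of_reads)


def class_abundance(subclass_rpkm):
    """Sum sub-class abundances, as done for the beta-lactam class."""
    return sum(subclass_rpkm.values())
```

For example, 500 reads mapped to a 1000 bp reference in a sample of 100 million reads correspond to an RPKM of 5.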

In addition to large temporal fluctuations, the infants showed highly individual temporal trajectories with respect to carriage of ARGs conferring resistance to various classes of antibiotics (Fig. 3). Relative abundance patterns of ARGs were quite stable (Fig. S5), but with a tendency towards expansion of tetracycline, lincosamide and streptogramin resistance genes at later time points. The overall percentage of reads mapped to the CARD database showed a highly significant linear downward trend over time (R2 = 0.22, p < 0.001, generalized additive model) (Fig. 4a). ARG diversity also showed a significant, albeit slightly non-linear, negative time trend (R2 = 0.18, p < 0.001) (Fig. 4b). Significantly reduced abundances over time were observed in ARGs representing 15 out of the 18 main antibiotic drug classes (Table 1), while resistance to the two rare classes, glycopeptides and sulphonamides, showed the opposite trend, and streptogramin resistance genes did not show any significant time trend. ARGs within three of the six main resistance mechanisms (antibiotic efflux, target alteration and reduced drug permeability) also decreased significantly in abundance over time (Table 2).

Figure 3

The infants showed highly individual temporal trajectories with respect to carriage of ARGs conferring resistance to various classes of antibiotics. Time trajectories of ARGs in all 12 infants in the cohort. The y-axes indicate normalized ARG abundances in RPKM. Abundances are broken down into the 18 identified main drug classes. ARGs conferring resistance to multiple drugs are counted multiple times. The drug classes are ordered by total abundance across all infants, from bottom to top. The x-axis is common to all panels.

Figure 4

The percentage of reads mapped to the CARD database and ARG diversity decreased over the first year of life. (a) Percentage of sequence reads mapping to the CARD database plotted against day after birth. (b) Diversity of ARGs over time, as measured by Shannon entropy. The trend lines represent generalized additive models, with five degrees of freedom for the regression spline in order to accommodate significant non-linearities in the predictor-response relationships. The grey shaded areas represent 95% confidence intervals. The dots are color coded according to individual provenance of samples as indicated in (a). Both models are highly significant (p < 0.001). For (a) estimated degrees of freedom (edf) is 1, indicating a linear relationship. For (b) edf = 1.9, indicating some non-linearity.
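The Shannon entropy used as the diversity measure in (b) can be computed from a vector of ARG abundances as in the following Python sketch. The use of the natural logarithm is an assumption, since the logarithm base is not stated here, and the function name is hypothetical.

```python
import math


def shannon_entropy(abundances):
    """Shannon entropy H = -sum(p_i * ln(p_i)) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for value in abundances:
        if value > 0:
            p = value / total  # relative abundance of this ARG
            entropy -= p * math.log(p)
    return entropy
```

A sample with four equally abundant ARGs gives ln(4), approximately 1.386, whereas a sample dominated by a single ARG gives a value close to zero.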

Table 1 Significantly reduced abundances over time were observed in ARGs representing 15 out of the 18 main antibiotic drug classes.
Table 2 ARGs within three of the six main resistance mechanisms decreased significantly in abundance over time.

ARG carriage correlates with Proteobacteria abundance and E. coli in particular

The infant gut microbiome is highly dynamic over the first year of life, with variability in population abundances on multiple levels of taxonomy. Leveraging our longitudinal sampling effort allowed us to evaluate contemporaneous correlations between CARD mapping rates and the population dynamics of specific taxa. The percentage of sequence reads mapped to CARD generally correlated strongly with relative abundances of Proteobacteria (mean Pearson R2 = 0.81; Fig. 5), with significant contemporaneous correlation observed in all children except for ID12 (Table 3). In this child, the CARD mapping rate did not correlate strongly with any particular phylum, and the highest correlation was with Actinobacteria (R2 = 0.57, p = 0.053). ID12 had by far the lowest average CARD mapping rate (Table S3) of the 12 children, as well as the lowest mean relative abundance of Proteobacteria (1.82%, Table S5) (Fig. 5), and excluding this infant resulted in an increased mean correlation of 0.87 between the CARD mapping rate and relative abundance of Proteobacteria.

Figure 5

The percentage of sequence reads mapped to CARD correlated with relative abundances of Proteobacteria and E. coli in particular. Relative abundances of Proteobacteria (black lines) and E. coli (red lines) are indicated on the left side y-axes. Percentages of sequence reads (green lines) mapped to CARD are indicated on the right side axes. See Table 3 for correlations.

Table 3 The percentage of sequence reads mapped to CARD generally correlated strongly with relative abundances of Proteobacteria and E. coli in particular.

In the full cohort, there was a significant decrease in the population of Proteobacteria over time (Spearman’s rho = − 0.45, p < 0.001). E. coli was the dominant species within that phylum (Fig. 5; Table S6) in the majority of the children, and the mean contemporaneous correlation between CARD mapping rate and E. coli relative abundance was R2 = 0.69. These correlations were statistically significant for nine infants, marginally significant for ID8 and 12, and non-significant for ID6 (Table 3) where hardly any E. coli was observed (Fig. 5). In ID6, the CARD mapping efficiency tracked the dynamics of Proteobacteria, but this could be attributed mainly to the Klebsiella species K. pneumoniae (R2 = 0.99) and K. quasipneumoniae (R2 = 0.97). In ID12, unlike the other infants, we were unable to identify specific taxa with which the ARG mapping rate correlated strongly.

Although at differing degrees of prevalence, E. coli was observed in all twelve infants, mostly with some consistency (Fig. 5), and as demonstrated above this species displayed high ARG carriage. Using matching criteria of at least five non-zero observations of gene and taxon, and a cutoff of 0.97 for the Pearson correlation, we assigned ARGs to E. coli by matching gene dynamics with the dynamics of the E. coli populations in 11 of the 12 infants. Since E. coli was only observed in ID6 at the four final time points, at a mean relative abundance of 0.05%, we were not able to associate any ARGs with E. coli using this method for this infant. Instead, in ID6, the high contemporaneous correlation between ARG mapping rates and the dynamics of Proteobacteria was found to be driven to a large degree by Klebsiella spp. (Fig. S6). For the 11 other infants, an average of 26 ARGs were linked to E. coli (range 5–41; Fig. S6), with 43 unique ARGs identified in the set (Table S7).
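The matching criteria described above can be sketched in Python as follows. This is an illustration rather than the analysis code: the function names are hypothetical, and the reading of "at least five non-zero observations of gene and taxon" as time points at which both series are non-zero is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assign_args_to_taxon(arg_series, taxon_series, min_nonzero=5, r_cutoff=0.97):
    """Return the names of ARGs whose abundance tracks the taxon over time.

    arg_series:   dict mapping ARG name -> abundances at each time point.
    taxon_series: abundances of the taxon at the same time points.
    """
    assigned = []
    for name, series in arg_series.items():
        # Time points at which both the gene and the taxon were observed.
        both_nonzero = sum(1 for g, t in zip(series, taxon_series) if g > 0 and t > 0)
        if both_nonzero < min_nonzero:
            continue
        if pearson_r(series, taxon_series) >= r_cutoff:
            assigned.append(name)
    return assigned
```

An ARG observed at too few time points is skipped regardless of its correlation, which is consistent with no ARGs being assigned to E. coli in ID6, where the species was detected at only four time points.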

The majority of ARGs were annotated on Proteobacterial MAGs

In addition to using a read-based approach and contemporaneous correlation to link ARGs with taxa, we also assembled reads into contigs and binned contigs into MAGs. After filtering, the mean per infant number of MAGs was 73 (range 30–97; Supplementary Data 1). Since contamination from closely related genomes could lead to the artefactual occurrence of multiple ARG copies on a MAG, each unique ARG was counted only once per genome. CARD annotation, with default rgi settings, of all MAGs in all the children combined resulted in 7150 hits. Interestingly, the percent amino acid identity of hits showed a bimodal distribution, with a roughly normally distributed group of low identity hits centered around 35–40% amino acid sequence identity (Fig. S7). When focusing on this group by only looking at hits with less than 50% identity (3736 hits), we found that 86.7% of these hits were annotated as conferring resistance to glycopeptide antibiotics, more specifically as antibiotic target-altering enzymes conferring vancomycin resistance. We suspect that these hits are false positives that actually represent widely distributed enzymes with various functions in producing cell wall precursors.

In order to remove false positive hits we applied filtering on 80% fragment coverage and amino acid identity, which left 1989 ARGs that were used for subsequent analysis, with a per infant mean of 166 ± 53. On the phylum level, 80.5% of these ARGs were found on MAGs classified as Proteobacteria, followed by Firmicutes (15.0%) and Actinobacteria and Bacteroidetes (both at 2.0%). On the species level, 17 out of the 20 top ARG carriers were classified as Proteobacteria (Table 4), with E. coli having by far the highest carriage at 28.0% of all annotations. The three species in the top 20 that were not members of the Proteobacteria were Staphylococcus aureus (4.0%), Enterococcus faecalis (2.3%) and Staphylococcus epidermidis (1.1%). In all twelve infants, E. coli was the dominant carrier of ARGs.
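The two counting rules used for MAG annotation, the 80% coverage and identity filter and the counting of each unique ARG only once per genome, can be combined as in the Python sketch below. The record fields ("mag", "aro", "identity", "coverage") are hypothetical names chosen for illustration and do not correspond to the column names of the rgi output.

```python
def count_args_per_mag(hits, min_identity=80.0, min_coverage=80.0):
    """Count unique ARGs per MAG after identity and coverage filtering.

    hits: iterable of dicts with keys "mag", "aro", "identity", "coverage",
          where identity and coverage are percentages.
    """
    # A set of (MAG, ARG) pairs counts each unique ARG once per genome.
    kept = {
        (hit["mag"], hit["aro"])
        for hit in hits
        if hit["identity"] >= min_identity and hit["coverage"] >= min_coverage
    }
    counts = {}
    for mag, _aro in kept:
        counts[mag] = counts.get(mag, 0) + 1
    return counts
```

With this filter, a hit at 37% amino acid identity, such as those in the low-identity mode of the bimodal distribution, is removed even when its fragment coverage is high.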

Table 4 17 out of the 20 top ARG carriers are classified as Proteobacteria, with E. coli having the highest percent carriage.

The majority of ARGs assigned to E. coli by contemporaneous correlation are found on E. coli MAGs

Since E. coli had the highest carriage of ARG by a large margin, we focused our analysis on E. coli in order to compare ARG assignment by contemporaneous correlation with MAG annotation. First, the estimated proportion of reads mapped to E. coli MAGs was highly correlated with E. coli population dynamics estimated with MetaPhlAn, with a mean Pearson’s R2 of 0.95, demonstrating good agreement between the two approaches. In fact, the mean correlation was pulled down by ID6 (R2 = 0.68), without which, the mean correlation was 0.97. Next, we compared ARGs assigned to E. coli by contemporaneous correlation in an infant with ARGs annotated on the corresponding E. coli MAGs. E. coli MAGs were generally of good quality (mean completeness, contamination and strain heterogeneity of 90.7, 11.1 and 73.4%, respectively; Table S8). MAG completeness was reduced in ID10 and ID11, the twins, likely due to the presence of a second Escherichia MAG that could not be reliably classified to the species level, and that likely confounded the binning process. In the other 10 children, a single Escherichia MAG was found and classified as species coli (mean completeness 95.8%). The mean number of ARGs annotated to the E. coli MAGs was 44 (range 32–51), with 13 genes found in all 12 infants and 7 in only a single individual (Fig. S8). In addition, the majority of the ARGs assigned to E. coli by contemporaneous correlation were also found on the corresponding MAG in each child (mean 72.6%), with the exception of ID6, where no genes were assigned by correlation.

A total of 29 unique ARGs were assigned to E. coli by contemporaneous correlation in at least one child without being found on the corresponding E. coli MAG in the child in which the correlation was observed (Table S9). Most of these were found in only one or a few infants, but an AmpC beta-lactamase gene (ARO: 3004290) conferring resistance to penicillin-like and cephalosporin-class antibiotics was assigned to E. coli by correlation in ID1, 2, 3, 4, 8, 10 and 11. The fact that this ARG was identified in seven infants, and not found on the corresponding E. coli chromosomes, suggests that it may be associated with a plasmid, or some other mobile element that was not readily assembled on a contig or successfully binned. As another example, in six of the infants we associated acrA (ARO: 3004042) with E. coli using contemporaneous correlation, while this gene was not annotated on the corresponding E. coli MAGs. This gene is normally associated with an efflux system in Enterobacter cloacae.

Virulence factor carriage declines over the first year of life, and correlates with Proteobacteria, and Escherichia coli in particular

We also investigated the prevalence of virulence factors (VFs) in the metagenomes of the 12 infants. We identified a total of 4487 genes encoding VFs in all 12 infants combined, and 742 unique VF genes in the full dataset, with a mean mapping rate to VFDB of 0.48% (range 0.02–2.65%). Of these genes, 73 were shared between all 12 infants, while 131 were unique to a single child (Fig. S9). As for ARGs, the strongest structuring factor for VF carriage was identity (R2 = 0.22, p < 0.001, PERMANOVA), while time of sampling also had a highly significant overall effect (R2 = 0.07, p < 0.001). As with ARGs, the proportion of reads mapping to the VF database showed a significant general negative time trend over the first year of life (p < 0.001, GAM; Fig. S10a). This trend was significant in 10 of the 12 VF categories observed, with stress-survival and motility being the only exceptions (Table S10). Unlike for ARGs, we did not see a significant trend in VF diversity (Fig. S10b). The read mapping rate to VFDB showed a significant contemporaneous correlation with the CARD mapping rate in all 12 infants (mean R2 = 0.87, Table 5), as well as a strong correlation (mean R2 = 0.82) with the relative abundance of Proteobacteria (Fig. S11, Table 5). As for ARGs, the exception was ID12, where the correlation was negative and non-significant, while the relationship was significant in the other infants (Table 5). Overall VF carriage also correlated strongly with E. coli relative abundance (mean R2 = 0.75; Fig. S11), with a significant relationship for all infants except ID6 and ID12 (Table 5). Furthermore, excluding ID6, a mean of 61 individual VF genes (range 10–137) showed strong (R2 > 0.97) contemporaneous correlation with E. coli population dynamics (Fig. S12).

Table 5 The mapping rate to VFDB showed a significant contemporaneous correlation to the CARD mapping rate as well as a strong correlation with the relative abundance of Proteobacteria and E. coli in particular.

Contigs with co-localized ARGs and MGEs decrease in abundance over time

The potential of ARGs to be mobilized and transferred from one taxon to another was investigated by annotating the contig assemblies for both ARG and MGE markers in order to identify physical linkages between the two gene categories. In total, 2726 ARGs were annotated on 2002 contigs, and 729 of these contigs also carried one or more putative MGEs. Kaszab et al.41 used a universal cutoff of 10 kbp separation between an ARG and an MGE in order for an ARG to be considered to have a high probability of being mobilizable. This cutoff was considered conservative relative to those used by other investigators42,43. For the five MGE categories defined by MobileOG, we modelled the cumulative percentage of ARGs linked with MGEs as a function of the distance between ARGs and the nearest MGE (Fig. 6a). The total proportion of annotated ARGs located within 150 kbp of an MGE was 41%, with more than half of these found in relatively close proximity (< 10 kbp; Table 6). Considering all MGE-linked ARGs, irrespective of MGE category or the distance between ARGs and MGEs, normalized abundances showed a significant overall negative trend over time (Fig. 6b). The same time trend was seen for contigs with ARGs and MGEs in close proximity (< 10 kbp) (Fig. 6c). Significant negative time trends were also observed when considering plasmids and phages separately (Table 7). In contrast, IGEs showed a significant positive time trend. However, it should be noted that the categories Plasmid and Phage (557 and 310 contigs, respectively) showed a substantial overlap (190 contigs). This is due to both dual MGE categories for certain MobileOG genes, like ratA and hokG (104 contigs), and annotation of both plasmid and phage genes on the same contig (163 contigs). The MGE category IS co-localized with ~ 1% of the total ARGs and was excluded from further analysis.
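The proximity analysis described above can be illustrated with a minimal Python sketch that computes, for each ARG, the distance to the nearest MGE on the same contig and the percentage of ARGs falling below a distance cutoff. How the distance was measured is not specified here, so the definition used below (the gap in base pairs between the two ORF intervals, zero if they overlap) is an assumption, as are the function names and record fields.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Base pairs separating two ORFs on one contig (0 if they overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def nearest_mge_distance(arg, mges):
    """Distance from an ARG to the closest MGE on the same contig, or None."""
    gaps = [
        interval_gap(arg["start"], arg["end"], mge["start"], mge["end"])
        for mge in mges
        if mge["contig"] == arg["contig"]
    ]
    return min(gaps) if gaps else None


def percent_within(args, mges, max_distance_bp):
    """Percentage of ARGs located less than max_distance_bp from an MGE."""
    if not args:
        return 0.0
    linked = 0
    for arg in args:
        distance = nearest_mge_distance(arg, mges)
        # ARGs on contigs without any MGE have no distance and are not linked.
        if distance is not None and distance < max_distance_bp:
            linked += 1
    return 100.0 * linked / len(args)
```

Evaluating percent_within over a range of cutoffs yields a cumulative curve of the kind modelled in Fig. 6a; the denominator is the full set of annotated ARGs, including those on contigs without any MGE.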

Figure 6

Contigs with co-localized ARGs and MGEs decreased in abundance over time. (a) Percentage of potentially mobile ARGs from the total pool of ARGs annotated on contigs, grouped by mobility category. Generalized additive models are fitted on the distance between annotated ARG and nearest MGE in kilobase pairs (kbp) against the percentage of mobilizable ARGs. Shaded areas represent 95% confidence intervals and are color coded by MGE category. The models for Mobile ARGs, Plasmids and Phage are highly significant (p < 0.001); the models for IGE, CE and IS are not significant. (b,c) Relative abundance of mobile ARG contigs plotted against day after birth. The trend lines represent generalized additive models. Both models predict a significant linear downward trend. The grey shaded bands represent 95% confidence intervals. The dots are color coded according to individual provenance of samples as indicated in (b). (b) Includes all potential mobile ARG contigs irrespective of the distance between ARG and MGE, while (c) only includes ARGs localized within 10 kbp of an MGE.

Table 6 Percentage of ARGs annotated on contigs found within the indicated proximity to at least one MGE.
Table 7 Total MGE-linked ARGs showed a significant overall negative trend over time, and also when considering plasmids and phages separately.

Of the 491 contigs with at least one ARG in close proximity of an MGE, 79.0% were binned to a MAG. On the phylum level, most of these contigs were binned to MAGs classified as Proteobacteria (66.8%), with 7.4% and 3.9% in bins classified as Firmicutes and Bacteroides, respectively. Escherichia coli carried by far the largest proportion of mobile ARGs with 30.5%, followed by Enterobacter hormaechei A and Citrobacter freundii (both ~ 5%, Table S11).

MGE categories are differentially linked with different categories of antibiotics and resistance mechanism

The MGE category to which ARGs are linked might influence how frequently these genes are transferred, and to which taxa. Furthermore, different resistance mechanisms could preferentially associate with certain MGE categories. Looking at the distribution of ARG resistance mechanisms by MGE category, and only considering ARGs located within 10 kb of an MGE, we did not see significant differences in the distribution of resistance mechanisms between the total ARG pool and the total pool of MGE-linked ARGs (Fig. 7). However, the resistance mechanism profiles within the various MGE categories did differ significantly compared to the total pool of MGE-linked ARGs in some cases (Table S12). For example, in contrast to the other MGE categories, ARGs linked with IGEs did not predominantly encode efflux system components, and the dominant resistance mechanism of IGE-linked ARGs was drug inactivation. The frequency of linkage between ARGs conferring resistance through target protection and both categories of transposon (IGE and CE) was more than fivefold that of the full distribution of MGE-linked ARGs (p < 0.05 for both comparisons). Plasmids and phages had a significantly higher proportion of linkages with efflux pumps (p < 0.01), and a lower proportion linked to drug inactivation genes (p < 0.01), compared with the full distribution. We also observed substantial variation among individuals (Fig. S13), with, for example, linkage between the resistance mechanism ‘reduced permeability’ and the MGE categories plasmids and phages only observed in one infant.

Figure 7

Resistance mechanism profiles within MGE categories differed significantly from the total pool of MGE-linked ARGs. (a) Proportion of ARG resistance mechanisms per mobility type with < 10 Kb distance between ARGs and MGEs. The number at the bottom of each bar represents the total number of ARGs within the indicated MGE category. (b) Counts of ARG resistance mechanisms from (a). ARGs annotated on the same contig with several MGEs are shown in each MGE category. ARGs with multiple resistance mechanisms are counted multiple times. Corresponding Wilcoxon test results per resistance mechanism per MGE category are found in Table S12.

Since efflux systems frequently confer resistance to multiple classes of antibiotics, these were excluded from analysis of potential linkages between MGEs and ARGs within different drug classes. With respect to the main drug classes, the distributions of all ARGs mapped to contigs and MGE-linked ARGs showed a few notable differences (Fig. S14, Table S13). ARGs conferring resistance to rifampicin were found on a number of contigs, but in no instance were these genes linked with any category of MGE. In addition, the proportions of ARGs conferring resistance to aminoglycosides and tetracycline were significantly lower in MGE-linked ARGs (p < 0.006), while resistance to beta-lactams was higher (p < 0.004). Noteworthy differences (Table S14) between MGE categories and the full distribution of MGE-linked ARGs included the lack of tetracycline resistance genes on plasmids, and a high proportion of these genes linked with IGEs compared to the total MGE-linked ARGs (p < 0.05). On the other hand, plasmids were more frequent carriers of beta-lactam resistance genes (p < 0.05).

Discussion

Introduction (core and pan resistome)

Here, we present an in-depth survey of ARG dynamics in healthy infants over the first year of life. We identified a sizeable core resistome, i.e. genes detected in at least one sample in all 12 infants, as well as a similarly sized set of resistance genes unique to individual infants. Li et al.44 investigated gut resistomes in a cohort of 662 healthy 1-year-old Danish infants, reporting a small core resistome consisting of the adeF and tetQ genes that were found in a large majority (97.9 and 85.8%, respectively) of the samples. Interestingly, we identified tetQ as part of the core resistome in our cohort, but failed to detect any component of the adeFGH efflux complex. However, we found that the core resistome consisted predominantly of multidrug efflux systems. These systems are widespread on bacterial chromosomes, suggesting that they are quite fundamental to cell function, operating in the transport of large numbers of small molecules across the cell membrane.
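The definitions of the core resistome (genes detected in at least one sample in every infant) and of genes unique to individual infants can be expressed as simple set operations. The following Python sketch is illustrative only: the function name, the input layout and the gene labels other than tetQ are placeholders, and the toy data are not results from this study.

```python
from collections import Counter


def core_and_unique(detections):
    """Split per-infant ARG detections into a core set and per-infant unique sets.

    detections maps an infant ID to the set of ARGs seen in at least one
    sample from that infant.
    """
    gene_sets = list(detections.values())
    if not gene_sets:
        return set(), {}
    # Core resistome: genes present in every infant
    core = set.intersection(*gene_sets)
    # Number of infants in which each gene was detected
    carriers = Counter(gene for genes in gene_sets for gene in genes)
    # Unique genes: detected in exactly one infant
    unique = {
        infant: {gene for gene in genes if carriers[gene] == 1}
        for infant, genes in detections.items()
    }
    return core, unique


# Hypothetical toy input with three infants
detections = {
    "ID1": {"tetQ", "gene_a", "gene_b"},
    "ID2": {"tetQ", "gene_a"},
    "ID3": {"tetQ", "gene_c"},
}
core, unique = core_and_unique(detections)
print(sorted(core))
print({infant: sorted(genes) for infant, genes in unique.items()})
```

Genes that are neither core nor unique (gene_a in the toy input) are shared by a subset of infants and fall outside both sets.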

Among ARGs unique to individual infants, the predominant mechanism of antibiotic resistance was drug inactivation. Many of these genes were beta-lactamases, as well as a number of aminoglycoside modifying enzymes with a high potential for mobility45. We also found several vancomycin resistance genes that operate by altering the structure of the target (peptidoglycan) rather than modifying the drug. Similarly to Li et al.46, the most predominant class of resistance mechanism overall was antibiotic efflux.

ARG carriage and diversity decline over the first year of life

ARGs in the gut microbiome of young infants have been found to be more abundant than in the microbiome of their mothers15,47, indicating relatively high carriage during early development. Furthermore, a study comparing ARG carriage in infants at the age of 6 weeks and one year also found reduced ARG abundance at one year48. Here, we used a high-resolution time series approach to lend credence to the notion of a general abatement of the GI resistome during early GI microbiome development, both in terms of ARG abundance and diversity. Interestingly, glycopeptide (predominantly vancomycin) and sulfonamide antibiotics showed small, but significant increases over time. These were both rare classes of resistance genes, both in terms of total abundance and the distribution among infants. For example, four out of the twelve observed vancomycin resistance genes were unique to a single infant. We do not have a good explanation for the observed increase in vancomycin resistance over time, but it may be linked to rare gram-positive taxa49. Given the importance of this antibiotic in combatting potentially lethal infections, it is noteworthy that this class of ARG bucked the overall time trend. In the case of sulfonamide antibiotics, only two out of four identified genes were specific to that class of drug, sul1 and sul2. These are widely distributed among gram negative species, and often associated with mobile genetic elements50.

ARG carriage correlates with Proteobacteria abundance and in particular E. coli

If the flow of ARGs among taxa is pervasive, resistance gene dynamics will not necessarily be reflected in the dynamics of sub-populations of microbes in the GI ecosystem. However, with limited transmission among taxa, specific ARGs can be expected to closely track the dynamics of specific lineages. Here, the clear tendency was for longitudinal variation in the infant GI resistome to be tightly linked with the dynamics of prevalent taxonomic groups, suggesting that HGT among more distantly related taxa is not pervasive over the first year of life.

Several studies have linked ARG carriage in the infant gut microbiome with Proteobacteria, and specifically E. coli46,48. Here, not only do we confirm these general results, we leverage our individual time-series in order to assign specific ARGs to E. coli by contemporaneous correlation of population and gene dynamics within an individual infant's GI microbiome. This type of approach is facilitated by sample sets with good temporal resolution, such as in the study presented here and as demonstrated on a smaller scale by Yassour et al.47.

It is perhaps not so surprising that genes track the dynamics of the lineages that carry them. However, these dynamics can be obscured for genes that are shared between lineages with differences in population dynamics over time. In this study, a species like E. coli is classified as a single taxonomic unit, with dynamics shared among strains. Thus, potential intra-specific ARG transmission, or genomic heterogeneity among closely related strains, should not disturb ARG assignment by contemporaneous correlation on the species level. However, ARGs carried by multiple species that are more phylogenetically different can be hard to assign as the gene dynamics will then manifest as a composite of two or more independent sets of population dynamics linked to taxa that may not respond similarly to environmental conditions. Furthermore, the inherently unstable nature of the infant GI microbiome makes it hard to observe many, if not most taxa, with sufficient consistency to get good estimates of correlation. It is very likely that species ARG assignment by contemporaneous correlation is more effective using longitudinal samples from a more mature GI microbiome where community membership and composition are more stable.
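To make the idea of ARG assignment by contemporaneous correlation concrete, the following Python sketch correlates a taxon abundance time series with ARG abundance time series from the same infant. It is a minimal illustration under stated assumptions: the use of Spearman rank correlation, the threshold of 0.8, the function names and the toy values are all choices made for this example and are not taken from the methods of this study.

```python
from statistics import mean


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def assign_args(taxon_series, arg_series, min_rho=0.8):
    """ARGs whose time series track the taxon at or above min_rho (assumed cutoff)."""
    return {
        gene
        for gene, series in arg_series.items()
        if spearman(taxon_series, series) >= min_rho
    }


# Hypothetical time series from one infant, one value per sampling day
taxon = [0.10, 0.50, 0.40, 0.05, 0.20]
arg_series = {
    "arg_1": [1.0, 9.0, 7.0, 0.5, 3.0],  # rises and falls with the taxon
    "arg_2": [5.0, 1.0, 2.0, 6.0, 4.0],  # moves in the opposite direction
}
assigned = assign_args(taxon, arg_series)
print(assigned)
```

An ARG shared by two taxa with independent dynamics would produce a composite series that correlates weakly with either taxon, which is the limitation discussed above.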

While a strong correlation between ARG carriage and the abundance of Proteobacteria (and E. coli in particular) was found for most of the children, there were a couple of exceptions. In ID6 we did not observe any E. coli until the end of the sampling period, and then at relatively low abundances (although enough to generate a good quality E. coli MAG). Nevertheless, we observed relatively high levels of ARG carriage early in the time series, which was strongly correlated with Proteobacteria abundance, including a massive bloom on day 32. The phylum level dynamics in this infant could be attributed mainly to Klebsiella spp., demonstrating strong dynamic coupling between Proteobacteria and ARG carriage even in the virtual absence of E. coli.

In ID12 we did not observe any significant correlation between ARG carriage and Proteobacteria or E. coli. In fact, we were unable to find strong correlations between CARD mapping rates and any of the 103 observed species. This is a surprising result that is difficult to explain. Both the very low ARG carriage and Proteobacterial abundance in ID12 relative to the other infants may contribute to the lack of observable correlation. In this scenario, the longitudinal ARG carriage profile could more easily be dominated by blooms of different ARG-carrying taxa at different time points, making the overall mapping rate a composite of multiple transient dynamic profiles. This would confound correlation matching to any specific taxon. An alternative explanation could be pervasive horizontal transfer of ARGs among lineages obscuring correlation patterns, although it is unclear why this would be happening only in this particular infant. It is also noteworthy that only five specific ARGs were linked to E. coli by correlation, the lowest by far of any of the infants (with the exception of ID6, see above), while 46 ARGs were annotated on the E. coli MAG, demonstrating the potential obstacles to gene assignment by contemporaneous correlation.

Comparison of contemporaneous correlation assignment and MAG annotation

A benefit of the contemporaneous correlation approach is that it should, in theory, be able to associate plasmid-borne genes with taxa, provided the plasmids follow the dynamic patterns of the carrier taxon. A plasmid has its own life cycle, and plasmid dynamics can be seen as superimposed on the carrier population. While the temporal dynamics of a plasmid are necessarily tightly linked with the host cell cycle51, the coupling between plasmid and host population dynamics is not well described as a biological phenomenon, and the form of coupling should not be pre-assumed. However, in cases where plasmid and population dynamics track one another in real time, the contemporaneous correlation should link plasmid to taxon given time series data of appropriate resolution. In this study, we only focus on a single taxon, and compare ARGs associated with E. coli by correlation with ARGs annotated on E. coli MAGs. ARGs linked with E. coli by the former method and not by the latter can then be considered candidates for genes carried on plasmids. An obvious caveat with this approach is that incompleteness of MAGs is likely to result in false positives. Here, we observed candidate ARGs that followed this pattern in multiple children. While this in no way constitutes proof that these ARGs are carried on plasmids, these results can still be considered starting points for more in-depth study, and we think the approach deserves further exploration, especially given the relative difficulty of linking plasmids to taxa in complex microbial communities and the importance of MGEs in the spread of antibiotic resistance.
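The comparison between the two assignment methods amounts to partitioning two gene sets. The Python sketch below illustrates this under the assumption that each method yields a set of ARG identifiers per infant; the function name and the gene labels are hypothetical and the toy sets are not data from this study.

```python
def compare_assignments(correlated, mag_annotated):
    """Partition ARGs by how they were linked to a taxon.

    correlated:    ARGs linked to the taxon by contemporaneous correlation
    mag_annotated: ARGs annotated on the MAG of the same taxon
    """
    return {
        # Supported by both lines of evidence
        "both": correlated & mag_annotated,
        # Plasmid candidates; MAG incompleteness can also place genes here
        "correlation_only": correlated - mag_annotated,
        # Present on the MAG but missed by correlation
        "mag_only": mag_annotated - correlated,
    }


# Hypothetical gene sets for one infant
correlated = {"arg_a", "arg_b", "arg_c"}
mag_annotated = {"arg_b", "arg_c", "arg_d", "arg_e"}
result = compare_assignments(correlated, mag_annotated)
for category, genes in result.items():
    print(category, sorted(genes))
```

The "correlation_only" set corresponds to the plasmid candidates described above, while a large "mag_only" set, as in the case of ID12, signals that correlation failed to recover genes that are demonstrably present in the genome bin.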

Virulence and ARG carriage are both linked to Proteobacteria and E. coli

Antibiotic resistance is a natural phenomenon, and not necessarily problematic when it occurs in non-pathogenic bacteria. Problems arise when resistance is coupled with virulence potential such as in typical pathobionts of the human gut microbiome (e.g. Staphylococcus aureus, Enterococcus faecium and Clostridium difficile, as well as E. coli and Klebsiella spp.). VFs increase the virulence potential of a pathogen by assisting in colonization of the host and enhancing the ability to cause disease. Here, we report that the repertoire of genes linked with bacterial virulence follows the same time-trends as ARGs, with VF abundance generally declining over the first year of life. Furthermore, VF gene abundances are strongly linked with ARG carriage, and track the dynamics of Proteobacteria, and E. coli in particular. These observations may, at least in part, be explained by database bias (i.e. identification of more VFs associated with E. coli due to the extra effort) resulting from the enormous amount of basic research in molecular genetics carried out using E. coli as a model organism. Nevertheless, the E. coli genome is famously plastic with huge intra-specific variation in gene repertoire and an apparent appetite for incorporating new genetic elements52,53.

ARGs assigned to contigs with MGEs decrease in abundance over time

MGEs constitute a diverse group, including plasmids, phages and transposons, and the boundaries between MGE categories can be unclear54. In addition, certain MGEs can facilitate the transfer of other MGEs in the same host. For example, conjugative plasmids can facilitate the transfer of plasmids lacking the conjugation machinery55, phages can transfer select plasmids or pieces of chromosomal DNA in Staphylococcus aureus56,57, and phage-plasmids can replicate like plasmids but transfer as phages58. Like ARGs, MGEs have been found to be more abundant in infants compared to their mothers15, with high abundances during the first weeks of life and a gradual decline over the first year59.

Here, we observe the same decreasing trends in abundance over time for both plasmids and phages. This is in contrast to a previous study that found an increase in the abundance of phage-associated ARGs over the first year of life, both in vaginally born and caesarian-delivered infants60. Phages overwhelmingly dominate the gut virome61, but phage dynamics and ecological impacts on natural GI microbial communities are not well characterized62. Temporal variation in virome size and composition is necessarily linked with dynamics in the host populations, but this relationship can be complex and influenced by numerous ecological factors. A large metagenome study across multiple cohorts and age groups found that abundances of phage populations declined steadily during infancy until about 3 years of age, only to recover during childhood and teen years to a higher adult-like population level61. Another study, looking at the gut virome in infants from birth until an age of 3 years, also found decreasing diversity over time, as well as a shift in composition from a phage population dominated by Caudovirales to one dominated by Microviridae63. The implications of phage community dynamics, both on short and long timescales, for transmission and dissemination of ARGs in the gut microbiome could be substantial and represent an avenue for future research.

Like phages, plasmid dynamics are necessarily linked with the host cell, with a life cycle that depends on factors like the host strain64, nutritional status65, and environmental stressors66. Most plasmid studies have been done in axenic laboratory systems, and very little is known about how plasmids behave in natural ecosystems, but it is likely that biotic interactions among microbial populations can have major effects on the dynamics of plasmids in a community. A study using an artificial system based on the plasmid pUC19 allowed environmental tuning of copy number from a single molecule up to 80067, demonstrating a great potential for plasticity. The same study also found a substantial metabolic burden of plasmid carriage that scaled linearly with the copy number, illustrating potential fitness conflicts between plasmid and host. Another study found a bimodal distribution in the expression levels of a plasmid-encoded reporter gene within a clonal population of E. coli68, demonstrating substantial heterogeneity even in a homogeneous environment. As we did, Busi et al.60 found that the abundance of ARG-associated plasmids declined over the first year of life, but this was not a clear trend. It is not intuitive why plasmids and phages would become rarer as the host community becomes more diverse with age. It is likely that this is a reflection of demographic changes in the gut microbial community, but given the complexity of the interactions between the host cells and these classes of MGEs, further work is needed to substantiate our findings. We would also point out that it can be difficult to discriminate between plasmids and phages based on DNA sequence data, and the line between the MGE classes can be blurry58, complicating data interpretation. In this study, transposons constituted a rare class of MGE relative to plasmids and phages, and we did not find discernible temporal trends. While more than 30% of contigs with MGE-linked ARGs binned with an E. coli MAG, none of these contigs carried a transposon-related sequence, and transposons preferentially binned with Bacteroides spp. or were not binned at all.

The high binning rate of contigs with MGE-linked ARGs observed in this study was somewhat of a surprise. MGEs often have a DNA sequence composition that deviates from that of the host chromosome, as well as variable copy numbers and repeated sequence components, which can confound both assembly and binning algorithms. A study using simulated metagenome data found that only around 40% of contigs with genomic islands were correctly binned, while the proportion for plasmids was substantially lower than that69. Particularly for plasmids, the authors achieved assembly and binning rates of up to 70%, but most contigs were assigned to the wrong genome bin. The high binning rates we observed here might be related to efficient contig production due to co-assembly of time-series samples sequenced to a very high depth. However, we have no fully tractable way of checking the binning correctness of contigs with MGE-linked ARGs, and it is not unlikely that many of those contigs were wrongly binned. Furthermore, issues with assembly probably caused us to lose a substantial proportion of MGEs in the data.

Conclusion

In this study, we employed in-depth surveillance of a highly dynamic microbial ecosystem in order to describe patterns of ARG carriage during a critical stage of human life. While many ARGs have the potential to mobilize across taxa, we found that the bulk of the observed resistance markers were confined within lineages. In particular, rather than acting as a hub for ARG transmission, E. coli seems to function more as a sink where ARGs accumulate. In the one case where E. coli was all but absent from the developing gut microbial community, ARG carriage was not much reduced relative to other infants, and the role of ARG sink was apparently adopted by Klebsiella spp. In the one case where Proteobacteria were relatively rare over the first year of life (ID12), the ARG-taxon contemporaneous correlation broke down, and we were not able to associate ARG carriage with specific populations. While we do not have a good explanation for this particular observation, it does suggest that Proteobacteria abundance could function as a general biomarker for ARG carriage and virulence potential in infants. However, the fact that virulence and antibiotic resistance are intimately linked emphasizes the importance of surveillance in order to detect evolutionary events producing new resistant pathogens.

We would like to point out that the limited cohort size of this study and the inherent temporal and compositional variability of the infant microbiome can make it difficult to generalize some of the results presented here. Studies using high temporal resolution sampling in larger cohorts, with more extensive metadata, should be undertaken in order to substantiate our findings. Furthermore, metagenomic data could be calibrated against phenotypic and genomic data obtained from isolates in order to link individual strains of the infant gut metagenome with virulence potential and antibiotic resistance.