Introduction

Numerous studies of the marine deep sedimentary biosphere have explored controls on microbial community composition, activity and abundance, and have identified the impact of in situ temperature1,2, sediment composition3, sediment depth and age4,5, electron acceptor availability6, organic carbon content7, and distance from shore8. Localized geochemical interfaces9,10, organic-rich layers11, and deeply buried, distinct microbial niches12 can further modify the composition and activity of the deep sedimentary biosphere. While all these studies provided valuable information on the deep sedimentary biosphere, the International Ocean Discovery Program (IODP) Expedition 385 offered the opportunity to drill and sample hydrothermal subsurface sediments at multiple locations where temperature gradients are gradual enough to permit examination of microbial communities down to ~ 180 meters below sea floor (mbsf).

Here, our overall aim was to investigate how distinct geothermal gradients, porewater geochemistry and sediment lithology shape the bacterial and archaeal communities of organic-rich deep subsurface sediments that are exposed to different degrees of hydrothermal influence. Among data-rich hydrothermal deep biosphere systems with well-defined chemical and physical context1,13,14,15, Guaymas Basin stands out as the first deeply-drilled transect across a hydrothermally-active ocean spreading center16. This young, marginal rift basin in the central Gulf of California, Mexico has thick, organic-rich sediments due to the highly productive overlying water column, and due to terrigenous inputs. Magmatic intrusions (primarily dolerites) embedded within this sediment matrix influence hydrothermal circulation17, and thermally alter the buried organic matter18. In Guaymas Basin, thermal decomposition of organic matter at depth produces complex mixtures of methane, short-chain alkanes, petroleum hydrocarbons, organic acids, and ammonia that rise towards the sediment surface in nutrient-rich hydrothermal fluids. Studies of near-surface sediments in Guaymas Basin show that these fluids sustain abundant microbial communities that re-assimilate hydrocarbons into biomass19,20,21.

Since environmental factors influence microbial subsurface ecosystems in different ways, well-documented subsurface sites with precisely quantified geochemical, thermal and sedimentological parameters are particularly attractive targets for studies that seek to determine the impact of different environmental parameters on microbial populations. Although the sulfur-, methane- and hydrocarbon-oxidizing microbiota in the surficial sediments of Guaymas Basin are well-studied22,23, investigations of the subsurface biosphere of Guaymas Basin were limited to 16S rRNA amplicon sequencing down to 10 mbsf24,25,26 until the IODP Expedition 385 in 2019. IODP 385 provided an opportunity to drill deep sediments with distinct geochemical and thermal regimes at different sites, and to apply a full array of modern biogeochemical and microbiological approaches27.

Specifically, IODP 385 drilled eight locations in Guaymas Basin that differ in their degree of hydrothermal influence16. The sites (U1545-U155228,29,30,31,32,33,34) follow broadly a northwest-southeast transect across the northern Guaymas Basin (Fig. 1a). At the paired Sites U1545 and U1546 (located ~ 52 km northwest of the northern Guaymas Basin axial graben), sediment cores were recovered down to ~ 540 mbsf. These two sites differed by a massive sill intrusion between ~355–430 mbsf at Site U1546 (77 m thickness, 149,000 years minimum age)35. Paired sites U1547 and U1548, located ~ 27 km northwest of the axial graben at the Ringvent site, targeted hot sediments near a shallow, recently emplaced sill intrusion (ca. 5000 to 10,000 years old) that creates steep thermal gradients and drives active hydrothermal circulation24. Hence, Sites U1547 and U1548 are our best examples for the influence of hydrothermalism in Guaymas Basin24,36. The axial Site U1550 was characterized by the lowest geothermal gradients amongst all IODP 385 drill sites, suggesting local conductive cooling. The two off-axis sites U1549 and U1552 target methane cold seeps which are sustained by deeply buried and thermally equilibrated sill intrusions at several hundred meters depth. The most southeastern Site U1551 was located in the Yaqui River distal submarine fan in the southern Guaymas Basin, southeast of the axial graben, and was dominated by gravity-flows of terrigenous silt and sands.

Fig. 1: IODP 385 drill sites.

Overview of the International Ocean Discovery Program Expedition 385 (IODP 385) drill sites. a Bathymetry of Guaymas Basin with the locations of the IODP 385 drill sites. The inset shows the sampling location in Guaymas Basin; red lines depict the transform faults, and green lines show the oceanic spreading centers along the transform faults in Guaymas Basin. b The Ringvent area with Sites U1547 and U1548. Orange lines represent the traces of seismic profiles; A-E mark the different Holes drilled at each site. c Interpolated in situ temperatures for sediment samples. Bathymetric map of Guaymas Basin courtesy of D. Lizarralde (WHOI).

IODP 385 studies to date have reported on the genomic potential (metagenome analyses) and on specific categories of active metabolic processes/strategies (metatranscriptome analyses) that sustain life in deep Guaymas subsurface37,38,39,40. In this study, we examine the subsurface diversity of Bacteria and Archaea within contrasting hydrothermally-influenced sediments of Guaymas Basin using 16S rRNA marker gene sequencing to provide a more detailed profile of changes in microbial taxa with depth, temperature, and sediment properties. To provide high-resolution 16S rRNA marker gene profiling for both Bacteria and Archaea we combine data from two different high-throughput sequencing platforms. We generated short 16S rRNA amplicons of ~ 400 base pairs using general prokaryotic primers (Illumina MiSeq), and longer 16S rRNA amplicons of ~800 base pairs using archaea-specific primers and PacBio sequencing to more specifically detect and identify archaeal lineages41. We relate these diversity profiles with cell abundances recovered from the same sites and depths, in situ temperatures, porewater geochemistry, and mineralogical characteristics of the sediments, and we connect observed distributions of particular taxa to evidence for their metabolic activities in available metatranscriptome data from IODP 385. Targeted phylogenetic analyses for selected archaeal taxa identify ASVs affiliating with a few thermophilic and hyperthermophilic archaeal lineages that occur at specific sites, albeit in smaller amplicon numbers relative to non-thermophilic subsurface bacteria and archaea that dominate the amplicon libraries at all sites.

Results and discussion

Thermal and geochemical site characteristics

Among oceanic sedimented basins and their geothermal gradients42, the Guaymas Basin is exceptional due to elevated temperatures and high heat flow that permeate the entire basin to varying degrees43. The steepest thermal gradients are consistently detected in the hydrothermally influenced Ringvent sites U1547 and U1548 where a hot, recently (ca. 10,000 years) emplaced sill is heating the sediment cover, driving hydrothermal circulation along the crest of the buried sill24. Our sediment samples from Ringvent cover in situ temperatures from 8.2 to 68.0 °C (Fig. 1b, c; Table 1). The steep downcore temperature gradient leads to quickly decreasing cell numbers below approximately 50 mbsf (Fig. 1c, and section “Overview of cell counts and prokaryotic diversity”). At the cooler sites U1545 and U1546, temperatures range from 2.8 °C to 44.8 °C for most sampled depths (Fig. 1a). Most sediment cores collected from each site extend into the sulfate-reducing zone, the sulfate-methane transition zone (SMTZ), and the methanogenic zone. We note that the examined sediments at Ringvent sites U1547B and U1548B did not reach the deep methanogenic zone, and that at Sites U1550B and U1552B sediments above the shallow SMTZs (~17 and 20 mbsf, respectively) are only represented by one sample each.

Table 1 Geochemical data and in situ temperatures for Guaymas subsurface sediments analyzed for 16S rRNA gene sequencing

Geochemical profiles of Guaymas Basin subsurface sediments show comparable trends between sites, with the exception of Ringvent sites that stand out from the other drilling locations (Table 1). As an example, sulfate concentrations at almost all sites show gradual decrease with depth from near seawater concentration (28 mM) to 0.3 mM near or below the SMTZ, while at Sites U1547 and U1548 the sulfate concentrations are consistently close to near-surface concentrations (24.5 ± 3.3 mM) at all examined depths above 90 mbsf. Generally, indicators of organic matter remineralization (alkalinity, dissolved inorganic carbon (DIC), sulfide, ammonia) peak around the SMTZ, or reach a broad zone of higher concentrations that extends further downcore. In comparison to other drilling locations, the Ringvent sites also show the lowest porewater ammonia and DIC concentrations, suggesting reduced organic matter biomineralization (Table 1). This is consistent with lower detected downcore cell counts at Ringvent sites (Fig. 2). Dissolved organic carbon (DOC) measurements range from 1.47 to 64.7 μM for samples from all sites, with the exception of Site U1552B where DOC concentrations are markedly higher (182 and 214.4 μM) at the SMTZ. The highest measured methane concentration occurs at the cold seep site U1549B (7.09 mM) at a depth of 45.5 mbsf below the SMTZ. We note that in situ methane concentrations are likely higher than those detected, because of outgassing during core recovery.

Fig. 2: Cell abundances at IODP 385 drill sites.

Cell abundances are shown for the eight IODP 385 drilling sites. a Cell counts are plotted against the depth regime only for those samples discussed in this study (0.8 to ~ 180 meters below sea floor; mbsf). b Cell counts are plotted against temperature (2.8 °C to 68 °C) for samples discussed in this study.

The mineralogic and elemental composition of the samples, determined by X-ray Diffraction (XRD) and X-ray Fluorescence (XRF) analyses, reflects a mixture of biogenic, clay mineral and siliciclastic components (mainly quartz and feldspars). While diatom-biosilica is the main biogenic component at all sites, it is more abundant at Sites U1545B and U1546B, and least abundant at U1551B which is influenced by sedimentation from the Yaqui River. Dolomite, a mixed calcium (Ca2+) and magnesium (Mg2+) authigenic carbonate (CaMg(CO3)2), occurs in trace amounts in all shallow sediments, and is most abundant in the examined sediments from the Ringvent site U1547B (Supplementary Data 1).

Overview of cell counts and prokaryotic diversity

Figure 2 shows cell counts plotted against temperature and against sediment depth. While we observe that cell abundance decreases as temperature and sediment depth increase, we notice different trends with depth (Fig. 2a), and with in situ temperature (Fig. 2b). Specifically, when plotted against depth, cell counts decline in a site-specific manner: at the hot Ringvent sites U1547 and U1548, the downcore decrease of cell counts is steeper than at the remaining sites with cooler temperatures (Fig. 2a, Table 1). When plotted against temperature, cell counts decrease more uniformly at all sites, from >10⁸ cells cm⁻³ at cool temperatures near the sediment surface (ca. 5 °C), towards 10³ cells cm⁻³ in hot, deep sediments near 60–70 °C (Fig. 2b, Fig. 1c). These cell count trends are reflected in similar patterns of downcore decreasing DNA recovery that show lower DNA yields at the hot Ringvent sites compared to other sites at similar depths44. Further, they can also explain the recovery of 16S rRNA amplicons from deep samples at cool sites (max. 177.4 mbsf at U1545B), whereas at hot sites, successful recovery of longer 16S rRNA amplicons is limited to “shallower” samples that do not exceed 100 mbsf (e.g., 94.6 mbsf U1547; see below).

High-throughput Illumina MiSeq 16S rRNA amplicon sequencing indicates 11 bacterial classes plus 1 unclassified bacterial group that jointly account for 88.8% of Illumina MiSeq reads. Apart from the unclassified group found exclusively at 1.7 mbsf in Site U1545B, all bacterial lineages are shared between at least two drilling sites (Fig. 3). Of the Illumina MiSeq 16S rRNA amplicon sequences (hereafter referred to as MiSeq ASVs), few ASVs were assigned to Archaea (11.2% of the total reads, associated mainly with Thermoplasmatales, Bathyarchaeia and unclassified Archaea; Fig. 3). The relatively low recovery of archaeal signatures may be representative of the community composition and in situ biomass of archaea; however, it may also be influenced by potential biases in DNA extraction for different taxa and/or in the amplification of targeted genes with the selected primers. To improve coverage and taxonomic resolution of archaeal diversity, we used PacBio sequencing of the archaeal 16S rRNA gene (Fig. 4). Our efforts to amplify the archaeal 16S rRNA gene yielded positive results between 0.8 and 177.4 mbsf, but were not successful at greater depths, most likely reflecting the steep downcore decline of cell counts at all sites (Fig. 2). However, the 16S rRNA archaeal amplicons (hereafter referred to as PacBio ASVs; section “Methods”) include hot subsurface sediments (>45 °C; Table 1) that have been suggested to host thermophilic Archaea in Guaymas Basin39.
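As a minimal illustration of how read shares such as those quoted above can be derived, the following Python sketch converts per-taxon read counts into percentages of total reads. The function name and the example counts are hypothetical and are not taken from the IODP 385 dataset; the normalization actually applied in this study may differ.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into percentages of the total reads.

    read_counts: dict mapping a taxon label to its number of reads.
    Hypothetical helper for illustration only.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


# Made-up counts for a single sample (not real data).
example = {"Dehalococcoidia": 600, "JS1": 288, "Bathyarchaeia": 112}
shares = relative_abundance(example)
```

Because the shares of one sample always sum to 100%, bar charts built this way show community composition but not absolute abundance, which is why they are read alongside the cell counts.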

Fig. 3: Bacterial and archaeal diversity in Guaymas deep subsurface samples based on Illumina MiSeq data.

Taxonomy bar chart (class level) for Illumina MiSeq 16S rRNA gene sequencing results (Bacteria and Archaea) of Guaymas deep subsurface samples. The x-axis shows normalized ASV abundance for class-level bacterial and archaeal groups from sediment samples of different Guaymas Basin drilling sites, annotated by sediment depth in meters below sea floor (mbsf) on the y-axis (left). The secondary y-axis (right) indicates temperature (°C). Sites with active hydrothermalism (U1547 and U1548) are shown in red. Temperatures at Sites U1547 and U1548 are up to 3-fold higher than the temperatures observed at the other examined sites at similar depths.

Fig. 4: Archaeal diversity in Guaymas deep subsurface samples based on PacBio data.

Taxonomy bar chart for PacBio archaeal 16S rRNA gene sequencing results for subsurface sediment samples from all eight drilling sites. The y-axis shows normalized ASV abundance of the identified archaeal taxa, and the x-axis indicates sediment depth in meters below sea floor (mbsf), while the secondary x-axis indicates temperature (°C). Sites with active hydrothermalism (U1547 and U1548) are shown in red. Temperatures at Sites U1547 and U1548 are up to 3-fold higher than the temperatures observed at the other examined sites at similar depths.

Guaymas subsurface diversity is dominated by cosmopolitan lineages

The Guaymas subsurface amplicon dataset based on Illumina MiSeq sequencing of general prokaryotic marker genes is dominated by Chloroflexota (Dehalococcoidia), Atribacterota (JS1 lineage), Planctomycetota (Planctomycetia, Phycisphaerae), Acidobacterota (Aminicenantia; former lineage OP8), Omitrophicaeota, Thermoplasmata, and Bathyarchaeia (Fig. 3). These phyla also dominate other 16S gene datasets from global surveys of non-hydrothermal, anoxic subsurface sediments obtained by deep ocean drilling5,6,45,46. Less abundant bacterial ASVs are affiliated primarily with Spirochaeota, with the candidate division TA06 that is consistently identified in estuarine and marine sediments47, and with the 4_29 lineage (phylum Nitrospirota) from non-hydrothermal marine subsurface sediments (Supplementary Fig. 1).

Our MiSeq data indicate that JS1 and Dehalococcoidia co-occur in 82% of the examined samples and are the most frequently detected subsurface prokaryotic taxa. At 2.1 mbsf in the Ringvent site U1547B, JS1 and Dehalococcoidia cover 86.2% of the total U1547B ASV reads and are represented by 13 different ASVs in each taxon. Overall, the identified JS1 signatures are dominated by a single ASV (22% of all occurrences) that was observed in 32 samples, making it the most abundant bacterial ASV. In addition, we find that the majority of Dehalococcoidia ASVs are recovered from depths where temperatures do not exceed ~32 °C, consistent with the previously observed preference of Chloroflexota MAGs for temperatures between 2 °C and 45 °C39 (Fig. 3; Table 1). Dominant deep subsurface ASVs within the Chloroflexota and the Atribacterota form clusters with marine sediment clones from different geographic locations, including Guaymas Basin (Supplementary Fig. 1).
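A co-occurrence percentage of this kind can be computed as the share of samples in which both taxa are detected. The Python sketch below assumes that "detected" means at least one read, which is an assumption made for illustration; the function name and the toy samples are hypothetical and do not reproduce the study's data or detection threshold.

```python
def cooccurrence_percent(samples, taxon_a, taxon_b):
    """Percent of samples in which both taxa have at least one read.

    samples: dict mapping a sample label to a dict of taxon -> read count.
    Presence is assumed to mean a read count above zero.
    """
    if not samples:
        raise ValueError("no samples given")
    both = sum(
        1
        for counts in samples.values()
        if counts.get(taxon_a, 0) > 0 and counts.get(taxon_b, 0) > 0
    )
    return 100.0 * both / len(samples)


# Made-up samples (not real data): both taxa are present in 3 of 4.
toy_samples = {
    "s1": {"JS1": 120, "Dehalococcoidia": 80},
    "s2": {"JS1": 15, "Dehalococcoidia": 3},
    "s3": {"JS1": 0, "Dehalococcoidia": 40},
    "s4": {"JS1": 7, "Dehalococcoidia": 1},
}
pct = cooccurrence_percent(toy_samples, "JS1", "Dehalococcoidia")
```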

While most members of JS1 and Dehalococcoidia remain uncultivated, key metabolic and physiologic traits inferred from molecular analyses of cell isolates, predict genetic flexibility associated with anaerobic respiration, organic carbon uptake, and sulfur cycling48,49,50. Metatranscriptomes processed for subsurface samples from these Guaymas Basin sites indicate that Dehalococcoidia and Atribacteria (JS1 lineage) are metabolically active and sustain their survival in the Guaymas subsurface using mechanisms of genome editing38, as well as genes associated with metabolic processes and environmental responses shown in Supplementary Data 2. In particular, we find genes involved in carbon fixation, hydrogen/organohalide respiration, degradation of nitroalkanes, terpenoid metabolism, as well as expression of response genes which can be activated under pH/temperature changes and environmental stresses, to arrest translation (inhibitor RaiA51), promote sporulation, and control the flexibility of the membrane (slipins and their associated NfeD protein family52). Some of these metabolic processes (e.g., organohalide respiration in Dehalococcoidia) are metabolic traits also suggested in other deep sea sediments including the Peruvian Margin53. Subsurface Atribacteria, a consistently dominant population in deeply-buried marine sediments54, show unusually low mutation rates, measured in single nucleotide polymorphic positions per genome and per generation55, suggesting a low effective population size and thus a high proportion of dormant cells.

The metabolic regime of the other identified subsurface taxa (e.g., Aminicenantia, Anaerolineae, Deltaproteobacteria and Planctomycetota; 8.6%, 4.5%, 4.6% and 5.03% of bacterial reads, respectively) indicates chemoautotrophic (e.g., Wood–Ljungdahl pathway) and/or heterotrophic lifestyles that involve degradation of carbohydrates and use of necromass (e.g., by Aminicenantia, see Supplementary Data 2). An extended description of expressed genes for these bacterial phyla is provided in the Supplementary Information and Supplementary Data 2. We particularly note that Planctomycetota were detected at almost all sites, except U1551B (Fig. 3). Metatranscriptomes indicate that Planctomycetota are active only at shallow depths (0.8–2.1 mbsf; Sites U1547B-U1552B), and express, in addition to energy-related and metabolic genes, carbon storage (CsrA)/carbon starvation regulators (CstA) that arrest protein translation56. These regulators allow fast responses to nutrient depletion and can cause substantial shrinkage of the inner membrane driven by loss of cytoplasmic water content, a tactic that allows cells to remain metabolically active despite the lack of nutrients57. It appears these Planctomycetota survive within shallow Guaymas Basin sediments, but quickly become stressed by higher temperatures and lower nutrient availability with increasing depth. However, this requires further examination including microscopy surveys.

Correlation of environmental factors with bacterial and archaeal communities

To evaluate the potential impact of depth, temperature, geochemistry and mineral composition on microbial community structure, we performed nonmetric multidimensional scaling analyses (nMDS) for the prokaryotic (MiSeq ASVs) dataset and for the parallel dataset of archaeal amplicons obtained by PacBio sequencing (Fig. 5). p-values were determined using the envfit function in the Vegan R package (see “Methods” and Supplementary Data 1). The samples are coded by in situ temperature and grouped into three regimes that reflect the temperature profiles of the examined sediments (cool: 2–20 °C; warm: 20–45 °C; hot: 45–68 °C). Similar nMDS analyses where samples are coded by depth (shallow, intermediate, deeper depths; 0.8–15, 15–60, >60 mbsf) are shown in Supplementary Fig. 2 and yield results very similar to those obtained with temperature coding.
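The idea behind fitting an environmental variable onto an ordination can be sketched as follows: regress the variable on the two ordination axes, take the squared correlation (R²) as goodness of fit, and judge significance by recomputing R² after randomly shuffling the variable among samples. The Python sketch below is a simplified, pure-Python illustration in the spirit of envfit, not the Vegan implementation itself; the function names, the default number of permutations, and the seed are assumptions made for illustration.

```python
import random


def r_squared(scores, env):
    """R^2 of the least-squares fit env ~ axis1 + axis2.

    scores: list of (axis1, axis2) ordination coordinates, one per sample.
    env: list of values of one environmental variable, in the same order.
    """
    n = len(env)
    mx = sum(s[0] for s in scores) / n
    my = sum(s[1] for s in scores) / n
    me = sum(env) / n
    x = [s[0] - mx for s in scores]
    y = [s[1] - my for s in scores]
    e = [v - me for v in env]
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxe = sum(a * c for a, c in zip(x, e))
    sye = sum(b * c for b, c in zip(y, e))
    see = sum(c * c for c in e)
    det = sxx * syy - sxy * sxy
    if det == 0 or see == 0:
        raise ValueError("degenerate ordination axes or constant variable")
    # Solve the 2x2 normal equations for the two regression slopes.
    b1 = (sxe * syy - sye * sxy) / det
    b2 = (sye * sxx - sxe * sxy) / det
    return (b1 * sxe + b2 * sye) / see


def permutation_p(scores, env, n_perm=999, seed=0):
    """Return (observed R^2, permutation p-value) for one variable."""
    rng = random.Random(seed)
    observed = r_squared(scores, env)
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(scores, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001, so the resolution of reported p-values depends on the number of permutations chosen.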

Fig. 5: nMDS ordination plots for Guaymas Basin bacterial and archaeal communities.

Non-metric multidimensional scaling (nMDS) ordination plots of prokaryotic (MiSeq ASVs) and archaeal (PacBio ASVs) communities with mineralogical and geochemical parameters. On the basis of Fisher’s method for combining p-values, we show the geochemical/mineralogical variables with p-values < 0.05 resulting from a two-sided permutation test performed using envfit. a The distribution of prokaryotic (bacterial and archaeal) communities is influenced by depth (mbsf), p = 0.0048; NH4+ (mM), p = 0.0072; SO42- (mM), p = 0.0192; Mg2+ (mM), p = 0.0112; TOC (wt%), p = 0.036; propane (µM), p = 0.020; n-butane (µM), p = 0.0312. b Archaeal communities are affected by depth (mbsf), p = 0.0032; Temperature (°C), p = 0.0032; TOC (wt%), p = 0.0086; TN (wt%), p = 0.0048; CO (nM), p = 0.0096; dolomite, p = 0.038; Rb, p = 0.016; Hg, p = 0.0032; Ba, p = 0.043. The directions of the arrows indicate a positive or negative correlation of the environmental parameters with the ordination axes. Arrow length reflects correlation strength between environmental parameter and ASV occurrence. The samples are color-coded by site, and their temperature regimes are indicated by shape (circles for 2–20 °C; triangles for 20–45 °C and squares for 45–68 °C).
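For readers unfamiliar with Fisher’s method, it combines k independent p-values through the statistic X = −2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below is a minimal illustration under that independence assumption; the function name is hypothetical, and which individual p-values were combined for each variable is not specified in this caption.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined p-value). The chi-square survival
    function with an even number of degrees of freedom (2k) has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, used here to avoid
    external dependencies.
    """
    if not p_values or any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total
```

A single p-value is returned unchanged, and two p-values of 0.05 combine to roughly 0.017, reflecting that repeated moderate evidence is stronger than either test alone.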

Overall, sediment depth (p = 0.0075), NH4+ (a proxy of cumulative organic matter remineralization; p = 0.02434) and short-chain alkane concentrations (propane p = 0.0499; n-butane, p = 0.0941) separate the prokaryotic community of cool samples (2–20 °C) from warm sediments (20–45 °C) at almost all sites (Fig. 5a). Seawater proxies such as sulfate correlate with the distribution of the prokaryotic communities among the most shallow and cool samples but have little effect on sediments from Site U1551B (Fig. 5a). Similarly, TOC (wt%) impacts the composition of some shallow/cool sediments, and particularly at Site U1546B, which presents high TOC composition at shallow depths (Table 1). Finally, a significant (p = 0.0112) correlation appears between Mg2+ and the composition of microbial communities, largely separating cool/shallow samples from warm and hot samples. While Mg2+ is a major constituent of seawater, its absence serves as an indicator of hydrothermalism58. The samples with the strongest correlation with Mg2+ are from the hydrothermally influenced Site U1548B. Other measured elemental/mineralogical data did not demonstrate any significant correlation with the prokaryotic community distribution (MiSeq ASVs; Fig. 5a and Supplementary Data 1). However, weathering, hydrothermal alterations and the dissolution rates of minerals are intrinsic factors that potentially drive the distribution of bacteria in both terrestrial and marine settings59.

In contrast to nMDS analysis using the general prokaryotic MiSeq ASVs (88.2% bacterial ASVs), the nMDS analysis using the archaeal 16S rRNA data (PacBio ASVs) shows that both mineralogical and geochemical properties of the subsurface sediments correlate with the downcore distribution and diversity of Archaea (Fig. 5b). Fitting of the mineralogical/elemental data collected with X-ray spectroscopy indicates that elements such as rubidium (Rb), mercury (Hg), and barium (Ba) (detected with XRF), and anhydrous carbonates like dolomite (detected with XRD), correlate with subsurface archaeal community composition along with TOC and TN. The alkali metal Rb (p = 0.0164) and the transition metal Hg (p = 0.0032) correlate significantly with the archaeal assemblages in cool and shallow sediments, suggesting that these metals are proxies for shallow input, perhaps via spreading hydrothermal plumes and their fallout that gradually impact the entire basin. Alternatively, Hg and Rb can also reach the shallow Guaymas subsurface bound to terrestrially-sourced organic carbon, or deposited through clays originating from the Yaqui River sediment plume. Methanogenic Archaea, and putatively Lokiarchaeia (Asgardarchaeota), have strategies to biotransform metals including Hg when sulfate reducers are present60,61,62. While PacBio ASVs indicate dominance of Lokiarchaeia in shallow and cool sediments (Fig. 4) where sulfate reducers also coexist (Fig. 3), the metatranscriptomes of this study and the recovered MAGs associated with Lokiarchaeia from the Guaymas subsurface do not show presence or expression of the hgc genes responsible for Hg methylation.

Consistent with Rb and Hg, the alkaline earth metal Ba (p = 0.043329) correlates with the archaeal community at cool/shallow sediments. Further, the presence of dolomite (p = 0.038434) seems to distinguish the archaeal community of cool sediments from that found at deeper and hotter samples (Fig. 5b). Prokaryotes precipitate dolomite and barite (barium sulfate, BaSO4) in organic rich sediments within the sulfate-reducing and SMTZ zones, as a by-product of methane oxidation, methanogenesis and sulfate reduction63. Archaeal cell walls are efficient precipitators of dolomite due to their high carboxyl group density that allows dolomite nuclei formation64. While precipitation of Ba as BaSO4 occurs abiotically when sulfate is present, studies indicate that Bacteria (and we think potentially Archaea) may mediate Ba precipitation through formation of extracellular polymeric substances65, even below the SMTZ where sulfate is depleted.

Temperature and depth separate the archaeal ASVs of warm (20–45 °C) and hot (45–68 °C) subsurface sediments from those of most cool samples (2–20 °C). The availability of labile organic compounds, TOC (wt%; p = 0.025) and TN (wt%; p = 0.0056), also distinguishes the archaeal community in the majority of cool and shallow samples from those in warmer, deeper samples. Exceptions are Sites U1546 and U1548 where the archaeal communities of the cool sediments show greater similarity to those in warm and hot samples (Fig. 5b).

A few known thermophilic and hyperthermophilic lineages were detected in samples from hydrothermally-active sites, albeit sporadically distributed, and found only at certain depths where their biomass was sufficient for detection of their marker genes (Fig. 4). Given their patchy representation across our datasets, nMDS analyses considering only these individual lineages proved uninformative. Phylogenetic analyses of specific taxonomic groups were used to explore connections between these lineages and known (hyper)thermophilic lineages (see section below).

Major archaeal phyla and evidence for their activities

The subsurface archaeal community of Guaymas Basin was analyzed by PacBio sequencing of 800 base pair amplicons, yielding sequences long enough (>400 bp) for phylogenetic analysis at finer-scale taxonomic resolution, and for the detection of uncommon or novel phylogenetic lineages41. The PacBio ASV dataset is dominated by Bathyarchaeia (Thermoproteota66), Thermoplasmata (Thermoplasmatota67), Hadarchaeia (Candidatus Hadarchaeota68), and multiple class-level lineages within the Asgardarchaeota69 (Fig. 4). Halobacterota, which include euryarchaeotal methanogens and methane oxidizers, are generally less abundant, except at 16.8 mbsf in Site U1550B. Overall, Lokiarchaeia (Asgardarchaeota) are abundant in the shallow/cold sediments (0.8–15 mbsf/2–20 °C), and range to 45.5 mbsf (12.1 °C) at the cold seep site U1549B. Thermoplasmatota and Bathyarchaeia are widely distributed among subsurface Guaymas Basin samples, with the difference that Thermoproteota are more frequently detected within the first 54 mbsf, primarily at Sites U1545, U1546, and U1549. Finally, the Hadarchaeia show a consistent preference for deep, hot sediments in Guaymas Basin, and are found between 69.5 and 177.4 mbsf, at temperatures ranging from 44.8 °C to 62.4 °C (Fig. 4; Table 1). Analyses of metatranscriptomes for IODP 385 samples indicate expression of housekeeping genes, as well as genes associated with genome editing that support the activity and survival of these archaeal groups38. These include expressed archaeal genes associated with DNA repair mechanisms, the highly organized cellular process of exosome-mediated mRNA degradation, and protein homeostasis and proteolysis, including chaperones, specific archaeal chaperonins (thermosomes) and a suite of different proteases. We discuss below additional transcriptome details not previously reported in ref. 38 for specific archaeal taxa.

Outside of Guaymas Basin, these archaeal lineages appear consistently among the dominant archaea in cold, non-hydrothermal subsurface sediments5,6,45,46. Yet, their distributions within the Guaymas dataset, and their potential for different temperature preferences, prompted us to make a closer examination of these phylum-level lineages. Therefore, we searched our metatranscriptomes and retrieved transcripts affiliated specifically with these lineages, and we used long read (PacBio) 16S rRNA phylogenetic analyses to search for potentially thermotolerant or thermophilic groups within these phyla.

Hadarchaeia

Most hadarchaeial ASVs fall into phylogenetic clusters associated with deep rock aquifers and non-hydrothermal sediments (Fig. 6) and were recovered primarily from the deepest samples at each site (Table 1). The higher abundance of Hadarchaeia ASVs in deep, hot sediments agrees with previous findings that show recovery of hadarchaeial MAGs at depths > 60 mbsf and temperatures between 45 °C and 65 °C39. Although this detection pattern is suggestive of thermophilic adaptations, we did not observe any specific affiliations of our hadarchaeial ASVs with the recently described thermophilic, alkane-oxidizing hadarchaeial isolate70. Overall, Hadarchaeia are reported to be heterotrophs71, while some lineages are suggested to depend on chemosynthesis72. In our metatranscriptomes we find a small number of transcripts taxonomically annotated to Hadarchaeia, a possible consequence of reduced cell numbers in hot, deep sediments (Fig. 2). Most expressed hadarchaeial genes are “hypothetical proteins”, reflecting the paucity of data available for this taxon (Supplementary Data 2). However, functionally annotated genes indicate that Hadarchaeia are metabolically active in the deep subsurface. These genes encode nascent polypeptide-associated complex proteins (NAC), involved in protein quality control73, and membrane proteins that control the flexibility of the cell membrane, e.g. slipins74. Maintaining membrane integrity and protein homeostasis is likely central to the survival of Hadarchaeia of the deep hydrothermally influenced biosphere.

Fig. 6: Hadarchaeia phylogeny for Guaymas Basin samples.

Distance phylogeny (Minimum Evolution) of Hadarchaeia based on PacBio 16S rRNA gene amplicons of ~800 base pairs. ASVs are labeled with their IODP sediment sample of origin. Bootstrap numbers are based on 1000 iterations. Cultured hyperthermophilic Crenarchaeota serve as outgroup.

Thermoplasmatota

The cosmopolitan archaeal phylum Thermoplasmatota is found in a wide range of sedimentary habitats, with different lineages exhibiting different thermal tolerances (e.g.24,75,76). In marine sediments, they dominate primarily due to their capacity to utilize a wide range of organic substrates, including detrital proteins, aromatics, lignin, acetate, and extracellular carbohydrates (e.g.77,78). In our dataset, the Thermoplasmatota, specifically class Thermoplasmata, occur within a wide range of sediment temperatures (2 °C to 62 °C; Table 1 and Fig. 4), but their highest abundance is detected in samples with temperatures <20 °C. Expressed genes annotated to Thermoplasmata in our study include proteins and protein domains associated with the properties of the cell membrane (see also Supplementary Information). Maintenance of membrane fluidity is required for both cold- (i.e. growth optimum at or below 15 °C) and hot-adapted Archaea79, which could explain the expression of these genes also by Hadarchaeia and Bathyarchaeia, which are abundant at warm (20–45 °C; Bathyarchaeia) and hot temperatures (45–65 °C; Hadarchaeia). In addition, our metatranscriptome data show expression of the transcriptional regulators Lrp/AsnC, also called “feast-famine” regulators, that orchestrate the metabolic adaptation of microbes under different nutrient conditions80. These are taxonomically widespread cellular mechanisms that conserve energy by shifting cells into a senescent state or temporarily depressing transcription or translation (see also Supplementary Information). We find these genes expressed by Thermoplasmata and other archaeal taxa (Supplementary Data 2). Finally, Thermoplasmata express genes for stress-related inhibition of translation (YciH)81 in cool samples (2–20 °C), which further supports that this taxon prefers cool and shallow Guaymas subsurface sediments, and becomes stressed by more elevated downcore temperatures.

Bathyarchaeia

Members of the Bathyarchaeia are identified in all examined sediments and often comprise dominant proportions of all recovered ASVs (Fig. 4). The wide thermal range of Bathyarchaeia is supported through the recovery of Bathyarchaeial MAGs from sediments with temperatures between 2 °C and 47 °C39. The 16S rRNA gene phylogeny indicates that the identified ASVs belong to diverse Bathyarchaeial lineages (Fig. 7). While most deep Guaymas Bathyarchaeial ASVs fall into the non-hydrothermal MCG-1 cluster, some belong to hydrothermal Bathyarchaeial lineages (MCG-16, MCG-23c82,83) that contain clones from Guaymas Basin hydrothermal surficial sediments collected with Alvin push-cores. In addition, we find a previously unidentified lineage with intron-containing 16S rRNA genes (ASV 442ec) also associated with hydrothermal vent clones from surface sediments (Fig. 7 and Supplementary Data 3, 4).

Fig. 7: Bathyarchaeia phylogeny for Guaymas Basin samples.

Distance phylogeny (Minimum Evolution) of Bathyarchaeia based on PacBio 16S rRNA gene amplicons of ~ 800 base pairs. ASVs are labeled with their IODP sediment sample of origin. Bootstrap numbers are based on 1000 iterations. Hydrothermal clones are marked in red. Cultured hyperthermophilic Crenarchaeota serve as outgroup.

Fermentation, hydrogen cycling, carbon fixation, and aromatic hydrocarbon degradation are among the broad metabolic activities of Guaymas subsurface Bathyarchaeia (Supplementary Data 2). Interestingly, we find that Bathyarchaeia and Asgardarchaeota also express genes for synthesis of diterpenes/diterpenoids (e.g., geranylgeranyl pyrophosphate) in shallow (0.8–2.1 mbsf) and intermediate depths (25.8 mbsf; U1545B) (Supplementary Data 2). Diterpenes and diterpenoids could function as secondary metabolites and antimicrobial compounds for cellular competition, or as building blocks for archaeal membrane lipids which support the integrity of their cell membranes84.

Asgardarchaeota

The Asgardarchaeota of the Guaymas subsurface sediments are dominated by Lokiarchaeia (Fig. 8). Phylogenetically, the Guaymas Lokiarchaeia belong mostly to the cold sediment cluster A85, and are not affiliated with hydrothermal clones (Fig. 8). Other Asgard ASVs from deep Guaymas Basin sediments (e.g., Heimdallarchaeia and Helarchaeia) also lack affinities with consistently hydrothermal lineages (Fig. 8). Our metatranscriptome data indicate the presence of active cells within several class-level lineages of Asgard archaea, including Lokiarchaeia, Heimdallarchaeia, Odinarchaeia and Thorarchaeia, and reveal an interesting metabolic repertoire that allows Asgards (Lokiarchaeia in particular) to be active in shallow and intermediate Guaymas subsurface depths. A detailed description of the Asgard transcripts is provided in Supplementary Information. Collectively, our findings suggest that Asgard cells (Lokiarchaeia in particular) exhibit evidence of increased stress with depth. For example, we find expressed genes related to a variety of transporters, including energy-coupling factor (ECF) transporters responsible for micronutrient uptake from the environment86, and thorarchaeial transcripts that encode a dipeptide uptake system used for energy gain under starvation87.

Fig. 8: Asgardarchaeota phylogeny for Guaymas Basin samples.

Distance phylogeny of Asgardarchaeota based on PacBio 16S rRNA gene amplicons of ~ 800 base pairs. ASVs are labeled with their IODP sediment sample of origin. Bootstrap numbers are based on 1000 iterations. Hydrothermal clones are marked in red, cultured Asgardarchaeota in blue. The deeply-branching Marine Hydrothermal Vent Group3 serves as outgroup.

Methane-cycling archaea (Halobacterota)

Methane-processing archaea of the phylum Halobacterota are dominated by anaerobic methane-oxidizing archaea (ANME-1), recently renamed Methanophagales (Fig. 4). Phylogenetically, these ASVs form several well-supported clusters with sequences from cold methane seeps, mud volcanoes and deep subsurface sediments (Fig. 9). Methanophagales ASVs from Site U1547B cluster with clones from hydrothermal sites, including Guaymas Basin, and fall into ANME-1 subclusters identified as ANME-1b, ANME-1a and unnamed ANME-1; the latter cluster contains clones from hydrothermal sites, including Guaymas Basin (Fig. 9). These clusters include sequences from methane-oxidizing enrichments at 37 °C and 50 °C, indicative of thermotolerance for Methanophagales88,89. Small numbers of sequences affiliated with the hydrogenotrophic, methanogenic Methanocellales are found at U1549B and U1547B, and sequences affiliated with uncultured Methanosarcinales lineages are found in small numbers at U1545B, U1546B, and U1549B (Fig. 9). The hyperthermophilic Methanocaldococcus lineage is represented by ASVs from deep Ringvent samples (U1547B) at 74.2 and 94.6 mbsf, and by ASVs from other sites (Fig. 9). The closest cultured relative of these Ringvent ASVs is Methanocaldococcus bathoardescens, an obligately hydrogenotrophic, hyperthermophilic isolate from the Juan de Fuca vents90. The Ringvent ASVs are also closely related to the type species of the genus, Methanocaldococcus jannaschii, isolated from Guaymas Basin hydrothermal vents91. The presence of Methanocaldococcus in the Ringvent sediments is further supported by a parallel survey of methyl-coenzyme M reductase genes (mcrA) responsible for methanogenesis and anaerobic methane oxidation, demonstrating that Methanocaldococcus-related mcrA phylotypes at Ringvent sites U1547B and U1548B are almost identical to mcrA genes of Methanocaldococcus bathoardescens40.
Since every cultured species within the genus Methanocaldococcus that is closely related to Guaymas ASVs has so far turned out to be hyperthermophilic92,93, we consider the Methanocaldococcus ASVs the clearest example of well-documented hyperthermophiles in our archaeal dataset (Fig. 9); relying on these cultured relatives avoids inferences on habitat and temperature preferences of uncultured archaea.

Fig. 9: Phylogeny of methanogens and anaerobic methane-oxidizing archaea for Guaymas Basin samples.

Distance phylogeny of methanogens and anaerobic methane-oxidizing archaea based on PacBio 16S rRNA gene amplicons of ~800 base pairs. ASVs are labeled with their IODP sediment sample of origin. Hydrothermal clones and hyperthermophilic species are highlighted in red. The hyperthermophilic Methanococcales serve as outgroup.

Thermophilic archaea synthesis

Considering the occurrence patterns of thermophilic archaea in the Guaymas Basin subsurface, our results show that thermophilic lineages of the Bathyarchaeia82, thermotolerant/thermophilic clusters within the Methanophagales/ANME-188, and hyperthermophilic, obligately hydrogenotrophic Methanocaldococcus species90,91 are found consistently in the subsurface at the hydrothermally influenced Ringvent sites U1547B and U1548B. The same mesophilic, heterotrophic bacteria and archaea that constitute the sedimentary subsurface biosphere at other Guaymas Basin sites remain dominant at Ringvent, but they co-occur with these hydrothermal archaea at this location. Based on ASV frequency, the hydrothermal archaeal populations are relatively small in our samples. We hypothesize that archaeal hyperthermophiles would appear more prominently closer to the hydrothermally active circular mound of Ringvent16; however, drilling restrictions prohibited drilling directly into the hydrothermal crest of Ringvent, where the hot sill intrusion approaches the sediment surface. At the crest, hydrothermal circulation creates surface features (silica-dominated precipitates, microbial mats, tubeworms) analogous to those of highly active hydrothermal zones in the southern spreading center of Guaymas Basin24. We conclude that our detected ASVs of thermophilic and hyperthermophilic archaea identify a hydrothermal biosignature “halo” that surrounds the Ringvent site, but is quickly diluted in the greater sedimentary Guaymas system.

Conclusions

Analyses of MiSeq and PacBio sequencing data of partial 16S rRNA genes in this study indicate that the composition of the heretofore largely unknown deep biosphere of Guaymas Basin resembles cosmopolitan microbial communities found in organic-rich marine subsurface sediments, and is dominated by heterotrophic, mesophilic microbial phyla already known from cold, deep subsurface sediments in other locations. Amplicon-based detection of this community falls off at relatively shallow depths and moderately high temperatures (near 45 °C at most sites), consistent with decreasing cell numbers. Sequence signatures of some hydrothermal microbial populations are detected within the archaeal 16S rRNA gene dataset, using phylogenetic analyses of archaeal ASVs to tease out lineage-specific hydrothermal associations among the Bathyarchaeia, Methanophagales/ANME-1 and Methanococcales against the dominant background of widespread mesophilic, uncultured subsurface archaea. Interestingly, these thermophilic lineages are detected in relatively low ASV numbers and are largely limited to the hydrothermally impacted Ringvent sites, falling below detection at other drilling sites. We interpret the apparent preference of thermophiles for Ringvent sites as evidence that active hydrothermal circulation within the Ringvent system provides the energy supply required to sustain these thermophiles at a sufficient level so that their DNA can still be detected in nearby drilled sediments. At subsurface sites further removed from hydrothermal circulation, these hyperthermophiles disappear, even if temperatures alone would appear sufficient to allow their growth.

In conclusion, while we do find evidence for some thermophilic hydrothermal vent archaea at Ringvent sites with active subsurface hydrothermal circulation, subsurface life in Guaymas Basin sediments is not simply dominated by transplanted hydrothermal vent communities. Instead, we find that heterotrophic, cosmopolitan subsurface bacteria and archaea show some thermal adaptability and mRNA data indicate that they manage to thrive within a wide range of sediments across all of Guaymas Basin. Yet moderately high temperatures appear to select against this cosmopolitan subsurface sediment community, indicating that elevated temperature exacerbates the challenges that subsurface microorganisms already face regarding energy limitation and substrates required for cell maintenance.

Methods

Sample collection

Cores were recovered during IODP Expedition 385 (Sept 16–Nov 16, 2019). Holes at each site (A-E; Fig. 1b) were first advanced using advanced piston coring (APC), then half-length APC, and then extended core barrel (XCB) coring when piston coring was no longer possible. Formation temperature measurements were made at several depths using the advanced piston corer temperature (APCT-3) and Sediment Temperature 2 (SET2) tools27,28,29,30,31,32,33,34. Interpolated temperature data for specific sediment samples were obtained by fitting linear functions to temperature measurements, as described previously40. When core sections were brought onto the core receiving platform of the D/V JOIDES Resolution, whole round samples for microbiology were sub-sampled with ethanol-wiped spatulas, capped with ethanol-wiped endcaps, and immediately transferred to the microbiology laboratory. There, the samples were placed in tri-foil gas-tight laminated bags, flushed with nitrogen, heat-sealed, and then stored at 4 °C until further processing. Masks, gloves and laboratory coats were worn by those handling the samples during all laboratory steps. Samples for metatranscriptome analysis were collected immediately upon core recovery to the receiving platform on the drill ship by subcoring into the center of a freshly cut core section, followed by flash freezing in liquid nitrogen. Within the overall transcript pool for all samples, we see expressed genes associated with general stress responses that may also reflect, at least in part, microbial responses to the pressure changes during core recovery. Until methods are developed for preserving deep drilling cores in situ, we must consider the possibility that specific cellular transcriptional activities may be up- or down-regulated to an unknown degree in response to sample recovery.
For DNA analyses, core samples were transferred from their gas-tight bags onto sterilized foil on the bench surface inside a Table KOACH T 500-F system, which creates an ISO Class 1 clean air environment (Koken Ltd., Japan). In addition, the bench surface was treated with a fanless ionizer (Winstat BF2MA, Shishido Electrostatic Co., Ltd., Japan). Within this clean space, the exterior 2 cm of the extruded section were removed using a sterilized ceramic knife. The core interior was transferred to sterile 50-mL Falcon tubes, labeled, and immediately frozen at −80 °C for post-cruise analyses.
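The interpolation of in situ temperatures mentioned above (fitting linear functions to downhole temperature measurements) can be illustrated with a minimal sketch. It assumes one ordinary least-squares line per hole; the function names are hypothetical, and the depth and temperature values are invented for illustration and are not expedition data.

```python
def fit_line(depths_mbsf, temps_c):
    """Ordinary least-squares fit of temperature = slope * depth + intercept."""
    n = len(depths_mbsf)
    mean_x = sum(depths_mbsf) / n
    mean_y = sum(temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in depths_mbsf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths_mbsf, temps_c))
    slope = sxy / sxx                      # geothermal gradient in °C per meter
    intercept = mean_y - slope * mean_x    # extrapolated seafloor temperature
    return slope, intercept


def temperature_at(depth_mbsf, slope, intercept):
    """Interpolated temperature for a sediment sample at a given depth."""
    return slope * depth_mbsf + intercept


# Hypothetical downhole measurements (NOT data from the expedition)
depths = [10.0, 40.0, 70.0, 100.0]   # mbsf
temps = [5.0, 12.5, 20.0, 27.5]      # °C

slope, intercept = fit_line(depths, temps)
sample_temp = temperature_at(55.0, slope, intercept)
print(f"gradient: {slope * 1000:.0f} °C/km, sample at 55 mbsf: {sample_temp:.2f} °C")
```

A single line per hole is the simplest reading of "linear functions"; holes with sill-related changes in gradient might need piecewise fits, which this sketch does not attempt.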

Core contamination control

On IODP 385, we ran the tracer perfluoromethyldecalin (PFMD) during drilling of holes at each site intended for microbiology. PFMD was introduced into the drilling fluids with a high-pressure liquid chromatography pump (rate of injection: 9.77 × 10−3 × [stroke of mud-pump, 19.54 L/stroke], 0.55 ml min−1 at 50 mud-pump strokes min−1) to provide a final concentration of ~0.5 mg L−1. Since microbial cells are larger than these molecules, they may not penetrate a sediment sample in proportion to the drilling fluid, and therefore the tracer is regarded as providing a qualitative estimate of contamination potential. To assess the possible degree of contamination within sediment core material, 3 cc syringe samples of sediment were collected at several depths along each hole throughout the expedition to measure the concentration of detected tracer along the radius of sediment cores using gas chromatography (GC) in the vessel laboratory. Samples for GC analyses were collected on the core-receiving platform from the ends of freshly cut core sections immediately after core recovery. Samples were collected at the top of the core, where the highest contamination is expected, for selected core sections at each site. For each core targeted for GC analysis, one ~3 cc syringe sample was collected at the interface of the sediment core and the core liner, a second sample was collected at the center of the sediment core, and a third was collected in between these two. All three samples were placed in 20 ml GC headspace vials, closed with screw caps, and stored at 4 °C for analysis. Headspace samples were heated to volatilize any tracer present, and the gas was injected into an Agilent 6890N gas chromatograph with an electron capture detector (ECD) after preparing and running calibration standards using 10−4, 10−6, 10−8 and 10−10 dilutions of each tracer.
Any tracer detected on the interior of a whole round sample is interpreted to mean drilling fluid was possibly able to penetrate into the sample, and thus it is likely contaminated. Cores were not processed for microbiology where detectable tracer was measured in the core interior. Contamination was also assessed by analyzing the microbial composition of the drilling fluid (primarily composed of surface ocean water) post-cruise via DNA sequencing approaches. A sample of the drilling fluid was collected at the beginning of drilling for every hole drilled for microbiology directly from the injection pipe on the rig floor into sterile bottles with screw caps, and handled using sterile equipment. Replicate 250 mL aliquots were filtered onto 45 mm 0.20 μm pore-size Millipore Durapore filters and frozen at −80 °C and marker gene libraries were prepared as for sediment samples.

Geochemical analysis

Geochemical analyses were performed on freshly collected wet sediments and on centrifuged porewaters from downcore sediment samples at depths collected for microbiology studies from each site. For porewater analysis, approximately 40 ml of sediment were centrifuged in 50 ml conical Falcon tubes under nitrogen for 5 to 10 min at 1000 × g, resulting in the separation of 4–10 ml of porewater from the sediment, depending on depth. Porewater samples were filtered immediately through 0.45 μm filters into Eppendorf cryovials and stored at −20 °C. At Louisiana State University Wetland Biogeochemistry Analytical Services (WBAS), filtered porewater samples were analyzed for dissolved organic carbon (DOC) and total dissolved nitrogen (TDN) using a Shimadzu TOC/TN analyzer. Nitrate plus nitrite (NOx) was measured using an O.I. Analytical Flow Solutions IV autoanalyzer with the Cd-reduction method (EPA 353.2). Other porewater and solid phase analyses were performed shipboard using published protocols27 and were excerpted from geochemical data tables published for each site28,29,30,31,32,33,34.

X-ray diffraction (XRD) analysis

Mineralogic analyses were performed with a Rigaku SmartLab XRD instrument equipped with a CuKα radiation source (λ = 1.54060 Å) using a 10 mm slit, at a tube voltage of 40 kV and a current of 40 mA. Samples were first dried and crushed in a mortar, and randomly oriented powders were then obtained after sieving at 63 µm. Each measurement was done after vertical alignment between the X-ray source, the surface of the sample and the detector. For all XRD runs, 2θ ranged from a starting angle of 3°–5° to 90°; scans were done at 1° 2θ/min with a step size of 0.004° 2θ while the sample was rotating 60 times per minute. Peak search, mineral identification and quantification were carried out using the Rietveld method with Rigaku’s SmartLab Studio II and the ICDD PDF-5+ 2024 database. The percentage of amorphous silica was estimated using Rigaku’s SmartLab Studio II amorphous phase detection tool.
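To put the scan settings in context, Bragg's law (nλ = 2d sin θ) converts the 2θ limits into the range of lattice spacings that the scan can resolve. The sketch below uses the CuKα wavelength and scan parameters stated above and takes 3° as the starting angle (the text gives 3°–5°); the function name is hypothetical and the calculation is illustrative, not part of the reported analysis workflow.

```python
import math

WAVELENGTH_ANGSTROM = 1.54060  # CuKα wavelength given in the text


def d_spacing(two_theta_deg, wavelength=WAVELENGTH_ANGSTROM, order=1):
    """Bragg's law solved for d: d = n * lambda / (2 * sin(theta))."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))


# Scan limits; 3° is an assumed start within the stated 3°–5° range
d_max = d_spacing(3.0)    # largest resolvable spacing (low-angle end)
d_min = d_spacing(90.0)   # smallest resolvable spacing (high-angle end)

# Scan duration and number of steps at 1° 2θ/min and 0.004° 2θ per step
scan_minutes = (90.0 - 3.0) / 1.0
n_steps = round((90.0 - 3.0) / 0.004)

print(f"d range: {d_min:.3f} to {d_max:.2f} Å; {scan_minutes:.0f} min; {n_steps} steps")
```

Starting the scan at low angles matters for clay minerals, whose basal reflections correspond to spacings of roughly 10 Å and above.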

X-ray fluorescence (XRF) analysis

Elemental analyses (elements from Mg to U) were obtained with a Hitachi X-MET8000 Geo analyzer on dried and crushed powders after sieving at 63 µm. Measurements were run in triplicate for 60 s each and then averaged.

Microbial cell counts

Sediment sampling for cell counts occurred immediately after core retrieval on the core receiving platform by sub-coring with a sterile, tip-cut 2.5 cc syringe from the center of each freshly cut core section. Approximately 2 cm3 sub-cores were immediately put into tubes containing fixation solution consisting of 8 mL of 3xPBS (Gibco™ PBS, pH 7.4, Fisher) and 5% (v/v) neutralized formalin (Thermo Scientific™ Shandon™ Formal-Fixx™ Neutral Buffered Formalin). The mixture was stored at 4 °C until further processing. Fixed cells were separated from the slurry via ultrasonication and density gradient centrifugation94,95. For cell detachment, a 1 mL aliquot of the formalin-fixed sediment slurry was amended with 1.4 mL of 2.5% NaCl, 300 μL of pure methanol, and 300 μL of detergent mix (100 mM ethylenediamine tetraacetic acid [EDTA], 100 mM sodium pyrophosphate, 1% [v/v] Tween-8094). The mixture was thoroughly shaken for 60 min (Shake Master, Bio Medical Science, Japan), and subsequently sonicated at 160 W for 30 s for 10 cycles (Bioruptor UCD-250HSA; Cosmo Bio, Japan). The detached cells were recovered by centrifugation based on the density difference between microbial cells and sediment particles, which allows collection of microbial cells in a low-density layer. To this end, the sample was transferred onto a set of four density layers composed of 30% Nycodenz (1.15 g cm−3), 50% Nycodenz (1.25 g cm−3), 80% Nycodenz (1.42 g cm−3), and 67% sodium polytungstate (2.08 g cm−3). Cells and sediment particles were separated by centrifugation at 10,000 × g for 1 h at 25 °C. The light density layer was collected using a 20 G needle syringe. The heavy fraction, including precipitated sediment particles, was resuspended with 5 mL of 2.5% NaCl, and centrifuged at 5000 × g for 15 min at 25 °C. The supernatant was combined with the previously recovered light density fraction. With the remaining sediment pellet, the density separation was repeated.
The sediment was resuspended using 2.1 mL of 2.5% NaCl, 300 μL of methanol, and 300 μL of detergent mix and shaken at 500 rpm for 60 min at 25 °C, before the slurry sample was transferred into a fresh centrifugation tube where it was layered onto another density gradient and separated by centrifugation just as before. The light density layer was collected using a 20 G needle syringe, and combined with the previously collected light density fraction and supernatant to form a single suspension for cell counting.

For cell enumeration, a 50% aliquot of the collected cell suspension was passed through a 0.22 μm polycarbonate membrane filter. Cells on the membrane filter were treated with SYBR Green I nucleic acid staining solution (1/40 of the stock concentration of SYBR Green I diluted in Tris-EDTA [TE] buffer). SYBR Green I-stained cells were enumerated either by a direct microscopic count96 or an image-based discriminative count97. For the image-based discriminative count, the Count Nuclei function of MetaMorph software (Molecular Devices) was used to detect and enumerate microbial cells.

DNA extraction

Bacterial and archaeal 16S rRNA gene amplicon sequence variants (ASVs) were obtained from 1.7–177.4 mbsf at Site U1545B; 0.8–168.8 mbsf at Site U1546; 2.1–94.7 mbsf at Site U1547B; 2.1–76.5 mbsf at Site U1548; 1.6–161.4 mbsf at Site U1549; 2.0–54.8 mbsf at Site U1550; 0.8–34.2 mbsf at Site U1551; and 0.8–46.9 mbsf at Site U1552 (Table 1). The depths at which cores were sectioned for DNA extracts were dependent upon temperature profiles observed at each site43. DNA extracted from each sample was used for both Illumina MiSeq and PacBio 16S rRNA amplicon sequencing. DNA was extracted from 59 sediment samples using between 0.5 and 9.5 grams of sediment, depending on DNA yield. A FastDNA™ SPIN Kit for Soil (MP Biomedicals) was used following the manufacturer’s protocol with homogenization modifications as described in ref. 44. Briefly, 0.5–9.5 grams (depending on depth) of sediment were homogenized with a FastPrep™-24 in Lysing Matrix E tubes for 40 s at speed 5.5, twice, with a 2 min pause on ice in between. Final extracts were concentrated using Amicon® Ultra-0.5 ml Centrifugal Filters (MilliporeSigma). We note that DNA extractions were performed in two laboratories (Edgcomb Lab, WHOI; Teske Lab, UNC), and hence, two kit control samples to which no sediment was added were extracted with the same extraction method to confirm absence of contamination due to reagents and handling. In addition, a drilling fluid control filter was extracted at the Edgcomb Lab with the same DNA extraction method so that taxa not originating from the sediments could be identified and removed.

Prokaryotic (bacterial and archaeal) 16S rRNA gene amplification and Illumina MiSeq sequencing

The 16S rRNA gene V4/V5 hypervariable regions were targeted using the general prokaryotic primer pair 515F-Y (5′-GTGYCAGCMGCCGCGGTAA-3′98) and 926R (5′-CCGYCAATTYMTTTRAGTTT-3′99) to recover 16S rRNA gene fragments of both Bacteria and Archaea. Libraries for samples 1547B-1H2, 1547B-3H2, 1545B-1H2, and 1545B-6H2 were prepared from DNA extracts by the Georgia Genomics and Bioinformatics Core (GGBC) at the University of Georgia. Libraries for all other samples and control fluid filters were prepared internally through the amplification steps described below, before being sent to GGBC (for Illumina MiSeq) or the University of Delaware DNA Sequencing & Genotyping Center for final library preparation and sequencing. Illumina MiSeq overhang adapter sequences were added to locus-specific primers for use in first-round Polymerase Chain Reaction (PCR) amplifications. 16S rRNA PCR amplifications were performed for each sample (1:10 dilution) in triplicate using SpeedSTAR™ HS DNA Polymerase (TaKaRa) and 10X Fast Buffer I as described by the manufacturer. Thermocycling conditions were: 95 °C for 5 min; 30 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 60 s; 72 °C for 5 min; and a 4 °C hold. PCR amplification replicates were combined and purified with AMPure® XP beads (Beckman Coulter). Extraction kit control amplifications were attempted at both laboratories where extractions were physically performed, with only one yielding an amplicon sufficient for library construction. All libraries produced with 515F-Y/926R amplified fragments were sequenced on an Illumina MiSeq platform with PE300 read lengths.
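Both primers above contain IUPAC degenerate positions (Y, M, R), so each primer name stands for a small pool of oligonucleotides. The following sketch expands the two primer sequences quoted above into their explicit variants; the function name is hypothetical, and the code is illustrative rather than part of the published workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


PRIMER_515F_Y = "GTGYCAGCMGCCGCGGTAA"   # sequence as given in the text
PRIMER_926R = "CCGYCAATTYMTTTRAGTTT"    # sequence as given in the text

forward_variants = expand_primer(PRIMER_515F_Y)
reverse_variants = expand_primer(PRIMER_926R)

print(len(forward_variants), "variants of 515F-Y")
print(len(reverse_variants), "variants of 926R")
```

The forward primer has two twofold-degenerate positions (Y and M) and the reverse primer has four (Y, Y, M, R), which gives 4 and 16 variants respectively.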

Archaeal 16S rRNA gene amplification and PacBio sequencing

To better capture the diversity of the subsurface archaeal community, we used the archaeal 16S rRNA primer set Arch25F and Arch806R, which targets the V2-V4 hypervariable regions and generates a 16S rRNA amplicon of ~800 base pairs. The use of larger 16S rRNA gene amplicons provides an improved basis for phylogenetic identifications, and alternate primers reduce the dependency on the extremely widespread prokaryotic MiSeq primer set98; both factors are essential for the detection of rare or novel phylogenetic lineages41. Samples for PCR amplification using Archaea-targeting primers and PacBio long-read sequencing are provided in Table 1. The primer set is Arch25F (5′TCYGKTTGATCCYGSCRG 3′100) and Arch806R (5′GGACTACVSGGGTATCTAAT 3′101). The 806R primer site is unusually conserved among archaea, including uncultured subsurface lineages102. PCR reactions were performed using the SpeedSTAR™ HS DNA Polymerase (TaKaRa) kit with the following modifications: each 25 μL PCR reaction contained up to 1 ng of template DNA, 2X Fast Buffer I, 2.5 mM dNTP mixture, 5 units of SpeedSTAR HS DNA Polymerase, 10 mM of each primer and DEPC water (Fisher BioReagents™) up to 25 μL. The PCR reactions were performed in an Eppendorf Mastercycler Pro S Vapoprotect (Model 6321) thermocycler with the following conditions: 95 °C for 5 min, followed by 30 cycles of 94 °C (30 s), 55 °C (30 s), 72 °C (45 s). The total volume of each PCR reaction was run in a 2% agarose gel (Low-EEO/Multi-Purpose/Molecular Biology Grade Fisher BioReagents™), and PCR products of the correct size (~800 bp) were isolated and recovered from the gel using the Zymoclean Gel DNA Recovery Kit as instructed by the manufacturer. Libraries for PacBio sequencing were prepared from the recovered and gel-purified PCR products at the University of Delaware DNA Sequencing & Genotyping Center.

16S rRNA marker gene analyses

Sequenced reads for 16S rRNA gene fragments were analyzed with the QIIME2 pipeline for paired-end (Illumina MiSeq) and single-end (PacBio) sequencing employing the DADA2 denoising method for amplicon sequence variant (ASV) construction103,104. ASVs in our samples that matched ASVs in either kit or drilling fluid controls were removed within the QIIME2 pipeline. In addition, prior to downstream analyses, we manually removed any remaining ASVs in our data sets that were taxonomically annotated to known contaminants of human and terrestrial origin and kit contaminants105. Drilling fluid taxa removed from the final data set include: Mycobacteriales and Propionibacteriales (Actinobacteria), Staphylococcales (Firmicutes), Paceibacterales, UBA1400, UBA9983_A, and Microgenomatia (Patescibacteria), and Burkholderiales and Pseudomonadales (Gammaproteobacteria). Taxonomy was assigned using a trained classifier (q2-feature-classifier106) with the SILVA_132_QIIME_release or SILVA_v138.1_release107,108 as a reference database for Illumina MiSeq or PacBio sequences, respectively. Illumina MiSeq 16S rRNA gene reads are deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession numbers SRR23641000-SRR23641053. The PacBio reads are deposited under SRA accession numbers SRR23604162-SRR23604206.
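The two decontamination steps described above (removal of ASVs shared with controls, then removal of ASVs annotated to contaminant taxa) can be sketched in plain Python. This is not the QIIME2 command sequence used in the study; the data structures, function name, ASV identifiers and counts are hypothetical, and only the taxon names are taken from the text.

```python
# Drilling-fluid taxa listed in the text as removed from the final data set
CONTAMINANT_TAXA = {
    "Mycobacteriales", "Propionibacteriales", "Staphylococcales",
    "Paceibacterales", "UBA1400", "UBA9983_A", "Microgenomatia",
    "Burkholderiales", "Pseudomonadales",
}


def filter_asvs(asv_table, control_asv_ids, contaminant_taxa=CONTAMINANT_TAXA):
    """Keep ASVs that are absent from controls and not annotated to contaminants.

    asv_table maps an ASV id to (lineage, read_count), where lineage is a
    tuple of taxon names from domain downwards.
    """
    kept = {}
    for asv_id, (lineage, count) in asv_table.items():
        if asv_id in control_asv_ids:
            continue  # step 1: ASV also occurs in a kit or drilling-fluid control
        if any(taxon in contaminant_taxa for taxon in lineage):
            continue  # step 2: lineage contains a known contaminant taxon
        kept[asv_id] = (lineage, count)
    return kept


# Hypothetical example table (identifiers and counts are invented)
asv_table = {
    "asv_a": (("Archaea", "Crenarchaeota", "Bathyarchaeia"), 120),
    "asv_b": (("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"), 40),
    "asv_c": (("Bacteria", "Chloroflexi", "Dehalococcoidia"), 75),
}
control_asv_ids = {"asv_c"}  # pretend asv_c was also seen in a control library

kept = filter_asvs(asv_table, control_asv_ids)
print(sorted(kept))
```

Exact-match removal is conservative for the sediment data: a genuine subsurface ASV that also occurs in a control is discarded along with true contaminants.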

Phylogenetic analyses

Phylogenetic analyses were performed with PAUP4.0*109 using HKY85 distances under minimum evolution optimality criteria, with the gamma distribution shape parameter set to 0.5. Phylogenetic trees were constructed with sequences selected from cultured bacteria and archaea and complemented by environmental sequences (limited to ca. 50 to 80 sequences for readability). Sequences were manually aligned in SEQPUP (Version 0.6, http://iubio.bio.indiana.edu/soft/molbio/seqpup/), guided by secondary structure to maintain positional homology across the alignment. Primer sequences were excluded from alignments and phylogenies. Tree topology was initially checked by 100 distance bootstrap replicates and, after final species selection, by 1000 neighbor bootstrap replicates, yielding consistent support for phylogenetic branches and clusters. Whenever possible, phylogenetic trees were checked against published phylogenies of subsurface archaea for consistency3,83,85.
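The bootstrap support values rest on resampling alignment columns with replacement and rebuilding the tree for each replicate. The sketch below shows only the resampling step in plain Python; it is not PAUP's implementation, the distance calculation and tree search are omitted, and the function name, sequence names and toy alignment are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns.

    alignment maps sequence names to equal-length aligned strings. The same
    resampled column indices are applied to every sequence, so positional
    homology within each column is preserved.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]  # with replacement
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Toy alignment with invented sequences
toy_alignment = {
    "ASV_x": "ACGTACGTAC",
    "ASV_y": "ACGTTCGTAC",
    "outgroup": "TCGAACGGAC",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "replicate alignments of", len(replicates[0]["ASV_x"]), "columns")
```

Each replicate alignment would then go through the same distance and tree-building procedure as the original, and the support for a branch is the percentage of replicate trees that contain it.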

RNA extraction and sequencing

Total RNA was extracted successfully from 19 sediment samples from sites U1545B-U1552B (Supplementary Data 2). Before each RNA extraction, all samples, including a blank sample (control), were washed twice with absolute ethanol (200 proof; purity ≥ 99.5%; Thermo Scientific Chemicals) and once with DEPC water (Fisher BioReagents) to remove hydrocarbons and other inhibitory elements present in Guaymas sediments; without these washes, RNA yield was low or zero. In brief, 13–15 grams of frozen sediments were transferred into UV-sterilized 50 ml Falcon tubes (RNase/DNase free) using clean, autoclaved and ethanol-washed metallic spatulas. Each tube received an equal volume of absolute ethanol and was shaken manually for 2 min followed by 30 s of vortexing at full speed to create a slurry. Samples were transferred into an Eppendorf centrifuge (5810 R) and were centrifuged at room temperature for 2 min at 2000 rpm. The supernatant was decanted, and the ethanol wash was repeated. After decanting the supernatant of the second ethanol wash, an equal volume of DEPC water was added to each sample. Samples were manually shaken and vortexed as before to create a slurry and were transferred into the Eppendorf centrifuge (5810 R), where they were centrifuged at room temperature for 2 min at 2000 rpm. The supernatant was decanted, and each sediment sample was immediately divided into three bead-containing 15 mL Falcon tubes, provided by the PowerSoil Total RNA Isolation Kit (Qiagen). RNA was extracted as suggested by the manufacturer with the modification that the RNA extracted from the three aliquots was pooled into one RNA collection column and eluted at 30 μl final volume. All RNA extractions were performed in a UV-sterilized clean hood (two UV cycles of 15 min each) fitted with HEPA filters. Surfaces inside the hood and pipettes were thoroughly cleaned with RNase AWAY (Thermo Scientific) before every RNA extraction and in between extraction steps.
Trace DNA contaminants were removed from RNA extracts using TURBO DNase (Thermo Fisher Scientific) and the manufacturer’s protocol. Removal of DNA from the RNA extracts was confirmed with PCR reactions using the bacterial primers BACT1369F/PROK1541R (F: 5′CGGTGAATACGTTCYCGG 3′, R: 5′AAGGAGGTGATCCRGCCGCA 3′110), targeting the small subunit (SSU) 16S rRNA gene. Each 25 μl PCR reaction was prepared using GoTaq G2 Flexi DNA Polymerase (Promega) and contained 0.5 U μl–1 GoTaq G2 Flexi DNA Polymerase, 1X Colorless GoTaq Flexi Buffer, 2.5 mM MgCl2 (Promega), 0.4 mM dNTP Mix (Promega), 10 mM of each primer (final concentrations), and DEPC water. These PCR amplifications were performed in an Eppendorf Mastercycler Pro S Vapoprotect (Model 6321) thermocycler with the following conditions: 94 °C for 5 min, followed by 35 cycles of 94 °C (30 s), 55 °C (30 s), and 72 °C (45 s). The PCR reaction products were run in 2% agarose gels (Low-EEO/Multi-Purpose/Molecular Biology Grade Fisher BioReagents) to confirm absence of DNA products. RNA quantification (ng μl−1) was performed using Qubit RNA High Sensitivity (HS), Broad Range (BR), and Extended Range (XR) Assay Kits (Invitrogen).

Amplified cDNAs from the DNA-free RNA extracts were prepared using the Ovation RNA-Seq System V2 (Tecan) following manufacturer’s suggestions. cDNAs were submitted to the Georgia Genomics and Bioinformatics Core for library preparation and sequencing using NextSeq 500 PE 150 High Output (Illumina). The sequencing of the cDNA library from the control sample was unsuccessful as it failed to generate any sequences that met the length criterion of 300–400 base pairs.

Metatranscriptome data analyses

Raw sequencing reads were trimmed to remove adapters and low-quality bases using fastp (v0.23.2)111 with parameters (-q 20 -u 20 -l 50 -w 16 -5 -M 30 -g -D --detect_adapter_for_pe --dup_calc_accuracy 6). We used Trinity (v2.14.0)112 to assemble the 19 metatranscriptomes with default settings. Trinity generated 640,136 assembled metatranscripts with size > 165 bp. We ran DIAMOND (v.2.0.7) BLASTx113 against the NCBI-NR database (release date: 2022-12-04) to provide functional and taxonomic annotations for the assembled metatranscriptomes. Because the control sample failed to generate sequences that met the minimum length criterion, it could not be used to identify contaminants directly. Therefore, to remove putative contaminants, an in-house database was constructed that contained reference genomes from taxa identified as potential kit contaminants and human pathogens105. We also searched the DIAMOND BLASTx output of our contigs for taxonomic classifications that belonged to any other putative contaminants not already in our in-house database, e.g., known pathogens such as Coxiella, which is usually not listed as a common kit contaminant. For this additional search, the DIAMOND BLASTx output was filtered to consider only annotations with e-values below 1e-5. The reference sequences of all putative contaminants were downloaded from the NCBI-NR database (release date: 2022-12-04) and were added to our database (Supplementary Data 6). All contigs in our sample data that matched any sequence in this contaminant database with > 90% similarity over > 50% of the contig length were removed from downstream analysis. This process removed 8,301 transcripts (8,301/640,136; ~1.3%). The remaining decontaminated assembled transcripts were processed with Prodigal (v2.6.3)114 to predict gene and protein sequences. CD-HIT115 (v. 4.8.1; -c 0.95 -aS 0.9 -n 10) was used to cluster genes and to remove redundancy.
For functional annotation, KofamScan (v.1.3.0)116 and GhostKOALA (v.2.2)117 were used to assign KEGG orthologs (KOs) to protein sequences using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. DIAMOND (v.2.0.7) BLASTp113 (v2.0.15.153, -e 1e-5 --more-sensitive) was used to search the protein sequences against the NCBI-NR database (release date: 2022-12-04).
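
The contaminant filter described above (matches with > 90% similarity over > 50% of the contig length) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "similarity" corresponds to the percent identity of an alignment and that coverage is the aligned length divided by the contig length; the `Hit` record, the function name, and the contig IDs are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One alignment of a contig against the contaminant database (hypothetical record)."""
    contig: str       # contig identifier
    pident: float     # percent identity of the alignment (assumed meaning of "similarity")
    aln_len: int      # aligned length on the contig, in bp
    contig_len: int   # full contig length, in bp


def contaminant_contigs(hits: Iterable[Hit],
                        min_identity: float = 90.0,
                        min_coverage: float = 0.5) -> Set[str]:
    """Return contigs with at least one hit above both thresholds (strict '>')."""
    flagged = set()
    for h in hits:
        coverage = h.aln_len / h.contig_len
        if h.pident > min_identity and coverage > min_coverage:
            flagged.add(h.contig)
    return flagged


# Toy data, invented for illustration only.
hits = [
    Hit("contig_1", 98.2, 400, 500),  # high identity, 80% coverage: flagged
    Hit("contig_2", 95.0, 100, 500),  # high identity, only 20% coverage: kept
    Hit("contig_3", 85.0, 450, 500),  # identity below threshold: kept
]
flagged = contaminant_contigs(hits)
kept = [c for c in ("contig_1", "contig_2", "contig_3") if c not in flagged]
print(sorted(flagged))
print(kept)
```

Both thresholds are applied to the same alignment, so a contig with one short high-identity hit and one long low-identity hit is not flagged under this reading.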

To gain more detailed insight into the function and taxonomy of the expressed genes, the decontaminated and non-redundant assembled transcripts were re-run with DIAMOND BLASTx113 (v2.0.15.153, -e 1e-5 --more-sensitive) against the NCBI-NR database (release date: 2022-12-04). The BLASTx results with e-values below 1e-5 were manually curated for expression of genes described in Supplementary Data 2 for each taxon. We recognize that any automated or manual pipeline used to assign gene function carries the caveat that publicly available databases may contain protein sequences that have not been functionally validated on the bench. The expression level of each transcript was estimated in units of transcripts per million (TPM) using Salmon (v1.9.0, --meta)118. The TPM values of all transcripts annotated to the same gene were summed, a value of 1 was added to each sum (to avoid zeros), and the result was log2-transformed. Metatranscriptome reads were deposited in the National Center for Biotechnology Information Sequence Read Archive under accession numbers SRR22580929-SRR22580947.
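
The gene-level expression calculation described above, log2(sum of TPM + 1), can be sketched as follows. This is an illustration under stated assumptions and not the authors' script: the function name, the transcript IDs, and the gene labels `geneA` and `geneB` are hypothetical, and transcripts without a gene annotation are assumed to be skipped.

```python
import math
from collections import defaultdict
from typing import Dict


def gene_level_log2_tpm(transcript_tpm: Dict[str, float],
                        transcript_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Sum TPM over transcripts annotated to the same gene, then apply log2(x + 1)."""
    totals = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene = transcript_to_gene.get(transcript)
        if gene is None:
            continue  # assumption: unannotated transcripts are left out
        totals[gene] += tpm
    # The pseudocount of 1 keeps genes with zero summed TPM at log2(1) = 0.
    return {gene: math.log2(total + 1.0) for gene, total in totals.items()}


# Toy values, invented for illustration only.
tpm = {"tx1": 3.0, "tx2": 4.0, "tx3": 0.0, "tx4": 12.5}
mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
result = gene_level_log2_tpm(tpm, mapping)
print(result)
```

In this toy case `geneA` sums to 7 TPM and maps to log2(8) = 3, while `geneB` has zero TPM and maps to 0 rather than an undefined logarithm.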

Statistics

All statistical and multivariate analyses were done using R (version 4.2.1; R Core Team 2013) with the vegan 2.6-4 package119. A Bray-Curtis dissimilarity matrix was created from Hellinger-transformed abundance tables120 with the vegan vegdist function. Non-metric multidimensional scaling (nMDS) was used for analyses of ASV distributions121. The stress value is reported as a measure of goodness of fit; stress values below 0.2 indicate a low probability of drawing the wrong inference from the ordination. Environmental metadata were standardized by subtracting the mean and then dividing by the standard deviation. Fit statistics (r2 and p-value) of environmental metadata fitted to the nMDS axes after 10,000 permutations were assessed using the vegan function envfit (Supplementary Data 1).
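
The transformations named above can be written out explicitly. The sketch below is plain Python and not the R/vegan code used in the study; it shows the Hellinger transformation (square root of relative abundance), the Bray-Curtis dissimilarity (sum of absolute differences divided by the sum of all values), and the standardization of an environmental variable. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it, and the count table is invented. The nMDS ordination and the permutation-based fit are not reproduced here.

```python
import math
from typing import List, Sequence


def hellinger(row: Sequence[float]) -> List[float]:
    """Hellinger transformation: square root of each taxon's relative abundance."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]
    return [math.sqrt(x / total) for x in row]


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = no shared taxa)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the sample standard deviation (assumed n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Toy count table (rows = samples, columns = ASVs), invented for illustration only.
counts = [[10, 0, 30], [10, 0, 30], [0, 5, 0]]
transformed = [hellinger(row) for row in counts]
dist = [[bray_curtis(a, b) for b in transformed] for a in transformed]
scaled = standardize([1.0, 2.0, 3.0])
print(dist[0][1], dist[0][2], scaled)
```

In the toy table the first two samples are identical (dissimilarity 0) and the third shares no ASVs with them (dissimilarity 1).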

Hydrocarbon analyses

A total of thirty-two sediment cakes from the eight IODP Expedition 385 sites were submitted for hydrocarbon analyses (Supplementary Data 5). The sediment cakes extend from 1.7 down to 500.6 mbsf, covering the depth range used for our 16S rRNA amplicon surveys. Specifically, sediment cakes from shallow (1.7–4.3 mbsf) and deeper depths (17.4 down to 500.6 mbsf) at each site were analyzed for traditional fingerprinting diagnostic compounds (e.g., saturated hydrocarbons, polynuclear aromatic hydrocarbons (PAHs), alkylated PAHs) by Alpha Analytical using United States Environmental Protection Agency (EPA) method 8015 (GC-FID; saturates) and a modified method 8270D (GC-MS; PAHs). Methods and performance details are provided in ref. 122.