Background

Wastewater treatment plants (WWTPs) are engineered microbial ecosystems important for human health, prevention of receiving water pollution, and recovery of resources. Activated sludge (AS) plants are by far the most common design type worldwide, and the recent development of ecosystem-specific reference databases and a unified taxonomy for microbial community analyses at species level [1, 2] has made the AS microbiota one of the best studied microbial ecosystems. Amplicon profiling is popular for these studies, and the V1-V3 variable region of the 16S rRNA gene provides species-level resolution, achieving 90.7% correct classifications when used with an ecosystem-specific reference database, which is comparable to full-length 16S rRNA gene amplicons [1, 3]. Altogether, this makes it possible to address both theoretical and practical aspects of microbial ecology in engineered ecosystems.

The assembly of the AS microbiota is determined by a combination of deterministic and stochastic processes [4,5,6,7], with AS ecosystems worldwide comprising numerous taxa [1, 8]. However, studies of the shared core community of the AS microbiota in municipal treatment systems have shown that it consists of relatively few abundant taxa, although they constitute a large fraction of the total biomass [1, 8]. These taxa are mostly uncultured but are assumed to perform process-critical functions such as removal of nitrogen (N) and phosphorus (P), although the specific function is not known for all [1]. Taxa shared among the majority of WWTPs (> 80%) with a defined process design (e.g., removal of C and N) are denoted strict core taxa, while the general and loose core taxa are abundant in 50% and 20% of the WWTPs, respectively, thus being responsible for differences in the AS communities across WWTPs. Additionally, conditionally rare or abundant taxa (CRAT) (> 1% relative abundance in at least one plant and not part of the core) can be very abundant in a few WWTPs and may be related to process disturbances such as sludge foaming or bulking, potentially having a great impact on treatment performance [1, 9]. These studies have revealed a list of the “most wanted” taxa to study in greater detail, as they make up the vast majority of process-critical taxa across global AS plants. The list consists of approx. 950 genera and 1500 species [1].

Recent studies have challenged the idea that all observed core/CRAT species are process-critical [4, 10]. Some of the core/CRAT species might be dead or inactive in the AS system and solely present due to continuous immigration from incoming wastewater and thus present in all WWTPs only due to similarities in source communities (i.e., sewer systems and gut). It is increasingly clear that immigration of bacteria from incoming wastewater is forming the background inputs to the AS community, determining community composition and affecting also the abundance of taxa in WWTPs [4, 11,12,13]. Many of those, e.g., gut bacteria, may come in large amounts in the influent, but will die off in the AS system, thus not representing active process-critical taxa, although potentially present in high abundance [2, 4, 10].

Despite the determination of the list of global core/CRAT taxa making up a large biomass fraction across global AS plants, the same studies also indicate differences in the microbiota across plants. Here, biogeographical analyses, including distance-decay relationships (DDR), across six continents (all except Antarctica) revealed a decrease in similarity with increasing geographical distance [1, 8], as is also observed in many other habitats, such as soil, lakes and forests [14,15,16]. In particular, the taxa resolved at amplicon sequence variant (ASV) level showed a significant decrease in similarity, whereas the DDR was much weaker at the genus level, underlining that the genera in wastewater treatment systems are shared worldwide but perhaps not the species or strains [1]. This highlights that microorganisms display biogeographic patterns, but it remains unknown whether the DDR is equally pronounced on a local scale over shorter distances, which could potentially explain community differences observed within countries or regions [2, 17, 18].

Here, we have selected 84 municipal AS WWTPs treating ~ 70% of all wastewater within a confined geographical area, Denmark, all with similar design and process conditions. We wanted 1) to establish a complete list of species-level taxa found in these WWTPs, 2) to identify core and CRAT species and their relative abundance, 3) to investigate which core taxa are active key process-critical species in AS based on mass balances, hypothesizing that not all are, as some are only present due to mass-immigration from the wastewater, and 4) to test the hypothesis that geography is an important factor in controlling the microbial communities on a scale of a few hundred kilometers, the size of Denmark (43,000 km²).

Methods

Sample collection

Activated sludge samples were collected from 84 municipal WWTPs in Denmark and influent wastewater (IWW) from 69 of these plants. The plants were sampled during September-December 2019 except Aalborg East WWTP sampled during January-June 2022 (Supplementary Fig. S1). All plants collected 10 sets of IWW and AS samples (paired by date) at regular intervals over 1–2 months or 3 sets of IWW and AS samples within the same day, in the morning, midday and afternoon. IWW samples were collected as 50-mL subsamples from a 24-h flow proportional sampler while AS samples consisted of 2-mL subsamples of 50-mL grab samples collected from the aeration tank.

The resulting dataset accounts for 1365 samples, including 741 AS samples and 624 IWW samples. The following process parameters were recorded where possible: design type (biological removal of C and N and chemical or biological removal of P), presence of primary settlers, industrial load (as organic load), size (evaluated as person equivalents, PE), sludge retention time, concentration of suspended solids in the oxic process tank and influent organics measured as chemical oxygen demand (COD) found in sample metadata in Supplementary Dataset S1. All plants met the effluent quality standard and had nutrient removal of C, N, and P and similar environmental conditions.

Amplicon sequencing workflow

DNA was extracted from AS and IWW samples and prepared for 16S rRNA gene amplicon sequencing as described in Dottorini et al. [4] targeting the region V1-V3 using the 27F (AGAGTTTGATCCTGGCTCAG) [19] and 534R (ATTACCGCGGCTGCTGG) [20] primers. Sequencing was performed using Illumina MiSeq (Illumina, USA) paired-end (2 × 300 bp). Protocols for DNA extraction, sample preparation, and amplicon sequencing can be found here: https://www.midasfieldguide.org/guide/protocols. Raw amplicon sequences were processed with the ASV pipeline (https://github.com/KasperSkytte/ASV_pipeline/tree/master) to obtain ASVs, which were then mapped to the full-length ASVs (FL-ASVs) from the database generated by Dueholm and colleagues [3] for taxonomy assignment using the MiDAS 5.2 ecosystem-specific reference database [1, 21] (available at https://www.midasfieldguide.org/guide/downloads). The sequence identity cutoffs for taxonomic assignment followed Yarza et al. [22] with a species-level cutoff of 98.7% and when species-level classification was not available, the taxonomic classification was assigned by combining the ASV classification with the first available taxonomic level (e.g., genus).

Data analysis

Downstream data processing was performed in R v.4.2.2 [23] using RStudio [24], mainly through the ampvis2 package v.2.8.6 [25]. Additional R packages used include tidyverse v.2.0.0 [26], vegan v.2.6-4 [27], geosphere v.1.5-18 [28], and mapDK v.0.3.0 [29]. Related R scripts can be found at https://github.com/SofieZacho/mfd-wwtp-microbiota. Prior to data analysis, duplicate samples were merged by taking the mean number of reads for each ASV. AS samples with fewer than 10,000 reads and IWW samples with fewer than 50,000 reads were discarded. All AS samples were rarefied to 10,000 reads and IWW samples to 50,000 reads. Rarefaction curves can be found in Supplementary Fig. S2.
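The read-depth filter and rarefaction step can be illustrated with a minimal sketch. It is written in Python for brevity and is not the ampvis2 routine used in the study; the function name `rarefy`, the fixed random seed, and the toy read counts are assumptions made purely for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from an ASV count table.

    Returns None when the sample holds fewer than `depth` reads, mirroring
    the rule that such samples are discarded.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the table into one entry per read, then draw without replacement.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return dict(Counter(rng.sample(reads, depth)))


# Hypothetical AS samples: one deep enough, one too shallow.
sample = {"ASV1": 6000, "ASV2": 3500, "ASV3": 1500}
rarefied = rarefy(sample, 10_000)
shallow = rarefy({"ASV1": 4000, "ASV2": 100}, 10_000)
```

After rarefaction every retained sample has the same total read count, so relative abundances and diversity measures are compared at equal sequencing depth.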

The core and CRAT taxa were identified as described in [1]. The core taxa groups were identified at ASV, species, and genus level as having an average relative abundance higher than 0.1% in 80% (strict core), 50% (general core) and 20% (loose core) of WWTPs, respectively; CRAT taxa were identified as having an average relative abundance higher than 1% in at least one WWTP, but not belonging to the core taxa. All average abundance values were calculated as mean relative abundances of all samples for each WWTP. Additionally, the DDR was determined in accordance with Dueholm et al. [1]. Briefly, DDR was determined for ASVs, species and genera using untransformed values of geographic distance against microbial community similarity distance (Bray–Curtis or Sørensen). The geographical distance between samples was calculated using the Haversine formula. Strength of correlation was examined with the Mantel test using Spearman correlation and 999 permutations.
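The core and CRAT definitions above amount to a simple prevalence rule, sketched below in Python. This is an illustration under stated assumptions, not the script used in the study: the function name `classify_taxon` is hypothetical, the toy abundances are invented, and the prevalence cut-offs are treated as inclusive (at least 80%, 50% and 20% of plants), which the text does not specify.

```python
def classify_taxon(abundances, core_min=0.1, crat_min=1.0):
    """Classify one taxon from its mean relative abundance (%) in each plant.

    `abundances` holds one value per WWTP (0 where the taxon is absent).
    """
    n_plants = len(abundances)
    # Fraction of plants in which the taxon exceeds the core threshold.
    prevalence = sum(a > core_min for a in abundances) / n_plants
    if prevalence >= 0.8:  # assumption: cut-offs are inclusive
        return "strict core"
    if prevalence >= 0.5:
        return "general core"
    if prevalence >= 0.2:
        return "loose core"
    # Not core, but above 1% in at least one plant.
    if any(a > crat_min for a in abundances):
        return "CRAT"
    return "other"


# Hypothetical taxa in a toy set of 10 plants.
toy = {
    "taxon_A": [0.5] * 9 + [0.0],       # above 0.1% in 9 of 10 plants
    "taxon_B": [0.3] * 6 + [0.0] * 4,   # 6 of 10 plants
    "taxon_C": [0.2] * 3 + [0.0] * 7,   # 3 of 10 plants
    "taxon_D": [5.0] + [0.0] * 9,       # one plant only, but above 1%
    "taxon_E": [0.5] + [0.01] * 9,      # one plant only, below 1%
}
groups = {name: classify_taxon(values) for name, values in toy.items()}
```

Because the groups are checked from strictest to loosest, each taxon receives exactly one label, and CRAT is only considered for taxa that fail all three core criteria.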

Differences in overall microbial community structure were explored by principal coordinate analysis (PCoA) with the ampvis2 package using beta-diversity distances based on Bray–Curtis dissimilarity for ASVs. Statistical differences between PCoA clusters were assessed by permutational multivariate analysis of variance (PERMANOVA) using the adonis2 function in the vegan R package with 999 permutations. Heatmaps were made with the amp_heatmap function in the ampvis2 package taking the average relative abundance for each WWTP. Maps of Denmark were plotted with the mapDK package.
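For readers unfamiliar with the two beta-diversity measures used here and in the DDR analysis, the following Python sketch shows how they differ. It is an illustration only (the study computed them in R); the function names and the toy read counts are assumptions.

```python
def bray_curtis(x, y):
    """Abundance-weighted Bray-Curtis dissimilarity between two count tables."""
    taxa = set(x) | set(y)
    shared = sum(min(x.get(t, 0), y.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(x.values()) + sum(y.values()))


def sorensen(x, y):
    """Unweighted Sorensen dissimilarity (presence/absence only)."""
    a = {t for t, n in x.items() if n > 0}
    b = {t for t, n in y.items() if n > 0}
    return 1 - 2 * len(a & b) / (len(a) + len(b))


# Hypothetical rarefied samples from two plants.
plant_a = {"ASV1": 50, "ASV2": 50}
plant_b = {"ASV1": 100}

bc = bray_curtis(plant_a, plant_b)  # responds to the shift in ASV1 abundance
so = sorensen(plant_a, plant_b)     # responds only to ASV2 being absent
```

Bray–Curtis is driven by differences in abundance, whereas Sørensen only registers which ASVs are present, which is why the two can give different pictures of the same pair of communities.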

The fate of wastewater taxa after their introduction into the AS plant was calculated as the apparent net-specific growth rate for each ASV based on mass balances between sample pairs of IWW and AS ([4], with minor modifications). Briefly, the fate was determined for ASVs found in more than 10 sample pairs (IWW and AS from the same day) with a relative abundance of at least 0.01% in an AS sample or 0.005% in an IWW sample. The remaining ASVs were directly categorized as 'inconclusive' due to their low abundance in the dataset. Additionally, ASVs with inconsistent trends between samples (i.e., not assigned the same growth group in more than 50% of IWW-AS sample pairs) were also categorized as 'inconclusive', as they were found to disappear, grow, or survive in a similar number of samples.
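The filtering and consensus rule described above can be sketched as follows. This Python example does not reproduce the mass balance of [4]; it assumes that each IWW-AS sample pair has already been given a growth label and only illustrates how the per-pair labels might be combined. The function name `assign_fate`, the tuple layout, and the reading that the abundance thresholds apply per sample pair are assumptions.

```python
from collections import Counter


def assign_fate(pairs, min_pairs=10, as_min=0.01, iww_min=0.005):
    """Combine per-pair growth labels for one ASV into a single fate.

    `pairs` is a list of (iww_abundance_pct, as_abundance_pct, label) tuples,
    one per IWW-AS sample pair; the labels come from a separate mass balance.
    """
    # Keep only pairs where the ASV reaches the abundance threshold.
    qualifying = [label for iww, as_, label in pairs
                  if as_ >= as_min or iww >= iww_min]
    if len(qualifying) <= min_pairs:  # needs more than `min_pairs` pairs
        return "inconclusive"
    label, count = Counter(qualifying).most_common(1)[0]
    # The dominant label must cover more than 50% of the qualifying pairs.
    return label if count / len(qualifying) > 0.5 else "inconclusive"


# Hypothetical ASVs.
consistent = [(0.001, 0.5, "growing")] * 9 + [(0.001, 0.5, "disappearing")] * 3
mixed = ([(0.2, 0.1, "growing")] * 4 + [(0.2, 0.1, "disappearing")] * 4
         + [(0.2, 0.1, "surviving")] * 4)
rare = [(0.0001, 0.001, "growing")] * 12

fates = [assign_fate(p) for p in (consistent, mixed, rare)]
```

Under these assumptions, an ASV that is labeled the same way in most of its sample pairs keeps that label, while low-abundance ASVs and ASVs with evenly split labels fall into the 'inconclusive' group.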

The overall similarity between microbial communities in terms of identity of ASVs in AS and IWW was investigated based on Jaccard distances and performed only on samples from plants with both AS and IWW samples (n = 69). A Wilcoxon test was performed to assess significance of difference, and p-values were Bonferroni corrected for multiple comparisons (Supplementary Fig. S12).
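The comparison of ASV identity between IWW and AS can be illustrated with a short Python sketch of the Jaccard distance, contrasting pairs from the same plant with pairs from different plants. The plant names and ASV sets are hypothetical, and the significance test (Wilcoxon with Bonferroni correction) is not reproduced here.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of detected ASVs."""
    return 1 - len(a & b) / len(a | b)


# Hypothetical ASV sets detected in IWW and AS of two plants.
plants = {
    "plant_1": {"iww": {"a", "b", "c"}, "as": {"b", "c", "d"}},
    "plant_2": {"iww": {"x", "y", "c"}, "as": {"y", "z", "c"}},
}

same_plant, different_plant = [], []
for iww_name, iww_data in plants.items():
    for as_name, as_data in plants.items():
        d = jaccard_distance(iww_data["iww"], as_data["as"])
        # Sort each IWW-AS distance by whether both samples share a plant.
        (same_plant if iww_name == as_name else different_plant).append(d)
```

In this toy case the same-plant distances are smaller than the cross-plant distances, which is the pattern the test is designed to detect.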

Results

Microbial community structure across 84 Danish WWTPs

The 84 municipal WWTPs investigated are treating almost 70% of the total volume of wastewater in Denmark [30] and represent 29% of all Danish WWTPs with advanced biological nutrient removal (nitrification/denitrification and P removal). They are broadly located across the country in five different regions, 3 in the mainland (Northern, Central, Southern) and 2 on major islands (Zealand, and Funen) (Fig. 1A). The plants have overall similar process design and performance, all meeting effluent standards. They varied in several parameters, including sizes (2500–1,000,000 PE), percentage of industrial loads (0–85%), total sludge retention times, and presence of primary settlers (Supplementary Dataset S2).

Fig. 1

A Geographical distribution of 84 investigated WWTPs. Points are colored according to region (Northern, Central, Southern, Funen or Zealand) and number of plants in each region is given. B Community diversity in AS. Beta-diversity in AS visualized as PCoA plot based on Bray–Curtis dissimilarity matrix at the ASV-level. The explained variance (R²) refers to differences of communities among WWTPs, each represented by a different color

The AS microbiota from all plants contained in total almost 66,000 different ASVs as investigated by amplicon sequencing, which can be assigned to approx. 12,500 different species by classification. However, 33,000 ASVs could not be classified at species-level, corresponding to around 30% of total read abundance. On average, each plant contained 6,800 (standard deviation = 1,968) different ASVs classified as 2,500 different species (standard deviation = 611).

The microbiota of AS plants were overall very similar with the most abundant ASVs belonging to the genera Ca. Phosphoribacter (formerly Tetrasphaera), Rhodoferax, and Ca. Microthrix, found across all plants, which is typical for Danish WWTPs [2]. Their varying abundances, however, resulted in plant-specific communities as indicated by the different clustering of samples by plant (Fig. 1B) taking into account the taxa abundance (Bray–Curtis dissimilarity based on ASVs). The PERMANOVA analysis showed that 86% of the variance observed could be explained by the ID of plants, confirming that each plant was characterized by a community with a defined bacterial composition and abundance also to some extent reflected in the IWW samples with an R² = 0.72 (Supplementary Fig. S3).

Core and conditionally rare or abundant taxa

Members of the communities were grouped into strict, general, and loose core taxa at three different taxonomic levels (ASV, species, genus) to identify important taxa (Figs. 2, 3, Supplementary Dataset S3). The cumulative relative abundance of the microbial community represented by the three core groups and CRAT at different taxonomic levels was generally similar across plants, except for a few plants (mainly Kalundborg and Marstal) (Fig. 2). These plants appeared different, especially when the core was evaluated at ASV level (Fig. 2A). Marstal had, for unknown reasons, an extremely high abundance of ASV3 (60% of all reads in this plant) belonging to the general core genus Ca. Amarolinea. Kalundborg had a high industrial load and high process temperature, affecting characteristics of influent wastewater and growth conditions, respectively, leading to an overall different microbial community compared to the other plants.

Fig. 2

Proportion of core and CRAT taxa in WWTPs. Cumulative read abundance of ASVs (A), species (B), and genera (C), respectively, in all plants colored by strict core, general core, loose core, CRAT, other taxa, and unclassified taxa. The cumulative relative abundance of different core groups was calculated in each plant as mean across samples (left) and across all plants (right). Number of taxa belonging to each grouping is shown in parentheses. The core groups were identified for having an abundance higher than 0.1% relative abundance in 80% (strict core), 50% (general core) and 20% (loose core) of WWTPs; CRAT were identified for having relative abundance higher than 1% in at least one WWTP and not belonging to the core

Fig. 3

Species of strict and general core. Heatmap of the 51 species making up the strict (n = 6) and general core (n = 45) in 84 WWTPs. On y-axis: species name colored in olive (top) and dark red (bottom) when they belong to strict and general core, respectively. The color of the heatmap refers to the mean relative read abundance of each species across samples in each WWTP

The genera belonging to the core groups included 56 strict core, 67 general core, 103 loose core, as well as 39 CRAT, a total of 265 genera. The most abundant strict core genera included Ca. Phosphoribacter, mean relative read abundance = 4.2% ± 4.4% (mean ± stdev); Rhodobacter, 3.0% ± 2.3%; Trichococcus, 2.1% ± 1.8%; Acidovorax, 1.9% ± 1.4%; and Rhodoferax, 1.4% ± 0.9% (Supplementary Fig. S4). On average, more than 75% of the accumulated read abundance in each plant was composed of genera belonging to the cores (Fig. 2C). In particular, more than 40% of the total read abundance was made up of strict core genera, which represented the biggest core fraction of the total cumulative read abundance. The loose core and the CRAT included genera such as Nitrotoga, Defluviicoccus, Kouleothrix, Brachymonas, and Azoarcus, most of which are well-known process-critical taxa [9, 31,32,33].

When the core was characterized at ASV or species level, the core community made up smaller fractions of the total biomass of the plants compared to core genera: 51.1% and 42%, respectively. This is mainly because the plants were more similar at genus level than at species and ASV level, but also because of the more stringent requirements for identifying a core ASV or species: the same abundance threshold (> 0.1%) was used even though each genus might contain several ASVs and species. The relative abundance covered by the ASV-level core taxa should in theory not exceed that of the species-level core. However, the relatively high fraction of unclassified reads at species level (~ 30%) made this possible. The majority of these reads were unclassified due to poor resolution of the 16S rRNA gene fragment (V1-V3) for certain species, with 23% out of the 30% unclassified reads having a species-level reference (> 98.7% identity) in the database.

In total, 371 different ASVs made up the core and CRAT groups in the 84 WWTPs. Of these, 92% had genus-level classification and 67% had species-level classification. At ASV level, the loose core represented the biggest fraction (Fig. 2A) and included Ca. Saccharimonas aalborgensis ASV127, Dokdonella ASV120 (unclassified species), and Ca. Lutibacillus vidarii ASV15 among the top 3 most abundant ASVs on average across WWTPs. At species level, the general core species made up the biggest core fraction on average across plants (15.6%; Fig. 2B) and included Ca. Amarolinea dominans, Ca. Microthrix parvicella, and Acidovorax midas_s_1484 among the top 3 most abundant species on average across plants (Fig. 3).

Not all core taxa and CRAT grow in the activated sludge plants

The influent wastewater (IWW) data allowed us to assess whether core and CRAT species were growing in the activated sludge process or were primarily present due to mass-immigration with the influent, subsequently dying off and likely not being process-critical. Given the relative read abundance in both IWW and AS and by establishing a mass balance for flow and biomass across the plants, we calculated the apparent specific growth rate of every ASV in each AS plant and assigned one of the following growth patterns: growing in the system, disappearing due to low growth rates, decay or predation, surviving or inconclusive, which may grow or disappear depending on the WWTP operational conditions (Fig. 4A, Supplementary Dataset S4). The growth pattern for each ASV was very consistent across the different AS plants.

Fig. 4

Fate of ASVs present in IWW and AS reflecting their growth pattern in AS. A Growth fate for all ASVs detected in IWW and AS. B Growth fate for ASVs belonging to different core/CRAT subgroups. Percentages are calculated as mean for all plants. Numbers in parentheses refer to the number of ASVs with the specific fate

Interestingly, only slightly more than half of the core/CRAT ASVs (210/371) were growing in the plants (Fig. 4B). Most ASVs in the strict core were growing (8/11), and most of these arrived with the IWW in low concentrations. For example, ASVs from the genus Ca. Phosphoribacter (formerly Tetrasphaera), a well-known polyphosphate-accumulating organism, were found in very low abundance in the IWW, although the genus is a strict core member growing in all Danish WWTPs. In contrast, some non-growing core/CRAT ASVs were ubiquitously present in high abundance in the IWW, such as ASVs belonging to Acidovorax and Trichococcus, constituting on average 22.2% of the incoming read abundance, and they disappeared in the AS (results not shown). Other ASVs belonging to species such as midas_s_59 (family Sphingomonadaceae), midas_s_68 (family Saprospiraceae), or midas_s_21 (genus CL500-29 marine group) lack in-situ physiology descriptions, yet their membership of the growing strict/general core implies a potential significance for the AS process. Among the process-critical (growing) but less common taxa (loose core and CRAT genera), several well-known genera were observed, such as Nitrotoga, Defluviicoccus, and Kouleothrix (Supplementary Dataset S3).

The growing core/CRAT ASVs made up in total 38.1% of the cumulative relative read abundance in AS, but only 2.3% in the incoming wastewater. Conversely, ASVs of disappearing, surviving and inconclusive fates made up only 11.7% of the cumulative relative read abundance in AS while constituting 31.5% of the incoming read abundance (Fig. 4B) suggesting most of them were not of functional importance and thus not growing in the AS. Interestingly, when examining the core and CRAT ASVs separately, 168/221 of core ASVs were growing in contrast to only 42/150 CRAT ASVs (Fig. 4B). In other words, only approx. a quarter of the CRAT ASVs were active in the AS while the rest were most likely only present due to continuous immigration.

Geography and immigration shape the microbiota in WWTPs

The microbiota within the Danish WWTPs were overall very similar with the same taxa being abundant, but beta-diversity measures revealed different plant-specific communities (Fig. 1B). To test our hypothesis that geography is an important factor in controlling the microbiota on a very limited geographical scale, we examined the variance in AS microbial communities explained by geographic regions. Our findings show that regional differences explained 9% of the observed variance (Fig. 5A). Despite the relatively low degree of explanation, we found that AS communities in the Northern, Central, and Southern region clustered separately from communities in the Funen and Zealand regions, indicating overall differences between mainland and island regions. The same geographical dynamics were observed for core community and not to the same extent present for CRAT ASVs (Supplementary Figs. S5 and S6). However, distinct regional patterns for the IWW community were only to some extent observed (Supplementary Fig. S7), most likely masked by shared human gut and sewer microorganisms across Denmark or the exceptionally heavy rainfall during the sampling period, which might have diluted and made the IWW communities more uniform.

Fig. 5

Effects of environmental and process parameters on the microbial community structure of activated sludge. Principal coordinate analyses of Bray–Curtis beta-diversity are calculated for ASVs. Results from PERMANOVA are shown in the upper right corner, and homogeneity of variance grouped by parameters is shown in Supplementary Fig. S8. A Regions (Northern, n = 12; Southern, n = 19; Zealand, n = 21; Central, n = 19; Funen, n = 13). B Design: all plants have removal of C, N, and P. P is removed primarily with chemical precipitation (ChemP, n = 38) or biological removal (BioP, n = 43) or unknown (n = 3). C Person equivalents (PE) reported in thousands (4 samples, each from Lynetten WWTP (1 mio PE), exceed 420 thousand PE). For the PERMANOVA test, the following size groups were applied: small ≤ 25,000 PE; medium = 25,000–100,000 PE; large > 100,000 PE. D Percentage of industrial load in incoming wastewater (plants with unknown industrial load are colored gray). For the PERMANOVA test, the following industrial load groups were used: none = 0%, very low = 0–10%, low = 10–30%, medium = 30–50%, high = 50–85%

Furthermore, we calculated the distance-decay relationship at ASV level and found an effective decay for AS samples across Denmark (Fig. 6), suggesting that geographical differences and spatial distances also applied on the local scale investigated here. Clear distance decays were also observed at genus and species level (Supplementary Fig. S9), with a similar effect at species level and a weaker effect at genus level. The distance decay was equally effective for the core ASV community as for the whole community when evaluated on weighted (Bray–Curtis) dissimilarities, as opposed to the unweighted (Sørensen) counterpart, which had a much lower Mantel r correlation value (Supplementary Fig. S10).
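The distance-decay calculation combines the Haversine formula with a Mantel test based on Spearman correlation. The following Python sketch shows the principle on toy data; it is not the R workflow used in the study. The coordinates and the similarity function are hypothetical, the Earth radius of 6371 km is an assumed mean value, and the one-sided p-value for a negative correlation (similarity decreasing with distance) is an assumption about how the test is framed.

```python
import math
import random
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, sim, permutations=999, seed=0):
    """Mantel test between two square symmetric matrices; returns (r, p)."""
    n = len(geo)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = spearman(geo_vec, [sim[i][j] for i, j in pairs])
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)  # relabel the sites of one matrix
        permuted = [sim[perm[i]][perm[j]] for i, j in pairs]
        if spearman(geo_vec, permuted) <= observed:  # assumption: lower tail
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical sites on one meridian; similarity decays with distance.
lats = [54.8, 55.3, 55.9, 56.2, 57.4]
geo = [[haversine_km(a, 10.0, b, 10.0) for b in lats] for a in lats]
sim = [[1 / (1 + d / 100) for d in row] for row in geo]
r, p = mantel(geo, sim)
```

Permuting site labels rather than individual distances keeps the dependence between pairwise values that share a site, which is why a Mantel test is preferred over an ordinary correlation test on the flattened matrices.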

Fig. 6

Distance-decay relationship (DDR) of AS samples in all plants (n = 84) based on ASV distances. A DDR calculated based on Bray–Curtis (weighted) dissimilarity and B based on Sørensen (unweighted) dissimilarity. Each panel shows: the statistics and significance of the Mantel test calculated using Spearman correlation and 999 permutations; the equation for the linear regression and the R² of the regression model. The color indicates the number of permutated distances between two samples in each bin (150 bins in both vertical and horizontal directions)

The geographical distribution of the top 100 most abundant species was further investigated. The majority had a uniform regional distribution, such as Ca. Microthrix parvicella, but some exhibited a particular abundance in some regions (e.g., Ca. Amarolinea dominans, on the islands) (Supplementary Fig. S11) and some were exclusively detected in very few plants, such as Ca. Epiflobacter midas_s_452 (Fig. S12). These differences in community structure could be explained by differences in the communities of the IWW. Overall, the similarity in terms of identity of ASVs in IWW and AS pairs from the same plant was significantly higher than in IWW and AS from different plants (Supplementary Fig. S13), and the IWW could also explain the presence of specific species. For example, Ca. Epiflobacter midas_s_452 was only present in a few AS plants and was consistently detected in the corresponding IWW but not in others (Fig. S12).

Both plant size (PE) and industrial load may also affect the community structures, either by affecting the immigration and/or affecting the growth conditions in the AS plants. This was supported by the observed increased gradient along PCo2 of the community structure from different plants with the increase of plant size and industrial load, respectively (Fig. 5C,D). The presence of either biological or chemical P-removal did not consistently affect the AS microbial community with an explained variance of only 3% (R² = 0.03) (Fig. 5B), and the plants with biological P removal were not enriched with the three known polyphosphate-accumulating organism genera, Ca. Accumulibacter, Azonexus (formerly Dechloromonas), and Ca. Phosphoribacter (formerly Tetrasphaera) (results not shown).

Discussion

The microbiota of wastewater treatment plants largely determines the performance of the plants, so a detailed understanding of microbial structure and function is essential for informed control and management. Many former studies of the microbiota have been hampered by limited taxonomic resolution or investigation of only a single or few plants, making it difficult to draw general conclusions. Recent methodological progress has changed this by allowing higher taxonomic resolution and providing large-scale global surveys [1, 3]. We can now revisit important ecological concepts and questions such as number and identity of core and CRAT taxa at high taxonomic resolution and combine with growth calculation to reveal true ecosystem-critical organisms. Additionally, the high sampling density allows for an unprecedented resolution when evaluating biogeographic patterns at this scale.

AS microbiota structure and core taxa

The total number of bacterial species in global WWTPs may be very high, perhaps billions [8] but considering only those in relative abundance above 0.1%, assumed to be important for the function of the plants, the number is very low, around 1,500 species belonging to 950 genera [1]. Our study of the 84 Danish WWTPs underlines that a limited number of taxa are important in a specific WWTP and that the same taxa are present across similar WWTP configurations in a regional area such as Denmark. Despite this, the specific plants all show a distinct microbial fingerprint reflecting variations in taxa and/or their abundances, also observed in other local studies [2]. Furthermore, for a given WWTP we have seen the communities to be very stable over several years despite clear seasonal variations observed for up to 70% of all species [10].

The core community is considered essential to identify putatively important organisms in a given, often complex but overall similar ecosystem [9, 34]. However, the definition of the core microbiota varies between studies and may impact the findings [34]. Despite the different core taxa criteria, a relatively low number of core genera have been consistently identified across WWTPs [1, 2, 9, 35, 36]. Previous studies confirm our results, emphasizing that the conserved part of WWTPs microbiota comprises a low number of taxa at both local/regional and global scale. However, these taxa constitute a substantial portion of the total biomass fraction as determined by accumulated read abundance at genus level, consistently comprising approximately 75% in Danish WWTPs (this study) and ranging from 57 to 68% in global WWTPs [1, 8].

The number of core taxa generally decreases when more locations are included in the analysis, particularly when geographically spread [34]. This was also very clear at species level when comparing the Danish and global WWTPs. The number of strict and general core species was much greater when investigating at national scale compared to the global (51 and 7 species, respectively). It was, however, less clear when comparing the cores at genus level. The 226 Danish AS core genera were similar to the global core (including 212 genera) with around 140 shared genera, showing that a large fraction of the local/regional cores was captured by the global, though with few genera being more pronounced or specific to different regions of the world.

These results are important, indicating that while many genera are shared and abundant across global WWTPs, the species diversity within these genera is high and exhibits variations across the planet. This calls for more region-level studies around the world to find regional core/CRAT at species level. This is supported by our recent species-level global WWTP surveys of Chloroflexota [37], polyphosphate-accumulating bacteria [38, 39] and the family Saprospiraceae [40].

Abundant core species are assumed to be functionally important and process-critical in the WWTPs [9]. A substantial fraction (~ 30%) of the core ASVs were, however, non-growing in the Danish plants. These were most likely only present due to continuous mass-immigration from incoming wastewater, and thus of no functional importance to the treatment process. Examples included ASVs classified as Acidovorax and Trichococcus midas_s_4, which could reach levels of around 30% in IWW of some WWTPs. Members of Acidovorax and Trichococcus are known to grow in the sewer systems [41, 42], and are therefore transported to the AS in high abundances, where they die off. Interestingly, the global core species also contained Trichococcus midas_s_4 and several other taxa that in our Danish study were recorded dying off, e.g., Leptotrichia midas_s_2907 and Subdoligranulum midas_s_348 [1]. Most likely they are also dying off across the world and belong to the gut or sewer microbiota rather than the AS microbiota. The high fraction of wastewater taxa dying off in the AS tanks corresponded to ~ 10% of the accumulated read abundance in Danish plants. Similar fractions of non-active members are likely found in any global WWTP treating municipal wastewater, a factor that should be considered when determining the core communities or in any other microbial ecology study where immigration has an important contribution.

A relatively strict filter was applied for growth fate assignments, with an ASV being categorized only if observed in 10 sample pairs (IWW and AS) with a relative abundance above 0.01% in AS or 0.005% in IWW. This means that many ASVs will falsely be categorized as ‘inconclusive’, but it also gives confidence that the remaining assignments are correct. This can pose a problem especially for CRAT ASVs: if they are only detected in a single plant, they need to be detected in all 10 sample pairs of that plant to be categorized. However, the fraction of ASVs categorized as ‘inconclusive’ constitutes only a small fraction of the cumulative read abundance and does not seem to be problematic for the overall analyses and conclusions of this study.

Insight into core species from genome recovery

In total, 167 species were detected as core species (6 strict, 45 general, and 116 loose core species). By linking high-quality (HQ) metagenome-assembled genomes (MAGs) with 16S rRNA genes to the core species, we can predict their potential function and, where possible, combine this with experimental validation [43]. From our earlier study of AS from 23 Danish WWTPs by Singleton et al. [43], we have HQ MAGs for 90 of the 167 core species, showing that we already have extensive knowledge of the main functions of many core species. This covers all important functional groups such as nitrifiers (Nitrosomonas midas_s_139, Nitrospira defluvii, Nitrotoga midas_s_181), denitrifiers (Rhodoferax midas_s_33, Thauera midas_s_256), polyphosphate-accumulating organisms (Ca. Phosphoribacter midas_s_5, Azonexus phosphorivorans, Azonexus phosphoritropha), and filamentous bacteria (Ca. Amarolinea dominans, Ca. Microthrix parvicella, Ca. Microthrix subdominans). CRAT species were less well covered by MAGs (25 out of 91), as these were mainly detected in plants not investigated by Singleton et al. [43]. These plants include Skagen, Kalundborg, and Ringe, which all receive industry-heavy influent, probably with distinct CRAT species specific to certain industries. As mentioned above, Kalundborg operates at high temperatures (16–32 °C), which affects growth conditions in the process tanks and potentially increases the presence of CRAT species.

Strict core members include known species such as Ca. Phosphoribacter midas_s_5, important for phosphorus removal at the WWTPs [44], and Rhodoferax midas_s_33, found to be denitrifying [45]. Yet they also include two species (midas_s_59 and midas_s_179) that have only MiDAS genus placeholder names and lack in-situ physiology descriptions, but with most of their ASVs categorized as growing. This clearly suggests that we still lack physiological and functional knowledge on some of the most abundant and important core species common to Danish WWTPs, on which we should focus our future research efforts. Additionally, with the rapid expansion of MAG databases, we anticipate having MAGs for all core species within the coming years, meaning that we can complete the overview of functional processes.

AS microbiota assembly is determined by immigration and biogeography

The assembly of the activated sludge microbiota is controlled by deterministic processes (such as substrate, predation, etc.) and stochastic processes (such as diversification, immigration, etc.) [4,5,6,7]. In our survey, all plants had an overall similar process design (removal of C, N, and P), with only the method for P-removal varying (either primarily chemical or biological), and similar environmental conditions (similar climate throughout Denmark). Thus, factors determining whether the bacteria were growing or not in the AS were very similar for all plants. This is also confirmed by the observation that the growth patterns of all ASVs were similar across all plants when present.

Nearly all members of the AS microbiota in the specific WWTPs were also present in the incoming wastewater, stressing that process-critical species were continuously, or most of the time, added by immigration. This is in accordance with previous observations in Danish sewer-AS systems that mass immigration is essential for the structure of the AS microbiota [4, 46]. Thus, the source communities, i.e., the sewer systems, form an environmental filtering step only allowing local species to enter the WWTPs, where those that can grow in AS will do so. We observed a clear regional variation in AS communities, which was to some extent also reflected in the communities of the incoming wastewater, suggesting an interplay of immigration and biogeography.

It is known that microorganisms display biogeographic patterns with a non-random distribution of taxa [15, 16, 47], and both global and local surveys have observed that biogeographical patterns apply to the AS microbiota [1, 8, 48]. We hypothesized that they would also apply on a very local scale of a few hundred kilometers, the size of Denmark. An overall distance decay across Denmark was demonstrated at all investigated taxonomic levels (genus, species, and ASV), although it was most pronounced at ASV and species level. Similarly, a distance decay for core ASVs was found when evaluated on weighted (Bray–Curtis) dissimilarities but not on unweighted (Sørensen) dissimilarities. This reflects the defining characteristic of core taxa: they are present in all plants, but in varying abundances that are probably influenced by biogeography. The distance decay may be a result of diversification due to interplay between fundamental evolutionary and ecological processes like selection, drift, dispersal, and mutation [49], factors that may affect the source sewer communities and consequently the AS microbiota via immigration. Additionally, other general environmental gradients following geographical distances may affect distance decay as well [48] by impacting the growth conditions in the source communities, regulating the abundance at which specific bacteria enter the AS.
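The contrast between weighted and unweighted dissimilarities for core taxa can be illustrated with a small, purely hypothetical example. The sketch below is not the analysis pipeline used in this study: the three plants, their positions along a line, and the abundances of two core ASVs are invented for illustration, and the distance decay is summarized with a simple least-squares slope as an assumed stand-in for whatever statistic is preferred.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Weighted (abundance-based) Bray-Curtis dissimilarity."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den


def sorensen(a, b):
    """Unweighted Sorensen dissimilarity: Bray-Curtis on presence/absence."""
    pa = [1 if x > 0 else 0 for x in a]
    pb = [1 if y > 0 else 0 for y in b]
    return bray_curtis(pa, pb)


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: plant positions (km) and abundances of two core ASVs
# that are present in every plant but shift in abundance with distance.
positions_km = {"A": 0.0, "B": 100.0, "C": 300.0}
abundance = {"A": [50.0, 50.0], "B": [60.0, 40.0], "C": [80.0, 20.0]}

pairs = list(combinations(sorted(positions_km), 2))
distances = [abs(positions_km[p] - positions_km[q]) for p, q in pairs]
bc = [bray_curtis(abundance[p], abundance[q]) for p, q in pairs]
so = [sorensen(abundance[p], abundance[q]) for p, q in pairs]

bc_slope = ols_slope(distances, bc)  # positive: dissimilarity grows with distance
so_slope = ols_slope(distances, so)  # zero: presence/absence is identical everywhere
print(bc_slope, so_slope)
```

Because every core ASV is present in every plant in this toy data, the Sørensen dissimilarity is zero for all plant pairs and cannot show any decay, whereas the Bray-Curtis dissimilarity increases with distance as the abundances shift.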

Industrial load and size of WWTP also seemed to impact the AS microbiota. It is often assumed that industrial load affects the microbial composition in the AS [35, 50, 51]. However, it is important to state that the industrial load may impact the composition in AS in different ways: by introducing toxic chemicals etc. that may affect certain taxa, by adding special substrates and nutrients that may promote the growth of some taxa, or by introducing new species via the influent (immigration), either through discharge of certain taxa from the industry and/or by altering growth conditions in the source communities (the sewer). Plant size was generally correlated with densely populated cities with large, complex sewer systems as source communities, likely producing more similar immigrating bacteria compared to small WWTPs located in the countryside. The communities in plants with either biological or chemical P-removal were very similar, as also previously observed in fewer Danish plants [2].

Regional studies of WWTP microbiota and their source communities are needed

Our study shows that the statement often used in the field of wastewater treatment, “Everything is everywhere, but the environment selects” by Baas-Becking [47, 52], needs rethinking, especially at species and ASV level. The type of design and the operation of a WWTP will indeed allow certain species to grow, but if the species are not consistently provided through the source community, the sewer system, they will not be established in the AS plant [4, 5, 46]. This suggests that everything is not everywhere, with the source community being responsible for which organisms have the chance to proliferate in AS. The source communities will vary at both local and global scale as demonstrated in our studies, in accordance with several other biogeographical studies of the microbiota in e.g., soil, sea, and coast [14,15,16]. Additionally, some of these studies show a global diversification within species at strain level [16]. It was not possible to evaluate this in our study by amplicon sequencing, but it will be important to study at genome level, as strains within the same species may have different physiology.

Whether the spatial scale should be at the level of continents or smaller remains to be studied, but our results indicate the relevance of smaller regions. Importantly, new studies should include the source communities, the sewer systems, and the related catchment areas. As the process-critical bacteria come from somewhere, we need to reveal these sources and potentially use this knowledge for a holistic, informed management of the sewer system and the WWTPs, e.g., by introducing targeted treatment steps before the wastewater reaches the WWTP [53].

Conclusions

Our study provides a complete overview of microbial communities in Danish activated sludge systems and influent wastewater at species level. Variation in process design of the plants had little effect on the microbiota, but geography and immigration played an important role, underlining that the general belief regarding wastewater treatment systems that “Everything is everywhere, but the environment selects” by Baas-Becking is not true. Consequently, we need more region-level studies to find regional process-critical taxa (core and CRAT), especially at species and ASV level, since the genera often are the same. It is also important to assign a growth pattern to each ASV and combine it with core and CRAT knowledge to pinpoint the truly important taxa for the AS process. Altogether, the findings may aid in implementing enhanced process control and strategies like early warning systems or targeted resource recovery for specific WWTPs or regions based on monitoring of process-critical taxa (active core and CRAT members). Importantly, new studies should include the source communities, the sewer systems, and the related catchment areas for a holistic, informed management of the entire wastewater transportation and treatment system.