Introduction

Dissolved organic matter (DOM) plays a fundamental role in nutrient cycling dynamics in riverine systems, accounting for the majority of the reactive carbon (C) present in aquatic ecosystems (Evans et al. 2017) and providing a bioavailable source of nitrogen (N) and phosphorus (P) for both stream autotrophs and heterotrophs (Brailsford et al. 2019, 2021; Mackay et al. 2020). As a result, DOM plays an important role in stream elemental nutrient cycling and the ecological functioning of aquatic systems (Cole et al. 2007; Prairie 2008; Battin et al. 2009; von Schiller et al. 2017). DOM also plays a vital role in the transport and bioavailability of trace metals (Christensen et al. 1996; Temminghoff et al. 1997) and other pollutants (Kalbitz et al. 1997; Aiken et al. 2011) within river systems. It is also a substrate for the formation of potentially carcinogenic disinfection by-products in waters abstracted and treated for drinking (Lavonen et al. 2013; Ike et al. 2019). Recent research has confirmed that the concentration of DOM in a waterbody is not the only factor regulating its functional significance; the need to define the chemical composition of DOM is increasingly being recognised as an important research priority for scientists, water companies and regulators (Riedel et al. 2016; Kaiser et al. 2017; Worms et al. 2019) as different compound classes play differing roles and are metabolised in different ways by the stream biota. Therefore, in order to understand the distribution and chemical processing of organic matter in the environment, elucidating the origin of the uncharacterised portion at the molecular level is crucial (Hedges et al. 2000).

Some studies of riverine DOM character (particularly early studies) largely rest on bulk nutrient quantification (DOC, DON, DOP) (e.g. Aud et al. 2003; Worrall et al. 2004; Halliday et al. 2014; Volta et al. 2020). More recent technological advancements have supported a shift towards more specific analytical methods which allow the characterisation of the DOM pool either at the compound class level or, most recently, the molecular level. The optical properties of DOM have been used widely to define composition at a compound class level, specifically UV–Vis spectrophotometry (Helms et al. 2008; Werner et al. 2019; Lee et al. 2021; Pang et al. 2021; Yates et al. 2016, 2019) and fluorescence spectrophotometry (Baker 2002; Hudson et al. 2007; Spencer et al. 2008, 2019; Wang et al. 2019; Yates et al. 2019; Ma and Li 2020; Pisani et al. 2020). The development of in-situ optical sensors capable of monitoring at specific wavelengths has even allowed highly temporally resolved datasets to be collected, such that the dynamic changes in DOM composition under diurnal or storm event conditions can start to be determined (Werner et al. 2019; Croghan et al. 2021). In addition to techniques which consider the optical properties of the DOM pool, there has also been an increase in the number of studies characterising riverine DOM using size-exclusion chromatography coupled with organic carbon, organic nitrogen or UV detectors (Dulaquais et al. 2018; Wu et al. 2019; Hawkes et al. 2019; Trubetskaya et al. 2020). These techniques allow the dynamic flux of different compound classes to be assessed, but they lack the analytical power to determine the individual organic compounds in a complex DOM mixture. Achieving molecular level resolution is vital as the concentrations of specific compounds will vary in relation to source character, temporally and spatially within and along the length of a river, lake, estuary or wetland.

Many studies have used targeted analysis of organic compounds in rivers, for example pesticides (Tan & Mustafa 2004; Stone et al. 2014; Lundqvist et al. 2019), pharmaceutical and personal care products (PPCPs) (Ashton et al. 2004; Joshua et al. 2020; Ngo et al. 2021) and other industrial pollutants, such as flame retardants (Andresen et al. 2004; Cristale et al. 2013). Such analyses tend to be carried out exclusively on compounds which have been anthropogenically introduced and are of known concern. However, these compounds are generally trace components, leaving the majority of the DOM present in aquatic systems (potentially tens of thousands of compounds representing most of the overall DOM load), including a wide range of both natural and anthropogenic compounds, uncharacterised. Hence, there is a need for updated molecular-scale methodologies which provide a wider analytical window and allow untargeted characterisation of DOM pool composition across all the major molecular classes, and not merely target compounds, such as priority pollutants.

Technological advances in chromatography and mass spectrometry have begun to allow more in-depth probing of DOM compounds from complex environmental mixtures, and in some cases have provided unequivocal identification and quantification of compounds. Some studies have used chemical degradation methods followed by GC/MS analysis to examine specific compound classes in more detail, for example for dissolved black carbon (Dittmar 2008; Stubbins et al. 2012, 2015) and dissolved lignin (Kaiser & Benner 2012). In addition, there has been an increase in the number of studies using high-resolution mass spectrometry techniques, such as Orbitrap MS (Pemberton et al. 2020; Prasert et al. 2021) or Fourier-Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) (Kamjunke et al. 2019; Simon et al. 2019; Spencer et al. 2019; Chen et al. 2021) as an analytical tool to investigate riverine DOM composition. A recent review by Kim et al. (2021), for example, discusses the most recent advances in ultrahigh-resolution MS for terrestrial organic matter. These novel molecular-scale approaches are now enabling significant advances in the elucidation of riverine DOM composition. However, it is important to appreciate that no analytical method is capable of achieving comprehensive molecular characterisation of DOM, with all analytical methods only considering a defined analytical window. Due to the low concentrations of individual organic compounds and other confounding factors such as the presence of inorganic salts, analytical protocols involving extraction and pre-concentration steps are always necessary prior to molecular level analysis. DOM pre-concentration is often undertaken using solid-phase extraction (SPE) and in that case the analytical window is limited by the chosen solid-phase or eluting solvent (Li & Minor 2015; Li et al. 2017). In addition, the type of MS (FT-ICR, TOF, quadrupole), ionisation mode and efficiency (Patriarca et al. 2020), i.e. electron ionisation (EI) versus electrospray ionisation (ESI), ionisation polarity, i.e. positive versus negative (Hawkes et al. 2020), and mass range will affect the range of compounds detected. Therefore, a range of extraction and mass spectrometric methods is essential for obtaining a more complete characterisation of the composition of DOM at the molecular level.

The most common protocol for DOM extraction in environmental research utilises SPE cartridges packed with a styrene–divinylbenzene polymer-based sorbent and using methanol as the eluting solvent, followed by ESI and MS analysis in the negative ion mode (e.g. Stubbins et al. 2010; Spencer et al. 2019; Prasert et al. 2021). This provides detailed information regarding the polar components of DOM which are of intermediate molecular weight (m/z 150–2000). Meanwhile, pyrolysis–gas chromatography/MS (Py-GC/MS) has been employed specifically to investigate the presence of lignin phenols in river water, providing information about plant-derived DOM components (e.g. Williams et al. 2016; Kaal et al. 2020). In another example, Lynch et al. (2019) used EI GC/MS analysis after an SPE preconcentration step to characterise DOM components which could be chromatographically separated using a low polarity phase (TG-5MS) column. Each approach tackles a particular size fraction and class of compounds. If the potential impact of DOM in the environment is to be understood in relation to the molecular composition of key compound classes, it is necessary not only to optimise the compositional information gained from an individual technique, but also to use it in combination with other molecular-scale analyses to maximise the analytical window.

Herein, we report the assembly of a suite of complementary chromatographic and mass spectrometric methodologies which can be used to intensively examine riverine DOM chemistry over a wide molecular weight range, encompassing low molecular weight chemical species to macromolecular components. The utility of these methods is then tested in a selection of streams draining catchments of contrasting environmental character, displaying marked differences in both their DOC, DON and DOP and inorganic nutrient loading rates, and the optical properties of their DOM pool (Lloyd et al. 2019; Yates et al. 2019) (Fig. 1).

Fig. 1 Hierarchy of analytical methods for non-targeted analysis of riverine DOM

Methodology

Site descriptions

To demonstrate the applicability of this complementary approach across a selection of DOM chemistries, three well-researched contrasting UK exemplar field sites were chosen. Figure 2 shows the location of the field sites, which include a ‘natural’ peatland site in the headwaters of the River Conwy in North Wales (Nant y Brwyn hereafter), and two anthropogenically impacted catchments: a clay site impacted by agriculture in the headwaters of the River Sem, a tributary of the Hampshire Avon (Priors Farm hereafter) and a clay site impacted by a sewage treatment works on the River Chew at Chew Stoke, Somerset (Chew Stoke hereafter). The headwaters of the River Conwy drain blanket peatland supporting extensive sheep grazing, and the river flows north before eventually reaching the Irish Sea. The River Sem predominantly drains clay soils supporting intensive cattle production, as it flows west to join the River Nadder before flowing south into the Hampshire Avon which drains through Poole Harbour to the English Channel. The River Chew has its source in Chewton Mendip, draining an intensively farmed Carboniferous Limestone catchment, from where it drains over heavy clay soils supporting cattle production running northwest to Chew Stoke, then northeast to join the Bristol Avon, ultimately discharging to the Severn Estuary. Table 1 summarises the catchment characteristics and summary nutrient data from our prior research at each of these field sites. Of note, the human population density and the discharge of treated effluent and untreated human waste from septic tanks vary markedly between the three catchments.

Fig. 2 Map showing the location of the three contrasting UK field sites

Table 1 Catchment characteristics and background water chemistry for the three field sites

Complementary analytical chemical methods

There were a number of criteria which needed to be fulfilled when designing a robust methodology for untargeted characterisation of riverine DOM. The methods needed to: (a) allow compound identification; (b) encompass a range of compound polarities, molecular weights and functional groups; (c) allow the analysis of natural and anthropogenic DOM, and (d) encompass a wide range of concentrations from trace biomarker components to macronutrients.

With these criteria in mind, complementary chromatographic and MS-based protocols were developed and tested as shown in Fig. 1. These were optimised for the analysis of riverine DOM, and crucially the number of laboratory processing steps was kept to a minimum to avoid the potential loss of volatile or low abundance compounds. Details of the methodologies and their optimisation are outlined below. In all cases samples of river water were collected using a pre-combusted (450 °C for 4 h) amber glass bottle, attached to a solvent-washed stainless-steel pole if access to the river was difficult. The bottles were filled to the rim to avoid any headspace and topped with a piece of pre-combusted (450 °C for 4 h) aluminium foil before the lid was applied to avoid any sources of plastic, and therefore contamination of the sample. The samples were transported back to the laboratory within 24 h of collection at 4 °C in the dark and filtered using glass filter kits (Merck Millipore, Feltham, UK) and glass fibre (GF) filter papers with a nominal pore size of 0.5 µm (Advantec, Cole‐Palmer, Hanwell, UK). All glassware and filter papers were pre-combusted before use as above. The use of pre-combusted GF filters was found to be important due to contamination of molecular-scale analyses, most notably via py-GC/MS, by cellulose nitrate filters at the ultra-low concentrations for individual DOM compounds addressed in this study (Lloyd et al. In press). Cellulose nitrate filters are the most commonly used filter for the filtration of river water samples for DON and DOP analysis. A pore size of 0.5 µm was chosen to match as closely as possible the standard 0.45 µm pore size used to define the ‘dissolved fraction’ in freshwater research. Water samples were stored at 4 °C, then filtered within 24 h of collection and were either processed immediately or stored at − 20 °C before being freeze-dried depending on the method requirements. 
For a full discussion of all aspects of sampling, pre-processing and storage of samples for molecular-scale analysis of DOM see Lloyd et al. (In press). HPLC grade water was used for procedural blanks which were processed via all the methods described below to check for any laboratory-derived contamination. This is critical when analysing compounds at trace concentrations.

Non-polar compounds

River water is rarely analysed for non-polar compounds as they are hydrophobic and often associated with particulates, and would therefore be expected to be absent from, or present at negligible concentrations in, the filtered fraction. However, the notion of the purely ‘dissolved’ fraction is flawed as in reality particulate and dissolved organic matter should be viewed as a continuum, spanning from full solution phase through colloidal matter to particulate phase. As a result, the < 0.45 µm fraction will contain colloidal material, including non-polar compounds which may be in colloidal or fine particulate phase. These compounds are diagnostically useful as they can include a variety of both natural and anthropogenically produced ‘biomarker’ compounds (van Bergen et al. 1997; Bull et al. 1998) which can help in defining the character of the riverine DOM and distinguishing between different contributing sources.

The analysis of the non-polar fraction in this programme was carried out using liquid–liquid extraction of ‘raw’ filtered river water. Filtered river water (150 mL) was placed into a glass separatory funnel along with 5α-androstane as an internal standard. The mixture was extracted with n-hexane (3 × 20 mL), chosen due to its low polarity and immiscibility in water. This meant that it was possible to obtain a water-free fraction without the need for a drying step (using a sodium sulphate column for example) which could introduce potential contaminants and risk compounds of interest being lost by adsorption onto the drying column. A number of other organic solvents were trialled including dichloromethane, diethyl ether and cyclohexane. Dichloromethane and n-hexane extracts produced very similar chromatograms, and the widest range of compounds compared with the other solvents and therefore n-hexane was chosen as it had better immiscibility. The three aliquots of n-hexane were combined and rotary evaporated until 1–2 mL of solvent remained, and samples were stored at − 20 °C until analysis. Prior to analysis via gas chromatography/mass spectrometry (GC/MS), the samples were blown down to dryness under a gentle stream of N2, then 70 µL of N-methyl-N-(trimethylsilyl)trifluoroacetamide with 1% trimethylchlorosilane (MSTFA) (Sigma–Aldrich, Gillingham, UK) was added and the mixture heated at 37 °C for 40 min to produce TMS-derivatives. MSTFA was chosen over N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) as during method testing it was found that lower molecular weight compounds such as urea were not effectively derivatised using BSTFA, yet an optimal signal was observed when the MSTFA reagent was used. No other differences were observed between the range of compounds derivatised between the two reagents. The derivatised samples were then blown to dryness under N2 and re-dissolved in ethyl acetate for analysis. 
The samples were then analysed using a Thermo Scientific TRACE 1300 gas chromatograph coupled with a Thermo Scientific ISQ LT Single Quadrupole MS. The GC was fitted with a HP-1 column (50 m × 0.32 mm, film thickness 0.17 µm) and used helium as the carrier gas. The GC temperature programme was: 50 °C (1 min) to 300 °C (held for 10 min) at 10 °C min−1. The MS was operated using electron ionisation at 70 eV with the quadrupole mass analyser scanning a range of m/z 50–650 (scan time 0.2 s). Compounds were assigned based on library mass spectra, known elution orders and comparisons of retention times with reference compounds.

Polar building block compounds

Amino acids and carbohydrates are polar building block compounds which are fundamental in all ecosystems, as they are the base constituents of biomacromolecules. The ability to detect their presence in the stream DOM pool would provide an insight into nutrient processing within streams of differing environmental character. Here we refine and test methods previously developed for the determination of amino acids and carbohydrates in soils, adapting them for freshwater samples. The methods extract the total hydrolysable pools (from proteinaceous material) as it is generally assumed that the free pools will be extremely small in river water (Robarts et al. 1990) or undetectable (Lytle & Perdue 1981) as the compounds are turned over so rapidly by the stream biota.

Amino acids

Total hydrolysable amino acids were extracted using the method previously reported by Charteris et al. (2016). Approximately 10 mg of freeze-dried residue (from filtered river water) along with 10 µL of norleucine (1000 µg mL−1) as an internal standard was subjected to hydrolysis using 5 mL of HCl (6 M) for 24 h at 100 °C under N2. Centrifugation was then used to separate the hydrolysates, which were then dried at 60 °C under a stream of N2 and stored at − 20 °C in 0.1 M HCl. A Dowex® 50WX8 cation exchange resin column was used to extract the amino acids which were subsequently eluted using 2 M NH4OH. The amino acids were then derivatised to produce their N-acetyl, O-isopropyl derivatives. Acetyl chloride mixed with isopropanol (4:1 v/v) was added to the hydrolysed amino acids (0.25 mL) and heated at 100 °C for 1 h. After quenching the reaction at − 25 °C, the excess reagent was removed under a gentle stream of N2. To ensure all the solvent was removed, the sample was redissolved in 0.25 mL of dichloromethane and the solvent evaporated again under N2. Following this, 1 mL of acetone, triethylamine and acetic anhydride (5:2:1 v/v/v) was added and the sample heated at 60 °C for 10 min; once cooled, the reagents were evaporated under N2. Then the sample was redissolved in 2 mL ethyl acetate and 1 mL of saturated sodium chloride solution was added. After mixing, a phase separation was performed by extracting the organic top layer; this step was repeated three times, the organic phases were combined and the solvent was again evaporated under N2. These derivatives were then extracted twice with 3 mL of dichloromethane to remove any remaining aqueous reagents, dried under a gentle stream of N2 at room temperature, redissolved in ethyl acetate and stored at − 20 °C until analysis.

The individual amino acid compounds were then separated using GC and detected using a flame ionization detector (GC-FID) (Agilent Technologies 7890B, CA, USA). The GC was fitted with a DB-35 column (35%-phenyl-methylpolysiloxane, mid-polarity, 30 m × 0.32 mm inner diameter, film 0.5 µm). Helium was used as a carrier gas (2 mL min−1), operating under constant flow conditions. The GC oven was programmed as follows: 70 °C (held for 2 min) to 150 °C at 15 °C min−1, then increasing to 210 °C at 2 °C min−1, then to 270 °C (held for 10 min) at 8 °C min−1. The compounds were identified by their retention times compared with a mixed amino acid standard. The samples were also screened using a GC-Quadrupole-Time-of-Flight mass spectrometer (GC-Q-TOF MS) (Agilent Technologies 7200 Accurate Mass, CA, USA) to confirm the identity of the compounds. The instrument was fitted with the same column as described above for the GC-FID and the GC was operated at constant flow, using helium as the carrier gas at 1.5 mL min−1. The GC oven was set to the same temperature programme described above. The MS was operated in electron ionization (EI) mode at 70 eV, the source temperature was 230 °C and the emission current was 3.1 µA. Data acquisition was between 50 and 1050 amu at 5 spectra s−1.
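As a quick check on the oven programmes quoted in this section, the total run time implied by a multi-ramp temperature programme can be worked out from the stated temperatures, ramp rates and hold times. The following is a minimal sketch only; the function name and the tuple layout are illustrative assumptions, not part of the published method, and instrument equilibration or cool-down time is ignored.

```python
from typing import List, Tuple

# Each step is (target temperature in °C, ramp rate in °C per min, hold in min).
# This layout is a hypothetical convention chosen for the example.
Step = Tuple[float, float, float]


def oven_run_time(start_temp: float, initial_hold: float, steps: List[Step]) -> float:
    """Return the total run time in minutes for a multi-ramp oven programme."""
    total = initial_hold
    current = start_temp
    for target, rate, hold in steps:
        if rate <= 0:
            raise ValueError("ramp rate must be positive")
        # Time spent ramping to the target, plus any hold at that target.
        total += abs(target - current) / rate + hold
        current = target
    return total


# Amino acid GC-FID programme as stated in the text:
# 70 °C (2 min) -> 150 °C at 15 °C/min -> 210 °C at 2 °C/min -> 270 °C at 8 °C/min (10 min).
amino_acid_programme = [
    (150.0, 15.0, 0.0),
    (210.0, 2.0, 0.0),
    (270.0, 8.0, 10.0),
]
amino_acid_run_time = oven_run_time(70.0, 2.0, amino_acid_programme)

# Non-polar fraction programme as stated in the text:
# 50 °C (1 min) -> 300 °C at 10 °C/min (10 min).
non_polar_run_time = oven_run_time(50.0, 1.0, [(300.0, 10.0, 10.0)])

print(f"Amino acid programme: {amino_acid_run_time:.1f} min")
print(f"Non-polar programme: {non_polar_run_time:.1f} min")
```

Under these assumptions the non-polar programme comes to 36 min, which is consistent with the steroidal compounds being described as eluting in the last 10 min of that analysis.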

Neutral carbohydrates

Neutral carbohydrates were extracted from approximately 10 mg of freeze-dried residue from the filtered river water. The samples, along with an internal standard (20 µL of myo-inositol, 450 µg mL−1), were subjected to strong acid hydrolysis using H2SO4 (12 M) at room temperature for 1 h under N2, followed by mild acid hydrolysis (H2SO4 1.2 M, 100 °C, 3 h), also under N2. The two-step hydrolysis is designed to break down any α-cellulose which may have been introduced from vascular plant material into glucose. The hydrolysates were separated by centrifugation, then the solution was neutralised to pH ~ 6 with CaCO3. NaBH4 (0.5 mg mL−1) was added to the supernatant, and the mixture heated to 40 °C for 90 min to reduce the neutral sugars. Any remaining NaBH4 was then removed by adding glacial acetic acid (200 µL) and methanol (4 × 4 mL), and the mixture was then dried under a gentle stream of N2. Acetic anhydride (2 mL) was added, and the mixture heated to 100 °C for 16–18 h to acetylate the alditols. Any excess reagent was quenched by the addition of 5 mL of double distilled water, then the acetylated alditols were extracted using dichloromethane (4 × 2 mL), dried under N2, redissolved in ethyl acetate, and stored at − 20 °C until analysis.

The acetylated alditols were separated and detected using GC-FID (Hewlett-Packard 5890 Series II, TX, USA). A VF23-ms column (cyanopropyl, high polarity, 60 m × 0.32 mm inner diameter, film 0.15 µm) was used for the separation. Helium was used as the carrier gas (2 mL min−1), operating under constant pressure. The GC oven was programmed as follows: 50 °C (held for 1 min) to 200 °C at 20 °C min−1, then to 230 °C (held for 22 min) at 4 °C min−1. Compounds were identified by their retention times compared with individual standard solutions. The samples were also screened using the GC-Q-TOF MS to confirm their identification. The GC was set up the same as for the GC-FID, using the same column and oven conditions. The MS was set up as described for the AA analysis above.
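The internal standards described above (norleucine for amino acids, myo-inositol for neutral carbohydrates) allow peak areas to be converted into analyte masses. The sketch below illustrates the arithmetic only. The peak areas are hypothetical, and the default relative response factor of 1.0 is an assumption made for the example rather than a value reported in the text; in practice it would be determined from standards.

```python
def standard_mass_ug(volume_ul: float, conc_ug_per_ml: float) -> float:
    """Mass of internal standard added (µg) from its volume (µL) and concentration (µg/mL)."""
    return volume_ul / 1000.0 * conc_ug_per_ml


def analyte_mass_ug(area_analyte: float, area_standard: float,
                    standard_mass: float, response_factor: float = 1.0) -> float:
    """Estimate analyte mass (µg) from FID peak areas relative to the internal standard.

    response_factor is the analyte response relative to the standard; the
    default of 1.0 is an illustrative assumption, not a measured value.
    """
    if area_standard <= 0 or response_factor <= 0:
        raise ValueError("standard area and response factor must be positive")
    return area_analyte / area_standard * standard_mass / response_factor


# Amounts of internal standard implied by the volumes and concentrations in the text.
myo_inositol_ug = standard_mass_ug(20.0, 450.0)   # 20 µL of 450 µg/mL
norleucine_ug = standard_mass_ug(10.0, 1000.0)    # 10 µL of 1000 µg/mL

# Hypothetical peak areas, for illustration only.
glucose_ug = analyte_mass_ug(area_analyte=1500.0, area_standard=3000.0,
                             standard_mass=myo_inositol_ug)

print(f"myo-inositol added: {myo_inositol_ug:.1f} µg")
print(f"norleucine added: {norleucine_ug:.1f} µg")
print(f"estimated analyte mass: {glucose_ug:.2f} µg")
```

Dividing the estimated mass by the volume of river water represented by the freeze-dried residue would then give a concentration; that volume is sample-specific and is not stated here.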

Polar intermediate molecular weight compounds

The analysis of the polar, intermediate molecular weight fraction follows that of Pemberton et al. (2020). The compounds in this fraction can be described as polar but have lower polarity compared with the amino acids and carbohydrates described above. One litre of filtered river water was acidified to pH 2 using HCl and then the DOM was pre-concentrated using Oasis Hydrophilic–Lipophilic Balance (HLB) solid phase extraction cartridges (SPE, 400 mg bed mass, 60 μm particle size, Waters Ltd, Elstree, UK). Extracts were then eluted from the cartridges using methanol and dried under N2. The samples were then redissolved in methanol:water (1:1, v/v) before being analysed. Direct-infusion High Resolution MS (DI-HRMS) spectra were recorded in positive ion mode using an Orbitrap™ Elite Hybrid Ion Trap-Orbitrap™ Mass Spectrometer (Thermo Scientific, Hemel Hempstead, UK) with a heated electrospray ionisation (HESI) source. Electrospray ionisation (ESI) is a soft ionisation method that yields almost no fragmentation, so each ion is a molecular species corresponding to either the protonated molecule [M + H]+, or a Na+ or K+ adduct. Thus, ESI mass spectra constitute distributions of molecular species, complicated by adduct formation. These adducts can be recognised by the nominal mass differences of 22 and 38 Da that separate the [M + Na]+ and [M + K]+ adducts, respectively, from the [M + H]+ ions. Data handling protocols are as described by Pemberton et al. (2020).
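The spacing between protonated, sodiated and potassiated forms of the same molecule can be illustrated with a short calculation. This is a minimal sketch, not the data handling protocol of Pemberton et al. (2020): the ion masses are standard monoisotopic values (cation mass with one electron removed), while the neutral mass, the function names and the matching tolerance are hypothetical choices for the example.

```python
from typing import Dict, Optional

# Monoisotopic masses of the charge-carrying species (Da), electron mass removed.
PROTON = 1.007276
SODIUM_ION = 22.989218
POTASSIUM_ION = 38.963158


def adduct_mz(neutral_mass: float) -> Dict[str, float]:
    """Return m/z values of the singly charged positive-ion species of one molecule."""
    return {
        "[M+H]+": neutral_mass + PROTON,
        "[M+Na]+": neutral_mass + SODIUM_ION,
        "[M+K]+": neutral_mass + POTASSIUM_ION,
    }


def classify_spacing(mz_low: float, mz_high: float, tol: float = 0.005) -> Optional[str]:
    """Label a pair of peaks if their spacing matches a known adduct difference.

    tol is an illustrative absolute tolerance in Da, not a recommended setting.
    """
    spacing = mz_high - mz_low
    known = {
        "[M+Na]+ vs [M+H]+": SODIUM_ION - PROTON,     # about 21.982 Da
        "[M+K]+ vs [M+H]+": POTASSIUM_ION - PROTON,   # about 37.956 Da
    }
    for label, expected in known.items():
        if abs(spacing - expected) <= tol:
            return label
    return None


# Hypothetical neutral monoisotopic mass, for illustration only.
species = adduct_mz(300.1000)
for name, mz in species.items():
    print(f"{name}: m/z {mz:.4f}")
print(classify_spacing(species["[M+H]+"], species["[M+Na]+"]))
```

At high resolving power the exact spacings (about 21.982 and 37.956 Da) are more discriminating than the nominal values of 22 and 38, which is why the sketch matches on exact differences.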

Macro-molecular material

The analysis of macro-molecular material provides insight into the higher molecular weight fraction, which will likely include proteinaceous, polysaccharide and lignin-derived components. Previous studies have investigated the presence of lignin-phenols in water by isolating the compounds using solid phase extraction, followed by analysis via GC/MS or pyrolysis-GC/MS (Louchouarn et al. 2000; Williams et al. 2016). This method is very effective for specifically targeting lignin derived compounds but if a more comprehensive, untargeted approach is desired then a different pre-concentration procedure is necessary. Therefore, in this study ‘raw’ filtered river water was freeze-dried and ~ 1 mg of the residue was weighed into pre-combusted (1000 °C for 4 h) quartz tubes. An internal standard, 2 µL of 5α-androstane (0.01 µg mL−1), was added. Samples were then pyrolysed at 610 °C for 20 s at 20 °C ms−1, using a pyroprobe (CDS Analytical 6200, PA, USA). The pyrolysis products were transferred to a temperature programmable vaporisation inlet (transfer line temperature was maintained at 310 °C) which was operated at a temperature of 300 °C with a split ratio of 10:1. The products were separated using a Trace 1310 GC (Thermo Scientific, Bremen, Germany), which was fitted with a Rtx-1 column (dimethyl polysiloxane, non-polar, 60 m × 0.32 mm ID, film 0.25 µm). The GC was operated under constant flow (He, 2 mL min−1) using a temperature programme as follows: 40 °C (held for 4 min) to 300 °C at 4 °C min−1 and held for 15 min. The separated compounds were then analysed using an ISQ7000 mass spectrometer (Thermo Scientific, Bremen, Germany) using electron ionisation at 70 eV; the source temperature was 300 °C and the emission current was 50 µA. Data were acquired between 50 and 650 amu at 0.2 scan s−1. Samples were analysed both with and without thermochemolysis; for thermochemolysis, 5 µL of tetramethylammonium hydroxide (TMAH) (25% in methanol, Sigma Aldrich) was added to the samples prior to analysis.

Results and discussion

Non-polar fraction

Figure 3 shows total ion chromatograms gained from the analysis of the non-polar fractions from each of the three contrasting field sites. The three sites show some common compounds in the form of fatty acids (generic structure is shown in Fig. 4a i), particularly palmitic and stearic acid (C16 and C18), most likely derived from plant material surrounding the sample locations as these make up the largest proportion of the fatty acids in higher plants (Tjellström et al. 2012; Meï et al. 2015), or from in-stream aquatic productivity (Sushchik et al. 2010). The chromatogram from Nant y Brwyn displays the widest range of straight-chain fatty acids, where a series of 9 homologues can be observed ranging from C9 to C24. Also, the C9 dicarboxylic acid (azelaic acid) is present (generic structure shown in Fig. 4a ii). These compounds have previously been identified in earlier research examining the lipidic oxidation products of peat (Hanninen 1986). Recent molecular characterisation of peat carried out by Vinci et al. (2020) showed that the majority of the organo-soluble fraction of peat was composed of saturated long-chain fatty acids, with C16 and C18 being the most prevalent. This finding is reflected in the river water draining through the peat landscape in this study. In addition to these fatty acids, two steroidal compounds, which elute in the last 10 min of the GC analysis, were identified as cholesterol (C27) (Fig. 4a iii) and sitosterol (C29) (Fig. 4a iv). Sitosterol is a common plant sterol and cholesterol could have algal origins (Patterson 1971). These compounds along with the organic acids are indicative of plant material which will comprise most of the organic matter input to this peat headwater stream. This distribution of compounds can be contrasted with those seen in the chromatograms from the two intensively farmed and populated catchments.

Fig. 3 Total ion chromatograms of the non-polar fraction of DOM from the (a) River Conwy at Nant y Brwyn, (b) River Sem at Priors Farm and (c) River Chew at Chew Stoke

Fig. 4 Chemical structures identified in (a) the non-polar, polar intermediate molecular weight and macro-molecular, (b) amino acid and (c) neutral carbohydrate fractions

At Priors Farm, a range of saturated long-chain fatty acids are observed (C8, C9, C16, C18) along with the C26 alcohol (Fig. 4a v), which are indicative of the grassland character of the field location. While the C26 alcohol was the only identifiable alcohol at this site, it is well-established in the literature that the C26 is the dominant alcohol observed in grassland and grazed sites (van Bergen et al. 1997; Otto et al. 2005; Trendel et al. 2010) and it was therefore likely the only alcohol abundant enough to be detected in the river water draining through this intensively grazed grassland catchment. In addition, there are some key diagnostic compounds which reflect cattle grazing, such as urea, which is abundant in cattle urine and slurries, and the steroidal compound 5β-stigmastanol (Fig. 4a vi), which is only produced via biohydrogenation of plant matter in the gut during rumination (Bull et al. 2002; Lloyd et al. 2012). Also visible on the chromatogram is a slight hump of as yet unresolved material between 19 and 24 min. This highlights the complexity of the DOM signal and would be a target for future research.

At Chew Stoke, the fatty acids are less dominant, with only C9, C16 and C18 detectable. Instead, some of the most abundant compounds are the steroidal biomarkers eluting between 25 and 28 min. The most abundant of these is cholesterol, followed by coprostanol (Fig. 4a vii), a C27 stanol produced by the biohydrogenation of cholesterol in higher animals and therefore an important biomarker of human sewage (Leeming et al. 1996; Bull et al. 2002). Unlike the other two sites, the sampling site on the River Chew is immediately downstream from a major sewage treatment works (STW) at Chew Stoke. The presence of stearamide (Fig. 4a viii), a fatty amide of stearic acid, is also indicative of human activity with potential origins from industrial products such as adhesives or additives from plastic packaging (Lau and Wong 2000; Hahladakis et al. 2018). Fatty acid amides are the most commonly used slipping agents in plastic films and stearamide has been shown to readily migrate (up to 95%) from low density polyethylene film packaging into food (Cooper and Tice 1995). Further evidence of anthropogenic chemicals in the river is provided by the presence of tris(1-chloro-2-propyl)phosphate (TCIPP) (Fig. 4a ix), which may originate from industrial processing as it is a flame retardant commonly used in polymers such as polyurethane foams (Van den Eede et al. 2016). Its use has been increasing since the ban on brominated flame retardants and as a result it has been reported as an emerging contaminant in the natural environment (Van der Veen and de Boer 2012). Phosphate-based flame retardants are not normally chemically bonded into the manufactured materials and therefore are easily leached into the environment (Rodriguez et al. 2006). These industrially produced compounds are not observed at either of the other sites, further supporting the argument that these anthropogenic contaminants are being introduced to the river via the final effluent discharged from the Chew Stoke STW.

Polar building block compounds

Amino acids

The distribution of amino acids measured for the three sites is provided in Fig. 5a (the chemical structures are shown in Fig. 4b). A total of 14 hydrolysed amino acids were identified (including the internal standard norleucine) from two of the freeze-dried river water samples, from Conwy and Priors Farm. Compound identification was confirmed using the GC-Q-TOF/MS. The total hydrolysable amino acid pool was an order of magnitude higher in the Conwy compared with Priors Farm, measuring 0.46 mg L−1 (58 µg N L−1, 0.17 mg C L−1, ~ 5% DOC) and 0.04 mg L−1 (5.1 µg N L−1, 5 µg C L−1, ~ 0.3% DOC), respectively, suggesting that the DOM at Priors Farm is more degraded and that proteins are perhaps more rapidly utilised by an active and/or larger microbial community. This is supported by further work conducted by Brailsford et al. (2021), in which water was collected from both Nant y Brwyn and Priors Farm and 14C-labelled amino acids were added in an incubation experiment. The results showed that after 24 h double the amount of labelled amino acids had been degraded in the Priors Farm water compared with Nant y Brwyn, suggesting that turnover rates of these compounds were greater in the agricultural catchment. However, further experiments would be needed to confirm that this holds for complex proteins. Figure 5a shows the distribution of the amino acids as a percentage of the total amino acid pool at the two field sites. While the distributions are not wholly dissimilar, there are some key differences, with threonine, serine and lysine notably more prominent at Nant y Brwyn, whereas alanine, glycine and proline are more prominent at Priors Farm. In general, the distribution of amino acids at Nant y Brwyn is in agreement with reported compositions for peatland sites. Silvester et al. (2021) show the most abundant amino acids in a peatland river as glycine, aspartic acid, glutamic acid, alanine, serine, and threonine, which is also true at Nant y Brwyn. The Nant y Brwyn sample did show an increased relative abundance of lysine compared with the literature, although this is an important amino acid in Sphagnum moss, which is the dominant vegetation at the field site, along with sedges. Glutamic acid, glycine, alanine, and aspartic acid have been shown previously to be the most dominant amino acids in DOC (Hedges et al. 1994; Dittmar et al. 2001); however, at Priors Farm the relative abundances of leucine, valine and proline were also increased compared with Nant y Brwyn. These three amino acids are important components (along with glutamic acid, glycine, alanine and aspartic acid) of cow manure (Chen et al. 2003), suggesting that their prominence in the DOM profile reflects the intensive cattle production in this catchment, compared to low intensity sheep grazing in the Nant y Brwyn catchment.

Fig. 5

Plots showing the distribution of a total hydrolysable amino acids and b total neutral carbohydrates at the three field sites (Note for Chew Stoke amino acids were undetectable using the current method)

Overall, the total amino acid concentrations and the relative contributions of each compound at each site agree with other field sites reported in the literature and reflect the environments from which the samples were collected, demonstrating the sensitivity of this approach to characterising DOM composition from catchments of contrasting environmental character and anthropogenic disturbance. By contrast, despite repeated attempts, at Chew Stoke only a small number of the amino acids were visible from the GC analysis and, importantly, the response of the added internal standard was lower than would be expected, suggesting interference with the derivatisation process. While the exact cause is not yet known, one possible reason is the high concentration of inorganic components in the sample compared with the other field sites, which may interfere with the extraction or derivatisation steps in the protocol. The sample, when freeze-dried, yielded a mass of residue an order of magnitude greater than the other field sites, while the DOC concentration at the site is the lowest (5 ± 0.53 mg L−1), suggesting a large quantity of inorganic material was present. In addition, the wide range of anthropogenically-derived compounds in the river close to the sewage outfall, for example metals or pharmaceuticals (Pemberton et al. 2020), could have hitherto unknown impacts on the derivatisation chemistry. Further method testing is needed here to allow the method to be used at such field sites, which have more complex, anthropogenically-modified background chemistries.

Neutral carbohydrates

A total of 9 neutral carbohydrates (including myo-inositol as an internal standard) were detected at all three field sites (Figs. 4c and 5b), the identity of each compound being confirmed by comparing the chromatographic retention time with standards run of each individual compound. Nant y Brwyn shows the highest concentrations, with a total carbohydrate concentration of 1.58 mg L−1 (0.6 mg C L−1, ~ 16% DOC) of river water, compared with 0.53 mg L−1 (0.2 mg C L−1, ~ 10% DOC) and 0.15 mg L−1 (0.06 mg C L−1, ~ 1% DOC) for Chew Stoke and Priors Farm, respectively. The high concentrations at the peatland site are likely due to the continual leaching of carbohydrates from the organic soils surrounding the river (Jia et al. 2008). At the anthropogenically disturbed field sites, as with the amino acids, the lower concentrations might be because of the presence of a more active or larger microbial pool which rapidly utilises the organic pool instream. Alternatively, the residence time of the inputs might be playing a role: if the material has travelled further before being sampled there will have been more time for degradation processes to have occurred. Figure 5b shows the distribution of the neutral carbohydrates as a percentage of the total pool. Like the amino acid pool, the distribution is broadly similar across the sites but with some compounds showing variations. Previous literature has shown that the source of individual neutral sugars is non-specific (Cowie & Hedges 1984; Guggenberger et al. 1994a, b; Dungait et al. 2009), but when used in combination they do have some predictive power. For example, arabinose and xylose are particularly enhanced at the Conwy site and these compounds are important constituents of plant material (Oades 1984; Guggenberger et al. 1994a), and less readily synthesised by microbes. There is evidence that the deoxysugars fucose and rhamnose can be microbially derived (Guggenberger et al. 1994b; Biersmith & Benner 1998), and these compounds are relatively more abundant at both Priors Farm and Chew Stoke, where the microbial communities are hypothesised to be particularly active because of the contribution of farm and human wastes. This is supported by the microbial response to 14C-labelled tracers introduced to water and sediments in the earlier work of Brailsford and colleagues in both the Conwy catchment, including Nant y Brwyn (Brailsford et al. 2019, 2021), and the Hampshire Avon catchment, including Priors Farm (Brailsford et al. 2021).

It has been suggested that the ratio between the neutral carbohydrate components, (FUC + RHA)/(ARA + XYL), can be used to assess the contribution of plant and microbial sources in any environmental medium, with a ratio of 0.5 or lower indicative of plant-derived carbohydrates and values around 2 indicative of microbially-derived carbohydrates (Oades 1984; Guggenberger et al. 1994a). The ratios for the sites studied here are 0.47, 1.17 and 2.76 for Nant y Brwyn, Priors Farm and Chew Stoke, respectively, confirming the shift from plant-derived carbohydrates at the peatland site towards an increasingly microbially-derived composition in the agricultural and sewage impacted sites. It has also been suggested that the ratio of hexoses to pentoses can be used to distinguish between terrestrial plant (higher hexose) and phytoplankton (higher pentose) contributions (Cowie and Hedges 1984; Ittekkot and Arain 1986). The hexose/pentose ratios for the sites are 1.69, 3.0 and 6.55 for Nant y Brwyn, Priors Farm and Chew Stoke, respectively, further supporting the hypothesis that there is a decrease in the relative proportion of plant-derived compounds in the DOM pool in the two anthropogenically disturbed catchments.
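The deoxysugar to pentose ratio described above can be illustrated with a short Python sketch. The function names and the treatment of 0.5 and 2 as sharp cut-offs are assumptions made for illustration only (the text describes values "around 2" as microbial); the three ratios are those reported above, and individual sugar concentrations are not reproduced here.

```python
def plant_microbial_ratio(fuc, rha, ara, xyl):
    """Return (FUC + RHA) / (ARA + XYL) for concentrations in the same units."""
    pentoses = ara + xyl
    if pentoses <= 0:
        raise ValueError("arabinose + xylose must be positive")
    return (fuc + rha) / pentoses


def interpret_ratio(ratio, plant_max=0.5, microbial_min=2.0):
    """Label a ratio; the sharp thresholds are an illustrative assumption."""
    if ratio <= plant_max:
        return "plant-derived"
    if ratio >= microbial_min:
        return "microbially-derived"
    return "mixed"


# Ratios reported in the text for the three field sites
reported = {"Nant y Brwyn": 0.47, "Priors Farm": 1.17, "Chew Stoke": 2.76}
labels = {site: interpret_ratio(r) for site, r in reported.items()}

for site, label in labels.items():
    print(f"{site}: {reported[site]:.2f} -> {label}")
```

Under these assumed cut-offs, Priors Farm falls between the two end-members, which is consistent with the gradual shift in source character described above.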

Polar intermediate molecular weight compounds

DI-HRMS was carried out at each of the three field sites, but the extract from the Nant y Brwyn showed contamination of the sample during collection, most likely caused by a rope used to lower the sample container into the river, which was necessary because high water levels made access to the water difficult (for a full discussion of this issue see Lloyd et al. In press). As a result, an alternative example is shown here from another well-researched comparative headwater peatland site in mid-Wales draining into Llyn Brianne at Nant Esgair Garn (hereafter Nant Esgair Garn). Yates (2021) (this volume) provide full details of the field site, its catchment and background nutrient chemistry. Figure 6 shows the DI-HRMS mass spectra recorded for the DOM extracts from a) Nant Esgair Garn, b) Priors Farm and c) Chew Stoke. The extraction efficiency was calculated by measuring the DOC concentration of the extracts following SPE and comparing it to the concentration measured in the raw filtered river water (as NPOC using a TOC-L analyser (Shimadzu, Milton Keynes, UK)). The extraction efficiencies of the samples from Nant Esgair Garn and Priors Farm were both 58%, whereas that for Chew Stoke was lower, at 40.5%, meaning that roughly half of the total C in the samples never reaches the mass spectrometer, and some compounds might also go undetected due to inefficiencies in the electrospray ionisation (Patriarca et al. 2020). Extracting the ions from each mass spectrum yielded similar numbers, with 3601, 3201 and 3158 ions for Nant Esgair Garn, Priors Farm and Chew Stoke, respectively. Each of these peaks has the potential to be an individual compound in the sample, although a number could represent isotope peaks of the same compound. The sample from Nant Esgair Garn shows an almost bimodal distribution of ions, with a cluster of compounds between m/z 150 and 250 and another cluster between m/z 250 and 500, with major ions (defined as > 50% relative intensity) occurring at m/z values of 267.1699, 156.1019, 376.2629, 170.1174 and 163.0697. In contrast, the sample from Priors Farm shows one Gaussian-like distribution of ions from m/z 150 to 500. There were three major ions present with m/z values of 165.0705, 431.2258 and 288.1961. The total ion current for Priors Farm was the smallest of all the samples (3.2E7), despite the DOC concentration of the water and the extraction efficiency being the highest, suggesting that the compounds present were more difficult to ionise using electrospray. The sample from Chew Stoke displayed a higher total ion current (6.6E7), suggesting that it contained compounds which were easily ionisable via electrospray. The sample exhibited a Gaussian-like distribution of ions from m/z 250–750, with a modal value of m/z 447.2921; however, the ions extend beyond m/z 1000. At first glance it appears that there may be significantly fewer ions present in the sample compared with the others; however, this is not the case. The main difference is that the mass spectrum is dominated by overlapping series of prominent ions with a spacing of m/z 58 units, indicative of a mixture of oligomeric components. These oligomeric series were investigated in more detail by Pemberton et al. (2020), where they were identified as 18 separate overlapping series of polypropylene glycol (PPG) (Fig. 4a x). Pemberton et al. (2020), through further interrogation of this sample using HPLC/HRMS/MS, were able to identify 22 other compounds including pharmaceuticals such as codeine, propranolol and cocaine (Fig. 4a xi, xii, xiii). The presence of the pharmaceuticals and the PPG, commonly used as an additive in food and personal care products amongst other uses, reflects the influence that the input of DOM from the sewage treatment works is having on the DOM character in the adjacent river.
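The recognition of oligomeric series from a regular m/z spacing, as described above, can be sketched in Python. This is a minimal illustration and not the procedure used by Pemberton et al. (2020): the exact spacing of 58.0419 (the monoisotopic mass of a C3H6O repeat unit, assuming singly charged ions), the matching tolerance, the minimum series length and the peak list are all assumptions introduced here.

```python
from bisect import bisect_left

# Assumed repeat spacing: monoisotopic mass of C3H6O for singly charged ions
PPG_REPEAT = 58.0419


def find_series(mz_values, spacing=PPG_REPEAT, tol=0.002, min_length=4):
    """Group peaks into chains in which consecutive members differ by `spacing`."""
    mzs = sorted(mz_values)
    used = set()
    series_list = []
    for i, start in enumerate(mzs):
        if i in used:
            continue
        chain = [i]
        current = start
        while True:
            target = current + spacing
            # First peak at or above the lower edge of the tolerance window
            j = bisect_left(mzs, target - tol)
            if j < len(mzs) and mzs[j] <= target + tol and j not in used:
                chain.append(j)
                current = mzs[j]
            else:
                break
        if len(chain) >= min_length:
            used.update(chain)
            series_list.append([mzs[k] for k in chain])
    return series_list


# Hypothetical peak list: two overlapping series plus three unrelated peaks
series_a = [273.1700 + n * PPG_REPEAT for n in range(6)]
series_b = [291.2000 + n * PPG_REPEAT for n in range(5)]
others = [165.0705, 288.1961, 431.2258]

found = find_series(series_a + series_b + others)
for s in found:
    print(f"series starting at m/z {s[0]:.4f} with {len(s)} members")
```

In a real spectrum the tolerance would need to reflect the mass accuracy of the instrument, and isotope peaks and multiply charged ions would need separate handling.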

Fig. 6

DI-HRMS spectra recorded for a Llyn Brianne at Nant Esgair Garn, b River Sem at Priors Farm and c River Chew at Chew Stoke catchments

While it was beyond the scope of this paper to identify individual compounds from all the HRMS samples, this was carried out for the Chew Stoke site in our previous paper (Pemberton et al. 2020) and full details can be found therein. It was possible here to compare the number of common masses between the sites to assess the level of similarity between the DOM constituents. Figure 7 presents a Venn diagram of the three sites which shows that the three samples share 2147 ions, suggesting that these represent ubiquitous, and potentially natural, organic matter. Priors Farm only exhibited 3 unique ions in this analytical fraction but shared 114 ions with the sample from Nant Esgair Garn which were absent from Chew Stoke. Nant Esgair Garn showed 25 unique ions which did not appear in either of the other sampling sites, suggesting that these represent peat-derived DOM compounds. This site shared 148 ions with the sample collected from Chew Stoke, which were absent in the Priors Farm sample. The identification of these compounds and their likely provenance in each catchment, using source fingerprinting approaches, would be an interesting avenue of further study. As could be expected, the largest number of unique ions was recorded for Chew Stoke (68); this is logical given the input of DOM from the sewage works, which is not present at either of the other two sites. Further analysis of these samples to isolate and identify the ions unique to each of the sites would be beneficial to better understand the specific origins of DOM compounds in each stream system.

Fig. 7

Venn diagram showing the number of unique and shared ions detected via DI-HRMS at the three field sites
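The comparison of common and unique masses summarised in the Venn diagram can be sketched with Python set operations. This is an illustrative sketch only: matching ions by rounding m/z to three decimal places is an assumption (the matching criterion is not stated here), and the short peak lists are hypothetical values rather than measured data.

```python
def bin_ions(mz_values, decimals=3):
    """Collapse m/z values onto a grid so that near-identical masses match."""
    return {round(mz, decimals) for mz in mz_values}


def venn_counts(a, b, c):
    """Count ions in each of the seven regions of a three-set Venn diagram."""
    return {
        "all_three": len(a & b & c),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
    }


# Hypothetical peak lists for a peatland (a), grassland (b) and sewage-impacted (c) site
peat = bin_ions([150.0001, 200.0002, 250.0003, 300.0004])
farm = bin_ions([150.0002, 200.0001, 350.0003])
stw = bin_ions([150.0001, 250.0004, 400.0002, 450.0001])

counts = venn_counts(peat, farm, stw)
for region, n in counts.items():
    print(f"{region}: {n}")
```

Fixed-grid rounding can separate two masses that lie either side of a bin edge, so a tolerance-based match (for example in ppm) may be preferable for real spectra.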

Macro-molecular material

Figure 8 shows pyrograms recorded for each of the three field sites, which also highlight differences in the character of the DOM in each catchment. While py-GC/MS is a semi-quantitative technique, by comparing the size of the 5α-androstane (IS) peak in each sample, the relative abundance of DOM visible by the technique at different sites can be compared. The highest concentration of macro-molecular DOM can be seen in Nant y Brwyn, followed by Priors Farm then Chew Stoke, which reflects the DOC measurements at the field sites (see Table 1). Thus, both the quantity of DOM and the composition of the pool vary greatly between sites.

Fig. 8

Pyrograms showing the pyrolysis products from the macro-molecular DOM for a River Conwy at Nant y Brwyn, b River Sem at Priors Farm and c River Chew at Chew Stoke field sites. The superscripts on the lignin compounds denote p-hydroxyphenyl (P), guaiacyl (G) and syringyl (S) lignin units

The pyrogram from Nant y Brwyn (Fig. 8a) shows pyrolysis products after thermochemolysis, which were dominated by lignin-derived compounds, which make up the majority of plant cell walls. Twenty individual compounds can be identified from the three lignin classes, determined by their phenylpropanoid units (p-hydroxyphenyl (P), guaiacyl (G) and syringyl (S) units). This illustrates the importance of plant sources contributing to the riverine DOM signal. This lignin-dominated signature is typical of peat landscapes where sedges are present and has been reported previously in the literature (e.g. McClymont et al. 2011; Abbott et al. 2013). Alongside the lignin components, the two most abundant straight-chain fatty acids (C16 and C18), which were seen in the non-polar fraction, are also visible. In an analysis of peat samples, Abbott et al. (2013) also reported C16 fatty acid, along with 1,2,4-trimethoxybenzene (Fig. 4a xiv), which was also seen in the present study. It should be noted that the fatty acids seen here are potentially being volatilised in the samples, rather than being pyrolysis products of macro-molecular material, and are likely from the same source as seen in the non-polar fraction. 1,2,4-trimethoxybenzene has been shown to be produced during TMAH thermochemolysis of both carbohydrates (Fabbri & Helleur 1999) and condensed tannins (Nierop et al. 2005). A small peak assigned to methylated 4-isopropenylphenol (Fig. 4a xv) was also identified in the sample, which has been shown to be a biomarker for Sphagnum mosses (van der Heijden et al. 1997; Abbott et al. 2013). Also recorded were a small number of compounds likely of polysaccharide origin, or N-containing components, as well as toluene, phenol and 3-methyl phenol (Fig. 4a xvi, xvii, xviii), which are often reported as being of unspecific origin (Dignac et al. 2005).

In contrast to Nant y Brwyn, Priors Farm (Fig. 8b) shows a different array of pyrolysis products, the majority of which are lower molecular weight compounds. Unlike the sample from the Conwy, no additional compounds were observed when TMAH thermochemolysis was carried out. There was only one potentially lignin-derived component, 2,3-dimethoxy phenol (Fig. 4a xix), which could be gymnosperm or angiosperm monocot in origin (Ralph et al. 2004), which is likely given that Priors Farm has a grassland-dominated catchment with large areas of woodland in its headwaters. The most abundant compounds in the pyrogram are phenolic or N-containing compounds. These may be indicative of the presence of proteinaceous material in the river, as these compounds have been previously linked to polypeptides (Bracewell and Robertson 1984), which could have been introduced through intensive cattle grazing in the catchment. These compounds have also been observed in the top soil layers sampled from permanent pasture (Nierop et al. 2001) and have been attributed to amino acids, amino sugars, proteins and polypeptides (e.g. Sorge et al. 1993; Leinweber and Schulten 1998; van Bergen et al. 1998). There were also more polysaccharide-derived products such as 2,5-dimethylfuran and 3-methyl-1H-pyrrole (Fig. 4a xx, xxi), which could be of plant or animal waste origin. Overall, the DOM pool at Priors Farm contains a higher proportion of microbially-degraded, lower molecular weight compounds, and contains key compounds indicative of anthropogenic disturbance of land cover and intensive agricultural production in its catchment.

The DOM pool sampled at Chew Stoke yielded the lowest number of pyrolysis products, suggesting the presence of less macromolecular material compared with the other two field sites. While there were a small number of compounds which were likely to be of protein (e.g. benzonitrile, phenol (Fig. 4a xxii, xvii)) or polysaccharide (e.g. 3-furaldehyde, 5-methyl 2-furancarboxaldehyde (Fig. 4a xxiii, xxiv)) origin, the pyrogram was dominated by a series of dicarboxylic acids from C4 through to C8. They were the most abundant compounds aside from the straight-chain fatty acids (C16 and C18), which were ubiquitous across the three sites. This series of dicarboxylic acids is most likely anthropogenic in origin, illustrating the impact that the processed sewage effluent is having on the DOM character in the river at this site. The exact origin of the dicarboxylic acids present, namely succinic, glutaric, adipic, pimelic, suberic and azelaic acids, is currently unknown. However, these compounds are used in the manufacture of synthetic polymers and additives for plastics and can be thermal and photo-oxidation degradation products of polyethylene (Hakkarainen and Albertsson 2004). It is therefore possible that these compounds have been produced simply as pyrolysis products of micro- or nano-plastics within the waterbody, or during treatment processes at the STW. Dicarboxylic acids have been shown to be abundant in aerosol samples associated with the burning of plastic-enriched waste (Kumar et al. 2015), which may well occur in this populated catchment. Alternatively, the compounds may be the pyrolysis products of already partially degraded plastic which has been subject to photo- and microbially-mediated oxidation, potentially during the process of sewage treatment. This theory of the compounds being polymer-derived is also supported by data presented by Pemberton et al. (2020), who reported a number of oligomeric series in the polar DOM fraction at the same site.

Implications and future directions

The use of a range of complementary methods, as presented here, allows the analytical window to be widened to provide a more comprehensive picture of the character of DOM present in rivers. The data presented from the three contrasting case studies show that the techniques were able to identify a wide range of riverine DOM compounds, some of which would not have been identified through targeted analysis. The analysis of the non-polar fraction revealed some important diagnostic biomarker compounds, which reflected the vegetation, land cover and nature of anthropogenic disturbance, if any, in the catchment for each site. By using the distribution of sterols and stanols, faecal inputs were detected, and the specific source could be distinguished. The non-polar fraction is often overlooked when characterising DOM because compounds in this class are hydrophobic and therefore usually associated with the analysis of sediment or suspended sediment; however, they provide useful supporting information concerning DOM sources. The biomarkers identified in this fraction can provide unequivocal evidence for the contributing sources of the DOM, for example in the nature of the faecal indicator compounds as illustrated in this study.

The analysis of amino acids and carbohydrates provides information about important building block compounds, which are key ecosystem drivers. They are generally quite low in abundance in the dissolved fraction; however, this is likely because they are turned over very quickly by the stream biota, rather than an indication that they are of low importance. Both the overall quantity and the distribution of amino acids and carbohydrates were shown to vary across the range of field sites providing insights into the potential for DOM turnover by microbial communities as well as the contribution of plant and microbial sources to the carbohydrate pool.

Using DI-HRMS, the complexity of the intermediate molecular weight (mostly m/z 150–800) polar fraction was clearly shown. Thousands of ions were detected, and by comparing the distribution of ions between the three field sites many common molecular species were observed. This provides useful information about DOM components which are likely to be ubiquitous across different landscapes. In addition, ions were detected which were unique to particular field sites, and these ions would make interesting targets for identification and for use as specific source indicators in support of any source apportionment study.

The characterisation of the riverine DOM was further extended in our approach by using py-GC/MS to identify the sources of macro-molecular material within the DOM pool. Pyrolysis is a technique which is mainly used to analyse solid samples, particularly soils, sediments and vegetation, with the exception of lignin phenols in river water, and its application here yielded some surprising products. The advantage of using the freeze-dried residue of the whole water sample is that the pre-concentration step is not selective in the DOM recovery and therefore provides a more complete picture of the macro-molecular material within the river water samples.

While the application of this suite of methodologies has improved our ability to characterise DOM, there are a number of remaining challenges in this field still to be addressed. One of the most fundamental of these is quantification. Being able to understand DOM quantitatively as well as qualitatively is the next logical step, but for many of the complex mixtures described here it is non-trivial. The non-polar fraction could be quantified using a carefully chosen range of internal standards once the type of compounds present is known. The amino acid and carbohydrate fractions can be quantified using a combination of internal and external standards whereby the response factor for each individual compound in the FID can be ascertained, because each will respond differently on combustion. The greatest quantification challenges, however, remain when considering the DI-HRMS and pyrolysis methods. In the case of electrospray ionisation for DI-HRMS, the response seen in the mass spectrometer will be controlled by the ability of each compound to ionise; compounds may therefore be present which are very abundant in the sample but difficult to ionise and thus show a low signal when analysed, and vice versa. There are several approaches which could be attempted to tackle these quantification challenges. One is using an external calibration curve run on the instrument in the same batch as the samples. However, this would have to be done for each compound of interest. Another approach would be to use standard addition, whereby samples are analysed twice, with and without a spike of known concentration of target compounds, and the concentration is calculated based on the change in response between the mass spectra. This method also requires target compounds for quantification; however, it has the advantage that multiple compounds can be targeted per analytical run. The standard addition approach also has the advantage of taking account of matrix effects which may suppress or enhance molecular responses. Once quantification can be robustly carried out, the power of these analytical techniques will be enhanced further, and research questions based around source apportionment and ecological impact can be better addressed.
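The single-spike standard addition calculation outlined above can be sketched in Python. This is a minimal illustration under stated assumptions: the detector response is taken to be linear and to pass through the origin, dilution by the spike volume is ignored, and the function name and signal values are hypothetical.

```python
def standard_addition(signal_unspiked, signal_spiked, spike_conc):
    """Estimate the original concentration from one spiked and one unspiked run.

    Assumes signal = k * concentration, so that
    C0 = S0 * C_spike / (S1 - S0), in the same units as spike_conc.
    """
    delta = signal_spiked - signal_unspiked
    if delta <= 0:
        raise ValueError("spiked signal must exceed unspiked signal")
    return signal_unspiked * spike_conc / delta


# Hypothetical ion intensities for one target compound, spike of 10 ug/L
estimate = standard_addition(signal_unspiked=2.0e5,
                             signal_spiked=6.0e5,
                             spike_conc=10.0)
print(f"estimated concentration: {estimate:.1f} ug/L")
```

Because both measurements are made in the same sample matrix, any suppression or enhancement scales the two signals alike and cancels in the ratio, which is why the approach accounts for matrix effects. Several spike levels with a regression would normally be preferred over a single spike.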

Further to quantification, there is also the challenge of defining the remaining uncharacterised DOM. There are a number of candidate compound classes which could be added to the suite of techniques discussed herein, for example dissolved black carbon. The identity of the remaining DOM composition is a significant challenge for the scientific community to address as research questions demand. Also, more work is necessary to extend our understanding of the exact origins of the compounds seen in the analyses performed here. For example, in the non-polar fraction, is the pool limited to free fatty acids, or are there ester-bound moieties present which could be captured using an additional hydrolysis step?

The other important remaining question relates to the current definition of DOM, that is, any material which passes through a 0.45 µm filter. In reality, it can be argued that organic matter should be considered a continuum, ranging from particulates of various sizes through to colloidal material and then eventually compounds that are truly dissolved. There are examples of compounds identified in this paper, for example the faecal biomarker compounds in the non-polar fraction, that most likely are not truly dissolved in the water but exist either as free-floating organics or associated with colloidal particles in the sub-0.45 µm fraction. With this in mind, along with the fact that the particulates are known to play an important role in nutrient cycling and ecosystem function (see for example Mena-Rivera et al. (2022) for a detailed discussion), perhaps it is most appropriate to move away from the concept of operationally defined DOM and consider all the organic matter within a particular sample window. The methodologies presented in this paper can be applied directly, or with small modifications, to particulate samples or to unfiltered river water samples containing particles along with dissolved components to extend further the analytical window.

Conclusions

Overall, these contrasting case studies illustrate the analytical power of a complementary suite of approaches to characterise DOM pool composition in environmental matrices, specifically here for river water samples. A similar approach has been used previously for expanding the characterisation of soil organic matter (Poirier et al. 2005). Figure 9 shows a schematic of how the theoretical analytical windows of methods used herein fit together, with some compounds being identifiable by multiple techniques. The diagram also illustrates that although using this approach has greatly extended our ability to comprehensively characterise the DOM pool, some components will remain uncharacterised. However, the data presented here clearly demonstrate the applicability of the methods to distinguish DOM source character and contributions as these vary according to environmental character and anthropogenic disturbance. The identification of unexpected compounds, for example the dicarboxylic acids and PPG oligomers at Chew Stoke, clearly illustrates the advantage of adopting this type of analytical approach over targeting specific compounds known to be of importance. In addition, the suite of approaches presented here, in combination with the characterisation of the polar fraction as presented in Pemberton et al. (2020), highlights the extreme complexity of the DOM pool at molecular level and therefore how important it is to characterise this pool robustly in order to better understand its composition and likely role in stream ecosystems and their metabolic function. Furthermore, through the elucidation of the DOM character in stream and source samples using the methods presented here, information can be collected about the likely sources of DOM within a catchment, enabling the targeting of mitigation measures at key contributing sources within any catchment nutrient management plan. The identification of individual key compounds within the DOM pool at a particular site can also be used in the design of robust and informative bioassay experiments, to understand the ecosystem responses to instream DOM as this varies not only in space and time, but in terms of its molecular composition. This is critical if we are to fully understand the role of DOM as a bioavailable nutrient resource in freshwater ecosystems.

Fig. 9

Schematic of the analytical windows of the methods applied using the hierarchical MS analytical approach (Note that the size of the circles is not proportional to the amount of DOM represented)