Identification of levoglucosan degradation pathways in bacteria and sequence similarity network analysis

Kaur, Arashdeep; Scott, Nichollas E.; Herisse, Marion; Goddard-Borger, Ethan D.; Pidot, Sacha; Williams, Spencer J.

doi:10.1007/s00203-023-03506-y

Identification of levoglucosan degradation pathways in bacteria and sequence similarity network analysis

Original Article
Open access
Published: 31 March 2023

Volume 205, article number 155, (2023)
Cite this article

Download PDF

You have full access to this open access article

Archives of Microbiology Aims and scope Submit manuscript

Identification of levoglucosan degradation pathways in bacteria and sequence similarity network analysis

Download PDF

Arashdeep Kaur^1,2,
Nichollas E. Scott³,
Marion Herisse³,
Ethan D. Goddard-Borger^4,5,
Sacha Pidot³ &
…
Spencer J. Williams ORCID: orcid.org/0000-0001-6341-4364^1,2

2063 Accesses
Explore all metrics

Abstract

Levoglucosan is produced in the pyrolysis of cellulose and starch, including from bushfires or the burning of biofuels, and is deposited from the atmosphere across the surface of the earth. We describe two levoglucosan degrading Paenarthrobacter spp. (Paenarthrobacter nitrojuajacolis LG01 and Paenarthrobacter histidinolovorans LG02) that were isolated from soil by metabolic enrichment using levoglucosan as the sole carbon source. Genome sequencing and proteomics analysis revealed the expression of a series of genes encoding known levoglucosan degrading enzymes, levoglucosan dehydrogenase (LGDH, LgdA), 3-keto-levoglucosan β -eliminase (LgdB1) and glucose 3-dehydrogenase (LgdC), along with an ABC transporter cassette and an associated solute binding protein. However, no homologues of 3-ketoglucose dehydratase (LgdB2) were evident, while the expressed genes contained a range of putative sugar phosphate isomerases/xylose isomerases with weak similarity to LgdB2. Sequence similarity network analysis of genome neighbours of LgdA revealed that homologues of LgdB1 and LgdC are generally conserved in a range of bacteria in the phyla Firmicutes, Actinobacteria and Proteobacteria. One group of sugar phosphate isomerase/xylose isomerase homologues (named LgdB3) was identified with limited distribution that is mutually exclusive with LgdB2, and we propose that they may fulfil a similar function. LgdB1, LgdB2 and LgdB3 adopt similar predicted 3D folds, suggesting overlapping function in processing intermediates in LG metabolism. Our findings highlight diversity within the LGDH pathway, through which bacteria utilize levoglucosan as a nutrient source.

Comparative genomics and metabolic profiling of the genus Lysobacter

Article Open access 23 November 2015

Metagenomics study to compare the taxonomic composition and metabolism of a lignocellulolytic microbial consortium cultured in different carbon conditions

Article 24 March 2022

Complete Genome Sequence of Pedobacter sp. PAMC26386 and Their Low Temperature Application in Arabinose-containing Polysaccharides Degradation

Article 26 February 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Levoglucosan (LG; 1,6-anhydro-β-d-glucose) is a bicyclic anhydride of d-glucose (McGill and Williams 2008). It is formed by a range of chemical processes including pyrolysis of glucose (Sugisawa and Edo 1966), starch and cellulose (Lakshmanan et al. 1969; Itabaiana Junior et al. 2020). LG is formed from the burning of cellulose above 300 °C (Simoneit et al. 1999) and is produced by the burning of biomass (Suciu et al. 2019). Emissions of LG into t he atmosphere arise from burning of plant matter in open fires (1.7 million tonnes per annum) and biofuels (2.1 million tonnes per annum) and contributes to aerosols in the troposphere that affect air quality (Li et al. 2021). Consequently, LG can be used as a molecular marker for biomass burning and as a measure of air quality (Li et al. 2021), and has been used to trace ancient fire histories (paleo-fire records) in glacier snow and ice (You and Xu 2018). While LG is short lived in the atmosphere and degrades with a lifetime of 1.8 days, an estimated 0.7 million tonnes of LG are deposited annually through dry deposition and in rain: ‘sugar from the heavens’ (Li et al. 2021). Consequently, soils across the earth likely experience continuous, low levels of LG deposition, irrespective of proximity to biomass combustion processes or precipitation.

Microorganisms that can grow on LG as sole carbon source are readily recovered from soils (Kitamura et al. 1991; Yasui et al. 1991; Nakahara et al. 1994; Iwazaki et al. 2018) including those associated with wood burning (Arya et al. 2022), as well as plant leaves (Iwazaki et al. 2018), and pond (Iwazaki et al. 2018) and waste water (Arya et al. 2022), suggesting that LG degrading organisms are widespread in Nature. While humans and mice do not metabolize LG (Migliaccio et al. 2009; Bergauff et al. 2010), pathways for its degradation have been elucidated in fungi and bacteria. Fungi appear to exclusively use a pathway involving LG kinase, an ATP-dependent enzyme that converts LG in one step to glucose-6-phosphate (Kitamura and Yasui 1991; Bacik et al. 2015). On the other hand, bacteria use a multistep pathway (Yasui et al. 1991) that was first elucidated in Bacillus smithii S-720 M (Kuritani et al. 2020) (Fig. 1). In this organism, LG metabolism is initiated by the action of NADH-dependent LG dehydrogenase (LGDH, also known as LgdA), which forms 3-keto-LG. LGDH from Pseudarthrobacter phenanthrenivorans Sphe3 has been studied extensively, including determination of its 3D structure, identification of its active site, and the binding mode of LG (Sugiura et al. 2018). The 1,6-anhydro ring of 3-keto-LG is cleaved to give 2-hydroxy-3-keto-d-glucal, catalysed by 3-keto-LG β -eliminase (LgdB1), and this is hydrated to give 3-keto-d-glucose, catalysed by 3-ketoglucose dehydratase (LgdB2). Finally, 3-keto-d-glucose is reduced to d-glucose, catalysed by NADH-dependent glucose 3-dehydrogenase (LgdC). LgdB2 is a bifunctional enzyme that can also catalyse the conversion of 3-keto-LG to 2-hydroxy-3-keto-d-glucal directly. Overall, this complex pathway involves 3–4 enzymes to achieve the formal hydrolysis of LG. This pathway bears some similarity to glycoside hydrolases of carbohydrate active enzyme family GH4 (Yip et al. 2004) and GH109 (Liu et al. 2007), which in a single step catalyse glycoside hydrolysis in an NADH-dependent manner through a redox mechanism involving a 3-keto sugar intermediate (Yip and Withers 2006), and the 2,7-anhydro-N-acetylneuraminic acid hydrolase from Ruminococcus gnavus, which cleaves this anhydro sugar via a 4-keto sialic acid intermediate (Bell et al. 2020). Possibly, the requirement of multiple enzymes for the net hydrolysis of LG relates to the profound conformational change that occurs when the 1,6-anhydro ring of LG is cleaved, meaning that different enzymes are required to recognize the substrate and intermediates in the pathway.

All LG-degrading bacteria studied at a genetic level contain close homologs of lgdA, and previous work has sought to identify genes involved in LG transport and metabolism by analysis of the surrounding genomic regions. In the case of Bacillus smithii S-270 M, a genomic region containing lgdA-lgdB1-lgdB2-lgdC also contains genes encoding ABC transporter subunits and an ABC solute binding protein (Kuritani et al. 2020). However, other LG degrading organisms such as P. phenanthrenivorans Sphe3, Paenibacillus athensensis MEC069, Shinella sumterensis MEC087, Microbacterium MEC084, and Klebsiella pneumoniae MEC097 contain lgdA-lgdB1 and genes encoding ABC transporter subunits and substrate binding proteins, but homologs of lgdB2 and lgdC were not identified (Arya et al. 2022), raising questions as to whether other enzymes may be used for the final steps of LG metabolism.

Here, we isolate two LG degrading bacteria from soil and report their draft genome sequences. Both organisms contain close homologs of lgdA and lgdC that encode the enzymes that catalyse the first and last steps of the bacterial LG degradation pathway, but lack the full complement of homologues for the intervening genes. We perform comparative proteomics to identify the genes that are upregulated upon growth on LG, and use sequence similarity network analysis to study the genome neighbourhoods surrounding lgdA. We identify a range of putative sugar phosphate isomerase/xylose isomerases with weak similarity to LgdB2 that have a distribution that is mutually exclusive with LgdB2, and which we suggest may fulfil a similar function. Our findings highlight the diversity of bacterial LGDH pathways through which bacteria utilize levoglucosan as a nutrient source.

Materials and methods

Bacterial growth media

LG growth media was prepared using M9 salt medium, trace metal solution and vitamin mixture and 5 mM levoglucosan as sole carbon source. M9 minimal medium consisted of 12.8 g/L Na₂HPO₄, 3 g/L KH₂PO₄, 1 g/L NH₄Cl, 0.5 g/L NaCl, 0.6 g/L MgSO₄, 0.22 g/L CaCl₂.2H₂O salts and trace metal solution consisted of 14.4 mg/L FeCl₃, 4.0 mg/L CoCl₂.6H₂O, 2.9 mg/L CuCl₂.2H₂O, 3.0 mg/L MnCl₂.4H₂O, 15.0 mg/L ZnCl₂, 0.2 mg/L NiCl₂, 3 mg/L Na₂MoO₄.6H₂O and 6 mg/L H₃BO₃. Vitamin mixture consisted of 2 µg/L biotin, 5 µg/L calcium pantothenate, 10 µg/L thiamine hydrochloride, 10 µg/L p-aminobenzoic acid, 20 µg/L nicotinic acid, 50 µg/L pyridoxamine dihydrochloride, and 4 µg/L B12 (cyanocobalamin).

Isolation of levoglucosan degrading species

Paenarthrobacter nitrojuajacolis LG01 and Paenarthrobacter histidinolovorans LG02 were isolated from enriched culture, obtained from granular soil sampled from the A horizon, Royal Park (Melbourne, Australia). There was no apparent fire history of the sampled soil.

Approximately 1 g of soil samples were suspended in 5 mL of sterilized LG growth media containing 5 mM of LG as sole carbon source. The culture was incubated at 28 ℃ for 5 days with continuous shaking at 250 rpm. A subsample of 50 µL was supplemented into freshly prepared levoglucosan media and grown for a further 3 days. This step was repeated 5 times and after the 5th transfer, cells were plated on LB agar plates consisting of 10 g/L tryptone, 5 g/L NaCl, 5 g/L yeast, 15 g/L agar and incubated overnight at 28 ℃ in the dark. Single colonies were picked and used to inoculate 2 mL of fresh LG growth media and incubated at 28 ℃ with continuous shaking at 250 rpm. Then, cells were plated and re-inoculated into fresh liquid LG growth media 2 more times. Stock cultures were prepared by addition of 10% glycerol and stored at − 80 ℃. Genomic DNA of LG degrading species was extracted using the GenElute DNA extraction kit (Sigma) with inclusion of lysozyme and RNAase.

Phenotypic assays

Strains LG01 and LG02 were grown to visible turbidity in LG growth media containing 20 mM of levoglucosan at 28 ℃ for 3 days with continuous shaking at 250 rpm. Cultures were centrifuged at 10,000 rpm for 10 min (using Sigma laborzentrifugan model 1–15, rotor 12,124) and the supernatant liquid was diluted with 50% D₂O. ¹³C-NMR spectra of diluted supernatant liquid was recorded using a 500 MHz instrument.

Genome sequence, assembly, and annotation

Genomic DNA was sequenced using an Illumina MiSeq at the Peter Doherty Institute for Infection and Immunity, Parkville, Victoria, Australia. DNA was prepared for sequencing on the Illumina MiSeq platform using the Nextera XT DNA preparation kit (Illumina) with × 300 bp paired end chemistry and with a targeted sequencing depth of > 50 × . Draft genomes were assembled using Shovill v1.1.0 (https://github.com/tseemann/shovill) and annotated using Prokka v1.14.5 (Seemann 2014). GC percentage and ANI calculations were performed using ANI calculator (https://www.ezbiocloud.net/tools/ani) (Yoon et al. 2017). Taxonomic annotations used 16S rRNA sequences extracted from the assembled genomes as a query in a BLASTn search against the NCBI 16S rRNA database. Assembled genomes have been deposited at the NCBI (Bioproject accession: LG01, PRJNA925556; LG02, PRJNA925557). General features for isolated bacteria are reported in Table 1.

Table 1 Key features of the LG01 (BioProject accession: PRJNA925556) and LG02 (BioProject accession: PRJNA925557) draft genome assemblies

Full size table

Sample preparation for proteomic analysis

Cells were grown to mid-log phase and washed 3 times with phosphate buffered saline (PBS), collected by centrifugation at 10,000 rpm, and then snap frozen in liquid nitrogen and stored at − 80 °C. Pelleted snap frozen bacterial cells were solubilized in 4% SDS, 100 mM HEPES by boiling for 10 min at 95 °C. Protein concentrations were assessed using the bicinchoninic acid protein assay (Thermo Fisher Scientific) and 100 mg of each independent replicate prepared for digestion using Micro S-traps (Protifi, USA) according to the manufacturer’s instructions. Samples were reduced with 10 mM DTT for 10 min at 95 °C and then alkylated with 40 mM iodoacetamide in the dark for 1 h. Samples were acidified to 1.2% phosphoric acid and diluted with seven volumes of S-trap wash buffer (90% methanol, 100 mM tetraethylammonium bromide, pH 7.1), then were loaded onto S-traps and washed 3 times with 400 ml of S-trap wash buffer. Samples were then digested with 3 mg of trypsin (a 1:33 protease/protein ratio) in 100 mM tetraethylammonium bromide for 4 h at 37 °C, then were collected by centrifugation with washes of 100 mM tetraethylammonium bromide, followed by 0.2% formic acid followed by 0.2% formic acid / 50% acetonitrile. Samples were dried and further cleaned up using C18 Stagetips to ensure the removal of particulate matter (Rappsilber et al. 2003, 2007).

Reverse phase liquid chromatography–mass spectrometry for proteomics

Prepared purified peptides from each sample were re-suspended in Buffer A* (2% acetonitrile, 0.01% trifluoroacetic acid) and separated using a two-column chromatography setup composed of a PepMap100 C₁₈ 20-mm by 75-mm trap and a PepMap C₁₈ 500-mm by 75-mm analytical column (Thermo Fisher Scientific). Samples were concentrated onto the trap column at 5 ml/min for 5 min with Buffer A (0.1% formic acid, 2% DMSO) and then infused into an Orbitrap Eclipse Mass Spectrometer (Thermo Fisher Scientific) or an Orbitrap Fusion Lumos equipped with a FAIMS Pro interface both at 300 nl/min via the analytical columns using a Dionex Ultimate 3000 UPLC (Thermo Fisher Scientific). For samples infused into the Orbitrap Fusion Lumos, 125-min analytical runs were undertaken by altering the buffer composition from 2% Buffer B (0.1% formic acid, 77.9% acetonitrile, 2% DMSO) to 22% B over 95 min, then from 22 B to 40% B over 10 min, then from 40% B to 80% B over 5 min. The composition was held at 80% B for 5 min, and then dropped to 2% B over 2 min, then held at 2% B for another 8 min. For samples infused into the Orbitrap Eclipse 95-min analytical runs were undertaken by altering the buffer composition from 2% Buffer B (0.1% formic acid, 77.9% acetonitrile, 2% DMSO) to 22% B over 65 min, then from 22% B to 40% B over 10 min, then from 40% B to 80% B over 5 min. The composition was held at 80% B for 5 min, and then dropped to 2% B over 2 min, then held at 2% B for another 8 min. The Orbitrap Eclipse mass spectrometer was operated in a data-dependent mode automatically switching between the acquisition of a single Orbitrap MS scan (375–1500 m/z and a resolution of 120 k) and 3 s of Orbitrap MS/MS HCD scans of precursors (NCE of 30%, a maximal injection time of 22 ms, and a resolution of 15 k). The Fusion Lumos Mass Spectrometer was operated in a stepped FAIMS data-dependent mode at two different FAIMS CVs − 40 and − 60. For each FAIMS CV a single Orbitrap MS scan (300–1600 m/z and a resolution of 60 k) was acquired every 1.5 s followed by Orbitrap MS/MS HCD scans of precursors (Stepped NCE 28,32,37%, maximal injection time of 100 ms and a resolution of 30 k).

Proteomic data analysis

Identification and LFQ analysis were accomplished using MaxQuant (v2.0.2.0) (Cox and Mann 2008) using the in house generated proteomes of LG01 and LG02 allowing for oxidation on methionine. Prior to MaxQuant analysis datasets acquired on the Fusion Lumos were separated into individual FAIMS fractions using the FAIMS MzXML Generator (Hebert et al. 2018). The LFQ and “Match Between Run” options were enabled to allow comparison between samples. The resulting data files were processed using Perseus (v1.4.0.6) (Tyanova et al. 2016) to compare the growth conditions using Student’s t-tests as well as Pearson correlation analyses.

Bioinformatic analysis

The LGDH sequence from isolated strain LG01 was submitted as query to phmmer web server (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer) (Potter et al. 2018). Standard parameters were used except the cut-off for E-value was set to 10⁻¹³⁰. The resulting sequences were used to build a hidden Markov model (HMM), which was used as query for hmmsearch in hmmer web server (https://www.ebi.ac.uk/Tools/hmmer/search/hmmsearch). The results were filtered and protein sequences with E-value ≤ 10 ⁻²⁰⁰ (see Supplementary file 2) were used to generate a sequence similarity network (SSN) using the Enzyme Function Initiative-Enzyme Similarity Tool (https://efi.igb.illinois.edu/efi-est/) with minimum alignment score 120 (corresponding to ≥ 60% identity) and 170 (corresponding to ≥ 70% identity; see Supplementary file 3) (Zallot et al. 2019). The SSN generated using the alignment score 120 was used to generate genome neighbourhood diagrams (GND) with open reading frame (ORF) ± 10 neighbours using the Enzyme Function Initiative-Genome Neighbourhood Tool (https://efi.igb.illinois.edu/efi-gnt/). A script was used to extract the accession codes of the retrieved neighbours (Fig. S1). The LGDH neighbours were used to generate an SSN of neighbours (SSNN; see Supplementary file 4). The resulting SSNN was plotted using Cytoscape v3.9, with minimum alignment score of 50.

The individual nodes of the LGDH SSN were manually coloured according to the occurrence of lgdB1, lgdB2, lgdB3 and lgdC gene homologues in their neighbourhood (ORF ± 10 neighbours). Initially, the EFI tool was used to generate a cluster number for each unique isofunctional cluster within the LGDH SSNN. The cluster number of the neighbourhood genes from the SSNN for lgdB1, lgdB2, lgdB3 and lgdC were mapped onto the LGDH SSN (alignment score of 170).

Structural analysis of LG degrading proteins

3D structural models were predicted for LgdA, LgdB1, LgdB2, LgdB4, LgdB4 and LgdC proteins using AlphaFold2 (Jumper et al. 2021; Varadi et al. 2022).

Results

Isolation and characterization of bacteria that can grow on LG

Bacteria able to grow on LG were selected using soil samples (collected from the South-East corner of Royal Park, Melbourne) to inoculate a vitamin-enriched minimal media containing LG as sole carbon source. Sequential subculturing into fresh LG-minimal media, followed by repeated plating onto LB agar and then regrowth in LG-minimal media, led to isolation of strains LG01 and LG02 (Table S1). Strains LG01 and LG02 grew robustly on LG with peak growth rates of 0.073 and 0.288 A₆₀₀/min and achieved stationary phase after approximately 82 and 18 h, respectively (Fig. S2). The absorbance at stationary phase for cultures grown on LG were approximately the same for cultures grown on equimolar glucose. ¹³C-NMR analysis of spent culture medium of LG01 and LG02 grown on LG (20 mM) media did not reveal any carbon containing metabolic products.

To investigate the genetic basis for LG consumption by LG01 and LG02, genomic DNA extracted from these bacteria were sequenced using the Illumina NextSeq platform. Table 1 shows the key features of the two draft genomes. Based on 16S rRNA gene sequence analysis the organisms were assigned as Paenarthrobacter nitrojuajacolis LG01 and Paenarthrobacter histidinolovorans LG02.

Identification of genetic loci expressed upon growth on LG

To identify possible proteins involved in LG metabolism we examined changes in the proteome of LG01 and LG02 cultivated on glucose versus LG at mid-log phase. For LG01, the largest and most significant changes we observed for LG growth were an increase in the abundance of 19 proteins encoded in two distinct regions in the genome of LG01 (Figs. 2a, 3a). For LG02, the largest and most significant changes were observed in 12 proteins encoded in three genomic regions (Figs. 2b, 3b). The first genomic region identified in LG02 is situated at the start of a contig, suggesting it is incomplete.

SSN analysis of LG metabolizing genetic loci

The LgdA sequence from P. nitrojuajacolis LG01 was used as a query for a BLAST search on the phmmer webserver, which retrieved 379 sequences with E-value < 10⁻¹³⁰. These were used to build a hidden Markov model (HMM) that was used as a query in the hmmsearch server. This retrieved 631 sequences with E-value < 10⁻¹³⁰, which were filtered to E-value < 10⁻²⁰⁰ (percentage identity > 60%) to give 265 sequences. These sequences were used to build an SSN using the Enzyme Function Initiative tools (https://efi.igb.illinois.edu/). At an alignment score of > 170, this SSN breaks into subclusters that are closely aligned with taxonomy (Fig. 4a). The sequences were used to retrieve the genome neighbourhoods (± 10 open reading frames around lgdA), providing 4877 neighbours (for representative examples, see Fig. 4b). The lgdA neighbours were used to build an SSN of neighbours (SSNN) (Fig. 5, Fig. S3).

Discussion

Bacteria able to grow on LG were selected by sequential subculturing of media containing LG as sole carbon source and inoculated with soil. ¹³C NMR analysis revealed complete consumption of LG and no apparent carbon containing intermediates, supporting their ability to completely consume all the carbon in LG. Genome sequencing and analysis of the 16S rRNA revealed these to be Paenarthrobacter nitrojuajacolis LG01 and Paenarthrobacter histidinolovorans LG02. Comparative proteomics of samples of each bacteria grown on glucose or LG allowed identification of groups of genes that were upregulated in the LG-grown bacteria (Fig. 2). The identified regions within both LG01 (Fig. 3a) and LG02 (Fig. 3b) encode close homologs of P. phenanthenivorans Sphe3 and B. smithii S-270 M LGDH proteins with 96.7% and 70.1% identity (for LG01) and 97.9% and 70.7% (for LG02), respectively. Both organisms also encode homologs of LgdB1 and LgdC, but with lower percentage identity (Tables S3 and S4). Collectively, the presence of genes encoding LgdA, LgdB1 and LgdC provides strong evidence for canonical LGDH pathways. Neither organism contains genes encoding homologs of LgdB2. Given that the genomic region encoding LG metabolism in LG02 appears incomplete, we focussed our attention on LG01.

Previous work demonstrated that B. smithii contains four genes lgdA-lgdB1-lgdB2-lgdC that between them contribute to degradation of LG (Kuritani et al. 2020), whereas Arya and co-workers identified several organisms, such as P. athensensis MEC069 and Microbacterium MEC084 that appeared to contain only lgdA-lgdB1 (Arya et al. 2022). On the other hand, LG01 contains lgdA-lgdB1-lgdC. LgdA and LgdC belong to the Gfo/Idh/MocA superfamily, which contains NAD(P)-dependent oxidoreductases that act on a wide range of hydroxylated substrates (Taberman et al. 2016). LgdB1 and LgdB2 belong to protein family PF01261 within the PFAM database, a large family of proteins that contain diverse activities including inositol catabolism (Fry et al. 2001).

To gain insight into the structure and composition of LgdA-encoding genomic regions, we used SSN analysis. SSN analysis uses all-by-all pairwise sequence alignments to generate networks that show sequence relationships as a graph (Atkinson et al. 2009). We retrieved 265 sequences by an HMM search approach. The SSN was constructed by choosing an alignment score that allowed clustering based on taxonomy. Using the Enzyme Function Initiative tools (https://efi.igb.illinois.edu/), the 265 LgdA sequences were used to retrieve genome neighbourhoods (± 10 ORFs) surrounding each lgdA gene. Examples of genome neighbourhood diagrams for bacteria from different phyla are provided in Fig. 4b. Each of these genomic regions show conservation of lgdB1, but not of lgdB2 or lgdC (vide infra).

The genomic regions surrounding each lgdA gene (± 10 ORFs) were used to construct an SSN of neighbours (SSNN) (Fig. 5, Fig. S3). The SSNN contains isofunctional protein clusters that allow analysis of the common neighbours of lgdA. The SSNN reveals three large protein clusters that contain components of ABC transporters that are presumably dedicated to the importation of LG: ATP binding protein, substrate binding protein (for binding LG), and ABC transporter permease; along with several smaller clusters containing putative multifunction transporters (MFS), which presumably also facilitate binding and import of LG. Multiple clusters are present that correspond to ROK, LytTR, PucR and LacI type transcription factors.

Numerous oxidoreductase clusters are present in the SSNN. Three clusters, including the largest oxidoreductase cluster, belong to PFAM protein families PF01408 and PF02894, and correspond to the nicotinamide binding and C-terminal domains of LgdC/LgdA. The LgdC sequences from B. smithii, and LG01 and LG02 are all located in the largest oxidoreductase cluster. Six clusters belong to PF01261, are annotated as sugar phosphate isomerase or xylose isomerase/epimerase. LgdB1 from B. smithii, LG01 and LG02 are present within the largest of these clusters, which contains proteins of around 275 residues. LgdB2 from B. smithii is located within a smaller cluster, which contains proteins of around 325 residues. The remaining four clusters do not contain any characterized members. The genomes of LG01 and LG02 encode proteins that fall into two of these orphan clusters, which we have named LgdB3 (71 sequences in cluster) and LgdB4 (4 sequences in cluster).

To visualize the organization and composition of the LG degrading gene groups we coloured the SSN of LgdA homologues according to the presence of lgdB1, lgdB2, lgdB3 and lgdC in their genome neighbourhood (Fig. 6a–d). Almost all LgdA homologs (253 out of 265) have LgdB1 encoded in their gene neighbourhood, while only a minority have LgdB2 (57 out of 265) or LgdB3 (71 out of 265) encoded in their vicinity. A Venn diagram reveals that: (1) 126 organisms contain only LgdB1; (2) LgdB2 always co-occurs with LgdB1 but never with LgdB3; and (3) LgdB3, with one exception, co-occurs with LgdB1 (Fig. 6e). LgdB1 is widely distributed, occurring in Actinobacteria, Firmicutes, Proteobacteria, Deinococcus and Bacteroidetes. LgdB2 is more restricted, occurring mainly in Firmicutes and one Deinococcus. LgdB3 occurs Actinobacteria and Proteobacteria. Collectively, these data suggest that LgdB2 and LgdB3 may have overlapping function, but are not essential for LG metabolism. The rarer LgdB4 sugar phosphate isomerase is encoded within just 4 Actinobacteria, and co-occurs with only LgdA and LgdB1.

Approximately half (151 out of 265) of LgdA-containing organisms have a gene for LgdC in their genome neighbourhood. It has been suggested that LgdA may possess dual activity, allowing it to function as an oxidoreductase capable of both oxidizing LG and reducing 3-keto-glucose, which could explain the absence of LgdC in some bacteria (Arya et al. 2022). LgdC is distributed across Actinobacteria, Firmicutes, Proteobacteria and Deinococcus.

Only a single 3D structure has been experimentally determined for an LG-degrading protein in the LGDH pathway, LGDH from P. phenanthrenivorans Sphe3 (PDB 6A3I) (Sugiura et al. 2018). Comparison of this structure with the AlphaFold2 model for LgdA from LG01 shows almost identical 3D fold (RMSD 0.252 Å; Fig. 7a), and the presence of nicotinamide cofactor and LG binding sites (Fig. 7b). A comparison of the AlphaFold2 models for B. smithii and LG01 LgdB1 proteins shows similar 3D folds (RMSD 0.43 Å) (Fig. 7c), suggestive of similar function. The peptide chain of LgdB1 is much shorter than LgdB2, LgdB3 and LgdB4, and the comparisons of AlphaFold models showed low similarity and high RMSD values (Table S4b). Overlay of B. smithii LgdB2, LG01 LgdB3 (RMSD 1.34 Å versus B. smithii LgdB2), and LgdB4 (RMSD 3.29 Å versus B. smithii LgdB2) shows broadly similar (αβ)₈ barrel folds, but weaker structural similarity (Fig. 7d). The weaker structural similarity is difficult to interpret in the absence of structural studies to determine the molecular basis of substrate recognition and catalysis, but these proteins could have a similar function, may act on modified intermediates in the pathway (e.g. 6-phospho-2-hydroxy-3-keto-d-glucal, formed by hexokinase), or may not be involved in the pathway.

A comparison of the AlphaFold2 models of B. smithii and LG01 LgdC reveals similar 3D folds (RMSD 0.64 Å) (Fig. 7e). Overlay of LG01 LgdC and the P. phenanthrenivorans LgdA in complex with NADH gave a model that showed LgdC could accommodate NADH, consistent with its demonstrated function as an NADH-dependent glucose 3-dehydrogenase (Fig. 7f).

Conclusions

Two LG degrading soil bacteria belonging to the Paenarthrobacter genus (strains LG01 and LG02) were isolated by enrichment culture involving growth on LG. These organisms effect the complete degradation of LG, and give a stationary phase optical density on LG that is similar to that on glucose. Comparative proteomics revealed the presence of key genes encoding homologues of proteins involved in LG degradation, namely LGDH (LgdA), 3-keto-LG β-eliminase (LgdB1), and glucose 3-dehydrogenase (LgdC). However, the upregulated proteins included several additional members of the PFAM PF01261 sugar phosphate isomerase/xylose isomerase family that exhibit low homology to LgdB1. Structural modelling shows that these adopt similar folds to LgdB1 and their occurrence within the genomic regions of a wide range of bacteria encoding lgdA-lgdB1-lgdC genes suggest that they may assist in the processing of intermediates in the LG-degradation pathway.

Availability of data and materials

The genome datasets generated and/or analysed during the current study are available in the NCBI database with BioProject accession codes PRJNA925556 (strain LG01) and PRJNA925557 (strain LG02). The mass spectrometry proteomics data have been deposited in the Proteome Xchange Consortium via the PRIDE partner repository with the data set identifier: PXD039086.

Code availability

Not applicable.

Abbreviations

ABC:: ATP-binding cassette
DNA:: Deoxyribonucleic acid
GND:: Genome neighbourhood diagram
HMM:: Hidden Markov Model
LG:: Levoglucosan
LGDH:: Levoglucosan dehydrogenase
NADH:: Reduced nicotinamide adenine dinucleotide
ORF:: Open reading frame
SSN:: Sequence similarity network
SSNN:: Sequence similarity network of neighbours

References

Arya AS, Hang MTH, Eiteman MA (2022) Isolation and characterization of levoglucosan-metabolizing bacteria. Appl Environ Microbiol 88:e0186821. https://doi.org/10.1128/aem.01868-21
Article PubMed Google Scholar
Atkinson HJ, Morris JH, Ferrin TE, Babbitt PC (2009) Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS ONE 4:e4345. https://doi.org/10.1371/journal.pone.0004345
Article CAS PubMed PubMed Central Google Scholar
Bacik JP et al (2015) Producing glucose 6-phosphate from cellulosic biomass: structural insights into levoglucosan bioconversion. J Biol Chem 290:26638–26648. https://doi.org/10.1074/jbc.M115.674614
Article CAS PubMed PubMed Central Google Scholar
Bell A et al (2020) Uncovering a novel molecular mechanism for scavenging sialic acids in bacteria. J Biol Chem 295:13724–13736. https://doi.org/10.1074/jbc.RA120.014454
Article CAS PubMed PubMed Central Google Scholar
Bergauff MA et al (2010) Urinary levoglucosan as a biomarker of wood smoke: results of human exposure studies. J Expo Sci Environ Epidemiol 20:385–392. https://doi.org/10.1038/jes.2009.46
Article CAS PubMed Google Scholar
Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. https://doi.org/10.1038/nbt.1511
Article CAS PubMed Google Scholar
Fry J, Wood M, Poole PS (2001) Investigation of myo-inositol catabolism in Rhizobium leguminosarum bv. viciae and its effect on nodulation competitiveness. Mol Plant Microbe Interact 14:1016–1025. https://doi.org/10.1094/mpmi.2001.14.8.1016
Article CAS PubMed Google Scholar
Hebert AS et al (2018) Comprehensive single-shot proteomics with FAIMS on a hybrid Orbitrap mass spectrometer. Anal Chem 90:9529–9537. https://doi.org/10.1021/acs.analchem.8b02233
Article CAS PubMed PubMed Central Google Scholar
Itabaiana Junior I, AvelardoNascimento M, de Souza ROMA, Dufour A, Wojcieszak R (2020) Levoglucosan: a promising platform molecule? Green Chem 22:5859–5880. https://doi.org/10.1039/D0GC01490G
Article CAS Google Scholar
Iwazaki S, Hirai H, Hamaguchi N, Yoshida N (2018) Isolation of levoglucosan-utilizing thermophilic bacteria. Sci Rep 8:4066. https://doi.org/10.1038/s41598-018-22496-2
Article CAS PubMed PubMed Central Google Scholar
Jumper J et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589. https://doi.org/10.1038/s41586-021-03819-2
Article CAS PubMed PubMed Central Google Scholar
Kitamura Y, Yasui T (1991) Purification and some properties of levoglucosan (1,6-anhydro-β-d-glucopyranose) Kinase from the Yeast Sporobolomyces salmonicolor. Agric Biol Chem 55:523–529. https://doi.org/10.1080/00021369.1991.10870618
Article CAS Google Scholar
Kitamura Y, Abe Y, Yasui T (1991) Metabolism of levoglucosan (1,6-anhydro-α-d-glucopyranose) in microorganisms. Agric Biol Chem 55:515–521. https://doi.org/10.1080/00021369.1991.10870617
Article CAS Google Scholar
Kuritani Y et al (2020) Conversion of levoglucosan into glucose by the coordination of four enzymes through oxidation, elimination, hydration, and reduction. Sci Rep 10:20066. https://doi.org/10.1038/s41598-020-77133-8
Article CAS PubMed PubMed Central Google Scholar
Lakshmanan CM, Gal-or B, Hoelscher HE (1969) Production of levoglucosan by pyrolysis of carbohydrates. Ind Eng Chem Res Dev 8:261–267. https://doi.org/10.1021/i360031a010
Article CAS Google Scholar
Li Y et al (2021) Impacts of chemical degradation on the global budget of atmospheric levoglucosan and its use as a biomass burning tracer. Environ Sci Technol 55:5525–5536. https://doi.org/10.1021/acs.est.0c07313
Article CAS PubMed Google Scholar
Liu QP et al (2007) Bacterial glycosidases for the production of universal red blood cells. Nat Biotechnol 25:454–464
Article CAS PubMed Google Scholar
McGill N, Williams SJ (2008) Anomeric anhydro sugars. In: Fraser-Reid BO et al (eds) Glycoscience: chemistry and chemical biology, 2nd edn. Springer, Heidelberg, Germany, pp 737–753
Chapter Google Scholar
Migliaccio CT, Bergauff MA, Palmer CP, Jessop F, Noonan CW, Ward TJ (2009) Urinary levoglucosan as a biomarker of wood smoke exposure: observations in a mouse model and in children. Environ Health Perspect 117:74–79. https://doi.org/10.1289/ehp.11378
Article CAS PubMed Google Scholar
Nakahara K, Kitamura Y, Yamagishi Y, Shoun H, Yasui T (1994) Levoglucosan dehydrogenase involved in the assimilation of levoglucosan in Arthrobacter sp. I-552. Biosci Biotechnol Biochem 58:2193–2196. https://doi.org/10.1271/bbb.58.2193
Article CAS PubMed Google Scholar
Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD (2018) HMMER web server: 2018 update. Nucleic Acids Res 46:W200–W204. https://doi.org/10.1093/nar/gky448%JNucleicAcidsResearch
Article CAS PubMed PubMed Central Google Scholar
Rappsilber J, Ishihama Y, Mann M (2003) Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal Chem 75:663–670. https://doi.org/10.1021/ac026117i
Article PubMed Google Scholar
Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261
Article CAS PubMed Google Scholar
Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. https://doi.org/10.1093/bioinformatics/btu153
Article CAS PubMed Google Scholar
Simoneit BRT et al (1999) Levoglucosan, a tracer for cellulose in biomass burning and atmospheric particles. Atmos Environ 33:173–182. https://doi.org/10.1016/S1352-2310(98)00145-9
Article CAS Google Scholar
Suciu LG, Masiello CA, Griffin RJ (2019) Anhydrosugars as tracers in the earth system. Biogeochemistry 146:209–256. https://doi.org/10.1007/s10533-019-00622-0
Article CAS Google Scholar
Sugisawa H, Edo H (1966) The thermal degradation of sugars I. Thermal polymerization of glucose. J Food Sci 31:561–565. https://doi.org/10.1111/j.1365-2621.1966.tb01905.x
Article CAS Google Scholar
Sugiura M, Nakahara M, Yamada C, Arakawa T, Kitaoka M, Fushinobu S (2018) Identification, functional characterization, and crystal structure determination of bacterial levoglucosan dehydrogenase. J Biol Chem 293:17375–17386. https://doi.org/10.1074/jbc.RA118.004963
Article CAS PubMed Google Scholar
Taberman H, Parkkinen T, Rouvinen J (2016) Structural and functional features of the NAD(P) dependent Gfo/Idh/MocA protein family oxidoreductases. Protein Sci 25:778–786. https://doi.org/10.1002/pro.2877
Article CAS PubMed PubMed Central Google Scholar
Tyanova S et al (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13:731. https://doi.org/10.1038/nmeth.3901. https://www.nature.com/articles/nmeth.3901#supplementary-information
Varadi M et al (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444. https://doi.org/10.1093/nar/gkab1061
Article CAS PubMed Google Scholar
Yasui T, Kitamura Y, Nakahara K, Abe Y (1991) Metabolism of Levoglucosan (1,6-Anhydro-β-d-glucopyranose) in Bacteria. Agric Biol Chem 55:1927–1929. https://doi.org/10.1080/00021369.1991.10870881
Article CAS Google Scholar
Yip VL, Withers SG (2006) Breakdown of oligosaccharides by the process of elimination. Curr Opin Chem Biol 10:147–155
Article CAS PubMed Google Scholar
Yip VL et al (2004) An unusual mechanism of glycoside hydrolysis involving redox and elimination steps by a family 4 β-glycosidase from Thermotoga maritima. J Am Chem Soc 126:8354–8355. https://doi.org/10.1021/ja047632w
Article CAS PubMed Google Scholar
Yoon SH, Ha SM, Lim J, Kwon S, Chun J (2017) A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek 110:1281–1286. https://doi.org/10.1007/s10482-017-0844-4
Article CAS PubMed Google Scholar
You C, Xu C (2018) Review of levoglucosan in glacier snow and ice studies: Recent progress and future perspectives. Sci Total Environ 616–617:1533–1539. https://doi.org/10.1016/j.scitotenv.2017.10.160
Article CAS PubMed Google Scholar
Zallot R, Oberg N, Gerlt JA (2019) The EFI web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. Biochemistry 58:4169–4182. https://doi.org/10.1021/acs.biochem.9b00735
Article CAS PubMed Google Scholar

Download references

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions. S.J.W is supported by the Australian Research Council (DP210100233, DP210100235). N.E.S is supported by the Australian Research Council (FT200100270, DP210100362). E.D.G-B. acknowledges support from: The Walter and Eliza Hall Institute of Medical Research; National Health and Medical Research Council of Australia (NHMRC) project grant GNT2000517; the Australian Cancer Research Fund; and the Brian M. Davis Charitable Foundation Centenary Fellowship. We thank the Melbourne Mass Spectrometry and Proteomics Facility of The Bio21 Molecular Science and Biotechnology Institute for access to MS instrumentation.

Author information

Authors and Affiliations

School of Chemistry, University of Melbourne, Parkville, VIC, 3010, Australia
Arashdeep Kaur & Spencer J. Williams
Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC, 3010, Australia
Arashdeep Kaur & Spencer J. Williams
Department of Microbiology and Immunology, University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Victoria, 3000, Australia
Nichollas E. Scott, Marion Herisse & Sacha Pidot
Department of Medical Biology, University of Melbourne, Parkville, VIC, 3010, Australia
Ethan D. Goddard-Borger
ACRF Chemical Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3010, Australia
Ethan D. Goddard-Borger

Authors

Arashdeep Kaur
View author publications
You can also search for this author in PubMed Google Scholar
Nichollas E. Scott
View author publications
You can also search for this author in PubMed Google Scholar
Marion Herisse
View author publications
You can also search for this author in PubMed Google Scholar
Ethan D. Goddard-Borger
View author publications
You can also search for this author in PubMed Google Scholar
Sacha Pidot
View author publications
You can also search for this author in PubMed Google Scholar
Spencer J. Williams
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

SJW conceived the study with substantial input from AK. AK performed enrichment culture and microbial growth and characterization. NS conducted proteomics. MH and SP performed genome sequencing, assembly and analysis. AK and EGB performed bioinformatic analysis. AK and SJW wrote the paper with input and approval of the final draft from the authors.

Corresponding author

Correspondence to Spencer J. Williams.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Additional information

Communicated by Yusuf Akhter.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 6072 KB)

Supplementary file2 (XLSX 34 KB)

Supplementary file3 (CYS 966 KB)

Supplementary file4 (CYS 6725 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kaur, A., Scott, N.E., Herisse, M. et al. Identification of levoglucosan degradation pathways in bacteria and sequence similarity network analysis. Arch Microbiol 205, 155 (2023). https://doi.org/10.1007/s00203-023-03506-y

Download citation

Received: 30 January 2023
Revised: 20 March 2023
Accepted: 22 March 2023
Published: 31 March 2023
DOI: https://doi.org/10.1007/s00203-023-03506-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Identification of levoglucosan degradation pathways in bacteria and sequence similarity network analysis

Abstract

Similar content being viewed by others

Comparative genomics and metabolic profiling of the genus Lysobacter

Metagenomics study to compare the taxonomic composition and metabolism of a lignocellulolytic microbial consortium cultured in different carbon conditions

Complete Genome Sequence of Pedobacter sp. PAMC26386 and Their Low Temperature Application in Arabinose-containing Polysaccharides Degradation

Introduction

Materials and methods

Bacterial growth media

Isolation of levoglucosan degrading species

Phenotypic assays

Genome sequence, assembly, and annotation

Sample preparation for proteomic analysis

Reverse phase liquid chromatography–mass spectrometry for proteomics

Proteomic data analysis

Bioinformatic analysis

Structural analysis of LG degrading proteins

Results

Isolation and characterization of bacteria that can grow on LG

Identification of genetic loci expressed upon growth on LG

SSN analysis of LG metabolizing genetic loci

Discussion

Conclusions

Availability of data and materials

Code availability

Abbreviations

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethics approval

Consent to participate

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 6072 KB)

Supplementary file2 (XLSX 34 KB)

Supplementary file3 (CYS 966 KB)

Supplementary file4 (CYS 6725 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation