Background

Gene transcription by RNA polymerase II (Pol II) is a fundamental process in eukaryotic cells. The precise regulation of transcription is necessary for cellular growth and development and requires the coordinated activity of numerous proteins and protein complexes [1, 2]. For example, Pol II alone cannot initiate transcription but must cooperate with a large set of general transcription factors (GTFs), including TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, to perform this essential function [2, 3]. In addition to GTFs and other basal factors, sequence-specific transcription factors, transcriptional coactivators, chromatin modifiers, and chromatin remodeling factors have critical roles in regulating transcription in response to developmental or environmental cues [1, 4,5,6]. Despite considerable progress in understanding the molecular basis of transcriptional regulation, many questions remain unanswered. The budding yeast Saccharomyces cerevisiae remains an excellent system for addressing unresolved questions in transcription, as the underlying biology is broadly conserved across eukaryotes.

Defining the direct effects of transcriptional regulators is critical for elucidating their function [7, 8]. Rapid and specific degradation of target proteins is a powerful approach for identifying the direct targets of transcriptional regulators. The auxin-inducible degron (AID) system is one method for rapid protein degradation widely used to study transcriptional regulation in eukaryotes, including S. cerevisiae [8,9,10,11,12,13]. Derived from a plant-specific module of the conserved eukaryotic SKP1-CUL1-F-box (SCF) E3 ubiquitin ligase complex, the AID system requires the expression of the plant F-box protein TIR1 and the genetic fusion of an AID tag to a target protein to function outside of plants [11]. The addition of the plant hormone auxin promotes the association of TIR1 with the AID tag, recruitment of endogenous SCF complex components, and degradation of the target protein via the 26S proteasome. Unlike most classical genetic approaches, the AID system can be used to study essential genes or synthetic lethal interactions. We and others have used the AID system to achieve specific degradation of transcription factors, transcriptional coactivators, chromatin modifiers, and chromatin remodeling factors in S. cerevisiae [10, 12,13,14,15,16].

RNA synthesized immediately following a perturbation, such as rapid protein degradation, more accurately reflects the direct effects of the target factor than steady-state RNA [7, 8, 17]. Several sequencing-based methods selectively quantify nascent or newly synthesized RNA (reviewed in [18, 19]). These methods use various strategies to track nascent or newly synthesized RNA, including the isolation of RNA associated with chromatin [20, 21], Pol II (e.g., NET-seq and mNET-seq) [22, 23], or transcriptionally competent Pol II (e.g., PRO-seq and ChRO-seq) [24, 25], or the isolation of metabolically labeled RNA (e.g., 4sU-seq, cDTA-seq, and TT-seq) [26,27,28]. We and others have used 4sU-seq and the equivalent method 4tU-seq to quantify newly synthesized RNA following targeted protein depletion in S. cerevisiae [10, 14, 29,30,31]. 4sU-seq and 4tU-seq involve labeling newly synthesized RNA with 4-thiouridine (4sU) or 4-thiouracil (4tU), followed by biochemical purification of labeled RNA [10, 26, 32, 33]. Alternatively, methods such as SLAM-seq, TUC-seq, and TimeLapse-seq avoid biochemical purification by directly quantifying 4sU- or 4tU-labeled RNA following chemical recoding [7, 34,35,36,37]. In SLAM-seq, total RNA is treated with the thiol-reactive alkylating reagent iodoacetamide, which alkylates 4sU or 4tU incorporated into RNA; during reverse transcription, the alkylated base pairs with guanine rather than adenine. Thus, thymine-to-cytosine (T > C) conversions define newly synthesized RNA and are quantified using dedicated data analysis software (e.g., SLAM-DUNK or GRAND-SLAM) or alternative approaches [38,39,40,41]. SLAM-seq has several advantages compared to 4tU-seq, including increased reproducibility, because purification of labeled RNA is not required.
Additionally, since SLAM-seq captures both labeled and unlabeled RNA, read counts derived from newly transcribed RNA can be normalized by total read counts (i.e., the sum of reads derived from labeled and unlabeled RNA) under the assumption that steady-state RNA levels do not change considerably during a given perturbation, as is frequently the case. Thus, SLAM-seq can circumvent the need for external spike-in normalization even when global changes in transcription are observed [8, 42, 43]. Newly synthesized RNA represents only a small fraction of the total RNA pool. As such, a potential limitation of SLAM-seq compared to 4tU-seq is reduced detection of very low-abundance transcripts, although this limitation can be partially overcome by increasing sequencing depth. SLAM-seq and a version of cDTA-seq using iodoacetamide-mediated alkylation of labeled RNA have been applied to S. cerevisiae, largely to study the synthesis and decay of RNA [16, 38, 44].
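The T > C readout described above can be illustrated with a minimal sketch that counts T > C mismatches between a reference segment and an aligned read and then applies a conversion-per-read threshold. This is not how SLAM-DUNK is implemented; it assumes a gapless alignment and both sequences given on the transcript (sense) strand. When reads from minus-strand genes are compared in genome coordinates, the same event appears as an A > G mismatch instead.

```python
def count_t_to_c(reference: str, read: str) -> int:
    """Count positions where the reference base is T and the read base is C.

    Assumes a gapless alignment with both sequences on the sense strand.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length (gapless alignment)")
    return sum(
        1
        for ref_base, read_base in zip(reference.upper(), read.upper())
        if ref_base == "T" and read_base == "C"
    )


def is_newly_synthesized(reference: str, read: str, min_conversions: int = 2) -> bool:
    """Call a read 'new' if it carries at least min_conversions T > C conversions."""
    return count_t_to_c(reference, read) >= min_conversions


if __name__ == "__main__":
    ref = "ATGTTCGATTAC"
    read = "ATGCTCGACTAC"  # T > C at two positions
    print(count_t_to_c(ref, read), is_newly_synthesized(ref, read))  # 2 True
```

Requiring at least two conversions per read, rather than one, trades some sensitivity for a lower rate of false "new" calls caused by sequencing errors or polymorphisms.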

Here, we provide an end-to-end workflow for rapidly degrading a target protein using the AID system and quantifying newly synthesized mRNA using SLAM-seq in S. cerevisiae. We include methods for targeted protein degradation, 4tU incorporation, rapid methanol fixation, RNA purification, RNA alkylation, 3′ mRNA-seq library construction, and data analysis. Although the individual methods described here are not novel per se, this workflow provides the first complete resource for turnkey implementation of these methods, which will benefit others working with S. cerevisiae. Additionally, the modular structure allows individual methods to be replaced or removed based on user preference, alleviating some limitations to implementing the workflow. Lastly, this workflow is readily adaptable to other systems, including industrial, pathogenic, or other model fungi, and will benefit the larger research community.

Materials and procedure

The protocol described in this peer-reviewed article is published on protocols.io, updated May 31, 2024, https://doi.org/10.17504/protocols.io.n2bvj3dj5lk5/v4. Additional experimental details critical for generating results presented in this article are described below.

Strains and culturing conditions

Saccharomyces cerevisiae and Schizosaccharomyces pombe strains used in this study are listed in Table 1. Strain RDY234 was engineered using standard methods and validated using PCR and western blotting (data not shown). Saccharomyces cerevisiae was grown in 40 ml of YPD medium (10 g/l yeast extract, 20 g/l peptone, 20 g/l glucose, 20 mg/l adenine sulfate) at 30 °C with shaking at 220 rpm. Cultures were grown to an OD600 of ~ 0.45 and split into three 10 ml cultures. Following brief equilibration, cultures were treated with 1 mM 3-indoleacetic acid (IAA) freshly prepared in dimethyl sulfoxide (DMSO) or an equivalent volume of DMSO, vigorously mixed, and incubated for 25 min at 30 °C with shaking at 220 rpm. Immediately after the 25 min IAA treatment, cultures were treated with 5 mM 4tU freshly prepared in DMSO or an equivalent volume of DMSO, vigorously mixed, and incubated for 4 min at 30 °C with shaking at 220 rpm. Immediately after 4tU treatment, each culture was decanted into 5 ml of cold 100% methanol on dry ice. The resulting slurry was gently swirled to homogenize it and kept on dry ice. An aliquot of the slurry was collected for cell counting and western blotting. The remaining slurry was centrifuged at 3,000 × g and 4 °C for 10 min to pellet the cells. Cells were resuspended in DNA/RNA Shield (Zymo Research, Irvine, CA, USA) by pipetting, flash frozen on dry ice, and stored at –80 °C. The final OD600 of S. cerevisiae cultures was ~ 0.7. Schizosaccharomyces pombe was grown in YE medium (5 g/l yeast extract, 30 g/l glucose) at 30 °C with shaking at 220 rpm to a final OD600 of ~ 1.0. A strategy similar to that described for S. cerevisiae was used for 4tU labeling and rapid fixation of S. pombe cultures. For whole-cell spike-in, S. pombe cells were combined with S. cerevisiae cells at a mass-to-mass ratio of 1:19 based on OD600. Three biological replicates were collected in all experiments.
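The 1:19 spike-in ratio "based on OD600" reduces to simple arithmetic if OD600 × volume (OD units) is taken as the proxy for cell mass. The hypothetical helper below (not part of the published protocol) returns the volume of S. pombe culture to add; it assumes both OD600 readings come from unfixed cultures measured in the linear range of the spectrophotometer, and the example inputs are illustrative only.

```python
def spike_in_volume_ml(
    sc_od600: float,
    sc_volume_ml: float,
    sp_od600: float,
    sp_to_sc_ratio: float = 1 / 19,
) -> float:
    """Volume (ml) of S. pombe culture to combine with an S. cerevisiae sample.

    OD600 x volume ("OD units") is used as the proxy for cell mass; the default
    ratio is 1 part S. pombe to 19 parts S. cerevisiae.
    """
    if min(sc_od600, sc_volume_ml, sp_od600, sp_to_sc_ratio) <= 0:
        raise ValueError("OD600 values, volume, and ratio must be positive")
    sc_od_units = sc_od600 * sc_volume_ml
    sp_od_units = sc_od_units * sp_to_sc_ratio
    return sp_od_units / sp_od600


if __name__ == "__main__":
    # Hypothetical inputs: 10 ml of S. cerevisiae at OD600 0.7, S. pombe at OD600 1.0
    vol = spike_in_volume_ml(0.7, 10.0, 1.0)
    print(f"{vol:.3f} ml S. pombe")  # 0.368 ml
```

At a 1:19 ratio, S. pombe contributes 1/20 (5%) of the combined OD units.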

Table 1 Saccharomyces cerevisiae and Schizosaccharomyces pombe strains used in this study

RNA alkylation and 3′ mRNA-sequencing library construction

RNA alkylation was performed as previously described [7, 16] with minor modifications. A total of 5 μg of total RNA in 20 μl of nuclease-free water was used for alkylation. 3′ mRNA-sequencing library construction is based on previously described methods with modifications [45, 46]. A total of 200 ng of alkylated RNA in 5 μl of nuclease-free water was used for library construction. The remaining steps of RNA alkylation and library construction are provided in detail in the full protocol. Libraries were sequenced on a NovaSeq 6000 (Illumina, San Diego, CA, USA) at the Oklahoma Medical Research Foundation (OMRF) Clinical Genomics Center (Oklahoma City, OK, USA).
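The fixed input amounts (5 μg of total RNA in 20 μl for alkylation; 200 ng of alkylated RNA in 5 μl for library construction) imply minimum stock concentrations of 250 ng/μl and 40 ng/μl, respectively. The hypothetical helper below is a convenience sketch, not part of the protocol; the stock concentrations in the example are made up.

```python
def dilution_plan(stock_ng_per_ul: float, target_ng: float, final_ul: float) -> tuple[float, float]:
    """Return (RNA volume, water volume) in ul to place target_ng of RNA in final_ul."""
    if stock_ng_per_ul <= 0 or target_ng <= 0 or final_ul <= 0:
        raise ValueError("inputs must be positive")
    rna_ul = target_ng / stock_ng_per_ul
    if rna_ul > final_ul:
        # The stock cannot supply target_ng within final_ul; report the minimum concentration.
        min_conc = target_ng / final_ul
        raise ValueError(f"stock too dilute: need >= {min_conc:g} ng/ul")
    return rna_ul, final_ul - rna_ul


if __name__ == "__main__":
    # Alkylation input: 5 ug (5000 ng) in 20 ul, from a hypothetical 500 ng/ul stock
    print(dilution_plan(500, 5000, 20))  # (10.0, 10.0)
    # Library input: 200 ng in 5 ul, from a hypothetical 100 ng/ul alkylated RNA stock
    print(dilution_plan(100, 200, 5))  # (2.0, 3.0)
```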

Data analysis

Demultiplexed, paired-end 150 bp reads were preprocessed using fastp (version 0.23.2) [47] to extract unique molecular identifiers (UMIs) and bbduk from the BBMap package (version 39.06) (https://sourceforge.net/projects/bbmap/) to trim adapter sequences and polyA tails. The quality of raw and preprocessed data was assessed using FastQC (version 0.12.1) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and compiled using MultiQC (version 1.21) [48]. When required, read depth was adjusted by random subsampling using SeqKit2 (version 2.0.1) [49, 50]. Forward reads were processed using SLAM-DUNK (version 0.4.3) [40]. Reads were aligned against the S. cerevisiae genome (version R64-3-1) or the S. pombe genome (version ASM294v2) using NextGenMap (version 0.5.5) [51], and single-nucleotide polymorphisms (SNPs) were called using VarScan 2 (version 2.4.5) [52] within SLAM-DUNK. Total read counts were defined as all reads remaining after alignment filtering and recovery of multimapping reads in SLAM-DUNK (Additional file 1B). Unless otherwise indicated, a conversion per read threshold of ≥ 2 T > C conversions was used to define T > C read counts (Additional file 1C). Data analysis is described in detail in the full protocol.
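The two count types defined above can be sketched as a simple aggregation: every read assigned to a gene adds to its total read count, and only reads with ≥ 2 T > C conversions add to its T > C read count. SLAM-DUNK performs this step internally; the function below is a hypothetical illustration of the definitions, assuming per-read (gene, conversion number) pairs are already available.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gene_counts(
    reads: Iterable[Tuple[str, int]], min_tc: int = 2
) -> Dict[str, Dict[str, int]]:
    """Aggregate (gene_id, n_tc_conversions) pairs into per-gene read counts.

    'total' counts every read assigned to the gene; 'tc' counts only reads with
    at least min_tc T > C conversions; 'non_tc' is the remainder.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"total": 0, "tc": 0})
    for gene_id, n_tc in reads:
        counts[gene_id]["total"] += 1
        if n_tc >= min_tc:
            counts[gene_id]["tc"] += 1
    return {
        gene: {**c, "non_tc": c["total"] - c["tc"]} for gene, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical per-read assignments: (gene, number of T > C conversions)
    reads = [("GENE_A", 0), ("GENE_A", 1), ("GENE_A", 2), ("GENE_B", 3)]
    print(gene_counts(reads))
```

Note that a read with a single conversion still contributes to the total (steady-state) count but not to the T > C count.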

To define counting windows in SLAM-DUNK, custom BED files were created for both S. cerevisiae and S. pombe. Open reading frames (ORFs) were extracted from the S. cerevisiae annotation (version R64-3-1) or the S. pombe annotation (version ASM294v2) using BEDOPS (version 2.4.3). Preliminary analyses demonstrated that reads aligned to both gene bodies and 3′ untranslated regions (UTRs) (data not shown). Therefore, all ORFs were extended 250 bp beyond their stop position to include 3′ UTRs using SAMtools (version 1.14) and BEDTools (version 2.30.0). Custom BED files were deposited in the Zenodo repository (https://doi.org/10.5281/zenodo.10714018). Chromosome names were modified in the S. cerevisiae reference files (i.e., genome assembly and BED file) to reflect standard nomenclature [53].
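Extending an ORF "beyond its stop position" is strand-dependent: plus-strand ORFs grow at their end coordinate, minus-strand ORFs at their start coordinate, and both must be clipped to the chromosome. The sketch below shows that logic on BED6 lines in plain Python; it is an illustration only, not the SAMtools/BEDTools commands used in the study, and the chromosome size and ORF names in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def extend_downstream(
    start: int, end: int, strand: str, chrom_len: int, extension: int = 250
) -> Tuple[int, int]:
    """Extend a 0-based, half-open BED interval past the stop codon, clipped to the chromosome."""
    if strand == "+":
        return start, min(end + extension, chrom_len)
    if strand == "-":
        return max(start - extension, 0), end
    raise ValueError(f"unknown strand: {strand!r}")


def extend_bed(
    lines: Iterable[str], chrom_sizes: Dict[str, int], extension: int = 250
) -> List[str]:
    """Apply extend_downstream to BED6 lines (chrom, start, end, name, score, strand)."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        new_start, new_end = extend_downstream(start, end, strand, chrom_sizes[chrom], extension)
        fields[1], fields[2] = str(new_start), str(new_end)
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    sizes = {"chrI": 230218}  # example chromosome length
    bed = ["chrI\t1000\t2000\tORF_A\t0\t+", "chrI\t100\t900\tORF_B\t0\t-"]
    for row in extend_bed(bed, sizes):
        print(row)
```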

Spike-in normalization factors were calculated as the sum of all S. pombe read counts (T > C read counts and non-T > C read counts) divided by 1 × 10⁵ (Additional file 1D). The factor 1 × 10⁵ was arbitrarily chosen to ensure spike-in normalization factors were between one and ten. Total counts normalization factors were calculated as the sum of all S. cerevisiae read counts (T > C read counts and non-T > C read counts) divided by 1 × 10⁶ (Additional file 1D). The factor 1 × 10⁶ was arbitrarily chosen to ensure total counts normalization factors were between one and ten. Normalization and differential gene expression analysis were performed on raw data (total read counts or T > C read counts) using DESeq2 (version 1.38.3) [54]. Adjusted p-values were calculated with independent hypothesis weighting using IHW (version 1.26.0) [55] within DESeq2. Genes classified as dubious ORFs or pseudogenes, or that did not meet DESeq2 criteria for fold change and/or p-value calculation, were excluded from subsequent analyses. The remaining genes were filtered against 5313 genes previously found to be reliably detected under experimental conditions comparable to those used in this study [10], leaving 5311 genes. To compare SLAM-seq and 4tU-seq data, genes were further filtered against 4883 genes previously found to provide high reproducibility and low information loss in 4tU-seq experiments with strain RDY73 [10], leaving 4882 genes. For strict filtering, dubious ORFs and pseudogenes were removed, and the remaining genes were required to have ≥ 1 T > C read count across all samples. Data processing and plotting were performed using R (version 4.2.3) (https://www.R-project.org/) and RStudio (version 2023.12.0 + 369) (http://www.rstudio.com/).
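The normalization factors defined above amount to one sum and one division per sample. The sketch below computes them from per-gene read counts and divides counts by them, which is how DESeq2 applies size factors to produce normalized counts. It is an illustration in Python rather than the R code used in the study; sample names, gene names, and counts are hypothetical.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # sample -> gene -> read count


def normalization_factors(counts: Counts, divisor: float) -> Dict[str, float]:
    """Per-sample factor = sum of all read counts (T > C + non-T > C) / divisor."""
    return {sample: sum(genes.values()) / divisor for sample, genes in counts.items()}


def normalize(counts: Counts, factors: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Divide each sample's counts by its factor (as DESeq2 does with size factors)."""
    return {
        sample: {gene: n / factors[sample] for gene, n in genes.items()}
        for sample, genes in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical S. pombe total read counts (T > C + non-T > C) per sample
    sp = {
        "ctrl_1": {"SP_GENE_1": 150_000, "SP_GENE_2": 100_000},
        "iaa_1": {"SP_GENE_1": 240_000, "SP_GENE_2": 160_000},
    }
    factors = normalization_factors(sp, 1e5)  # spike-in factors: 2.5 and 4.0
    # Hypothetical S. cerevisiae T > C read counts for one gene
    tc = {"ctrl_1": {"GENE_A": 500}, "iaa_1": {"GENE_A": 200}}
    print(factors, normalize(tc, factors))
```

Total counts normalization factors follow from the same functions applied to S. cerevisiae read counts with a divisor of 1 × 10⁶. If such factors are supplied to DESeq2 (for example via its `sizeFactors()` replacement function), the choice of divisor rescales the normalized counts but cancels in log2 fold changes, because fold changes depend only on the ratios of factors between samples.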

General notes and troubleshooting

In the protocol described here, newly transcribed RNA is metabolically labeled with 4tU. Both S. cerevisiae and S. pombe readily import 4tU without genetic manipulation or specific media requirements [56,57,58]. Although not a requirement, previous studies suggest that in certain contexts the expression of an additional copy of endogenous uridine or uracil permeases (Fui1 and Fur4, respectively) or depletion of uracil in growth media can improve 4tU incorporation in S. cerevisiae [33]. Previous studies also suggest that brief exposure to 4tU at low concentrations, such as the conditions used in this protocol, has minimal effects on the growth of S. cerevisiae [56, 57], although high concentrations or prolonged exposure can be detrimental. Similarly, the auxin IAA is used in the described protocol to induce protein depletion via an AID system. While IAA is known to affect aspects of S. cerevisiae biology [59, 60], relatively short exposure to IAA at low concentrations is generally considered to have minimal effects. As previously suggested, the optimal conditions for IAA treatment (e.g., time, concentration) should be empirically determined to minimize off-target effects [61]. It should also be noted that low levels of depletion in the absence of exogenous IAA have been reported for some target proteins [62, 63]. Modified AID systems, such as the β-est AID, super-sensitive AID, or AID2 systems [62,63,64], may reduce off-target effects and, as more data emerge on the use of these systems in S. cerevisiae, may improve this protocol.

A key step in SLAM-seq is the alkylation of a uracil analog (i.e., 4sU or 4tU) incorporated into RNA using iodoacetamide [7, 44]. We note that iodoacetamide is unstable and light sensitive. Therefore, iodoacetamide should be prepared immediately before use and protected from light [7]. Additionally, alkylation reactions should be performed in the dark and, when possible, simultaneously on all samples within an experiment. We and others [8] show that a conversion per read threshold of ≥ 2 T > C conversions effectively reduces background signal. However, we recommend that initial SLAM-seq experiments include an untreated (i.e., –4tU) control to assess spurious nucleotide conversions and polymorphisms between experimental and reference strains. SLAM-seq quantifies both steady-state (non-T > C read counts) and newly transcribed RNA (T > C read counts) simultaneously [7, 40]. Newly transcribed RNA is a small fraction of the total. Similar to what others have demonstrated [8, 38], we present evidence that SLAM-seq data can be normalized using exogenous whole-cell spike-in or total read counts (i.e., the sum of T > C read counts and non-T > C read counts). However, the proposed total read counts normalization strategy is specific to data derived from SLAM-seq and related methods (e.g., TimeLapse-seq). Additionally, although this strategy works well for our example case (Bdf1/2 depletion), it does not supplant spike-in normalization, and we recommend initially testing total counts normalization in parallel with spike-in normalization for different target proteins and/or experimental conditions.

Methods that use a template switching oligo (TSO) for constructing sequencing libraries, such as the one described here, often generate forward reads that start with three low-diversity bases. It is therefore necessary to increase the diversity of the library pool when sequencing on an Illumina instrument. A final library pool should contain 10–15% of libraries constructed using standard approaches [65] or a control library (e.g., PhiX Control v3; Illumina) (see Veeranagouda et al. [45] for additional details). Additionally, the method for constructing mRNA-seq libraries described here supports barcoding in the forward and reverse reads, although, as in many 3′ mRNA-seq methods (e.g., QuantSeq FWD; Lexogen, Greenland, NH, USA), the reverse read is generally of low quality and provides little information beyond barcoding. In our experience, paired-end sequencing is more cost-effective than single-end sequencing at many service providers. We also note that the number of recovered T > C read counts depends on read length; for example, more T > C read counts per gene are recovered from 150 bp reads than from 100 bp reads at an equivalent read depth (data not shown). Therefore, together with the data presented here, we recommend sequencing 5 to 10 million paired-end 150 bp reads per sample, depending on the goals of an experiment.
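The diversity spike and the recommended depth together determine how many samples fit in one sequencing run. The sketch below does that arithmetic with integers to avoid floating-point rounding; the run output is a hypothetical placeholder, since actual output depends on the instrument, flow cell, and provider.

```python
def samples_per_run(run_reads: int, reads_per_sample: int, control_pct: int = 15) -> int:
    """Number of samples that fit on a run after reserving control_pct percent for diversity."""
    if not 0 <= control_pct < 100:
        raise ValueError("control_pct must be in [0, 100)")
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    usable_reads = run_reads * (100 - control_pct) // 100  # integer arithmetic avoids float rounding
    return usable_reads // reads_per_sample


if __name__ == "__main__":
    run = 800_000_000  # hypothetical number of read pairs available for the pool
    for depth in (5_000_000, 10_000_000):
        print(depth, samples_per_run(run, depth))
```

Reserving the upper end of the 10–15% range is the conservative choice; it costs a few samples per run but guards against low-diversity cycles at the start of the forward read.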

Lastly, we anticipate this workflow will be readily adapted to other systems. Although we acknowledge that some optimization will be necessary depending on the system, our data from spike-in, 4tU-labeled S. pombe cells suggest this workflow can be directly applied to S. pombe. Additionally, using parts of the workflow without modification, we have successfully prepared alkylated RNA and 3′ mRNA-seq libraries using 4sU-labeled total RNA derived from human cell lines (data not shown).

Expected results

Applying the SLAM-seq workflow to study transcriptional regulation in Saccharomyces cerevisiae

To illustrate expected results, we performed experiments with two S. cerevisiae strains (RDY73 and RDY234) related to our studies on BET proteins. BET (bromodomain and extra-terminal domain) proteins are conserved chromatin readers characterized by two bromodomains and an extra-terminal (ET) domain [66]. BET proteins have an integral role in transcriptional regulation, although the mechanisms by which they regulate transcription are not well understood [10, 66,67,68]. There are two BET proteins (Bdf1 and Bdf2, referred to here as Bdf1/2) in S. cerevisiae, and deletion of both BDF1 and BDF2 is lethal [69]. Although Bdf1 and Bdf2 have overlapping functions, the depletion of Bdf1 has a greater impact on global transcription [10]. Strain RDY73 carries a codon-optimized Oryza sativa TIR1 gene (OsTIR1) [11] and mAID-tagged Bdf1/2 and was previously used to study changes in transcription after depletion of Bdf1/2 [10]. The mAID (miniAID) is a codon-optimized minimal region (amino acids 65–132) of the Arabidopsis thaliana IAA17 protein [70, 71]. To obtain strain RDY234, an additional, unmodified copy of BDF1 driven by its endogenous promoter was integrated at the TRP1 locus in strain RDY73.

In this study, cells of strain RDY73 or RDY234 in the logarithmic phase were treated with the auxin 3-indoleacetic acid (IAA) for 25 min to deplete Bdf1/2 then treated with 4-thiouracil (4tU) for 4 min to label newly synthesized RNA. Immediately after 4tU labeling, cells were rapidly fixed in cold methanol on dry ice as described [33, 72,73,74]. It was previously demonstrated that within this timeframe IAA treatment resulted in more than a 90% reduction in the level of Bdf1 and an approximately 85% reduction in the level of Bdf2 [10]. We observed a similar depletion of Bdf1/2 by western blotting (data not shown). Subsequently, RNA was purified, alkylated, reverse transcribed, and sequenced. The resulting data was processed using SLAM-DUNK. All experiments were performed in biological triplicate.

To assess how well our workflow quantifies 4tU-labeled transcripts, we first calculated percent conversion rates in reads mapped to defined windows. We observed a strong and specific accumulation of T > C conversions in 4tU-treated cells compared to untreated cells (Fig. 1A-B). Second, we calculated the percentage of read counts with zero, one, or ≥ 2 T > C conversions mapped to defined windows. Most reads had zero T > C conversions in both 4tU-treated and untreated cells. Although some reads with one T > C conversion were observed in untreated cells, 4tU-treated cells had a much higher percentage of reads with one or ≥ 2 T > C conversions (Fig. 1C-D). Third, we calculated the percent background signal in reads with ≥ 1 or ≥ 2 T > C conversions mapped to defined windows and observed a substantially lower percent background signal in reads with ≥ 2 T > C conversions than in reads with ≥ 1 T > C conversion (Fig. 1E). Together, these results illustrate that our workflow can selectively quantify 4tU-labeled transcripts and that counting reads with ≥ 2 T > C conversions effectively reduces background signal.
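The first two quality metrics above can be sketched in a few lines. Here, the percent conversion rate is assumed to be T > C conversions divided by the number of reference T positions covered by reads (a common definition, assumed rather than taken from the text), and the read bins are the percentages of reads carrying 0, 1, or ≥ 2 conversions. The percent background signal is not reproduced because its exact definition is given in the full protocol rather than here. All example values are hypothetical.

```python
from typing import Dict, Sequence


def conversion_rate_percent(tc_conversions: int, t_positions_covered: int) -> float:
    """Percent of covered reference T positions read as C (assumed definition)."""
    if t_positions_covered == 0:
        return 0.0
    return 100 * tc_conversions / t_positions_covered


def read_bins_percent(tc_per_read: Sequence[int]) -> Dict[str, float]:
    """Percent of reads carrying 0, 1, or >= 2 T > C conversions."""
    n = len(tc_per_read)
    if n == 0:
        raise ValueError("no reads")
    zero = sum(1 for k in tc_per_read if k == 0)
    one = sum(1 for k in tc_per_read if k == 1)
    return {"0": 100 * zero / n, "1": 100 * one / n, ">=2": 100 * (n - zero - one) / n}


if __name__ == "__main__":
    # Hypothetical window: 30 T > C conversions over 6000 covered T positions
    print(conversion_rate_percent(30, 6000))  # 0.5
    # Hypothetical reads: eight unconverted, one with a single conversion, one with three
    print(read_bins_percent([0] * 8 + [1, 3]))
```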

Fig. 1

Application of SLAM-seq to Saccharomyces cerevisiae. A Boxplots comparing percent conversion rates in reads mapped to defined windows. SLAM-seq data was derived from strain RDY73 treated with (+) or without (−) 4-thiouracil (4tU) for 4 min. Mean conversion rates for three biological replicates across the indicated number of transcripts (n = 5311) are presented as Tukey boxplots. Outliers are not shown. B Boxplots comparing percent conversion rates in reads mapped to defined windows. SLAM-seq data was derived from strain RDY234 treated with (+) or without (−) 4tU for 4 min. Mean conversion rates for three biological replicates across the indicated number of transcripts (n = 5311) are presented as Tukey boxplots. Outliers are not shown. C Barplots comparing percent read counts with 0, 1, or ≥ 2 T > C conversions mapped to defined windows. SLAM-seq data was derived from strain RDY73 treated with (+) or without (−) 4tU for 4 min. Mean values for biological replicates (n = 3) are plotted. Error bars represent the standard deviation of the mean. The y-axis begins at 70 percent. D Barplots comparing percent read counts with 0, 1, or ≥ 2 T > C conversions mapped to defined windows. SLAM-seq data was derived from strain RDY234 treated with (+) or without (−) 4tU for 4 min. Mean values for biological replicates (n = 3) are plotted. Error bars represent the standard deviation of the mean. The y-axis begins at 70 percent. E Boxplots comparing percent background signal in reads with ≥ 1 or ≥ 2 T > C conversions mapped to defined windows. SLAM-seq data was derived from strains RDY73 or RDY234 treated with (+) or without (−) 4tU for 4 min. Mean background signals for three biological replicates across the indicated number of transcripts (n = 5311) are presented as Tukey boxplots. Outliers are not shown.

SLAM-seq detects global transcriptional changes associated with rapid depletion of Bdf1/2 and is robust to normalization using whole-cell spike-in or total read counts

Normalization by whole-cell spike-in

In SLAM-seq experiments, we performed whole-cell spike-in normalization using 4tU-labeled S. pombe cells as previously described [10, 14]. Here, we calculated spike-in normalization factors based on S. pombe total read counts. We then used the spike-in normalization factors to normalize (1) total read counts or (2) T > C read counts in DESeq2 (Additional file 1E-G). Interestingly, we observed a global loss of transcription following Bdf1/2 depletion for both spike-in normalized total read counts and spike-in normalized T > C read counts derived from strain RDY73 (Fig. 2A-B). These results indicate that substantial changes in transcription are already apparent in steady-state RNA 30 min after inducing Bdf1/2 depletion. However, the magnitude of the transcriptional changes is much more dramatic in newly synthesized RNA compared to steady-state RNA (Fig. 2C). We did not observe global changes in transcription for either spike-in normalized total read counts or spike-in normalized T > C read counts derived from strain RDY234 (Fig. 2D-F), indicating that ectopically expressed BDF1 largely complements depletion of endogenous Bdf1/2. Together, these results illustrate the importance of measuring newly synthesized RNA to study the direct effects of a perturbation such as rapid protein depletion.

Fig. 2

SLAM-seq detects global transcriptional changes associated with rapid depletion of Bdf1/2. A Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion (strain RDY73) across the indicated number of transcripts (n = 5311). Total read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. B Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion (strain RDY73) across the indicated number of transcripts (n = 5311). T > C read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. C Boxplots comparing log2 fold changes in transcription following Bdf1/2 depletion (strain RDY73) normalized using exogenous whole cell spike-in. Mean changes in transcription for three biological replicates across 5311 transcripts are presented as Tukey boxplots. D Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion in a strain ectopically expressing BDF1 (RDY234) across the indicated number of transcripts (n = 5311). Total read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. E Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion in a strain ectopically expressing BDF1 (RDY234) across the indicated number of transcripts (n = 5311). T > C read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. 
F Boxplots comparing log2 fold changes in transcription following Bdf1/2 depletion in a strain ectopically expressing BDF1 (RDY234) normalized using exogenous whole cell spike-in. Mean changes in transcription for three biological replicates across 5311 transcripts are presented as Tukey boxplots.

We additionally tested normalization using S. pombe reads with ≥ 2 T > C conversions rather than total S. pombe reads, though no substantial benefit was found using this approach (data not shown). However, as a general consideration, we note that 4tU-labeled spike-in may be needed to control for alkylation efficiency if samples within an experimental set are independently treated with iodoacetamide or if samples will also be used for methods that rely on the enrichment of labeled RNA (e.g., 4tU-seq or TT-SLAM-seq [14, 75]).

Normalization by total read counts

In previous studies, SLAM-seq data has been normalized using total read counts rather than whole-cell spike-in [8, 42, 43]. These studies demonstrated that normalizing T > C read count data by total read count data is a robust approach even when global changes in transcription are observed. A similar normalization strategy has also been proposed for TimeLapse-seq data [36]. In addition, potential issues with spike-in normalization have been noted [76, 77]. Therefore, we tested a normalization approach using total read counts to circumvent the need for whole-cell spike-in and streamline our workflow.

Previous studies used DESeq2 to calculate size factors based on total read counts. Total read counts were defined as the sum of T > C read counts and non-T > C read counts following alignment and filtering. The DESeq2 size factors were then used to normalize T > C read counts. We initially applied this approach to data derived from strain RDY73. We observed a global loss of transcription following Bdf1/2 depletion for DESeq2 normalized T > C read counts (Fig. 3A). However, the magnitude of these changes was smaller than what we observed for spike-in normalized T > C read counts (Fig. 2B). The median-ratio method implemented by DESeq2 to estimate size factors assumes that the expression of most genes is not affected by experimental conditions [54]. However, previous work has demonstrated that depletion of Bdf1/2 results in global changes in newly transcribed mRNA [10]. Here, we show that depletion of Bdf1/2 also results in substantial changes in steady-state mRNA within 30 min (Fig. 2A). Thus, we presume that the extent to which Bdf1/2 depletion affects newly transcribed and steady-state mRNA violates assumptions underlying DESeq2 calculations and that, although DESeq2 size factors (calculated on total read counts) can be used to effectively normalize SLAM-seq data in certain contexts (e.g., mammalian cells), this approach does not recapitulate the results we observed using exogenous whole cell spike-in (Fig. 2B). Although not tested in this study, we note that other strategies for calculating size factors can be implemented in DESeq2, such as calculating size factors on a subset of control genes expected to be stably expressed across all samples in a given experiment.
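Why median-ratio size factors can mask a global loss is easiest to see in a toy case. The sketch below re-expresses the default median-of-ratios estimator in plain Python (DESeq2 itself implements it in R as `estimateSizeFactors()`, which also accepts a `controlGenes` argument for the control-gene strategy mentioned above). The `control_rows` parameter here is a hypothetical stand-in for that option, and the counts are invented: every gene is 4-fold lower in the treated sample, yet the estimator attributes the entire shift to sequencing depth.

```python
import math
from statistics import median
from typing import List, Optional, Sequence


def median_of_ratios(
    counts: Sequence[Sequence[float]], control_rows: Optional[Sequence[int]] = None
) -> List[float]:
    """Size factors by the median-of-ratios method (genes x samples input).

    Genes with a zero count in any sample are skipped, because their geometric
    mean is zero. control_rows optionally restricts estimation to a subset of
    genes assumed to be stable (hypothetical; no such set is defined in the text).
    """
    rows = [counts[i] for i in control_rows] if control_rows is not None else list(counts)
    log_rows = [[math.log(c) for c in row] for row in rows if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no genes with positive counts in every sample")
    n_samples = len(log_rows[0])
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    return [
        math.exp(median(row[j] - gm for row, gm in zip(log_rows, log_geo_means)))
        for j in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical counts: four genes (rows) in a control and a treated sample (columns),
    # with every gene 4-fold lower in the treated sample.
    counts = [[100, 25], [200, 50], [400, 100], [800, 200]]
    sf = median_of_ratios(counts)
    normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
    print([round(s, 3) for s in sf])  # [2.0, 0.5]
    print([round(r[1] / r[0], 3) for r in normalized])  # [1.0, 1.0, 1.0, 1.0]
```

After normalization, every gene appears unchanged, whereas factors derived from an unaffected spike-in would preserve the 4-fold loss.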

Fig. 3

SLAM-seq is robust to normalization using total read counts. A Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion across the indicated number of transcripts (n = 5311). T > C read counts were normalized using size factors estimated on total read counts using DESeq2. Mean values for three biological replicates are plotted. B Scatterplot comparing log2 fold change in transcription and log10 baseline transcription following Bdf1/2 depletion across the indicated number of transcripts (n = 5311). T > C read counts were normalized using size factors calculated on total read counts. Mean values for three biological replicates are plotted. C Scatterplot comparing log2 fold change in transcription following Bdf1/2 depletion determined across 5311 transcripts. T > C read counts were normalized using size factors estimated on total read counts using DESeq2 or normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Spearman’s correlation coefficient (rs) is shown. D Scatterplot comparing log2 fold change in transcription following Bdf1/2 depletion determined across 5311 transcripts. T > C read counts were normalized using size factors calculated on total read counts or size factors calculated on total read counts from exogenous whole cell spike-in. Spearman’s correlation coefficient (rs) is shown.

Based on these results, we reasoned that an alternative normalization approach using total read counts might yield better results for a rapidly dividing organism like S. cerevisiae. Thus, we calculated normalization factors based on total read counts for each sample (see Materials and Procedure). We then used these total counts normalization factors to normalize T > C read counts in DESeq2 (Additional file 1H-I). Using this approach, we again observed a global loss of transcription following Bdf1/2 depletion (Fig. 3B). However, the magnitude of transcriptional changes was more dramatic when T > C read counts were normalized by total counts normalization factors compared to DESeq2 size factors. This was more apparent when DESeq2 or total counts normalized T > C read counts were compared to spike-in normalized T > C read counts (Fig. 3C-D). Notably, only a minimal shift in read counts per gene was observed for total counts normalized T > C read counts compared to spike-in normalized T > C read counts (Fig. 3D). Together, these results illustrate that SLAM-seq data derived from S. cerevisiae can be reliably normalized using normalization factors derived from total read counts.

SLAM-seq correlates well with 4tU-seq and identifies extensive differential expression following depletion of Bdf1/2

Next, we compared our SLAM-seq data (spike-in or total counts normalized T > C read counts) to published 4tU-seq data derived from strain RDY73 under comparable experimental conditions [10]. A striking loss of transcription is apparent in the data from both SLAM-seq (Fig. 2B; Fig. 3B) and 4tU-seq [10]. Comparing the change in transcription across the two datasets, we observed a strong correlation between SLAM-seq and 4tU-seq data (Fig. 4A and C). Furthermore, we observed extensive differential expression following Bdf1/2 depletion (Fig. 4B and D). Using a false discovery rate (FDR) adjusted p-value ≤ 0.05 and |log2 fold change| ≥ 1 as cutoffs, we detected six upregulated and 3425 downregulated genes for spike-in normalized T > C read counts (Fig. 4B). Using the same cutoffs, we detected eight upregulated and 2622 downregulated genes for total counts normalized T > C read counts (Fig. 4D). Together, these results illustrate a strong correlation between SLAM-seq and 4tU-seq data and demonstrate that our workflow can help define the direct targets of transcriptional regulators such as Bdf1/2 using either spike-in or total counts normalization strategies.
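The comparisons above rest on three standard calculations: Benjamini-Hochberg FDR adjustment (the default `padj` method in DESeq2), classification of genes using the FDR and fold-change cutoffs, and Spearman's rank correlation between datasets. The sketch below is a hypothetical, dependency-free illustration of each; the function names are our own, and in practice DESeq2 and a statistics library would supply these values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, padj, fdr=0.05, lfc=1.0):
    """Count up- and downregulated genes using padj <= fdr and |log2FC| >= lfc.
    Genes with a missing padj (None, e.g. NA in DESeq2 output) are skipped."""
    up = down = 0
    for fc, q in zip(log2fc, padj):
        if q is None or q > fdr:
            continue
        if fc >= lfc:
            up += 1
        elif fc <= -lfc:
            down += 1
    return up, down


def _ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Note that applying the fold-change cutoff after testing, as sketched here, differs from testing against a non-zero threshold with the `lfcThreshold` argument of DESeq2's `results()`.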

Fig. 4

SLAM-seq correlates well with 4tU-seq and identifies extensive differential expression following Bdf1/2 depletion. A Scatterplot comparing log2 fold change in transcription following Bdf1/2 depletion determined using SLAM-seq or 4tU-seq across 4882 transcripts. T > C read counts derived from SLAM-seq were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. Read counts derived from previously published 4tU-seq data were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Spearman's correlation coefficient (rs) is shown. B Volcano plot comparing −log10 false discovery rate (FDR) adjusted p-value and log2 fold change in transcription following Bdf1/2 depletion across 5311 transcripts. T > C read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Mean values for three biological replicates are plotted. Differentially expressed genes are defined using an FDR adjusted p-value ≤ 0.05 and |log2 fold change| ≥ 1 as cutoffs. Numbers of differentially expressed genes are shown in respective regions of the plot. C Scatterplot comparing log2 fold change in transcription following Bdf1/2 depletion determined using SLAM-seq or 4tU-seq across 4882 transcripts. T > C read counts derived from SLAM-seq were normalized using size factors calculated on total read counts. Mean values for three biological replicates are plotted. Read counts derived from previously published 4tU-seq data were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Spearman's correlation coefficient (rs) is shown. D Volcano plot comparing −log10 false discovery rate (FDR) adjusted p-value and log2 fold change in transcription following Bdf1/2 depletion across 5311 transcripts. T > C read counts were normalized using size factors calculated on total read counts. Mean values for three biological replicates are plotted. Differentially expressed genes are defined using an FDR adjusted p-value ≤ 0.05 and |log2 fold change| ≥ 1 as cutoffs. Numbers of differentially expressed genes are shown in respective regions of the plot

Lastly, we examined how the number of genes recovered by SLAM-seq depends on read depth. To do this, we randomly subsampled SLAM-seq data derived from strain RDY73. After processing the subsampled data using SLAM-DUNK, we calculated spike-in normalization factors based on S. pombe total read counts and used these factors to normalize T > C read counts in DESeq2. Results were then filtered using relaxed or strict criteria (see Materials and Procedure). Unsurprisingly, the number of genes we could reliably quantify decreased as read depth decreased, which was most apparent when "strict" filtering was applied (Fig. 5). Nevertheless, these results demonstrate that our workflow recovers a substantial number of protein-coding genes even at reduced read depth. For example, when we subsampled to 5.5 million reads per sample, we recovered 5273 genes using "relaxed" filtering criteria and 3428 genes using "strict" filtering criteria, representing approximately 87 percent and 57 percent of protein-coding genes, respectively (Fig. 5; Additional file 1J).
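The subsampling and "strict" filtering steps can be illustrated with a short sketch. This is a hypothetical example, not the pipeline used here: reads are drawn uniformly without replacement using reservoir sampling (Algorithm R), which needs only one pass and does not require knowing the total read count in advance; dedicated tools would normally be used on real FASTQ files, and paired-end data would require sampling mates together. The strict filter assumes that "≥ one T > C read count across all samples" means at least one T > C read in every sample, and the gene class labels (`"dubious"`, `"pseudogene"`) are placeholders. The relaxed criteria are described in Materials and Procedure and are not reproduced.

```python
import random


def reservoir_sample(reads, k, seed=0):
    """Uniformly sample k items from an iterable of reads in a single pass.
    If there are k or fewer reads, all of them are returned."""
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    reservoir = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive of both ends
            if j < k:
                reservoir[j] = read
    return reservoir


def strict_filter(tc_counts, gene_class, excluded=("dubious", "pseudogene")):
    """Keep genes with >= 1 T > C read in every sample, dropping excluded classes.

    tc_counts: dict mapping gene -> list of T > C read counts (one per sample).
    gene_class: dict mapping gene -> class label (hypothetical labels).
    """
    kept = []
    for gene, counts in tc_counts.items():
        if gene_class.get(gene) in excluded:
            continue
        if all(c >= 1 for c in counts):
            kept.append(gene)
    return kept
```

Repeating the subsampling at several target depths and counting the genes retained by each filter would yield a curve analogous to Fig. 5.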

Fig. 5

The number of genes recovered by SLAM-seq correlates with read depth. Line plot comparing the number of genes recovered post-filtering and initial read depth. SLAM-seq data were derived from strain RDY73 treated with 4-thiouracil (4tU) for 4 min. T > C read counts were normalized using size factors calculated on total read counts from exogenous whole cell spike-in. Genes were filtered as described in Materials and Procedure (relaxed filtering; dark blue line) or by requiring that genes have at least one T > C read count across all samples (n = 6) (strict filtering; light blue line). Genes classified as dubious open reading frames (ORFs) or pseudogenes were removed in both filtering procedures. To adjust initial read depth, reads were randomly subsampled prior to SLAM-DUNK processing (see Materials and Procedure)

Conclusions

We provide an end-to-end workflow for rapid and specific protein degradation using the AID system and selective quantification of newly synthesized RNA using SLAM-seq in S. cerevisiae. The workflow builds on established methods to provide a complete resource for implementation. Using the BET proteins Bdf1/2 as an example, we demonstrate that the workflow can help define the direct effects of transcriptional regulators. We anticipate that this workflow will help address outstanding questions about the molecular basis of transcription and other biological processes in S. cerevisiae. In addition, the workflow should be readily adaptable to other systems, making it useful to the broader research community.