Introduction

Ehrlichia chaffeensis is a tick-transmitted intracellular bacterial pathogen causing human monocytic ehrlichiosis (HME) and it also infects dogs, deer, goats, and coyotes1,2,3,4. Mutations at certain genomic locations, leading to gene expression changes, impact the pathogen’s ability to cause infection and persistence in a host5,6. The genome of E. chaffeensis may have evolved within a host cell environment leading to the development of mechanisms to undermine the host immune response7. Pathogenesis-associated E. chaffeensis genes are likely highly active in a host microenvironment and consistent with this hypothesis, differential gene expression in response to host cell defense is known to occur8. Progress has been made towards identifying genes crucial for Ehrlichia survival in a host cell environment9,10,11. However, to date only a few abundantly expressed genes are identified as associated with pathogenesis. Defining the genes involved in pathogenesis and virulence, and documenting their differential expression may aid in the discovery of novel proteins valuable as targets for therapeutic interventions and vaccine development for HME.

Genetically mutated intracellular pathogens are important resources for studying microbial pathogenesis, and also aid in the efforts of vaccine development12,13. Our previous study demonstrated the feasibility of transposon-based mutations in E. chaffeensis5,6. We also found that some insertion mutations resulting in transcriptional inactivation of membrane protein genes cause attenuation of the growth of the pathogen in vertebrate hosts. Insertions within the coding regions of ECH_0379 and ECH_0660 genes offered varying levels of protection against infection in a vertebrate host14. In this study, we hypothesized that the mutations’ specific genomic locations may impact global gene expression and contribute to the pathogen’s altered survival, infection progression, and replication in a host cell environment. To test this hypothesis, we assessed the impact of three mutations, reported earlier by Cheng et al.5, on global gene transcription. We selected two mutants with mutations within the coding regions of the ECH_0660 gene encoding for a phage like protein (ECH_0660) and the ECH_0379 gene encoding for an anti-porter protein (ECH_0379). Insertion mutation in ECH_0660 gene is located at the nucleotide position 213 of the 555 base long open reading frame. Similarly, mutation in ECH_0379 gene is located at the nucleotide position 682 of the 1056 base long open reading frame. The third insertion mutant strain, ECH_0490, has the insertion mutation 166 nucleotides downstream from the stop codon of ECH_0490 gene.

High throughput RNA sequencing (RNA seq) technologies have proven to be reliable and robust tools for determining global transcriptome activity in obligate intracellular bacteria12,15,16,17. Comparative genomic studies identified several classes of virulence factors involved in secretion and trafficking of molecules between the pathogen and host cells and modulation of the host immune response18,19,20. However, studies focused on Ehrlichia gene expression have been limited mostly to outer membrane proteins genes, Type IV Secretion System (T4SS) genes, tandem repeat protein (TRP) genes, and ankyrin repeat genes (Anks)9,19,21,22,23. Among them, genes encoding for T4SS proteins and p28-OMP proteins have been found to be critical for pathogenicity9,24.

The obligate intracellular nature of E. chaffeensis poses a challenge in obtaining cell-free Ehrlichia from host cells25. Technical constraints in isolating Ehrlichia RNA from highly abundant host RNA remain an impediment to profiling pathogen transcripts26. To overcome this limitation, we used an effective cell lysis strategy followed by density gradient centrifugation. Further, we enriched Ehrlichia RNA by efficiently removing polyadenylated RNA (poly(A) RNA) and eukaryotic and prokaryotic ribosomal RNAs from host and bacterial RNA mixtures. Sequencing of the enriched RNA aided in the detection of transcripts for 66–80% of the annotated E. chaffeensis genes as per the annotated genome: GenBank #CP000236.1. Comparison of transcript levels from wildtype and mutant strains revealed the highest degree of modulation in immunogenic and secretory protein genes, particularly in the mutant strains of ECH_0490 and ECH_0379, while minimal changes were observed in the ECH_0660 mutant strain.

Results

Isolation and purification of cell-free E. chaffeensis from host cells

The major challenge of undertaking transcriptome studies of intracellular pathogens is the difficulty in isolating host cell-free bacteria and subsequently recovering high-quality bacterial RNA. Rickettsial organisms, including E. chaffeensis, constitute only a very small fraction of isolated total RNA27,28. Because of the presence of highly abundant host cell RNA, recovery of bacterial RNA is a challenge for executing RNA seq analysis experiments. In this study, we first purified the host cell-free bacteria from infected host cells (canine macrophage cell line, DH82) by employing an efficient cell lysis method, coupled with density gradient centrifugation protocols. Host cell lysis was performed to efficiently rupture the host cells without causing major damage to the bacteria. E. chaffeensis organisms are about 0.5 to 1 µm in diameter. Therefore, infected host cell lysate was filtered through a 2 µm membrane to remove most of the host cell debris. A high-speed Renografin density gradient centrifugation of the resulting E. chaffeensis cell suspension aided in pelleting bacteria while host cell debris remained at the top layer of the solution. After total RNA isolation and DNase treatment, Bioanalyzer analysis revealed that despite the prior fractionation of host cell-free bacteria, the host 28S and 18S rRNA remained at high concentrations in the recovered RNA. Bacterial mRNA enrichment was carried out by depleting the host poly(A) RNA and eukaryotic ribosomal RNA using a bacterial RNA enrichment protocol, resulting in nearly undetectable levels of host 28S and 18S rRNA (Supplementary Figures; Fig. S1 and Fig. S2). The absence of contaminating E. chaffeensis genomic DNA in the purified RNA samples was confirmed by real-time quantitative PCR using E. chaffeensis 16S rRNA gene primers27. We also confirmed the absence of DNA sequences in the RNA seq raw data by aligning 20 randomly selected E. chaffeensis intergenic non-coding DNA sequences (data not shown).

Ubiquitous transcription of genes in E. chaffeensis mutants

Illumina HiSeq 4000 RNA seq of E. chaffeensis wildtype and mutants generated between 75–130 million reads. The transcriptome data were deposited in the NCBI Bio-Project ID:PRJNA428837 and SRA accession:SRP128532 (https://www.ncbi.nlm.nih.gov/sra/SRP128532). Despite efficient depletion of host ribosomal RNA, only a fraction (less than 19%) of reads mapped to the E. chaffeensis genome. Mapping of reads (10 reads minimum/gene) identified about 66–80% of the genes being expressed from the Ehrlichia genome as per the annotated genome (GenBank # CP000236.1); the transcriptome of wildtype organisms (n = 3) contained transcripts for about 920 genes of the total of 1158 genes, and similarly 888, 895, and 768 gene transcripts (n = 3) were identified in mutant organisms ECH_0660, ECH_0379, and ECH_0490, respectively (Table 1). Table S1 lists total numbers of genes, and the expression value of the genes identified in the wildtype and all three mutant organisms. The replicate RNA seq data of wildtype (R² = 0.9) (Fig. 1A) and mutants ECH_0379 (R² = 0.93) (Fig. 1B), ECH_0490 (R² = 0.68) (Fig. 1C), and ECH_0660 (R² = 0.89) (Fig. 1D) showed a high degree of expression correlation. The scatter plot expression data of wildtype vs. ECH_0379 (R² = 0.18) (Fig. 1E) and wildtype vs. ECH_0490 (R² = 0.38) (Fig. 1F) showed weak correlation. Notably, the expression plot of wildtype vs. ECH_0660 showed a strong correlation (R² = 0.96) (Fig. 1G). Only transcripts with reads per kilobase of transcript per million mapped reads (RPKM) ≥ 1 were considered for differential expression analysis.
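
The two expression filters named above (at least 10 mapped reads per gene and RPKM ≥ 1) and the R² statistic used for the scatter plots can be illustrated with a minimal Python sketch. The function names, the read count of 120, and the library size of 10 million mapped reads are hypothetical values chosen for illustration; only the thresholds and the 555-base ORF length come from the text. R² is computed here as the squared Pearson correlation, which is an assumption about how the reported values were obtained.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passes_filters(read_count, gene_length_bp, total_mapped_reads,
                   min_reads=10, min_rpkm=1.0):
    """Apply the two thresholds stated in the text (>= 10 reads, RPKM >= 1)."""
    return (read_count >= min_reads
            and rpkm(read_count, gene_length_bp, total_mapped_reads) >= min_rpkm)


def r_squared(x, y):
    """Squared Pearson correlation between two expression vectors (assumed definition)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical example: a 555-base ORF with 120 reads in a library of
# 10 million mapped reads.
example_rpkm = rpkm(120, 555, 10_000_000)
example_kept = passes_filters(120, 555, 10_000_000)
```

Because R² is non-negative by construction, it reports the strength of a linear relationship between two expression profiles but not its direction; a perfectly inverse relationship also gives R² = 1.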

Table 1 Number of E. chaffeensis genes identified in three replicates of wildtype and its mutants.
Figure 1

Scatter plot of RNA seq expression analysis. Scatter plots of transcript expression data for replicates of E. chaffeensis wildtype (A) and mutants ECH_0379 (B), ECH_0490 (C), and ECH_0660 (D) showing a high degree of correlation. Scatter plots of transcript expression data for wildtype vs. mutants: (E) wildtype vs. ECH_0379, (F) wildtype vs. ECH_0490, and (G) wildtype vs. ECH_0660. Transcripts with ≥ 1 FPKM and minimum of 10 mapped reads were used. The plot is on a log-transformed scale.

Global transcriptome of E. chaffeensis

Distribution of the transcripts in wildtype E. chaffeensis (Fig. 2) included 481 transcripts from gene categories each represented by fewer than five transcripts, followed by hypothetical protein transcripts (178) representing 19% of the transcriptome, and 127 ribosomal protein gene transcripts (14%). Transcripts of major outer membrane proteins (22 transcripts) represent the next most abundant group. Conserved domain protein transcripts encoded from 14 genes are associated with NADH dehydrogenase I complex. Other highly expressed genes included molecular chaperones, ATP synthase, putative membrane protein, cytochrome c oxidase, GTP-binding protein, putative lipoprotein, translation elongation factor, ABC transporter, and DNA polymerases; all of which represented 0.5–1.7% of the transcriptome. Table S2 lists the top 100 highly expressed genes in the transcriptome of wildtype E. chaffeensis.

Figure 2

Distribution of the identified transcripts in wildtype E. chaffeensis. The inlaid numbers represent the percentage of transcripts detected in the RNA seq data (n = 3). The number of identified transcripts associated with each gene category is shown in brackets. The minimum transcript representation for each gene category was set to 5. The total number of genes identified was 920.

ECH_0379 mutation caused transcriptional down-regulation of many genes involved in antiporter activity, phage proteins, and those involved in transport and transcription function

Differential gene expression (DGE) was determined by comparing the RPKM expression values of mutants and wildtype (Fig. S3). Fold changes were considered significant with a p-value < 0.05, False Discovery Rate (FDR) ≤ 0.001, and consistency of expression values between replicates. The change in gene expression was not significant between wildtype and mutants for housekeeping genes. Based on these criteria, 41 genes were identified as predominantly downregulated and two genes were upregulated in the ECH_0379 gene mutant compared to wildtype (Table 2). The most prominent genes that showed a significant decrease in the transcription levels were those encoding for antiporter proteins, ABC transporters, and ATP-dependent Clp protease (ECH_0367). Four antiporter protein genes: monovalent cation/proton antiporter (ECH_0466), Na(+)/H(+) antiporter subunit C (mrpC) (ECH_0469), potassium uptake protein TrkH (ECH_1093), and nitrogen regulation protein NtrY (ECH_0299) showed a significant decline in the transcript levels. In addition, transcripts for two membrane transporters: cation ABC transporter permease protein transcript of the gene ECH_0517 and another ABC transporter permease protein transcript of the gene ECH_0972 were downregulated. Three genes coding for phage-like proteins {phage prohead protease (ECH_0032), phage portal protein (ECH_0033), and phage major capsid protein (ECH_0830)} were also downregulated in the mutant strain. Transcripts for 6 genes involved in transcription, namely DNA replication and repair protein RecF (ECH_0076), formamidopyrimidine-DNA glycosylase (ECH_0602), dimethyladenosine transferase (ECH_0648), GTP-binding protein EngA (ECH_0504), leucyl-tRNA synthetase (ECH_0794), and endonuclease III (ECH_0857) were also downregulated in this mutant strain. 
The enzymes of metabolic processes such as glutamate cysteine ligase (GCL) (ECH_0125), DNA/pantothenate metabolism flavoprotein (PMF) (ECH_0374), ATPase, AGF1 (ECH_0392), uroporphyrinogen III synthase (UPGS) (ECH_0480), diaminopimelate decarboxylase (DAPDC) (ECH_0485), biotin-acetyl-CoA-carboxylase ligase (BACL) (ECH_0848), and argininosuccinate lyase (ASL) (ECH_0937) are also down-regulated. Transcripts for 8 hypothetical protein genes; ECH_0021, ECH_0161, ECH_0264, ECH_0289, ECH_0725, ECH_0879, ECH_0913, and ECH_1053 were also among the downregulated genes in this mutant.
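
The significance criteria stated above (p-value < 0.05 and FDR ≤ 0.001) can be sketched as a simple filter. The text does not say which FDR procedure or fold-change formula was used, so the Benjamini-Hochberg adjustment, the pseudocount of 1, and all function and gene names below are assumptions made only for illustration; the input values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_fold_change(mutant_rpkm, wildtype_rpkm, pseudocount=1.0):
    """log2(mutant / wildtype); the pseudocount is an assumption to avoid log(0)."""
    return math.log2((mutant_rpkm + pseudocount) / (wildtype_rpkm + pseudocount))


def call_differential(records, p_cutoff=0.05, fdr_cutoff=0.001):
    """Return {gene: 'up' | 'down'} for records passing both cutoffs.

    Each record is a dict with keys: gene, wt_rpkm, mut_rpkm, pvalue.
    """
    fdr = benjamini_hochberg([r["pvalue"] for r in records])
    calls = {}
    for record, q in zip(records, fdr):
        if record["pvalue"] < p_cutoff and q <= fdr_cutoff:
            lfc = log2_fold_change(record["mut_rpkm"], record["wt_rpkm"])
            calls[record["gene"]] = "up" if lfc > 0 else "down"
    return calls


# Hypothetical input; gene names and values are placeholders, not study data.
records = [
    {"gene": "geneA", "wt_rpkm": 100.0, "mut_rpkm": 10.0, "pvalue": 1e-6},
    {"gene": "geneB", "wt_rpkm": 40.0, "mut_rpkm": 42.0, "pvalue": 0.5},
    {"gene": "geneC", "wt_rpkm": 5.0, "mut_rpkm": 50.0, "pvalue": 1e-5},
]
calls = call_differential(records)
```

The third criterion in the text, consistency of expression values between replicates, is not modeled in this sketch.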

Table 2 E. chaffeensis genes differentially transcribed in ECH_0379 compared to wildtype.

Differential transcriptional regulation of T4SS and p-28 OMP gene cluster genes in mutant ECH_0490

In the ECH_0490 mutant strain, 37 genes were significantly downregulated and 17 genes were up-regulated (Table 3). Four of the downregulated genes belonged to the T4SS: ECH_0494 (VirB3), ECH_0496 (VirB6), ECH_0498 (VirB6), and ECH_0499 (VirB6); a type I secretion membrane fusion protein gene (T1SS_HlyD) (ECH_0970) was also downregulated. Molecular chaperone genes, such as a cold shock protein (CSP) (ECH_0298) and ATP-dependent Clp protease, and an ATP-binding subunit ClpA (ClpA) were also downregulated. The transport proteins including the protein export membrane protein (SecF) (ECH_0095), preprotein translocase (SecY) (ECH_0428), potassium uptake protein (TrkH) (ECH_1093), and nitrogen regulation protein (NtrY) (ECH_0299) were also among the downregulated genes. Metabolic enzymes involved in biosynthetic processes, {tetrahydropyridine-2-carboxylate N-succinyltransferase (dapD) (ECH_0058), quinone oxidoreductase (ECH_0385), metalloendopeptidase (MEP) (ECH_0644), peptide deformylase (PDF) (ECH_0939), serine/threonine phosphatase (PSP) (ECH_0964), pyrophosphatase (PPi) (ECH_1014), and orotate phosphoribosyltransferase (OPRTase) (ECH_1108)}, were also down-regulated. Transcription- and translation-related genes, such as elongation factors (EF-Tu) (ECH_0515), aminoacyl-tRNA synthetases (IARS) (ECH_0538), DNA-binding protein (HU) (ECH_0804), 3′-5′ exonuclease domain (ECH_1011), and DNA-binding response regulator (ECH_1012), were also downregulated.

Table 3 E. chaffeensis genes differentially transcribed in ECH_0490 compared to wildtype.

Upregulated protein genes in this mutant included 7 that belonged to the transmembrane protein category. Of these, four belonged to the p-28 OMP gene cluster {ECH_1143 (OMP-p28), ECH_1146 (OMP-p28-2), ECH_1136 (OMP-1B), and ECH_1121 (OMP-1N)}. In addition, two putative membrane protein genes (ECH_0009, ECH_0230) and an immunodominant surface protein gene (ECH_0039) were upregulated. Transcripts for the heat shock proteins ATP-dependent Clp protease, ClpA (ECH_0567) and ATP-binding chaperone, ClpB (ECH_0367), and the stress response-associated RNA polymerase sigma factor (RpoH) (ECH_0655) were also upregulated. Transcripts for two genes coding for iron sulfur proteins {BolA family protein (ECH_0303) and FeS cluster assembly scaffold (IscU) (ECH_0630)} were similarly up-regulated. We observed differential expression of six hypothetical protein genes, which included ECH_0166, ECH_0251, ECH_0450, ECH_0531, ECH_0753, and ECH_0878.

Mutation in ECH_0660 gene led to minimal transcriptional alterations

While we observed drastic gene expression changes in both ECH_0379 and ECH_0490 mutants, ECH_0660 mutant transcriptome showed minimal variations compared to wildtype; we observed only five genes as notably differentially expressed in this mutant (Table 4). The genes included nitrogen regulation protein (NtrY) (ECH_0299) and the ABC transporter permease protein (ECH_0972) as down-regulated genes, whereas the heme exporter protein CcmA (ECH_0295) and chaperonin (ECH_0364) were upregulated. We also identified several commonly differentially-expressed genes in ECH_0379 and ECH_0490 (Table 5). The ribonuclease D (ECH_0300) and potassium uptake protein (ECH_1093) were commonly down regulated in ECH_0379 and ECH_0490. T4SS protein VirB4 gene was down-regulated in ECH_0490 mutant, whereas this gene was up-regulated in ECH_0379 mutant. Contrary to this, ClpB was down-regulated in ECH_0379 mutant and upregulated in ECH_0490 mutant.
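
The cross-mutant comparison described above amounts to intersecting per-mutant gene lists and checking whether the direction of change agrees. The following minimal sketch uses only gene IDs and directions named in the text (ECH_0300, ECH_1093, and ClpB, given in the Results as ECH_0367); the dictionary layout and the function name are hypothetical, and the full lists are in Tables 2, 3, and 5.

```python
# Direction of change relative to wildtype, restricted to genes named in the text.
calls = {
    "ECH_0379": {"ECH_0300": "down", "ECH_1093": "down", "ECH_0367": "down"},
    "ECH_0490": {"ECH_0300": "down", "ECH_1093": "down", "ECH_0367": "up"},
}


def compare_calls(first, second):
    """Split shared genes into same-direction and opposite-direction lists."""
    shared = sorted(set(first) & set(second))
    concordant = [gene for gene in shared if first[gene] == second[gene]]
    discordant = [gene for gene in shared if first[gene] != second[gene]]
    return concordant, discordant


concordant, discordant = compare_calls(calls["ECH_0379"], calls["ECH_0490"])
```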

Table 4 E. chaffeensis genes differentially transcribed in ECH_0660 compared to wildtype.
Table 5 E. chaffeensis common differentially transcribed genes in mutants.

Validation of RNA seq data by quantitative real-time reverse transcription PCR

Quantitative real-time reverse transcriptase-PCR (qRT-PCR) analysis was carried out on thirteen randomly selected genes identified as differentially transcribed according to the RNA seq data. To generate qRT-PCR data, we first normalized RNA samples to a constitutively expressed E. chaffeensis gene coding for the 16S rRNA as previously described in Cheng et al.6. The primers and genes selected for the qRT-PCR analysis are listed in Table S3. Transcript abundance for 7 down-regulated genes in the ECH_0379 mutant, including ECH_0466, mrpC, ClpB, ECH_0033, NtrY, TrkH, and ECH_0972, was validated (Fig. 3A). Similarly, 6 upregulated genes from the ECH_0490 mutant strain, including four transcripts belonging to an OMP gene cluster (OMP-p28, OMP-1B, OMP-1N, OMP-p28-2) and one each from ClpB and RpoH genes, were verified by qRT-PCR (Fig. 3B). Likewise, the down-regulation of transcripts for the ECH_0299 and ECH_0972 genes was confirmed in the ECH_0660 mutant by qRT-PCR (Fig. 3C).
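
Normalization of a target transcript to the 16S rRNA reference and conversion to a fold change can be sketched as follows. The exact calculation follows Cheng et al.6 and is not given in this section, so the widely used 2^(-ΔΔCt) formula, the assumption of roughly 100% amplification efficiency, the function name, and the Ct values below are all illustrative assumptions rather than the study's procedure or data.

```python
def fold_change_ddct(ct_target_mutant, ct_ref_mutant,
                     ct_target_wildtype, ct_ref_wildtype):
    """Relative expression (mutant vs. wildtype) by the 2^(-ddCt) method.

    ct_ref_* are Ct values of the reference transcript (here, 16S rRNA).
    Assumes about 100% amplification efficiency for both primer sets.
    """
    delta_mutant = ct_target_mutant - ct_ref_mutant
    delta_wildtype = ct_target_wildtype - ct_ref_wildtype
    return 2.0 ** -(delta_mutant - delta_wildtype)


# Hypothetical Ct values: the target crosses threshold two cycles later in the
# mutant at equal 16S rRNA Ct, i.e. a four-fold decrease.
example_fold_change = fold_change_ddct(26.0, 12.0, 24.0, 12.0)
```

A value below 1 indicates lower transcript abundance in the mutant than in the wildtype, and a value above 1 indicates higher abundance.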

Figure 3

Verification of transcriptional variations observed in RNA seq data by qRT-PCR. Transcriptional fold changes in wildtype vs. ECH_0379 mutant (A), ECH_0490 mutant (B), or ECH_0660 mutant (C) are presented from the qRT-PCR data. Black bars represent RNA seq data and white bars represent qRT-PCR data.

Discussion

Isolation of cell-free bacterial RNA from highly abundant host RNA is the first challenge in transcriptional profiling of intracellular pathogens25,28,29. Rickettsiales require culturing in host cells and then need to be purified before extracting RNA for transcriptome evaluation experiments. To document the impact of three transposon mutations on E. chaffeensis transcription, we first developed a method for isolation and purification of host cell-free E. chaffeensis organisms, from which we isolated RNA that was then subjected to next generation sequencing (NGS) analysis. To isolate cell-free E. chaffeensis, we started with an efficient host cell lysis protocol, then filtered the whole cell lysate, and followed this with Renografin density gradient centrifugation. The second challenge was to obtain host cell-free RNA for transcriptome profiling. Previous studies report that bacterial RNA enrichment methods yield only 3–10% bacterial RNA reads29,30. Isolation of host cell-free bacteria and the bacterial RNA purification steps implemented in our study allowed a greater enrichment of E. chaffeensis RNA. In our current study, we were able to enrich the bacterial RNA, which yielded up to 19% of reads mapping to the E. chaffeensis genome. Notably, deep RNA sequencing analysis aided in mapping up to 80% of E. chaffeensis genes expressed in infected macrophage host cells.

Among the highly expressed genes, the p28-OMP multigene cluster was dominant in the transcriptome. The E. chaffeensis p28-OMP multigene locus contains 22 tandemly arranged genes coding for the bacterial immunodominant proteins31,32,33. The presence of all 22 transcripts in the RNA seq data suggests that the gene cluster is among the most abundantly expressed genes. These observations are consistent with our previous proteomic study where we reported the p28-OMP genes’ expression abundance33. NADH dehydrogenase I complex genes were also highly expressed in E. chaffeensis. NADH dehydrogenase counters the phagosomal NOX2 response to inhibit host cell apoptosis34. T4SS effector proteins in some pathogenic bacteria are considered important in manipulating host gene expression to undermine the host immune response35,36. The contributions of T4SS effectors to pathogenicity are already reported for Rickettsiales, including for A. marginale, A. phagocytophilum, E. canis, and E. chaffeensis37,38,39. The RNA seq analysis identified several transcripts encoding for T4SS proteins, including VirB3, B4, B6, B8, B9, B10, and B11. Chaperone protein genes DnaK, DnaJ, GroE, and ClpB were also highly expressed in both wildtype and mutant strains. The presence of such proteins involved in cell homeostasis and the oxidative stress response is reported in other Rickettsiales39,40,41, suggesting that their gene products are also critical for the E. chaffeensis stress response if the pathogen proteome is similarly altered as per the transcriptome reported in the current study. Indeed, our recent study suggests that the stress response proteins are important for E. chaffeensis11.
Other highly expressed protein genes included those encoding for house-keeping ribosomal proteins involved in protein synthesis, putative membrane proteins, ABC transporter, and lipoprotein; all of which are likely important for the pathogen’s protein synthesis, transport, trafficking, and effector secretion into the host cells. ATP synthase subunit, cytochrome c oxidase, DNA polymerases, GTP-binding protein and translation elongation factors involved in energy metabolism, cell division, and transcriptional regulation were also among the highly expressed genes in both wildtype and mutant organisms. The extent of transcriptome coverage is higher than that previously reported for E. chaffeensis in ISE6 and AAE2 tick cells8. This is notable both for the enhanced detection of intracellular pathogen transcripts and for the abundance of gene expression observed. Higher coverage of the transcriptome likely resulted from deep sequencing of the RNAs by next-generation sequencing compared to microarray analysis8. This global set of highly expressed genes may represent products involved in pathogenicity, replication and survival of E. chaffeensis in the host cell environment42,43. Four transcripts that code for ankyrin repeat proteins, which are shown to mediate protein-protein interactions44, were also identified in the transcriptome. Notably, the transcriptome from the wildtype and mutant organisms contained 216 transcripts that code for hypothetical proteins with unknown function. As these were within the core transcriptome, we anticipate that they represent an important set of transcribed genes for E. chaffeensis replication.

Transcription of a large number of genes in the ECH_0379 mutant was found to be reduced compared to wildtype. Genes representing antiporters, ABC transporters, chaperones, metabolic enzymes, and transcription regulators are among the down-regulated genes (Table 2). We predict that the mutation in the antiporter protein gene caused a metabolic depression. Antiporter and transport proteins play an important role in the transport of ions and solutes across the cell membranes of bacteria45. Antiporters are integral membrane proteins that perform secondary transport of Na+ and/or K+ for H+ across a phospholipid membrane5. The E. chaffeensis genome contains several genes having homology to antiporter proteins or their subunits, suggesting that they are needed for the pathogen’s intraphagosomal replication and survival in a host. In particular, antiporters aid bacteria in maintaining pH, salt, and temperature conditions46. We observed a significant decline in transcription of antiporter genes such as monovalent cation/H+ antiporter subunit C (ECH_0469) and ECH_0466. Disrupting the antiporter function or preventing their expression may affect the pathogen’s growth in vivo. Indeed, mutation in the ECH_0379 gene resulted in the attenuated growth of the organism in both an incidental host (dog) and in the reservoir host (white-tailed deer)5,6. ABC transporters also are involved in uptake of ions and amino acids and may play an important role in a pathogen’s ability to infect and survive in a host cell environment47. The ECH_0379 mutant had low levels of transcriptional activity of the genes ECH_0517 and ECH_0972 encoding for ABC transporters, which function at different stages in the pathogenesis of infection47,48. These proteins promote the survival of pathogens in the host microenvironments49. The mutation possibly interferes with transport mechanisms, thereby affecting the pathogen’s ability to infect and survive in host cells5,6.
The mutation may have also caused alterations to the transcriptions of genes involved in physiological responses, such as regulating the pathogen’s metabolic activities. We also found down-regulation of several transcripts encoding for metabolic enzymes: glutamate–cysteine ligase, DNA/pantothenate metabolism flavoprotein family protein, ATPase, uroporphyrinogen-III synthase, diaminopimelate decarboxylase, biotin–acetyl-CoA-carboxylase ligase, and argininosuccinate lyase. In general, a pathogen’s survival in an intracellular environment depends on its ability to derive nutrients from the host cell50. Pathogenic bacteria use metabolic pathways and virulence-associated factors that undermine the host immune system so that they can derive nutrients from their host cells51. It is possible that the downregulation of the transcripts from the aforementioned genes in the ECH_0379 mutant hampers the bacterial metabolic response and its capacity to derive nutrients from the host. The mutation also caused decreased expression of genes encoding DNA replication and repair protein, formamidopyrimidine-DNA glycosylase, dimethyladenosine transferase, and leucyl-tRNA synthetase. This may have also contributed to defects in pathogen’s intracellular growth and survival. Our prior studies suggest that despite the mutant’s attenuated growth, it failed to offer complete protection against wildtype infection challenge14. If the changes in the transcriptome correlate with changes in the proteome, variations in the mutant organisms’ protein expression relative to the wildtype E. chaffeensis may result in an altered host response, thus making the host less effective in initiating a protective host response when exposed to the mutant organisms14.

Pathogenic bacteria produce T4SS effectors that weaken host cell gene expression and contribute to bacterial virulence52,53. RNA seq data suggested decreased expression of various T4SS component protein gene transcripts in the ECH_0490 mutant. We also observed decreased transcription of chaperone proteins and several genes involved in the transcription and translational machinery, and exonuclease and DNA-binding regulator gene transcripts in the ECH_0490 mutant strain. On the contrary, ClpB (a major stress response heat shock protein) and RpoH (stress response RNA polymerase transcriptional subunit) showed increased transcription in the mutant.

Chaperone proteins play a key role in protein disaggregation and in aiding the pathogen to overcome the likely host cell-induced stress54. ClpB reactivates aggregated proteins accumulating under stress conditions and it was abundantly expressed during replication stage of E. chaffeensis54,55. Preventing or reducing protein aggregation and the associated protein inactivation during the bacterial growth within a host cell may benefit the pathogen in enhancing its survival11. The RNA polymerase transcription regulator, RpoH, is also important for the pathogen’s continued growth as it aids in promoting the expression of stress response proteins10. Consistent with the prediction, increased expression of ClpB and RpoH was observed in the current study for ECH_0490 mutant. The enhanced expression from these two important genes likely enables the mutant to grow similarly to wildtype E. chaffeensis in vertebrate and tick hosts, as reported in our previous studies5,6. Outer membrane proteins perform a variety of functions such as invasion, transport, immune response, and adhesion that are vital to the survival of Ehrlichia species, including E. chaffeensis and E. ruminantium in a host32,56,57,58,59. The ECH_0490 mutant had increased abundance of OMPs compared to wildtype organisms. We found seven transmembrane genes coding for immunodominant P28/OMP family of proteins (OMP_p28, OMP_p28-2, OMP-1B, and OMP-1N) and membrane proteins (ECH_0039, ECH_0009, and ECH_0230) to be upregulated. Significant changes in the abundance of the outer membrane proteins may be associated with overall changes in the membrane architecture, thereby altering the pathogen’s susceptibility to host defense. The transcriptional changes noted in the ECH_0490 mutant may not have had any negative impact on the pathogen, as the mutant grows similar to the wildtype pathogen both in white-tailed deer (the reservoir host) and in dogs (an incidental host), and in its tick host, Amblyomma americanum5,6. 
Transcriptional activity assessment of the genes ECH_0490 (lipoic acid synthetase) and ECH_0492 (putative phosphate ABC transporter), which are located upstream and downstream of the transposon insertion mutation, respectively, suggested that the mutation has no effect on these genes’ transcription (Fig. S4). The diverse changes in the transcriptome of the mutant, while having no impact near the mutation site, suggest that the mutation impacted global gene expression and yet did not adversely affect the pathogen’s survival in vertebrate and tick hosts5,6.

The most notable observation was the apparent minimal variation in the transcriptome of the ECH_0660 mutant compared to the wildtype E. chaffeensis. Importantly, mutation within ECH_0660 gene causes severe growth defects in vivo in vertebrate hosts5,6. Further, infection with this mutant also initiates a strong host response and confers protection against wildtype pathogen infection challenge14,60. In the current study, we observed only minor changes in the gene expression in this mutant compared to wildtype. The minor changes in gene expression included genes encoding for putative nitrogen regulation protein, ABC transporter, heme exporter protein and GroES, but the variations were significantly less compared to numerous changes described in the previous two mutants. Together, these data suggest that the mutation in ECH_0660 gene led to fewer transcriptional alterations. Assuming that the proteomes of the wild type and mutant strains of E. chaffeensis are similarly altered as the transcriptomes, then ECH_0660 mutant proteome may be very similar to the wildtype bacterium. The greater degree of similarity between this mutant and the wildtype may enable the vertebrate hosts to recognize this mutant as closer to wildtype organism, thus inducing a stronger host response that mimics wildtype infection14,60. The replication defect reported earlier with this mutant may have resulted due to the loss of gene expression from fewer genes such as ECH_0659 and ECH_0660, while maintaining most of the transcriptome similar to the wildtype.

Conclusions

RNA deep sequencing studies in intracellular bacteria are still a major challenge. The RNA seq data reported here provide the first snapshot of comparative transcriptomics of E. chaffeensis. Sequencing of enriched bacterial RNA from wildtype and mutant strains yielded a high coverage of genes. A mutation in the ORF of ECH_0379 gene caused drastic down-regulation of genes leading to metabolic depression, which may have contributed to the mutant’s attenuation in vertebrate hosts. While a mutation downstream to the protein coding sequence of ECH_0490 gene induced global changes in gene expression, up regulation of stress response regulatory genes may have helped the mutant survive in the vertebrate hosts and tick hosts. A mutation within ECH_0660 gene coding sequence resulted in few transcriptional changes, thus keeping the integrity of its transcriptome similar to wildtype. While the transcriptome data are suggestive of protein expression variations, additional experimental validation from protein analysis studies is necessary to confirm the results. Together, this study offers the first detailed description of transcriptome data for E. chaffeensis, suggesting that variations observed in the pathogen’s ability to survive in a host and the host’s ability to induce protection against the pathogen may be the result of global changes in the gene expression, which in turn may impact changes in the pathogen’s proteome.

Materials and Methods

In vitro cultivation and cell-free E. chaffeensis recovery

E. chaffeensis Arkansas isolate wildtype and the mutants were grown in the canine macrophage cell line, DH8258,61. Isolation and purification of cell-free E. chaffeensis wildtype and its mutants were carried out as outlined in Fig. S5. Briefly, the bacterial infection rate in DH82 cells was assessed with Diff-Quik staining. After 72 h of infection, when the infection rate reached about 80–90%, the culture from four confluent T-150 flasks was harvested and centrifuged at 500 × g for 5 min. Cell pellets were resuspended in 1 × phosphate buffered saline (PBS) containing protease inhibitors (Roche, Indianapolis, IN), and cells were homogenized on ice with 15–20 strokes through a 23-gauge needle attached to a 10 mL syringe. Homogenization efficiency (80–90% lysis) was checked under a light microscope. The whole cell lysate was centrifuged at 500 × g for 5 min at 4 °C. The resulting supernatant containing cell-free Ehrlichia organisms was filtered through a 2 µm sterile membrane filter (Millipore, Billerica, MA). Cell-free Ehrlichia from the filtrate was pelleted by centrifugation at 15,000 × g for 15 min, and the pellet was suspended in PBS and layered onto 30% diatrizoate meglumine and sodium solution (Renografin) MD-76R (Mallinckrodt Inc, St. Louis, MO). The suspension was centrifuged for 1 h at 100,000 × g at 4 °C in a S50-ST swinging bucket rotor (Beckman, Indianapolis, IN). The pellet of cell-free Ehrlichia was washed by centrifugation at 15,000 × g for 15 min and used for experiments.

Bacterial mRNA enrichment and sequencing

Figure S6 outlines the workflow for bacterial mRNA enrichment, cDNA library preparation, and RNA sequencing. Briefly, RNA from wildtype and mutants was isolated from purified cell-free Ehrlichia using TRIzol Reagent (Sigma-Aldrich, St. Louis, MO). RNA samples were then treated with DNase I (Invitrogen, Carlsbad, CA), and bacterial RNA was enriched by removing host 18S rRNA, 28S rRNA, and polyadenylated mRNA using the MICROBEnrich Kit (Ambion, Foster City, CA). The quantity and integrity of bacterial RNA before and after enrichment were assessed using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). The Ribo-Zero Magnetic Kit was used to isolate mRNA from total RNA samples, and the mRNA was then fragmented into short fragments as per the manufacturer’s protocols (Epicentre, Madison, WI). Subsequently, cDNA was synthesized using the mRNA fragments as templates. Libraries of cDNAs for wildtype and mutants were prepared using the TruSeq RNA Sample Prep Kit (Illumina, Ingolstadt, Germany). Sample libraries were quantified using the Agilent 2100 Bioanalyzer, and library quality was assessed by real-time PCR (ABI StepOnePlus) prior to sequencing on an Illumina HiSeq™ 4000 (Beijing Genomics Institute (BGI), Philadelphia, PA).

Bioinformatics analysis

The original image data were converted into raw sequence data via base calling. Raw reads were subjected to quality assessment to determine whether they qualified for mapping (Fig. S5). Bases with low quality (<20) were excluded from the analysis. Raw reads were then filtered to remove adapter sequences and low quality reads, and the clean reads were aligned to the E. chaffeensis Arkansas strain complete genome, as per the first annotated GenBank accession CP000236.1, using SOAPaligner/SOAP262. We opted to use this accession number because our prior publications, and those of other investigators, have widely used it for referring to gene names and numbers. No more than five mismatches were allowed in the alignment, which is a standard cutoff for alignment analysis. The alignment data were used to calculate the distribution of reads on reference genes and to determine gene coverage. Alignment results were checked for quality and then used for DGE analysis. Gene expression levels were calculated using the RPKM (reads per kilobase per million mapped reads) method, which normalizes read counts for gene length and the total number of mapped reads63. We used p-value < 0.05, False Discovery Rate (FDR) ≤ 0.001, and an absolute value of Log2 Ratio ≥ 1 as the thresholds for judging significant differences in gene expression. The FDR, computed from the p-values, serves as the control measure for multiple comparisons in RNA seq data. Corrections for false positive and false negative errors were performed using the method described by Benjamini and Yekutieli64.
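
The expression and significance criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, the Benjamini and Yekutieli adjustment is the textbook step-up form rather than the exact routine run by the analysis software, and the filter assumes nonzero RPKM values in both samples.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_yekutieli(p_values):
    """Return adjusted p-values using the Benjamini-Yekutieli step-up procedure."""
    m = len(p_values)
    # Harmonic-sum factor that distinguishes Benjamini-Yekutieli from Benjamini-Hochberg.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_differentially_expressed(rpkm_mutant, rpkm_wildtype, p_value, fdr,
                                p_cut=0.05, fdr_cut=0.001, log2_cut=1.0):
    """Apply the three thresholds stated in the text (assumes nonzero RPKM values)."""
    log2_ratio = math.log2(rpkm_mutant / rpkm_wildtype)
    return p_value < p_cut and fdr <= fdr_cut and abs(log2_ratio) >= log2_cut
```

Under these assumptions, a gene counts as differentially expressed only when all three criteria hold at once, so a fourfold change with an FDR of 0.01 would still be excluded.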

Quantitative real-time reverse transcription PCR

SYBR Green detection-based quantitative real-time reverse transcription PCR (qRT-PCR) assays were performed to validate the gene expression changes observed in the RNA seq data analysis. The RNAs from wildtype and the ECH_0379, ECH_0490, and ECH_0660 mutants used in generating the RNA seq data were also used to determine transcript levels by qRT-PCR using a SuperScript® III Platinum SYBR Green One-Step qRT-PCR Kit (Invitrogen, Carlsbad, CA). RNA was reverse transcribed from all the replicates using SuperScript III, and quantitative PCRs were then performed in 25 μL reactions containing 0.5 μM each of forward and reverse primers. Thermal cycler conditions were 40 cycles of 94 °C for 15 sec, 60 °C for 30 sec, and 74 °C for 15 sec. Thirteen randomly selected differentially transcribed genes were used in validation experiments on a StepOnePlus™ Real-Time PCR instrument (Applied Biosystems, Foster City, CA), and the data were analyzed with StepOne Software v2.3. E. chaffeensis 16S rRNA was quantitated by real-time RT-PCR as described previously27 and used to normalize RNA concentrations among different RNA batches prior to performing the validation experiments. For qRT-PCR data, the delta-delta Ct (ΔΔCt) calculation was employed to determine the relative change in expression, and fold change was reported as the average of the replicate values with the standard error. Semi-quantitative one-step RT-PCR (Life Technologies, Carlsbad, CA) targeting the E. chaffeensis genes ECH_0490 and ECH_0492, near the transposon mutation downstream of the ECH_0490 gene, was performed with 30 cycles of amplification using gene specific primers as described in a previous study6. Briefly, RNA from wildtype and the ECH_0490 mutant was used as the template for RT-PCR. One tube without reverse transcriptase or template RNA was used as a negative control. One tube with DNA as the template was used as a positive control.
Thermal cycler conditions were as follows: reverse transcription at 50 °C for 1 h, followed by 35 cycles of 94 °C for 30 sec, 55 °C for 30 sec, and 72 °C for 30 sec, and a final 2-min extension at 72 °C.
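
The ΔΔCt calculation mentioned above for the qRT-PCR data can be sketched in Python as follows. This is an illustration rather than the StepOne Software computation: the function names and Ct values are hypothetical, amplification efficiency is assumed to be 100% (a doubling per cycle), and using 16S rRNA as the reference transcript in the ΔΔCt itself is an assumption, since the text states only that 16S rRNA was used to normalize RNA concentrations.

```python
import math
import statistics


def fold_change_ddct(ct_target_mutant, ct_ref_mutant, ct_target_wt, ct_ref_wt):
    """Relative expression of a target gene (mutant vs. wildtype) as 2^-(ddCt)."""
    delta_mutant = ct_target_mutant - ct_ref_mutant  # dCt in the mutant
    delta_wt = ct_target_wt - ct_ref_wt              # dCt in the wildtype
    # Assumes a perfect doubling of product per cycle.
    return 2.0 ** -(delta_mutant - delta_wt)


def mean_and_sem(values):
    """Mean and standard error of the mean across replicate fold changes."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    # Hypothetical replicate Ct values: (target mutant, reference mutant,
    # target wildtype, reference wildtype).
    replicates = [(24.0, 12.0, 22.0, 12.0),
                  (24.2, 12.1, 22.1, 12.0),
                  (23.9, 12.0, 22.0, 12.1)]
    folds = [fold_change_ddct(*r) for r in replicates]
    print(mean_and_sem(folds))
```

A fold change below 1 indicates lower transcript abundance in the mutant than in wildtype, and a value above 1 indicates higher abundance.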