Abstract
Protein expression in vivo is predominately controlled via regulatory feedback mechanisms that adjust the level of mRNA transcription. However for positive sense single-stranded RNA viruses, protein expression is often controlled via secondary structural elements, such as internal ribosomal entry sites, that are encoded within the mRNA. The self-regulation of mRNA translation observed in this class of viruses suggests that it may be possible to design mRNAs that self-regulate their protein expression, enabling the creation of mRNAs for vaccines and other synthetic biology applications where protein levels in the cell can be tightly controlled without feedback to a transcriptional mechanism. As a proof of concept, I design a polycistronic mRNA based on bacteriophage MS2, where the upstream gene is capable of repressing synthesis of the downstream gene. Using a computational tool that simulates ribosome kinetics and the co-translational folding of the mRNA in response, I show that mutations to the mRNA can be identified which enhance the efficiency of the translation and the repression of the downstream gene. The results of this study open up the possibility of designing bespoke mRNA gene circuits in which the amount of protein synthesised in cells are self-regulated for therapeutic or antigenic purposes.
Similar content being viewed by others
Introduction
Messenger RNA (mRNA) based therapeutics and vaccines present a promising new platform for the delivery of immunogenic compounds for vaccination or therapeutic proteins for disease treatment and management1,2,3,4. While there was recently great success with the rapid development of an mRNA based vaccine for SARS-Cov-2, obstacles still remain to the development of mRNA based therapies, which typically require much higher levels of the therapeutic proteins to be delivered to the target cell1. This has lead to the development of strategies such as self-amplifying mRNAs5,6,7, where replicase genes are also supplied on the mRNA, allowing it to undergo replication in the cell. However, it is conceivable that there may exist situations in mRNA based pharmacology where protein production in the cell needs to be tightly controlled as opposed to a continuous production of the protein in the cell to extreme levels. Such considerations highlight the need to develop a wide range of strategies for controlling protein expression in mRNA vaccines and therapeutics.
Techniques to regulate gene expression on mRNAs at the translational level have traditionally focused on the use of non-coding RNAs or riboswitches to engineer regulatory mechanisms that can either repress or promote gene expression from the mRNA. For example, trans-acting small RNA translational activators, termed toehold switches8, are a class of RNA based regulators of translation that can be designed to interact with mRNAs to promote exposure of the ribosome binding site (RBS), directly controlling protein expression. For example, Wang and Simmel have recently demonstrated that small RNAs can be used to both activate and repress translation by sequestering either anti-anti-RBS or anti-RBS sequences9. While Wang and Simmel have used toehold switches to regulate genes on an mRNA, some groups have used toehold based methods to design more complicated mRNA based logic circuits10, where expression is triggered by the presence of miRNAs. Attachment of these miRNAs to the mRNA expose the RBS, providing potential cell specific expression of the mRNA. In addition to toehold based methods, Hong et al.11 also employed the use of small transcriptional activating RNA (STAR)12,13,14, which target transcriptional activation by the RNA polymerase, to construct an incoherent feed-forward loop to control gene expression. Both the use of STAR and toehold switches rely on external small RNAs to exert regulatory control on protein expression, which may be difficult to employ broadly in mRNA based therapeutics.
In contrast to regulatory control of translation using small RNAs, bacteria and viruses provide many alternative examples of how RNA secondary structure within the mRNA, and its interactions with metabolites and proteins, can provide direct regulatory control of gene expression at the translational level. For example, the small positive sense single-stranded RNA (ssRNA) virus bacteriophage MS2 regulates the synthesis of its RNA dependant RNA polymerase (RdRp) gene via a feedback mechanism that tightly controls the amount of RdRp produced in relation to the amount of coat protein present. Specifically, coat protein dimers bind to a 19 nucleotide hairpin encompassing the start codon for the RdRp gene with nanomolar affinity. Thus, as coat protein concentration increases, binding of coat protein to the hairpin blocks further ribosome initiations on the RdRp gene15. Similar to phage MS2, bacterial mRNAs also have similar feedback mechanisms which regulate protein expression from the mRNA transcript and monitor free levels of the protein in the cell. For example, in the L11-L1 polycistronic mRNA in E. coli, as excess amounts of the large ribosomal protein L1 protein accumulates in cell, the L1 protein binds to the RBS of the L11 protein in the mRNA and slows expression of both proteins16. These examples provide an alternative strategy to that of trans-acting small RNAs for the purpose of regulating protein production in mRNAs, where feedback to a transcriptional control mechanism or control via small trans-acting RNAs is not feasible. However, it should be stated that the ability to apply these control mechanisms, which are typically present in bacteria, in a Eukaryotic system for mRNA therapeutics remains hypothetical for the moment.
In this work, I report on a proof of concept application where I have designed a polycystronic mRNA in which the protein production is self-regulated. Specifically, I have re-purposed the MS2 translation repression and coupling mechanism, which functions to regulate the level of the RdRp protein produced, to instead regulate the levels of the 19 kDa NanoLuc luciferase (NLuc) protein. The resulting polycistronic mRNA contains separate genes for the coat and NLuc proteins, with the NLuc protein being translationally coupled and repressed by the MS2 coat protein. Through the use of my computational tool which simulates the kinetics of ribosome translation and mRNA co-translation folding due to ribosome movements on the mRNA17, I am able to propose a series of mutations which enhance overall protein expression and the ability of MS2 coat protein to repress expression of NLuc via the binding of coat protein to a TR hairpin. Interestingly, the results also reveal the importance of alternative RNA secondary structures in the RBS and how these can kinetically compete to alter gene expression, providing important insights into the impact of RNA secondary structure on translation efficiency.
Methods
Reagents
Synthetic oligodeoxynucleotides were custom ordered from Eurofins Genomics and the pET-21a(+) cloning vector was supplied by Novagen (#69770). Restriction enzymes were obtained from Thermofisher (#FD0083, #FD0094). NanoGlo assay reagent was obtained from Promega (#N1110). Restriction digestion clean-up and plasmid isolation was performed using Thermofisher GeneJet PCR clean-up and miniprep kits (#K0701, #K0502). Mutagenesis of the plasmids was done using the NEB-Q5 mutagenesis kit New England Biolabs (#E0554S). All commercial enzymes and kits were used following their provided protocol and the manufacturers provided buffer(s) unless otherwise stated.
Biological resources
Competent E. coli DH5\(\alpha \) cells for cloning were obtained from Thermofisher (#EC0112) and competent E. coli BL21 (DE3) cells for expression were obtained from New England Biolabs (#C2427H).
Plasmid preparation and mutagenesis
Two DNA fragments encoding the CTnano-cnt and CTnano mRNAs were synthesised (GeneStrands, Eurofins Genomics, Cologne, Germany) with each containing the unique restriction sites BglII and Bpu1102. The CTnano-cnt mRNA sequence is based on bacteriophage MS2 coat gene (Genebank accession code NC001417) and a NanoLuc luciferase gene which has been codon optimised for expression in E. coli (Genebank accession code MN834152). The CTnano mRNA sequence is a synonymously re-coded version of CTnano-cnt designed for enhanced expression of NLuc. The exact sequences synthesised are given in supplementary data. These sequences were separately cloned into the pET-21a(+) vector (Novagen) using the restriction sites BglII/Bpu1102 resulting in the plasmids pET-CTnano and pET-CTnano-cnt. Production of the mRNAs in these plasmids are under the control of a T7 promoter and are terminated by a T7 terminator hairpin (c.f. Supplementary Fig. 1). Following creation of these plasmids, three mutations (N55D, T19*, and S37*) where separately introduced to the wild-type MS2 coat protein present in these mRNA constructs using the NEB-Q5 mutagenesis kit (New England Biolabs). The primer sets and PCR settings used are listed in supplementary information. Mutations were subsequently verified by Sanger sequencing of the plasmids (Eurofins Genomics) and all plasmids were grown and isolated using GeneJet plasmid miniprep (Thermofisher) from E. coli DH5\(\alpha \) cells (Thermofisher).
NanoLuc luciferase luminescence assay
New England Biolabs E. coli BL21 (DE3) cells containing one of the 8 plasmids used in this study were grown at \({37}\,\,^\circ \) C in 5 ml of LB broth supplemented with 100 \(\mu \)g/ml ampicillin. When cells reached an optical density \({A}_{600} \approx 0.6\), expression of the mRNA was induced using 0.1 mM isopropyl \(\beta \)-D-thiogalacoside (IPTG). After 30 min of incubation at \({37}\,\,^\circ \) C, the final \({A}_{600}\) of the culture was measured and 20 \(\upmu \)l of culture was diluted with \(80 \upmu \)l of fresh LB and added to 100 \(\upmu \)l of NanoGlo luciferase assay (Promega) consisting of NanoGlo lysis buffer supplemented with the provided furimazine substrate at a v/v ratio of 50:1. Measurements of the luminescence were performed on a BMG Labtech Flowstar OPTIMA 96 well plate reader with emission filter set to 460nm at time points of approximately 5, 15, 30, 45, and 60 min after addition of the NanoGlo assay reagent. Detection limits of the instrument were determined to range between \(2.1\times 10^2\) and \(2 \times 10^6\) (Arb. U.). Measurements were performed in triplicate using three different samples of culture and luminescence time courses were fit to the equation \(L(t) = a + b/(1+exp(-c(t-d)))\), where a, b, c and d are constants as detailed in Supplementary Figs. 5 and 6.
Computational prediction of mRNA structure
To compute putative secondary structures of both the CTnano-cnt and CTnano mRNAs, I assume that the sequence upstream of the TR stem-loop folds both co-transcriptionally and co-translationally into the native structure by the time the polymerase or ribosome reaches the TR stem-loop. Following this assumption, I enforce that the long-distance Min Jou interaction is in place and fix the secondary structure of the mRNA upstream of the TR stem-loop to that of the native structure of the MS2 coat gene, which has been determined experimentally using enzymatic probing and phylogenetic analysis18,19. The secondary structure of the remaining downstream sequence is determined from the minimum thermodynamic free energy fold computed using Turner energy rules20. The resulting structures are shown in Supplementary Figs. 2 and 3.
Computational design of CTnano mRNA for enhanced NLuc expression
Potential synonymous mutations to the NanoLuc luciferase gene which stabilise the TR hairpin and other long distance interactions are identified and tested using a three pronged approach. First, the minimum free energy structure of the mutated sequence is computed using standard mRNA structure prediction algorithms21. Second, the mutated sequence is folded for \(t=1\) second of time kinetically using KFOLD22 starting from a single-stranded state. Finally, the mutated sequence is tested for its ability to return the NLuc ribosome binding site back to its original structure during co-translational folding in the presence of ribosomes using my ribosome/folding kinetics model which simulates mRNA co-translational folding kinetics due to ribosome movement over the mRNA17. Satisfying these three tests insures that; (a) kinetic folding of the sequence is “funnelled” into the thermodynamic minimum free energy structure and, (b) that co-translational folding of the sequence does not trap any structures (particularly the ribosome binding site) into mis-folded states.
The overall procedure used to identify the CTnano mRNA sequence is as follows. First, a set of mutations are identified which forces the desired structure to have the thermodynamic minimum free energy. Next the sequence is tested using the ribosome folding model17 for co-translational folding of the mRNA and its ability to return the NLuc ribosome binding site back to its original structure. Subsequent mutations are then made in the first 100 nucleotides of the coding sequence which improve the co-translational folding properties of the NLuc ribosome binding site while also testing that these mutations do not alter the minimum free energy structure. Finally, KFOLD22 is used to check that the sequence folds kinetically into the thermodynamic minimum free energy structure starting from a single-stranded state. Although a total of 77 mutations to the CTnano-cnt were required, it should be noted that many of these were chosen to stabilise structures downstream from the NLuc ribosome binding site. This was done as a precaution to avoid the possibility that long distance interactions with these areas form with either the coat region (first 478 nucleotides of the CTnano mRNA sequence) or the NLuc ribosome binding site.
Computational simulation of gene expression
A variety of computational tools exist for the estimation of gene expression which take into account the secondary structure of the mRNA around the RBS and any potential interactions of the 30S subunit with the Shine-Dalgarno sequence. These include the RBS Designer23, the RBS Calculator24,25, and the UTR Designer26. While each of these tools are capable of taking a given mRNA sequence and predicting expression rates from the various potential start codons present in the sequence, they do not take into account the effects of translational repression or coupling.
More recently, Tian et al.27 have constructed a biophysical model of protein expression on mRNAs containing translational coupling. However, while their model considers both translation re-initiation events and de novo ribosome initiations on the coupled gene, it does not consider the effects of ribosome movement on the RNA secondary structure and the potential of a downstream gene’s RBS that is normally hidden to be exposed in the process. With regards to the modelling of translation repression, several groups28,29,30 have also examined and provided biophysical models of the MS2 coat protein TR repression system. Interestingly, Vezeau et al.30 have used the repression system in MS2 to construct artificial binary switches. In this example, the TR hairpin is used to detect the presence of MS2 coat protein by either exposing or hiding the RBS of a reporter fluorescent protein that has been expressed from a separate plasmid. In these binary switch systems expression of the reporter system is either completely on or completely off. While these models may seem suitable for the study that I report on here, the translational repression/coupling system in phage MS2 is a non-equilibrium time-dependent process. Thus, to understand the correct quantities of NLuc that will be synthesised, one needs to account for the time-dependent accumulation of coat protein which will slowly inhibit ribosome initiations on the TR stem loop as the expression of NLuc is transitioned from the on state to the off state.
Thus, while all of the biophysical models discussed above can provide an estimation of gene expression in a prokaryotic system, they are not capable of predicting gene expression on mRNAs containing both translational coupling and repression. More importantly, they are also unable to examine the non-equilibrium process of protein expression that needs to be accounted for here. For this reason, computational prediction of coat protein and nano luciferase expression of the CTNano-cnt and CTnano mRNAs where simulated using my ribosome/folding kinetics model17. This model incorporates a kinetic version of the Salis model24 and is capable of predicting gene expression in mRNAs where there are non-equilibrium effects on protein expression, such as the translational coupling/repression mechanism in MS2. Additionally it is also able to simulate the movement of ribosomes and the resulting co-translational folding of the mRNA of interest while simulating the entire translational process that would be occurring in the cellular ‘background’ of an exponentially growing E. Coli cell. For example, in an E. Coli cell with doubling time 60 min, my model accounts for translation from 15,000 ribosomes on \(\approx 1200\) cellular mRNAs31 and includes all known kinetic steps in the translation process for initiation, elongation, and termination32. In addition to taking into account the competition between mRNAs for available free ribosomes, the model also considers the competition of tRNAs for the A-site during decoding, and the recharging of ternary complex by Ef-Ts. Finally, the model also includes the ability of coat proteins to bind to TR-like stem loops in the mRNA and models the competition between ribosomes and coat proteins for the TR stem-loop. Modelling this competition is critical to correctly calculating when the Nano-luc gene will be fully repressed and will be dependent on how the concentration of coat protein increases over time. Further details on the parameter settings, the number of cellular mRNAs, and the cell growth rate, used by the the protein predicition model can be found in17. The code for the protein prediction model can be downloaded from https://www.github/edykeman/ribofold, while the details on the commands and files used to generate the predicted protein expression can be found in supporting information.
Results
Design of a synthetic polycistronic mRNA gene circuit based on bacteriophage MS2
The bacteriophage MS2 regulates the synthesis of its RNA dependent RNA polymerase (RdRp) gene via a combination of two mechanisms, translational coupling and translational repression. In the coupling mechanism, translation of the RdRp gene is dependent on translation of the upstream coat protein gene. Work by Van Duin and colleagues demonstrated that disruption of a long distance RNA secondary structure interaction, the Min Jou interaction33, by ribosomal movement over the coat cistron in the viral mRNA results in the opening of the ribosomal binding site (RBS) for the RdRp gene (c.f. Fig. 1a and b). The Min Jou interaction results from the intermolecular pairing of two sequence elements in the MS2 viral genome, the Min Jou sequence (nucleotides 1427-1433) and the anti-Min Jou-sequence (nucleotides 1738-1744). If this interaction is present, RdRp synthesis is shut off as shown in the schematic in Fig. 1a. However, if this interaction is absent or disrupted by ribosome movement over the coat cistron, then RdRp synthesis is turned on as shown in Fig. 1b. In the translational repression mechanism, translation of the RdRp gene is blocked by the binding of a coat protein dimer to the 19 nucleotide translational repressor (TR) stem-loop, which encompasses the start codon for the RdRp gene (Fig. 1b). It is important to note that this repression is independent of the presence or absence of the Min Jou interaction, which allows the virus to permanently block further RdRp synthesis once sufficient coat protein is present in the cell. Thus, the overall mRNA secondary structure of the MS2 coat and RdRp genes, as illustrated in Fig. 1, enables the synthesis of the RdRp protein to be regulated via feedback from the overall coat protein dimer concentration in the cell.
When constructing a polycistronic mRNA where the NLuc protein is translationally repressed/coupled to the MS2 coat protein, one could directly swap the sequence encoding the RdRp gene with the sequence encoding the NLuc gene (as depicted in Fig. 2). However, it is not clear if this will result in an overall change in the mRNA secondary structure such that the NLuc RBS and/or the Min Jou interaction are disrupted. I hypothesise that the overall secondary structure of the NLuc RBS, and its response to co-translational folding, will have substantial effects on the expression of the protein, thus making the choice of codons of the NanoLuc luciferase gene critical to efficient protein expression.
To test this hypothesis, I have first constructed the control mRNA (CTnano-cnt) which was created by directly replacing the RdRp gene in bacteriophage MS2 with an mRNA sequence encoding the NanoLuc luciferase protein. To ensure that translation of the codon sequence is optimal, I have used a NanoLuc luciferase protein which has been codon optimised for expression in E. coli (Genbank ID MN834152). The thermodynamic minimum free energy fold of this mRNA when the Min Jou interaction is present (Fig. 3a and Supplementary Fig. 2) suggests that the TR stem-loop is present, and thus should have similar expression of the nano luciferase protein to that of RdRp in phage MS2. However, analysis of the NLuc expression in CTnano-cnt using my ribosome/folding kinetics model17 reveals that the NLuc RBS can re-fold into a more stable hairpin (left hand side of Fig. 3b) either after ribosome disruption of the Min Jou interaction, or during co-translational folding of the NLuc RBS. The presence of this hairpin is predicted to substantially reduce protein expression as indicated by its energetic barrier to melting.
In order to prevent formation of this hairpin during co-translational folding, I have synonymously re-coded the NanoLuc luciferase gene in CTnano-cnt mRNA creating the CTnano mRNA (see “Methods” for procedure). The predicted secondary structure of the CTnano mRNA when the Min Jou interaction is present/absent (right hand side of Fig. 3a and b, respectively) shows that the minimum free energy fold now predicts the start codon being sequestered in the TR hairpin, which has a much lower energetic barrier to melting (c.f. \(\Delta G_m\) in Fig. 3b), and thus should result in higher expression levels of the NLuc protein.
Theoretical prediction of NLuc protein expression
In order to theoretically estimate the levels of NLuc that would be produced from the CTnano-cnt and CTnano mRNAs, one must take into account the competition between ribosomes and coat proteins for the TR stem loop and must also calculate the time dependence on the opening of the ribosome stand-by site for the NLuc gene as a result of the translational coupling between the Coat and NLuc genes. This means that the production rate of NLuc will depend on the production rate of coat protein and the speed of the ribosome movement over the coat gene (the coupling effect), as well as the amount of coat proteins present and their ability to compete with free ribosomes for the TR stem-loop (the repression effect). As discussed in “Methods”, my computational ribosome/folding kinetics model17 is able to account for both the translational coupling as well as the competition between ribosomes and coat proteins for the TR stem-loop. This is since it predicts the co-translational folding of the CTnano mRNA and the rate of movement of the ribosome over the coat gene while also considering the translation of all ribosomes that would be present in the cellular environment, allowing for a calculation of the competition between free ribosomes and coat proteins to be estimated.
Using the secondary structures predicted for the CTnano-cnt and CTnano mRNAs (Supporting Figs. 2 and 3, respectively), I simulate the protein expression from multiple copies of either the CTnano-cnt or CTnano mRNA in the presence of the \(\approx 1200\) background cellular mRNAs and 15000 active ribosomes that would be present in an exponentially growing E. coli cell. The CTnano-cnt and CTnano mRNAs are modelled as being continuously produced with rate \(\beta \) from a plasmid vector after induction with IPTG which occurs at time point \(t=0\). The production rate \(\beta =0.012\) \(\hbox {s}^{-1}\) gives the best fit to the experimental data (see below) and results in roughly 20 copies of the mRNA being present in the cell, on average, within 30 min of induction. Figure 4a and b show the results of coat protein and NLuc synthesis in both mRNAs over the course of 30 min. While the amount of coat protein produced in both cases are roughly identical as expected, the CTnano mRNA is predicted to produce approximately 4.6 times as much nano luciferase as the CTnano-cnt mRNA prior to repression by the coat protein.
In addition to the simulation where CTnano and CTnano-cnt mRNAs code for the wild-type MS2 coat protein, I also perform a protein expression simulation of the mRNAs where the MS2 coat protein has the N55D mutation present. This coat protein mutant has been shown to be incapable of binding to the TR stem-loop34. Thus, these mutant mRNAs should not be able to repress NLuc synthesis and hence represent the maximum protein production rate that is possible. The computer simulations in Fig. 4c and d show that the maximum production rate of NLuc in the CTnano mRNA is roughly 20 times that of the CTnano-cnt mRNA, indicating that the re-coded CTnano mRNA has enhanced NLuc protein expression compared with the CTnano-cnt mRNA.
Experimental measurement of NLuc protein expression
In order to validate the theoretical predictions of NLuc production in the CTnano-cnt and CTnano mRNAs, I have constructed the plasmids pET-CTnano-cnt and pET-CTnano, where expression of the mRNAs are under the control of the T7 promoter (see “Methods”). Small cultures of E. coli cells containing one of the plasmids were grown in LB media and the resulting luminescence from a sample of cell culture was measured with the NanoGlo assay (Premega) as detailed in “Methods”. The raw data was normalised by the final \(A_{600}\) of the culture and the data was fitted to a sigmodial function (see supplementary figs. 5 and 6). The best fit peak luminescence for both the CTnano and CTnano-cnt mRNAs, along with the peak luminescence from mutant pET-CTnano and pET-CTnano-cnt plasmids containing one of the three coat mutants (N55D, T19*, and S37*), are shown in Fig. 5. As can be seen, there is a clear and significant increase in expression of NLuc in the CTnano mRNA when compared with the CTnano-cnt mRNA. Table 1 gives the luminescence / \(A_{600}\) measurements for each of the mRNAs.
NLuc is translationally coupled to MS2 coat expression
To demonstrate the theoretically predicted translational coupling in the CTnano and CTnano-cnt mRNAs, nonsense mutations can be introduced to the coat protein before and after the Min Jou sequence. It has been previously shown that nonsense mutations introduced prior to the Min Jou sequence prevents full expression of the MS2 RdRp protein, while nonsense mutations after increase RdRp expression33. Following these experiments that were done to demonstrate translational coupling in phage MS2, I have constructed the plasmids pET-CTnano-M3, which has the nonsense mutation T19* before Min Jou and pET-CTnano-M7, which has the alternative nonsense mutation S37* after Min Jou. Supporting Fig. 4 gives a cartoon diagram of the positions of these stop codons, relative to the Min Jou long-distance interaction. Table 1, and Fig. 5 show that for both CTnano and CTnano-cnt mRNAs, the T19* mutation results in a similar level of NLuc production compared to wild-type coat protein. Since the 19 amino acid N-terminal fragment of the coat protein that is produced by the T19* mutant is unable to bind to the TR stem-loop, NLuc expression would be expected to be at similar protein expression levels seen in the N55D mutant if the NanoLuc start codon was not coupled to the translation of the coat gene. The translational coupling of NLuc to coat protein is further supported by the S37* mutant, which results in much higher expression of NLuc in both CTnano and CTnano-cnt mRNAs due to disruption of the Min Jou interaction. Since the S37* mutant is also unable to bind the TR stem-loop, and disruption of the Min Jou interaction has occurred, we should expect that this mutant has NLuc expression greater then that of the wild-type. It should be noted that if translational coupling is working perfectly then we should expect that the relative luminescence of the mutants are T19* \(\le \) WT \(\le \) S37* \(\le \) N55D. However, the experiments reported here show that the T19* mutant has a luminescence roughly equal to or just slightly higher then that of the wild-type, slightly contradicting this ordering (cf. Fig. 5). One possible explanation for this observation is that since the Min Jou interaction is only \(\approx -8\) kcal/Mol, thermal fluctuations allow periodic disruption of the interaction allowing ribosomes to occasionally initiate on the NLuc gene. If translational coupling was not present, then we would expect to see experimentally that both the T19* and S37* mutants have similar luminescence to the N55D mutant, which is not the case here. This explanation, taken together with the experimental results, suggests that the NLuc gene is translationally coupled to the upstream MS2 coat protein.
NLuc is translationally repressed by MS2 coat protein
Finally, in order to demonstrate experimentally that NLuc protein production is also repressed by MS2 coat protein in the CTnano and CTnano-cnt mRNAs, I have constructed the plasmid pET-CTnano-M2 which contains the N55D mutation to the coat protein. This coat protein mutant was previously shown to have essentially no affinity for the TR hairpin34, and thus will be unable to bind to the TR stem-loop and block initiation of the ribosome on the NLuc start codon. This is clearly demonstrated in the data for both the CTnano-cnt and CTnano mRNAs, which both have a dramatic increase in NLuc production, with a larger increase seen for the CTnano mRNA. Specifically, CTnano-M2 mRNA produces roughly 22 times as much NLuc when compared with CTnano containing the WT coat protein. Likewise, CTnano-cnt-M2 mRNA produces 6.8 times as much NLuc when compared with CTnano-cnt mRNA (cf. Fig. 5 and Table 1). These experimental results are consistent with NLuc production in these mRNAs being suppressed by MS2 coat protein.
Comparison of theoretical and experimental measurements
The experiment measures the intensity of light produced by the NanoLuc enzyme for a given sample of culture while the theoretical calculations predict number of NLuc proteins produced. To compare with experiment, I assume that NanoLuc enzyme follows Michaelis-Menten kinetics and that light production will be directly proportional to the number of enzymes. Thus, experimental measurements can be compared to theoretical ones by examining the ratios. In order to fit the model to the experimental data only two parameters need to be adjusted; the rate of mRNA synthesis from the plasmid (\(\beta \)), and the size of the footprint of the 30S pre-initiation complex (30S:PIC) on the mRNA during initiation. These are the only parameters which require adjusting in the model as the remaining parameters for ribosome kinetics have been fitted from experimental measurements and for RNA folding from Turner energy parameters17,20,32.
It should be noted that it is very difficult to theoretically predict the rate of mRNA production from the plasmid as this will depend on the concentration of T7 polymerase in the cell, the number of copies of the plasmid, and mRNA degradation dynamics, amongst other factors. From a simplified perspective, the value of \(\beta \) essentially adjusts the total amount of NLuc produced in a certain time by the N55D mutant, as this mutant will be unable to repress NLuc synthesis. Thus, as \(\beta \) increases, there is a corresponding increase in the ratio of NLuc produced by the N55D coat mutant to that of the WT where the amount of NLuc will be “capped” due to repression by the coat protein. I have adjusted \(\beta \) such that the ratio of NLuc produced by the N55D mutant to that of the WT roughly matches the experimental observations (c.f. ratios in the bottom row of Fig. 6). This results in a value of \(\beta =0.012\) \(\hbox {s}^{-1}\), which corresponds to a theoretical prediction of roughly 20 mRNAs being present in a single cell (on average) 30 min post induction.
The size of the 30S:PIC footprint on the mRNA during initiation determines how much of the RNA needs to become single stranded to expose the 30S:PIC binding site. Thus, if the binding site is sequestered in a more stable RNA secondary structure, the ribosome will take longer to initiate translation resulting in reduced expression of the protein. Again from a simple perspective, altering the footprint will essentially change the amount of the alternative hairpin (present in CTnano-cnt) that requires melting, and will thus impact the ratio of NLuc produced in CTnano-cnt mRNA verses the amount from the CTnano mRNA. I have found that the ideal footprint size during 30S:PIC interaction with the RBS corresponds to + 6 (− 14) nucleotides 3’ (5’) from the A nucleotide in the start codon. This footprint is consistent with the larger footprint of the full ribosome which was estimated by an RNAase protection assay on the phage MS2 lysis gene35. This gives the best fit to the experimental NLuc protein ratios in CTnano-cnt verses CTnano mRNAs (c.f. ratios in the right hand column of Fig. 6). Interestingly, this footprint corresponds to the amount of mRNA needed to expose an ideally positioned Shine-Dalgarno sequence through to the A site codon. From a comparison to experiment, it is clear that the theoretical melting time for the CTnano-cnt hairpin needs to be slightly longer, or the TR hairpin slightly faster, in order to better match the experimental results. The source of this error is likely due to two possibilities. One possibility is that there could be slight errors in the Turner nearest neighbour model for computing RNA base-pair stacking energies, which would impact on the predicted \(\Delta G_m\) values and hairpin melting times. Alternatively, there could be inherent errors from using a breadth-first search or greedy algorithm to identify transition paths17,36, which are used in the model to compute the estimated energy barriers (i.e. \(\Delta G_m\)) to RNA melting. However, the ability of the model to get the correct qualitative behaviour of the protein expression demonstrates the importance of considering the effects of ribosome movement and competition with other cellular proteins when translational repression is present.
Discussion
Previous work of David Peabody has demonstrated that MS2 coat protein can be used to repress expression of the RdRp protein on a separate monocystronic mRNA in E coli cells in order to screen for coat protein mutants incapable of binding the TR stem-loop37. However, to my knowledge, the possibility of designing a regulatory system on a single polycistronic mRNA in which the MS2 coat protein represses production of a downstream gene other then MS2 RdRp has not been explored. Here I have demonstrated as a proof of concept that such artificial gene regulatory mRNAs can be constructed. Moreover, I have also shown that a stochastic computational model that takes into account detailed steps of ribosome kinetics17, can be utilised simulate translational coupling/repression, predict protein synthesis rates, and design mutations which enhance protein expression.
One of the consequences of translational repression mechanisms, such as that observed in bacteriophage MS2 and the synthetic mRNA constructs created here, is that full repression of the gene is delayed due to biding competition between coat dimers and the ribosome for the TR stem-loop which encompasses the start codon. Experimental binding curves from both Peabody and later Lago et al.34,38 measure a binding affinity of MS2 coat protein for the TR stem-loop to be around \(K_d=1\) nM. In the E. coli cell, this concentration corresponds to roughly a single coat protein dimer, assuming an average cell volume of roughly 1 fl = 1 \(\upmu \) \(\hbox {m}^3\). Hence one should expect the mRNA to be bound equally as often as it is unbound after synthesis of a single coat protein dimer. However, theoretical calculations reveal that NLuc synthesis continues at the initial synthesis rate until approximately 50 coat protein dimers have been synthesised. This suggests that the TR stem-loop and NLuc RBS is mostly ribosome bound until much higher concentration of coat protein have been achieved to suppress ribosome binding. This highlights the importance of modelling the competition between ribosomes and other cellular proteins for binding to mRNAs when trying to predict protein expression. Translational repression is not unique to phage MS2 and has been observed, for example, in some aminoacyl tRNA synthetases (e.g. metRS thrRS) which have also been postulated to bind to their mRNAs in order to repress protein expression and regulate aminoacyl tRNA synthetase levels in bacteria39, and the L11-L1 messenger RNA16.
Applying my ribosome/folding kinetic model to CTnano-cnt mRNA translational predicts that co-translational folding of the NLuc RBS after ribosome initiation can result in the formation of two separate hairpin structures, one in which the TR stem-loop is present, and a second more stable hairpin which also sequesters the start codon (c.f. Fig. 4 and Supplementary Fig. 7). Moreover, the energy barrier to transitioning from the TR stem-loop into the alternative hairpin is predicted to be \(\Delta G_b = 13.8\) kcal/mol, while the reverse transition is substantially higher (\(\Delta G_b = 18.1\) kcal/mol), suggesting the alternative hairpin is a kinetic trap (seeSupplementary Fig. 7). Thus, if this alternative hairpin structure is present in the RBS of CTnano-cnt mRNA, coat protein will be unable to bind to this hairpin and the ribosome will continue to express NanoLuc, although at a lower rate due to the higher stability of the hairpin. This also implies that the MS2 coat protein will have lower ability to repress NLuc expression in CTnano-cnt mRNA verses the re-coded CTnano mRNA due to the TR hairpin potentially being re-folded into the alternative hairpin. This is clearly apparent in both the experimental data and theoretical calculations (Figs. 5 and 6) which shows that, when wild-type MS2 coat protein is produced by the mRNA, NLuc expression reduces 22 fold in CTnano mRNA compared to the coat mutant N55D. However this reduction is only 7 fold in CTnano-cnt, demonstrating a reduced ability of this mRNA to repress NLuc production. Moreover, the presence of this alternative and more stable hairpin structure reduces the ability of the ribosome to melt the hairpin and access the start codon, resulting in an 8 fold reduction in NLuc expression when wild-type coat protein is produced, or a 25 fold reduction in expression when the N55D coat protein mutant is produced.
The differences observed in expression between the CTnano-cnt and CTnano mRNAs suggests it should be possible to alter the amount of protein produced to a desired level. For the example system here this could be done in two ways. One way is by altering the stability of the TR hairpin to make it more stable, reducing the rate of ribosome initiations. The second is by changing the affinity of TR for the coat protein, delaying the time point at which coat protein can suppress ribosome initiations at the NLuc start codon. This procedure should be applicable to most genes, so long as the space of synonymous mutations in the gene of interest is sufficient to alter and/or stabilize the RNA structure of the RBS. However, it may also be possible to extend the N-terminus of the protein and increase the mutational space available for RBS alteration if the protein of interest is functionally robust to such extensions.
Finally, I note that based on the experimental measurements which shows an increasing production of NLuc when nonsense mutations are introduced after the Min Jou sequence, it appears that NLuc is translational coupled to the upstream coat gene in both the CTnano-cnt and CTnano mRNAs. As discussed in the Results, if there were no translational coupling, and the ribosome had unrestricted access to the NLuc RBS, then it should be expected that expression of NLuc would be similar to that of the N55D mutant. However, this coupling is slightly leaky as the T19* mutant is still able to express a similar amount of NLuc to that of the wild-type coat protein. A simple explanation for this observation is that the Min Jou interaction is able to be disrupted by random thermal fluctuations, given that its melting temperature is on the order of \(\Delta G_m = 8.7\) kcal/mol. Currently this can not be simulated in the ribosome folding model as initiation is currently blocked from occurring at internal sites, such as those hidden by the Min Jou interaction. Additional work to the model in the future will hopefully enable such features to be simulated. Despite these small needed improvements, my computational ribosome/folding kinetics model17 in its current form presents a potentially powerful tool for the re-coding of mRNAs to enable both better protein production and the incorporation of advanced features such as built in regulatory mechanisms that control protein synthesis.
Data availability
Software for computing ribosome translation and mRNA co-translational folding is available on github.com/edykeman/ribofold. Computational input files used by the program can be found in supporting information.
References
Rohner, E., Yang, R., Foo, K. S., Goedel, A. & Chien, K. R. Unlocking the promise of mRNA therapeutics. Nat. Biotechnol. 40(11), 1586–1600 (2022).
Qin, S. et al. mRNA-based therapeutics: Powerful and versatile tools to combat diseases. Signal Transduct. Target. Ther. 7(1), 166 (2022).
Hincer, A., Ahan, R. E., Aras, E. & Seker, U. O. S. Making the next generation of therapeutics: mRNA meets synthetic biology. ACS Synth. Biol. 12(9), 2505–2515 (2023).
Wang, Y.-S. et al. mRNA-based vaccines and therapeutics: An in-depth survey of current and upcoming clinical applications. J. Biomed. Sci. 30(1), 84 (2023).
Bloom, K., van den Berg, F. & Arbuthnot, P. Self-amplifying RNA vaccines for infectious diseases. Gene Ther. 28(3–4), 117–129 (2021).
Papukashvili, D. et al. Self-amplifying RNA approach for protein replacement therapy. Int. J. Mol. Sci. 23(21), 12884 (2022).
Comes, J. D., Pijlman, G. P. & Hick, T. A. Rise of the RNA machines-self-amplification in mRNA vaccine design. Trends Biotechnol. 2023, 85 (2023).
Green, A. A., Silver, P. A., Collins, J. J. & Yin, P. Toehold switches: De-novo-designed regulators of gene expression. Cell 159(4), 925–939 (2014).
Wang, T. & Simmel, F. C. Riboswitch-inspired toehold riboregulators for gene regulation in Escherichia coli. Nucleic Acids Res. 50(8), 4784–4798 (2022).
Karagiannis, P., Fujita, Y. & Saito, H. RNA-based gene circuits for cell regulation. Proc. Jpn. Acad. Ser. B 92(9), 412–422 (2016).
Hong, S. et al. Design and evaluation of synthetic RNA-based incoherent feed-forward loop circuits. Biomolecules 11(8), 1182 (2021).
Chappell, J., Westbrook, A., Verosloff, M. & Lucks, J. B. Computational design of small transcription activating RNAs for versatile and dynamic gene regulation. Nat. Commun. 8(1), 1051 (2017).
Chappell, J., Watters, K. E., Takahashi, M. K. & Lucks, J. B. A renaissance in RNA synthetic biology: New mechanisms, applications and tools for the future. Curr. Opin. Chem. Biol. 28, 47–56 (2015).
Glasscock, C. J. et al. Dynamic control of gene expression with riboregulated switchable feedback promoters. ACS Synth. Biol. 10(5), 1199–1213 (2021).
Peabody, D. Role of the coat protein-RNA interaction in the life cycle of bacteriophage MS2. Mol. Gen. Genet. MGG 254(4), 358–364 (1997).
Cole, J. R. & Nomura, M. Translational regulation is responsible for growth-rate-dependent and stringent control of the synthesis of ribosomal proteins L11 and L1 in Escherichia coli. Proc. Natl. Acad. Sci. 83(12), 4129–4133 (1986).
Dykeman, E. C. Modelling ribosome kinetics and translational control on dynamic mRNA. PLoS Comput. Biol. 19(1), e1010870 (2023).
Groeneveld, H. Secondary structure of bacteriophage MS2 RNA: Translational control by kinetics of RNA folding, PhD thesis (1997).
Olsthoorn, R. C. L. Structure and evolution of RNA phages, PhD thesis (1996).
Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288(5), 911–940 (1999).
Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13), 3406–3415 (2003).
Dykeman, E. C. An implementation of the Gillespie algorithm for RNA kinetics with logarithmic time update. Nucleic Acids Res. 43(12), 5708–5715 (2015).
Na, D. & Lee, D. RBSDesigner: Software for designing synthetic ribosome binding sites that yields a desired level of protein expression. Bioinformatics 26(20), 2633–2634 (2010).
Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27(10), 946–950 (2009).
Salis, H. M. The ribosome binding site calculator. In Methods in Enzymology, Vol. 498 19–42 (Elsevier, 2011).
Seo, S. W. et al. Predictive design of mRNA translation initiation region to control prokaryotic translation efficiency. Metab. Eng. 15, 67–74 (2013).
Tian, T. & Salis, H. M. A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons. Nucleic Acids Res. 43(14), 7137–7151 (2015).
Katz, N. et al. Synthetic 5’ UTRs can either up-or downregulate expression upon RNA-binding protein binding. Cell Syst. 9(1), 93–106 (2019).
Katz, N. et al. Overcoming the design, build, test bottleneck for synthesis of nonrepetitive protein-RNA cassettes. Nat. Commun. 12(1), 1576 (2021).
Vezeau, G. E., Lipika, R. G. & Salis, H. M. Automated design of protein-binding riboswitches for sensing human biomarkers in a cell-free expression system. Nat. Commun. 14(1), 2416 (2023).
Bremer, H. & Dennis, P. P. Modulation of chemical composition and other parameters of the cell at different exponential growth rates. Ecosal Plus 2013, 85. https://doi.org/10.1128/ecosal.5.2.3 (2008).
Dykeman, E. C. A stochastic model for simulating ribosome kinetics in vivo. PLoS Comput. Biol. 16(2), e1007618 (2020).
Berkhout, B. & van Duin, J. Mechanism of translational coupling between coat protein and replicase genes of RNA bacteriophage MS2. Nucleic Acids Res. 13(19), 6955–6967 (1985).
Peabody, D. S. The RNA binding site of bacteriophage MS2 coat protein. EMBO J. 12(2), 595–600 (1993).
Berkhout, B. et al. Lysis gene of bacteriophage MS2 is activated by translation termination at the overlapping coat gene. J. Mol. Biol. 195(3), 517–524 (1987).
Voss, B., Meyer, C. & Giegerich, R. Evaluating the predictability of conformational switching in RNA. Bioinformatics 20(10), 1573–1582 (2004).
Peabody, D. S. Translational repression by bacteriophage MS2 coat protein expressed from a plasmid. A system for genetic analysis of a protein-RNA interaction.. J. Biol. Chem. 265(10), 5684–5689 (1990).
Lago, H., Parrott, A. M., Moss, T., Stonehouse, N. J. & Stockley, P. G. Probing the kinetics of formation of the bacteriophage MS2 translational operator complex: Identification of a protein conformer unable to bind RNA. J. Mol. Biol. 305(5), 1131–1144 (2001).
Levi, O., Garin, S. & Arava, Y. RNA mimicry in post-transcriptional regulation by aminoacyl tRNA synthetases. Wiley Interdiscipl. Rev.: RNA 11(2), e1564 (2020).
Acknowledgements
I would like to thank Dr. Alex Borodavka, University of Cambridge, and Prof. Alfred Antson, University of York, for graciously providing lab space for me to perform my experiments.
Author information
Authors and Affiliations
Contributions
E.D. conceived the experiment(s), conducted the experiment(s), performed the computational simulations, analyzed the results, and wrote and reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dykeman, E.C. Design of a self-regulating mRNA gene circuit. Sci Rep 14, 19421 (2024). https://doi.org/10.1038/s41598-024-70363-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-70363-0
- Springer Nature Limited