RNA folding kinetics using Monte Carlo and Gillespie algorithms

Clote, Peter; Bayegan, Amir H.

doi:10.1007/s00285-017-1169-7

RNA folding kinetics using Monte Carlo and Gillespie algorithms

Published: 05 August 2017

Volume 76, pages 1195–1227, (2018)
Cite this article

Journal of Mathematical Biology Aims and scope Submit manuscript

632 Accesses
1 Citation
2 Altmetric
Explore all metrics

Abstract

RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ($\approx $20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to $\langle N \rangle $ times that of the Gillespie algorithm, where $\langle N \rangle $ denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by $\langle N \rangle $; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence—see http://bioinformatics.bc.edu/clote/RNAexpNumNbors.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price includes VAT (Germany)

Instant access to the full article PDF.

Institutional subscriptions

RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory

Article Open access 19 October 2016

RNA Folding: Structure Prediction, Folding Kinetics and Ion Electrostatics

RNA secondary structures in a polymer-zeta model how foldings should be shaped for sparsification to establish a linear speedup

Article 23 May 2015

Notes

Definitions of Monte Carlo algorithm, Gillespie algorithm, detailed balance, mean first passage time, master equation, etc. are given later in the paper.
Each step of Algorithms 2, 3 and 4 involves a transition from a current state to a distinct state; however, each step of the time-driven Monte Carlo Algorithm 1 does not necessarily involve a transition to a new state, especially if the current state x has low energy, so that $p_{x,x}$ may be large. For this reason, we use the term K-transition trajectory.
Recall that secondary structures may be represented as a set of base pairs.
ME stands for the Master Equation, meaning the computation of the time-dependent population occupancy vector $\mathbf{p}(t)$ by solution of the master equation $\frac{d \mathbf{p}(t)}{dt} = \mathbf{p}(t) \cdot Q$, where Q is the rate matrix. MC stands for Monte Carlo simulation, and MMC stands for Map-based Monte Carlo simulation, a method inspired by probabilistic road map methods from robotics, where a constant k many closest neighboring secondary structures are selected for each RNA secondary structure from a sampled collection of modest size. Subsequently Monte Carlo simulation is applied to the resulting network, which is much smaller than the network of all secondary structures. The correctness of the authors’ assertion is due uniquely to the fact that the roadmap sampling network is k-regular.

References

Abkevich VI, Gutin AM, Shakhnovich EI (1996) How the first biopolymers could have evolved. Proc Natl Acad Sci USA 93(2):839–844
Article Google Scholar
Abkevich VI, Gutin AM, Shakhnovich EI (1997) Computer simulations of prebiotic evolution. Pac Symp Biocomput 27–38
Anfinsen CB (1973) Principles that govern the folding of protein chains. Science 181:223–230
Article Google Scholar
Aviram I, Veltman I, Churkin A, Barash D (2012) Efficient procedures for the numerical simulation of mid-size RNA kinetics. Algorithms Mol Biol 7(1):24
Article Google Scholar
Bowman GR (1084) A tutorial on building markov state models with MSMBuilder and coarse-graining them with BACE. Methods Mol Biol 141–158:2014
Google Scholar
Bowman GR, Huang X, Pande VS (2009) Using generalized ensemble simulations and Markov state models to identify conformational states. Methods 49(2):197–201
Article Google Scholar
Bowman GR, Pande VS (2010) Protein folded states are kinetic hubs. Proc Natl Acad Sci USA 107(24):10890–10895
Article Google Scholar
Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the energy landscape of protein folding: a synthesis. Proteins 21(3):167–195
Article Google Scholar
Choi HM, Beck VA, Pierce NA (2014) Multiplexed in situ hybridization using hybridization chain reaction. Zebrafish 11(5):488–489
Article Google Scholar
Clote P (2015) Expected degree for RNA secondary structure networks. J Comput Chem 36(2):103–17
Article Google Scholar
Clote P, Backofen R (2000) Computational molecular biology: an introduction. Wiley, Hoboken
MATH Google Scholar
Clote P, Bayegan A (2015) Network properties of the ensemble of RNA structures. PLoS ONE 10(10):e0139476
Article Google Scholar
Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P (2015) Complete RNA inverse folding: computational design of functional hammerhead ribozymes. Nucleic Acids Res 42(18):11752–11762
Article Google Scholar
Dykeman EC (2015) An implementation of the Gillespie algorithm for RNA kinetics with logarithmic time update. Nucleic Acids Res 43(12):5708–5715
Article Google Scholar
Esmaili-Taheri A, Ganjtabesh M (2015) ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinform 16(1):20
Article Google Scholar
Feller W (1968) An introduction to probability theory and its applications, vol 1, 3rd edn. Wiley, New York
Flamm C, Fontana W, Hofacker IL, Schuster P (2000) RNA folding at elementary step resolution. RNA 6:325–338
Article Google Scholar
Fusy E, Clote P (2012) Combinatorics of locally optimal RNA secondary structures. J Math Biol 68(1–2):341–75
Garcia-Martin JA, Clote P, Dotu I (2013) RNAiFold: A constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol 11(2):1350001. doi:10.1142/S0219720013500017
Article Google Scholar
Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR, Bateman A (2011) Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res 39(Database):D141–D145
Article Google Scholar
Geis M, Flamm C, Wolfinger MT, Tanzer A, Hofacker IL, Middendorf M, Mandl C, Stadler PF, Thurner C (2008) Folding kinetics of large RNAs. J Mol Biol 379(1):160–173
Article Google Scholar
Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(403):403–434
Article MathSciNet Google Scholar
Harrigan MP, Sultan MM, Hernandez CX, Husic BE, Eastman P, Schwantes CR, Beauchamp KA, McGibbon RT, Pande VS (2017) MSMBuilder: statistical models for biomolecular dynamics. Biophys J 112(1):10–15
Article Google Scholar
Hastings WK (1970) Monte carlo sampling methods using markov chains and their applications. Biometrika 57(1):97–109
Article MathSciNet MATH Google Scholar
Huang X, Yao Y, Bowman GR, Sun J, Guibas LJ, Carlsson G, Pande VS (2010) Constructing multi-resolution Markov State Models (MSMs) to elucidate RNA hairpin folding mechanisms. Pac Symp Biocomput 228–239
Huynen M, Gutell R, Konings D (1997) Assessing the reliability of RNA folding using statistical mechanics. J Mol Biol 267(5):1104–1112
Article Google Scholar
Tinoco I Jr, Schmitz M (2000) Thermodynamics of formation of secondary structure in nucleic acids. In: Di Cera E (ed) Thermodynamics in biology. Oxford University Press, Oxford, pp 131–176
Google Scholar
Isambert H, Siggia ED (2000) Modeling RNA folding paths with pseudoknots: application to hepatitis $delta$ virus ribozyme. Proc Natl Acad Sci USA 97(12):6515–6520
Article Google Scholar
Juhling F, Morl M, Hartmann RK, Sprinzl M, Stadler PF, Putz J (2009) tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res 37(Database):D159–D162
Article Google Scholar
Klemm K, Flamm C, Stadler PF (2008) Funnels in energy landscapes. Eur Phys J B 63(3):387–391
Levinthal C (1968) Are there pathways for protein folding? Journal de Chimie Physique 65:44–45
Article Google Scholar
Levy RM, Dai W, Deng NJ, Makarov DE (2013) How long does it take to equilibrate the unfolded state of a protein? Protein Sci 22(11):1459–1465
Article Google Scholar
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) Viennarna package 2.0. Algorithms Mol Biol 6:26
Article Google Scholar
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092
Article Google Scholar
Meyer CD (1975) The role of the group inverse in the theory of finite Markov chains. SIAM Rev 17(46):443–464
Article MathSciNet MATH Google Scholar
Moulton V, Zuker M, Steel M, Pointon R, Penny D (2000) Metrics on RNA secondary structures. J Comput Biol 7:277–292
Article Google Scholar
Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, Finn RD (2015) Rfam 12.0: updates to the RNA families database. Nucleic Acids Res 43(Database):D130–D137
Article Google Scholar
Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11):6309–6313
Article Google Scholar
Pande VS, Beauchamp K, Bowman GR (2010) Everything you wanted to know about Markov State models but were afraid to ask. Methods 52(1):99–105
Article Google Scholar
Šali A, Shakhnovich E, Karplus M (1994a) How does a protein fold? Nature 369(6477):248–251
Article Google Scholar
Šali A, Shakhnovich E, Karplus M (1994b) Kinetics of protein folding: a lattice model study of the requirements for folding to the native state. J Mol Biol 235:1614–1636
Article Google Scholar
Schmitz M, Steger G (1996) Description of RNA folding by “simulated annealing”. J Mol Biol 255(1):254–266
Article Google Scholar
Senter E, Clote P (2015) Fast, approximate kinetics of RNA folding. J Comput Biol 22(2):124–144
Article MathSciNet MATH Google Scholar
Senter E, Sheikh S, Dotu I, Ponty Y, Clote P (2012) Using the fast Fourier transform to accelerate the computational search for RNA conformational switches. PLoS ONE 7(12):e50506
Article Google Scholar
Senter E, Dotu I, Clote P (2015) RNA folding pathways and kinetics using 2D energy landscapes. J Math Biol 70(1–2):173–196
Article MathSciNet MATH Google Scholar
Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math 26:261–272
Article MathSciNet MATH Google Scholar
Swope WC, Pitera JW, Suits F (2010) Describing protein folding kinetics by molecular dynamics simulations. 1. Theory. J Phys Chem B 108(21):6571–6581 :228-39
Article Google Scholar
Taneda A (2011) MODENA: a multi-objective RNA inverse folding. Adv Appl Bioinform Chem 4:1–12
Google Scholar
Tang X, Thomas S, Tapia L, Amato NM (2007) Tools for simulating and analyzing RNA folding kinetics.In: Speed T, Huang H (eds) Research in computational molecular biology. RECOMB 2007. Lecture notes in computer science, vol 4453. Springer, Berlin, Heidelberg, pp 268–282
Tang X, Thomas S, Tapia L, Giedroc DP, Amato NM (2008) Simulating RNA folding kinetics on approximated energy landscapes. J Mol Biol 381(4):1055–1067
Article Google Scholar
Turner DH, Mathews DH (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res 38(Database):D280–D282
Article Google Scholar
Weber JK, Pande VS (2011) Characterization and rapid sampling of protein folding Markov state model topologies. J Chem Theory Comput 7(10):3405–3411
Article Google Scholar
Wolfinger M, Svrcek-Seiler WA, Flamm C, Stadler PF (2004a) Efficient computation of RNA folding dynamics. J Phys A Math Gen 37:4731–4741
Article MathSciNet MATH Google Scholar
Wolfinger MT, Svrcek-Seiler WA, Flamm C, Hofacker IL, Stadler PF (2004b) Efficient folding dynamics of RNA secondary structures. J Phys A Math Gen 37:4731–4741
Article MATH Google Scholar
Wuchty S, Fontana W, Hofacker IL, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49:145–165
Article Google Scholar
Xu X, Yu T, Chen SJ (2016) Understanding the kinetic mechanism of RNA single base pair formation. Proc Natl Acad Sci USA 113(1):116–121
Article Google Scholar
Zadeh JN, Wolfe BR, Pierce NA (2011) Nucleic acid sequence design via efficient ensemble defect optimization. J Comput Chem 32(3):439–452
Article Google Scholar
Zhang W, Chen SJ (2002) RNA hairpin-folding kinetics. Proc Natl Acad Sci USA 99(4):1931–1936
Article Google Scholar
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148
Article Google Scholar
Zuker M, Sankoff D (1984) RNA secondary structures and their prediction. Bull Biol 46(4):591–621
Article MATH Google Scholar

Download references

Acknowledgements

We would like to thank anonymous referees for helpful remarks. This research was begun during a visit to California Institute of Technology, Free University of Berlin and the Max Planck Institute for Molecular Genetics. For discussions, P.C. would like to thank Frank Noé, Knut Reinert, Martin Vingron and Marcus Weber (Berlin) and Niles Pierce, Erik Winfree (Pasadena). The research was funded by a Guggenheim Fellowship, Deutscher Akademischer Austauschdienst (DAAD), and by National Science Foundation Grant DBI-1262439. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA
Peter Clote & Amir H. Bayegan

Authors

Peter Clote
View author publications
You can also search for this author in PubMed Google Scholar
Amir H. Bayegan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Peter Clote.

Additional information

Research supported by National Science Foundation Grant DBI-1262439.

Appendix

1.1 A Markov chain of RNA secondary structures and detailed balance

In this section, we prove that detailed balance does not hold for the Markov chain $\mathcal {M}_1$ of secondary structures for an RNA sequence, where transition probabilities are defined by Eq. (2). The failure of detailed balance is because for almost any RNA sequence, there are secondary structures x, y that are neighbors of each other, but number N(x) of neighbors of x does not equal the number N(y) of neighbors of y. If every secondary structure of a given RNA sequence has the same number of neighbors, then it is easy to show that the Boltzmann distribution $(\pi _1,\ldots ,\pi _n)$ is the stationary distribution, and a small calculation then shows that detailed balance holds. If different structures have different numbers of neighbors, then the Boltzmann distribution is not in general the stationary distribution. To produce an example where detailed balance fails, we must compute the stationary distribution.

For convenience in following the computation, we first present a simple example using the Nussinov energy model, in which the energy E(s) of a structure is defined to be $-1$ multiplied by the number of base pairs. Historically, the Nussinov algorithm and energy model (Nussinov and Jacobson 1980) predate the Zuker algorithm (Zuker and Stiegler 1981) and Turner energy model (Turner and Mathews 2010).

Example 3

(Detailed balance does not hold for Nussinov energy model) Consider the 6-nt RNA sequence GGGCCC of length 6, which has only four secondary structures, each having N(s) neighbors, given in dot-bracket notation as follows:

1.
$\bullet \bullet \bullet \bullet \bullet \bullet $: the empty structure $s_1$ with no base pairs and $N(s_1)=3$ neighbors.
2.
$\,{\texttt {(}}\,\bullet \bullet \bullet \bullet \,{\texttt {)}}\,$: the structure $s_2$ with base pair (1, 6) and $N(s_2)=1$ neighbor.
3.
$\,{\texttt {(}}\,\bullet \bullet \bullet \,{\texttt {)}}\,\bullet $: the structure $s_3$ with base pair (1, 5) and $N(s_3)=1$ neighbor.
4.
$\bullet \,{\texttt {(}}\,\bullet \bullet \bullet \,{\texttt {)}}\,$: the structure $s_4$ with base pair (2, 6) and $N(s_4)=1$ neighbor.

The transition probability matrix M is given as follows, where we compute the values to 20 decimal places using Eq. (2), but display only 4 decimal places for notational convenience:

$$\begin{aligned} M&= \left( \begin{array}{llll} 0.0000&{} \quad 0.333\overline{3} &{}\quad 0.333\overline{3} &{}\quad 0.333\overline{3}\\ 0.1974&{} \quad 0.8026&{} \quad 0.0000 &{}\quad 0.0000\\ 0.1974&{}\quad 0.8026&{} \quad 0.0000 &{}\quad 0.0000\\ 0.1974&{} \quad 0.8026&{}\quad 0.0000 &{}\quad 0.0000 \end{array} \right) \end{aligned}$$

(21)

Let $M^{\infty } = \lim \nolimits _{n \rightarrow \infty } M^n$. The stationary probability distribution for M must satisfy $M (p_1^*, p_2^*, p_3^*, p_4^*)^T = (p_1^*, p_2^*, p_3^*, p_4^*)^T$, where T denotes transpose, and so

$$\begin{aligned} M^{\infty }&= \left( \begin{array}{llll} p_1^*&{}\quad p_2^*&{} \quad p_3^* &{}\quad p_4^*\\ p_1^*&{}\quad p_2^*&{}\quad p_3^* &{}\quad p_4^*\\ p_1^*&{}\quad p_2^*&{} \quad p_3^* &{}\quad p_4^*\\ p_1^*&{} \quad p_2^*&{}\quad p_3^* &{}\quad p_4^* \end{array} \right) \end{aligned}$$

Using the actual 20-place values of the transition probability matrix M, whose 4-place approximations are given in Eq. (21), we obtain the millionth power of $M^{(10^6)}$ by a Mathematica computation. The computed 16-place values are shown to 4-place accuracy in the following:

$$\begin{aligned} M^{(10^6)}&= \left( \begin{array}{llll} 0.1648&{} \quad 0.2784&{}\quad 0.2784&{}\quad 0.2784\\ 0.1648&{}\quad 0.2784&{} \quad 0.2784&{}\quad 0.2784\\ 0.1648&{}\quad 0.2784&{} \quad 0.2784&{} \quad 0.2784\\ 0.1648&{}\quad 0.2784&{} \quad 0.2784&{} \quad 0.2784\\ \end{array} \right) \end{aligned}$$

so that the 4-place approximations of the stationary probabilities are as follows: $p_1^* = 0.1648$, $p_2^* = 0.2784$, $p_3^* = 0.2784$, $p_4^* = 0.2784$. We then compute that $p_1^* \cdot p_{1,2} = 0.0550$, while $p_2^* \cdot p_{2,1} = 0.0617$, which shows that detailed balance does not hold.

Using the Turner energy model (Turner and Mathews 2010), the same example yields a different transition probability matrix (not shown), whose millionth power is as follows:

$$\begin{aligned} M^{(10^6)} = \left( \begin{array}{llll} 0.9994655553038468&{} \quad 0.000430085352242845&{}\quad 0.0000521796443227893&{}\quad 0.0000521796443227893\\ 0.9994655553038468&{}\quad 0.000430085352242845&{}\quad 0.0000521796443227893&{} \quad 0.0000521796443227893\\ 0.9994655553038468&{}\quad 0.000430085352242845&{}\quad 0.0000521796443227893&{}\quad 0.0000521796443227893\\ 0.9994655553038468&{}\quad 0.000430085352242845&{}\quad 0.0000521796443227893&{}\quad 0.0000521796443227893 \end{array} \right) \end{aligned}$$

We find that $p_1^* =0.9994655553038468164 \approx 0.9995$, $p_2^* =0.000430315331989701045645180244 \approx 0.0004$, $p_{1,2}=0.001288878388504780024906293256 \approx 0.0013$, $p_{2,1}=0.061737166286699882156163710079 \approx 0.617$, and that $p_1^* \cdot p_{1,2} = 0.0012881895542860573 \approx 0.00129$ while $p_2^* \cdot p_{2,1} = 0.00002656644920676464 \approx 0.00003$. It follows that detailed balance does not hold for the Turner energy model. For RNA sequences of length n, there are exponentially many ($\approx $1.95947 $ \cdot n^{-3/2} \cdot 1.86603^n$) secondary structures, so similar computations cannot be carried out, and it may be the case that detailed balance “approximately” holds.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Clote, P., Bayegan, A.H. RNA folding kinetics using Monte Carlo and Gillespie algorithms. J. Math. Biol. 76, 1195–1227 (2018). https://doi.org/10.1007/s00285-017-1169-7

Download citation

Received: 02 October 2016
Revised: 09 July 2017
Published: 05 August 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s00285-017-1169-7

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price includes VAT (Germany)

Instant access to the full article PDF.

Institutional subscriptions

RNA folding kinetics using Monte Carlo and Gillespie algorithms

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory

RNA Folding: Structure Prediction, Folding Kinetics and Ion Electrostatics

RNA secondary structures in a polymer-zeta model how foldings should be shaped for sparsification to establish a linear speedup

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

1.1 A Markov chain of RNA secondary structures and detailed balance

Example 3

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Navigation

RNA folding kinetics using Monte Carlo and Gillespie algorithms

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory

RNA Folding: Structure Prediction, Folding Kinetics and Ion Electrostatics

RNA secondary structures in a polymer-zeta model how foldings should be shaped for sparsification to establish a linear speedup

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

Appendix

1.1 A Markov chain of RNA secondary structures and detailed balance

Example 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Search

Navigation