Combinatorics of locally optimal RNA secondary structures

Fusy, Éric; Clote, Peter

doi:10.1007/s00285-012-0631-9

Published: 22 December 2012

Volume 68, pages 341–375, (2014)

Journal of Mathematical Biology



Abstract

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is \(1.104366 \cdot n^{-3/2} \cdot 2.618034^n\). Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes \(-1\) towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that, asymptotically, there are \(1.07427\cdot n^{-3/2} \cdot 2.35467^n\) saturated structures for a sequence of length \(n\). In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes \(-1\) toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. Here we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).
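As a numerical illustration of the Stein and Waterman asymptotic quoted above, the following minimal Python sketch counts secondary structures by a standard recurrence and compares the exact count with \(1.104366 \cdot n^{-3/2} \cdot 2.618034^n\). It is not taken from the paper: the function names are hypothetical, and the sketch assumes a homopolymer model in which any two positions may pair provided at least `theta = 1` unpaired base lies between them.

```python
def count_secondary_structures(n_max, theta=1):
    """Return [S(0), ..., S(n_max)], where S(n) counts secondary structures
    on n nucleotides in which every base pair (j, m) has m - j > theta.
    Assumption: homopolymer model, i.e. any two positions may pair."""
    s = [1] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = s[n - 1]                  # position n is unpaired
        for j in range(1, n - theta):     # position n pairs with position j
            # structures left of j, times structures strictly inside (j, n)
            total += s[j - 1] * s[n - j - 1]
        s[n] = total
    return s


def stein_waterman_asymptotic(n):
    """Asymptotic estimate quoted in the abstract (constants as printed there)."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    counts = count_secondary_structures(400)
    for n in (10, 50, 100, 200, 400):
        ratio = counts[n] / stein_waterman_asymptotic(n)
        print(n, counts[n], f"exact/asymptotic = {ratio:.4f}")
```

The ratio of exact count to estimate should approach 1 as \(n\) grows, with a deviation that shrinks roughly like \(1/n\); the rounded constants limit how closely the two can agree for very large \(n\).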


Notes

If the energy \(E(S)=0\) or if the temperature \(T=+\infty \), then the partition function is exactly equal to the number of secondary structures.
Jean Gaston Darboux (1842–1917).
The shape of a secondary structure was defined by Voss et al. (2006) to represent its branching topology; for instance, the shape of the well-known clover-leaf structure of tRNA is \(\mathtt{[[][][]]}\). The asymptotic number of shapes for a length \(n\) sequence yields the run time for the Giegerich Lab software RNAshapes on length \(n\) sequences, since Steffen et al. (2006) report that RNAshapes runs in time \(O(n^3 k s)\) for \(s\) sequences, each of length at most \(n\) and \(k\) shapes.
To the best of our knowledge, UNAFold is currently the only software that computes the partition function over all secondary structures in a mathematically rigorous manner.
This is done, for instance, in grammar \(G_4\) by replacing the rule \(\bullet _{\ge \theta } \rightarrow \bullet \) by \(\bullet _{\ge \theta } \rightarrow \bullet ^{\theta }\), where \(\bullet ^{\theta }\) consists of \(\theta \) occurrences of \(\bullet \).
Our grammar \(G_1\) is equivalent to the “tree grammar nussinov78” from Steffen and Giegerich (2005).
It is clear that the number of structures equals the partition function \(\sum _{S} \exp (-E(S)/RT)\) provided that \(E(S)=0\).
Fig. 2
Theoretical melting curve for two simple energy models of RNA secondary structure. Temperature in Celsius is given on the \(x\)-axis, while expected number of base pairs is given on the \(y\)-axis. We implemented an algorithm, using dynamic programming, with run time \(O(n^5)\) and space \(O(n^3)\), to compute the partition function \(Z_k = \sum _{S \in \mathbb{S}_k} \exp (-E(S)/RT)\), where \(\mathbb{S}_k\) denotes the set of all secondary structures for a homopolymer of length 100 nt, having exactly \(k\) base pairs. The expected number of base pairs is thus \(\sum _k k \cdot p_k\), where \(p_k = \frac{Z_k}{Z}\) denotes the probability that a secondary structure has \(k\) base pairs, and \(Z\) denotes the full partition function \(Z = \sum _{S} \exp (-E(S)/RT) = \sum _k Z_k\). (Alternatively, and more simply, we could have produced this curve from the Taylor coefficients of the expressions to the right of the limit in equations (3) and (6), after first solving for \(S(z,u)\) [resp. \(S(z,u,v)\)] in equation (1) [resp. (4)].) In the Nussinov–Jacobson energy model (Nussinov and Jacobson 1980), \(E(S)\) is defined to be \(-1\cdot |S|\); i.e. \(-1\) times the number of base pairs of \(S\). In the base stacking energy model, \(E(S)\) is defined to be \(-1\) times the number of stacked base pairs of \(S\). Although both models are quite similar, the melting curves are indeed different: the base stacking model entails more cooperative folding (see Dill and Bromberg 2002 for discussion of cooperative folding).
We use the subscript notation for partial derivatives.
Exact base stacking parameters are ignored as is entropy; however, the context-free grammar allows the separate marking of distinct features, such as stacked base pairs, hairpins, bulges, internal loops, multiloops.
Sheikh et al. (2012) show that minimum energy pseudoknotted structure prediction is NP-complete, in contrast with the existence of a cubic time algorithm for the Nussinov energy model (Tabaska et al. 1998).
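The quantities in the Fig. 2 caption, \(Z_k\), \(p_k = Z_k/Z\) and the expected number of base pairs \(\sum _k k \cdot p_k\), can be illustrated for the Nussinov–Jacobson model with a short Python sketch. This is not the authors' \(O(n^5)\) algorithm. Because \(E(S) = -|S|\) in this model, \(Z_k\) is simply the number of structures with \(k\) base pairs times \(\exp (k/RT)\), so a polynomial version of the counting recurrence suffices. The function names, the minimum hairpin size `theta = 1`, and the choice of kcal/mol as the energy unit (hence the value of \(R\)) are assumptions made for illustration only.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K); energy unit is an assumption


def base_pair_counts(n, theta=1):
    """Return a list c with c[k] = number of secondary structures on n
    nucleotides having exactly k base pairs (homopolymer model, every
    base pair (j, m) satisfies m - j > theta)."""
    polys = [[1]]  # polys[m][k] = structures of length m with k base pairs
    for m in range(1, n + 1):
        poly = list(polys[m - 1])            # position m is unpaired
        for j in range(1, m - theta):        # position m pairs with position j
            left, inside = polys[j - 1], polys[m - j - 1]
            for a, ca in enumerate(left):
                for b, cb in enumerate(inside):
                    k = a + b + 1            # pairs on the left, inside, plus (j, m)
                    if k >= len(poly):
                        poly.extend([0] * (k + 1 - len(poly)))
                    poly[k] += ca * cb
        polys.append(poly)
    return polys[n]


def expected_base_pairs(counts, temp_celsius):
    """Expected number of base pairs, sum_k k * Z_k / Z, with E(S) = -|S|."""
    rt = R_KCAL * (temp_celsius + 273.15)
    z_k = [c * math.exp(k / rt) for k, c in enumerate(counts)]
    z = sum(z_k)
    return sum(k * w for k, w in enumerate(z_k)) / z


if __name__ == "__main__":
    counts = base_pair_counts(30)
    for t in range(0, 101, 20):
        print(t, round(expected_base_pairs(counts, t), 3))
```

In the limit \(T \rightarrow +\infty \) every Boltzmann factor tends to 1, so \(Z\) tends to `sum(counts)`, the total number of secondary structures, in line with the notes above. The base stacking model would need an additional index tracking stacked pairs and is not covered by this sketch.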

References

Bender EA (1973) Central and local limit theorem applied to asymptotic enumeration. J Combin Theory Ser A 15:91–111
Clote P (2005) An efficient algorithm to compute the landscape of locally optimal RNA secondary structures with respect to the Nussinov-Jacobson energy model. J Comput Biol 12(1):83–101
Clote P (2006) Combinatorics of saturated secondary structures of RNA. J Comput Biol 13(9):1640–1657
Clote P, Kranakis E, Krizanc D, Salvy B (2009) Asymptotics of canonical and saturated RNA secondary structures. J Bioinform Comput Biol 7(5):869–893
Clote P, Dobrev S, Dotu I, Kranakis E, Krizanc D, Urrutia J (2012) On the page number of RNA secondary structures with pseudoknots. J Math Biol 65:1337–1357
Dill KA, Bromberg S (2002) Molecular driving forces: statistical thermodynamics in chemistry and biology. Garland Publishing Inc., New York
Drmota M (1997) Systems of functional equations. Random Struct Algorithms 10(1–2):103–124
Drmota M, Fusy É, Jué J, Kang M, Kraus V (2011) Asymptotic study of subcritical graph classes. SIAM J Discrete Math 25(4):1615–1651
Evers DJ, Giegerich R (2001) Reducing the conformation space in RNA structure prediction. In: German conference on bioinformatics (GCB’01), pp 1–6
Flajolet P, Odlyzko A (1990) Singularity analysis of generating functions. SIAM J Discrete Math 3(2):216–240
Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, Cambridge
Harer J, Zagier D (1986) The Euler characteristic of the moduli space of curves. Invent Math 85(3):457–485
Haslinger C, Stadler PF (1999) RNA structures with pseudo-knots: graph-theoretical, combinatorial, and statistical properties. Bull Math Biol 61(3):437–467
Henrici P (1991) Applied and computational complex analysis, vol 2. Wiley Classics Library, Wiley, New York
Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13):3429–3431
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167–188
Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discr Appl Math 88:207–237
Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31(13):3423–3428
Li TJ, Reidys CM (2012) Combinatorics of RNA-RNA interaction. J Math Biol 64(3):529–556
Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1):31–63
Lyngso RB, Pedersen CN (2000) RNA pseudoknot prediction in energy-based models. J Comput Biol 7(3–4):409–427
Markham NR (2006) Algorithms and software for nucleic acid sequences. PhD, Rensselaer Polytechnic Institute, under the direction of M. Zuker
Markham NR, Zuker M (2008) UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol 453:3–31
Mathews DH, Sabina J, Zuker M, Turner DH (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288:911–940
McCaskill JS (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29:1105–1119
Meir A, Moon JW (1989) On an asymptotic method in enumeration. J Combin Theory Ser A 51:77–89
Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11):6309–6313
Pemantle R, Wilson MC (2008) Twenty combinatorial examples of asymptotics derived from multivariate generating functions. SIAM Rev 50(2):199–272
Rodland EA (2006) Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J Comput Biol 13(6):1197–1213
Saule C, Regnier M, Steyaert JM, Denise A (2011) Counting RNA pseudoknotted structures. J Comput Biol 18(10):1339–1351
Sheikh S, Backofen R, Ponty Y (2012) Impact of the energy model on the complexity of RNA folding with pseudoknots. In: Lecture notes in computer science, vol 7354. 23rd Annual symposium on combinatorial pattern matching, CPM 2012, Helsinki, Finland, pp 321–333
Steffen P, Giegerich R (2005) Versatile and declarative dynamic programming using pair algebras. BMC Bioinformatics 6:224
Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R (2006) RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 22(4):500–503
Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math 26:261–272
Tabaska JE, Cary RE, Gabow HN, Stormo GD (1998) An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics 14:691–699
Vernizzi G, Orland H, Zee A (2005) Enumeration of RNA structures by matrix models. Phys Rev Lett 94(16):168103
Voss B, Giegerich R, Rehmsmeier M (2006) Complete probabilistic analysis of RNA shapes. BMC Biol 4(1):5–27
Waterman MS (1978) Secondary structure of single-stranded nucleic acids. Stud Found Combin: Adv Math Supplem Stud 1:167–212
Xia T Jr, SantaLucia J, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH (1999) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37:14719–14735
Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1):292–299
Zuker M (1986) RNA folding prediction: the continued need for interaction between biologists and mathematicians. Lect Math Life Sci 17:87–124
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13):3406–3415
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148
Zuker M, Mathews DH, Turner DH (1999) Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: Barciszewski J, Clark BFC (eds) RNA biochemistry and biotechnology. NATO ASI series. Kluwer, Dordrecht, pp 11–43

Acknowledgments

Figure 1 was created by W.A. Lorenz and H. Jabbari. We would like to thank the anonymous referees for their helpful comments. É. Fusy is supported by the European project ExploreMaps (ERC StG 208471). P. Clote is supported by the National Science Foundation under grants DBI-0543506 and DMS-0817971, and by Digiteo Foundation project RNAomics. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Laboratoire d’Informatiques (LIX), Ecole Polytechnique, 91128 Palaiseau, France
Éric Fusy
Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA
Peter Clote

Corresponding authors

Correspondence to Éric Fusy or Peter Clote.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 85KB)

ESM 2 (PDF 83KB)

About this article

Cite this article

Fusy, É., Clote, P. Combinatorics of locally optimal RNA secondary structures. J. Math. Biol. 68, 341–375 (2014). https://doi.org/10.1007/s00285-012-0631-9

Received: 18 December 2011
Revised: 19 November 2012
Published: 22 December 2012
Issue Date: January 2014
DOI: https://doi.org/10.1007/s00285-012-0631-9
