Log in

Self-Playing RNA Inverse Folding

  • Original Research
  • Published:
SN Computer Science Aims and scope Submit manuscript

Abstract

Ribonucleic acid (RNA) sequence controls many biological functions in the body of living organisms including protein synthesis and gene regulations. The sequence folds according to the low Minimum Free Energy (MFE) which enables it to achieve various roles. Given a target RNA structure, an RNA sequence can be designed to fold accordingly in a process called RNA Inverse Folding. In this paper, we present an RNA Inverse Folding model that performs a single-step look-ahead to select optimal RNA base and base pairs to design RNA sequences. Our proposed model (SPRNA) learns to accurately design RNA sequences based on self-improving policy function. The model recorded state-of-the-art results on two test datasets and very competitive results on others.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Algorithm 1
Algorithm 2
Algorithm 3
Fig. 3
Algorithm 4
Fig. 4
Algorithm 5
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data availability

All the datasets used in this work are publicly available. All the references for training and testing datasets are provided in Table 1.

References

  1. Akiba T, Sano S, Yanase T, et al. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, p. 2623–2631.

  2. Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl Math. 2000;104(1–3):45–62.

    Article  MathSciNet  Google Scholar 

  3. Alberts B, Johnson A, Lewis J, et al. Molecular motors.  Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.

    Google Scholar 

  4. Anderson JW, Sizikova E, Badugu A, et al. FRNAkenstein: multiple target inverse RNA folding. BMC Bioinformatics. 2012;13:1–12.

    Google Scholar 

  5. Anderson-Lee J, Fisker E, Kosaraju V, et al. Principles for predicting RNA secondary structure design difficulty. J Mol Biol. 2016;428(5):748–57.

    Article  Google Scholar 

  6. Anthony T, Tian Z, Barber D. Thinking fast and slow with deep learning and tree search. Advances in Neural Information Processing Systems 2017;30:5366–5376.

    Google Scholar 

  7. Avihoo A, Churkin A, Barash D. RNAexinv: an extended inverse RNA folding from shape and physical attributes to sequences. BMC Bioinformatics. 2011;12:1–8.

    Article  Google Scholar 

  8. Ba JL, Kiros JR, Hinton GE. Layer normalization. 2016. ar**v preprint ar**v:1607.06450

  9. Bellman R. A Markovian decision process. Journal of Mathematics and Mechanics. 1957;6:679–84.

    MathSciNet  Google Scholar 

  10. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

    Article  Google Scholar 

  11. Bringmann K, Grandoni F, Saha B, et al. Truly subcubic algorithms for language edit distance and RNA folding via fast bounded-difference min-plus product. SIAM J Comput. 2019;48(2):481–512.

    Article  MathSciNet  Google Scholar 

  12. Busch A, Backofen R. Info-RNA—a fast approach to inverse RNA folding. Bioinformatics. 2006;22(15):1823–31.

    Article  Google Scholar 

  13. Cazenave T. Nested Monte-Carlo search. In: Twenty-First International Joint Conference on Artificial Intelligence, 2009.

  14. Cazenave T, Fournier T. Monte Carlo inverse folding. In: Monte Carlo Search: First Workshop, MCS 2020, Held in Conjunction with IJCAI 2020, Virtual Event, January 7, 2021, Proceedings 1. Springer; 2021. p. 84–99.

  15. Chua K, Calandra R, McAllister R, et al. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in Neural Information Processing Systems 2018;31:4759–4770.

    Google Scholar 

  16. Churkin A, Retwitzer MD, Reinharz V, et al. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinformatics 2018;19(2):350–8.

    Google Scholar 

  17. Cleaves HJJ, et al. Watson–Crick pairing. Encyclopedia of astrobiology. 2015. p. 2650.

  18. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.

    Article  Google Scholar 

  19. Crick F. Central dogma of molecular biology. Nature. 1970;227(5258):561–3.

    Article  Google Scholar 

  20. Deisenroth M, Rasmussen CE. Pilco: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on machine learning (ICML-11), 2011, p. 465–72.

  21. Doherty EA, Batey RT, Masquida B, et al. A universal mode of helix packing in RNA. Nat Struct Biol. 2001;8(4):339–43.

    Article  Google Scholar 

  22. Dromi N, Avihoo A, Barash D. Reconstruction of natural RNA sequences from RNA shape, thermodynamic stability, mutational robustness, and linguistic complexity by evolutionary computation. J Biomol Struct Dyn. 2008;26(1):147–61.

    Article  Google Scholar 

  23. Drory Retwitzer M, Reinharz V, Ponty Y, et al. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res. 2016;44(W1):W308–14.

    Article  Google Scholar 

  24. Eastman P, Shi J, Ramsundar B, et al. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol. 2018;14(6): e1006176.

    Article  Google Scholar 

  25. Esmaili-Taheri A, Ganjtabesh M. ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinform. 2015;16(1):1–11.

    Article  Google Scholar 

  26. Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics. 2014;30(9):1250–8.

    Article  Google Scholar 

  27. Gao JZ, Li LY, Reidys CM. Inverse folding of RNA pseudoknot structures. Algorithms Mol Biol. 2010;5:1–19.

    Article  Google Scholar 

  28. Garcia-Martin JA, Clote P, Dotu I. RNAifold: a constraint programming algorithm for RNA inverse folding and molecular design. J Bioinform Comput Biol. 2013;11(02):1350001.

    Article  Google Scholar 

  29. Garcia-Martin JA, Dotu I, Clote P. RNAifold 2.0: a web server and software to design custom and RFAM-based RNA molecules. Nucleic Acids Res. 2015;43(W1):W513–21.

    Article  Google Scholar 

  30. Griffiths-Jones S, Bateman A, Marshall M, et al. Rfam: an RNA family database. Nucleic Acids Res. 2003;31(1):439–41.

    Article  Google Scholar 

  31. Hamada M, Kiryu H, Sato K, et al. Prediction of RNA secondary structure using generalized centroid estimators. Bioinformatics. 2009;25(4):465–73.

    Article  Google Scholar 

  32. Hamming RW. Error detecting and error correcting codes. Bell Syst Tech J. 1950;29(2):147–60.

    Article  MathSciNet  Google Scholar 

  33. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, p. 770–8.

  34. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertain Fuzziness Knowl-Based Syst. 1998;6(02):107–16.

    Article  Google Scholar 

  35. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31(13):3429–31.

    Article  Google Scholar 

  36. Hofacker IL, Fontana W, Stadler PF, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chem Mon. 1994;125(2):167–88.

    Article  Google Scholar 

  37. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2(5):359–66.

    Article  Google Scholar 

  38. Ieong S, Kao MY, Lam TW, et al. Predicting RNA secondary structures with arbitrary pseudoknots by maximizing the number of stacking pairs. J Comput Biol. 2003;10(6):981–95.

    Article  Google Scholar 

  39. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448–56.

  40. Isaacs FJ, Dwyer DJ, Ding C, et al. Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol. 2004;22(7):841–7.

    Article  Google Scholar 

  41. Jain S, Tao Y, Schlick T. Inverse folding with RNA-as-graphs produces a large pool of candidate sequences with target topologies. J Struct Biol. 2020;209(3): 107438.

    Article  Google Scholar 

  42. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. 2016. ar**v preprint ar**v:1609.02907.

  43. Kleinkauf R, Houwaart T, Backofen R, et al. antaRNA-multi-objective inverse folding of pseudoknot RNA using ant-colony optimization. BMC Bioinform. 2015;16(1):1–7.

    Article  Google Scholar 

  44. Kleinkauf R, Mann M, Backofen R. antaRNA: ant colony-based RNA sequence design. Bioinformatics. 2015;31(19):3114–21.

    Article  Google Scholar 

  45. Kocsis L, Szepesvári C. Bandit based Monte-Carlo planning. In: European Conference on Machine Learning. Springer; 2006. p. 282–93.

  46. Levin A, Lis M, Ponty Y, et al. A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Res. 2012;40(20):10041–52.

    Article  Google Scholar 

  47. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. 2015. ar**v preprint ar**v:1509.02971.

  48. Lorenz R, Bernhart SH, Hönerzu Siederdissen C, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6:1–14.

    Article  Google Scholar 

  49. Lyngsø RB, Pedersen CN. RNA pseudoknot prediction in energy-based models. J Comput Biol. 2000;7(3–4):409–27.

    Article  Google Scholar 

  50. MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 1967, p. 281–97.

  51. Mathews DH, Disney MD, Childs JL, et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci. 2004;101(19):7287–92.

    Article  Google Scholar 

  52. Minuesa G, Alsina C, Garcia-Martin JA, et al. MoiRNAifold: a novel tool for complex in silico RNA design. Nucleic Acids Res. 2021;49(9):4934–43.

    Article  Google Scholar 

  53. Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. 2013. ar**v preprint ar**v:1312:5602

  54. Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning. PMLR; 2016. p. 1928–37.

  55. Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci. 1980;77(11):6309–13.

    Article  Google Scholar 

  56. Obonyo S, Nicolas J, Owuor D. Designing RNA sequences through self-play. In: IJCCI; 2022. p. 305–12.

  57. Portela F. An unexpectedly effective Monte Carlo technique for the RNA inverse folding problem. BioRxiv. 2018:345587.

  58. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.

    Article  Google Scholar 

  59. Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics. 2013;29(13):i308–15.

    Article  Google Scholar 

  60. Retwitzer MD, Reinharz V, Churkin A, et al. incaRNAfbinv 2.0: a webserver and software with motif control for fragment-based design of RNAs. Bioinformatics. 2020;36(9):2920–2.

    Article  Google Scholar 

  61. Rish I, et al. An empirical study of the naive bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001, p. 41–6.

  62. Rosenblatt F. The perceptron, a perceiving and recognizing automaton project para. Cornell Aeronautical Laboratory; 1957.

  63. Rosin CD. Nested rollout policy adaptation for Monte Carlo tree search. In: IJCAI; 2011. p. 649–4.

  64. Rummery GA, Niranjan M. On-line Q-learning using connectionist systems, vol. 37. Cambridge: University of Cambridge, Department of Engineering; 1994.

    Google Scholar 

  65. Runge F, Stoll D, Falkner S, et al. Learning to design RNA. 2018. ar**v preprint ar**v:1812.11951.

  66. Saha B. Fast & space-efficient approximations of language edit distance and RNA folding: an amnesic dynamic programming approach. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE; 2017. p. 295–306.

  67. Schaffner KF. The Watson–Crick model and reductionism. Br J Philos Sci. 1969;20(4):325–48.

    Article  Google Scholar 

  68. Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15. Springer; 2018. p. 593–607.

  69. Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. p. 1889–97.

  70. Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms. 2017. ar**v preprint ar**v:1707.06347.

  71. Silver D, Huang A, Maddison CJ, et al. Mastering the game of go with deep neural networks and tree search. Nature. 2016;529(7587):484–9.

    Article  Google Scholar 

  72. Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of go without human knowledge. Nature. 2017;550(7676):354–9.

    Article  Google Scholar 

  73. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.

    MathSciNet  Google Scholar 

  74. Sutton RS. Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bull. 1991;2(4):160–3.

    Article  Google Scholar 

  75. Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 2018.

    Google Scholar 

  76. Świechowski M, Godlewski K, Sawicki B, et al. Monte Carlo tree search: a review of recent modifications and applications. Artif Intell Rev. 2023;56(3):2497–562.

    Article  Google Scholar 

  77. Taneda A. Multi-objective optimization for RNA design with multiple target secondary structures. BMC Bioinform. 2015;16(1):1–20.

    Article  Google Scholar 

  78. Trotta E. On the normalization of the minimum free energy of RNAs by sequence length. PLoS ONE. 2014;9(11): e113380.

    Article  Google Scholar 

  79. Vinyals O, Babuschkin I, Czarnecki WM, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature. 2019;575(7782):350–4.

    Article  Google Scholar 

  80. Wang T, Wei JJ, Sabatini DM, et al. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343(6166):80–4.

    Article  Google Scholar 

  81. Watford M, Wu G. Protein. Adv Nutr. 2018;9(5):651–3.

    Article  Google Scholar 

  82. Watkins CJ, Dayan P. Q-learning. Mach Learn. 1992;8:279–92.

    Article  Google Scholar 

  83. Weinbrand L, Avihoo A, Barash D. RNAfbinv: an interactive java application for fragment-based design of RNA sequences. Bioinformatics. 2013;29(22):2938–40.

    Article  Google Scholar 

  84. Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn. 1992;8:229–56.

    Article  Google Scholar 

  85. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemom Intell Lab Syst. 1987;2(1–3):37–52.

    Article  Google Scholar 

  86. Yang X, Yoshizoe K, Taneda A, et al. RNA inverse folding using Monte Carlo tree search. BMC Bioinform. 2017;18(1):1–12.

    Article  Google Scholar 

  87. Zemora G, Waldsich C. RNA folding in living cells. RNA Biol. 2010;7(6):634–41.

    Article  Google Scholar 

  88. Zuker M, Mathews DH, Turner DH. Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In: RNA biochemistry and biotechnology. Berlin: Springer; 1999. p. 11–43.

    Chapter  Google Scholar 

Download references

Acknowledgements

This work was granted access to the HPC/AI resources of IDRIS under the allocation 2023-AD010614071 made by GENCI.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Stephen Obonyo.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Computational Intelligence 2022” guest edited by Joaquim Filipe, Kevin Warwick, Janusz Kacprzyk, Thomas Bäck, Bas van Stein, Christian Wagner, Jonathan Garibaldi, H. K. Lam, Marie Cottrell and Faiyaz Docto.

Appendices

Appendix 1 Value Network Training

The value network was trained according to the following parameters and hyperparameter constraints. These were selected using Optuna [1]. framework (Table 10).

Table 10 Parameter/hyperparameter for the value network

Appendix 2 One-Hot Feature Encoding Results

The experiments according to one-hot encoding did not yield better scores compared to the coding scheme of the results presented in the experiment section of this paper (Table 11).

Table 11 SPRNA with one-hot encoding

Appendix 3 Software and Hardware Settings

The SPRNA was trained with PyTorch 1.13 and Python 3.7. All the experiments were run on NVIDIA V100.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Obonyo, S., Jouandeau, N. & Owuor, D. Self-Playing RNA Inverse Folding. SN COMPUT. SCI. 5, 403 (2024). https://doi.org/10.1007/s42979-024-02659-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-024-02659-x

Keywords

Navigation