Computational approaches to study the effects of small genomic variations

  • Review
  • Published:
Journal of Molecular Modeling

Abstract

Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.

Abbreviations

  • TCGA: The Cancer Genome Atlas
  • ICGC: International Cancer Genome Consortium
  • SNP: Single nucleotide polymorphism
  • HGMD: Human Gene Mutation Database
  • nsSNP: Nonsynonymous SNP
  • OMIM: Online Mendelian Inheritance in Man
  • HGV: Human Genome Variation
  • PMD: Protein Mutant Database
  • EVS: Exome Variant Server
  • COSMIC: Catalogue of Somatic Mutations in Cancer
  • NCBI: National Center for Biotechnology Information
  • dbSNP: SNP Database
  • LSDB: Locus-specific database
  • HGVS: Human Genome Variation Society
  • MAF: Minor allele frequency
  • MSA: Multiple sequence alignment
  • PDB: Protein Data Bank
  • SS: Secondary structure
  • CAGI: Critical Assessment of Genome Interpretation

References

  1. Levitt M (2009) Nature of the protein universe. Proc Natl Acad Sci USA 106(27):11079–11084. doi:10.1073/pnas.0905029106

  2. Khafizov K, Madrid-Aliste C, Almo SC, Fiser A (2014) Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc Natl Acad Sci USA 111(10):3733–3738. doi:10.1073/pnas.1321614111

  3. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genoty**. Nat Rev Genet 12(5):363–376. doi:10.1038/nrg2958

    Article  CAS  Google Scholar 

  4. Giordano TJ (2014) The Cancer Genome Atlas research network: a sight to behold. Endocr Pathol 25(4):362–365. doi:10.1007/s12022-014-9345-4

  5. The International Cancer Genome Consortium, Hudson T et al (2010) International network of cancer genome projects. Nature 464(7291):993–998. doi:10.1038/nature08987

  6. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. doi:10.1038/nature11632

  7. Ng SB, Nickerson DA, Bamshad MJ, Shendure J (2010) Massively parallel sequencing and rare disease. Hum Mol Genet 19(R2):R119–R124. doi:10.1093/hmg/ddq390

    Article  CAS  Google Scholar 

  8. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ (2010) Exome sequencing identifies the cause of a Mendelian disorder. Nat Genet 42(1):30–35. doi:10.1038/ng.499

  9. Thomas PD, Kejariwal A (2004) Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects. Proc Natl Acad Sci USA 101(43):15398–15403. doi:10.1073/pnas.0404380101

  10. Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE (2013) Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet 14(10):681–691. doi:10.1038/nrg3555

    Article  CAS  Google Scholar 

  11. Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, Cooper DN (2014) The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133(1):1–9. doi:10.1007/s00439-013-1358-4

    Article  CAS  Google Scholar 

  12. Bi XH, Lu CM, Liu Q, Zhang ZX, Zhao HL, Yu J, Zhang JW (2012) A 14 bp indel variation in the NCX1 gene modulates the age at onset in late-onset Alzheimer’s disease. J Neural Transm 119(3):383–386. doi:10.1007/s00702-011-0696-4

    Article  CAS  Google Scholar 

  13. Dong B, Chen J, Zhang X, Pan Z, Bai F, Li Y (2013) Two novel PRP31 premessenger ribonucleic acid processing factor 31 homolog mutations including a complex insertion-deletion identified in Chinese families with retinitis pigmentosa. Mol Vis 19:2426–2435

    CAS  Google Scholar 

  14. Yu Q, Zhou C, Wang J, Chen L, Zheng S, Zhang J (2013) A functional insertion/deletion polymorphism in the promoter of PDCD6IP is associated with the susceptibility of hepatocellular carcinoma in a Chinese population. DNA Cell Biol 32(8):451–457. doi:10.1089/dna.2013.2061

    Article  CAS  Google Scholar 

  15. Glanzmann B, Lombard D, Carr J, Bardien S (2014) Screening of two indel polymorphisms in the 5′UTR of the DJ-1 gene in South African Parkinson’s disease patients. J Neural Transm 121(2):135–138. doi:10.1007/s00702-013-1094-x

  16. Ross JS, Wang K, Al-Rohil RN, Nazeer T, Sheehan CE, Otto GA, He J, Palmer G, Yelensky R, Lipson D, Ali S, Balasubramanian S, Curran JA, Garcia L, Mahoney K, Downing SR, Hawryluk M, Miller VA, Stephens PJ (2014) Advanced urothelial carcinoma: next-generation sequencing reveals diverse genomic alterations and targets of therapy. Mod Pathol: Off J US Can Acad Pathol Inc 27(2):271–280. doi:10.1038/modpathol.2013.135

    Article  CAS  Google Scholar 

  17. Wrobel JA, Chao SF, Conrad MJ, Merker JD, Swanstrom R, Pielak GJ, Hutchison CA 3rd (1998) A genetic approach for identifying critical residues in the fingers and palm subdomains of HIV-1 reverse transcriptase. Proc Natl Acad Sci USA 95(2):638–645

  18. Zwick ME, Cutler DJ, Chakravarti A (2000) Patterns of genetic variation in Mendelian and complex traits. Annu Rev Genomics Hum Genet 1:387–407. doi:10.1146/annurev.genom.1.1.387

    Article  CAS  Google Scholar 

  19. Hainaut P, Hernandez T, Robinson A, Rodriguez-Tome P, Flores T, Hollstein M, Harris CC, Montesano R (1998) IARC database of p53 gene mutations in human tumors and cell lines: updated compilation, revised formats and new visualisation tools. Nucleic Acids Res 26(1):205–213

  20. Henikoff S, Comai L (2003) Single-nucleotide mutations for plant functional genomics. Annu Rev Plant Biol 54:375–401. doi:10.1146/annurev.arplant.54.031902.135009

    Article  CAS  Google Scholar 

  21. Johnston JJ, Biesecker LG (2013) Databases of genomic variation and phenotypes: existing resources and future needs. Hum Mol Genet 22(R1):R27–R31. doi:10.1093/hmg/ddt384

    Article  CAS  Google Scholar 

  22. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514–D517. doi:10.1093/nar/gki033

  23. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29(1):308–311

    Article  CAS  Google Scholar 

  24. Smigielski EM, Sirotkin K, Ward M, Sherry ST (2000) dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 28(1):352–355

    Article  CAS  Google Scholar 

  25. MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW (2014) The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res 42:D986–D992. doi:10.1093/nar/gkt958

  26. UniProt Consortium (2008) The Universal Protein Resource (UniProt). Nucleic Acids Res 36:D190–D195. doi:10.1093/nar/gkm895

  27. UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204–D212. doi:10.1093/nar/gku989

  28. Kawabata T, Ota M, Nishikawa K (1999) The Protein Mutant Database. Nucleic Acids Res 27(1):355–357

    Article  CAS  Google Scholar 

  29. Thusberg J, Olatubosun A, Vihinen M (2011) Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat 32(4):358–368. doi:10.1002/humu.21445

    Article  Google Scholar 

  30. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ (2015) COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res 43:D805–D811. doi:10.1093/nar/gku1075

  31. Gonzalez-Perez A, Lopez-Bigas N (2011) Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am J Hum Genet 88(4):440–449. doi:10.1016/j.ajhg.2011.03.004

    Article  CAS  Google Scholar 

  32. Tryka KA, Hao L, Sturcke A, ** Y, Wang ZY, Ziyabari L, Lee M, Popova N, Sharopova N, Kimura M, Feolo M (2014) NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res 42:D975–D979. doi:10.1093/nar/gkt1211

  33. International HapMap Consortium, Frazer KA et al (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861. doi:10.1038/nature06258

  34. Reich DE, Gabriel SB, Altshuler D (2003) Quality and completeness of SNP databases. Nat Genet 33(4):457–458. doi:10.1038/ng1133

    Article  CAS  Google Scholar 

  35. Mitchell AA, Zwick ME, Chakravarti A, Cutler DJ (2004) Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genoty** error rates and patterns. Bioinformatics 20(7):1022–1032. doi:10.1093/bioinformatics/bth034

    Article  CAS  Google Scholar 

  36. Musumeci L, Arthur JW, Cheung FS, Hoque A, Lippman S, Reichardt JK (2010) Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genoty** and haploty** studies. Hum Mutat 31(1):67–73. doi:10.1002/humu.21137

    Article  CAS  Google Scholar 

  37. Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics Chapter 1:Unit 1.13. doi:10.1002/0471250953.bi0113s39

  38. Ng PC, Henikoff S (2003) SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 31(13):3812–3814

  39. Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7:Unit 7.20. doi:10.1002/0471142905.hg0720s76

  40. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248–249. doi:10.1038/nmeth0410-248

    Article  CAS  Google Scholar 

  41. Li B, Krishnan VG, Mort ME, **n F, Kamati KK, Cooper DN, Mooney SD, Radivojac P (2009) Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25(21):2744–2750. doi:10.1093/bioinformatics/btp528

    Article  CAS  Google Scholar 

  42. Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT (2008) Recommendations for locus-specific databases and their curation. Hum Mutat 29(1):2–5. doi:10.1002/humu.20650

    Article  CAS  Google Scholar 

  43. den Dunnen JT, Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15(1):7–12. doi:10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N

    Article  Google Scholar 

  44. Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011) LOVD v. 2.0: the next generation in gene variant databases. Hum Mutat 32(5):557–563. doi:10.1002/humu.21438

    Article  CAS  Google Scholar 

  45. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42:D980–D985. doi:10.1093/nar/gkt1113

  46. Yip YL, Famiglietti M, Gos A, Duek PD, David FP, Gateau A, Bairoch A (2008) Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase. Hum Mutat 29(3):361–366. doi:10.1002/humu.20671

    Article  CAS  Google Scholar 

  47. Capriotti E, Calabrese R, Casadio R (2006) Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22(22):2729–2734. doi:10.1093/bioinformatics/btl423

    Article  CAS  Google Scholar 

  48. Tian J, Wu N, Guo X, Guo J, Zhang J, Fan Y (2007) Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics 8:450. doi:10.1186/1471-2105-8-450

  49. Hicks S, Wheeler DA, Plon SE, Kimmel M (2011) Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed. Hum Mutat 32(6):661–668. doi:10.1002/humu.21490

    Article  CAS  Google Scholar 

  50. Bromberg Y, Rost B (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 35(11):3823–3835. doi:10.1093/nar/gkm238

    Article  CAS  Google Scholar 

  51. Bao L, Zhou M, Cui Y (2005) nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Res 33:W480–W482. doi:10.1093/nar/gki372

  52. Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R (2009) Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 30(8):1237–1244. doi:10.1002/humu.21047

    Article  CAS  Google Scholar 

  53. Ramensky V, Bork P, Sunyaev S (2002) Human non-synonymous SNPs: server and survey. Nucleic Acids Res 30(17):3894–3900

    Article  CAS  Google Scholar 

  54. Reva B, Antipin Y, Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39:e118. doi:10.1093/nar/gkr407

  55. Mi H, Guo N, Kejariwal A, Thomas PD (2007) PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 35:D247–D252. doi:10.1093/nar/gkl869

  56. Stone EA, Sidow A (2005) Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res 15(7):978–986. doi:10.1101/gr.3804205

    Article  CAS  Google Scholar 

  57. Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles V (2006) Machine learning in bioinformatics. Brief Bioinform 7(1):86–112

    Article  CAS  Google Scholar 

  58. Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7:61–80. doi:10.1146/annurev.genom.7.080505.115630

    Article  CAS  Google Scholar 

  59. Pervez MT, Babar ME, Nadeem A, Aslam M, Awan AR, Aslam N, Hussain T, Naveed N, Qadri S, Waheed U, Shoaib M (2014) Evaluating the accuracy and efficiency of multiple sequence alignment methods. Evol Bioinformatics Online 10:205–217. doi:10.4137/EBO.S19199

    Article  CAS  Google Scholar 

  60. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688. doi:10.1371/journal.pone.0046688

  61. Tavtigian SV, Deffenbaugh AM, Yin L, Judkins T, Scholl T, Samollow PB, de Silva D, Zharkikh A, Thomas A (2006) Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J Med Genet 43(4):295–305. doi:10.1136/jmg.2005.033878

    Article  CAS  Google Scholar 

  62. Ferrer-Costa C, Gelpi JL, Zamakola L, Parraga I, de la Cruz X, Orozco M (2005) PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 21(14):3176–3178. doi:10.1093/bioinformatics/bti486

    Article  CAS  Google Scholar 

  63. Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33:D501–D504. doi:10.1093/nar/gki025

  64. Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35:D61–D65. doi:10.1093/nar/gkl842

  65. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410. doi:10.1016/S0022-2836(05)80360-2

    Article  CAS  Google Scholar 

  66. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD (2012) The Pfam protein families database. Nucleic Acids Res 40:D290–D301. doi:10.1093/nar/gkr1065

  67. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539. doi:10.1038/msb.2011.75

    Article  Google Scholar 

  68. Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B (2005) DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics 6:66. doi:10.1186/1471-2105-6-66

  69. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780. doi:10.1093/molbev/mst010

    Article  CAS  Google Scholar 

  70. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797. doi:10.1093/nar/gkh340

    Article  CAS  Google Scholar 

  71. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15(2):330–340. doi:10.1101/gr.2821705

  72. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205–217. doi:10.1006/jmbi.2000.4042

  73. Wallace IM, O’Sullivan O, Higgins DG, Notredame C (2006) M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res 34(6):1692–1699. doi:10.1093/nar/gkl091

    Article  CAS  Google Scholar 

  74. Kim J, Ma J (2011) PSAR: measuring multiple sequence alignment reliability by probabilistic sampling. Nucleic Acids Res 39(15):6359–6368. doi:10.1093/nar/gkr334

    Article  CAS  Google Scholar 

  75. Martin W, Roettger M, Lockhart PJ (2007) A reality check for alignments and trees. Trends Genet 23(10):478–480. doi:10.1016/j.tig.2007.08.007

  76. Loytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320(5883):1632–1635. doi:10.1126/science.1158395

    Article  Google Scholar 

  77. Pais FS, Ruy Pde C, Oliveira G, Coimbra RS (2014) Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol 9(1):4. doi:10.1186/1748-7188-9-4

  78. Ahola V, Aittokallio T, Vihinen M, Uusipaikka E (2006) A statistical score for assessing the quality of multiple sequence alignments. BMC Bioinformatics 7:484. doi:10.1186/1471-2105-7-484

  79. Golubchik T, Wise MJ, Easteal S, Jermiin LS (2007) Mind the gaps: evidence of bias in estimates of multiple sequence alignments. Mol Biol Evol 24(11):2433–2442. doi:10.1093/molbev/msm176

    Article  CAS  Google Scholar 

  80. Nuin PA, Wang Z, Tillier ER (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7:471. doi:10.1186/1471-2105-7-471

  81. Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003) OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 4:47. doi:10.1186/1471-2105-4-47

  82. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89(22):10915–10919

  83. Dayhoff MOSRM (1978) A model of evolutionary change in proteins. Atlas Protein Seq Structure 5:345–351

    Google Scholar 

  84. Ferrer-Costa C, Orozco M, de la Cruz X (2002) Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J Mol Biol 315(4):771–786. doi:10.1006/jmbi.2001.5255

    Article  CAS  Google Scholar 

  85. Balasubramanian S, **a Y, Freinkman E, Gerstein M (2005) Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms. Nucleic Acids Res 33(5):1710–1721. doi:10.1093/nar/gki311

    Article  CAS  Google Scholar 

  86. Brunham LR, Singaraja RR, Pape TD, Kejariwal A, Thomas PD, Hayden MR (2005) Accurate prediction of the functional significance of single nucleotide polymorphisms and mutations in the ABCA1 gene. PLoS Genet 1(6):e83. doi:10.1371/journal.pgen.0010083

  87. Bross P, Corydon TJ, Andresen BS, Jorgensen MM, Bolund L, Gregersen N (1999) Protein misfolding and degradation in genetic diseases. Hum Mutat 14(3):186–198. doi:10.1002/(SICI)1098-1004(1999)14:3<186::AID-HUMU2>3.0.CO;2-J

    Article  CAS  Google Scholar 

  88. Wang Z, Moult J (2001) SNPs, protein structure, and disease. Hum Mutat 17(4):263–270. doi:10.1002/humu.22

    Article  Google Scholar 

  89. Yue P, Melamud E, Moult J (2006) SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7:166. doi:10.1186/1471-2105-7-166

  90. Kucukkal TG, Yang Y, Chapman SC, Cao W, Alexov E (2014) Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 15(6):9670–9717. doi:10.3390/ijms15069670

    Article  CAS  Google Scholar 

  91. Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, Sarai A (2002) ProTherm, thermodynamic database for proteins and mutants: developments in version 3.0. Nucleic Acids Res 30(1):301–302

  92. Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A (2006) ProTherm and ProNIT: thermodynamic databases for proteins and protein–nucleic acid interactions. Nucleic Acids Res 34:D204–D206. doi:10.1093/nar/gkj103

  93. Moal IH, Fernandez-Recio J (2012) SKEMPI: a Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics 28(20):2600–2607. doi:10.1093/bioinformatics/bts489

    Article  CAS  Google Scholar 

  94. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server: an online force field. Nucleic Acids Res 33:W382–W388. doi:10.1093/nar/gki387

  95. Yin S, Ding F, Dokholyan NV (2007) Eris: an automated estimator of protein stability. Nat Methods 4(6):466–467. doi:10.1038/nmeth0607-466

    Article  CAS  Google Scholar 

  96. Pokala N, Handel TM (2005) Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity. J Mol Biol 347(1):203–227. doi:10.1016/j.jmb.2004.12.019

    Article  CAS  Google Scholar 

  97. Pappu RV, Hart RK, Ponder JW (1998) Analysis and application of potential energy smoothing and search methods for global optimization. J Phys Chem B 102(48):9725–9742. doi:10.1021/Jp982255t

    Article  CAS  Google Scholar 

  98. deGroot BL, vanAalten DMF, Scheek RM, Amadei A, Vriend G, Berendsen HJC (1997) Prediction of protein conformational freedom from distance constraints. Proteins 29(2):240–251. doi:10.1002/(Sici)1097-0134(199710)29:2<240::Aid-Prot11>3.0.Co;2-O

  99. Cheng TMK, Lu YE, Vendruscolo M, Lio P, Blundell TL (2008) Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. PLoS Comp Biol 4(7):e1000135. doi:10.1371/journal.pcbi.1000135

  100. Pires DEV, Ascher DB, Blundell TL (2014) mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30(3):335–342. doi:10.1093/bioinformatics/btt691

    Article  CAS  Google Scholar 

  101. da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, Santoro MM (2009) Protein cutoff scanning: a comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 74(3):727–743. doi:10.1002/Prot.22187

  102. Pires DE, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W Jr (2011) Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 12(Suppl 4):S12. doi:10.1186/1471-2164-12-S4-S12

    Article  CAS  Google Scholar 

  103. Pires DE, de Melo-Minardi RC, da Silveira CH, Campos FF, Meira W Jr (2013) aCSM: noise-free graph-based signatures to large-scale receptor-based ligand prediction. Bioinformatics 29(7):855–861. doi:10.1093/bioinformatics/btt058

    Article  CAS  Google Scholar 

  104. Potapov V, Cohen M, Schreiber G (2009) Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 22(9):553–560. doi:10.1093/protein/gzp030

  105. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242

    Article  CAS  Google Scholar 

  106. Gnad F, Baucom A, Mukhyala K, Manning G, Zhang Z (2013) Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics 14(Suppl 3):S7. doi:10.1186/1471-2164-14-S3-S7

    Google Scholar 

  107. Gnad F, Ren S, Choudhary C, Cox J, Mann M (2010) Predicting post-translational lysine acetylation using support vector machines. Bioinformatics 26(13):1666–1668. doi:10.1093/bioinformatics/btq260

    Article  CAS  Google Scholar 

  108. Saunders CT, Baker D (2002) Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J Mol Biol 322(4):891–901

    Article  CAS  Google Scholar 

  109. Eisenberg D, Weiss RM, Terwilliger TC (1984) The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA 81(1):140–144

  110. Engelman DM, Steitz TA, Goldman A (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem 15:321–353. doi:10.1146/annurev.bb.15.060186.001541

    Article  CAS  Google Scholar 

  111. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132

    Article  CAS  Google Scholar 

  112. Wimley WC, White SH (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol 3(10):842–848

    Article  CAS  Google Scholar 

  113. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G (2005) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433(7024):377–381. doi:10.1038/nature03216

    Article  CAS  Google Scholar 

  114. Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 78(6):3824–3828

  115. Stamm M, Staritzbichler R, Khafizov K, Forrest LR (2014) AlignMe—a membrane protein sequence alignment web server. Nucleic Acids Res 42:W246–W251. doi:10.1093/nar/gku291

  116. Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185(4154):862–864

    Article  CAS  Google Scholar 

  117. Abkevich V, Zharkikh A, Deffenbaugh AM, Frank D, Chen Y, Shattuck D, Skolnick MH, Gutin A, Tavtigian SV (2004) Analysis of missense variation in human BRCA1 in the context of interspecific sequence variation. J Med Genet 41(7):492–507

    Article  CAS  Google Scholar 

  118. Miller MP, Kumar S (2001) Understanding human disease mutations through the use of interspecific genetic variation. Hum Mol Genet 10(21):2319–2328

    Article  CAS  Google Scholar 

  119. Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 33:W306–W310. doi:10.1093/nar/gki375

  120. Capriotti E, Fariselli P, Rossi I, Casadio R (2008) A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics 9(Suppl 2):S6. doi:10.1186/1471-2105-9-S2-S6

  121. Rost B (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 266:525–539

    Article  CAS  Google Scholar 

  122. Delorenzi M, Speed T (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18(4):617–625

    Article  CAS  Google Scholar 

  123. Radivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD, Dunker AK (2004) Protein flexibility and intrinsic disorder. Protein Sci 13(1):71–80. doi:10.1110/ps.03128904

  124. Melamud E, Moult J (2003) Evaluation of disorder predictions in CASP5. Proteins 53(Suppl 6):561–565. doi:10.1002/prot.10533

    Article  CAS  Google Scholar 

  125. Wright PE, Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm. J Mol Biol 293(2):321–331. doi:10.1006/jmbi.1999.3110

  126. Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27(10):527–533

    Article  CAS  Google Scholar 

  127. Dyson HJ, Wright PE (2005) Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6(3):197–208. doi:10.1038/nrm1589

    Article  CAS  Google Scholar 

  128. Dunker AK, Brown CJ, Obradovic Z (2002) Identification and functions of usefully disordered proteins. Adv Protein Chem 62:25–49

    Article  CAS  Google Scholar 

  129. Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK (2002) Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323(3):573–584

    Article  CAS  Google Scholar 

  130. Pajkos M, Meszaros B, Simon I, Dosztanyi Z (2012) Is there a biological cost of protein disorder? Analysis of cancer-associated mutations. Mol BioSyst 8(1):296–307. doi:10.1039/c1mb05246b

    Article  CAS  Google Scholar 

  131. He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK (2009) Predicting intrinsic disorder in proteins: an overview. Cell Res 19(8):929–949. doi:10.1038/cr.2009.87

    Article  CAS  Google Scholar 

  132. Radivojac P, Vucetic S, O’Connor TR, Uversky VN, Obradovic Z, Dunker AK (2006) Calmodulin signaling: analysis and prediction of a disorder-dependent molecular recognition. Proteins 63(2):398–410. doi:10.1002/prot.20873

    Article  CAS  Google Scholar 

  133. Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32(3):1037–1049. doi:10.1093/nar/gkh253

  134. Daily MD, Masica D, Sivasubramanian A, Somarouthu S, Gray JJ (2005) CAPRI rounds 3–5 reveal promising successes and future challenges for RosettaDock. Proteins 60(2):181–186. doi:10.1002/prot.20555

  135. Folkman L, Yang Y, Li Z, Stantic B, Sattar A, Mort M, Cooper DN, Liu Y, Zhou Y (2015) DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 31(10):1599–1606. doi:10.1093/bioinformatics/btu862

  136. Hu J, Ng PC (2013) SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 8(10):e77940. doi:10.1371/journal.pone.0077940

  137. Zhao HY, Yang YD, Lin H, Zhang XJ, Mort M, Cooper DN, Liu YL, Zhou YQ (2013) DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol 14(3):R23. doi:10.1186/gb-2013-14-3-r23

  138. Zia A, Moses AM (2011) Ranking insertion, deletion and nonsense mutations based on their effect on genetic information. BMC Bioinformatics 12:299. doi:10.1186/1471-2105-12-299

  139. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315. doi:10.1038/ng.2892

  140. Liu M, Watson LT, Zhang L (2014) Quantitative prediction of the effect of genetic variation using hidden Markov models. BMC Bioinformatics 15:5. doi:10.1186/1471-2105-15-5

  141. Bermejo-Das-Neves C, Nguyen HN, Poch O, Thompson JD (2014) A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics 15:111. doi:10.1186/1471-2105-15-111

  142. Limongelli I, Marini S, Bellazzi R (2015) PaPI: pseudo amino acid composition to score human protein-coding variants. BMC Bioinformatics 16:123. doi:10.1186/s12859-015-0554-8

  143. Zhang N, Huang T, Cai YD (2015) Discriminating between deleterious and neutral non-frameshifting indels based on protein interaction networks and hybrid properties. Mol Genet Genomics 290(1):343–352. doi:10.1007/s00438-014-0922-5

  144. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745–755. doi:10.1038/nrg3031

  145. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM, Broad GO, Seattle GO, NHLBI Exome Sequencing Project (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090):64–69. doi:10.1126/science.1219240

  146. Alper SL (2013) Harnessing red cell membrane pathophysiology towards point-of-care diagnosis for sickle cell disease. J Physiol 591(Pt 6):1403–1404. doi:10.1113/jphysiol.2013.252429

  147. Aidoo M, Terlouw DJ, Kolczak M, McElroy PD, ter Kuile FO, Kariuki S, Nahlen BL, Lal AA, Udhayakumar V (2002) Protective effects of the sickle cell gene against malaria morbidity and mortality. Lancet 359(9314):1311–1312. doi:10.1016/S0140-6736(02)08273-9

  148. Gong S, Blundell TL (2010) Structural and functional restraints on the occurrence of single amino acid variations in human proteins. PLoS One 5(2):e9186. doi:10.1371/journal.pone.0009186

  149. Wang MJ, Sun ZW, Akutsu T, Song JM (2013) Recent advances in predicting functional impact of single amino acid polymorphisms: a review of useful features, computational methods and available tools. Curr Bioinform 8(2):161–176

  150. Capriotti E, Altman RB, Bromberg Y (2013) Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 14(Suppl 3):S2. doi:10.1186/1471-2164-14-S3-S2

  151. Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, Brezovsky J, Damborsky J (2014) PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 10(1):e1003440. doi:10.1371/journal.pcbi.1003440

  152. Olatubosun A, Valiaho J, Harkonen J, Thusberg J, Vihinen M (2012) PON-P: integrated predictor for pathogenicity of missense variants. Hum Mutat 33(8):1166–1174. doi:10.1002/humu.22102

  153. Faa V, Coiana A, Incani F, Costantino L, Cao A, Rosatelli MC (2010) A synonymous mutation in the CFTR gene causes aberrant splicing in an Italian patient affected by a mild form of cystic fibrosis. J Mol Diagn 12(3):380–383. doi:10.2353/jmoldx.2010.090126

  154. Brest P, Lapaquette P, Souidi M, Lebrigand K, Cesaro A, Vouret-Craviari V, Mari B, Barbry P, Mosnier JF, Hebuterne X, Harel-Bellan A, Mograbi B, Darfeuille-Michaud A, Hofman P (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43(3):242–245. doi:10.1038/ng.762

  155. Wang DX, Sadee W (2006) Searching for polymorphisms that affect gene expression and mRNA processing: example ABCB1 (MDR1). AAPS J 8(3):E515–E520. doi:10.1208/aapsj080361

  156. Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L (2006) Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314(5807):1930–1933. doi:10.1126/science.1131262

  157. Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315(5811):525–528. doi:10.1126/science.1135308

  158. Katsnelson A (2011) Breaking the silence. Nat Med 17(12):1536–1538. doi:10.1038/nm1211-1536

  159. Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB (2011) Bioinformatics challenges for personalized medicine. Bioinformatics 27(13):1741–1748. doi:10.1093/bioinformatics/btr295

Acknowledgments

This study was partially supported by RFBR, research project no. 15-04-04730, and grant no. RFMEFI60714X0098.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Kamil Khafizov.

About this article

Cite this article

Khafizov, K., Ivanov, M.V., Glazova, O.V. et al. Computational approaches to study the effects of small genomic variations. J Mol Model 21, 251 (2015). https://doi.org/10.1007/s00894-015-2794-y

  • DOI: https://doi.org/10.1007/s00894-015-2794-y
