Computational approaches to study the effects of small genomic variations

Khafizov, Kamil; Ivanov, Maxim V.; Glazova, Olga V.; Kovalenko, Sergei P.

doi:10.1007/s00894-015-2794-y

Review
Published: 08 September 2015

Journal of Molecular Modeling, volume 21, article number 251 (2015)

Abstract

Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.

Abbreviations

TCGA: The Cancer Genome Atlas
ICGC: International Cancer Genome Consortium
SNP: Single nucleotide polymorphism
HGMD: Human Gene Mutation Database
nsSNP: Nonsynonymous SNP
OMIM: Online Mendelian Inheritance in Man
HGV: Human Genome Variation
PMD: Protein Mutant Database
EVS: Exome Variant Server
COSMIC: Catalogue of Somatic Mutations in Cancer
NCBI: National Center for Biotechnology Information
dbSNP: SNP Database
LSDB: Locus-specific database
HGVS: Human Genome Variation Society
MAF: Minor allele frequency
MSA: Multiple sequence alignment
PDB: Protein Data Bank
SS: Secondary structure
CAGI: Critical Assessment of Genome Interpretation

References

Levitt M (2009) Nature of the protein universe. Proc Natl Acad Sci USA 106(27):11079–11084. doi:10.1073/pnas.0905029106
Khafizov K, Madrid-Aliste C, Almo SC, Fiser A (2014) Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc Natl Acad Sci USA 111(10):3733–3738. doi:10.1073/pnas.1321614111
Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genoty**. Nat Rev Genet 12(5):363–376. doi:10.1038/nrg2958
Article CAS Google Scholar
Giordano TJ (2014) The Cancer Genome Atlas research network: a sight to behold. Endocr Pathol 25(4):362–365. doi:10.1007/s12022-014-9345-4
The International Cancer Genome Consortium, Hudson T et al (2010) International network of cancer genome projects. Nature 464(7291):993–998. doi:10.1038/nature08987
1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. doi:10.1038/nature11632
Ng SB, Nickerson DA, Bamshad MJ, Shendure J (2010) Massively parallel sequencing and rare disease. Hum Mol Genet 19(R2):R119–R124. doi:10.1093/hmg/ddq390
Article CAS Google Scholar
Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ (2010) Exome sequencing identifies the cause of a Mendelian disorder. Nat Genet 42(1):30–35. doi:10.1038/ng.499
Thomas PD, Kejariwal A (2004) Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects. Proc Natl Acad Sci USA 101(43):15398–15403. doi:10.1073/pnas.0404380101
Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE (2013) Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet 14(10):681–691. doi:10.1038/nrg3555
Article CAS Google Scholar
Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, Cooper DN (2014) The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133(1):1–9. doi:10.1007/s00439-013-1358-4
Article CAS Google Scholar
Bi XH, Lu CM, Liu Q, Zhang ZX, Zhao HL, Yu J, Zhang JW (2012) A 14 bp indel variation in the NCX1 gene modulates the age at onset in late-onset Alzheimer’s disease. J Neural Transm 119(3):383–386. doi:10.1007/s00702-011-0696-4
Article CAS Google Scholar
Dong B, Chen J, Zhang X, Pan Z, Bai F, Li Y (2013) Two novel PRP31 premessenger ribonucleic acid processing factor 31 homolog mutations including a complex insertion-deletion identified in Chinese families with retinitis pigmentosa. Mol Vis 19:2426–2435
CAS Google Scholar
Yu Q, Zhou C, Wang J, Chen L, Zheng S, Zhang J (2013) A functional insertion/deletion polymorphism in the promoter of PDCD6IP is associated with the susceptibility of hepatocellular carcinoma in a Chinese population. DNA Cell Biol 32(8):451–457. doi:10.1089/dna.2013.2061
Article CAS Google Scholar
Glanzmann B, Lombard D, Carr J, Bardien S (2014) Screening of two indel polymorphisms in the 5′UTR of the DJ-1 gene in South African Parkinson’s disease patients. J Neural Transm 121(2):135–138. doi:10.1007/s00702-013-1094-x
Ross JS, Wang K, Al-Rohil RN, Nazeer T, Sheehan CE, Otto GA, He J, Palmer G, Yelensky R, Lipson D, Ali S, Balasubramanian S, Curran JA, Garcia L, Mahoney K, Downing SR, Hawryluk M, Miller VA, Stephens PJ (2014) Advanced urothelial carcinoma: next-generation sequencing reveals diverse genomic alterations and targets of therapy. Mod Pathol: Off J US Can Acad Pathol Inc 27(2):271–280. doi:10.1038/modpathol.2013.135
Article CAS Google Scholar
Wrobel JA, Chao SF, Conrad MJ, Merker JD, Swanstrom R, Pielak GJ, Hutchison CA 3rd (1998) A genetic approach for identifying critical residues in the fingers and palm subdomains of HIV-1 reverse transcriptase. Proc Natl Acad Sci USA 95(2):638–645
Zwick ME, Cutler DJ, Chakravarti A (2000) Patterns of genetic variation in Mendelian and complex traits. Annu Rev Genomics Hum Genet 1:387–407. doi:10.1146/annurev.genom.1.1.387
Article CAS Google Scholar
Hainaut P, Hernandez T, Robinson A, Rodriguez-Tome P, Flores T, Hollstein M, Harris CC, Montesano R (1998) IARC database of p53 gene mutations in human tumors and cell lines: updated compilation, revised formats and new visualisation tools. Nucleic Acids Res 26(1):205–213
Henikoff S, Comai L (2003) Single-nucleotide mutations for plant functional genomics. Annu Rev Plant Biol 54:375–401. doi:10.1146/annurev.arplant.54.031902.135009
Article CAS Google Scholar
Johnston JJ, Biesecker LG (2013) Databases of genomic variation and phenotypes: existing resources and future needs. Hum Mol Genet 22(R1):R27–R31. doi:10.1093/hmg/ddt384
Article CAS Google Scholar
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514–D517. doi:10.1093/nar/gki033
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29(1):308–311
Article CAS Google Scholar
Smigielski EM, Sirotkin K, Ward M, Sherry ST (2000) dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res 28(1):352–355
Article CAS Google Scholar
MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW (2014) The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res 42:D986–D992. doi:10.1093/nar/gkt958
UniProt Consortium (2008) The Universal Protein Resource (UniProt). Nucleic Acids Res 36:D190–D195. doi:10.1093/nar/gkm895
UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res 43:D204–D212. doi:10.1093/nar/gku989
Kawabata T, Ota M, Nishikawa K (1999) The Protein Mutant Database. Nucleic Acids Res 27(1):355–357
Article CAS Google Scholar
Thusberg J, Olatubosun A, Vihinen M (2011) Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat 32(4):358–368. doi:10.1002/humu.21445
Article Google Scholar
Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ (2015) COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res 43:D805–D811. doi:10.1093/nar/gku1075
Gonzalez-Perez A, Lopez-Bigas N (2011) Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am J Hum Genet 88(4):440–449. doi:10.1016/j.ajhg.2011.03.004
Article CAS Google Scholar
Tryka KA, Hao L, Sturcke A, ** Y, Wang ZY, Ziyabari L, Lee M, Popova N, Sharopova N, Kimura M, Feolo M (2014) NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res 42:D975–D979. doi:10.1093/nar/gkt1211
International HapMap Consortium, Frazer KA et al (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164):851–861. doi:10.1038/nature06258
Reich DE, Gabriel SB, Altshuler D (2003) Quality and completeness of SNP databases. Nat Genet 33(4):457–458. doi:10.1038/ng1133
Article CAS Google Scholar
Mitchell AA, Zwick ME, Chakravarti A, Cutler DJ (2004) Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genoty** error rates and patterns. Bioinformatics 20(7):1022–1032. doi:10.1093/bioinformatics/bth034
Article CAS Google Scholar
Musumeci L, Arthur JW, Cheung FS, Hoque A, Lippman S, Reichardt JK (2010) Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genoty** and haploty** studies. Hum Mutat 31(1):67–73. doi:10.1002/humu.21137
Article CAS Google Scholar
Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics Chapter 1:Unit 1.13. doi:10.1002/0471250953.bi0113s39
Ng PC, Henikoff S (2003) SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 31(13):3812–3814
Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7:Unit 7.20. doi:10.1002/0471142905.hg0720s76
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248–249. doi:10.1038/nmeth0410-248
Article CAS Google Scholar
Li B, Krishnan VG, Mort ME, **n F, Kamati KK, Cooper DN, Mooney SD, Radivojac P (2009) Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25(21):2744–2750. doi:10.1093/bioinformatics/btp528
Article CAS Google Scholar
Cotton RG, Auerbach AD, Beckmann JS, Blumenfeld OO, Brookes AJ, Brown AF, Carrera P, Cox DW, Gottlieb B, Greenblatt MS, Hilbert P, Lehvaslaiho H, Liang P, Marsh S, Nebert DW, Povey S, Rossetti S, Scriver CR, Summar M, Tolan DR, Verma IC, Vihinen M, den Dunnen JT (2008) Recommendations for locus-specific databases and their curation. Hum Mutat 29(1):2–5. doi:10.1002/humu.20650
Article CAS Google Scholar
den Dunnen JT, Antonarakis SE (2000) Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15(1):7–12. doi:10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N
Article Google Scholar
Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011) LOVD v. 2.0: the next generation in gene variant databases. Hum Mutat 32(5):557–563. doi:10.1002/humu.21438
Article CAS Google Scholar
Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42:D980–D985. doi:10.1093/nar/gkt1113
Yip YL, Famiglietti M, Gos A, Duek PD, David FP, Gateau A, Bairoch A (2008) Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase. Hum Mutat 29(3):361–366. doi:10.1002/humu.20671
Article CAS Google Scholar
Capriotti E, Calabrese R, Casadio R (2006) Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22(22):2729–2734. doi:10.1093/bioinformatics/btl423
Article CAS Google Scholar
Tian J, Wu N, Guo X, Guo J, Zhang J, Fan Y (2007) Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics 8:450. doi:10.1186/1471-2105-8-450
Hicks S, Wheeler DA, Plon SE, Kimmel M (2011) Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed. Hum Mutat 32(6):661–668. doi:10.1002/humu.21490
Article CAS Google Scholar
Bromberg Y, Rost B (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 35(11):3823–3835. doi:10.1093/nar/gkm238
Article CAS Google Scholar
Bao L, Zhou M, Cui Y (2005) nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucleic Acids Res 33:W480–W482. doi:10.1093/nar/gki372
Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R (2009) Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 30(8):1237–1244. doi:10.1002/humu.21047
Article CAS Google Scholar
Ramensky V, Bork P, Sunyaev S (2002) Human non-synonymous SNPs: server and survey. Nucleic Acids Res 30(17):3894–3900
Article CAS Google Scholar
Reva B, Antipin Y, Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39:e118. doi:10.1093/nar/gkr407
Mi H, Guo N, Kejariwal A, Thomas PD (2007) PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 35:D247–D252. doi:10.1093/nar/gkl869
Stone EA, Sidow A (2005) Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res 15(7):978–986. doi:10.1101/gr.3804205
Article CAS Google Scholar
Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armananzas R, Santafe G, Perez A, Robles V (2006) Machine learning in bioinformatics. Brief Bioinform 7(1):86–112
Article CAS Google Scholar
Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7:61–80. doi:10.1146/annurev.genom.7.080505.115630
Article CAS Google Scholar
Pervez MT, Babar ME, Nadeem A, Aslam M, Awan AR, Aslam N, Hussain T, Naveed N, Qadri S, Waheed U, Shoaib M (2014) Evaluating the accuracy and efficiency of multiple sequence alignment methods. Evol Bioinformatics Online 10:205–217. doi:10.4137/EBO.S19199
Article CAS Google Scholar
Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688. doi:10.1371/journal.pone.0046688
Tavtigian SV, Deffenbaugh AM, Yin L, Judkins T, Scholl T, Samollow PB, de Silva D, Zharkikh A, Thomas A (2006) Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J Med Genet 43(4):295–305. doi:10.1136/jmg.2005.033878
Article CAS Google Scholar
Ferrer-Costa C, Gelpi JL, Zamakola L, Parraga I, de la Cruz X, Orozco M (2005) PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 21(14):3176–3178. doi:10.1093/bioinformatics/bti486
Article CAS Google Scholar
Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33:D501–D504. doi:10.1093/nar/gki025
Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35:D61–D65. doi:10.1093/nar/gkl842
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410. doi:10.1016/S0022-2836(05)80360-2
Article CAS Google Scholar
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD (2012) The Pfam protein families database. Nucleic Acids Res 40:D290–D301. doi:10.1093/nar/gkr1065
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539. doi:10.1038/msb.2011.75
Article Google Scholar
Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B (2005) DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics 6:66. doi:10.1186/1471-2105-6-66
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4):772–780. doi:10.1093/molbev/mst010
Article CAS Google Scholar
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797. doi:10.1093/nar/gkh340
Article CAS Google Scholar
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15(2):330–340. doi:10.1101/gr.2821705
Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205–217. doi:10.1006/jmbi.2000.4042
Wallace IM, O’Sullivan O, Higgins DG, Notredame C (2006) M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res 34(6):1692–1699. doi:10.1093/nar/gkl091
Article CAS Google Scholar
Kim J, Ma J (2011) PSAR: measuring multiple sequence alignment reliability by probabilistic sampling. Nucleic Acids Res 39(15):6359–6368. doi:10.1093/nar/gkr334
Article CAS Google Scholar
Martin W, Roettger M, Lockhart PJ (2007) A reality check for alignments and trees. Trends Genet 23(10):478–480. doi:10.1016/j.tig.2007.08.007
Loytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320(5883):1632–1635. doi:10.1126/science.1158395
Article Google Scholar
Pais FS, Ruy Pde C, Oliveira G, Coimbra RS (2014) Assessing the efficiency of multiple sequence alignment programs. Algorithms Mol Biol 9(1):4. doi:10.1186/1748-7188-9-4
Ahola V, Aittokallio T, Vihinen M, Uusipaikka E (2006) A statistical score for assessing the quality of multiple sequence alignments. BMC Bioinformatics 7:484. doi:10.1186/1471-2105-7-484
Golubchik T, Wise MJ, Easteal S, Jermiin LS (2007) Mind the gaps: evidence of bias in estimates of multiple sequence alignments. Mol Biol Evol 24(11):2433–2442. doi:10.1093/molbev/msm176
Article CAS Google Scholar
Nuin PA, Wang Z, Tillier ER (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7:471. doi:10.1186/1471-2105-7-471
Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003) OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 4:47. doi:10.1186/1471-2105-4-47
Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89(22):10915–10919
Dayhoff MOSRM (1978) A model of evolutionary change in proteins. Atlas Protein Seq Structure 5:345–351
Google Scholar
Ferrer-Costa C, Orozco M, de la Cruz X (2002) Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J Mol Biol 315(4):771–786. doi:10.1006/jmbi.2001.5255
Article CAS Google Scholar
Balasubramanian S, **a Y, Freinkman E, Gerstein M (2005) Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms. Nucleic Acids Res 33(5):1710–1721. doi:10.1093/nar/gki311
Article CAS Google Scholar
Brunham LR, Singaraja RR, Pape TD, Kejariwal A, Thomas PD, Hayden MR (2005) Accurate prediction of the functional significance of single nucleotide polymorphisms and mutations in the ABCA1 gene. PLoS Genet 1(6):e83. doi:10.1371/journal.pgen.0010083
Bross P, Corydon TJ, Andresen BS, Jorgensen MM, Bolund L, Gregersen N (1999) Protein misfolding and degradation in genetic diseases. Hum Mutat 14(3):186–198. doi:10.1002/(SICI)1098-1004(1999)14:3<186::AID-HUMU2>3.0.CO;2-J
Article CAS Google Scholar
Wang Z, Moult J (2001) SNPs, protein structure, and disease. Hum Mutat 17(4):263–270. doi:10.1002/humu.22
Article Google Scholar
Yue P, Melamud E, Moult J (2006) SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 7:166. doi:10.1186/1471-2105-7-166
Kucukkal TG, Yang Y, Chapman SC, Cao W, Alexov E (2014) Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 15(6):9670–9717. doi:10.3390/ijms15069670
Article CAS Google Scholar
Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, Sarai A (2002) ProTherm, thermodynamic database for proteins and mutants: developments in version 3.0. Nucleic Acids Res 30(1):301–302
Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A (2006) ProTherm and ProNIT: thermodynamic databases for proteins and protein–nucleic acid interactions. Nucleic Acids Res 34:D204–D206. doi:10.1093/nar/gkj103
Moal IH, Fernandez-Recio J (2012) SKEMPI: a Structural Kinetic and Energetic database of Mutant Protein Interactions and its use in empirical models. Bioinformatics 28(20):2600–2607. doi:10.1093/bioinformatics/bts489
Article CAS Google Scholar
Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L (2005) The FoldX web server: an online force field. Nucleic Acids Res 33:W382–W388. doi:10.1093/nar/gki387
Yin S, Ding F, Dokholyan NV (2007) Eris: an automated estimator of protein stability. Nat Methods 4(6):466–467. doi:10.1038/nmeth0607-466
Article CAS Google Scholar
Pokala N, Handel TM (2005) Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity. J Mol Biol 347(1):203–227. doi:10.1016/j.jmb.2004.12.019
Article CAS Google Scholar
Pappu RV, Hart RK, Ponder JW (1998) Analysis and application of potential energy smoothing and search methods for global optimization. J Phys Chem B 102(48):9725–9742. doi:10.1021/Jp982255t
Article CAS Google Scholar
deGroot BL, vanAalten DMF, Scheek RM, Amadei A, Vriend G, Berendsen HJC (1997) Prediction of protein conformational freedom from distance constraints. Proteins 29(2):240–251. doi:10.1002/(Sici)1097-0134(199710)29:2<240::Aid-Prot11>3.0.Co;2-O
Cheng TMK, Lu YE, Vendruscolo M, Lio P, Blundell TL (2008) Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. PLoS Comp Biol 4(7):e1000135. doi:10.1371/journal.pcbi.1000135
Pires DEV, Ascher DB, Blundell TL (2014) mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30(3):335–342. doi:10.1093/bioinformatics/btt691
Article CAS Google Scholar
da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, Santoro MM (2009) Protein cutoff scanning: a comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 74(3):727–743. doi:10.1002/Prot.22187
Pires DE, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W Jr (2011) Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 12(Suppl 4):S12. doi:10.1186/1471-2164-12-S4-S12
Article CAS Google Scholar
Pires DE, de Melo-Minardi RC, da Silveira CH, Campos FF, Meira W Jr (2013) aCSM: noise-free graph-based signatures to large-scale receptor-based ligand prediction. Bioinformatics 29(7):855–861. doi:10.1093/bioinformatics/btt058
Article CAS Google Scholar
Potapov V, Cohen M, Schreiber G (2009) Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 22(9):553–560. doi:10.1093/protein/gzp030
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
Article CAS Google Scholar
Gnad F, Baucom A, Mukhyala K, Manning G, Zhang Z (2013) Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics 14(Suppl 3):S7. doi:10.1186/1471-2164-14-S3-S7
Google Scholar
Gnad F, Ren S, Choudhary C, Cox J, Mann M (2010) Predicting post-translational lysine acetylation using support vector machines. Bioinformatics 26(13):1666–1668. doi:10.1093/bioinformatics/btq260
Article CAS Google Scholar
Saunders CT, Baker D (2002) Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J Mol Biol 322(4):891–901
Article CAS Google Scholar
Eisenberg D, Weiss RM, Terwilliger TC (1984) The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA 81(1):140–144
Engelman DM, Steitz TA, Goldman A (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem 15:321–353. doi:10.1146/annurev.bb.15.060186.001541
Article CAS Google Scholar
Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132
Article CAS Google Scholar
Wimley WC, White SH (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol 3(10):842–848
Article CAS Google Scholar
Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G (2005) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433(7024):377–381. doi:10.1038/nature03216
Article CAS Google Scholar
Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 78(6):3824–3828
Stamm M, Staritzbichler R, Khafizov K, Forrest LR (2014) AlignMe—a membrane protein sequence alignment web server. Nucleic Acids Res 42:W246–W251. doi:10.1093/nar/gku291
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185(4154):862–864
Article CAS Google Scholar
Abkevich V, Zharkikh A, Deffenbaugh AM, Frank D, Chen Y, Shattuck D, Skolnick MH, Gutin A, Tavtigian SV (2004) Analysis of missense variation in human BRCA1 in the context of interspecific sequence variation. J Med Genet 41(7):492–507
Article CAS Google Scholar
Miller MP, Kumar S (2001) Understanding human disease mutations through the use of interspecific genetic variation. Hum Mol Genet 10(21):2319–2328
Article CAS Google Scholar
Capriotti E, Fariselli P, Casadio R (2005) I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 33:W306–W310. doi:10.1093/nar/gki375
Capriotti E, Fariselli P, Rossi I, Casadio R (2008) A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics 9(Suppl 2):S6. doi:10.1186/1471-2105-9-S2-S6
Rost B (1996) PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol 266:525–539
Article CAS Google Scholar
Delorenzi M, Speed T (2002) An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18(4):617–625
Article CAS Google Scholar
Radivojac P, Obradovic Z, Smith DK, Zhu G, Vucetic S, Brown CJ, Lawson JD, Dunker AK (2004) Protein flexibility and intrinsic disorder. Protein Sci 13(1):71–80. doi:10.1110/ps.03128904
Melamud E, Moult J (2003) Evaluation of disorder predictions in CASP5. Proteins 53(Suppl 6):561–565. doi:10.1002/prot.10533
Article CAS Google Scholar
Wright PE, Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm. J Mol Biol 293(2):321–331. doi:10.1006/jmbi.1999.3110
Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27(10):527–533
Article CAS Google Scholar
Dyson HJ, Wright PE (2005) Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6(3):197–208. doi:10.1038/nrm1589
Article CAS Google Scholar
Dunker AK, Brown CJ, Obradovic Z (2002) Identification and functions of usefully disordered proteins. Adv Protein Chem 62:25–49
Article CAS Google Scholar
Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK (2002) Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323(3):573–584
Article CAS Google Scholar
Pajkos M, Meszaros B, Simon I, Dosztanyi Z (2012) Is there a biological cost of protein disorder? Analysis of cancer-associated mutations. Mol BioSyst 8(1):296–307. doi:10.1039/c1mb05246b
Article CAS Google Scholar
He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK (2009) Predicting intrinsic disorder in proteins: an overview. Cell Res 19(8):929–949. doi:10.1038/cr.2009.87
Article CAS Google Scholar
Radivojac P, Vucetic S, O’Connor TR, Uversky VN, Obradovic Z, Dunker AK (2006) Calmodulin signaling: analysis and prediction of a disorder-dependent molecular recognition. Proteins 63(2):398–410. doi:10.1002/prot.20873
Article CAS Google Scholar
Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32(3):1037–1049. doi:10.1093/nar/gkh253
Article CAS Google Scholar
Daily MD, Masica D, Sivasubramanian A, Somarouthu S, Gray JJ (2005) CAPRI rounds 3–5 reveal promising successes and future challenges for RosettaDock. Proteins 60(2):181–186. doi:10.1002/prot.20555
Article CAS Google Scholar
Folkman L, Yang Y, Li Z, Stantic B, Sattar A, Mort M, Cooper DN, Liu Y, Zhou Y (2015) DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 31(10):1599–1606. doi:10.1093/bioinformatics/btu862
Article Google Scholar
Hu J, Ng PC (2013) SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 8(10):e77940. doi:10.1371/journal.pone.0077940
Zhao HY, Yang YD, Lin H, Zhang XJ, Mort M, Cooper DN, Liu YL, Zhou YQ (2013) DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol 14(3):R23. doi:10.1186/Gb-2013-14-3-R23
Zia A, Moses AM (2011) Ranking insertion, deletion and nonsense mutations based on their effect on genetic information. BMC Bioinformatics 12:299. doi:10.1186/1471-2105-12-299
Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315. doi:10.1038/ng.2892
Liu M, Watson LT, Zhang L (2014) Quantitative prediction of the effect of genetic variation using hidden Markov models. BMC Bioinformatics 15:5. doi:10.1186/1471-2105-15-5
Bermejo-Das-Neves C, Nguyen HN, Poch O, Thompson JD (2014) A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics 15:111. doi:10.1186/1471-2105-15-111
Limongelli I, Marini S, Bellazzi R (2015) PaPI: pseudo amino acid composition to score human protein-coding variants. BMC Bioinformatics 16:123. doi:10.1186/s12859-015-0554-8
Zhang N, Huang T, Cai YD (2015) Discriminating between deleterious and neutral non-frameshifting indels based on protein interaction networks and hybrid properties. Mol Genet Genomics 290(1):343–352. doi:10.1007/s00438-014-0922-5
Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745–755. doi:10.1038/nrg3031
Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM, Broad GO, Seattle GO, NHLBI Exome Sequencing Project (2012) Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337(6090):64–69. doi:10.1126/science.1219240
Alper SL (2013) Harnessing red cell membrane pathophysiology towards point-of-care diagnosis for sickle cell disease. J Physiol 591(Pt 6):1403–1404. doi:10.1113/jphysiol.2013.252429
Aidoo M, Terlouw DJ, Kolczak M, McElroy PD, ter Kuile FO, Kariuki S, Nahlen BL, Lal AA, Udhayakumar V (2002) Protective effects of the sickle cell gene against malaria morbidity and mortality. Lancet 359(9314):1311–1312. doi:10.1016/S0140-6736(02)08273-9
Gong S, Blundell TL (2010) Structural and functional restraints on the occurrence of single amino acid variations in human proteins. PLoS One 5(2):e9186. doi:10.1371/journal.pone.0009186
Wang MJ, Sun ZW, Akutsu T, Song JM (2013) Recent advances in predicting functional impact of single amino acid polymorphisms: a review of useful features, computational methods and available tools. Curr Bioinform 8(2):161–176
Capriotti E, Altman RB, Bromberg Y (2013) Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 14(Suppl 3):S2. doi:10.1186/1471-2164-14-S3-S2
Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, Brezovsky J, Damborsky J (2014) PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 10(1):e1003440. doi:10.1371/journal.pcbi.1003440
Olatubosun A, Valiaho J, Harkonen J, Thusberg J, Vihinen M (2012) PON-P: integrated predictor for pathogenicity of missense variants. Hum Mutat 33(8):1166–1174. doi:10.1002/humu.22102
Faa V, Coiana A, Incani F, Costantino L, Cao A, Rosatelli MC (2010) A synonymous mutation in the CFTR gene causes aberrant splicing in an Italian patient affected by a mild form of cystic fibrosis. J Mol Diagn 12(3):380–383. doi:10.2353/jmoldx.2010.090126
Brest P, Lapaquette P, Souidi M, Lebrigand K, Cesaro A, Vouret-Craviari V, Mari B, Barbry P, Mosnier JF, Hebuterne X, Harel-Bellan A, Mograbi B, Darfeuille-Michaud A, Hofman P (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43(3):242–245. doi:10.1038/ng.762
Wang DX, Sadee W (2006) Searching for polymorphisms that affect gene expression and mRNA processing: example ABCB1 (MDR1). AAPS J 8(3):E515–E520. doi:10.1208/aapsj080361
Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L (2006) Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314(5807):1930–1933. doi:10.1126/science.1131262
Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315(5811):525–528. doi:10.1126/science.1135308
Katsnelson A (2011) Breaking the silence. Nat Med 17(12):1536–1538. doi:10.1038/nm1211-1536
Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB (2011) Bioinformatics challenges for personalized medicine. Bioinformatics 27(13):1741–1748. doi:10.1093/bioinformatics/btr295

Acknowledgments

This study was partially supported by RFBR, research project no. 15-04-04730, and grant no. RFMEFI60714X0098.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russian Federation
Kamil Khafizov, Maxim V. Ivanov, Olga V. Glazova & Sergei P. Kovalenko
The Institute of Molecular Biology and Biophysics, Novosibirsk, Russian Federation
Sergei P. Kovalenko
Novosibirsk State University, Novosibirsk, Russian Federation
Sergei P. Kovalenko

Corresponding author

Correspondence to Kamil Khafizov.

About this article

Cite this article

Khafizov, K., Ivanov, M.V., Glazova, O.V. et al. Computational approaches to study the effects of small genomic variations. J Mol Model 21, 251 (2015). https://doi.org/10.1007/s00894-015-2794-y

Received: 08 April 2015
Accepted: 23 August 2015
Published: 08 September 2015
DOI: https://doi.org/10.1007/s00894-015-2794-y
