SMOTE Based Protein Fold Prediction Classification

  • Conference paper
Advances in Computing and Information Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 177)

Abstract

Protein contact maps are two-dimensional representations of protein structures. It is well known that specific patterns occurring within contact maps correspond to configurations of protein secondary structures. This paper addresses the problem of protein fold prediction, which is a multi-class problem with unbalanced classes. A simple and computationally inexpensive algorithm, called the Eight-Neighbour algorithm, is proposed to extract novel features from the contact map. It is found that the Support Vector Machine (SVM), which can be effectively extended from a binary to a multi-class classifier, does not perform well on this problem. Hence, in order to boost the performance, a boosting algorithm called SMOTE is applied to rebalance the data set, and a decision tree classifier is then used to classify “folds” from the features of the contact map. The classification is performed across the four major protein structural classes as well as among the different folds within the classes. The results obtained are promising, validating the simple methodology of boosting to obtain improved performance on the fold classification problem using features derived from the contact map alone.
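
The rebalancing step mentioned in the abstract can be illustrated with a minimal sketch. SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The code below is an illustrative pure-Python version under stated assumptions, not the authors' implementation: the function name `smote`, the parameters `n_new`, `k` and `seed`, and the toy two-dimensional feature vectors are all hypothetical, and the paper's actual contact-map features are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch only: each synthetic point lies on the line segment
    joining a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors for an under-represented fold (hypothetical values).
minority_fold = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority_fold, n_new=6, k=3)
```

In a pipeline of the kind the abstract describes, the synthetic samples would be appended to the training set of the minority fold before a classifier such as a decision tree is trained; the test set would normally be left untouched so that the evaluation reflects the original class distribution.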

Author information

Corresponding author

Correspondence to K. Suvarna Vani.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suvarna Vani, K., Durga Bhavani, S. (2013). SMOTE Based Protein Fold Prediction Classification. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31552-7_55

  • DOI: https://doi.org/10.1007/978-3-642-31552-7_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31551-0

  • Online ISBN: 978-3-642-31552-7

  • eBook Packages: Engineering, Engineering (R0)
