Hierarchical Classification of Transposable Elements with a Weighted Genetic Algorithm

  • Conference paper
Progress in Artificial Intelligence (EPIA 2019)

Abstract

Most related work in Machine Learning (ML) addresses Flat Classification, in which an instance is typically associated with a single class from a small set of classes. In some cases, however, instances must be assigned to many classes simultaneously, and these classes are arranged in a hierarchical structure. This problem, called Hierarchical Classification (HC), has received special attention in fields such as Bioinformatics. In this context, one topic of growing interest is the classification of Transposable Elements (TEs), which are DNA fragments capable of moving within the genome of their hosts. In this paper, we propose a novel hierarchical method based on Genetic Algorithms (GAs) that generates HC rules and classifies TEs at many hierarchical levels of their taxonomy. The proposed method, called Hierarchical Classification with a Weighted Genetic Algorithm (HC-WGA), uses a Weighted Sum approach to deal with the accuracy-interpretability trade-off, a common and still relevant problem in both ML and Bioinformatics. To the best of our knowledge, this is the first HC method to use such an approach. Experiments with two popular TE datasets showed that our method achieves results competitive with most state-of-the-art HC methods, with the advantage of producing an interpretable model.
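To make the Weighted Sum idea concrete, the following minimal Python sketch scores a candidate classification rule by combining its accuracy with a simplicity term. It is an illustration under stated assumptions, not the fitness function used in HC-WGA: the `Rule` fields, the use of rule length as a proxy for interpretability, the `max_conditions` bound, and the default weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A candidate rule (hypothetical representation, not taken from the paper)."""
    n_conditions: int  # number of active tests in the rule antecedent
    accuracy: float    # fraction of covered training instances classified correctly


def weighted_sum_fitness(rule: Rule, max_conditions: int,
                         w_accuracy: float = 0.8,
                         w_simplicity: float = 0.2) -> float:
    """Collapse two objectives into one scalar with fixed weights.

    Assumption: interpretability is approximated by rule length, so a rule
    with fewer conditions receives a higher simplicity score.
    """
    if max_conditions < 1:
        raise ValueError("max_conditions must be at least 1")
    if not 0 <= rule.n_conditions <= max_conditions:
        raise ValueError("n_conditions must lie in [0, max_conditions]")
    if not 0.0 <= rule.accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    # Simplicity is 1.0 for an empty antecedent and 0.0 at the maximum length.
    simplicity = 1.0 - rule.n_conditions / max_conditions
    return w_accuracy * rule.accuracy + w_simplicity * simplicity


# Two illustrative rules: one short, one long but slightly more accurate.
short_rule = Rule(n_conditions=2, accuracy=0.90)
long_rule = Rule(n_conditions=9, accuracy=0.92)
short_score = weighted_sum_fitness(short_rule, max_conditions=10)
long_score = weighted_sum_fitness(long_rule, max_conditions=10)
```

With these illustrative weights the shorter rule scores higher even though the longer one is slightly more accurate, which is the kind of trade-off a weighted sum makes explicit. The outcome depends entirely on the chosen weights.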


Notes

  1. Available at https://github.com/geantrindade/HC-WGA (20 June 2019).

  2. Available at http://pgsb.helmholtz-muenchen.de/plant/ (20 June 2019).

  3. Available at http://girinst.org/repbase/ (20 June 2019).

  4. Available at https://github.com/geantrindade/TEsHierarchicalDatasets (20 June 2019).

Acknowledgment

This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001, as well as by the São Paulo Research Foundation (FAPESP), grants 2015/14300-1 and 2016/50457-5.

Author information

Correspondence to Gean Trindade Pereira, Paulo H. R. Gabriel or Ricardo Cerri.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pereira, G.T., Gabriel, P.H.R., Cerri, R. (2019). Hierarchical Classification of Transposable Elements with a Weighted Genetic Algorithm. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11804. Springer, Cham. https://doi.org/10.1007/978-3-030-30241-2_61

  • DOI: https://doi.org/10.1007/978-3-030-30241-2_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30240-5

  • Online ISBN: 978-3-030-30241-2

  • eBook Packages: Computer Science, Computer Science (R0)
