Adaptive Information-Theoretical Feature Selection for Pattern Classification

Conference paper
Computational Intelligence (IJCCI 2012)

Part of the book series: Studies in Computational Intelligence (SCI, volume 577)


Abstract

To facilitate classifier construction, feature selection algorithms reduce the input dimensionality to a subset of the most informative features. Usually, such a subset is fixed and chosen in a preprocessing step before the actual classification. However, when it is difficult to find a small number of features sufficient for classifying all data samples, as in the case of heterogeneous input data, we suggest an adaptive approach that selects different features for every testing sample. The adaptive sequential algorithm proposed here selects the features that, for a given testing sample, maximize the expected reduction of uncertainty about its class, where the uncertainty is updated with the values of the already selected features observed on this sample. The provided experiments show that, especially when the amount of training data is limited, our adaptive conditional mutual information feature selector outperforms the two most closely related information-based static and adaptive algorithms.
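The abstract only sketches the selection loop, so the following is a minimal illustrative reading of it in Python. It assumes discrete (e.g., binned) features and plain empirical counts with Laplace smoothing, whereas the paper itself estimates the required densities nonparametrically; the function names (`adaptive_cmi_select`, `class_posterior`) are invented for illustration, not taken from the paper.

```python
import numpy as np

def class_posterior(X, y, n_classes, obs, alpha=1.0):
    """Smoothed empirical estimate of P(C | x_S = obs).
    obs maps feature index -> value observed on the testing sample."""
    mask = np.ones(len(y), dtype=bool)
    for f, v in obs.items():
        mask &= (X[:, f] == v)
    counts = np.bincount(y[mask], minlength=n_classes).astype(float) + alpha
    return counts / counts.sum()

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def adaptive_cmi_select(X, y, x_test, k, alpha=1.0):
    """Pick k features for one testing sample, each time choosing the
    feature whose observation is expected to reduce class uncertainty
    the most, conditioned on the feature values already observed."""
    n_classes = int(y.max()) + 1
    obs, remaining = {}, set(range(X.shape[1]))
    for _ in range(k):
        h_now = entropy(class_posterior(X, y, n_classes, obs, alpha))
        best_f, best_gain = None, -np.inf
        for f in remaining:
            # P(X_f = v | x_S): distribution of feature f among training
            # samples consistent with the values observed so far
            mask = np.ones(len(y), dtype=bool)
            for g, v in obs.items():
                mask &= (X[:, g] == v)
            vals, cnts = np.unique(X[mask, f], return_counts=True)
            if len(vals) == 0:  # no consistent training samples remain
                continue
            p_v = (cnts + alpha) / (cnts.sum() + alpha * len(vals))
            # expected class entropy after also observing X_f, so that
            # h_now - h_exp estimates I(C; X_f | x_S = obs)
            h_exp = sum(p * entropy(class_posterior(X, y, n_classes,
                                                    {**obs, f: v}, alpha))
                        for v, p in zip(vals, p_v))
            if h_now - h_exp > best_gain:
                best_gain, best_f = h_now - h_exp, f
        if best_f is None:
            break
        obs[best_f] = x_test[best_f]  # reveal the chosen feature's value
        remaining.discard(best_f)
    return list(obs.keys())
```

Under these assumptions one would call, e.g., `adaptive_cmi_select(X_train, y_train, x, k=5)` separately for each testing sample. Note that exhaustively matching the observed values against the training set is only tractable for coarse discretization and small feature budgets, which is presumably why smoothed density estimates are preferable for continuous or high-dimensional data.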



Author information

Corresponding author

Correspondence to Liliya Avdiyenko.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Avdiyenko, L., Bertschinger, N., Jost, J. (2015). Adaptive Information-Theoretical Feature Selection for Pattern Classification. In: Madani, K., Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2012. Studies in Computational Intelligence, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-319-11271-8_18


  • DOI: https://doi.org/10.1007/978-3-319-11271-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11270-1

  • Online ISBN: 978-3-319-11271-8

  • eBook Packages: Engineering, Engineering (R0)
