Adaptive Information-Theoretical Feature Selection for Pattern Classification

Conference paper
Computational Intelligence (IJCCI 2012)

Part of the book series: Studies in Computational Intelligence (SCI, volume 577)


Abstract

To facilitate classifier construction, feature selection algorithms reduce the input dimensionality to a subset of the most informative features. Usually, such a subset is fixed and chosen in a preprocessing step before the actual classification. However, when it is difficult to find a small number of features sufficient for classifying all data samples, as in the case of heterogeneous input data, we suggest an adaptive approach that selects different features for every testing sample. The adaptive sequential algorithm proposed here selects the features that, for a given testing sample, maximize the expected reduction of uncertainty about its class, where the uncertainty is updated with the values of the already selected features observed on this sample. The provided experiments show that, especially when the amount of training data is limited, our adaptive conditional mutual information feature selector outperforms the two most closely related information-based static and adaptive algorithms.
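The abstract only sketches the selection loop, so the following is a minimal illustrative reading of it in Python. It assumes discrete (e.g., binned) features and plain empirical counts with Laplace smoothing, whereas the paper itself estimates the required densities nonparametrically; the function names (`adaptive_cmi_select`, `class_posterior`) are invented for illustration, not taken from the paper.

```python
import numpy as np

def class_posterior(X, y, n_classes, obs, alpha=1.0):
    """Smoothed empirical estimate of P(C | x_S = obs).
    obs maps feature index -> value observed on the testing sample."""
    mask = np.ones(len(y), dtype=bool)
    for f, v in obs.items():
        mask &= (X[:, f] == v)
    counts = np.bincount(y[mask], minlength=n_classes).astype(float) + alpha
    return counts / counts.sum()

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def adaptive_cmi_select(X, y, x_test, k, alpha=1.0):
    """Pick k features for one testing sample, each time choosing the
    feature whose observation is expected to reduce class uncertainty
    the most, conditioned on the feature values already observed."""
    n_classes = int(y.max()) + 1
    obs, remaining = {}, set(range(X.shape[1]))
    for _ in range(k):
        h_now = entropy(class_posterior(X, y, n_classes, obs, alpha))
        best_f, best_gain = None, -np.inf
        for f in remaining:
            # P(X_f = v | x_S): distribution of feature f among training
            # samples consistent with the values observed so far
            mask = np.ones(len(y), dtype=bool)
            for g, v in obs.items():
                mask &= (X[:, g] == v)
            vals, cnts = np.unique(X[mask, f], return_counts=True)
            if len(vals) == 0:  # no consistent training samples remain
                continue
            p_v = (cnts + alpha) / (cnts.sum() + alpha * len(vals))
            # expected class entropy after also observing X_f, so that
            # h_now - h_exp estimates I(C; X_f | x_S = obs)
            h_exp = sum(p * entropy(class_posterior(X, y, n_classes,
                                                    {**obs, f: v}, alpha))
                        for v, p in zip(vals, p_v))
            if h_now - h_exp > best_gain:
                best_gain, best_f = h_now - h_exp, f
        if best_f is None:
            break
        obs[best_f] = x_test[best_f]  # reveal the chosen feature's value
        remaining.discard(best_f)
    return list(obs.keys())
```

Under these assumptions one would call, e.g., `adaptive_cmi_select(X_train, y_train, x, k=5)` separately for each testing sample. Note that exhaustively matching the observed values against the training set is only tractable for coarse discretization and small feature budgets, which is presumably why smoothed density estimates are preferable for continuous or high-dimensional data.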



Author information

Corresponding author

Correspondence to Liliya Avdiyenko.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Avdiyenko, L., Bertschinger, N., Jost, J. (2015). Adaptive Information-Theoretical Feature Selection for Pattern Classification. In: Madani, K., Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2012. Studies in Computational Intelligence, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-319-11271-8_18


  • DOI: https://doi.org/10.1007/978-3-319-11271-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11270-1

  • Online ISBN: 978-3-319-11271-8

  • eBook Packages: Engineering, Engineering (R0)
