Abstract
Existing feature extraction methods exploit either global statistical or local geometric information underlying the data. In this paper, we propose a general framework to learn features that account for both types of information, based on variational optimization of nonparametric learning criteria. Using mutual information and Bayes error rate as example criteria, we show that high-quality features can be learned from a variational graph embedding procedure, which is solved through an iterative EM-style algorithm: the E-step learns a variational affinity graph, and the M-step in turn embeds this graph by spectral analysis. The resulting feature learner has several appealing properties, such as maximum discrimination, maximum-relevance-minimum-redundancy, and locality preservation. Experiments on benchmark face recognition data sets confirm the effectiveness of our proposed algorithms.
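The EM-style alternation described above can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' exact formulation: a Gaussian-kernel affinity stands in for the variational E-step graph, a Laplacian-eigenmap eigendecomposition stands in for the spectral M-step, and the function names, bandwidth `sigma`, and iteration count are hypothetical.

```python
import numpy as np

def affinity_graph(X, sigma=1.0):
    # E-step (sketch): build an affinity graph over the current features.
    # A Gaussian kernel is used here as a stand-in for the variational graph.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W

def spectral_embed(W, dim=2):
    # M-step (sketch): embed the graph by spectral analysis of its
    # (unnormalized) Laplacian, as in Laplacian eigenmaps.
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)
    # Smallest nontrivial eigenvectors give the embedding;
    # the first eigenvector (constant) is skipped.
    return vecs[:, 1:dim + 1]

def em_graph_embedding(X, dim=2, iters=3, sigma=1.0):
    # Alternate E- and M-steps: learn a graph on the current embedding,
    # then re-embed that graph.
    Y = X
    for _ in range(iters):
        W = affinity_graph(Y, sigma)
        Y = spectral_embed(W, dim)
    return Y
```

In the paper the E-step graph is derived from a variational bound on the chosen criterion (mutual information or Bayes error) rather than a fixed kernel; the sketch only conveys the alternating structure.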
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Yang, SH., Zha, H., Zhou, S.K., Hu, BG. (2009). Variational Graph Embedding for Globally and Locally Consistent Feature Extraction. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_35
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7