Abstract
Kernel based methods are increasingly being used for data modeling because of their conceptual simplicity and outstanding performance on many tasks. However, in practice the kernel function is often chosen using trial-and-error heuristics. In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. We propose a quantity to capture this notion, which we call alignment. We study its theoretical properties, and derive a series of simple algorithms for adapting a kernel to the targets. This produces a series of novel methods for both transductive and inductive inference, kernel combination and kernel selection for both classification and regression problems that are computationally feasible for large problems. The algorithms are tested on publicly available datasets and are shown to exhibit good performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
F. Bach and M.I. Jordan. Kernel independent components analysis. Department of computer science, University of California, Berkeley, Berkeley, CA, 2001.
S. Ben-David, N. Eiron, and H.U. Simon. Non-embedability in Euclidean half spaces. In NIPS Workshop on New Perspectives in Kernel Methods, 2000.
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
M.W. Berry, B. Hendrickson, and P. Raghaven. Sparse matrix re-ordering schemes for browsing hypertext. In J. Renegar, M. Shub, and S. Smale, editors, The Mathematics of Numerical Analysis, pages 91–123. American Mathematical Society, 1996.
A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In International Conference on Machine Learning (ICML) 2001, 2001.
B. Carl and I. Stephani. Entropy, compactness and the approximation of operators. Cambridge University Press, Cambridge, UK, 1990.
N. Cristianini,, J. Shawe-Taylor, and H. Lodhi. Latent semantic kernels. In International Conference on Machine Learning (ICML 2000), 2000.
N. Cristianini, A. Elisseeff, J. Shawe-Taylor, and J. Kandola. On kernel target alignment. In Neural Information Processing Systems 14 (NIPS 14), 2001.
N. Cristianini and J. Shawe-Taylor. An introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of mathematics. Springer, New York, 1996.
P. Drineas, R. Kannan, S. Vampala, A. Frieze, and V. Vinay. Clustering in large graphs and matrices. In Proceedings of the 10th ACM-SIAM Symposium on Discrete Algorithms, 1999.
J. Forster, N. Schmitt, and H. U. Simon. Estimating the optimal margins of embeddings in euclidean half spaces. In Submitted Computational Learning Theory (COLT) 2001, 2001.
R. Herbrich. Learning Kernel Classifiers-Theory and Algorithms. MIT Press, 2002.
N. Lanckriet, N. Cristianini, P. Bartlett, L. El-Ghoui, and M.I Jordan. Learning the kernel matrix using semi-definite programming. In International Conference on Machine Learning (ICML 2002), 2002.
C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188, 1989.
P. Pavlidas, J. Weston, J. Cai, and W.N. Grundy. Gene functional classification from heterogeneous data. In Proceedings of the Fifth International Conference on Computational Molecular Biology, 2001.
A. Pothen, H. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal Matrix Analysis, 11(3):242–248, 1990.
S. Saitoh. Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, Harlow, England, 1988.
B. Scholkopf and A. Smola. Learning With Kernels-Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.
B. Scholkopf, A. Smola, and K. Muller. Kernel principal components analysis. In Advances in Kernel Methods-Support Vector Learning. MIT, 2000.
J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.
J. Shi and J. Malik. Normalised cuts and image segmentation. In In IEEE Conf. on Computer Vision and Pattern Recognition, pages 731–737, 1997.
A. Smola and B. Scholkopf. Sparse greedy matrix approximation for machine learning. In In Proceedings International Conference on Machine Learning 2000, 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cristianini, N., Kandola, J., Elisseeff, A., Shawe-Taylor, J. (2006). On Kernel Target Alignment. In: Holmes, D.E., Jain, L.C. (eds) Innovations in Machine Learning. Studies in Fuzziness and Soft Computing, vol 194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33486-6_8
Download citation
DOI: https://doi.org/10.1007/3-540-33486-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30609-2
Online ISBN: 978-3-540-33486-6
eBook Packages: EngineeringEngineering (R0)