Abstract
Existing kernel-learning-based semi-supervised clustering algorithms generally scale poorly to large datasets and to large numbers of pairwise constraints. In this paper, we propose a new Non-Parametric Kernel Learning (NPKL) framework to address these problems. We generalize the graph embedding framework to kernel learning by reformulating it as a semi-definite programming (SDP) problem, smoothing the functional Hilbert space with Laplacian regularization while avoiding over-smoothing. We propose two algorithms to solve this problem. The first solves the original kernel learning problem directly with SDP and is denoted TRAnsductive Graph Embedding Kernel (TRAGEK) learning; the second relaxes the SDP problem and solves it with a constrained gradient descent algorithm. To accelerate learning, we further divide the data into groups and use the sub-kernels of these groups to approximate the whole kernel matrix; this algorithm is denoted Efficient Non-PArametric Kernel Learning (ENPAKL). The advantages of the proposed NPKL framework are that (1) supervised information in the form of pairwise constraints is easily incorporated; (2) it is robust to the number of pairwise constraints, i.e., the number of constraints has little effect on the running time; and (3) ENPAKL is comparatively efficient among related kernel learning algorithms, since it is based on constrained gradient descent. Clustering experiments with the learned kernels show that the proposed framework scales well with both dataset size and the number of pairwise constraints. Further image segmentation experiments indicate the potential advantages of the proposed algorithms over the traditional k-means and N-cut clustering algorithms in terms of segmentation accuracy.
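As context for the constrained gradient descent solver mentioned above, here is a minimal sketch (not the authors' implementation) of projected gradient descent on a kernel matrix: after each gradient step, the iterate is projected back onto the positive semi-definite cone. The toy objective below (pulling the kernel toward a target Gram matrix) is a hypothetical stand-in for the paper's Laplacian-regularized objective.

```python
import numpy as np

def project_psd(K):
    """Project a matrix onto the PSD cone: symmetrize, then clip
    negative eigenvalues to zero."""
    K = (K + K.T) / 2.0
    vals, vecs = np.linalg.eigh(K)
    return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T

def constrained_gradient_descent(K0, grad_fn, lr=0.1, n_iter=100):
    """Generic projected gradient descent: take a gradient step on the
    objective, then project back onto the PSD cone."""
    K = project_psd(K0)
    for _ in range(n_iter):
        K = project_psd(K - lr * grad_fn(K))
    return K

# Hypothetical objective: 0.5 * ||K - T||_F^2 for a PSD target T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
T = A @ A.T                       # PSD target Gram matrix
grad = lambda K: K - T            # gradient of the toy objective
K = constrained_gradient_descent(rng.standard_normal((5, 5)), grad)
print(np.linalg.eigvalsh(K).min() >= -1e-8)   # learned K stays PSD
```

The projection step is what distinguishes this from unconstrained descent: without it, intermediate iterates need not remain valid kernel matrices.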
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig1_HTML.gif)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig2_HTML.gif)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig3_HTML.gif)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig4_HTML.gif)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig5_HTML.gif)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-011-0048-6/MediaObjects/13042_2011_48_Fig6_HTML.gif)
Notes
Note that although we learn a kernel matrix from the perspective of graph embedding, this has little relationship with algorithms that use the graph embedding framework, such as marginal Fisher analysis (MFA) [6]. The reason is that such algorithms are aimed at supervised learning for classification, so there is no need to compare the proposed algorithm with them.
This convention is used below without further declaration.
More satisfactory results might be attained by employing more sophisticated regularizers.
Note that for two positive semi-definite matrices A and B, \( tr\{AB\} \geq 0 \) holds, but the trace is not always strictly positive. We therefore cannot drop the last two constraints directly; otherwise it is easy to run into numerical problems, because the constraint still holds if \( \langle l_p, K_v \rangle \) equals an arbitrarily small positive constant, which is far from the goal that \( \langle l_p, K_v \rangle \) should be as large as possible.
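The two facts used in this note, that \( tr\{AB\} \geq 0 \) for PSD matrices but can be exactly zero, are easy to check numerically; a quick illustration with random PSD matrices of the form \(XX^T\):

```python
import numpy as np

rng = np.random.default_rng(42)

# Any matrix of the form X X^T is positive semi-definite.
X = rng.standard_normal((6, 6))
Y = rng.standard_normal((6, 6))
A, B = X @ X.T, Y @ Y.T

# tr{AB} = tr{(Y^T X)^T (Y^T X)} = ||Y^T X||_F^2, hence nonnegative.
print(np.trace(A @ B) >= 0)            # True

# But the trace can be exactly zero for nonzero PSD matrices,
# which is why the constraint cannot simply be dropped:
A0 = np.diag([1.0, 0.0])
B0 = np.diag([0.0, 1.0])
print(np.trace(A0 @ B0))               # 0.0
```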
Two points should be declared here. First, it is easy to prove that \(K_X\) in Eq. 32 is a positive semi-definite matrix if \(K_D\) is positive semi-definite; this property ensures that \(K_X\) of the whole dataset is still a kernel matrix, which does not violate our objective. Second, for unknown points, in order to constrain the feature space to a hyperball, we need to normalize the weights calculated in Eq. 30 by dividing them by the scalar \(w_i^T K_D w_i\), that is, \(w_i = \frac{w_i}{w_i^T K_D w_i}\).
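The PSD property claimed for \(K_X\) can be illustrated with a small sketch, assuming (as in Nyström-style extensions) that the extended kernel has the form \(K_X = W K_D W^T\) for some weight matrix \(W\); the exact construction of \(W\) in Eq. 30 is the paper's, and the random values below are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# K_D: a PSD kernel matrix on the known points.
Z = rng.standard_normal((4, 4))
K_D = Z @ Z.T

# W: weights expressing all points (known and unknown) in terms of the
# known ones; random here, standing in for the weights of Eq. 30.
W = rng.standard_normal((7, 4))

# K_X = W K_D W^T inherits positive semi-definiteness from K_D,
# since v^T K_X v = (W^T v)^T K_D (W^T v) >= 0 for every v.
K_X = W @ K_D @ W.T
print(np.linalg.eigvalsh(K_X).min() >= -1e-9)   # True
```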
This experiment was run on an Intel Core 2 Duo CPU T6400 2.00 GHz with 2 GB of DDR2 memory.
We used an efficient implementation of the N-cut algorithm in [38].
The k-means algorithm takes about 1 s per image, the N-cut algorithm about 2 s, while ENPAKL needs about 5 min; PCP could not run in this experiment because the corresponding data are too large. Accelerating the proposed algorithm further is left for future work.
References
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the 22nd international conference on machine learning, pp 457–464
Li Z, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. In: Proceedings of the 25th international conference on machine learning, pp 576–583
Yeung D-Y, Chang H, Dai G (2007) A scalable kernel-based algorithm for semi-supervised metric learning. In: Proceedings of the 20th international joint conference on artificial intelligence, pp 1138–1143
Cortes C, Mohri M, Rostamizadeh A (2009) Learning non-linear combinations of kernels. In: Advances in neural information processing systems, vol 21
Yan SC, Xu D, Zhang BY, Zhang H-J, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Yang J, Yan SC, Fu Y, Li XL, Huang TS (2008) Non-negative graph embedding. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition
Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Cortes C, Mohri M, Rostamizadeh A (2010) Two-stage learning kernel algorithms. In: Proceedings of the 27th international conference on machine learning
Cortes C, Mohri M, Rostamizadeh A (2010) Generalization bounds for learning kernels. In: Proceedings of the 27th international conference on machine learning
Jin R, Hoi SCH, Yang T (2010) Online multiple kernel learning: algorithms and mistake bounds. In: Proceedings of the 21st international conference on algorithmic learning theory, pp 390–404
Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition—part II. IEEE Trans Syst Man Cybern Part B 29(6):786–801
Yang MS, Wu KL, Hsieh JN, Yu J (2008) Alpha-cut implemented fuzzy clustering algorithms and switching regressions. IEEE Trans Syst Man Cybern Part B 38(3):904–915
Trappey AJC, Trappey CV, Hsu F-C, Hsiao DW (2009) A fuzzy ontological knowledge document clustering methodology. IEEE Trans Syst Man Cybern Part B 39(3):123–131
Xiong H, Wu J, Chen J (2009) k-Means clustering versus validation measures: a data-distribution perspective. IEEE Trans Syst Man Cybern Part B 39(2):318–331
Basu S, Bilenko M, Mooney R (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 59–68
Bilenko M, Basu S, Mooney R (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the 21st international conference on machine learning, pp 81–89
Kamvar SD, Klein D, Manning C (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, pp 561–566
Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems, vol 15
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the 18th international conference on machine learning, pp. 798–803
Hong Y, Kwong S (2009) Learning assignment order of instances for the constrained k-means clustering algorithm. IEEE Trans Syst Man Cybern Part B 39(2):568–574
Lu Z, Leen TK (2005) Semi-supervised learning with penalized probabilistic clustering. In: Advances in neural information processing systems, vol 17, pp 849–856
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965
Hertz T, Bar-Hillel A, Weinshall D (2004) Boosting margin based distance function for clustering. In: Proceedings of the 21st international conference on machine learning, pp 393–400
Xu Z, Dai M, Meng D (2009) Fast and efficient strategies for model selection of Gaussian support vector machine. IEEE Trans Syst Man Cybern Part B 39(5):1292–1307
Dhillon I, Guan Y, Kulis B (2004) Kernel k-means, spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, pp 551–556
Bousquet O, Herrmann D (2003) On the complexity of learning the kernel matrix. In: Advances in neural information processing systems, vol 15, pp 399–406
Hoi SCH, Jin R, Lyu MR (2007) Learning nonparametric kernel matrices from pairwise constraints. In: Proceedings of the 24th international conference on machine learning, pp 361–368
Zhuang J, Tsang IW, Hoi SCH (2009) SimpleNPKL: simple non-parametric kernel learning. In: Proceedings of the 26th international conference on machine learning, pp 1273–1280
Zhou DY, Huang J, Schölkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd international conference on machine learning, pp 1036–1043
Sturm JF (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(2):625–653
Adler RL, Dedieu JP, Margulies JY, Martens M, Shub M (2002) Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA J Numer Anal 22(3):359–390
Golub GH, Van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore
Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Criminisi A MSRC Category Image Database (v2). MSRC. http://research.microsoft.com/en-us/um/people/antcrim/data_objrec/msrc_objcategimagedatabase_v2.zip
Shi J MATLAB Normalized Cuts Segmentation Code. http://www.cis.upenn.edu/~jshi/software/
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Tan B, Zhang J, Wang L (2011) Semi-supervised elastic net for pedestrian counting. Pattern Recognit 44(10–11):2297–2304
Duchenne O, Audibert J-Y, Keriven R, Ponce J, Segonne F (2008) Segmentation by transduction. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition
Acknowledgments
This work was supported in part by the NSFC (No. 60975044, 61073097) and the 973 Program (No. 2010CB327900).
Additional information
This work was done when C. Chen was at Fudan University, Shanghai, China.
Cite this article
Chen, C., Zhang, J., He, X. et al. Non-Parametric Kernel Learning with robust pairwise constraints. Int. J. Mach. Learn. & Cyber. 3, 83–96 (2012). https://doi.org/10.1007/s13042-011-0048-6