Abstract
A key issue in the practical applicability of the support vector machine methodology is the identification of the support vectors in very large data sets, a problem to which a great deal of attention has been given in the literature. In the present article we propose methods based on sampling and nearest neighbors, that allow for an efficient implementation of an approximate solution to the classification problem and, at least in some problems, will help in identifying a significant fraction of the support vectors in large data sets at low cost. The performance of the proposed method is evaluated in different examples and some of its theoretical properties are discussed.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-015-1956-8/MediaObjects/10479_2015_1956_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-015-1956-8/MediaObjects/10479_2015_1956_Fig2_HTML.gif)
Similar content being viewed by others
Notes
\(\hat{d}\) represents the median of the squared euclidean distance between data from different classes in the feature space, as suggested by Wu and Wang (2009).
For more information on the LIBSVM library and the datasets go to www.csie.ntu.edu.tw/~cjlin/libsvm/
For more information on the e1071 package go to cran.r-project.org/web/packages/e1071/.
References
Amato, G., Rabitti, F., Savino, P., & Zezula, P. (2003). Region proximity in metric spaces and its use for approximate similarity search. ACM Transactions on Information Systems, 2(21), 192–227.
Ben-Hur, A., Ong, C.-S., Sonnenburg, S., Scholkopf, B., & Ratsch, G. (2008). Support vector machines and kernels for computational biology. PLoS Computer Biology, 4(10). http://svmcompbio.tuebingen.mpg.de/.
Brito, M. R., Chavez, E. L., Quiroz, A. J., & Yukich, J. E. (1997). Connectivity of the mutual k-nearest neighbor graph in outlier detection and clustering. Statistics and Probability Letters, 35, 33–42.
Brito, M. R., Quiroz, A. J., & Yukich, J. E. (2002). Graph theoretic procedures for dimension identification. Journal of Multivariate Analysis, 81, 67–84.
Brito, M. R., Quiroz, A. J., & Yukich, J. E. (2013). Intrinsic dimension identification via graph-theoretic methods. Journal of Multivariate Analysis, 116, 263–277.
Byvatov, E., & Schneider, G. (2003). Support vector machine applications in bioinformatics. Applied Bioinformatics, 2(2), 67–77.
Cao, B., Zhan, D., & Wu. X. (2009). Application of svm in financial research. In CSO international joint conference on computational sciences and optimization, Vol. 2, pp. 507–511.
Chavez, E., Navarro, G., Baez-Yates, R., & Marroquin, J. L. (2001). ACM Computing Surveys, 3(33), 273–321.
Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20, 1–25.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press.
Fan, R.-E., Chen, P.-H., & Lin, C.-J. (2005). Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6, 1889–1918.
Ferris, M. C., & Munson, T. S. (2002). Interior point methods for massive support vector machines. SIAM Journal on Optimization, 13(3), 783–804.
Friedman, J. H., Baskett, F., & Shustek, J. (1975). An algorithm for finding nearest neighbors. IEEE Transactions on Computers, C–24(10), 1000–1006.
Friedman, J. H., Bentley, J. L., & Finkel, R. (1978). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3), 209–226.
Gonzalez-Lima, M. D., Hager, W. W., & Zhang, H. (2011). An affine-scaling interior-point method for continuous knapsack constraints with application to support vector machines. SIAM Journal on Optimization, 21(1), 361–390.
Janson, S. (2002). On concentration of probability. Contemporary Combinatorics, 10, 289–301.
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec & C. Rouveirol (Eds.), Proceedings of the 10th European conference on machine learning (ECML-98), pp. 137–142. Springer, Berlin.
Jung, J. H., Leary, D. P. O., & Tits, A. L. (2008). Adaptive constraint reduction for training support vector machines. Electronic Transactions on Numerical Analysis, 31, 156–177.
Lin, C.-J. (2001). On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12, 1288–1298.
Navarro, G. (2002). Searching in metric spaces by spatial approximation. The VLDB Journal, 11(1), 28–46.
Noble, W. S. (2004). Support vector machine applications in computational biology. In B. Schoelkopf, K. Tsuda, & J.-P. Vert (Eds.), Kernel methods in computational biology (pp. 71–92). Cambridge: MIT Press.
Osuna, E. E., Freund, R., & Girosi, F. (1997a). Support vector machines: Training and applications. Technical report A.I. memo no. 1602, CBCL paper no. 144, Massachusetts Institute of Technology, Cambridge, MA, USA.
Osuna, E. E., Freund, R., & Girosi, F. (1997b). Training support vector vector machines: An application to face detection. In IEEE conference on computer vision and pattern recognition, pp. 130–136.
Patella, M., & Ciaccia, P. (2009). Approximate similarity search: A multi-faceted problem. Journal of Discrete Algorithms, 1(7), 36–48.
Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization. In B. Scholkopf, C. Burges, & A. Smola (Eds.), Advances in kernel methods, support vector learning (pp. 41–65). Cambridge, MA: MIT Press.
Schilling, M. F. (1986). Mutual and shared neighbor probabilities: Finite and infinite dimensional results. Advances in Applied Probability, 18, 388–405.
Serafini, T., Zanghirati, G., & Zanni, L. (2003). Gradient projection methods for quadratic programs and applications in training support vector machines. Optimization Methods and Software, 20, 353–378.
Serafini, T., & Zanni, L. (2005). On the working set selection in gradient-based descomposition techniques for support vector machines. Optimization Methods and Software, 20, 586–593.
Uribe, R., Navarro, G., Barrientos, R. J., Marín, M. (2006). An index data structure for searching in metric space databases. In Proceeding of international conference on computational science 2006 (ICC 2006). Lecture notes in computer science (Vol. 3991, pp. 611–617). Springer.
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Woodsend, K., & Gondzio, J. (2011). Exploiting separability in large scale linear support vector machine training. Computational Optimization and Applications, 49, 241–269.
Wu, K.-P., & Wang, S.-D. (2009). Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space. Pattern Recognition, 42, 710–717.
Zanni, L. (2006). An improved gradient projection-based decomposition techniques for support vector machines. Computational Management Science, 3, 131–145.
Zarruk, D. (2012) Resolución del problema de Máquinas de Vectores de Soporte mediante búsqueda de k-vecinos más cercanos. Undergraduate mathematics thesis, Departamento de Matemáticas, Universidad de los Andes, Bogotá, Colombia.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Camelo, S.A., González-Lima, M.D. & Quiroz, A.J. Nearest neighbors methods for support vector machines. Ann Oper Res 235, 85–101 (2015). https://doi.org/10.1007/s10479-015-1956-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-015-1956-8