Abstract
In recent years, many studies have investigated SVMs and their applications across different areas, and the method has advanced significantly. SVM is one of the most robust classification and regression algorithms and plays a significant role in pattern recognition. However, SVM remains underdeveloped in some settings, such as large-scale datasets, unbalanced datasets, and multiclass classification. Efficient SVM training on large-scale datasets is of great importance in the big data era: as the number of samples grows, the time and memory required to train an SVM increase, making it impractical even for medium-sized problems, and the emergence of big data makes this limitation more acute. This paper presents a novel distributed method for SVM training in which only a very small subset of the training samples is used for classification, reducing the problem size and thus the required memory and computational resources, while the resulting solution closely approximates that of a standard SVM. The method consists of three steps: first, detecting a subset of the distributed training samples; second, building local SVM models and obtaining partial vectors; and finally, combining the partial vectors to obtain the global vector and the final model. In addition, for datasets that suffer from an unbalanced number of samples and are biased toward the majority class, the proposed method balances the samples of the two classes, so it can also be applied to unbalanced datasets. Empirical results show that the method is efficient for large-scale problems.
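The three steps described above can be sketched in code. The snippet below is a minimal, generic illustration of this family of distributed SVM training, not the authors' exact algorithm: the partition count, the RBF kernel, and the choice of each local model's support vectors as its "partial vectors" are all illustrative assumptions.

```python
# Hedged sketch of a distributed SVM workflow: partition the data,
# train local models, and combine their partial vectors into a global model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Step 1: split the training set into (here, 4) distributed partitions.
rng = np.random.default_rng(0)
parts = np.array_split(rng.permutation(len(X)), 4)

# Step 2: train a local SVM on each partition; its support vectors serve
# as that partition's "partial vectors" (a small, informative subset).
sv_idx = []
for p in parts:
    local = SVC(kernel="rbf").fit(X[p], y[p])
    sv_idx.extend(p[local.support_])

# Step 3: combine the partial vectors and train the global model on this
# reduced set, approximating training on the full dataset at lower cost.
global_model = SVC(kernel="rbf").fit(X[sv_idx], y[sv_idx])
print(len(sv_idx), global_model.score(X, y))
```

Because only the union of the local support vectors reaches the final training step, the global problem is much smaller than the original one, which is the source of the memory and runtime savings claimed above.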
Author information
Contributions
M. H. Almaspoor: wrote the main manuscript text. A. A. Safaei: main idea, revision of the manuscript. A. Salajegheh and B. Minaei: revision of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Almaspoor, M.H., Safaei, A.A., Salajegheh, A. et al. Distributed independent vector machine for big data classification problems. J Supercomput 80, 7207–7244 (2024). https://doi.org/10.1007/s11227-023-05711-4