Multiple criteria optimization-based data mining methods and applications: a systematic survey

  • Regular Paper
Knowledge and Information Systems

Abstract

The Support Vector Machine, an optimization technique, is well known in the data mining community. Many other optimization techniques have also been used effectively for data separation and analysis. Over the last 10 years, the author and his colleagues have proposed and extended a series of optimization-based classification models built on Multiple Criteria Linear Programming (MCLP) and Multiple Criteria Quadratic Programming (MCQP). These methods differ from statistical approaches, decision tree induction, and neural networks. The purpose of this paper is to review the basic concepts and frameworks of these methods and to promote research interest in them within the data mining community. Following the evolution of multiple criteria programming, the paper starts with the basics of MCLP. It then discusses penalized MCLP, MCQP, Multiple Criteria Fuzzy Linear Programming (MCFLP), Multi-Class Multiple Criteria Programming (MCMCP), and the kernel-based Multiple Criteria Linear Program, as well as MCLP-based regression. The paper also outlines several applications of multiple criteria optimization-based data mining methods: Credit Card Risk Analysis, Classification of HIV-1 Mediated Neuronal Dendritic and Synaptic Damage, Network Intrusion Detection, Firm Bankruptcy Prediction, and VIP E-Mail Behavior Analysis.

Author information

Corresponding author

Correspondence to Yong Shi.

About this article

Cite this article

Shi, Y. Multiple criteria optimization-based data mining methods and applications: a systematic survey. Knowl Inf Syst 24, 369–391 (2010). https://doi.org/10.1007/s10115-009-0268-1

  • DOI: https://doi.org/10.1007/s10115-009-0268-1
