Item Response Theory Based Ensemble in Machine Learning

  • Research Article
  • Published: International Journal of Automation and Computing

Abstract

In this article, we propose a novel probabilistic framework to improve the accuracy of weighted majority voting. To assign higher weights to the classifiers that can correctly classify hard-to-classify instances, we introduce the item response theory (IRT) framework, which evaluates the samples' difficulty and the classifiers' ability simultaneously. We then assign weights to the classifiers based on their estimated abilities. We construct three models under different assumptions, each suited to different cases, and during inference we balance accuracy against complexity. In our experiments, all base models are single decision trees built on bootstrap resamples. To explain the models, we illustrate how the IRT ensemble constructs its classification boundary. We also compare its performance with other widely used methods and show that our model performs well on 19 datasets.
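
To make the weighting scheme described above concrete, here is a minimal, hypothetical Python sketch: single-tree base models grown on bootstrap resamples, a Rasch-style IRT model P(tree i correct on sample j) = sigmoid(theta_i - b_j) fitted to the trees' correctness matrix so that classifier ability (theta) and sample difficulty (b) are estimated simultaneously, and a majority vote weighted by the estimated abilities. The Rasch likelihood, the gradient-ascent fit, and the exp(theta) weight mapping are illustrative assumptions, not the authors' exact formulation; the paper develops three IRT models and its own inference procedure.

    # Illustrative sketch only: a Rasch-style simplification of the IRT-weighted
    # voting idea from the abstract, not the paper's exact models or inference.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # 1) Base models: single decision trees, each trained on a bootstrap resample.
    n_trees = 25
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X_tr), size=len(X_tr))
        trees.append(DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[idx], y_tr[idx]))

    # 2) Response matrix R: R[i, j] = 1 if tree i classifies training sample j correctly.
    R = np.array([tree.predict(X_tr) == y_tr for tree in trees], dtype=float)

    # 3) Rasch-style IRT: P(correct) = sigmoid(theta_i - b_j), where theta is
    #    classifier ability and b is sample difficulty; fit jointly by gradient ascent.
    theta = np.zeros(n_trees)
    b = np.zeros(R.shape[1])
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        grad = R - p                     # d(log-likelihood)/d(theta_i - b_j)
        theta += 0.5 * grad.mean(axis=1)
        b -= 0.5 * grad.mean(axis=0)
        b -= b.mean()                    # pin down the location indeterminacy

    # 4) Weighted majority vote: each tree votes with weight proportional to exp(theta).
    w = np.exp(theta) / np.exp(theta).sum()
    votes = np.array([tree.predict(X_te) for tree in trees])  # (n_trees, n_test), labels 0/1
    score = w @ votes                                         # total weight voting for class 1
    y_hat = (score >= 0.5).astype(int)
    print("IRT-weighted vote accuracy:", (y_hat == y_te).mean())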

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ziheng Chen.

Additional information

Recommended by Associate Editor Matjaz Gams

Ziheng Chen received the B.Sc. degree in statistics from Renmin University of China, China in 2016. He is currently a Ph.D. candidate in the Department of Applied Mathematics and Statistics, Stony Brook University, USA.

His research interests include reinforcement learning, recommender systems, tree-structured models, and ensemble learning theory.

Hongshik Ahn received the B.Sc. degree in mathematics from Seoul National University, South Korea, and the Ph.D. degree in statistics from the University of Wisconsin-Madison, USA in 1992. From 1992 to 1996, he was a mathematical statistician at the National Center for Toxicological Research, U.S. Food and Drug Administration, and he has been a faculty member in the Department of Applied Mathematics and Statistics at Stony Brook University, USA since 1996. He served as the first Vice President of SUNY Korea for two years beginning in 2012. Currently, he is a professor at Stony Brook University. He has published 2 books, 3 book chapters, over 70 papers in peer-reviewed journals, and 25 conference papers.

His research interests include classification of high-dimensional data, tree-structured regression modeling, survival analysis, and multi-step batch testing for infectious diseases.

About this article

Cite this article

Chen, Z., Ahn, H. Item Response Theory Based Ensemble in Machine Learning. Int. J. Autom. Comput. 17, 621–636 (2020). https://doi.org/10.1007/s11633-020-1239-y
