Abstract
Automatic method selection (metalearning) makes it possible to recommend the most suitable method (e.g. an algorithm or a statistical estimator) from several alternatives for a given dataset, based on information learned over a training database of datasets. Practitioners have become accustomed to using metalearning in the context of regression modeling, which is useful in a variety of applications in different fields. Still, none of the previous metalearning studies on regression has targeted regression complexity issues, and the majority of available metalearning studies for regression have considered the standard mean square error as the prediction error measure. In this paper, a metalearning study focused on comparing different method selection criteria for the regression task is presented. A prediction rule, recommending the best regression estimator (possibly a robust one), is constructed over 31 training datasets. These are publicly available datasets for which the suitability of the linear model was carefully examined. The highest classification accuracy is obtained if the choice of the best estimator is based on robust versions of the Akaike information criterion, particularly the version derived from MM-estimators. The work also advocates an implicitly weighted robust prediction mean square error.
Acknowledgements
The research was supported by the grants GA18-23827S (P. Vidnerová) and GA19-05704S (J. Kalina) of the Czech Science Foundation. The authors are grateful to Aleš Neoral for technical help and to the referee and the editor for their time and helpful suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Vidnerová, P., Kalina, J., Güney, Y. (2020). A Comparison of Robust Model Choice Criteria Within a Metalearning Study. In: Maciak, M., Pešta, M., Schindler, M. (eds) Analytical Methods in Statistics. AMISTAT 2019. Springer Proceedings in Mathematics & Statistics, vol 329. Springer, Cham. https://doi.org/10.1007/978-3-030-48814-7_7
DOI: https://doi.org/10.1007/978-3-030-48814-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48813-0
Online ISBN: 978-3-030-48814-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)