A Comparison of Robust Model Choice Criteria Within a Metalearning Study

Vidnerová, Petra; Kalina, Jan; Güney, Yeşim

doi:10.1007/978-3-030-48814-7_7

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 329)

Included in the following conference series:

Workshop on Analytical Methods in Statistics

Abstract

The methodology of automatic method selection (metalearning) makes it possible to recommend the most suitable method (e.g. an algorithm or a statistical estimator) from several alternatives for a given dataset, based on information learned over a training database of datasets. Practitioners have become accustomed to using metalearning in the context of regression modeling, which is useful in a variety of applications across different fields. Still, none of the previous metalearning studies on regression has addressed regression complexity issues, and the majority of available metalearning studies for regression have used the standard mean square error as the measure of prediction error. This paper presents a metalearning study focused on comparing different method selection criteria for the regression task. A prediction rule recommending the best (possibly robust) regression estimator is constructed over 31 training datasets. These are publicly available datasets for which the suitability of the linear model was carefully examined. The highest classification accuracy is obtained if the choice of the best estimator is based on robust versions of the Akaike information criterion, particularly the version derived from MM-estimators. The work also advocates an implicitly weighted robust prediction mean square error.
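
The labeling step that such a study relies on, deciding which estimator is "best" for one dataset under a chosen criterion, can be illustrated with a minimal sketch. It is not the authors' procedure. As stand-ins, it assumes a single predictor, compares least squares with the Theil-Sen estimator, and scores them with a trimmed mean square error of leave-one-out predictions. The chapter itself considers robust versions of the Akaike information criterion and an implicitly weighted prediction error, neither of which is implemented here. All function names are hypothetical.

```python
import statistics
from itertools import combinations


def fit_ols(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_theil_sen(xs, ys):
    """Theil-Sen fit (stand-in robust estimator): median of pairwise slopes."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]
    b = statistics.median(slopes)
    a = statistics.median([y - b * x for x, y in zip(xs, ys)])
    return a, b


def loo_residuals(fit, xs, ys):
    """Leave-one-out prediction residuals for a given fitting function."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals


def trimmed_mse(residuals, keep=0.8):
    """Mean of the smallest `keep` fraction of squared residuals.

    Assumption: a simple trimmed error used as a stand-in robust criterion.
    """
    squared = sorted(r * r for r in residuals)
    h = max(1, int(keep * len(squared)))
    return sum(squared[:h]) / h


ESTIMATORS = {"ols": fit_ols, "theil_sen": fit_theil_sen}


def best_estimator(xs, ys, criterion=trimmed_mse):
    """Return the name of the estimator with the lowest criterion value."""
    scores = {name: criterion(loo_residuals(fit, xs, ys))
              for name, fit in ESTIMATORS.items()}
    return min(scores, key=scores.get), scores


# Exactly linear data (y = 1 + 2x) with one gross outlier at the last point.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[9] = 100.0
label, scores = best_estimator(xs, ys)
print(label, scores)
```

In a full metalearning study, a label of this kind would be computed for every training dataset and a classifier would then be trained to predict it from dataset features. That second stage is omitted here.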

Acknowledgements

The research was supported by the grants GA18-23827S (P. Vidnerová) and GA19-05704S (J. Kalina) of the Czech Science Foundation. The authors are grateful to Aleš Neoral for technical help and to the referee and the editor for their time and helpful suggestions.

Author information

Authors and Affiliations

The Czech Academy of Sciences, Institute of Computer Science, Pod Vodárenskou věží 2, 182 07, Praha 8, Czech Republic
Petra Vidnerová & Jan Kalina
Faculty of Science, Department of Statistics, Ankara University, 06100, Tandogan, Ankara, Turkey
Yeşim Güney

Corresponding author

Correspondence to Jan Kalina.

Editor information

Editors and Affiliations

Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic
Matúš Maciak
Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic
Michal Pešta
Department of Applied Mathematics, Technical University of Liberec, Liberec, Czech Republic
Martin Schindler

Cite this paper

Vidnerová, P., Kalina, J., Güney, Y. (2020). A Comparison of Robust Model Choice Criteria Within a Metalearning Study. In: Maciak, M., Pešta, M., Schindler, M. (eds) Analytical Methods in Statistics. AMISTAT 2019. Springer Proceedings in Mathematics & Statistics, vol 329. Springer, Cham. https://doi.org/10.1007/978-3-030-48814-7_7

DOI: https://doi.org/10.1007/978-3-030-48814-7_7
Published: 20 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48813-0
Online ISBN: 978-3-030-48814-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)