Abstract
Hyper-parameter tuning of machine learning models has become a crucial task for achieving optimal predictive performance. Researchers have explored this optimisation task over the last decades in pursuit of state-of-the-art methods. However, most of that work focuses on batch or offline learning, where data distributions do not change arbitrarily over time. Dealing with data streams and online learning, by contrast, is a challenging problem: as data volumes and velocities grow, sophisticated techniques to process these streams become ever more important. Improving hyper-parameter self-tuning during the online learning of machine learning models is therefore crucial. To this end, in this paper we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning on data streams. We apply Differential Evolution to dynamically-sized samples, requiring a single pass over the data to train and evaluate models and to choose the best configurations. Because only a small number of configurations can be evaluated in this setting, we keep the population small, making our evolutionary approach a micro-evolutionary one. Furthermore, we control how our evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.
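To make the core idea concrete, the following is a minimal, self-contained sketch of the classical DE/rand/1/bin scheme the abstract refers to, applied to a toy two-parameter search space. It is not the authors' MESSPT implementation: the objective function, bounds, and the small ("micro") population size are illustrative assumptions, with the toy quadratic standing in for the prequential error a real model would produce on a stream sample.

```python
import random

def differential_evolution(objective, bounds, pop_size=8, f=0.5, cr=0.9,
                           generations=25, seed=42):
    """Minimise `objective` with DE/rand/1/bin over the box `bounds`.

    The population is deliberately tiny, in the spirit of a
    micro-evolutionary approach where few configurations can be evaluated.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v, k):
        lo, hi = bounds[k]
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct other individuals (rand/1).
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            # Binomial crossover: take each mutant component with prob. cr,
            # forcing at least one mutant component via j_rand.
            trial = [clip(pop[a][k] + f * (pop[b][k] - pop[c][k]), k)
                     if (rng.random() < cr or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Greedy selection: keep the trial only if it is no worse.
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

# Toy stand-in for "prequential error of a model trained with these two
# hyper-parameters": a quadratic with its optimum at (0.3, 0.7).
best, err = differential_evolution(
    lambda x: (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2,
    bounds=[(0.0, 1.0), (0.0, 1.0)])
```

With a population of 8 and 25 generations, only 208 configurations are evaluated in total, which is the kind of budget a single pass over a stream sample can afford.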
Notes
Code available: https://gitfront.io/r/user-9515996/T1tZ5eH3W949/MESSPT/.
References
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Bäck T (1996) Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, Oxford
Baena-García M, del Campo-Ávila J, Fidalgo R, et al (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, Citeseer, pp 77–86
Bahri M, Gomes HM, Bifet A, et al (2020a) CS-ARF: compressed adaptive random forests for evolving data stream classification. In: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
Bahri M, Maniu S, Bifet A, et al (2020b) Compressed k-nearest neighbors ensembles for evolving data streams. In: ECAI 2020-24th European conference on artificial intelligence
Bahri M, Bifet A, Gama J et al (2021) Data stream analysis: foundations, major tasks and tools. Wiley Interdiscipl Rev: Data Min Knowl Discov 11(3):e1405
Bakhashwain N, Sagheer A (2020) Online tuning of hyperparameters in deep LSTM for time series applications. Int J Intell Eng Syst 14(1):212–220
Ballester-Ripoll R, Paredes EG, Pajarola R (2019) Sobol tensor trains for global sensitivity analysis. Reliab Eng Syst Safety 183:311–322
Barros RC, Basgalupp MP, De Carvalho ACPLF et al (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern Part C Appl Rev 42(3):291–312
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(2):281–305
Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining, SIAM, pp 443–448
Breiman L, Friedman JH, Olshen RA et al (1984) Classification and regression trees. Wadsworth, Belmont
Candanedo LM, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140:81–97
Candillier L, Lemaire V (2012) Design and analysis of the nomao challenge - active learning in the real-world. In: Proceedings of the ALRA : active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK
Celik B, Vanschoren J (2021) Adaptation strategies for automated machine learning on evolving data. IEEE Trans Pattern Anal Mach Intell 43(9):3067–3078
Cho D, Yoo C, Im J et al (2020) Comparative assessment of various machine learning-based bias correction methods for numerical weather prediction model forecasts of extreme air temperatures in urban areas. Earth Space Sci 7(4):2019000740
Coello CAC, Pulido GT (2001) A micro-genetic algorithm for multiobjective optimization. In: Evolutionary multi-criterion optimization, first international conference, EMO 2001, Zurich, Switzerland, March 7-9, 2001, Proceedings, Lecture Notes in Computer Science, vol 1993. Springer, pp 126–140
Das S, Suganthan PN (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):4–31
Deneke T, Haile H, Lafond S, et al (2014) Video transcoding time prediction for proactive load balancing. In: Multimedia and expo (ICME), 2014 IEEE International Conference on, pp 1–6
Dua D, Graff C (2017) UCI machine learning repository
Duarte J, Gama J, Bifet A (2016) Adaptive model rules from high-speed data streams. ACM Trans Knowl Discov Data 10(3):30:1-30:22
Frias-Blanco I, del Campo-Ávila J, Ramos-Jimenez G et al (2014) Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Trans Knowl Data Eng 27(3):810–823
Galletly J (1998) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Kybernetes 27(8):979–980
Gama J, Medas P, Castillo G, et al (2004) Learning with drift detection. In: Brazilian symposium on artificial intelligence, Springer, pp 286–295
Gama J, Žliobaitė I, Bifet A et al (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):1–37
Garcia S, Herrera F (2008) An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J Mach Learn Res 9(12):2677–2694
Gardner A, Duncan CA, Kanno J, et al (2014) 3d hand posture recognition from small unlabeled point sets. In: 2014 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, pp 164–169
Guliashki V, Toshev H, Korsemov C (2009) Survey of evolutionary algorithms used in multiobjective optimization. Probl Eng Cybernet Robot 60(1):42–54
Hauschild M, Pelikan M (2011) An introduction and survey of estimation of distribution algorithms. Swarm Evol Comput 1(3):111–128
Hruschka ER, Campello RJ, Freitas AA et al (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133–155
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 97–106
Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Disc 23(1):128–168
Imbrea A (2021) Automated machine learning techniques for data streams. CoRR arXiv:2106.07317
Koza JR (1995) Survey of genetic algorithms and genetic programming. pp. 589–594
Kulbach C, Montiel J, Bahri M, et al (2022) Evolution-based online automated machine learning. Lecture Notes in Computer Science, vol 13280. Springer, pp 472–484
Lacombe T, Koh YS, Dobbie G, et al (2021) A meta-learning approach for automated hyperparameter tuning in evolving data streams. In: International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18–22, 2021. IEEE, pp 1–8
Lerman P (1980) Fitting segmented regression models by grid search. J Roy Stat Soc: Ser C (Appl Stat) 29(1):77–84
Lin C, Guo M, Li C, et al (2019) Online hyper-parameter learning for auto-augmentation strategy. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27—November 2, 2019. IEEE, pp 6578–6587
Lobo JL, Ser JD, Osaba E (2021) Lightweight alternatives for hyper-parameter tuning in drifting data streams. In: 2021 International Conference on Data Mining, ICDM 2021 - Workshops, Auckland, New Zealand, December 7–10, 2021. IEEE, pp 304–311
McCullagh P, Nelder JA (1989) Generalized linear models. Springer, Berlin
Mockus J, Tiesis V, Zilinskas A (1978) The application of Bayesian methods for seeking the extremum. Towards Global Optimiz 2(117–129):2
Montiel J, Halford M, Mastelini SM et al (2021) River: machine learning for streaming data in python. J Mach Learn Res 22:110:1-110:8
Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decis Support Syst 62:22–31
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Rahim MS, Imran AA, Ahmed T (2021) Mining the productivity data of garment industry. Int J Bus Intell Data Min 1(1):1
Salam A, El Hibaoui A (2018) Comparison of machine learning algorithms for the power consumption prediction:-case study of Tetouan city. In: 2018 6th International renewable and sustainable energy conference (IRSEC), IEEE, pp 1–5
Sebastião R, Fernandes JM (2017) Supporting the Page-Hinkley test with empirical mode decomposition for change detection. In: Foundations of Intelligent Systems: 23rd International Symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, Proceedings 23, Springer, pp 492–498
Street WN, Kim Y (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 377–382
Sun Y, Pfahringer B, Gomes HM et al (2022) Soknl: a novel way of integrating k-nearest neighbours with adaptive random forest regression for data streams. Data Min Knowl Disc 36(5):2006–2032
Veloso B, Gama J, Malheiro B (2018) Self hyper-parameter tuning for data streams. Lecture Notes in Computer Science, vol 11198. Springer, pp 241–255
Veloso B, Gama J, Malheiro B et al (2021) Hyperparameter self-tuning for data streams. Inform Fusion 76:75–86
Zhan H, Gomes G, Li XS, et al (2018) Efficient online hyperparameter optimization for kernel ridge regression with applications to traffic time series prediction. CoRR arXiv:1811.00620
Zhan ZH, Shi L, Tan KC et al (2022) A survey on evolutionary computation for complex continuous optimization. Artif Intell Rev 55(1):59–110
Acknowledgements
This work has been funded by the Ministry of Science and Innovation and the European Regional Development Fund, project PID2020-115832GB-I00, and by the Ministry of Universities, predoctoral grant FPU18/06307. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Datasets
For the classification task, we used the following benchmark datasets:
- Agra (Agrawal et al. 1993): A stream generator in which it is possible to introduce concept drift by switching among ten classification functions. It contains nine features; we generate 60000 instances, with a concept drift at position 30000 and a binary target variable.
- Bank (Moro et al. 2014): Data from marketing campaigns of a Portuguese banking institution. The task is to predict whether a client will subscribe to a term deposit (binary target variable). It contains 17 attributes and 45211 instances.
- Sine (Gama et al. 2004): A sine generator. It produces four features in [0, 1], two relevant for classification and two optionally added as noise, and offers four classification functions to label the output. Switching between these functions creates abrupt concept drift. We generate 50000 instances, with an abrupt concept drift at position 25000 and a binary target variable.
- Tweet 500 (Bahri et al. 2020a): 100000 tweets with 500 attributes and two possible classes.
- Tweet 1000 (Bahri et al. 2020a): 100000 tweets with 1000 attributes and two possible classes.
- Postures (Gardner et al. 2014): Classification of five hand postures (multiclass target variable) from information recorded by a motion capture camera system. Missing values are replaced by a '0' value.
- Nomao (Candillier and Lemaire 2012): Data about places (name, phone, location) collected from many sources. The classification task consists of predicting whether two records belong to the same spot.
- Enron (Bahri et al. 2020b): A cleaned version of a large set of emails aimed at fraud detection. This version has 1702 instances and 1000 attributes.
- SEA (Street and Kim 2001): Each instance consists of two integer attributes randomly chosen between 0 and 10; their sum, compared against a threshold, determines the class. Abrupt concept drift is created by changing that threshold. We generate 50000 instances, with concept drift appearing after the first 25000.
- Cardio (Dua and Graff 2017): Fetal heart rate (FHR) and uterine contraction (UC) features measured on a cardiotocograph and classified by expert obstetricians. The task is to predict the FHR pattern class code (1 to 10). It contains 2126 instances and 23 real-valued attributes.
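Several of the generators above share the same pattern: a simple labelling rule plus an abrupt rule switch at a fixed position. The following is an illustrative sketch of a SEA-style stream with abrupt drift, following the two-attribute description given above; the function name, threshold values, and drift position are assumptions, not the exact generator used in the experiments.

```python
import random

def sea_stream(n=50_000, drift_at=25_000, thresholds=(8.0, 9.0), seed=1):
    """Yield (features, label) pairs from a SEA-style generator.

    Two numeric attributes in [0, 10]; the label indicates whether their
    sum exceeds a threshold, and an abrupt concept drift is simulated by
    switching the threshold at position `drift_at`.
    """
    rng = random.Random(seed)
    for i in range(n):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        threshold = thresholds[0] if i < drift_at else thresholds[1]
        yield (x1, x2), int(x1 + x2 > threshold)

# A short stream: 1000 instances with the drift half-way through.
data = list(sea_stream(n=1000, drift_at=500))
```

Because the threshold switch changes the decision boundary instantly, a tuner evaluated prequentially on such a stream must detect the drift and re-optimise its hyper-parameter configuration after position 500.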
For the regression task, we adopted the following benchmark datasets:
- 2DPlanes (Breiman et al. 1984): A synthetic dataset with 10 attributes. For our experiments, we generated 50000 instances.
- Ailerons: It contains 13750 instances and 41 attributes.
- Friedman (Ikonomovska et al. 2011): A synthetic dataset of 10 features uniformly sampled in [0, 1]. We use a river option to trigger two global recurring abrupt drift points. We generate 70000 instances and place the drifts at positions 25000 and 50000.
- Sgemm (Ballester-Ripoll et al. 2019): Running-time measurements of matrix products, used to predict the running time of new products. It contains 14 attributes and 241600 instances.
- Transcoding (Deneke et al. 2014): Online video transcoding time prediction. It contains 20 attributes describing online video characteristics.
- Appliances (Candanedo et al. 2017): Measurements from a low-energy building taken at 10-minute intervals over 4.5 months; appliance energy consumption is the prediction target. It includes 19735 instances and 29 attributes.
- Bias (Cho et al. 2020): Meteorological forecast data and auxiliary geographical variables for temperature prediction. It contains 7750 instances with 25 attributes collected in Seoul, South Korea, in summer.
- Metro: Metro Interstate Traffic Volume. It contains variables that influence the volume, such as hourly weather features. It has 48204 instances and 9 numerical attributes.
- Tetuan (Salam and El Hibaoui 2018): Nine numerical attributes and 52417 instances for predicting the power consumption of Tetouan city.
- Garment (Rahim et al. 2021): Productivity prediction of garment employees. It includes 1197 instances and 15 numerical attributes.
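For reference, the Friedman generator used above is based on a well-known closed-form target. The sketch below reproduces that function in plain Python; the river-based recurring-drift mechanism mentioned in the list is omitted here, and the function name and parameters are illustrative.

```python
import math
import random

def friedman_stream(n=70_000, seed=7):
    """Yield (x, y) pairs from the Friedman synthetic regression function:

        y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise

    with ten features uniformly sampled in [0, 1], of which only the
    first five influence the target.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(0, 1) for _ in range(10)]
        y = (10 * math.sin(math.pi * x[0] * x[1])
             + 20 * (x[2] - 0.5) ** 2
             + 10 * x[3] + 5 * x[4]
             + rng.gauss(0, 1))  # unit-variance Gaussian noise
        yield x, y

points = list(friedman_stream(n=200))
```

The five irrelevant features make the dataset a useful test of whether a tuned regressor can ignore noise dimensions while tracking the informative ones.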
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Moya, A.R., Veloso, B., Gama, J. et al. Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach. Data Min Knowl Disc 38, 1289–1315 (2024). https://doi.org/10.1007/s10618-023-00997-7