Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach

Published in Data Mining and Knowledge Discovery

Abstract

Hyper-parameter tuning of machine learning models has become a crucial task for achieving optimal performance. Researchers have explored this optimisation task over the last decades, producing a range of state-of-the-art methods. However, most of these methods focus on batch or offline learning, where data distributions do not change arbitrarily over time. Dealing with data streams and online learning, on the other hand, is a challenging problem; as technology advances, sophisticated techniques for processing these data streams become ever more important. Improving hyper-parameter self-tuning during the online learning of such machine learning models is therefore crucial. To this end, in this paper we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning on data streams. We apply Differential Evolution to dynamically sized samples, requiring only a single pass over the data to train and evaluate models and to choose the best configurations. Since the number of configurations that can be evaluated online must necessarily be small, our approach is a micro-evolutionary one. Furthermore, we control how the evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.
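
Although the full MESSPT procedure is specified in the paper itself, the following sketch may help fix ideas. It is a minimal, illustrative recreation of the micro-Differential-Evolution loop described above, not the authors' implementation: the search space, the base learner (a Hoeffding tree from river), the population size and the DE constants are all assumptions, and drift handling and dynamic sample sizing are omitted.

```python
import random

from river import metrics, tree

# Assumed search space: (name, lower bound, upper bound) per hyper-parameter.
BOUNDS = [("grace_period", 10, 500), ("delta", 1e-7, 1e-2)]


def build(vec):
    # Map a real-valued DE vector onto concrete model hyper-parameters.
    return tree.HoeffdingTreeClassifier(grace_period=int(vec[0]), delta=vec[1])


def prequential_score(model, sample):
    # Test-then-train: predict each instance before learning from it.
    acc = metrics.Accuracy()
    for x, y in sample:
        acc.update(y, model.predict_one(x))
        model.learn_one(x, y)
    return acc.get()


def micro_de(samples, pop_size=4, f=0.5, cr=0.9):
    # `samples` yields one list of (x, y) pairs per DE generation, taken from
    # consecutive chunks of the stream.
    lo, hi = [b[1] for b in BOUNDS], [b[2] for b in BOUNDS]
    dim = len(BOUNDS)
    pop = [[random.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(pop_size)]
    best, best_score = list(pop[0]), float("-inf")
    for sample in samples:
        for i in range(pop_size):
            # DE/rand/1 mutation from three distinct other individuals.
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            jrand = random.randrange(dim)  # force at least one mutated gene
            trial = [
                min(max(a[k] + f * (b[k] - c[k]), lo[k]), hi[k])
                if (random.random() < cr or k == jrand)
                else pop[i][k]
                for k in range(dim)
            ]
            # Parent and trial compete on the same fresh sample of the stream.
            parent_s = prequential_score(build(pop[i]), sample)
            trial_s = prequential_score(build(trial), sample)
            if trial_s >= parent_s:
                pop[i] = trial
            if max(parent_s, trial_s) > best_score:
                best, best_score = list(pop[i]), max(parent_s, trial_s)
    return build(best), best_score


# Example usage (assumed fixed sample size of 500 instances per generation):
# from itertools import islice
# stream = iter(dataset)
# samples = (list(islice(stream, 500)) for _ in range(20))
# model, score = micro_de(samples)
```

Each DE generation consumes a fresh sample of the stream, so no instance is stored and replayed across generations; this is the sense in which the tuning loop remains single-pass.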

Notes

  1. Code available: https://gitfront.io/r/user-9515996/T1tZ5eH3W949/MESSPT/.

  2. https://riverml.xyz/0.14.0/api/datasets/synth/Agrawal/.

  3. http://archive.ics.uci.edu/ml/datasets/Bank+Marketing.

  4. https://github.com/marouabahri/CS-ARF.

  5. https://github.com/marouabahri/CS-ARF.

  6. https://archive.ics.uci.edu/ml/datasets/MoCap+Hand+Postures.

  7. https://github.com/marouabahri/CS-ARF/tree/master/datasets.

  8. https://github.com/marouabahri/CS-kNN/tree/master/datasets.

  9. https://riverml.xyz/0.14.0/api/datasets/synth/SEA/.

  10. https://archive.ics.uci.edu/ml/datasets/Cardiotocography.

  11. https://github.com/renatopp/arff-datasets/blob/master/regression/2dplanes.arff.

  12. https://riverml.xyz/0.14.0/api/datasets/synth/FriedmanDrift/.

  13. http://archive.ics.uci.edu/ml/datasets/SGEMM+GPU+kernel+performance.

  14. https://archive.ics.uci.edu/ml/datasets/Online+Video+Characteristics+and+Transcoding+Time+Dataset.

  15. https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction.

  16. https://archive.ics.uci.edu/ml/datasets/Bias+correction+of+numerical+prediction+model+temperature+forecast.

  17. https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume.

  18. https://archive.ics.uci.edu/ml/datasets/Power+consumption+of+Tetouan+city.

  19. https://archive.ics.uci.edu/ml/datasets/Productivity+Prediction+of+Garment+Employees.

References

  • Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925

  • Bäck T (1996) Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, Oxford

  • Baena-García M, del Campo-Ávila J, Fidalgo R, et al (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, Citeseer, pp 77–86

  • Bahri M, Gomes HM, Bifet A, et al (2020a) CS-ARF: compressed adaptive random forests for evolving data stream classification. In: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8

  • Bahri M, Maniu S, Bifet A, et al (2020b) Compressed k-nearest neighbors ensembles for evolving data streams. In: ECAI 2020-24th European conference on artificial intelligence

  • Bahri M, Bifet A, Gama J et al (2021) Data stream analysis: foundations, major tasks and tools. Wiley Interdiscipl Rev: Data Min Knowl Discov 11(3):e1405

  • Bakhashwain N, Sagheer A (2020) Online tuning of hyperparameters in deep LSTM for time series applications. Int J Intell Eng Syst 14(1):212–220

  • Ballester-Ripoll R, Paredes EG, Pajarola R (2019) Sobol tensor trains for global sensitivity analysis. Reliab Eng Syst Safety 183:311–322

  • Barros RC, Basgalupp MP, De Carvalho ACPLF et al (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern Part C Appl Rev 42(3):291–312

  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(2):281–305

  • Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining, SIAM, pp 443–448

  • Breiman L, Friedman JH, Olshen RA et al (1984) Classification and regression trees. Wadsworth, Belmont

  • Candanedo LM, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140:81–97

  • Candillier L, Lemaire V (2012) Design and analysis of the nomao challenge - active learning in the real-world. In: Proceedings of the ALRA : active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK

  • Celik B, Vanschoren J (2021) Adaptation strategies for automated machine learning on evolving data. IEEE Trans Pattern Anal Mach Intell 43(9):3067–3078

  • Cho D, Yoo C, Im J et al (2020) Comparative assessment of various machine learning-based bias correction methods for numerical weather prediction model forecasts of extreme air temperatures in urban areas. Earth Space Sci 7(4):e2019EA000740

  • Coello CAC, Pulido GT (2001) A micro-genetic algorithm for multiobjective optimization. In: Evolutionary multi-criterion optimization, first international conference, EMO 2001, Zurich, Switzerland, March 7-9, 2001, Proceedings, Lecture Notes in Computer Science, vol 1993. Springer, pp 126–140

  • Das S, Suganthan PN (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):4–31

  • Deneke T, Haile H, Lafond S, et al (2014) Video transcoding time prediction for proactive load balancing. In: Multimedia and expo (ICME), 2014 IEEE International Conference on, pp 1–6

  • Dua D, Graff C (2017) UCI machine learning repository

  • Duarte J, Gama J, Bifet A (2016) Adaptive model rules from high-speed data streams. ACM Trans Knowl Discov Data 10(3):30:1-30:22

  • Frías-Blanco I, del Campo-Ávila J, Ramos-Jiménez G et al (2014) Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Trans Knowl Data Eng 27(3):810–823

  • Galletly J (1998) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Kybernetes 27(8):979–980

  • Gama J, Medas P, Castillo G, et al (2004) Learning with drift detection. In: Brazilian symposium on artificial intelligence, Springer, pp 286–295

  • Gama J, Žliobaitė I, Bifet A et al (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):1–37

  • García S, Herrera F (2008) An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J Mach Learn Res 9(12):2677–2694

  • Gardner A, Duncan CA, Kanno J, et al (2014) 3D hand posture recognition from small unlabeled point sets. In: 2014 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, pp 164–169

  • Guliashki V, Toshev H, Korsemov C (2009) Survey of evolutionary algorithms used in multiobjective optimization. Probl Eng Cybernet Robot 60(1):42–54

  • Hauschild M, Pelikan M (2011) An introduction and survey of estimation of distribution algorithms. Swarm Evol Comput 1(3):111–128

  • Hruschka ER, Campello RJ, Freitas AA et al (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133–155

  • Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 97–106

  • Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Disc 23(1):128–168

  • Imbrea A (2021) Automated machine learning techniques for data streams. CoRR arXiv:2106.07317

  • Koza JR (1995) Survey of genetic algorithms and genetic programming. pp. 589–594

  • Kulbach C, Montiel J, Bahri M, et al (2022) Evolution-based online automated machine learning. In: Lecture Notes in Computer Science, vol 13280 LNAI. Springer, pp 472–484

  • Lacombe T, Koh YS, Dobbie G, et al (2021) A meta-learning approach for automated hyperparameter tuning in evolving data streams. In: International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18–22, 2021. IEEE, pp 1–8

  • Lerman P (1980) Fitting segmented regression models by grid search. J Roy Stat Soc: Ser C (Appl Stat) 29(1):77–84

  • Lin C, Guo M, Li C, et al (2019) Online hyper-parameter learning for auto-augmentation strategy. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27—November 2, 2019. IEEE, pp 6578–6587

  • Lobo JL, Ser JD, Osaba E (2021) Lightweight alternatives for hyper-parameter tuning in drifting data streams. In: 2021 International Conference on Data Mining, ICDM 2021 - Workshops, Auckland, New Zealand, December 7–10, 2021. IEEE, pp 304–311

  • McCullagh P, Nelder JA (1989) Generalized linear models. Springer, Berlin

  • Mockus J, Tiesis V, Zilinskas A (1978) The application of Bayesian methods for seeking the extremum. Towards Global Optimization 2:117–129

  • Montiel J, Halford M, Mastelini SM et al (2021) River: machine learning for streaming data in python. J Mach Learn Res 22:110:1-110:8

  • Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decis Support Syst 62:22–31

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313

  • Rahim MS, Imran AA, Ahmed T (2021) Mining the productivity data of garment industry. Int J Bus Intell Data Min 1(1):1

  • Salam A, El Hibaoui A (2018) Comparison of machine learning algorithms for the power consumption prediction:-case study of Tetouan city. In: 2018 6th International renewable and sustainable energy conference (IRSEC), IEEE, pp 1–5

  • Sebastião R, Fernandes JM (2017) Supporting the Page-Hinkley test with empirical mode decomposition for change detection. In: Foundations of intelligent systems: 23rd international symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, proceedings 23, Springer, pp 492–498

  • Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382

  • Sun Y, Pfahringer B, Gomes HM et al (2022) SOKNL: a novel way of integrating k-nearest neighbours with adaptive random forest regression for data streams. Data Min Knowl Disc 36(5):2006–2032

  • Veloso B, Gama J, Malheiro B (2018) Self hyper-parameter tuning for data streams. In: Lecture Notes in Computer Science, vol 11198 LNAI. Springer, pp 241–255

  • Veloso B, Gama J, Malheiro B et al (2021) Hyperparameter self-tuning for data streams. Inform Fusion 76:75–86

  • Zhan H, Gomes G, Li XS, et al (2018) Efficient online hyperparameter optimization for kernel ridge regression with applications to traffic time series prediction. CoRR arXiv:1811.00620

  • Zhan ZH, Shi L, Tan KC et al (2022) A survey on evolutionary computation for complex continuous optimization. Artif Intell Rev 55(1):59–110

Acknowledgements

This work has been funded by the Ministry of Science and Innovation and the European Regional Development Fund, project PID2020-115832GB-I00, and by the Ministry of Universities, predoctoral grant FPU18/06307. This work is also financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.

Author information

Corresponding author

Correspondence to Antonio R. Moya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Datasets

For the classification task, we used the following benchmark datasets (a sketch showing how the synthetic streams can be recreated with river follows this list):

  • Agra (Agrawal et al. 1993): A stream generator in which it is possible to introduce concept drift by switching among ten possible classification functions. It contains nine features; we generate 60,000 instances, with a concept drift at position 30,000 and a binary target variable.Footnote 2

  • Bank (Moro et al. 2014): It includes data from marketing campaigns of a Portuguese banking institution. The task is to predict whether a client will subscribe to a term deposit (binary target variable). It contains 17 attributes and 45,211 instances.Footnote 3

  • Sine (Gama et al. 2004): A sine generator producing four numerical features in [0, 1], two of which are relevant for classification and two optionally added as noise. It offers four classification functions to label the instances; switching between them creates abrupt concept drift. We generate 50,000 instances, with an abrupt concept drift starting at position 25,000 and a binary target variable.

  • Tweet 500 (Bahri et al. 2020a): It consists of 100,000 tweets with 500 attributes and two possible classes.Footnote 4

  • Tweet 1000 (Bahri et al. 2020a): It consists of 100,000 tweets with 1,000 attributes and two possible classes.Footnote 5

  • Postures (Gardner et al. 2014): Classification of five hand postures (multiclass target variable) from data collected by a motion-capture camera system.Footnote 6 Missing values are replaced by '0'.

  • Nomao (Candillier and Lemaire 2012): It collects data about places (name, phone, location) from many sources. The classification task consists of predicting whether two records refer to the same place.Footnote 7

  • Enron (Bahri et al. 2020b): A cleaned version of a large set of emails used for fraud detection. This version has 1,702 instances and 1,000 attributes.Footnote 8

  • SEA (Street and Kim 2001): Each instance consists of two integer attributes randomly chosen between 0 and 10; the sum of the two, compared against a threshold, determines the class. Abrupt concept drift can be created by changing this threshold. We include 50,000 instances, with concept drift appearing after 25,000 of them.Footnote 9

  • Cardio (Dua and Graff 2017): Fetal heart rate (FHR) and uterine contraction (UC) features measured by cardiotocography and classified by expert obstetricians. The task is to predict the FHR pattern class code (1 to 10). It contains 2,126 instances and 23 real-valued attributes.Footnote 10
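
As noted above, three of these classification streams (Agra, Sine and SEA) are synthetic generators available in river, the streaming library used in this work. The sketch below shows one way such drifting streams can be recreated; the seeds, the particular pairs of classification functions/variants and the drift width are illustrative assumptions, not the exact experimental settings.

```python
# Recreating drifting synthetic classification streams with river.
from river.datasets import synth

# Agra: an abrupt drift at position 30,000, induced by switching the
# Agrawal classification function (width=1 makes the transition abrupt).
agra = synth.ConceptDriftStream(
    stream=synth.Agrawal(classification_function=0, seed=1),
    drift_stream=synth.Agrawal(classification_function=2, seed=1),
    position=30_000,
    width=1,
    seed=1,
)

# SEA: a threshold change after 25,000 instances, induced by switching
# the SEA variant.
sea = synth.ConceptDriftStream(
    stream=synth.SEA(variant=0, seed=1),
    drift_stream=synth.SEA(variant=2, seed=1),
    position=25_000,
    width=1,
    seed=1,
)

for x, y in agra.take(60_000):
    pass  # feed (x, y) to the online tuning/learning loop
```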

For the regression task, we adopted the following benchmark datasets (a sketch of the Friedman drift generator follows this list):

  • 2DPlanes (Breiman et al. 1984): It is a synthetic dataset that contains 10 attributes. For our experiments, we generated 50,000 synthetic instances.

  • AileronsFootnote 11: It contains 13,750 instances and 41 attributes.

  • Friedman (Ikonomovska et al. 2011): It is a synthetic dataset composed of 10 features uniformly sampled between 0 and 1. We use a river option to trigger two Global Recurring Abrupt drift points. We generate 70,000 instances and choose positions 25,000 and 50,000 as the drift points.Footnote 12

  • Sgemm (Ballester-Ripoll et al. 2019): It records the running time of a matrix-product kernel; the task is to predict the running time of new configurations. It contains 14 attributes and 241,600 instances.Footnote 13

  • Transcoding (Deneke et al. 2014): Transcoding time prediction for online video. To this end, it contains 20 attributes describing online video characteristics.Footnote 14

  • Appliances (Candanedo et al. 2017): It includes measurements from a low-energy building recorded every 10 minutes over about 4.5 months. Appliances energy use is the prediction target. It includes 19,735 instances and 29 attributes.Footnote 15

  • Bias (Cho et al. 2020): It contains meteorological forecast data and auxiliary geographical variables for temperature prediction. It presents 7,750 instances with 25 attributes, collected in Seoul, South Korea, during summer.Footnote 16

  • MetroFootnote 17: It measures Metro Interstate Traffic Volume. It contains variables that impact the volume, such as hourly weather features. It has 48,204 instances and 9 numerical attributes.

  • Tetuan (Salam and El Hibaoui 2018): It includes nine numerical attributes and 52,417 instances in order to predict the power consumption of Tetouan city.Footnote 18

  • Garment (Rahim et al. 2021): It focuses on productivity prediction for garment employees. It includes 1,197 instances and 15 numerical attributes.Footnote 19
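
Similarly, the Friedman stream above maps directly onto river's FriedmanDrift generator, as sketched below (the seed is an assumption):

```python
# Friedman stream with two Global Recurring Abrupt drift points, as
# described above; the seed is illustrative.
from river.datasets import synth

friedman = synth.FriedmanDrift(
    drift_type="gra",           # "gra" = Global Recurring Abrupt
    position=(25_000, 50_000),  # the two drift positions
    seed=42,
)

for x, y in friedman.take(70_000):
    pass  # y is a real-valued target for a stream regressor
```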

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moya, A.R., Veloso, B., Gama, J. et al. Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach. Data Min Knowl Disc 38, 1289–1315 (2024). https://doi.org/10.1007/s10618-023-00997-7
