Abstract
Hyper-parameter tuning of machine learning models has become a crucial task for achieving optimal predictive performance. Researchers have explored this optimisation task over the last decades in pursuit of state-of-the-art methods. However, most of that work focuses on batch or offline learning, where data distributions do not change arbitrarily over time. Dealing with data streams and online learning, by contrast, is a challenging problem: as data volumes and velocities grow, sophisticated techniques to process these streams become ever more important. Improving hyper-parameter self-tuning during the online learning of machine learning models is therefore crucial. To this end, in this paper we present MESSPT, an evolutionary algorithm for self-hyper-parameter tuning on data streams. We apply Differential Evolution to dynamically-sized samples, requiring a single pass over the data to train and evaluate models and to choose the best configurations. Because only a small number of configurations can be evaluated in this setting, we keep the population small, making our evolutionary approach a micro-evolutionary one. Furthermore, we control how our evolutionary algorithm deals with concept drift. Experiments on different learning tasks and over well-known datasets show that our proposed MESSPT outperforms the state-of-the-art on hyper-parameter tuning for data streams.
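To make the core idea concrete, the following is a minimal, self-contained sketch of the classical DE/rand/1/bin scheme the abstract refers to, applied to a toy two-parameter search space. It is not the authors' MESSPT implementation: the objective function, bounds, and the small ("micro") population size are illustrative assumptions, with the toy quadratic standing in for the prequential error a real model would produce on a stream sample.

```python
import random

def differential_evolution(objective, bounds, pop_size=8, f=0.5, cr=0.9,
                           generations=25, seed=42):
    """Minimise `objective` with DE/rand/1/bin over the box `bounds`.

    The population is deliberately tiny, in the spirit of a
    micro-evolutionary approach where few configurations can be evaluated.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v, k):
        lo, hi = bounds[k]
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct other individuals (rand/1).
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            # Binomial crossover: take each mutant component with prob. cr,
            # forcing at least one mutant component via j_rand.
            trial = [clip(pop[a][k] + f * (pop[b][k] - pop[c][k]), k)
                     if (rng.random() < cr or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Greedy selection: keep the trial only if it is no worse.
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

# Toy stand-in for "prequential error of a model trained with these two
# hyper-parameters": a quadratic with its optimum at (0.3, 0.7).
best, err = differential_evolution(
    lambda x: (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2,
    bounds=[(0.0, 1.0), (0.0, 1.0)])
```

With a population of 8 and 25 generations, only 208 configurations are evaluated in total, which is the kind of budget a single pass over a stream sample can afford.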
Notes
Code available: https://gitfront.io/r/user-9515996/T1tZ5eH3W949/MESSPT/.
References
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Bäck T (1996) Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, Oxford
Baena-García M, del Campo-Ávila J, Fidalgo R, et al (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, Citeseer, pp 77–86
Bahri M, Gomes HM, Bifet A, et al (2020a) CS-ARF: compressed adaptive random forests for evolving data stream classification. In: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
Bahri M, Maniu S, Bifet A, et al (2020b) Compressed k-nearest neighbors ensembles for evolving data streams. In: ECAI 2020-24th European conference on artificial intelligence
Bahri M, Bifet A, Gama J et al (2021) Data stream analysis: foundations, major tasks and tools. Wiley Interdiscipl Rev: Data Min Knowl Discov 11(3):e1405
Bakhashwain N, Sagheer A (2020) Online tuning of hyperparameters in deep LSTM for time series applications. Int J Intell Eng Syst 14(1):212–220
Ballester-Ripoll R, Paredes EG, Pajarola R (2019) Sobol tensor trains for global sensitivity analysis. Reliab Eng Syst Safety 183:311–322
Barros RC, Basgalupp MP, De Carvalho ACPLF et al (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern Part C Appl Rev 42(3):291–312
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(2):281–305
Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining, SIAM, pp 443–448
Breiman L, Friedman JH, Olshen RA et al (1984) Classification and regression trees. Wadsworth, Belmont
Candanedo LM, Feldheim V, Deramaix D (2017) Data driven prediction models of energy use of appliances in a low-energy house. Energy Build 140:81–97
Candillier L, Lemaire V (2012) Design and analysis of the nomao challenge - active learning in the real-world. In: Proceedings of the ALRA : active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK
Celik B, Vanschoren J (2021) Adaptation strategies for automated machine learning on evolving data. IEEE Trans Pattern Anal Mach Intell 43(9):3067–3078
Cho D, Yoo C, Im J et al (2020) Comparative assessment of various machine learning-based bias correction methods for numerical weather prediction model forecasts of extreme air temperatures in urban areas. Earth Space Sci 7(4):2019000740
Coello CAC, Pulido GT (2001) A micro-genetic algorithm for multiobjective optimization. In: Evolutionary multi-criterion optimization, first international conference, EMO 2001, Zurich, Switzerland, March 7-9, 2001, Proceedings, Lecture Notes in Computer Science, vol 1993. Springer, pp 126–140
Das S, Suganthan PN (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):4–31
Deneke T, Haile H, Lafond S, et al (2014) Video transcoding time prediction for proactive load balancing. In: Multimedia and expo (ICME), 2014 IEEE International Conference on, pp 1–6
Dua D, Graff C (2017) UCI machine learning repository
Duarte J, Gama J, Bifet A (2016) Adaptive model rules from high-speed data streams. ACM Trans Knowl Discov Data 10(3):30:1-30:22
Frias-Blanco I, del Campo-Ávila J, Ramos-Jimenez G et al (2014) Online and non-parametric drift detection methods based on Hoeffding's bounds. IEEE Trans Knowl Data Eng 27(3):810–823
Galletly J (1998) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Kybernetes 27(8):979–980
Gama J, Medas P, Castillo G, et al (2004) Learning with drift detection. In: Brazilian symposium on artificial intelligence, Springer, pp 286–295
Gama J, Žliobaitė I, Bifet A et al (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):1–37
Garcia S, Herrera F (2008) An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J Mach Learn Res 9(12):2677–2694
Gardner A, Duncan CA, Kanno J, et al (2014) 3d hand posture recognition from small unlabeled point sets. In: 2014 IEEE international conference on systems, man, and cybernetics (SMC), IEEE, pp 164–169
Guliashki V, Toshev H, Korsemov C (2009) Survey of evolutionary algorithms used in multiobjective optimization. Probl Eng Cybernet Robot 60(1):42–54
Hauschild M, Pelikan M (2011) An introduction and survey of estimation of distribution algorithms. Swarm Evol Comput 1(3):111–128
Hruschka ER, Campello RJ, Freitas AA et al (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133–155
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 97–106
Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Disc 23(1):128–168
Imbrea A (2021) Automated machine learning techniques for data streams. CoRR arXiv:2106.07317
Koza JR (1995) Survey of genetic algorithms and genetic programming. pp. 589–594
Kulbach C, Montiel J, Bahri M, et al (2022) Evolution-based online automated machine learning. Lecture Notes in Computer Science, vol 13280. Springer, pp 472–484
Lacombe T, Koh YS, Dobbie G, et al (2021) A meta-learning approach for automated hyperparameter tuning in evolving data streams. In: International Joint Conference on Neural Networks, IJCNN 2021, Shenzhen, China, July 18–22, 2021. IEEE, pp 1–8
Lerman P (1980) Fitting segmented regression models by grid search. J Roy Stat Soc: Ser C (Appl Stat) 29(1):77–84
Lin C, Guo M, Li C, et al (2019) Online hyper-parameter learning for auto-augmentation strategy. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27—November 2, 2019. IEEE, pp 6578–6587
Lobo JL, Ser JD, Osaba E (2021) Lightweight alternatives for hyper-parameter tuning in drifting data streams. In: 2021 International Conference on Data Mining, ICDM 2021 - Workshops, Auckland, New Zealand, December 7–10, 2021. IEEE, pp 304–311
McCullagh P, Nelder JA (1989) Generalized linear models. Springer, Berlin
Mockus J, Tiesis V, Zilinskas A (1978) The application of Bayesian methods for seeking the extremum. Towards Global Optimiz 2(117–129):2
Montiel J, Halford M, Mastelini SM et al (2021) River: machine learning for streaming data in python. J Mach Learn Res 22:110:1-110:8
Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decis Support Syst 62:22–31
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Rahim MS, Imran AA, Ahmed T (2021) Mining the productivity data of garment industry. Int J Bus Intell Data Min 1(1):1
Salam A, El Hibaoui A (2018) Comparison of machine learning algorithms for the power consumption prediction:-case study of Tetouan city. In: 2018 6th International renewable and sustainable energy conference (IRSEC), IEEE, pp 1–5
Sebastião R, Fernandes JM (2017) Supporting the Page-Hinkley test with empirical mode decomposition for change detection. In: Foundations of Intelligent Systems: 23rd International Symposium, ISMIS 2017, Warsaw, Poland, June 26-29, 2017, Proceedings 23, Springer, pp 492–498
Street WN, Kim Y (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 377–382
Sun Y, Pfahringer B, Gomes HM et al (2022) Soknl: a novel way of integrating k-nearest neighbours with adaptive random forest regression for data streams. Data Min Knowl Disc 36(5):2006–2032
Veloso B, Gama J, Malheiro B (2018) Self hyper-parameter tuning for data streams. Lecture Notes in Computer Science, vol 11198. Springer, pp 241–255
Veloso B, Gama J, Malheiro B et al (2021) Hyperparameter self-tuning for data streams. Inform Fusion 76:75–86
Zhan H, Gomes G, Li XS, et al (2018) Efficient online hyperparameter optimization for kernel ridge regression with applications to traffic time series prediction. CoRR arXiv:1811.00620
Zhan ZH, Shi L, Tan KC et al (2022) A survey on evolutionary computation for complex continuous optimization. Artif Intell Rev 55(1):59–110
Acknowledgements
This work has been funded by the Ministry of Science and Innovation and the European Regional Development Fund, project PID2020-115832GB-I00, and by the Ministry of Universities, predoctoral grant FPU18/06307. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Datasets
For the classification task, we used the following benchmark datasets:
- Agra (Agrawal et al. 1993): A stream generator in which it is possible to introduce concept drift by switching among ten classification functions. It contains nine features; we generate 60000 instances, with a concept drift at position 30000 and a binary target variable.
- Bank (Moro et al. 2014): Data from marketing campaigns of a Portuguese banking institution. The task is to predict whether a client will subscribe to a term deposit (binary target variable). It contains 17 attributes and 45211 instances.
- Sine (Gama et al. 2004): A sine generator. It produces four features in [0, 1], two relevant for classification and two optionally added as noise, and offers four classification functions to label the output. Switching between these functions creates abrupt concept drift. We generate 50000 instances, with an abrupt concept drift at position 25000 and a binary target variable.
- Tweet 500 (Bahri et al. 2020a): 100000 tweets with 500 attributes and two possible classes.
- Tweet 1000 (Bahri et al. 2020a): 100000 tweets with 1000 attributes and two possible classes.
- Postures (Gardner et al. 2014): Classification of five hand postures (multiclass target variable) from information recorded by a motion capture camera system. Missing values are replaced by a '0' value.
- Nomao (Candillier and Lemaire 2012): Data about places (name, phone, location) collected from many sources. The classification task consists of predicting whether two records belong to the same spot.
- Enron (Bahri et al. 2020b): A cleaned version of a large set of emails aimed at fraud detection. This version has 1702 instances and 1000 attributes.
- SEA (Street and Kim 2001): Each instance consists of two integer attributes randomly chosen between 0 and 10; their sum, compared against a threshold, determines the class. Abrupt concept drift is created by changing that threshold. We generate 50000 instances, with concept drift appearing after the first 25000.
- Cardio (Dua and Graff 2017): Fetal heart rate (FHR) and uterine contraction (UC) features measured on a cardiotocograph and classified by expert obstetricians. The task is to predict the FHR pattern class code (1 to 10). It contains 2126 instances and 23 real-valued attributes.
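Several of the generators above share the same pattern: a simple labelling rule plus an abrupt rule switch at a fixed position. The following is an illustrative sketch of a SEA-style stream with abrupt drift, following the two-attribute description given above; the function name, threshold values, and drift position are assumptions, not the exact generator used in the experiments.

```python
import random

def sea_stream(n=50_000, drift_at=25_000, thresholds=(8.0, 9.0), seed=1):
    """Yield (features, label) pairs from a SEA-style generator.

    Two numeric attributes in [0, 10]; the label indicates whether their
    sum exceeds a threshold, and an abrupt concept drift is simulated by
    switching the threshold at position `drift_at`.
    """
    rng = random.Random(seed)
    for i in range(n):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        threshold = thresholds[0] if i < drift_at else thresholds[1]
        yield (x1, x2), int(x1 + x2 > threshold)

# A short stream: 1000 instances with the drift half-way through.
data = list(sea_stream(n=1000, drift_at=500))
```

Because the threshold switch changes the decision boundary instantly, a tuner evaluated prequentially on such a stream must detect the drift and re-optimise its hyper-parameter configuration after position 500.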
For the regression task, we adopted the following benchmark datasets:
- 2DPlanes (Breiman et al. 1984): A synthetic dataset with 10 attributes. For our experiments, we generated 50000 instances.
- Ailerons: It contains 13750 instances and 41 attributes.
- Friedman (Ikonomovska et al. 2011): A synthetic dataset of 10 features uniformly sampled in [0, 1]. We use a river option to trigger two global recurring abrupt drift points. We generate 70000 instances and place the drifts at positions 25000 and 50000.
- Sgemm (Ballester-Ripoll et al. 2019): Running-time measurements of matrix products, used to predict the running time of new products. It contains 14 attributes and 241600 instances.
- Transcoding (Deneke et al. 2014): Online video transcoding time prediction. It contains 20 attributes describing online video characteristics.
- Appliances (Candanedo et al. 2017): Measurements from a low-energy building taken at 10-minute intervals over 4.5 months; appliance energy consumption is the prediction target. It includes 19735 instances and 29 attributes.
- Bias (Cho et al. 2020): Meteorological forecast data and auxiliary geographical variables for temperature prediction. It contains 7750 instances with 25 attributes collected in Seoul, South Korea, in summer.
- Metro: Metro Interstate Traffic Volume. It contains variables that influence the volume, such as hourly weather features. It has 48204 instances and 9 numerical attributes.
- Tetuan (Salam and El Hibaoui 2018): Nine numerical attributes and 52417 instances for predicting the power consumption of Tetouan city.
- Garment (Rahim et al. 2021): Productivity prediction of garment employees. It includes 1197 instances and 15 numerical attributes.
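For reference, the Friedman generator used above is based on a well-known closed-form target. The sketch below reproduces that function in plain Python; the river-based recurring-drift mechanism mentioned in the list is omitted here, and the function name and parameters are illustrative.

```python
import math
import random

def friedman_stream(n=70_000, seed=7):
    """Yield (x, y) pairs from the Friedman synthetic regression function:

        y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise

    with ten features uniformly sampled in [0, 1], of which only the
    first five influence the target.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(0, 1) for _ in range(10)]
        y = (10 * math.sin(math.pi * x[0] * x[1])
             + 20 * (x[2] - 0.5) ** 2
             + 10 * x[3] + 5 * x[4]
             + rng.gauss(0, 1))  # unit-variance Gaussian noise
        yield x, y

points = list(friedman_stream(n=200))
```

The five irrelevant features make the dataset a useful test of whether a tuned regressor can ignore noise dimensions while tracking the informative ones.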
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Moya, A.R., Veloso, B., Gama, J. et al. Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach. Data Min Knowl Disc 38, 1289–1315 (2024). https://doi.org/10.1007/s10618-023-00997-7