Post deployment recycling of machine learning models

Don’t Throw Away Your Old Models!

Abstract

Once a Machine Learning (ML) model is deployed, it is typically retrained from scratch, either on a scheduled interval or as soon as model drift is detected, to make sure the model reflects current data distributions and performance expectations. As such, once a new model is available, the old model is typically discarded. This paper challenges the notion that older models are useless by showing that old models still have substantial value compared to newly trained models, and by proposing novel post-deployment model recycling techniques that help make informed decisions about which old models to reuse and when to reuse them. In an empirical study on eight long-lived Apache projects comprising a total of 84,343 commits, we analyze the performance of five model recycling strategies on three different types of Just-In-Time defect prediction models (Random Forest (RF), Logistic Regression (LR) and Neural Network (NN)). Comparison against traditional model retraining from scratch (RFS) shows that our approach significantly outperforms RFS in terms of recall, g-mean, AUC and F1 by up to a median of 30%, 20%, 11% and 10%, respectively, with the best recycling strategy (Model Stacking) outperforming the baseline in over 50% of the projects. Our recycling strategies provide this performance improvement at the cost of a 2x to 6-17x slower median time-to-inference compared to RFS, depending on the selected strategy and variant.
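
To make the best-performing strategy concrete, the sketch below illustrates the recycling idea behind Model Stacking: previously deployed RF, LR and NN classifiers are kept as frozen base models, and only a small logistic-regression meta-learner is fit on their predicted defect probabilities over a recent window of commits, after which the study's evaluation metrics (recall, g-mean, AUC, F1) are computed. This is a minimal sketch of stacked generalization (Wolpert 1992) written with scikit-learn (Pedregosa et al. 2011), not the authors' replication package; the synthetic commit data, chronological split points and hyperparameters are illustrative placeholders.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.neural_network import MLPClassifier

# Stand-in for JIT defect data: one row per commit, label 1 = defect-inducing.
# Real JIT features (churn, history, experience, etc.) are replaced here by a
# synthetic, class-imbalanced dataset for illustration only.
X, y = make_classification(n_samples=3000, n_features=14,
                           weights=[0.9, 0.1], random_state=0)
X_old, y_old = X[:2000], y[:2000]          # window the old models were trained on
X_new, y_new = X[2000:2500], y[2000:2500]  # recent window, e.g. after drift
X_test, y_test = X[2500:], y[2500:]        # held-out evaluation window

# "Old" models that retraining-from-scratch (RFS) would simply discard.
old_models = [RandomForestClassifier(n_estimators=100, random_state=0),
              LogisticRegression(max_iter=1000),
              MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                            random_state=0)]
for model in old_models:
    model.fit(X_old, y_old)

def meta_features(models, X):
    # Each recycled model contributes its predicted defect probability.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Recycle: keep the old models frozen, fit only the meta-learner on new data.
meta = LogisticRegression(max_iter=1000)
meta.fit(meta_features(old_models, X_new), y_new)

proba = meta.predict_proba(meta_features(old_models, X_test))[:, 1]
pred = (proba >= 0.5).astype(int)
recall = recall_score(y_test, pred)
specificity = recall_score(y_test, pred, pos_label=0)
print(f"recall={recall:.3f}",
      f"g-mean={np.sqrt(recall * specificity):.3f}",  # geometric mean of recall and specificity
      f"AUC={roc_auc_score(y_test, proba):.3f}",
      f"F1={f1_score(y_test, pred):.3f}")

Note how every prediction must first query all recycled base models before the meta-learner can combine them, which is consistent with the slower time-to-inference that the recycling strategies trade for their gains in predictive performance.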

Data Availability Statement

The replication package for this project, which contains the code and data used, is available online (Patel 2023).

Notes

  1. https://postindustria.com/how-much-data-is-required-for-machine-learning

  2. https://github.com/apache/camel

  3. https://camel.apache.org/releases/#camel

  4. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#key-insights

References

  • Brzezinski D, Stefanowski J (2013) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems 25(1):81–94

  • Cabral GG, Minku LL (2022) Towards reliable online just-in-time software defect prediction. IEEE Transactions on Software Engineering 49(3):1342–1358

  • Cabral GG, Minku LL, Shihab E, Mujahid S (2019) Class imbalance evolution and verification latency in just-in-time software defect prediction. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp 666–676. IEEE

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357

  • Chen L, Zaharia M, Zou J (2023) FrugalGPT: how to use large language models while reducing cost and improving performance. arXiv:2305.05176

  • Chen Z, Liu B (2018) Lifelong machine learning, vol 1. Springer

  • Cruz YJ, Rivas M, Quiza R, Haber RE, Castaño F, Villalonga A (2022) A two-step machine learning approach for dynamic model selection: a case study on a micro milling process. Computers in Industry 143:103764

  • De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(7):3366–3385

  • Diethe T, Borchert T, Thereska E, Balle B, Lawrence N (2019) Continual learning in practice. arXiv:1903.05202

  • Ekanayake J, Tappolet J, Gall HC, Bernstein A (2009) Tracking concept drift of software projects using defect prediction quality. In: 2009 6th IEEE international working conference on mining software repositories, pp 51–60. IEEE

  • Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Transactions on Neural Networks 22(10):1517–1531

  • Falessi D, Ahluwalia A, Penta MD (2021) The impact of dormant defects on defect prediction: a study of 19 Apache projects. ACM Transactions on Software Engineering and Methodology (TOSEM) 31(1):1–26

  • Falleri JR, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and accurate source code differencing. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 313–324

  • Forman G (2006) Tackling concept drift by temporal inductive transfer. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 252–259

  • Gao S, Zhang H, Gao C, Wang C (2023) Keeping pace with ever-increasing data: towards continual learning of code intelligence models. arXiv:2302.03482

  • Herraiz I, Rodriguez D, Robles G, Gonzalez-Barahona JM (2013) The evolution of the laws of software evolution: a discussion based on a systematic literature review. ACM Computing Surveys (CSUR) 46(2):1–28

  • Hess MR, Kromrey JD (2004) Robust confidence intervals for effect sizes: a comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In: Annual meeting of the American Educational Research Association, vol 1. Citeseer

  • Hoang T, Dam HK, Kamei Y, Lo D, Ubayashi N (2019) DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. In: 2019 IEEE/ACM 16th international conference on Mining Software Repositories (MSR), pp 34–45. IEEE

  • James G, Witten D, Hastie T, Tibshirani R et al (2013) An introduction to statistical learning, vol 112. Springer

  • Jiang T, Tan L, Kim S (2013) Personalized defect prediction. In: 2013 28th IEEE/ACM international conference on Automated Software Engineering (ASE), pp 279–289. IEEE

  • Kamei Y, Fukushima T, McIntosh S, Yamashita K, Ubayashi N, Hassan AE (2016) Studying just-in-time defect prediction using cross-project models. Empirical Software Engineering 21:2072–2106

  • Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2012) A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6):757–773

  • Keshavarz H, Nagappan M (2022) ApacheJIT: a large dataset for just-in-time defect prediction. In: Proceedings of the 19th international conference on mining software repositories, pp 191–195

  • Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Transactions on Software Engineering 34(2):181–196

  • Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: Do people and participation matter? In: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 111–120. IEEE

  • Kubat M, Matwin S et al (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol 97, p 179. Citeseer

  • Lemaître G, Nogueira F, Aridas CK (2017) Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18(17):1–5. http://jmlr.org/papers/v18/16-365

  • Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30

  • Macbeth G, Razumiejczyk E, Ledesma RD (2011) Cliff’s delta calculator: a non-parametric effect size program for two groups of observations. Universitas Psychologica 10(2):545–555

  • McIntosh S, Kamei Y (2018) Are fix-inducing changes a moving target? a longitudinal case study of just-in-time defect prediction. In: Proceedings of the 40th international conference on software engineering, p 560

  • Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Technical Journal 5(2):169–180

  • Olewicki D, Habchi S, Nayrolles M, Faramarzi M, Chandar S, Adams B (2023) Towards lifelong learning for software analytics models: empirical study on brown build and risk prediction. arXiv:2305.09824

  • Olewicki D, Nayrolles M, Adams B (2022) Towards language-independent brown build detection. In: Proceedings of the 44th International Conference on Software Engineering, pp 2177–2188

  • Paleyes A, Urma RG, Lawrence ND (2022) Challenges in deploying machine learning: a survey of case studies. ACM Computing Surveys 55(6):1–29

  • Patel H (2023) Post-deployment model recycling. https://github.com/SAILResearch/replication-23-harsh-model_recycling

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12:2825–2830

  • Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 31(4):497–508

  • Pornprasit C, Tantithamthavorn CK (2021) JITLine: a simpler, better, faster, finer-grained just-in-time defect prediction. In: 2021 IEEE/ACM 18th international conference on Mining Software Repositories (MSR), pp 369–379. IEEE

  • Quinonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND (2008) Dataset shift in machine learning. MIT Press

  • Rajbahadur GK, Wang S, Oliva GA, Kamei Y, Hassan AE (2021) The impact of feature importance methods on the interpretation of defect classifiers. IEEE Transactions on Software Engineering 48(7):2245–2261

  • Raschka S (2018) Mlxtend: providing machine learning and data science utilities and extensions to Python's scientific computing stack. Journal of Open Source Software 3(24):638

  • Schelter S, Biessmann F, Januschowski T, Salinas D, Seufert S, Szarvas G (2015) On challenges in machine learning model management

  • Song L, Minku L, Yao X (2023) On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction. Empirical Software Engineering

  • Street WN, Kim Y (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382

  • Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in NLP. arXiv:1906.02243

  • Sun Y, Tang K, Zhu Z, Yao X (2018) Concept drift adaptation by exploiting historical knowledge. IEEE Transactions on Neural Networks and Learning Systems 29(10):4822–4832

  • Tabassum S, Minku LL, Feng D (2022) Cross-project online just-in-time software defect prediction. IEEE Transactions on Software Engineering 49(1):268–287

  • Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 2, pp 99–108. IEEE

  • Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering 45(7):683–711

  • Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2008) Dynamic integration of classifiers for handling concept drift. Information Fusion 9(1):56–68

  • Weisstein EW (2004) Bonferroni correction. https://mathworld.wolfram.com/

  • Wolpert DH (1992) Stacked generalization. Neural Networks 5(2):241–259

  • Woolson RF (2007) Wilcoxon signed-rank test. In: Wiley Encyclopedia of Clinical Trials, pp 1–3

  • Zeng Z, Zhang Y, Zhang H, Zhang L (2021) Deep just-in-time defect prediction: How far are we? In: Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis, pp 427–438

  • Zhao Y, Damevski K, Chen H (2023) A systematic survey of just-in-time software defect prediction. ACM Computing Surveys 55(10):1–35

Author information

Corresponding author

Correspondence to Harsh Patel.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflicts of interest.

Additional information

Communicated by: Minghui Zhou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Patel, H., Adams, B. & Hassan, A.E. Post deployment recycling of machine learning models. Empir Software Eng 29, 100 (2024). https://doi.org/10.1007/s10664-024-10492-2
