Post deployment recycling of machine learning models

Don’t Throw Away Your Old Models!

Abstract

Once a Machine Learning (ML) model is deployed, it is typically retrained from scratch, either on a scheduled interval or as soon as model drift is detected, to make sure the model reflects current data distributions and performance expectations. As such, once a new model is available, the old model is typically discarded. This paper challenges the notion that older models are useless by showing that old models still have substantial value compared to newly trained models, and by proposing novel post-deployment model recycling techniques that help make informed decisions about which old models to reuse and when to reuse them. In an empirical study on eight long-lived Apache projects comprising a total of 84,343 commits, we analyze the performance of five model recycling strategies on three different types of Just-In-Time defect prediction models (Random Forest (RF), Logistic Regression (LR) and Neural Network (NN)). Comparison against traditional model retraining from scratch (RFS) shows that our approach significantly outperforms RFS in terms of recall, g-mean, AUC and F1 by up to a median of 30%, 20%, 11% and 10%, respectively, with the best recycling strategy (Model Stacking) outperforming the baseline in over 50% of the projects. Our recycling strategies provide this performance improvement at the cost of a 2x to 6-17x slower median time-to-inference compared to RFS, depending on the selected strategy and variant.
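
To make the best-performing strategy concrete, the sketch below illustrates the recycling idea behind Model Stacking: previously deployed RF, LR and NN classifiers are kept as frozen base models, and only a small logistic-regression meta-learner is fit on their predicted defect probabilities over a recent window of commits, after which the study's evaluation metrics (recall, g-mean, AUC, F1) are computed. This is a minimal sketch of stacked generalization (Wolpert 1992) written with scikit-learn (Pedregosa et al. 2011), not the authors' replication package; the synthetic commit data, chronological split points and hyperparameters are illustrative placeholders.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.neural_network import MLPClassifier

# Stand-in for JIT defect data: one row per commit, label 1 = defect-inducing.
# Real JIT features (churn, history, experience, etc.) are replaced here by a
# synthetic, class-imbalanced dataset for illustration only.
X, y = make_classification(n_samples=3000, n_features=14,
                           weights=[0.9, 0.1], random_state=0)
X_old, y_old = X[:2000], y[:2000]          # window the old models were trained on
X_new, y_new = X[2000:2500], y[2000:2500]  # recent window, e.g. after drift
X_test, y_test = X[2500:], y[2500:]        # held-out evaluation window

# "Old" models that retraining-from-scratch (RFS) would simply discard.
old_models = [RandomForestClassifier(n_estimators=100, random_state=0),
              LogisticRegression(max_iter=1000),
              MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                            random_state=0)]
for model in old_models:
    model.fit(X_old, y_old)

def meta_features(models, X):
    # Each recycled model contributes its predicted defect probability.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Recycle: keep the old models frozen, fit only the meta-learner on new data.
meta = LogisticRegression(max_iter=1000)
meta.fit(meta_features(old_models, X_new), y_new)

proba = meta.predict_proba(meta_features(old_models, X_test))[:, 1]
pred = (proba >= 0.5).astype(int)
recall = recall_score(y_test, pred)
specificity = recall_score(y_test, pred, pos_label=0)
print(f"recall={recall:.3f}",
      f"g-mean={np.sqrt(recall * specificity):.3f}",  # geometric mean of recall and specificity
      f"AUC={roc_auc_score(y_test, proba):.3f}",
      f"F1={f1_score(y_test, pred):.3f}")

Note how every prediction must first query all recycled base models before the meta-learner can combine them, which is consistent with the slower time-to-inference that the recycling strategies trade for their gains in predictive performance.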

Data Availability Statement

The replication package for this project, which contains the code and data used, is available online (Patel 2023).

Notes

  1. https://postindustria.com/how-much-data-is-required-for-machine-learning

  2. https://github.com/apache/camel

  3. https://camel.apache.org/releases/#camel

  4. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#key-insights

References

  • Brzezinski D, Stefanowski J (2013) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems 25(1):81–94

  • Cabral GG, Minku LL (2022) Towards reliable online just-in-time software defect prediction. IEEE Transactions on Software Engineering 49(3):1342–1358

  • Cabral GG, Minku LL, Shihab E, Mujahid S (2019) Class imbalance evolution and verification latency in just-in-time software defect prediction. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp 666–676. IEEE

  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–357

  • Chen L, Zaharia M, Zou J (2023) FrugalGPT: how to use large language models while reducing cost and improving performance. arXiv:2305.05176

  • Chen Z, Liu B (2018) Lifelong machine learning, vol 1. Springer

  • Cruz YJ, Rivas M, Quiza R, Haber RE, Castaño F, Villalonga A (2022) A two-step machine learning approach for dynamic model selection: a case study on a micro milling process. Computers in Industry 143:103764

  • De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis A, Slabaugh G, Tuytelaars T (2021) A continual learning survey: defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(7):3366–3385

  • Diethe T, Borchert T, Thereska E, Balle B, Lawrence N (2019) Continual learning in practice. arXiv:1903.05202

  • Ekanayake J, Tappolet J, Gall HC, Bernstein A (2009) Tracking concept drift of software projects using defect prediction quality. In: 2009 6th IEEE international working conference on mining software repositories, pp 51–60. IEEE

  • Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Transactions on Neural Networks 22(10):1517–1531

  • Falessi D, Ahluwalia A, Penta MD (2021) The impact of dormant defects on defect prediction: a study of 19 Apache projects. ACM Transactions on Software Engineering and Methodology (TOSEM) 31(1):1–26

  • Falleri JR, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and accurate source code differencing. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, pp 313–324

  • Forman G (2006) Tackling concept drift by temporal inductive transfer. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, pp 252–259

  • Gao S, Zhang H, Gao C, Wang C (2023) Keeping pace with ever-increasing data: towards continual learning of code intelligence models. arXiv:2302.03482

  • Herraiz I, Rodriguez D, Robles G, Gonzalez-Barahona JM (2013) The evolution of the laws of software evolution: a discussion based on a systematic literature review. ACM Computing Surveys (CSUR) 46(2):1–28

  • Hess MR, Kromrey JD (2004) Robust confidence intervals for effect sizes: a comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In: Annual meeting of the American Educational Research Association, vol 1. Citeseer

  • Hoang T, Dam HK, Kamei Y, Lo D, Ubayashi N (2019) DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction. In: 2019 IEEE/ACM 16th international conference on Mining Software Repositories (MSR), pp 34–45. IEEE

  • James G, Witten D, Hastie T, Tibshirani R et al (2013) An introduction to statistical learning, vol 112. Springer

  • Jiang T, Tan L, Kim S (2013) Personalized defect prediction. In: 2013 28th IEEE/ACM international conference on Automated Software Engineering (ASE), pp 279–289. IEEE

  • Kamei Y, Fukushima T, McIntosh S, Yamashita K, Ubayashi N, Hassan AE (2016) Studying just-in-time defect prediction using cross-project models. Empirical Software Engineering 21:2072–2106

  • Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2012) A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering 39(6):757–773

  • Keshavarz H, Nagappan M (2022) ApacheJIT: a large dataset for just-in-time defect prediction. In: Proceedings of the 19th international conference on mining software repositories, pp 191–195

  • Kim S, Whitehead EJ, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Transactions on Software Engineering 34(2):181–196

  • Kononenko O, Baysal O, Guerrouj L, Cao Y, Godfrey MW (2015) Investigating code review quality: Do people and participation matter? In: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp 111–120. IEEE

  • Kubat M, Matwin S et al (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, vol 97, p 179. Citeseer

  • Lemaître G, Nogueira F, Aridas CK (2017) Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18(17):1–5. http://jmlr.org/papers/v18/16-365

  • Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30

  • Macbeth G, Razumiejczyk E, Ledesma RD (2011) Cliff’s delta calculator: a non-parametric effect size program for two groups of observations. Universitas Psychologica 10(2):545–555

  • McIntosh S, Kamei Y (2018) Are fix-inducing changes a moving target? a longitudinal case study of just-in-time defect prediction. In: Proceedings of the 40th international conference on software engineering, p 560

  • Mockus A, Weiss DM (2000) Predicting risk of software changes. Bell Labs Technical Journal 5(2):169–180

  • Olewicki D, Habchi S, Nayrolles M, Faramarzi M, Chandar S, Adams B (2023) Towards lifelong learning for software analytics models: empirical study on brown build and risk prediction. arXiv:2305.09824

  • Olewicki D, Nayrolles M, Adams B (2022) Towards language-independent brown build detection. In: Proceedings of the 44th International Conference on Software Engineering, pp 2177–2188

  • Paleyes A, Urma RG, Lawrence ND (2022) Challenges in deploying machine learning: a survey of case studies. ACM Computing Surveys 55(6):1–29

  • Patel H (2023) Post-deployment model recycling. https://github.com/SAILResearch/replication-23-harsh-model_recycling

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12:2825–2830

  • Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 31(4):497–508

  • Pornprasit C, Tantithamthavorn CK (2021) JITLine: a simpler, better, faster, finer-grained just-in-time defect prediction. In: 2021 IEEE/ACM 18th international conference on Mining Software Repositories (MSR), pp 369–379. IEEE

  • Quinonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND (2008) Dataset shift in machine learning. MIT Press

  • Rajbahadur GK, Wang S, Oliva GA, Kamei Y, Hassan AE (2021) The impact of feature importance methods on the interpretation of defect classifiers. IEEE Transactions on Software Engineering 48(7):2245–2261

  • Raschka S (2018) Mlxtend: providing machine learning and data science utilities and extensions to Python's scientific computing stack. Journal of Open Source Software 3(24):638

  • Schelter S, Biessmann F, Januschowski T, Salinas D, Seufert S, Szarvas G (2015) On challenges in machine learning model management

  • Song L, Minku L, Yao X (2023) On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction. Empirical Software Engineering

  • Street WN, Kim Y (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382

  • Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in NLP. arXiv:1906.02243

  • Sun Y, Tang K, Zhu Z, Yao X (2018) Concept drift adaptation by exploiting historical knowledge. IEEE Transactions on Neural Networks and Learning Systems 29(10):4822–4832

  • Tabassum S, Minku LL, Feng D (2022) Cross-project online just-in-time software defect prediction. IEEE Transactions on Software Engineering 49(1):268–287

  • Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 2, pp 99–108. IEEE

  • Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2018) The impact of automated parameter optimization on defect prediction models. IEEE Transactions on Software Engineering 45(7):683–711

  • Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2008) Dynamic integration of classifiers for handling concept drift. Information Fusion 9(1):56–68

  • Weisstein EW (2004) Bonferroni correction. https://mathworld.wolfram.com/

  • Wolpert DH (1992) Stacked generalization. Neural Networks 5(2):241–259

  • Woolson RF (2007) Wilcoxon signed-rank test. In: Wiley Encyclopedia of Clinical Trials, pp 1–3

  • Zeng Z, Zhang Y, Zhang H, Zhang L (2021) Deep just-in-time defect prediction: How far are we? In: Proceedings of the 30th ACM SIGSOFT international symposium on software testing and analysis, pp 427–438

  • Zhao Y, Damevski K, Chen H (2023) A systematic survey of just-in-time software defect prediction. ACM Computing Surveys 55(10):1–35

Author information

Corresponding author

Correspondence to Harsh Patel.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflicts of interest.

Additional information

Communicated by: Minghui Zhou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Patel, H., Adams, B. & Hassan, A.E. Post deployment recycling of machine learning models. Empir Software Eng 29, 100 (2024). https://doi.org/10.1007/s10664-024-10492-2
