Abstract
With the rising demand for cloud computing, efficient resource utilization is essential to reduce the energy footprint of data centers and to deliver economical services. Given the emergence of machine learning and artificial intelligence techniques for modeling and prediction, it is important to identify a principled method for accurately provisioning forthcoming requests in a cloud data center. Recent studies have used machine learning and other advanced analytics to predict resource usage; however, they do not consider long-range dependencies in the time series, which must be captured for better prediction, and they show limitations in handling noise, missing values, and outliers in datasets. In this paper, we study three techniques to examine whether accounting for these factors improves short-term forecasting of physical machines' resource usage. We evaluated predictions from the Transformer and Informer deep learning models, which address the above aspects, and compared them with the long short-term memory (LSTM) model. We used a real-world Google cluster trace usage dataset and employed the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm to select heterogeneous machines. The evaluation of the three models shows that the Transformer architecture, which considers long-range dependencies in the time series and the shortcomings of the datasets, improves forecasting, reducing RMSE by 14.2% compared with LSTM. However, LSTM yields better results than the Transformer for some machines, which highlights the importance of input sequence order. The Informer model, which considers both kinds of dependencies and combines characteristics of LSTM and the Transformer, outperformed both, reducing RMSE by 21.7% relative to LSTM and by 20.8% relative to the Transformer. The results also show that the Informer model consistently performs better than the other models across all subsets of the dataset.
Our study demonstrates that considering long-range dependencies and sequence ordering in resource usage time series improves prediction.
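The comparison above rests on RMSE and on the percentage reduction in RMSE between models. As an illustrative sketch only (the series and model outputs below are toy values, not the paper's data or models), this is how such a comparison can be computed:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length series."""
    assert len(actual) == len(predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pct_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical CPU-usage series and model forecasts for one machine.
actual   = [0.42, 0.45, 0.47, 0.50, 0.48]
lstm     = [0.40, 0.46, 0.44, 0.53, 0.45]   # larger errors
informer = [0.41, 0.45, 0.46, 0.51, 0.47]   # smaller errors

rmse_lstm = rmse(actual, lstm)
rmse_informer = rmse(actual, informer)
print(f"LSTM RMSE:     {rmse_lstm:.4f}")
print(f"Informer RMSE: {rmse_informer:.4f}")
print(f"RMSE reduction vs. LSTM: {pct_reduction(rmse_lstm, rmse_informer):.1f}%")
```

In the paper this metric is computed per machine (selected via BIRCH clustering) and then compared across the LSTM, Transformer, and Informer forecasts.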
Data availability
The Google cluster trace usage dataset is publicly available at https://github.com/google/clusterdata/blob/master/ClusterData2019
Funding
No funding involved for this research.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Garg, S., Ahuja, R., Singh, R. et al. An effective deep learning architecture leveraging BIRCH clustering for resource usage prediction of heterogeneous machines in cloud data center. Cluster Comput (2024). https://doi.org/10.1007/s10586-023-04258-6