Abstract
This paper addresses machine learning methods and algorithms for predicting the execution time of supercomputer jobs. Supercomputer statistics show that the actual runtime of most jobs diverges substantially from the time requested by the user. This reduces scheduling efficiency, since an inaccurate estimate of job execution time leads to a suboptimal job schedule. A job classification based on the difference between a job's actual and requested execution time is considered. The forecast assigns a submitted job to one of these classes using statistics from the supercomputer's multiuser job management system. Statistics from the MVS-100K and MVS-10P supercomputers at the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS) were used. Job flow features were ranked by importance based on the results of statistical analysis, and the cross-correlation of the most important features was determined. Estimates of the probability of correct prediction were obtained for several well-known machine learning algorithms: logistic regression, decision trees, k-nearest neighbors, linear discriminant analysis, support vector machine, random forest, gradient boosting, and a feedforward neural network. The best values were obtained with the random forest method.
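As an illustration of the class-labeling step described above, the following minimal sketch assigns each finished job to a class according to how much of its requested time went unused. The abstract does not give the class boundaries or the feature set, so the thresholds, the class names, and the `Job` fields here are all hypothetical and chosen only to show the idea.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical class boundaries on the relative overestimate
# (requested - actual) / requested. These values are assumptions
# for illustration, not the classes defined in the paper.
CLASS_BOUNDS = [
    (0.9, "large_overestimate"),
    (0.5, "moderate_overestimate"),
    (0.1, "small_overestimate"),
]
DEFAULT_CLASS = "accurate"


@dataclass
class Job:
    """A finished job record; the field names are illustrative."""
    user: str
    cores: int
    requested_min: float  # runtime requested by the user, in minutes
    actual_min: float     # runtime actually consumed, in minutes


def relative_overestimate(job: Job) -> float:
    """Fraction of the requested time that the job did not use."""
    if job.requested_min <= 0:
        raise ValueError("requested runtime must be positive")
    return (job.requested_min - job.actual_min) / job.requested_min


def runtime_class(job: Job) -> str:
    """Map a job to the first class whose lower bound it reaches."""
    overestimate = relative_overestimate(job)
    for lower_bound, label in CLASS_BOUNDS:
        if overestimate >= lower_bound:
            return label
    return DEFAULT_CLASS


def class_shares(jobs: list[Job]) -> dict[str, float]:
    """Share of jobs falling into each class."""
    counts = Counter(runtime_class(job) for job in jobs)
    return {label: n / len(jobs) for label, n in counts.items()}


if __name__ == "__main__":
    history = [
        Job("u1", 64, 600, 30),
        Job("u1", 64, 600, 240),
        Job("u2", 128, 600, 480),
        Job("u3", 32, 600, 600),
    ]
    for job in history:
        print(job.user, job.cores, runtime_class(job))
    print(class_shares(history))
```

Labels produced this way could serve as the target variable for any of the classifiers listed in the abstract, with features known at submission time (for example user, requested cores, and requested time) as inputs.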
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1995080220120343/MediaObjects/12202_2021_6275_Fig6_HTML.gif)
ACKNOWLEDGMENTS
The work was carried out at the JSCC RAS as part of the government assignment (project 0065-2019-0016). Supercomputers MVS-100K and MVS-10P were used.
Additional information
(Submitted by A. M. Elizarov)
Cite this article
Savin, G.I., Shabanov, B.M., Nikolaev, D.S. et al. Jobs Runtime Forecast for JSCC RAS Supercomputers Using Machine Learning Methods. Lobachevskii J Math 41, 2593–2602 (2020). https://doi.org/10.1134/S1995080220120343
DOI: https://doi.org/10.1134/S1995080220120343