Software security with natural language processing and vulnerability scoring using machine learning approach

Verma, Birendra Kumar; Yadav, Ajay Kumar

doi:10.1007/s12652-024-04778-y


Original Research
Published: 03 April 2024

Volume 15, pages 2641–2651 (2024)

Journal of Ambient Intelligence and Humanized Computing


Abstract

As software becomes more complex, more diverse, and more central to people’s daily lives, exploitable software vulnerabilities pose a major security risk to computer systems. These vulnerabilities allow unauthorized access, which can cause losses in banking, energy, the military, healthcare, and other critical infrastructure systems. Most vulnerability scoring methods use Natural Language Processing to build models from vulnerability descriptions. These models ignore the Impact score, Exploitability score, Attack Complexity, and other statistical features when scoring vulnerabilities. In this work, the feature vector for the machine learning models is built from the description together with the impact score, exploitability score, attack complexity score, and related features. We predict numeric vulnerability scores rather than severity categories, since a score is more precise than a category. The Decision Tree Regressor, Random Forest Regressor, AdaBoost Regressor, K-nearest Neighbors Regressor, and Support Vector Regressor are evaluated with five metrics: explained variance, R-squared, mean absolute error, mean squared error, and root mean squared error. Tenfold cross-validation is used to verify the regressors' test results. The study uses 193,463 Common Vulnerabilities and Exposures (CVE) records from the National Vulnerability Database. The Random Forest regressor performed well on four of the five metrics, and under tenfold cross-validation it scored even higher (0.9968 vs. 0.9958).
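The abstract names the five evaluation metrics and tenfold cross-validation but gives no formulas. The sketch below is not the authors' code: it writes out the standard definitions (the same ones behind scikit-learn's `explained_variance_score`, `r2_score`, `mean_absolute_error` and `mean_squared_error`) in plain Python, plus a fold splitter that assumes contiguous, unshuffled folds, the way scikit-learn's `KFold` splits by default. The function names `regression_metrics` and `kfold_indices` are hypothetical.

```python
import math
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Standard definitions of the five metrics named in the abstract.

    Assumes y_true is not constant (otherwise the variance terms divide by zero).
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_res = fmean(residuals)
    ss_res_centered = sum((r - mean_res) ** 2 for r in residuals)
    mse = ss_res / n
    return {
        # 1 - Var(residuals) / Var(y_true); the 1/n factors cancel
        "explained_variance": 1 - ss_res_centered / ss_tot,
        # 1 - SS_res / SS_tot; unlike explained variance, penalizes a constant bias
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    As in scikit-learn's KFold default, the first n % k folds get one extra sample.
    """
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For example, `regression_metrics([2, 4, 6, 8], [3, 5, 7, 9])` gives R² = 0.8 but explained variance = 1.0: every prediction is exactly one point too high, and explained variance ignores such a constant offset while R² penalizes it. This is one reason to report several metrics side by side.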


Similar content being viewed by others

Advancing Software Vulnerability Scoring: A Statistical Approach with Machine Learning Techniques and GridSearchCV Parameter Tuning (Article, 28 May 2024)

Severity Prediction of Software Vulnerabilities Using Textual Data

Prediction of Software Vulnerabilities Using Random Forest Regressor

Data availability

The datasets used in this study were obtained from the National Vulnerability Database (NVD), https://nvd.nist.gov/vuln.
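The abstract says each feature vector combines the vulnerability description with the impact, exploitability, and attack complexity scores, but it does not say how NVD records are parsed or how the text is encoded. A minimal sketch, assuming records in the NVD CVE API 2.0 JSON layout (fields `descriptions`, `metrics.cvssMetricV31`, `cvssData.baseScore`, `cvssData.attackComplexity`, `impactScore`, `exploitabilityScore`; check these against the current NVD schema). The bag-of-words counts over a fixed vocabulary stand in for the paper's unspecified NLP step, and the LOW = 0 / HIGH = 1 encoding, the choice of the base score as the regression target, the helper names, and the sample record are all hypothetical.

```python
import json
import re

# Hypothetical numeric encoding of the CVSS v3.x attackComplexity field.
ATTACK_COMPLEXITY = {"LOW": 0.0, "HIGH": 1.0}

# Made-up record in the assumed NVD API 2.0 shape (not a real CVE).
SAMPLE_JSON = """
{"vulnerabilities": [{"cve": {
  "id": "CVE-0000-0000",
  "descriptions": [{"lang": "en",
    "value": "Buffer overflow in example service allows remote attackers to execute code."}],
  "metrics": {"cvssMetricV31": [{
    "cvssData": {"attackComplexity": "LOW", "baseScore": 9.8},
    "exploitabilityScore": 3.9, "impactScore": 5.9}]}}}]}
"""


def extract_record(vuln):
    """Pull the fields named in the abstract from one 'vulnerabilities' entry.

    Assumes a cvssMetricV31 block exists; records scored only under CVSS v2
    or v3.0 raise KeyError and would need separate handling.
    """
    cve = vuln["cve"]
    description = next(d["value"] for d in cve["descriptions"] if d["lang"] == "en")
    metric = cve["metrics"]["cvssMetricV31"][0]
    data = metric["cvssData"]
    return {
        "id": cve["id"],
        "description": description,
        "impact": metric["impactScore"],
        "exploitability": metric["exploitabilityScore"],
        "attack_complexity": ATTACK_COMPLEXITY[data["attackComplexity"]],
        "base_score": data["baseScore"],  # assumed regression target
    }


def feature_vector(record, vocabulary):
    """Concatenate bag-of-words counts of the description with the numeric scores."""
    tokens = re.findall(r"[a-z0-9]+", record["description"].lower())
    counts = [float(tokens.count(term)) for term in vocabulary]
    return counts + [record["impact"], record["exploitability"], record["attack_complexity"]]
```

With the vocabulary `["overflow", "remote", "sql"]`, the sample record maps to `[1.0, 1.0, 0.0, 5.9, 3.9, 0.0]`. Stacking one such vector per CVE gives the design matrix for the regressors listed in the abstract; a TF-IDF or embedding representation could replace the raw counts, but the abstract does not say which encoding the authors used.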


Funding

Not applicable.

Author information

Authors and Affiliations

Banasthali Vidyapith, Jaipur, Rajasthan, India
Birendra Kumar Verma & Ajay Kumar Yadav


Corresponding author

Correspondence to Ajay Kumar Yadav.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not involve any studies with human participants or animals.

Consent to publish

The authors agree to the submission of this version of the paper for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Verma, B.K., Yadav, A.K. Software security with natural language processing and vulnerability scoring using machine learning approach. J Ambient Intell Human Comput 15, 2641–2651 (2024). https://doi.org/10.1007/s12652-024-04778-y


Received: 14 October 2023
Accepted: 21 February 2024
Published: 03 April 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s12652-024-04778-y

