Gradient Deep Learning Boosting and Its Application on the Imbalanced Datasets Containing Noises in Manufacturing

Nguyen, Duc-Khanh; Chan, Chien-Lung; Phan, Dinh-Van

doi:10.1007/978-3-031-05491-4_23

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 314)

Abstract

Imbalanced datasets are a common challenge in classification tasks, especially in the manufacturing industry, because skewed class distributions lead to poor performance in traditional machine learning algorithms. In addition, most collected datasets contain noise, such as missing data or irrelevant variables, which makes the analysis even harder, so dealing with noisy data remains an important step in data analysis. For these two reasons, we propose a Gradient Deep Learning Boosting (GDLB) model for classifying imbalanced datasets that contain noise. To handle the noise, we use an imputation transformer for missing data and the random forest method for feature selection. Two benchmark datasets, SECOM and DAIWM, both imbalanced and noisy, are used to evaluate the proposed method. On the SECOM dataset, the proposed method achieved an accuracy, recall, Matthews correlation coefficient (MCC), and area under the curve (AUC) of 0.87, 0.70, 0.32, and 0.79, respectively; on the DAIWM dataset, it achieved 0.91, 0.83, 0.56, and 0.87, respectively. We found that combining the proposed Gradient Deep Learning Boosting with noise handling is a promising approach for imbalanced datasets.
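The abstract describes three stages: imputing missing values, selecting features with a random forest, and classifying with a boosted deep model, evaluated by accuracy, recall, MCC, and AUC. The following is a minimal sketch of such a pipeline, assuming scikit-learn. `SimpleImputer` stands in for the "imputation transformer" and `SelectFromModel` wrapping a `RandomForestClassifier` stands in for the random-forest feature selection; both choices, and the median importance threshold, are assumptions. The GDLB architecture is not specified here, so `NeuralGradientBoosting` is a hypothetical class that illustrates only the general idea of gradient boosting with neural weak learners (small `MLPRegressor` networks fitted to log-loss pseudo-residuals), not the authors' model. The data are synthetic, not SECOM or DAIWM, so the printed scores say nothing about the reported results.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, matthews_corrcoef, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)  # small MLPs may not fully converge


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class NeuralGradientBoosting:
    """Hypothetical sketch: gradient boosting on log loss with small MLP regressors as weak learners."""

    def __init__(self, n_rounds=10, learning_rate=0.3, hidden=(16,), seed=0):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.hidden = hidden
        self.seed = seed

    def fit(self, X, y):
        p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
        self.f0 = np.log(p / (1 - p))  # start from the class prior in log-odds
        F = np.full(len(y), self.f0)
        self.learners = []
        for m in range(self.n_rounds):
            residual = y - sigmoid(F)  # negative gradient of the log loss w.r.t. F
            net = MLPRegressor(hidden_layer_sizes=self.hidden, max_iter=200,
                               random_state=self.seed + m)
            net.fit(X, residual)
            F = F + self.learning_rate * net.predict(X)
            self.learners.append(net)
        return self

    def predict_proba(self, X):
        F = np.full(X.shape[0], self.f0)
        for net in self.learners:
            F = F + self.learning_rate * net.predict(X)
        return sigmoid(F)  # probability of the minority (positive) class


# Synthetic stand-in for a noisy, imbalanced process dataset (NOT SECOM or DAIWM).
X, y = make_classification(n_samples=1500, n_features=40, n_informative=6, n_redundant=4,
                           weights=[0.93, 0.07], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject about 5% missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# 1) Missing data: mean imputation fitted on the training split only.
imputer = SimpleImputer(strategy="mean").fit(X_tr)
X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)

# 2) Irrelevant variables: keep features whose random-forest importance is at least the median.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
selector = SelectFromModel(rf, threshold="median").fit(X_tr, y_tr)
X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)

# 3) Scale for the neural weak learners, then boost.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
model = NeuralGradientBoosting().fit(X_tr, y_tr)
proba = model.predict_proba(X_te)
pred = (proba >= 0.5).astype(int)  # 0.5 is an assumption; often tuned on imbalanced data

scores = {
    "accuracy": accuracy_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "mcc": matthews_corrcoef(y_te, pred),
    "auc": roc_auc_score(y_te, proba),
}
print({k: round(v, 3) for k, v in scores.items()})
```

Each boosting round fits a network to the pseudo-residuals y − σ(F) and adds a damped copy of its output to the running log-odds F. MCC and AUC are reported alongside accuracy because, on a roughly 93:7 class split, a model that always predicts the majority class already scores about 0.93 accuracy while its recall and MCC are 0.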

Acknowledgements

This study was funded by the Ministry of Science and Technology, Taiwan (grant number MOST 108-2221-E-155-019-MY3).

Author information

Authors and Affiliations

Department of Information Management, Yuan Ze University, Taoyuan, Taiwan
Duc-Khanh Nguyen & Chien-Lung Chan
Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
Chien-Lung Chan
ZDT Group – Yuan Ze University Joint Research and Development Center for Big Data, Taoyuan, Taiwan
Chien-Lung Chan
University of Economics, The University of Danang, Da Nang, Vietnam
Dinh-Van Phan
Teaching and Research Team for Business Intelligence, University of Economics, The University of Danang, Da Nang, Vietnam
Dinh-Van Phan

Corresponding author

Correspondence to Chien-Lung Chan.

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Piraeus, Greece
George A. Tsihrintzis
Central Police University, Taoyuan City, Taiwan
Shiuh-Jeng Wang
National Chung Hsing University, Taichung City, Taiwan
Iuon-Chang Lin

About this paper

Cite this paper

Nguyen, DK., Chan, CL., Phan, DV. (2023). Gradient Deep Learning Boosting and Its Application on the Imbalanced Datasets Containing Noises in Manufacturing. In: Tsihrintzis, G.A., Wang, SJ., Lin, IC. (eds) 2021 International Conference on Security and Information Technologies with AI, Internet Computing and Big-data Applications. Smart Innovation, Systems and Technologies, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-031-05491-4_23

DOI: https://doi.org/10.1007/978-3-031-05491-4_23
Published: 30 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05490-7
Online ISBN: 978-3-031-05491-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)