Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines

Wang, Hsiao-Yu; Tsung, Chen-Kun; Hung, Ching-Hua; Chen, Chen-Huei

doi:10.1007/s11042-021-11552-1

1213: Computational Optimization and Applications for Heterogeneous Multimedia Data
Published: 12 July 2022

Volume 81, pages 36437–36452, (2022)
Multimedia Tools and Applications

Hsiao-Yu Wang¹,
Chen-Kun Tsung ORCID: orcid.org/0000-0002-0042-233X²,
Ching-Hua Hung¹ &
…
Chen-Huei Chen³

A Correction to this article was published on 17 November 2022

This article has been updated

Abstract

Product quality is a major factor in enhancing production capability and competitiveness. Decreasing cost and increasing production capacity are common approaches to realizing that enhancement. Production managers apply various multimedia data to evaluate product quality. For example, capturing the stamping sound to check that a cut is correct and taking a component image to measure chip positions are common uses of heterogeneous multimedia data in manufacturing. However, production managers prefer to minimize the number of defective products, e.g. through secondary operations and by fixing the product tolerance in the assembly stage, in order to meet the production target. Constructing a defective-product identification procedure with high accuracy therefore becomes a challenge, because the number of defective products available for analysis decreases. In this paper, we propose the Rule Classification with Oversampling (RCOS) approach to provide high accuracy with few defective products. RCOS combines an oversampling technique, which emphasizes the properties of the defective products, with a rule classification approach, which provides precise classes. Given few defective products, capturing the properties of a failure is difficult. RCOS uses a revised Synthetic Minority Over-Sampling Technique (SMOTE) to highlight the failure properties, and then applies a rule model to extract the root cause of the defective products. We implement RCOS in a semiconductor production line. In the experiments, RCOS achieves an accuracy of up to about 98%, and the comparison shows improvements on common criteria, e.g. the true-positive rate, G-mean, F1 score, and false alarm rate. RCOS is therefore highly practical for implementation.
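To make the terms in the abstract concrete, the following Python sketch shows the basic idea behind SMOTE-style oversampling and how the listed criteria are usually computed from confusion-matrix counts. It is an illustration under assumptions, not the authors' method: the abstract does not describe the revised SMOTE or the rule model, so the code uses plain nearest-neighbour interpolation, and the function names `smote_like_oversample` and `imbalance_metrics`, the parameter `k`, and the metric definitions (false alarm rate as FP / (FP + TN), G-mean as the square root of TPR times TNR) are common conventions assumed here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    This is the classic SMOTE idea, not the revised variant in the paper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def imbalance_metrics(tp, fn, fp, tn):
    """Criteria named in the abstract, with the defective class as positive.
    The definitions are common conventions (an assumption, see lead-in)."""
    tpr = tp / (tp + fn)        # true-positive rate (recall)
    tnr = tn / (tn + fp)        # true-negative rate
    far = fp / (fp + tn)        # false alarm rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    g_mean = math.sqrt(tpr * tnr)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"accuracy": accuracy, "tpr": tpr, "far": far,
            "f1": f1, "g_mean": g_mean}


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors of four defective products.
    defects = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote_like_oversample(defects, n_new=3))
    # Hypothetical confusion-matrix counts for a 100-item test set.
    print(imbalance_metrics(tp=8, fn=2, fp=5, tn=85))
```

The second function shows why accuracy alone can mislead on imbalanced data: with the hypothetical counts above, accuracy is 0.93 while the true-positive rate is only 0.8, which is why criteria such as G-mean and F1 are reported alongside it.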

Change history

17 November 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11042-022-14257-1

Notes

https://csr.tsmc.com/download/csr/2018_tsmc_csr/english/pdf/e_all.pdf

Funding

This study is conducted under the “III Innovative and Prospective Technologies Project (1/1)” of the Institute for Information Industry which is subsidized by the Ministry of Economic Affairs of the Republic of China.

Author information

Authors and Affiliations

Department of Mechanical Engineering, National Yang Ming Chiao Tung University, 1001 University Road, 300, Hsinchu, Taiwan, Republic of China
Hsiao-Yu Wang & Ching-Hua Hung
Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, No 57 Sec 2 Zhongshan Rd Taiping Dist, 41170, Taichung, Taiwan, Republic of China
Chen-Kun Tsung
Department of Computer Science and Engineering, National Chung Hsing University, 145 Xingda Rd South Dist, 402, Taichung, Taiwan, Republic of China
Chen-Huei Chen

Corresponding author

Correspondence to Chen-Kun Tsung.

Additional information

The original online version of this article was revised: Funding information was missing in the original publication of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, HY., Tsung, CK., Hung, CH. et al. Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines. Multimed Tools Appl 81, 36437–36452 (2022). https://doi.org/10.1007/s11042-021-11552-1

Received: 18 January 2021
Revised: 10 August 2021
Accepted: 09 September 2021
Published: 12 July 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11042-021-11552-1
