Abstract
Product quality is a major factor in enhancing production capability and competitiveness. Decreasing cost and increasing production capacity are common approaches to enhancing product quality. Production managers apply various multimedia data to evaluate product quality. For example, capturing the stamping sound to verify correct cutting and taking component images to measure chip positions are common uses of heterogeneous multimedia data in manufacturing. However, production managers prefer to minimize the number of defective products, e.g. through secondary operations and fixing the product tolerance in the assembly stage, to meet the production target. Therefore, constructing a defective product identification procedure with high accuracy becomes a challenge as the number of defective products decreases. In this paper, we propose the Rule Classification with Oversampling (RCOS) approach to provide high accuracy with few defective products. RCOS combines an oversampling technique with a rule classification approach to emphasize the properties of defective products and provide precise classes. Given few defective products, capturing the properties of failures is difficult. RCOS uses a revised Synthetic Minority Over-sampling Technique (SMOTE) to highlight the failure properties, and a rule model is then used to extract the root cause of the defective products. We implement the proposed RCOS in a semiconductor production line. In the experimental results, RCOS achieves an accuracy of up to about 98%, and the comparison shows improvements in common criteria, e.g. the true-positive rate, G-mean, F1 score, and false alarm rate. Therefore, the proposed RCOS is highly practical for implementation.
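As an illustration of the two ingredients named in the abstract, the following is a minimal sketch of classic SMOTE interpolation and of the evaluation criteria listed above. It is not the revised SMOTE or the rule model used in RCOS, whose details are not given here. The function names `smote` and `imbalance_metrics`, the parameter `k`, and the definitions of false alarm rate as FP / (FP + TN) and G-mean as sqrt(TPR × TNR) are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE sketch (not the paper's revised variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def imbalance_metrics(y_true, y_pred, positive=1):
    """TPR, false alarm rate, G-mean and F1, with `positive` as the defect class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {
        "tpr": tpr,
        "false_alarm_rate": 1.0 - tnr if tn + fp else 0.0,  # assumed: FP / (FP + TN)
        "g_mean": math.sqrt(tpr * tnr),                     # assumed: sqrt(TPR * TNR)
        "f1": f1,
    }


if __name__ == "__main__":
    defects = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(defects, n_new=3, k=2))
    print(imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Criteria such as G-mean and the true-positive rate are reported alongside accuracy because, with few defective products, a classifier that labels every item as non-defective can still reach high accuracy while detecting no defects at all.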
Change history
17 November 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11042-022-14257-1
Funding
This study is conducted under the “III Innovative and Prospective Technologies Project (1/1)” of the Institute for Information Industry which is subsidized by the Ministry of Economic Affairs of the Republic of China.
Additional information
The original online version of this article was revised: Funding information was missing in the original publication of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, HY., Tsung, CK., Hung, CH. et al. Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines. Multimed Tools Appl 81, 36437–36452 (2022). https://doi.org/10.1007/s11042-021-11552-1
DOI: https://doi.org/10.1007/s11042-021-11552-1