Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines

  • 1213: Computational Optimization and Applications for Heterogeneous Multimedia Data
Multimedia Tools and Applications

A Correction to this article was published on 17 November 2022

This article has been updated

Abstract

Product quality is a major factor in enhancing production capability and competitiveness. Decreasing cost and increasing production capacity are common approaches to improving product quality. Production managers use various multimedia data to evaluate product quality. For example, capturing the stamping sound to verify correct cutting and taking component images to measure chip positions are common applications of heterogeneous multimedia data in manufacturing. However, production managers also aim to minimize the number of defective products, e.g., through secondary operations and by fixing product tolerances in the assembly stage, in order to meet the production target. As the number of defective products decreases, constructing a defective-product identification procedure with high accuracy becomes a challenge. In this paper, we propose the Rule Classification with Oversampling (RCOS) approach to provide high accuracy when only a few defective products are available. The proposed RCOS combines an oversampling technique with a rule classification approach to emphasize the properties of the defective products and to assign precise classes. With few defective products, capturing the properties of a failure is difficult. RCOS therefore uses a revised Synthetic Minority Over-Sampling Technique (SMOTE) to highlight the failure properties, and then applies a rule model to extract the root cause of the defective products. We implement the proposed RCOS in a semiconductor production line. In the experiments, the proposed RCOS achieves an accuracy of up to about 98%, and the comparison shows improvements in common criteria, e.g., the true-positive rate, G-mean, F1 score, and false alarm rate. Therefore, the proposed RCOS is highly practical to implement.
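To make the two technical ingredients named in the abstract more concrete, the following is a minimal sketch in Python. It shows plain SMOTE interpolation (not the revised variant used by RCOS, whose details are not given here) and the evaluation criteria computed from a confusion matrix with the defective class treated as positive. The function names `smote` and `imbalance_metrics`, the neighbour count `k`, and the false alarm rate definition FP / (FP + TN) are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by plain SMOTE interpolation.

    Assumption: this is the textbook procedure, not the paper's revised SMOTE.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def imbalance_metrics(tp, fn, fp, tn):
    """Criteria listed in the abstract, with 'defective' as the positive class."""
    tpr = tp / (tp + fn)        # true-positive rate (recall)
    tnr = tn / (tn + fp)        # true-negative rate (specificity)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tpr": tpr,
        "g_mean": math.sqrt(tpr * tnr),
        "f1": 2 * precision * tpr / (precision + tpr),
        "far": fp / (fp + tn),  # false alarm rate (assumed definition)
    }


if __name__ == "__main__":
    # Hypothetical toy data: four defective samples with two features each.
    defects = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote(defects, n_new=3, k=2))
    print(imbalance_metrics(tp=8, fn=2, fp=5, tn=85))
```

Each synthetic sample lies on the line segment between a real minority sample and one of its nearest minority neighbours, so oversampling densifies the defective class without copying points verbatim. The G-mean and F1 score are reported alongside accuracy because accuracy alone can look high on imbalanced data even when few defects are detected.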

Notes

  1. https://csr.tsmc.com/download/csr/2018_tsmc_csr/english/pdf/e_all.pdf

Funding

This study is conducted under the “III Innovative and Prospective Technologies Project (1/1)” of the Institute for Information Industry which is subsidized by the Ministry of Economic Affairs of the Republic of China.

Author information

Corresponding author

Correspondence to Chen-Kun Tsung.

Additional information

The original online version of this article was revised: Funding information was missing in the original publication of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, HY., Tsung, CK., Hung, CH. et al. Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines. Multimed Tools Appl 81, 36437–36452 (2022). https://doi.org/10.1007/s11042-021-11552-1

  • DOI: https://doi.org/10.1007/s11042-021-11552-1
