Abstract
Boolean Matrix Factorization (BMF) uncovers hidden patterns in Boolean datasets and is a powerful tool in machine learning. For large datasets, however, reducing the data size before factorization becomes crucial. In this paper, we revisit data reduction for BMF algorithms based on Formal Concept Analysis (FCA) and propose novel reduction approaches that aim to minimize the impact of data reduction on factor quality. Specifically, we introduce the concept of intent vectors and present incremental algorithms, along with their associated theorems, for capturing and quantifying these vectors, thereby facilitating a reduction in data size. More importantly, we propose two approaches based on FCA principles that identify and eliminate redundant rows through distinct deletion strategies. The first approach incrementally deletes rows while preserving the intent vectors of attribute concepts, thus maintaining factor quality. The second approach progressively removes further rows from the dataset already reduced by the first approach, gradually adjusting the amount of concept loss to minimize any degradation in factor quality. Experiments demonstrate that our first reduction algorithm significantly decreases data size without degrading factor quality, consistently outperforming current leading algorithms with a \(100\%\) success rate. Our second algorithm outperforms the existing algorithm in 72 out of 96 comparisons, greatly reducing data size with minimal loss in factor quality.
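To make the row-deletion idea concrete, the following minimal sketch uses only standard FCA notions and is not the authors' algorithm. It assumes that the intent of the attribute concept of attribute m is the set of attributes shared by all rows containing m, and it greedily drops a row whenever every such intent stays unchanged. The abstract does not define intent vectors, so the representation below (one 0/1 tuple per attribute) and the function names `boolean_product`, `attribute_intents`, and `reduce_rows` are hypothetical illustrations.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is OR over k of (a[i][k] AND b[k][j])."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b))))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attribute_intents(rows: Matrix, n_attrs: int) -> List[Tuple[int, ...]]:
    """For each attribute m, return the attributes shared by all rows having m.

    Assumption: this 0/1 tuple stands in for the intent of the attribute
    concept of m; an empty extent yields the full attribute set.
    """
    intents = []
    for m in range(n_attrs):
        extent = [r for r in rows if r[m]]
        intents.append(
            tuple(int(all(r[j] for r in extent)) for j in range(n_attrs))
        )
    return intents


def reduce_rows(rows: Matrix) -> Matrix:
    """Greedily delete rows while all attribute-concept intents stay the same."""
    n_attrs = len(rows[0])
    target = attribute_intents(rows, n_attrs)
    kept = list(rows)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if attribute_intents(candidate, n_attrs) == target:
            kept = candidate  # row i is redundant under this criterion
        else:
            i += 1  # row i is needed; move on to the next row
    return kept


data = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
reduced = reduce_rows(data)
```

In this toy input the duplicate row and the row `[1, 1, 1]` are removed, and the intents computed on `reduced` equal those computed on `data`. A BMF algorithm would then be run on `reduced` instead of `data`; `boolean_product` shows how a pair of Boolean factor matrices reconstructs a Boolean matrix.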
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the Macao Science and Technology Development Funds (0036/2023/RIA1).
Author information
Contributions
Lanzhen Yang: Conceptualization, Methodology, Software, Writing - original draft; Eric C.C. Tsang: Supervision, Writing - review & editing, Funding acquisition; Hua Mao: review & editing; Chengling Zhang: Formal analysis, editing; Jiaming Wu: Formal analysis, editing.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Yang, L., Tsang, E.C.C., Mao, H. et al. Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02226-z
DOI: https://doi.org/10.1007/s13042-024-02226-z