Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Boolean Matrix Factorization (BMF) helps unveil hidden patterns in Boolean datasets and is a powerful tool in machine learning. For large datasets, however, reducing the data size becomes crucial for BMF algorithms. In this paper, we revisit data reduction for BMF algorithms based on Formal Concept Analysis (FCA) and propose novel reduction approaches that aim to minimize the impact of data reduction on factor quality. Specifically, we introduce the concept of intent vectors and present incremental algorithms, along with their associated theorems, for capturing and quantifying these vectors, thereby facilitating a reduction in data size. More importantly, we propose two approaches based on FCA principles that identify and eliminate redundant rows in datasets through distinct deletion strategies. The first approach incrementally deletes rows while preserving the intent vectors of attribute concepts, thus maintaining the quality of factors. The second approach progressively removes further rows from the dataset already reduced by the first approach, gradually adjusting the amount of concept loss to minimize any degradation in factor quality. Experiments demonstrate that our first reduction algorithm significantly decreases data size without degrading factor quality, consistently outperforming current leading algorithms with a 100% success rate. Our second algorithm outperforms the existing algorithm in 72 out of 96 comparisons, greatly reducing data size with minimal loss in factor quality.
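To make the terms in the abstract concrete, the following minimal Python sketch shows the standard FCA building blocks that BMF methods of this kind rest on: the Boolean matrix product, the two derivation operators, and the attribute concept of a column. It is not the authors' algorithm. The paper's definition of intent vectors is not available in this preview, so the sketch uses only the ordinary FCA intent. The toy matrix and all function names are assumptions chosen for illustration.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is OR over k of A[i, k] AND B[k, j]."""
    return (A.astype(int) @ B.astype(int)) > 0

def extent(I, attrs):
    """Rows (objects) of I that have every attribute in attrs."""
    return {i for i in range(I.shape[0]) if all(I[i, j] for j in attrs)}

def intent(I, objs):
    """Columns (attributes) of I shared by every object in objs."""
    return {j for j in range(I.shape[1]) if all(I[i, j] for i in objs)}

def attribute_concept(I, j):
    """Attribute concept of column j: (extent of {j}, intent of that extent)."""
    ext = extent(I, {j})
    return ext, intent(I, ext)

def concepts_to_factors(I, concepts):
    """Turn formal concepts into factor matrices A (objects x k) and B (k x attributes)."""
    n, m = I.shape
    A = np.zeros((n, len(concepts)), dtype=bool)
    B = np.zeros((len(concepts), m), dtype=bool)
    for k, (ext, itn) in enumerate(concepts):
        for i in ext:
            A[i, k] = True
        for j in itn:
            B[k, j] = True
    return A, B

# Hypothetical toy dataset: 4 objects x 3 attributes; rows 0 and 1 are identical.
I = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=bool)

# Using every attribute concept as a factor always reconstructs I exactly,
# because each 1 at (i, j) lies inside the attribute concept of column j.
concepts = [attribute_concept(I, j) for j in range(I.shape[1])]
A, B = concepts_to_factors(I, concepts)
exact = np.array_equal(boolean_product(A, B), I)

# Deleting the duplicate row 1 leaves the intents of all attribute concepts
# unchanged. This only illustrates the general idea of a redundant row; it is
# an assumption of this sketch, not the deletion criterion used in the paper.
reduced = np.delete(I, 1, axis=0)
intents_full = [frozenset(attribute_concept(I, j)[1]) for j in range(I.shape[1])]
intents_reduced = [frozenset(attribute_concept(reduced, j)[1]) for j in range(reduced.shape[1])]

print("exact reconstruction:", exact)
print("intents preserved:", intents_full == intents_reduced)
```

Factors are taken from formal concepts here because each concept is a maximal rectangle of ones in the matrix, which is the link between FCA and BMF that the abstract relies on.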

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work is supported by the Macao Science and Technology Development Funds (0036/2023/RIA1).

Author information

Contributions

Lanzhen Yang: Conceptualization, Methodology, Software, Writing - original draft; Eric C.C. Tsang: Supervision, Writing - review & editing, Funding acquisition; Hua Mao: review & editing; Chengling Zhang: Formal analysis, editing; Jiaming Wu: Formal analysis, editing.

Corresponding author

Correspondence to Eric C. C. Tsang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, L., Tsang, E.C.C., Mao, H. et al. Revisiting data reduction for boolean matrix factorization algorithms based on formal concept analysis. Int. J. Mach. Learn. & Cyber. (2024). https://doi.org/10.1007/s13042-024-02226-z

  • DOI: https://doi.org/10.1007/s13042-024-02226-z
