Abstract
Formal concept analysis (FCA) algorithms are computationally expensive. Several parallel and distributed FCA algorithms have been proposed to reduce execution time by exploiting the parallelism available in cloud environments. These approaches can be broadly classified into replication-based and partitioning-based strategies. Replication-based strategies require the entire input data to be in memory throughout execution and therefore suffer from memory bottlenecks on large datasets. Horizontal partitioning-based approaches overcome the memory bottlenecks but incur enormous communication overhead during concept discovery, because they require all-to-all worker communication per concept. Thus, the state-of-the-art frameworks implementing these strategies do not scale to large datasets in cloud environments running on commodity hardware. In this paper, we propose HyPar-FCA+, an improved workload-aware elastic framework for FCA that overcomes these scalability issues and also improves resource utilization and cost in a cloud environment. Its salient features are a novel vector bin packing-based algorithm for partitioning the input context, a new workload estimator, and elastic provisioning of cloud resources using a feedback-based predictor. Compared with state-of-the-art distributed FCA frameworks, HyPar-FCA+ is 3–11% and 15–36% faster for real-world datasets (Susy, Webdocs) and synthetic datasets, respectively. In terms of execution cost in the cloud, HyPar-FCA+ costs 27–38% and \(\sim\)85% less for real-world and synthetic datasets, respectively.
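To give a feel for what vector bin packing involves, the following is a minimal sketch of a generic first-fit-decreasing heuristic, not the partitioning algorithm of HyPar-FCA+ itself, whose details are not given in this abstract. The function name `first_fit_decreasing`, the two resource dimensions (read here as estimated workload and memory footprint), and the integer demands are all assumptions made for illustration.

```python
from typing import List, Sequence


def first_fit_decreasing(
    items: Sequence[Sequence[int]], capacity: Sequence[int]
) -> List[List[int]]:
    """Pack d-dimensional items into bins of identical capacity.

    Returns one list of item indices per bin. Generic heuristic,
    assumed here for illustration only.
    """

    def size(i: int) -> float:
        # Rank an item by its largest demand relative to the bin capacity.
        return max(v / c for v, c in zip(items[i], capacity))

    order = sorted(range(len(items)), key=size, reverse=True)
    bins: List[List[int]] = []   # item indices assigned to each bin
    loads: List[List[int]] = []  # current load vector of each bin

    for i in order:
        item = items[i]
        if any(v > c for v, c in zip(item, capacity)):
            raise ValueError(f"item {i} exceeds the bin capacity")
        for b, load in enumerate(loads):
            # The item fits only if every dimension stays within capacity.
            if all(l + v <= c for l, v, c in zip(load, item, capacity)):
                bins[b].append(i)
                loads[b] = [l + v for l, v in zip(load, item)]
                break
        else:
            # No open bin can take the item, so open a new one.
            bins.append([i])
            loads.append(list(item))
    return bins


# Hypothetical partitions described by (workload, memory) demands.
demands = [(6, 2), (5, 5), (4, 7), (3, 1)]
print(first_fit_decreasing(demands, capacity=(10, 10)))  # [[2, 0], [1, 3]]
```

Each bin can be read as one worker. Ordering items by their largest normalized demand is one common choice among several for this family of heuristics.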
Data availability
All of the material is owned by the authors, and/or no permissions are required.
Notes
We have plotted the RMSE using only the first-level concepts.
Funding
This work is partially funded by SPARC, a Govt. of India Initiative under Grant No. SPARC/2018-2019/P682/SL.
Author information
Authors and Affiliations
Contributions
MP and SK contributed to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript. SK supervised the project.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
There were no human subjects in this manuscript, and informed consent is not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Packiaraj, M., Kailasam, S. HyPar-FCA+: an improved workload-aware elastic framework for FCA. J Supercomput 79, 11767–11796 (2023). https://doi.org/10.1007/s11227-023-05116-3
DOI: https://doi.org/10.1007/s11227-023-05116-3