HyPar-FCA+: an improved workload-aware elastic framework for FCA

The Journal of Supercomputing

Abstract

Formal concept analysis (FCA) algorithms are computationally expensive. Several parallel and distributed FCA algorithms have been proposed to reduce execution time by exploiting the parallelism available in cloud environments. These approaches can be broadly classified into replication-based and partitioning-based strategies. Replication-based strategies require the entire input data to be in memory throughout execution and therefore suffer from memory bottlenecks on large datasets. Horizontal partitioning-based approaches overcome the memory bottlenecks but incur enormous communication overhead during concept discovery, because they require all-to-all worker communication per concept. Thus, the state-of-the-art frameworks implementing these strategies do not scale to large datasets in cloud environments running commodity hardware. In this paper, we propose HyPar-FCA+, an improved workload-aware elastic framework for FCA that overcomes these scalability issues and also improves resource utilization and cost in a cloud environment. Its salient features are a novel vector bin packing-based algorithm for partitioning the input context, a new workload estimator, and elastic provisioning of cloud resources using a feedback-based predictor. Compared with state-of-the-art distributed FCA frameworks, HyPar-FCA+ is 3–11% faster on real-world datasets (Susy, Webdocs) and 15–36% faster on synthetic datasets. In terms of execution cost in the cloud, HyPar-FCA+ costs 27–38% less on real-world datasets and \(\sim\)85% less on synthetic datasets.
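
The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of the general idea behind vector bin packing, not the method used in HyPar-FCA+. It applies the textbook first-fit-decreasing heuristic: each item has a demand in several dimensions, and an item goes into the first bin where every dimension still fits. The two dimensions used in the example (memory footprint and estimated workload), the item names, and the function name `first_fit_decreasing` are assumptions made for illustration.

```python
def first_fit_decreasing(items, capacity):
    """Pack items into as few bins as this heuristic manages.

    items    -- dict mapping an item name to its demand vector
    capacity -- per-bin capacity vector (same length as each demand)
    Returns a list of bins, each {"items": [...], "load": [...]}.
    """
    def size(demand):
        # Rank an item by its largest demand relative to capacity.
        return max(d / c for d, c in zip(demand, capacity))

    bins = []
    ordered = sorted(items.items(), key=lambda kv: size(kv[1]), reverse=True)
    for name, demand in ordered:
        if size(demand) > 1:
            raise ValueError(f"item {name!r} does not fit in an empty bin")
        for b in bins:
            # First fit: take the first bin with room in every dimension.
            if all(l + d <= c for l, d, c in zip(b["load"], demand, capacity)):
                break
        else:
            # No existing bin fits, so open a new one.
            b = {"items": [], "load": [0] * len(capacity)}
            bins.append(b)
        b["items"].append(name)
        b["load"] = [l + d for l, d in zip(b["load"], demand)]
    return bins


# Hypothetical demands: (memory units, estimated workload units).
demands = {"a": (6, 2), "b": (5, 5), "c": (4, 7), "d": (2, 3), "e": (3, 1)}
for i, b in enumerate(first_fit_decreasing(demands, capacity=(10, 10))):
    print(i, b["items"], b["load"])
```

Packing on both dimensions at once is what separates this from ordinary bin packing: a partition that fits in memory can still be rejected for a worker whose workload budget is already used up.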

Data availability

All of the material is owned by the authors, and/or no permissions are required.

Notes

  1. We have plotted the RMSE using only the first-level concepts.

Funding

This work is partially funded by SPARC, a Govt. of India Initiative under Grant No. SPARC/2018-2019/P682/SL.

Author information

Contributions

MP and SK contributed to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript. SK supervised the project.

Corresponding author

Correspondence to Muneeswaran Packiaraj.

Ethics declarations

Conflict of interest

None. The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

There were no human subjects in this manuscript, and informed consent is not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Packiaraj, M., Kailasam, S. HyPar-FCA+: an improved workload-aware elastic framework for FCA. J Supercomput 79, 11767–11796 (2023). https://doi.org/10.1007/s11227-023-05116-3

  • DOI: https://doi.org/10.1007/s11227-023-05116-3
