Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale

  • Conference paper
Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1681)

Abstract

Frequent Itemset Mining (FIM) is an effective means of discovering related information or knowledge, but it has high time and space complexity. In the era of big data, data is distributed and grows dynamically, which makes frequent itemset mining even more challenging. Because data in practical applications usually involves different concept hierarchies and granularities, the multi-scale concept is introduced into the incremental mining of frequent itemsets to avoid the large overhead of rescanning the dataset and adjusting the tree structure during maintenance. To handle large-scale data effectively, a memory-effective parallel incremental FIM algorithm is proposed on the Spark parallel computing platform; it estimates the load of each group so that computation is balanced across nodes as far as possible. In addition, the RDD caching strategy of the parallel algorithm jointly considers factors such as RDD access frequency and computational cost, in order to reduce memory occupancy and to avoid recomputing RDDs that are expensive to compute. Extensive experimental results verify that the memory-effective parallel algorithm has good scalability and high efficiency.
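
The abstract does not say how group loads are estimated or how groups are assigned to nodes. As a rough illustration of the general idea only, and not the authors' method, the sketch below assumes that a numeric load estimate per group is already available and applies a standard greedy heuristic: the heaviest remaining group goes to the currently least-loaded partition. The names `assign_groups`, `partition_loads`, and the sample load values are hypothetical.

```python
import heapq


def assign_groups(estimated_load, num_partitions):
    """Greedy assignment of groups to partitions (illustrative assumption).

    estimated_load maps a group id to an already-computed load estimate.
    Groups are visited from heaviest to lightest, and each one is placed
    on the partition whose accumulated load is currently smallest.
    """
    # Min-heap of (accumulated load, partition index).
    heap = [(0.0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; ties are broken by group id for determinism.
    ordered = sorted(estimated_load.items(), key=lambda kv: (-kv[1], kv[0]))
    for group, load in ordered:
        total, partition = heapq.heappop(heap)
        assignment[group] = partition
        heapq.heappush(heap, (total + load, partition))
    return assignment


def partition_loads(estimated_load, assignment, num_partitions):
    """Sum the estimated load that ends up on each partition."""
    loads = [0.0] * num_partitions
    for group, partition in assignment.items():
        loads[partition] += estimated_load[group]
    return loads


# Hypothetical load estimates for five groups, spread over two partitions.
sample_load = {"g1": 8, "g2": 7, "g3": 6, "g4": 5, "g5": 4}
sample_assignment = assign_groups(sample_load, 2)
sample_totals = partition_loads(sample_load, sample_assignment, 2)
```

In a Spark job, a mapping like `sample_assignment` could in principle drive a custom partitioner so that each group's transactions land on the chosen partition; whether the paper does this is not stated in the abstract.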

Acknowledgments

This work is supported by the National Natural Science Foundation of P.R. China (Nos. 62272336, U1931209) and the Graduate Student Scientific Research Innovation Projects of Shanxi Province, China (No. 2022Y699).

Author information

Corresponding author

Correspondence to Yaling Xun.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, L., Xun, Y., Zhang, J., Bi, H. (2023). Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1681. Springer, Singapore. https://doi.org/10.1007/978-981-99-2356-4_22

  • DOI: https://doi.org/10.1007/978-981-99-2356-4_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2355-7

  • Online ISBN: 978-981-99-2356-4

  • eBook Packages: Computer Science, Computer Science (R0)
