Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale

Wang, Linqing; Xun, Yaling; Zhang, Jifu; Bi, Huimin

doi:10.1007/978-981-99-2356-4_22


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1681)

Included in the following conference series:

CCF Conference on Computer Supported Cooperative Work and Social Computing


Abstract

Frequent Itemset Mining (FIM) is an effective means of discovering related information or knowledge, but it has high time and space complexity. In the era of big data, data are distributed and grow dynamically, which makes frequent itemset mining even more challenging. Because data in practical applications usually involve different concept hierarchies and granularities, the multi-scale concept is introduced into incremental frequent itemset mining to avoid the heavy overhead of rescanning the dataset and adjusting the tree structure during maintenance. To handle large-scale, massive data efficiently, a memory-effective parallel incremental FIM algorithm is proposed on the Spark parallel computing platform; it estimates the load of each group so that computation is balanced across nodes as far as possible. Its RDD caching strategy jointly considers factors such as RDD access frequency and computational cost, which reduces memory occupancy and avoids recomputing RDDs that are expensive to compute. Extensive experimental results verify that the memory-effective parallel algorithm has good scalability and high efficiency.
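The abstract names two mechanisms (load estimation for balancing groups across nodes, and an RDD caching policy weighing access frequency against computational cost) without giving their formulas. The Python sketch below shows one plausible reading of each and is not the authors' algorithm. The per-item load estimate (support multiplied by F-list rank), the greedy longest-processing-time assignment of groups to workers, and the caching priority (access frequency times recomputation cost per MB) are all assumptions, as are every function, field, and RDD name.

```python
import heapq
from dataclasses import dataclass


def estimate_item_load(support: int, rank: int) -> int:
    """Hypothetical load estimate for one frequent item.

    Assumes the conditional pattern base of an item grows with its support
    and with its position (rank) in the support-descending F-list.
    """
    return support * rank


def balance_groups(group_loads: dict[str, int], num_workers: int) -> list[list[str]]:
    """Greedy longest-processing-time (LPT) assignment of groups to workers."""
    # Min-heap of (current total load, worker index).
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(num_workers)]
    # Place the heaviest groups first, each on the currently least-loaded worker.
    for group, load in sorted(group_loads.items(), key=lambda kv: -kv[1]):
        total, w = heapq.heappop(heap)
        assignment[w].append(group)
        heapq.heappush(heap, (total + load, w))
    return assignment


@dataclass
class CachedRDD:
    name: str
    size_mb: float       # memory footprint if the RDD is cached
    access_freq: int     # how often downstream stages reuse it
    compute_cost: float  # estimated cost to recompute it (e.g. seconds)

    def priority(self) -> float:
        # Hypothetical score: reuse count x recompute cost, per MB of memory.
        return self.access_freq * self.compute_cost / self.size_mb


def choose_cached(rdds: list[CachedRDD], budget_mb: float) -> list[str]:
    """Keep the highest-priority RDDs that fit within the memory budget."""
    kept: list[str] = []
    used = 0.0
    for rdd in sorted(rdds, key=lambda r: r.priority(), reverse=True):
        if used + rdd.size_mb <= budget_mb:
            kept.append(rdd.name)
            used += rdd.size_mb
    return kept


if __name__ == "__main__":
    print(balance_groups({"g1": 10, "g2": 7, "g3": 5, "g4": 4}, 2))
    rdds = [
        CachedRDD("fList", 10, 5, 2.0),
        CachedRDD("condTrees", 200, 3, 60.0),
        CachedRDD("rawLines", 500, 1, 5.0),
    ]
    print(choose_cached(rdds, 250))
```

With two workers, the four example groups split into loads of 14 and 12. Under a 250 MB budget, the small, frequently reused RDD and the expensive-to-recompute RDD stay cached, while the large, cheap, rarely reused one is left to be recomputed. A real Spark job would act on such a decision through `persist()` and `unpersist()`; how the paper's strategy hooks into Spark is not described in the abstract.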



Acknowledgments

This work is supported by the National Natural Science Foundation of P.R. China (Nos. 62272336 and U1931209) and the Graduate Student Scientific Research Innovation Projects of Shanxi Province, China (No. 2022Y699).

Author information

Authors and Affiliations

Taiyuan University of Science and Technology (TYUST), Taiyuan, 030024, Shanxi, China
Linqing Wang, Yaling Xun, Jifu Zhang & Huimin Bi


Corresponding author

Correspondence to Yaling Xun.

Editor information

Editors and Affiliations

Shandong University, **an, China
Yuqing Sun
Fudan University, Shanghai, China
Tun Lu
Taiyuan University of Science and Technology, Taiyuan, China
Yinzhang Guo
Shanxi Datong University, Datong, China
** Gao
Tongji University, Shanghai, China
Bowen Du


About this paper

Cite this paper

Wang, L., Xun, Y., Zhang, J., Bi, H. (2023). Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1681. Springer, Singapore. https://doi.org/10.1007/978-981-99-2356-4_22


DOI: https://doi.org/10.1007/978-981-99-2356-4_22
Published: 13 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2355-7
Online ISBN: 978-981-99-2356-4
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships

China Computer Federation (CCF)
