Abstract
Frequent Itemset Mining (FIM) is an effective means of discovering related information or knowledge, but it has high time and space complexity. In the era of big data, data is distributed and grows dynamically, which makes frequent itemset mining even more challenging. Because data in practical applications usually involves different concept hierarchies and granularities, the multi-scale concept is introduced into the incremental mining of frequent itemsets to avoid the heavy overhead of rescanning the dataset and adjusting the tree structure during maintenance. To handle large-scale data, a memory-effective parallel incremental FIM algorithm is proposed on the Spark parallel computing platform; it estimates the load of each group in order to balance the computation across nodes as far as possible. The RDD caching strategy of the parallel algorithm jointly considers factors such as RDD access frequency and computational cost, reducing both memory occupancy and the recomputation of RDDs that are expensive to compute. Extensive experimental results verify that the memory-effective parallel algorithm has good scalability and high efficiency.
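The two resource-management ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the load estimator or the caching formula, so the per-item load values, the greedy heaviest-first assignment, the score `future_accesses * recompute_cost / size_mb`, and all names below (`balance_groups`, `RddStats`, `cache_score`, `choose_cached`) are assumptions made only for illustration.

```python
import heapq
from dataclasses import dataclass


def balance_groups(item_loads, num_groups):
    """Assign items to groups so that estimated group loads stay close.

    Assumption: item_loads maps each item to a precomputed load estimate.
    Uses the greedy heaviest-first heuristic, not the paper's method.
    """
    heap = [(0.0, g) for g in range(num_groups)]  # (group load so far, group id)
    heapq.heapify(heap)
    groups = {g: [] for g in range(num_groups)}
    # Place the heaviest items first, each into the currently lightest group.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    totals = {g: sum(item_loads[i] for i in items) for g, items in groups.items()}
    return groups, totals


@dataclass
class RddStats:
    """Hypothetical per-RDD bookkeeping; field names are illustrative."""
    name: str
    future_accesses: int   # how many later stages still read this RDD
    recompute_cost: float  # estimated seconds to rebuild it from lineage
    size_mb: float         # estimated memory footprint


def cache_score(s):
    # Assumed scoring rule: value of caching per megabyte of memory used.
    return s.future_accesses * s.recompute_cost / s.size_mb


def choose_cached(stats, budget_mb):
    """Greedily keep the highest-scoring RDDs that fit in the memory budget."""
    kept, used = [], 0.0
    for s in sorted(stats, key=cache_score, reverse=True):
        if s.future_accesses > 0 and used + s.size_mb <= budget_mb:
            kept.append(s.name)
            used += s.size_mb
    return kept


if __name__ == "__main__":
    loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    print(balance_groups(loads, 2))
    rdds = [
        RddStats("flist", 3, 2.0, 10),
        RddStats("grouped", 2, 40.0, 100),
        RddStats("raw", 1, 1.0, 200),
        RddStats("tmp", 0, 50.0, 5),
    ]
    print(choose_cached(rdds, budget_mb=150))
```

In a Spark job, the names returned by `choose_cached` would correspond to the RDDs on which `persist()` is called, with the remaining ones left uncached or released via `unpersist()`. An RDD that is never read again is skipped regardless of how costly it was to compute.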
Acknowledgments
This work is supported by the National Natural Science Foundation of P.R. China (No. 62272336, U1931209) and the Graduate Student Scientific Research Innovation Projects of Shanxi Province, China (No. 2022Y699).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, L., Xun, Y., Zhang, J., Bi, H. (2023). Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1681. Springer, Singapore. https://doi.org/10.1007/978-981-99-2356-4_22
DOI: https://doi.org/10.1007/978-981-99-2356-4_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2355-7
Online ISBN: 978-981-99-2356-4
eBook Packages: Computer Science, Computer Science (R0)