Log in

Tree-based partitioning of date for association rule mining

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Agarwal R, Aggarwal C, Prasad V (2000) Depth first generation of long patterns. In: Proceedings of the ACM KDD conference on management of data, Boston, pp 108–118

  2. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD conference on management of data, Washington, DC, pp 207–216

  3. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Santiago, Chile, pp 487–499

  4. Bayardo RJ (1998) Efficiently mining long pattern from databases. In: Proceedings of the ACM SIGMOD conference on management of data, pp 85–93

  5. Bayardo RJ, Agrawal R, Gunopulos D (1999) Constraint-based rule mining in large, dense databases. In: Proceedings of the 15th interantional conference on data engineering, pp 188–197

  6. Berry MJ, Linoff GS (1997) Data mining techniques for marketing, sales and customer support. Wiley, New York

    Google Scholar 

  7. Cheung W, Zaiane OR (2003) Incremental mining of frequent patterns without candidate generation or support constraint. In: Proceedings of seventh international database engineering and applications symposium (IDEAS'03), pp 111–116

  8. Coenen F, Goulbourne G, Leng P (2001) Computing association rules using partial totals. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery (PKDD'01). LNCS, vol 2168. Springer, Berlin Heidelberg New York, pp 54–66

    Google Scholar 

  9. Coenen F, Leng P (2001) Optimising association rule algorithms using itemset ordering. In: Bramer M, Coenen F, Preece A (eds) Research and development in intelligent systems XVIII (Proceedings of ES2001). Springer-Verlag, London, pp 53–66

    Google Scholar 

  10. El-Hajj M, Zaiane OR (2003) Non recursive generation of frequent K-itemsets from frequent pattern tree representations. In: Proceedings of 5th international conference on data warehousing and knowledge discovery (DaWaK 2003). LNCS, vol 2737. Springer-Verlag, Berlin Heidelberg New York, pp 371–380

    Google Scholar 

  11. El-Hajj M, Zaiane OR (2003) Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2003), ACM, pp 109–118

  12. Goulbourne G, Coenen F, Leng P (2000) Algorithms for computing association rules using a partial-support tree. J Knowl Based Syst 13:141–149 (also in Proceedings ES'99, December 1999. Springer, London, pp 132–147)

    Article  Google Scholar 

  13. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD conference on management of data, Dallas, pp 1–12

  14. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

    Article  MathSciNet  Google Scholar 

  15. Huang H, Wu X, Relue R (2002) Association analysis with one scan of databases. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM'02), pp 629–632

  16. Liu G, Lu H, Lou W, Yu J (2003) On computing, storing and querying frequent patterns. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2003), ACM, pp 607–612

  17. Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of ACM SIGMOD workshop on data mining and knowledge discovery, pp 11–20

  18. Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21th VLDB conference, pp 432–444

  19. Toivonen H (1996) Sampling large databases for association rules. In: Proceedings of the 22th VLDB conference, pp 1–12

  20. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. Technical report 651, University of Rochester, Computer Science Department, New York

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Paul Leng.

Additional information

Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990; and an MSc (first class), also Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. From 2000 onwards he is a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition.

Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as a RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association rule Mining, Classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).

Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ahmed, S., Coenen, F. & Leng, P. Tree-based partitioning of date for association rule mining. Knowl Inf Syst 10, 315–331 (2006). https://doi.org/10.1007/s10115-006-0010-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-006-0010-1

Keywords

Navigation