Abstract
We present an information-theoretic framework for mining dependencies between itemsets in binary data. We investigate the problem of closure-based redundancy in this setting theoretically and present both lossless and lossy pruning techniques. We propose an efficient and scalable algorithm that exploits the inclusion-exclusion principle for fast entropy computation, and we evaluate it empirically on synthetic and real-world data.
An extended version of this paper is available as a technical report [1].
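As an illustration of the inclusion-exclusion idea mentioned above, the following minimal sketch computes the entropy of an itemset from the supports of its subsets. It is not the algorithm of the paper: the function names (`support`, `subsets`, `entropy`), the representation of transactions as Python frozensets, and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import log2


def support(rows, itemset):
    """Fraction of rows that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for row in rows if itemset <= row) / len(rows)


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def entropy(rows, itemset):
    """Entropy (in bits) of the joint 0/1 distribution of `itemset`.

    The probability that the items in `present` occur and those in
    `absent` do not is obtained by inclusion-exclusion over supports:
        p = sum over J subset of absent of (-1)^|J| * fr(present | J)
    """
    itemset = frozenset(itemset)
    # Supports of all subsets, computed once and reused below.
    fr = {s: support(rows, s) for s in subsets(itemset)}
    h = 0.0
    for present in subsets(itemset):
        absent = itemset - present
        p = sum((-1) ** len(j) * fr[present | j] for j in subsets(absent))
        if p > 1e-12:  # skip empty cells (0 * log 0 is taken as 0)
            h -= p * log2(p)
    return h


# Hypothetical toy data: four transactions over the items "a" and "b".
rows = [frozenset(r) for r in ({"a", "b"}, {"a", "b"}, {"a"}, set())]
h_ab = entropy(rows, {"a", "b"})
h_a = entropy(rows, {"a"})
```

In this sketch only the supports of subsets are counted against the data; every cell of the joint distribution is then derived arithmetically, which is the reason inclusion-exclusion can speed up entropy computation when supports are already available.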
References
Mampaey, M.: Mining non-redundant information-theoretic dependencies between itemsets. Technical Report, University of Antwerp (2010)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2), 207–216 (1993)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. ACM SIGMOD Record 29(2), 1–12 (2000)
Zaki, M., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. In: Proceedings of KDD (1997)
Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. ACM SIGMOD Record 25(2), 1–12 (1996)
Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoretical Computer Science 149(1), 129–149 (1995)
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal 42(2), 100–111 (1999)
Zaki, M.J.: Generating non-redundant association rules. In: Proceedings of KDD, pp. 34–43 (2000)
Balcázar, J.L.: Minimum-size bases of association rules. In: Proceedings of ECML PKDD, pp. 86–101 (2008)
Dalkilic, M.M., Robertson, E.L.: Information dependencies. In: Proceedings of ACM PODS, pp. 245–253 (2000)
Heikinheimo, H., Hinkkanen, E., Mannila, H., Mielikäinen, T., Seppänen, J.K.: Finding low-entropy sets and trees from binary data. In: Proceedings of KDD, pp. 350–359 (2007)
Jaroszewicz, S., Simovici, D.A.: Pruning redundant association rules using maximum entropy principle. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 135–147. Springer, Heidelberg (2002)
Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423 (1948)
Bayardo Jr., R.: Efficiently mining long patterns from databases. In: Proceedings of ACM SIGMOD, pp. 85–93 (1998)
Gouda, K., Zaki, M.: Efficiently mining maximal frequent itemsets. In: Proceedings of IEEE ICDM, pp. 163–170 (2001)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proceedings of ICDT, pp. 398–416 (1999)
Calders, T., Goethals, B.: Non-derivable itemset mining. Data Mining and Knowledge Discovery 14(1), 171–206 (2007)
Calders, T., Goethals, B.: Quick inclusion-exclusion. In: Bonchi, F., Boulicaut, J.-F. (eds.) KDID 2005. LNCS, vol. 3933, pp. 86–103. Springer, Heidelberg (2006)
Goethals, B.: Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi/data
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE homepage, http://www.cs.helsinki.fi/research/fdk/datamining/tane
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mampaey, M. (2010). Mining Non-redundant Information-Theoretic Dependencies between Itemsets. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_11
DOI: https://doi.org/10.1007/978-3-642-15105-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15104-0
Online ISBN: 978-3-642-15105-7
eBook Packages: Computer Science, Computer Science (R0)