Generalizing the notion of confidence

Steinbach, Michael; Kumar, Vipin

doi:10.1007/s10115-006-0041-7

Regular Paper
Published: 03 October 2006

Volume 12, pages 279–299, (2007)

Knowledge and Information Systems

Michael Steinbach¹ &
Vipin Kumar¹


Abstract

In this paper, we explore extending association analysis to non-traditional types of patterns and non-binary data by generalizing the notion of confidence. We begin by describing a general framework that measures the strength of the connection between two association patterns by the extent to which the strength of one association pattern provides information about the strength of another. Although this framework can serve as the basis for designing or analyzing measures of association, the focus in this paper is to use the framework as the basis for extending the traditional concept of confidence to error-tolerant itemsets (ETIs) and continuous data. To that end, we provide two examples. First, we (1) describe an approach to defining confidence for ETIs that preserves the interpretation of confidence as an estimate of a conditional probability, and (2) show how association rules based on ETIs can have better coverage (at an equivalent confidence level) than rules based on traditional itemsets. Next, we derive a confidence measure for continuous data that agrees with the standard confidence measure when applied to binary transaction data. Further analysis of this result exposes some of the important issues involved in constructing a confidence measure for continuous data.
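As a point of reference for the abstract, the standard confidence of a rule X → Y on binary transaction data is the fraction of transactions containing X that also contain Y, which estimates the conditional probability P(Y | X). The sketch below illustrates only this traditional measure. It is not the ETI-based or continuous-data measure developed in the paper, and the function names and the toy transactions are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def support_count(transactions: List[Set[str]], itemset: Iterable[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Standard confidence of antecedent -> consequent.

    Computed as support(antecedent and consequent) / support(antecedent).
    """
    a = set(antecedent)
    both = a | set(consequent)
    n_antecedent = support_count(transactions, a)
    if n_antecedent == 0:
        # The conditional probability is undefined if the antecedent never occurs.
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support_count(transactions, both) / n_antecedent


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_to_milk = confidence(transactions, {"bread"}, {"milk"})     # 2 of 3 bread transactions
butter_to_bread = confidence(transactions, {"butter"}, {"bread"})  # 2 of 2 butter transactions
```

Any generalization of confidence to continuous data that "agrees with the standard confidence measure" would be expected to reproduce these values when every attribute is restricted to 0 or 1.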



Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, MN, 55455, USA
Michael Steinbach & Vipin Kumar


Corresponding author

Correspondence to Michael Steinbach.

Additional information

Michael Steinbach earned the B.S. degree in mathematics, the M.S. degree in statistics, and the M.S. and Ph.D. degrees in computer science, all from the University of Minnesota. He also has held a variety of software engineering, analysis, and design positions in industry at Silicon Biology, Racotek, and NCR. Steinbach is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. He is a co-author of the textbook Introduction to Data Mining and has published numerous technical papers in peer-reviewed journals and conference proceedings. His research interests include data mining, statistics, and bioinformatics. He is a member of the IEEE and the ACM.

Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. He received the B.E. degree in electronics and communication engineering from the University of Roorkee, India, in 1977, the M.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1979, and the Ph.D. degree in computer science from the University of Maryland, College Park, in 1982. Kumar’s current research interests include high-performance computing and data mining. His research has resulted in the development of the concept of isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES), graph partitioning (METIS, ParMetis, hMetis), and dense hierarchical solvers. He has authored over 200 research articles, and has coedited or coauthored 9 books including the widely used textbooks Introduction to Parallel Computing and Introduction to Data Mining, both published by Addison Wesley. Kumar has served as chair/co-chair for many conferences/workshops in the area of data mining and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). Currently, Kumar is the Chair of the steering committee of the SIAM International Conference on Data Mining, and a member of the steering committee of the IEEE International Conference on Data Mining. Kumar serves or has served on the editorial boards of Data Mining and Knowledge Discovery, Knowledge and Information Systems, IEEE Computational Intelligence Bulletin, Annual Review of Intelligent Informatics, Parallel Computing, Journal of Parallel and Distributed Computing, IEEE Transactions of Data and Knowledge Engineering (1993–1997), IEEE Concurrency (1997–2000), and IEEE Parallel and Distributed Technology (1995–1997). He is a Fellow of the ACM and IEEE and a member of SIAM.


About this article

Cite this article

Steinbach, M., Kumar, V. Generalizing the notion of confidence. Knowl Inf Syst 12, 279–299 (2007). https://doi.org/10.1007/s10115-006-0041-7


Received: 30 November 2005
Revised: 04 February 2006
Accepted: 01 April 2006
Published: 03 October 2006
Issue Date: August 2007
DOI: https://doi.org/10.1007/s10115-006-0041-7
