Generalizing the notion of confidence

  • Regular Paper
Knowledge and Information Systems

Abstract

In this paper, we explore extending association analysis to non-traditional types of patterns and non-binary data by generalizing the notion of confidence. We begin by describing a general framework that measures the strength of the connection between two association patterns by the extent to which the strength of one association pattern provides information about the strength of another. Although this framework can serve as the basis for designing or analyzing measures of association, the focus in this paper is to use the framework as the basis for extending the traditional concept of confidence to error-tolerant itemsets (ETIs) and continuous data. To that end, we provide two examples. First, we (1) describe an approach to defining confidence for ETIs that preserves the interpretation of confidence as an estimate of a conditional probability, and (2) show how association rules based on ETIs can have better coverage (at an equivalent confidence level) than rules based on traditional itemsets. Next, we derive a confidence measure for continuous data that agrees with the standard confidence measure when applied to binary transaction data. Further analysis of this result exposes some of the important issues involved in constructing a confidence measure for continuous data.
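For reference, the traditional confidence that the abstract takes as its starting point can be sketched in a few lines. This is only the standard binary-transaction measure, conf(X → Y) = support(X ∪ Y) / support(X), read as an estimate of P(Y | X). It is not the generalized measure for ETIs or continuous data, whose definitions are not given in this abstract. The function names and the toy transactions are illustrative assumptions.

```python
from typing import FrozenSet, List


def support_count(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Standard confidence of the rule antecedent -> consequent.

    Computed as support(antecedent | consequent) / support(antecedent),
    i.e. an estimate of P(consequent | antecedent) on binary data.
    """
    denom = support_count(transactions, antecedent)
    if denom == 0:
        raise ValueError("antecedent does not occur in any transaction")
    return support_count(transactions, antecedent | consequent) / denom


# Hypothetical toy data (not from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_conf = confidence(transactions, frozenset({"bread"}), frozenset({"milk"}))
print(f"conf(bread -> milk) = {rule_conf:.3f}")
```

In the toy data, two of the three transactions that contain bread also contain milk, so the rule bread → milk has confidence 2/3. Any generalized measure of the kind the abstract describes would have to return this same value on such binary data.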

Author information

Corresponding author

Correspondence to Michael Steinbach.

Additional information

Michael Steinbach earned the B.S. degree in mathematics, the M.S. degree in statistics, and the M.S. and Ph.D. degrees in computer science, all from the University of Minnesota. He has also held a variety of software engineering, analysis, and design positions in industry at Silicon Biology, Racotek, and NCR. Steinbach is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. He is a co-author of the textbook Introduction to Data Mining and has published numerous technical papers in peer-reviewed journals and conference proceedings. His research interests include data mining, statistics, and bioinformatics. He is a member of the IEEE and the ACM.

Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. He received the B.E. degree in electronics and communication engineering from the University of Roorkee, India, in 1977, the M.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1979, and the Ph.D. degree in computer science from the University of Maryland, College Park, in 1982. Kumar’s current research interests include high-performance computing and data mining. His research has resulted in the development of the concept of the isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES), graph partitioning (METIS, ParMetis, hMetis), and dense hierarchical solvers. He has authored over 200 research articles, and has coedited or coauthored 9 books, including the widely used textbooks Introduction to Parallel Computing and Introduction to Data Mining, both published by Addison Wesley. Kumar has served as chair/co-chair for many conferences/workshops in the area of data mining and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). Currently, Kumar is the Chair of the steering committee of the SIAM International Conference on Data Mining, and a member of the steering committee of the IEEE International Conference on Data Mining. Kumar serves or has served on the editorial boards of Data Mining and Knowledge Discovery, Knowledge and Information Systems, IEEE Computational Intelligence Bulletin, Annual Review of Intelligent Informatics, Parallel Computing, Journal of Parallel and Distributed Computing, IEEE Transactions on Knowledge and Data Engineering (1993–1997), IEEE Concurrency (1997–2000), and IEEE Parallel and Distributed Technology (1995–1997). He is a Fellow of the ACM and IEEE and a member of SIAM.

About this article

Cite this article

Steinbach, M., Kumar, V. Generalizing the notion of confidence. Knowl Inf Syst 12, 279–299 (2007). https://doi.org/10.1007/s10115-006-0041-7

  • DOI: https://doi.org/10.1007/s10115-006-0041-7
