Detecting Aggregate Incongruities in XML

Hsu, Wynne; Lau, Qiangfeng Peter; Lee, Mong Li

doi:10.1007/978-3-642-00887-0_54

Wynne Hsu, Qiangfeng Peter Lau & Mong Li Lee

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5463)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Identifying deviating patterns in XML repositories has important applications in data cleaning, fraud detection, and stock market analysis. Current methods detect data discrepancies by assessing whether the data conforms to the expected distribution of its immediate neighborhood. This approach can miss interesting deviations that involve aggregated information. For example, the average number of transactions of a particular bank account may be exceptionally high compared with other accounts that have similar profiles. Such an incongruity can only be revealed by aggregating the appropriate data and analyzing the aggregated results within the associated neighborhood, which is implicitly encapsulated in the XML structure. In addition, the hierarchical nature of XML reflects the different levels of abstraction in the real world. This work presents a framework that detects incongruities in aggregate information. It exploits the inherent characteristics of the XML structure to systematically aggregate leaf-level data and propagate the aggregated information up the hierarchy. The aggregated information is then analyzed with a novel method that first clusters similar data, then assumes a statistical distribution and identifies aggregate incongruities within the clusters. Experimental results indicate that the proposed approach is effective in detecting interesting discrepancies in a real-world bank data set.
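The pipeline described in the abstract (aggregate leaf data, propagate it up the XML hierarchy, group similar nodes, then test each node against a distribution of its peers) can be illustrated with a minimal sketch. This is not the authors' algorithm. The element names (`bank`, `branch`, `account`, `txn`) and the `id`/`type` attributes are hypothetical. Grouping accounts by their `type` attribute is a simplified stand-in for a real clustering step. The "assumed distribution" is a normal fit to each account's peers, excluding the account itself (a leave-one-out z-score with an assumed threshold of 3).

```python
import math
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical layout: branch -> (account id, profile type, number of txn leaves).
LAYOUT = {
    "North": [("A1", "savings", 3), ("A2", "savings", 4), ("A3", "current", 20)],
    "South": [("A4", "savings", 3), ("A5", "savings", 25), ("A6", "current", 22)],
    "East": [("A7", "current", 19), ("A8", "savings", 4), ("A9", "current", 21)],
}


def build_sample(layout):
    """Build a small hypothetical bank document: bank/branch/account/txn."""
    bank = ET.Element("bank")
    for branch_name, accounts in layout.items():
        branch = ET.SubElement(bank, "branch", name=branch_name)
        for acc_id, acc_type, n_txn in accounts:
            acc = ET.SubElement(branch, "account", id=acc_id, type=acc_type)
            for _ in range(n_txn):
                ET.SubElement(acc, "txn")
    return bank


def propagate_counts(elem, counts, leaf_tag="txn"):
    """Bottom-up pass: count leaf_tag elements in every subtree."""
    own = 1 if elem.tag == leaf_tag else 0
    total = own + sum(propagate_counts(c, counts, leaf_tag) for c in elem)
    counts[elem] = total  # aggregate stored at every level of the hierarchy
    return total


def aggregate_incongruities(root, tag="account", group_attr="type", threshold=3.0):
    """Flag <tag> nodes whose aggregate deviates strongly from their group."""
    counts = {}
    propagate_counts(root, counts)
    # Simplified stand-in for clustering: group nodes by a profile attribute.
    groups = defaultdict(list)
    for node in root.iter(tag):
        groups[node.get(group_attr)].append((node.get("id"), counts[node]))
    flagged = []
    for members in groups.values():
        if len(members) < 4:
            continue  # too few peers to fit a distribution
        for node_id, value in members:
            # Leave-one-out: fit a normal distribution to the peers only.
            others = [v for other_id, v in members if other_id != node_id]
            mu = statistics.mean(others)
            sd = statistics.pstdev(others)
            if sd == 0:
                z = 0.0 if value == mu else math.inf
            else:
                z = (value - mu) / sd
            if abs(z) > threshold:
                flagged.append((node_id, value, round(z, 1)))
    return counts, flagged


root = build_sample(LAYOUT)
counts, flagged = aggregate_incongruities(root)
for branch in root.iter("branch"):
    print(branch.get("name"), counts[branch])  # branch-level aggregates
print(flagged)  # [('A5', 25, 43.0)]
```

Account A5 has 25 transactions, while its savings peers have only 3 or 4, so it is flagged. Current accounts with 19 to 22 transactions are not flagged because they match their own group. The leave-one-out fit is used here because, with only a handful of peers, including the suspect value would inflate the standard deviation and mask the deviation.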

Author information

Authors and Affiliations

Department of Computer Science, National University of Singapore, Singapore
Wynne Hsu, Qiangfeng Peter Lau & Mong Li Lee

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou & Ke Deng
Tokyo Institute of Technology, Graduate School of Information Science and Engineering, 2-12-1 Oh-Okayama Meguro-ku, 152-8552, Tokyo, Japan
Haruo Yokota
CSIRO, Castray Esplanade, TAS 7000, Hobart, Australia
Qing Liu

About this paper

Cite this paper

Hsu, W., Lau, Q.P., Lee, M.L. (2009). Detecting Aggregate Incongruities in XML. In: Zhou, X., Yokota, H., Deng, K., Liu, Q. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00887-0_54

DOI: https://doi.org/10.1007/978-3-642-00887-0_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00886-3
Online ISBN: 978-3-642-00887-0
eBook Packages: Computer Science, Computer Science (R0)
