Abstract
In a data warehouse (DW) environment, when the operational environment does not possess, or does not want to provide, information about the changes that occurred in its data, controls have to be implemented to detect these changes and reflect them in the DW environment. The main scenarios are: i) the impossibility of instrumenting the DBMS (triggers, transaction log, stored procedures, replication, materialized views, old and new versions of data, etc.) due to security policies, data ownership or performance issues; ii) the lack of instrumentation resources in the DBMS; iii) the use of legacy technologies such as file systems or semi-structured data; iv) application proprietary databases and ERP systems. In another article [1], we presented the development and implementation of a technique derived from the comparison of database snapshots, in which signatures are used to mark and detect changes. The technique is simple and can be applied to all four scenarios above. To demonstrate the efficiency of our technique, in this article we run comparative performance tests between it and the main existing approaches. We performed two benchmarks: the first using synthetic data and the second using real data from a case study in the data warehouse project developed for Rio Sul Airlines, a regional aviation company belonging to the Brazil-based Varig group. We also describe the main approaches to detecting changes in data origin.
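The general idea of signature-based snapshot comparison can be illustrated with a minimal sketch. This is not the implementation from [1]; it only assumes that each source row has a unique key, that a signature is a hash of the row's non-key columns, and that the signatures from the previous load are kept on the DW side. The function names, the choice of SHA-256, and the sample rows are all hypothetical.

```python
import hashlib


def row_signature(row, columns):
    """Hash the chosen column values of a row into a fixed-size signature."""
    h = hashlib.sha256()  # assumption: any stable hash function would do
    for col in columns:
        value = repr(row[col]).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") differ.
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.hexdigest()


def detect_changes(stored_signatures, snapshot, key, columns):
    """Compare a new snapshot with the signatures kept from the last load.

    stored_signatures: dict mapping key value -> signature from the last run
    snapshot: iterable of dict rows extracted from the source
    Returns (inserted_keys, updated_keys, deleted_keys, new_signatures).
    """
    inserted, updated = [], []
    new_signatures = {}
    for row in snapshot:
        k = row[key]
        sig = row_signature(row, columns)
        new_signatures[k] = sig
        if k not in stored_signatures:
            inserted.append(k)      # key not seen in the previous load
        elif stored_signatures[k] != sig:
            updated.append(k)       # same key, different content
    # Keys present last time but missing from the new snapshot.
    deleted = [k for k in stored_signatures if k not in new_signatures]
    return inserted, updated, deleted, new_signatures


# Made-up sample rows, for illustration only.
first_load = [
    {"id": 1, "name": "alpha", "qty": 10},
    {"id": 2, "name": "beta", "qty": 20},
]
second_load = [
    {"id": 1, "name": "alpha", "qty": 15},   # qty changed
    {"id": 3, "name": "gamma", "qty": 30},   # new row; id 2 is gone
]

_, _, _, baseline = detect_changes({}, first_load, "id", ["name", "qty"])
inserted, updated, deleted, baseline = detect_changes(
    baseline, second_load, "id", ["name", "qty"]
)
print(inserted, updated, deleted)
```

Only the key and the signature need to be stored between loads, which is why an approach of this kind does not depend on triggers, logs, or other DBMS instrumentation in the source.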
References
Rocha, R.L.A., Cardoso, L.F., Souza, J.M.: An Improved Approach in Data Warehousing ETLM Process for Detection of Changes in Data Origin. COPPE/UFRJ, Report No. ES-593/03 (2003), http://www.cos.ufrj.br/publicacoes/reltec/es59303.pdf
Do, L., Drew, P., **, W., et al.: Issues in Developing Very Large Databases. In: Proceedings of the 24th VLDB Conference, New York, USA, pp. 633–636 (August 1998)
Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 1st edn. Prentice Hall Inc., New Jersey (1991)
Zhuge, Y., Garcia-Molina, H., Hammer, J., et al.: View Maintenance in a Warehousing Environment. In: Proceedings of ACM SIGMOD International Conference on Management Data, San Jose, California, USA, pp. 316–327 (June 1995)
Zhuge, Y., Garcia-Molina, H., Wiener, J.L.: The Strobe Algorithms for Multi-Source Warehouse Consistency. In: Proceedings on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 146–157 (December 1996)
Quass, D., Widom, J.: On-Line Warehouse View Maintenance. In: Proceedings of ACM SIGMOD International Conference on Management Data, Tucson, Arizona, USA, pp. 405–416 (May 1997)
Hull, R., Zhou, G.: Towards the Study of Performance Trade-offs Between Materialized and Virtual Integrated Views. In: Proc. Workshop on Materialized Views: Techniques and Applications (VIEWS 1996), Canada, pp. 91–102 (June 1996)
Quass, D., Gupta, A., Mumick, I.S., et al.: Making Views Self-Maintainable for Data Warehousing. In: Proceedings on Parallel and Distributed Information Systems, Miami Beach, Florida, USA, pp. 158–169 (December 1996)
Inmon, W.H., Kelley, C.: Rdb/VMS, developing the data warehouse. QED Pub. Group, Boston (1993)
Labio, W.J., Yerneni, R., Garcia-Molina, H.: Shrinking the Warehouse Update Window. In: Proceedings of ACM SIGMOD International Conference on Management Data, Philadelphia, USA, pp. 383–394 (June 1999)
Widom, J., Ceri, S.: Active Database Systems: Triggers and Rules for Advanced Database Processing, San Francisco, California, USA (1996)
Craig, R.S., Vivona, J.A., Berkovitch, D.: Microsoft data warehousing building distributed decision support systems. Wiley, New York (1999)
Widom, J.: Research Problems in Data Warehousing. In: Proceedings of ACM CIKM International Conference on Management Data, USA, pp. 25–30 (November 1995)
Hammer, J., Garcia-Molina, H., Widom, J., et al.: The Stanford Data Warehousing Project. IEEE Quarterly Bulletin on Data Engineering; Special Issue on Materialized Views and Data Warehousing 18(2), 41–48 (1995)
Chawathe, S.S., Garcia-Molina, H.: Meaningful Change Detection in Structured Data. In: Proceedings of ACM SIGMOD International Conference on Management Data, Arizona, USA, pp. 26–37 (May 1997)
Kimball, R.: Data Warehouse Toolkit. John Wiley & Sons, Inc., New York (1996)
Kimball, R.: The Data Warehouse Lifecycle Toolkit. In: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons, Inc., New York (1998)
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rocha, R.L.A., Cardoso, L.F., de Souza, J.M. (2003). Performance Tests in Data Warehousing ETLM Process for Detection of Changes in Data Origin. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2003. Lecture Notes in Computer Science, vol 2737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45228-7_14
DOI: https://doi.org/10.1007/978-3-540-45228-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40807-9
Online ISBN: 978-3-540-45228-7
eBook Packages: Springer Book Archive