Abstract
Analyzing data from multiple sources is a common task in scientific research. In particular, spreadsheet data is often aggregated from a variety of sources to identify patterns and synthesize reports. Yet, techniques are lacking for automatically capturing the provenance of such data within spreadsheet environments like Excel. We present a novel approach for fine-grained tracing of tabular data that may have been obtained from files, databases, or the Web. Our approach provides relevant provenance information at both the micro-level (per cell) and the macro-level (per sheet). Initial results suggest that our approach is scalable and beneficial to data analysts.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asuncion, H.U. (2012). SourceTrac: Tracing Data Sources within Spreadsheets. In: Groth, P., Frew, J. (eds) Provenance and Annotation of Data and Processes. IPAW 2012. Lecture Notes in Computer Science, vol 7525. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34222-6_1
DOI: https://doi.org/10.1007/978-3-642-34222-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34221-9
Online ISBN: 978-3-642-34222-6
eBook Packages: Computer Science (R0)