StatSpace: A Unified Platform for Statistical Data Exploration

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2016 Conferences (OTM 2016)

Abstract

In recent years, the amount of statistical data available on the web has been growing fast. Numerous organizations and governments publish data sets in a multitude of formats and encodings, using different scales, and providing access through a wide range of mechanisms. Due to such inconsistent publishing practices, integrated analysis of statistical data is challenging. StatSpace tackles this problem through semantic integration and provides uniform access to disparate statistical data. At present, it incorporates more than 1,800 data sets published by a variety of data providers including the World Bank, the European Union, and the European Environment Agency. StatSpace transparently lifts data from raw sources, maps geographical and temporal dimensions, aligns value ranges, and allows users to explore and integrate the previously isolated data sets. This paper introduces the constituent elements of the StatSpace architecture – i.e., a metadata repository, URI design patterns, and supporting services – and demonstrates the usefulness of the resulting Linked Data infrastructure by means of use case examples.

Notes

  1. e.g., http://data.worldbank.org/indicator, accessed August 30, 2016.

  2. e.g., https://www.ons.gov.uk, accessed August 30, 2016.

  3. http://data.europa.eu/euodp/en/linked-data, accessed August 30, 2016.

  4. http://semantic.eea.europa.eu/sparql, accessed August 30, 2016.

  5. http://data.cso.ie/query.html, accessed August 30, 2016.

  6. Available at http://statspace.linkedwidgets.org.

  7. https://sdmx.org/, accessed August 30, 2016.

  8. statspace.linkedwidgets.org/metadata/ONS-Population-1851-2014, accessed August 30, 2016.

  9. All prefixes used in this paper can be looked up at http://prefix.cc.

  10. http://statspace.linkedwidgets.org/dimension/expenditure, accessed August 30, 2016.

  11. http://statspace.linkedwidgets.org/dimension/economicActivity, accessed August 30, 2016.

  12. http://www.epimorphics.com/web/wiki/using-interval-set-uris-statistical-data, accessed August 30, 2016.

  13. https://sdmx.org/?page_id=3215, accessed August 30, 2016.

  14. The expenditure dimension consists of four code lists, i.e., classification of individual consumption by purpose (COICOP), classification of functions of government (COFOG), classification of purposes of non-profit institutions serving households (COPNI), and classification of outlays of producers by purpose (COPP).

  15. For instance, the top concept in the age code list is total (i.e., http://statspace.linkedwidgets.org/codelist/cl_age/Total), which is split into various age groups such as 0–4, 5–9, ..., 105–109 (type: age-group), and special values, e.g., 70+, 75+, 80+ (type: age-plus). Each age group is split into individual ages (type: age-individual), and each special value is split into age groups.

  16. http://databank.worldbank.org/data/download/site-content/WDI_CETS.xls, accessed August 30, 2016.

  17. https://www.w3.org/TR/r2rml, accessed August 30, 2016.

  18. http://statspace.linkedwidgets.org/mapping/wb.ttl, accessed August 30, 2016.

  19. https://developers.google.com/chart/, accessed August 30, 2016.

  20. http://c3js.org/, accessed August 30, 2016.

  21. http://wifo5-03.informatik.uni-mannheim.de/pubby/, accessed August 30, 2016.

  22. http://statspace.linkedwidgets.org/compareDataSet?&id1=http://statspace.linkedwidgets.org/metadata/WorldBank-SL.UEM.TOTL.ZS&id2=http://statspace.linkedwidgets.org/metadata/WorldBank-NY.GDP.DEFL.KD.ZG, accessed August 30, 2016.

  23. http://statspace.linkedwidgets.org/mapping/wb.ttl, accessed August 30, 2016.

  24. http://statspace.linkedwidgets.org/performance/, accessed August 30, 2016.
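
The comparison link in note 22 is formed from two data set metadata URIs of the kind shown in note 8. The following is a minimal sketch of how such a link could be assembled. It assumes that the `id1` and `id2` query parameters are the only ones required and that the service accepts the query string without the stray leading `&` seen in the note. The helper names `metadata_uri` and `compare_url` are hypothetical and are not part of StatSpace.

```python
from urllib.parse import urlencode

# Base address taken from the notes; everything below is an illustrative sketch.
BASE = "http://statspace.linkedwidgets.org"


def metadata_uri(dataset_id: str) -> str:
    """Build a metadata URI following the pattern visible in notes 8 and 22."""
    return f"{BASE}/metadata/{dataset_id}"


def compare_url(dataset_id_1: str, dataset_id_2: str) -> str:
    """Assemble a compareDataSet link for two data sets (hypothetical helper).

    The ':' and '/' characters are left unescaped so that the result
    resembles the link printed in note 22.
    """
    query = urlencode(
        {"id1": metadata_uri(dataset_id_1), "id2": metadata_uri(dataset_id_2)},
        safe=":/",
    )
    return f"{BASE}/compareDataSet?{query}"


# The two World Bank indicator identifiers that appear in note 22.
url = compare_url("WorldBank-SL.UEM.TOTL.ZS", "WorldBank-NY.GDP.DEFL.KD.ZG")
print(url)
```

Keeping the metadata URI pattern in one helper means that the same identifier can be reused wherever a data set has to be referenced, which mirrors the role the abstract assigns to URI design patterns.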

Author information

Correspondence to Ba-Lam Do.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Do, BL., Wetz, P., Kiesling, E., Aryan, P.R., Trinh, TD., Tjoa, A.M. (2016). StatSpace: A Unified Platform for Statistical Data Exploration. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_50

  • DOI: https://doi.org/10.1007/978-3-319-48472-3_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48471-6

  • Online ISBN: 978-3-319-48472-3

  • eBook Packages: Computer Science, Computer Science (R0)
