Abstract
Semantic models of data sources and services support the automation of many tasks, such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is tedious and time-consuming. Most related work focuses on automatically annotating source attributes, or input and output parameters, with classes or properties. However, constructing a source model that includes the relationships between the attributes in addition to their semantic types remains a largely unsolved problem. In this paper, we present a graph-based approach to hypothesize a rich semantic description of a new target source from a set of known sources that have been modeled over the same domain ontology. We exploit the domain ontology and the known source models to build a graph that represents the space of plausible source descriptions. Then, we compute the top-k candidates and suggest to the user a ranked list of semantic models for the new source. The approach takes user corrections into account to learn more accurate semantic descriptions of future data sources. Our evaluation shows that our method produces models that are twice as accurate as the models produced by a state-of-the-art system that does not learn from prior models.
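The idea of building a graph from the ontology and the known source models, then ranking candidate models for a new source, can be illustrated with a small sketch. This is not the algorithm of the paper: the toy ontology, the known models, the weights, and the function names `build_graph`, `shortest_paths`, and `candidate_models` are all hypothetical, and the ranking uses a simple shortest-path heuristic, assuming that relationships already seen in known models should be cheaper than relationships that appear only in the ontology.

```python
import heapq

# Hypothetical toy ontology: (domain class, property, range class) triples.
ONTOLOGY_EDGES = [
    ("Person", "livesIn", "City"),
    ("Person", "worksFor", "Organization"),
    ("Organization", "locatedIn", "City"),
]

# Hypothetical known source models: relationships confirmed for earlier sources.
KNOWN_MODELS = [
    [("Person", "worksFor", "Organization"), ("Organization", "locatedIn", "City")],
    [("Person", "worksFor", "Organization")],
]


def build_graph(ontology_edges, known_models, seen_weight=1.0, unseen_weight=10.0):
    """Return (adjacency, weights); links seen in known models get the lower weight."""
    seen = {triple for model in known_models for triple in model}
    graph, weights = {}, {}
    for triple in ontology_edges:
        source, _prop, target = triple
        weight = seen_weight if triple in seen else unseen_weight
        weights[triple] = weight
        # Traverse links in both directions when connecting classes,
        # but keep the original directed triple as the edge label.
        graph.setdefault(source, []).append((target, triple, weight))
        graph.setdefault(target, []).append((source, triple, weight))
    return graph, weights


def shortest_paths(graph, root):
    """Dijkstra from root; returns distances and predecessor links."""
    dist, prev = {root: 0.0}, {}
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, triple, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = (node, triple)
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def candidate_models(graph, weights, semantic_types, k=3):
    """Rank trees that connect the target's semantic types, cheapest first."""
    candidates = {}
    for root in semantic_types:
        dist, prev = shortest_paths(graph, root)
        if any(t not in dist for t in semantic_types):
            continue  # some semantic type is unreachable from this root
        edges = set()
        for node in semantic_types:
            while node != root:
                node, triple = prev[node]
                edges.add(triple)
        candidates[frozenset(edges)] = sum(weights[e] for e in edges)
    ranked = sorted(candidates.items(), key=lambda item: (item[1], sorted(item[0])))
    return [(sorted(edges), cost) for edges, cost in ranked[:k]]


graph, weights = build_graph(ONTOLOGY_EDGES, KNOWN_MODELS)
for edges, cost in candidate_models(graph, weights, ["Person", "City"]):
    print(cost, edges)
```

In this toy setting, a new source whose attributes map to `Person` and `City` is connected through `Organization`, because both of those links appear in known models, rather than through the direct `livesIn` link that no known model uses. A user correction could be folded in by appending the corrected model to `KNOWN_MODELS` and rebuilding the graph.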
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Taheriyan, M., Knoblock, C.A., Szekely, P., Ambite, J.L. (2013). A Graph-Based Approach to Learn Semantic Descriptions of Data Sources. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8218. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41335-3_38
DOI: https://doi.org/10.1007/978-3-642-41335-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41334-6
Online ISBN: 978-3-642-41335-3
eBook Packages: Computer Science, Computer Science (R0)