Abstract
The paper considers methods for automated XML schema inference for a collection of documents from the standpoint of integration and use within an enterprise. Most existing algorithms work with certain XML data, i.e., they assume that the documents in the collection contain no errors or inaccuracies. The paper analyzes the theoretical foundations of XML schema inference for a decision maker who works with heterogeneous data created automatically or manually. An algorithm based on a probabilistic approach is intended to work on any data and gives the decision maker alternatives, each with a certain confidence level, when working with the inferred XML schema. Based on our findings, we introduce the xml.schema.inference application for inferring XML schemas for intra-enterprise use.
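The abstract does not spell out the algorithm itself. To illustrate the general idea of attaching confidence levels to schema alternatives, here is a minimal Python sketch built on our own simplifying assumption, not the authors' xml.schema.inference algorithm: relative frequencies over the collection serve as naive confidence scores for whether a child element is required or optional, whether it repeats, and whether a leaf value is an integer. The function `infer_alternatives`, the sample documents, and the output layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def _is_int(text):
    """True if the text parses as a Python int (a rough stand-in for xs:integer)."""
    try:
        int(text.strip())
        return True
    except ValueError:
        return False


def infer_alternatives(documents):
    """Hypothetical sketch: gather per-element statistics over a collection of
    XML strings and return schema alternatives with frequency-based confidences."""
    occurrences = Counter()                # element name -> number of occurrences
    child_presence = defaultdict(Counter)  # parent -> child -> #parents containing child
    child_repeated = defaultdict(Counter)  # parent -> child -> #parents with child > 1 time
    leaf_values = defaultdict(list)        # element name -> texts of its leaf occurrences

    for doc in documents:
        for elem in ET.fromstring(doc).iter():
            occurrences[elem.tag] += 1
            counts = Counter(child.tag for child in elem)
            for tag, n in counts.items():
                child_presence[elem.tag][tag] += 1
                if n > 1:
                    child_repeated[elem.tag][tag] += 1
            if not counts:  # no child elements: treat as a leaf with text content
                leaf_values[elem.tag].append(elem.text or "")

    schema = {}
    for tag, total in occurrences.items():
        children = {}
        for child, present in child_presence[tag].items():
            p = present / total  # share of parent occurrences that contain the child
            children[child] = {
                # Both readings are kept; the decision maker picks one.
                "presence": {"required": p, "optional": 1 - p},
                "maxOccurs": "unbounded" if child_repeated[tag][child] else "1",
            }
        entry = {"children": children}
        values = leaf_values.get(tag)
        if values:
            int_share = sum(_is_int(v) for v in values) / len(values)
            # Any text is a valid string, so xs:string always has confidence 1.0.
            entry["type"] = {"xs:integer": int_share, "xs:string": 1.0}
        schema[tag] = entry
    return schema


# Hypothetical collection with one inaccurate value ("n/a") in <age>.
docs = [
    "<person><name>Ann</name><age>31</age></person>",
    "<person><name>Bob</name><age>42</age><phone>1</phone><phone>2</phone></person>",
    "<person><name>Eve</name><age>n/a</age></person>",
    "<person><name>Joe</name><age>27</age></person>",
]
schema = infer_alternatives(docs)
print(schema["age"]["type"])                   # {'xs:integer': 0.75, 'xs:string': 1.0}
print(schema["person"]["children"]["phone"])   # {'presence': {'required': 0.25, 'optional': 0.75}, 'maxOccurs': 'unbounded'}
```

Because one document contains `n/a` in `age`, a strict inferer would fall back to `xs:string`; in this sketch `xs:integer` survives as an alternative with confidence 0.75, leaving the final choice to the decision maker.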
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Uraev, D., Babkin, E. (2022). Designing XML Schema Inference Algorithm for Intra-enterprise Use. In: Nazaruka, Ē., Sandkuhl, K., Seigerroth, U. (eds) Perspectives in Business Informatics Research. BIR 2022. Lecture Notes in Business Information Processing, vol 462. Springer, Cham. https://doi.org/10.1007/978-3-031-16947-2_3
DOI: https://doi.org/10.1007/978-3-031-16947-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16946-5
Online ISBN: 978-3-031-16947-2
eBook Packages: Computer Science, Computer Science (R0)