Designing XML Schema Inference Algorithm for Intra-enterprise Use

  • Conference paper
Perspectives in Business Informatics Research (BIR 2022)

Abstract

The paper considers methods for automated XML schema inference from a collection of documents, viewed from the standpoint of integration and use within an enterprise. Most existing algorithms assume certain XML data, that is, a collection claimed to be free of errors and inaccuracies. The paper analyzes the theoretical foundations of inferring an XML schema for a decision maker who works with heterogeneous data created automatically or manually. An algorithm based on a probabilistic approach is intended to work on any data and gives the decision maker alternatives with a certain confidence level when working with the inferred XML schema. As a result of our findings, we introduce the xml.schema.inference application for inferring XML schemas for intra-enterprise use.



Author information

Corresponding author

Correspondence to Dmitry Uraev.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Uraev, D., Babkin, E. (2022). Designing XML Schema Inference Algorithm for Intra-enterprise Use. In: Nazaruka, Ē., Sandkuhl, K., Seigerroth, U. (eds) Perspectives in Business Informatics Research. BIR 2022. Lecture Notes in Business Information Processing, vol 462. Springer, Cham. https://doi.org/10.1007/978-3-031-16947-2_3

  • DOI: https://doi.org/10.1007/978-3-031-16947-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16946-5

  • Online ISBN: 978-3-031-16947-2

  • eBook Packages: Computer Science, Computer Science (R0)
