Abstract
When publishing papers, researchers in mathematics and related disciplines typically focus on the presentation, i.e., the typesetting, of their ideas and provide little semantic information. This impedes the development of services that benefit from semantic information, such as semantic search and screen readers for vision-impaired researchers. As a remedy, there have been attempts to infer semantic data from already published papers using small programs that we call spotters. Unfortunately, there is no standardized format for semantic annotations, and spotter authors typically invent their own. This leads to two problems: i) there is no ecosystem of tools for common tasks such as visualizing results or manually annotating a gold standard, and ii) reusing, evaluating, and combining results becomes very difficult.
In this paper, we address these issues by describing a standardized, flexible way to represent semantic annotations, using semantic web technologies and, in particular, the Web Annotation standard. Furthermore, we describe SpotterBase, a set of tools to help with processing the annotations and creating new ones.
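To give a rough idea of what an annotation in the Web Annotation data model can look like, the following minimal sketch builds one as JSON-LD using only the Python standard library. It is an illustration under stated assumptions, not SpotterBase's actual output format: the function name `make_annotation`, the example URLs, the tag value, and the choice of a `TextQuoteSelector` are all hypothetical. Only the `@context` URL and the property names (`type`, `motivation`, `body`, `target`, `source`, `selector`, `exact`) are taken from the W3C Web Annotation vocabulary.

```python
import json


def make_annotation(anno_id, document_url, exact_text, tag, creator):
    """Build a minimal Web Annotation (JSON-LD) as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to SpotterBase (hypothetical helper for illustration).
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "tagging",
        "creator": creator,
        # The body carries the semantic information assigned by a spotter.
        "body": {"type": "TextualBody", "value": tag, "purpose": "tagging"},
        # The target identifies the annotated fragment of the document.
        "target": {
            "source": document_url,
            "selector": {"type": "TextQuoteSelector", "exact": exact_text},
        },
    }


# Hypothetical example: a spotter tags the string "5 km" as a quantity expression.
annotation = make_annotation(
    anno_id="https://example.org/annotations/1",
    document_url="https://example.org/papers/example.html",
    exact_text="5 km",
    tag="quantity-expression",
    creator="https://example.org/spotters/unit-spotter",
)

serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Because the result is ordinary JSON-LD, annotations produced by different spotters in this shape could in principle be loaded into the same RDF store and queried together, which is the kind of reuse the abstract argues for.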
Notes
- 1. Open source; code at https://github.com/jfschaefer/spotterbase.
- 2.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schaefer, J.F., Kohlhase, M. (2023). Towards an Annotation Standard for STEM Documents. In: Dubois, C., Kerber, M. (eds) Intelligent Computer Mathematics. CICM 2023. Lecture Notes in Computer Science, vol 14101. Springer, Cham. https://doi.org/10.1007/978-3-031-42753-4_13
DOI: https://doi.org/10.1007/978-3-031-42753-4_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42752-7
Online ISBN: 978-3-031-42753-4
eBook Packages: Computer Science, Computer Science (R0)