Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus

  • Project Notes
Language Resources and Evaluation

Abstract

This paper stems from the project A World of Possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language (WoPoss) which involves a corpus-based approach to the study of modality in the history of the Latin language. Linguistic annotation and, in particular, the semantic annotation of modality is a keystone of the project. Besides the difficulties intrinsic to any annotation task dealing with semantics, our annotation scheme involves multiple layers of annotation that are interconnected, adding complexity to the task. Considering the intricacies of our fine-grained semantic annotation, we needed to develop well-documented schemas in order to control the consistency of the annotation, but also to enable an efficient reuse of our annotated corpus. This paper presents the different elements involved in the annotation task, and how the description and the relations between the different linguistic components were formalised and documented, combining schema languages with XML documentation.

Notes

  1. It is worth specifying that the WoPoss team has contributed to this project by annotating some words and then discussing the annotation of modal markers. The annotation carried out in the framework of McGillivray et al. (2022) does not follow the WoPoss annotation schema.

  2. Cf. https://itreebank.marginalia.it/view/download.php.

  3. See https://www.ucm.es/euroevidmod/ (accessed 20 June 2023).

  4. See https://clmbr.shane.st/modal-typology/ (accessed 20 June 2023).

  5. The relationship between modality and evidentiality is debated. A proper discussion of this point is beyond the scope of this paper.

  6. See https://woposs.unine.ch/search (accessed 20 May 2023).

  7. The scores were calculated using the NLTK Python library. For further details about the formula implemented see https://www.nltk.org/_modules/nltk/metrics/agreement.html.

  8. Translation by T.R. Glover and Gerald H. Rendall (Tertullian & Minucius Felix, 1931, p. 321).

  9. REgular LAnguage for XML Next Generation (RELAX NG) is a schema language for XML (Murata, 2014).

  10. https://github.com/WoPoss-project/WoPoss-corpus.

  11. The element <w> is not exactly equivalent to the notion of token: for example in the case of in praesentiarum mentioned in Sect. 4.2 in relation to example 1, the adverb impraesentiarum is taken as unit of reference, thus we have one <w> element (with the corresponding @pos and @lemma attributes) with the content ‘in praesentiarum’.

  12. For examples of linguistic annotation implemented through FS see Bański and Przepiórkowski (2009); for applications in other domains, see Bermúdez Sabel (2020) for a philological implementation, or Triplette et al. (2018) for a literary one. For a discussion of the advantages of FS see Langendoen and Simons (1995), and Stegmann and Witt (2009).

  13. Omnem adeo mundum, si solem lunam reliqua astra desierit fontium dulcis aqua et aqua marina nutrire, in vim ignis abiturum, Stoicis constans opinio est, quod consumto umore mundus hic omnis ignescet. ‘So too the universe, if sun, moon and stars are deprived of the fountains of fresh water and the water of the seas, will disappear in a blaze of fire. The Stoics firmly maintain that when the moisture is dried out, the universe must all take fire’, translation by T.R. Glover and Gerald H. Rendall (Tertullian & Minucius Felix, 1931, p. 418).

  14. The negation of the modal scope is annotated as an independent component only when it is not inside the scope itself, that is, when there is a negation which is semantically affecting the state of affairs, but it is not syntactically in its scope.

  15. These results are based on the analysis of WoPoss Project (2022b), which, at the time of writing, included the following works: the epigraphic document known as the Senatus consultum (2nd c. BCE), the first book of the Metamorphoses by Ovid (1st c. BCE), Satyricon by Petronius (1st c.), De spectaculis by Tertullian (2nd c.), and Octavius by Minucius Felix (2nd c.).

  16. For the different functions see Schrickx (2011, p. 215); for an overview of the meanings of certe see Marongiu and Dell’Oro (2021).

  17. See the standard 24610-1 Language Resource Management—Feature Structures—Part One: Feature Structure Representation (ISO/TC 37/SC 4, 2007).

  18. Mixed content refers to the presence of both elements and text nodes as children of a given element.

  19. Schematron is a language for making assertions about the presence or absence of patterns in linked XML documents (Jelliffe, 2021).
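
Notes 11 and 18 can be illustrated with a minimal Python sketch. It is not taken from the WoPoss schemas or corpus: only the names <w>, @pos and @lemma and the string 'in praesentiarum' come from note 11, while the <s> wrapper, the "ADV" tag value and the helper functions are assumptions made for illustration. The mixed-content check also simplifies note 18 by ignoring whitespace-only text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <w>, @pos and @lemma follow note 11; the <s> wrapper
# and the "ADV" value are assumptions, not the project's actual encoding.
SAMPLE = '<s><w pos="ADV" lemma="impraesentiarum">in praesentiarum</w></s>'


def word_units(root):
    """Return (surface form, lemma, pos) for every <w> element under root."""
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


def has_mixed_content(element):
    """True if element has child elements and non-whitespace text children.

    In ElementTree, text children of an element are stored as element.text
    (before the first child) and as the .tail of each child element.
    """
    if len(element) == 0:
        return False
    texts = [element.text] + [child.tail for child in element]
    return any(t and t.strip() for t in texts)


root = ET.fromstring(SAMPLE)
units = word_units(root)

# Whitespace tokenisation of the surface forms, for comparison with <w> units.
tokens = [tok for form, _, _ in units for tok in form.split()]

print(len(units), "unit(s) of reference vs", len(tokens), "whitespace tokens")
print("mixed content in <s>:", has_mixed_content(root))
```

In this sketch the single <w> unit spans two whitespace-separated tokens, which is the distinction note 11 draws between <w> and the notion of token.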

References

  • Ávila, L. B., Mendes, A., & Hendrickx, I. (2015). Towards a unified approach to modality annotation in Portuguese. In Proceedings of the workshop on models for modality annotation. Retrieved April 7, 2022, from Association for Computational Linguistics. https://aclanthology.org/W15-0301

  • Baker, K., Bloodgood, M., Dorr, B. J., Filardo, N. W., Levin, L., & Piatko, C. (2014). A modality lexicon and its use in automatic tagging (arXiv:1410.4868). arXiv. https://doi.org/10.48550/arXiv.1410.4868

  • Bański, P. (2010). Why TEI stand-off annotation doesn’t quite work: And why you might want to use it nevertheless. In Proceedings of Balisage: The markup conference 2010 (Vol. 5). Presented at the Balisage: The markup conference 2010, Montréal, Canada. https://doi.org/10.4242/BalisageVol5.Banski01

  • Bański, P., & Przepiórkowski, A. (2009). Stand-off TEI annotation: The case of the National Corpus of Polish. In ACL-IJCNLP ‘09: Proceedings of the third linguistic annotation workshop (pp. 64–67). Presented at the third linguistic annotation workshop, Suntec, Singapore: Association for Computational Linguistics. https://doi.org/10.3115/1698381.1698392

  • Bermúdez Sabel, H. (2020). Encoding of variant taxonomies in TEI. Journal of the Text Encoding Initiative. https://doi.org/10.4000/jtei.2676

  • Bermúdez Sabel, H. (2022). FS-validator. XSLT. Retrieved April 13, 2022, from https://github.com/HelenaSabel/FS-Validator

  • Burnard, L. (2014). The structural organization of a TEI document. In What is the text encoding initiative?: How to add intelligent markup to digital resources. OpenEdition Press. Retrieved June 9, 2022, from http://books.openedition.org/oep/681

  • Clackson, J., & Horrocks, G. (2007). The Blackwell history of the Latin language. Oxford: Wiley-Blackwell.

  • Clackson, J. (Ed.). (2011). A companion to the Latin language. Oxford: Wiley-Blackwell.

  • Celano, G. (2019). The dependency treebanks for ancient Greek and Latin. In Digital classical philology (pp. 279–298). https://doi.org/10.1515/9783110599572-016

  • Celano, G. G. A. (2021). Opera Latina Adnotata (OLA). Retrieved April 6, 2022, from http://ola.informatik.uni-leipzig.de/en/index.html

  • Coates, J. (1983). The semantics of the modal auxiliaries. Croom Helm.

  • de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal dependencies. Computational Linguistics, 47(2), 255–308. https://doi.org/10.1162/coli_a_00402

  • Dell’Oro, F. (2023). WoPoss guidelines for the annotation of modality. Revised version. Zenodo. https://doi.org/10.5281/zenodo.10427053

  • Du Cange, C. du F., Carpenter, P., Henschel, L. G. A., & Favre, L. (1883–1887 [1678]). Glossarium mediæ et infimæ latinitatis [Glossary of Middle and Low Latin]. Favre.

  • Gast, V., Bierkandt, L., & Rzymski, C. (2015). Annotating modals with GraphAnno, a configurable lightweight tool for multi-level annotation. In Proceedings of the workshop on models for modality annotation. Association for Computational Linguistics. https://aclanthology.org/W15-0303

  • Ghia, E., Kloppenburg, L., Nissim, M., Pietrandrea, P., & Cervoni, V. (2016). A construction-centered approach to the annotation of modality. In H. Bunt (Ed.), Proceedings of the 12th joint ACL-ISO workshop on interoperable semantic annotation (pp. 67–74). ACL, ISO.

  • Glare, P. G. W. (Ed.). (2012). Oxford Latin dictionary (2nd ed., Vols. 1–2). Oxford University Press.

  • Haug, D. T. T., Eckhoff, H. M., & Welo, E. (2014). The theoretical foundations of givenness annotation. In K. Bech & K. G. Eide (Eds.), Information structure and syntactic change in Germanic and Romance languages (pp. 17–52). John Benjamins.

  • Haug, D. T. T., & Jøhndal, M. L. (2008). Creating a parallel treebank of the old Indo-European Bible translations. In C. Sporleder, & K. Ribarov (Eds.), Proceedings of the second workshop on language technology for cultural heritage data (LaTeCH 2008) (pp. 27–34).

  • Ide, N. (1998). Corpus encoding standard: SGML guidelines for encoding linguistic Corpora. In Proceedings of the first international language resources and evaluation conference (pp. 463–470).

  • ISO/TC 37/SC 4. (2007). ISO 24610-1:2006, Language resource management - Feature structures - Part 1: Feature structure representation. Distributed through American National Standards Institute.

  • Jelliffe, R. (2021). Schematron. Retrieved April 6, 2022, from https://www.schematron.com/home.html

  • Klie, J.-C., Bugert, M., Boullosa, B., Eckart de Castilho, R., & Gurevych, I. (2018). The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation. In Proceedings of the 27th international conference on computational linguistics: System demonstrations (pp. 5–9). Association for Computational Linguistics. http://tubiblio.ulb.tu-darmstadt.de/106270/

  • Kratzer, A. (1981). The notional category of modality. In The notional category of modality (pp. 38–74). De Gruyter. https://doi.org/10.1515/9783110842524-004

  • Krippendorff, D. K. (1980). Content analysis: An introduction to its methodology. Sage Publications Inc.

  • Laboratoire Ligérien de Linguistique. (2017). Modal—modèles de l’annotation de la modalité à l’Oral. https://hdl.handle.net/11403/modal/v1

  • Langendoen, D. T., & Simons, G. F. (1995). A rationale for the TEI recommendations for feature-structure markup. Computers and the Humanities, 29(3), 191–209. https://doi.org/10.1007/BF01830616

  • Lewis, C. T. (1890). An elementary Latin dictionary. Oxford University Press.

  • Lewis, C. T., & Short, C. (1879). A Latin dictionary, founded on Andrews’ edition of Freund’s Latin Dictionary. Revised, enlarged and in great part rewritten by Charlton T. Lewis, PhD. and Charles Short. Clarendon Press.

  • Marongiu, P., & Dell’Oro, F. (2021). “certe”. v.1.0. WoPoss. https://woposs.unine.ch/maps/map-certe.html

  • Matthewson, L. (2016). Modality. In M. Aloni & P. Dekker (Eds.), The Cambridge handbook of formal semantics (pp. 525–559). Cambridge University Press. https://doi.org/10.1017/CBO9781139236157.019

  • McGillivray, B., & Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In P. Bennett, M. Durrell, S. Scheible, & R. J. Whitt (Eds.), New methods in historical corpus linguistics (pp. 247–257). Narr.

  • McGillivray, B., Kondakova, D., Burman, A., Dell’Oro, F., Bermúdez Sabel, H., Marongiu, P., & Márquez Cruz, M. (2022). A new corpus annotation framework for Latin diachronic lexical semantics. Journal of Latin Linguistics, 21(1), 47–105. https://doi.org/10.1515/joll-2022-2007

  • Murata, M. (2014). RELAX NG. Retrieved April 6, 2022, from https://relaxng.org/

  • Narrog, H. (2012). Modality, subjectivity, and semantic change: A cross-linguistic perspective. Oxford University Press.

  • Nissim, M., & Pietrandrea, P. (Eds.). (2015). Proceedings of the workshop on models for modality annotation. Association for Computational Linguistics. https://aclanthology.org/W15-03

  • Nuyts, J. (2005). The modal confusion: On terminology and the concepts behind it. In A. Klinge & H. H. Müller (Eds.), Modality. Studies in form and function (pp. 5–38). Equinox Publishing.

  • Nuyts, J. (2019). Things to keep in mind when investigating the diachrony of modal expressions. In Workshop on modality: From theory to encoding. University of Lausanne.

  • Passarotti, M. (2019). The project of the Index Thomisticus Treebank. In M. Berti (Ed.), Digital classical philology. Ancient Greek and Latin in the digital revolution (pp. 299–319). Walter De Gruyter GmbH.

  • Pinkster, H. (2014). Attitudinal and illocutionary satellites in Latin. In H. Aertsen, M. Hannay, & R. J. Lyall (Eds.), Words in their places. A Festschrift for J. Lachlan Mackenzie (pp. 191–198). Vrije Universiteit Amsterdam.

  • Portner, P. (2009). Modality. Oxford University Press.

  • Przepiórkowski, A., & Bański, P. (2011). Which XML standards for multilevel corpus annotation? In Z. Vetulani (Ed.), Human language technology. Challenges for computer science and linguistics (pp. 400–411). Springer. https://doi.org/10.1007/978-3-642-20095-3_37

  • Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th annual meeting of the association for computational linguistics: System demonstrations. https://nlp.stanford.edu/pubs/qi2020stanza.pdf

  • Renear, A., Mylonas, E., & Durand, D. (1996). Refining our notion of what text really is: The problem of overlapping hierarchies. In N. Ide & S. Hockey (Eds.), Research in humanities computing (Vol. 4, pp. 263–280). Clarendon Press.

  • Romary, L. (2015). Standards for language resources in ISO—Looking back at 13 fruitful years. arXiv:1510.07851 [cs]. Retrieved April 7, 2022, from http://arxiv.org/abs/1510.07851

  • Rubinstein, A., Harner, H., Krawczyk, E., Simonson, D., Katz, G., & Portner, P. (2013). Toward fine-grained annotation of modality in text. In Proceedings of the IWCS 2013 workshop on annotation of modal meanings in natural language (WAMM) (pp. 38–46). Association for Computational Linguistics. Retrieved April 7, 2022, from https://aclanthology.org/W13-0306

  • Saurí, R., Verhagen, M., & Pustejovsky, J. (2006). Annotating and recognizing event modality in text. In G. Sutcliffe & R. Goebel (Eds.), FLAIRS Conference (pp. 333–339). AAAI Press.

  • Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., & Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised lexical semantic change detection. In Proceedings of the fourteenth workshop on semantic evaluation (pp. 1–23). Barcelona (online). International Committee for Computational Linguistics.

  • Schrickx, J. (2011). Lateinische Modalpartikeln: “Nempe”, “Quippe”, “Scilicet”, “Videlicet” Und “Nimirum.” Brill Academic Pub.

  • Stegmann, J., & Witt, A. (2009). TEI feature structures as a representation format for multiple annotation and generic XML documents. In Proceedings of Balisage: The markup conference 2009 (Vol. 3). Presented at the Balisage: The markup conference 2009, Montréal, Canada. https://doi.org/10.4242/BalisageVol3.Stegmann01

  • TEI Consortium. (2018). ODD. TEI Wiki. Retrieved March 22, 2022, from https://wiki.tei-c.org/index.php/ODD

  • TEI Consortium. (2021a). Feature structures. TEI P5: Guidelines for electronic text encoding and interchange. Version 4.3.0. Retrieved April 6, 2022, from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html

  • TEI Consortium. (2021b). Feature system declaration. In TEI P5: Guidelines for electronic text encoding and interchange (Vol. Version 4.3.0). Retrieved April 6, 2022, from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html#FD

  • Tertullian & Minucius Felix. (1931). Apology. De Spectaculis. Minucius Felix: Octavius (T. R. Glover, & G. H. Rendall, Trans.). Harvard University Press.

  • Triplette, S., Beshero-Bondar, E., & Bermúdez Sabel, H. (2018). A digital humanities approach to cultural translation in Robert Southey’s Amadis of Gaul. Journal of Translation Studies, 2(1), 35–58.

  • Universal Dependencies. (2021). UD Latin Perseus. Universal Dependencies. Retrieved February 1, 2022, from https://github.com/UniversalDependencies/UD_Latin-Perseus

  • van der Auwera, J., & Plungian, V. A. (1998). Modality’s semantic map. Linguistic Typology, 2, 79–124.

  • Véronis, J. (1998). A study of polysemy judgements and inter-annotator agreement. In Advanced papers of the SENSEVAL workshop, Sussex, UK.

  • WoPoss Project. (2022a). Annotation schemes of the WoPoss Project. XSLT, WoPoss. Retrieved April 13, 2022, from https://github.com/WoPoss-project/annotation-schemes

  • WoPoss Project. (2022b). The WoPoss modality corpus. WoPoss. Retrieved May 20, 2022, from https://github.com/WoPoss-project/WoPoss-corpus

Acknowledgements

This work was funded by the Swiss National Science Foundation (SNSF N° PP00P1 176778 and N° PP00P1 214102) and is led by Francesca Dell’Oro at the University of Neuchâtel. We wish to thank Jan Nuyts and Paola Pietrandrea for providing us support in the elaboration of our annotation model. We would like to thank the reviewers for the effort and expertise they contributed in reviewing the article.

Funding

The research leading to these results is funded by the Swiss National Science Foundation (SNSF N° PP00P1 176778 and N° PP00P1 214102).

Author information

Contributions

This paper was written collaboratively: H. Bermúdez Sabel is mainly responsible for Sects. 1, 4.1 (second part), 5, 6, 7 and 8; F. Dell’Oro is mainly responsible for Sects. 2 (second part), 3 and 4.1 (first part) and the general supervision; P. Marongiu is mainly responsible for Sects. 2 (first part) and 4.2.

Corresponding author

Correspondence to Helena Bermúdez-Sabel.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bermúdez-Sabel, H., Dell’Oro, F. & Marongiu, P. Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus. Lang Resources & Evaluation (2024). https://doi.org/10.1007/s10579-023-09706-8

  • DOI: https://doi.org/10.1007/s10579-023-09706-8
