Abstract
We introduce a new research resource in the form of a high-quality, domain-specific dataset for analysing the document layout of historical documents. The dataset provides an instance segmentation ground truth with 19 classes based on historical layout structures that stem (a) from the publication production process and the respective genres (life sciences, architecture, art, decorative arts, etc.), and (b) from selected text registers (such as monograph, trade journal, illustrated magazine). Altogether, the dataset contains more than 52,000 instances annotated by experts. A baseline has been tested with the well-known Mask R-CNN and compared to the state-of-the-art model VSR [55]. Inspired by evaluation practices from the field of Natural Language Processing (NLP), we have developed a new method for evaluating annotation consistency. Our method is based on Krippendorff’s alpha (K-\(\alpha \)), a statistic for quantifying the so-called “inter-annotator agreement”. In particular, we propose an adaptation of K-\(\alpha \) that treats annotations as a multipartite graph for assessing the agreement of a variable number of annotators. The method is adjustable with regard to evaluation strictness, and it can be used in 2D or 3D as well as for a variety of tasks such as semantic segmentation, instance segmentation, and 3D point cloud segmentation.
Notes
- 1.
Format definition: https://cocodataset.org/#format-data.
- 2.
Weights: https://github.com/hpanwar08/detectron2.
- 3.
Weights: https://huggingface.co/bert-base-uncased.
- 4.
References
The Virtual Laboratory. https://vlp-new.ur.de/
Fliegende Blätter (1845–1944). https://nbn-resolving.org/urn:nbn:de:bsz:16-diglit-35697
Centralblatt der Bauverwaltung (1881–1931). https://digital.zlb.de/viewer/image/14688302_1881/1/
Zeitschrift für Psychologie und Physiologie der Sinnesorgane (1890–1909). https://ia804503.us.archive.org/25/items/bub_gb_2dIbAAAAMAAJ/bub_gb_2dIbAAAAMAAJ.pdf
Das Kunstgewerbe (1890–1895). https://doi.org/10.11588/diglit.18553. http://kunstgewerDbe.uni-hd.de
ABBYY Development Inc.: ABBYY FineReader PDF 15. https://pdf.abbyy.com/de/finereader-pdf/
Artstein, R.: Inter-annotator agreement. In: Ide, N., Pustejovsky, J. (eds.) Handbook of Linguistic Annotation, pp. 297–313. Springer, Dordrecht (2017). https://doi.org/10.1007/978-94-024-0881-2_11
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (2012)
Baumgartner, J. (ed.): Aufbrüche - Seitenpfade - Abwege: Suchbewegungen und Subkulturen im 20. Jahrhundert; Festschrift für Ulrich Linse. Königshausen & Neumann, Würzburg (2004)
Binmakhashen, G.M., Mahmoud, S.A.: Document layout analysis: a comprehensive survey. ACM Comput. Surv. 52(6), 1–36 (2020). https://doi.org/10.1145/3355610. https://dl.acm.org/doi/10.1145/3355610
Bruening, U.: Bauhausbücher. Grafische Synthese - synthetische Grafik. Neue Bauhausbücher, pp. 281–296 (2009)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Clausner, C., Antonacopoulos, A., Pletschacher, S.: ICDAR2019 competition on recognition of documents with complex layouts - RDCL2019, p. 6 (2019)
Clausner, C., Pletschacher, S., Antonacopoulos, A.: Aletheia - an advanced document layout and text ground-truthing system for production environments. In: 2011 International Conference on Document Analysis and Recognition, pp. 48–52 (2011). https://doi.org/10.1109/ICDAR.2011.19
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)
Dengel, A., Shafait, F.: Analysis of the logical layout of documents. In: Doermann, D., Tombre, K. (eds.) Handbook of Document Image Processing and Recognition, Chap. 6. Springer, London (2014). https://doi.org/10.1007/978-0-85729-859-1_6
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Flach, S., Weigel, S. (eds.): WissensKünste: das Wissen der Künste und die Kunst des Wissens = The Knowledge of the Arts and the Art of Knowledge. VDG, Weimar (2011). http://www.gbv.de/dms/weimar/toc/64247172X_toc.pdf
Froschauer, E.M.: “An die Leser!”: Baukunst darstellen und vermitteln; Berliner Architekturzeitschriften um 1900. Wasmuth, Tübingen (2009)
Giedion, S.: Mechanization Takes Command: A Contribution to Anonymous History. University of Minnesota (1948)
Gupta, A., Dollar, P., Girshick, R.: LVIS: a dataset for large vocabulary instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2961–2969 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
von Helmholtz, H.: Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. F. Vieweg, Braunschweig (1863). https://vlp-new.ur.de/records/lit3483
Hinton, G., Vinyals, O., Dean, J., et al.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 2(7) (2015)
Kann, V.: Maximum bounded 3-dimensional matching is max SNP-complete. Inf. Process. Lett. 37(1), 27–35 (1991)
Kay, A.: Tesseract: an open-source optical character recognition engine. Linux J. 2007(159), 2 (2007)
Klee, P.: Pädagogisches Skizzenbuch. Bauhausbücher; 2, Langen, München, 2. aufl. edn. (1925). https://doi.org/10.11588/diglit.26771. http://digi.ub.uni-heidelberg.de/diglit/klee1925
Kofax Inc.: OmniPage Ultimate. https://www.kofax.de/products/omnipage
Koichi, K.: Page segmentation techniques in document analysis. In: Doermann, D., Tombre, K. (eds.) Handbook of Document Image Processing and Recognition, Chap. 5. Springer, London (2014). https://doi.org/10.1007/978-0-85729-859-1_5
Krauthausen, K.: Paul Valéry and geometry: instrument, writing model, practice. Preprint/Max-Planck-Institut für Wissenschaftsgeschichte 406, Max-Planck-Inst. für Wissenschaftsgeschichte, Berlin (2010)
Krippendorff, K.: Computing Krippendorff’s alpha-reliability (2011). https://repository.upenn.edu/asc_papers/43
Lee, B.C.G., et al.: The newspaper navigator dataset: extracting headlines and visual content from 16 million historic newspaper pages in chronicling America. In: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, pp. 3055–3062 (2020)
Li, M., et al.: DocBank: a benchmark dataset for document layout analysis. arXiv preprint arXiv:2006.01038 (2020)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125 (2017)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Marinai, S.: Introduction to document analysis and recognition. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition. SCI, vol. 90, pp. 1–20. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-76280-5_1
McCulloh, I., Burck, J., Behling, J., Burks, M., Parker, J.: Leadership of data annotation teams. In: 2018 International Workshop on Social Sensing (SocialSens), pp. 26–31 (2018). https://doi.org/10.1109/SocialSens.2018.00018
McLoughlin, W.G.: Revivals, Awakening and Reform. University of Chicago Press, Chicago (1978)
Nassar, J., Pavon-Harr, V., Bosch, M., McCulloh, I.: Assessing data quality of annotations with Krippendorff alpha for applications in computer vision. arXiv preprint arXiv:1912.10107 (2019)
Papadopoulos, C., Pletschacher, S., Clausner, C., Antonacopoulos, A.: The IMPACT dataset of historical document images. In: Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing - HIP 2013, Washington, District of Columbia, p. 123. ACM Press (2013). https://doi.org/10.1145/2501115.2501130. http://dl.acm.org/citation.cfm?doid=2501115.2501130
Pattern Recognition & Image Analysis Research Lab: Aletheia document analysis system. https://www.primaresearch.org/tools/Aletheia
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv:1506.01497 [cs], January 2016. http://arxiv.org/abs/1506.01497
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Ribeiro, V., Avila, S., Valle, E.: Handling inter-annotator agreement for automated skin lesion segmentation. arXiv preprint arXiv:1906.02415 (2019)
Richarz, J., Fink, G.A., et al.: Towards semi-supervised transcription of handwritten historical weather reports. In: 2012 10th IAPR International Workshop on Document Analysis Systems, pp. 180–184. IEEE (2012)
Sekachev, B., et al.: OpenCV/CVAT: v1.1.0, August 2020. https://doi.org/10.5281/zenodo.4009388
Shen, Z., Zhang, K., Dell, M.: A large dataset of historical Japanese documents with complex layouts. arXiv:2004.08686 [cs], April 2020. http://arxiv.org/abs/2004.08686
Stielau, A.: Kunst und Künstler im Blickfeld der satirischen Zeitschriften ‘Fliegende Blätter’ und ‘Punch’. Aachen University (1976)
Wevers, M., Smits, T.: The visual digital turn: using neural networks to study historical images. Digital Scholarship in the Humanities, January 2019. https://doi.org/10.1093/llc/fqy085. https://academic.oup.com/dsh/advance-article/doi/10.1093/llc/fqy085/5296356
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1492–1500 (2017)
Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. arXiv:1912.13318 [cs], June 2020. https://doi.org/10.1145/3394486.3403172. http://arxiv.org/abs/1912.13318
Zhang, P., et al.: VSR: a unified framework for document layout analysis combining vision, semantics and relations. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12821, pp. 115–130. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86549-8_8
Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1015–1022. IEEE (2019)
Acknowledgments
This work was supported by the Thuringian Ministry for Economy, Science and Digital Society/Thüringer Aufbaubank (TMWWDG/TAB). In addition, the following persons should be mentioned, without whom the project would not have been feasible: Sascha Breithaupt, Johannes Hess, Henrik Leisdon, Josephine Tiede and Ina Tscherner. Lastly, we would like to thank Christian Benz and Jan Frederick Eick for their in-depth discussion and feedback and in particular Henning Schmidgen.
Appendices
Appendix 1
This appendix illustrates example calculations for K-\(\alpha \) and shows different ways in which the metric can be adapted. For further background we recommend [33]. We assume that two aspects affect most adaptation possibilities: the creation of the cost matrix, which changes how the reliability data are created, and the handling of missing data when calculating K-\(\alpha \).
Figure 5 serves as the example. After the graph matching, the resulting reliability matrix looks as shown below. For this matching, an IoU threshold of 0.5 and the normal matching are used.
Annotator | Unit 1 | Unit 2 | Unit 3 | Unit 4 | Unit 5 |
---|---|---|---|---|---|
\(\varOmega _A\) | \(a_1\) | \(a_2\) | \(a_3\) | \(a_4\) | \(\varnothing \) |
\(\varOmega _B\) | \(b_1\) | \(b_2\) | \(b_3\) | \(b_4\) | \(\varnothing \) |
\(\varOmega _C\) | \(\varnothing \) | \(c_2\) | \(c_3\) | \(c_4\) | \(c_1\) |
Visual example of matching the annotations of three annotators with an \(IoU>0.5\) threshold. Image source: https://commons.wikimedia.org/wiki/File:Smile_at_a_stranger.jpg
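As an illustration of the cost matrix mentioned above, the following is a minimal sketch for two annotators only. It assumes axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)`; the names `iou` and `cost_matrix`, the box representation, and the use of `None` for inadmissible pairs are illustrative assumptions, not the implementation used for the dataset. The multipartite matching step itself is not shown.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cost_matrix(
    boxes_a: List[Box], boxes_b: List[Box], threshold: float = 0.5
) -> List[List[Optional[float]]]:
    """Cost 1 - IoU for pairs with IoU above the threshold, None otherwise."""
    matrix = []
    for a in boxes_a:
        row = []
        for b in boxes_b:
            overlap = iou(a, b)
            row.append(1.0 - overlap if overlap > threshold else None)
        matrix.append(row)
    return matrix


# Two annotators, two instances each; only the first pair overlaps enough.
annotator_a = [(0, 0, 10, 10), (20, 20, 30, 30)]
annotator_b = [(1, 0, 11, 10), (50, 50, 60, 60)]
costs = cost_matrix(annotator_a, annotator_b)
print(costs)
```

Instances that end up without an admissible partner would correspond to the \(\varnothing \) entries in the reliability matrix.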
1.1 Strict Matching, Don’t Allow Missing Data
After the reliability data are calculated, all values \(a_n\), \(b_m\) and \(c_p\) are replaced with their classes. In this case, which does not allow missing data, \(\varnothing \) is replaced with the filler class 0. Hence, the matrix looks as follows:
Annotator | Unit 1 | Unit 2 | Unit 3 | Unit 4 | Unit 5 |
---|---|---|---|---|---|
\(\varOmega _A\) | 1 | 1 | 3 | 2 | 0 |
\(\varOmega _B\) | 1 | 1 | 3 | 2 | 0 |
\(\varOmega _C\) | 0 | 2 | 3 | 2 | 1 |
From here, K-\(\alpha \) is calculated in the regular way, starting with the coincidence matrix. A unit with \(m\) values contains \(m(m-1)\) ordered pairs, and each pair contributes \(\frac{1}{m-1}\) to its cell. Unit 1 contains \(3(3-1)=6\) pairs: 2 matching 1–1 pairs, 2 mismatching 1–0 pairs and 2 mismatching 0–1 pairs. It contributes \(\frac{2}{3-1}=1\) to the \(o_{1,1}\) cell, \(\frac{2}{3-1}=1\) to the \(o_{1,0}\) cell and \(\frac{2}{3-1}=1\) to the \(o_{0,1}\) cell. Unit 2 contains \(3(3-1)=6\) pairs: 2 matching 1–1 pairs, 2 mismatching 1–2 pairs and 2 mismatching 2–1 pairs. It contributes \(\frac{2}{3-1}=1\) to the \(o_{1,1}\) cell, \(\frac{2}{3-1}=1\) to the \(o_{1,2}\) cell and \(\frac{2}{3-1}=1\) to the \(o_{2,1}\) cell. Unit 3 contains \(3(3-1)=6\) pairs, all of them matching 3–3 pairs, so it contributes \(\frac{6}{3-1}=3\) to the \(o_{3,3}\) cell. Unit 4 contains \(3(3-1)=6\) pairs, all of them matching 2–2 pairs, so it contributes \(\frac{6}{3-1}=3\) to the \(o_{2,2}\) cell. Unit 5 contains \(3(3-1)=6\) pairs: 2 matching 0–0 pairs, 2 mismatching 0–1 pairs and 2 mismatching 1–0 pairs. It contributes \(\frac{2}{3-1}=1\) to the \(o_{0,0}\) cell, \(\frac{2}{3-1}=1\) to the \(o_{0,1}\) cell and \(\frac{2}{3-1}=1\) to the \(o_{1,0}\) cell. Each cell of the coincidence matrix is the sum of the contributions of all five units to that cell. For example, the first value \(o_{0,0}\) is straightforward, since only Unit 5 contains 0–0 pairs; hence \(o_{0,0}=1\). The coincidence matrix is as follows:
Class | 0 | 1 | 2 | 3 | Total |
---|---|---|---|---|---|
0 | 1 | 2 | 0 | 0 | 3 |
1 | 2 | 2 | 1 | 0 | 5 |
2 | 0 | 1 | 3 | 0 | 4 |
3 | 0 | 0 | 0 | 3 | 3 |
Total | 3 | 5 | 4 | 3 | 15 |
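The pair counting described above can be checked with a short sketch. It assumes the reliability data are given as one list of class labels per unit; the function name `coincidence_matrix` and the convention that `None` marks a value to be skipped are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import permutations


def coincidence_matrix(units):
    """Build coincidence counts o[(c, k)] from a list of per-unit label lists.

    Each unit with m present values yields m * (m - 1) ordered pairs,
    and every pair adds 1 / (m - 1) to its cell.
    """
    o = defaultdict(float)
    for values in units:
        present = [v for v in values if v is not None]
        m = len(present)
        if m < 2:
            continue  # a single value cannot be paired
        for c, k in permutations(present, 2):
            o[(c, k)] += 1.0 / (m - 1)
    return dict(o)


# Columns of the reliability matrix with filler class 0 (Units 1 to 5).
units = [[1, 1, 0], [1, 1, 2], [3, 3, 3], [2, 2, 2], [0, 0, 1]]
o = coincidence_matrix(units)
for cell in sorted(o):
    print(cell, o[cell])
```

The printed cells correspond to the non-zero entries of the coincidence matrix, and their sum equals the total of 15 pairable values.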
Computing K-\(\alpha \) is now done via Eq. 3, which means for our example:
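As a worked sketch, assuming Eq. 3 corresponds to the nominal-metric form of K-\(\alpha \) described in [33], the symbols \(n_c\) (the marginal total of class \(c\) in the coincidence matrix) and \(n=15\) (the total number of pairable values) are introduced here for illustration. The off-diagonal cells sum to \(2+2+1+1=6\), and \(\sum _{c \ne k} n_c n_k = 15^2 - (3^2+5^2+4^2+3^2) = 166\), so that:

\[
\alpha = 1 - \frac{D_o}{D_e} = 1 - (n-1)\,\frac{\sum _{c \ne k} o_{c,k}}{\sum _{c \ne k} n_c\, n_k} = 1 - 14 \cdot \frac{6}{166} \approx 0.494
\]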
1.2 Strict Matching, but Allow Missing Data
A second possible version, built on the same example shown in Fig. 5 but allowing missing data, transforms the reliability data slightly differently. Instead of 0, a * is used to indicate missing data, which is not included in the calculation of \(\alpha \).
Annotator | Unit 1 | Unit 2 | Unit 3 | Unit 4 | Unit 5 |
---|---|---|---|---|---|
\(\varOmega _A\) | 1 | 1 | 3 | 2 | * |
\(\varOmega _B\) | 1 | 1 | 3 | 2 | * |
\(\varOmega _C\) | * | 2 | 3 | 2 | 1 |
Calculating the coincidence matrix is done in the same way as before for Unit 2, Unit 3 and Unit 4, but Unit 1 and Unit 5 differ. Unit 1 contains \(2(2-1)=2\) pairs, both matching 1–1 pairs, so it contributes \(\frac{2}{2-1}=2\) to the \(o_{1,1}\) cell. Since Unit 5 contains only a single entry, it has no pairable values and contributes nothing. The coincidence matrix is therefore:
Class | 1 | 2 | 3 | Total |
---|---|---|---|---|
1 | 3 | 1 | 0 | 4 |
2 | 1 | 3 | 0 | 4 |
3 | 0 | 0 | 3 | 3 |
Total | 4 | 4 | 3 | 11 |
This results in a calculation of alpha with the following values:
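The following minimal sketch evaluates the coincidence matrix above, assuming the nominal-metric form of K-\(\alpha \) from [33], that is \(\alpha = 1 - D_o/D_e\) with disagreement counted over all off-diagonal cells. The function name `nominal_alpha` and the nested-list input format are hypothetical; whether this matches the exact form of Eq. 3 is an assumption.

```python
def nominal_alpha(o):
    """Krippendorff's alpha (nominal metric) from a square coincidence matrix.

    `o` is a nested list without the marginal totals; row and column
    order must agree.
    """
    size = len(o)
    n_c = [sum(row) for row in o]  # marginal totals per class
    n = sum(n_c)                   # total number of pairable values
    observed = sum(o[c][k] for c in range(size) for k in range(size) if c != k)
    expected = sum(n_c[c] * n_c[k] for c in range(size) for k in range(size) if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one class occurs")
    return 1.0 - (n - 1) * observed / expected


# Coincidence matrix for classes 1, 2, 3 with missing data allowed.
alpha_missing = nominal_alpha([[3, 1, 0], [1, 3, 0], [0, 0, 3]])
print(round(alpha_missing, 3))  # 0.75
```

Under the same assumption, allowing missing data yields a higher agreement here than the filler-class variant of Sect. 1.1, because the unmatched instances in Unit 1 and Unit 5 no longer count as disagreements.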
Appendix 2
Further information on the historical sources:
The selected documents were already available as digitized sources, and all come from publicly accessible digital collections: the digital collections of University Library Heidelberg (“Pädagogisches Skizzenbuch” [29], “Das Kunstgewerbe” [5] and “Fliegende Blätter” [2]), the Internet Archive (“Zeitschrift für Psychologie und Physiologie der Sinnesorgane” [4]), the Virtual Laboratory (“Lehre von den Tonempfindungen” [25]) and the digital collections of the Berlin State Library (“Centralblatt der Bauverwaltung” [3]).
The Pedagogical Sketchbook by Paul Klee is part of the artistic-experimental domain. It is the second volume in the Bauhaus book series. The so-called Bauhaus books are a series of books published from 1925 to 1930 by Walter Gropius and László Moholy-Nagy. Although the books appeared as a series in the same publishing house (Albert Langen Verlag), their layouts varied widely [11]. The publication sequence was also irregular. While eight volumes of the series were published in 1925 alone, only two appeared in 1926, and one volume each in 1927, 1928, 1929 and 1930. Klee’s publication presented not only his artwork but also his art-theoretical knowledge. At the same time, it presented aspects of his extensive lectures on visual form at the Bauhaus and conveyed his way of thinking and working on this topic.
Both the journal “Zeitschrift für Psychologie und Physiologie der Sinnesorgane” and Hermann von Helmholtz’s publication “Lehre von den Tonempfindungen” are part of the domain of life sciences. The two publication types (journal and monograph) have different layout components, each typical within its domain, which is why both were integrated into the dataset.
The journal “Das Kunstgewerbe” appeared every fourteen days from 1890 to 1895 and belongs to the domain of applied arts. The individual issues had a length of 10 pages and a manageable number of illustrations, but the pages were often designed with decorative frames and ornaments.
The illustrated magazine “Fliegende Blätter” appeared from 1844 to 1944, at first irregularly several times a month, later regularly once a week. The humorous-satirical publication was richly illustrated and held in high esteem among the German bourgeoisie. At the same time, the “Fliegende Blätter” are significant both artistically and in terms of printing, due to the high quality of their layout [50].
The “Centralblatt der Bauverwaltung” was a professional journal intended to satisfy the need for information in the construction sector. The journal was first published in April 1881 by the publishing house Ernst & Sohn; in 1931 it was merged with the “Zeitschrift für Bauwesen”, and in 1944 publication was discontinued. The Ministry of Public Works acted as publisher until 1919, followed by the Prussian Ministry of Finance from 1920 to 1931. The journal was to serve as a supplement to the existing trade journals and, in contrast to these, was to have a faster publication schedule. Information about construction projects and competitions, projects currently being implemented, new technologies and amended legal framework conditions was to reach the readership more quickly than before and also address international developments. At the same time, however, the journal was to be less elaborately designed than the existing trade organs and art journals. Although the Ministry of Public Works was the editor and the journal was divided into “official” and “non-official” parts, it cannot be characterized as a purely official journal of the authorities [19].
Appendix 3
See Table 4.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tschirschwitz, D., Klemstein, F., Stein, B., Rodehorst, V. (2022). A Dataset for Analysing Complex Document Layouts in the Digital Humanities and Its Evaluation with Krippendorff’s Alpha. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_22
DOI: https://doi.org/10.1007/978-3-031-16788-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1
eBook Packages: Computer Science, Computer Science (R0)