Making Test Corpora for Question Answering More Representative

Walker, Andrew; Starkey, Andrew; Pan, Jeff Z.; Siddharthan, Advaith

doi:10.1007/978-3-319-11382-1_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8685)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Despite two high-profile series of challenges devoted to question answering technologies, there remains no formal study of how representative question corpora are of real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges alongside two corpora drawn from natural sources, and identify a degree of disjointedness between the two groups. We analyse these differences in depth before discussing a candidate approach to question corpus generation and comparing its representativeness in the same way. We conclude that these artificial corpora have good overall coverage of grammatical structures but that the distribution of those structures is skewed, meaning performance measures may be inaccurate.

Author information

Authors and Affiliations

Computing Science, University of Aberdeen, UK
Andrew Walker, Jeff Z. Pan & Advaith Siddharthan
Engineering, University of Aberdeen, UK
Andrew Starkey

Editor information

Editors and Affiliations

Google Inc., Brandschenkestraße 110, 8002, Zurich, Switzerland
Evangelos Kanoulas
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstrasse 9-11, 1040, Vienna, Austria
Mihai Lupu
Information School, University of Sheffield, Sheffield, UK
Paul Clough
Department of Computer Science and IT, RMIT University, 3000, Melbourne, VIC, Australia
Mark Sanderson
Department of Computing, Edge Hill University, L39 4QP, Ormskirk, Lancashire, UK
Mark Hall
Vienna University of Technology, Austria
Allan Hanbury
Information School, University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, UK
Elaine Toms

About this paper

Cite this paper

Walker, A., Starkey, A., Pan, J.Z., Siddharthan, A. (2014). Making Test Corpora for Question Answering More Representative. In: Kanoulas, E., et al. Information Access Evaluation. Multilinguality, Multimodality, and Interaction. CLEF 2014. Lecture Notes in Computer Science, vol 8685. Springer, Cham. https://doi.org/10.1007/978-3-319-11382-1_1

DOI: https://doi.org/10.1007/978-3-319-11382-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11381-4
Online ISBN: 978-3-319-11382-1
eBook Packages: Computer Science, Computer Science (R0)
