Automatic question answering using the web: Beyond the Factoid

Soricut, Radu; Brill, Eric

doi:10.1007/s10791-006-7149-y

Published: March 2006

Information Retrieval, volume 9, pages 191–206 (2006)

Radu Soricut¹ & Eric Brill²

Abstract

In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach to QA assumes no restrictions on the type of questions that are handled, and no assumption that the answers to be provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which we use to collect a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to train various statistical models employed by our QA system: a statistical chunker used to transform a natural language-posed question into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular fashion, by comparing the performance of baseline algorithms against our proposed algorithms for various modules in our QA system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety of complex, non-factoid questions.
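
The three components named in the abstract can be illustrated with a minimal sketch. It assumes a noisy-channel style combination in which a candidate answer is scored by log P(question | answer) plus log P(answer). The abstract does not give the model details, so the IBM Model 1 form of the translation score, the add-one smoothed unigram language model, and all function names, parameters, and toy inputs below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def phrase_query(chunks):
    """Quote each question chunk so a search engine matches it exactly.

    The chunks are assumed to come from some chunker; no chunking is done here.
    """
    return " ".join(f'"{c.strip()}"' for c in chunks if c.strip())


def translation_log_prob(question, answer, t_table, floor=1e-6):
    """IBM Model 1 style log P(question | answer).

    t_table maps (question_word, answer_word) -> probability (hypothetical
    values); unseen pairs fall back to a small floor.
    """
    a_words = answer.lower().split()
    if not a_words:
        return float("-inf")
    logp = 0.0
    for q in question.lower().split():
        total = sum(t_table.get((q, a), floor) for a in a_words)
        logp += math.log(total / len(a_words))
    return logp


def lm_log_prob(answer, unigram_counts, vocab_size):
    """Add-one smoothed unigram log P(answer); a stand-in for a real answer LM."""
    total = sum(unigram_counts.values())
    return sum(
        math.log((unigram_counts[w] + 1) / (total + vocab_size))
        for w in answer.lower().split()
    )


def rank_answers(question, candidates, t_table, unigram_counts, vocab_size,
                 lm_weight=1.0):
    """Sort candidates by translation score plus weighted language model score."""
    scored = [
        (translation_log_prob(question, c, t_table)
         + lm_weight * lm_log_prob(c, unigram_counts, vocab_size), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Note that an unnormalized language model score favors short answers, so a real system would likely need length normalization or tuned weights; `lm_weight` is only a placeholder for that kind of tuning.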

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA
Radu Soricut
Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
Eric Brill

Corresponding author

Correspondence to Radu Soricut.

About this article

Cite this article

Soricut, R., Brill, E. Automatic question answering using the web: Beyond the Factoid. Inf Retrieval 9, 191–206 (2006). https://doi.org/10.1007/s10791-006-7149-y

Issue Date: March 2006
DOI: https://doi.org/10.1007/s10791-006-7149-y
