Abstract
In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach to QA places no restrictions on the type of questions that are handled and does not assume that the answers to be provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which we use to build a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to train the statistical models employed by our QA system: a statistical chunker, used to transform a question posed in natural language into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular fashion, comparing the performance of baseline algorithms against our proposed algorithms for various modules of the system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety of complex, non-factoid questions.
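As a rough illustration of how the three components named above could fit together, the following minimal Python sketch builds a phrase query from question chunks and ranks candidate answers by combining a translation score with a language model score. It is a sketch under stated assumptions, not the system evaluated in the paper: the abstract does not specify the model forms, so the IBM Model 1-style translation score, the unigram language model, the log-linear combination, and all function and parameter names (`phrase_query`, `translation_logprob`, `lm_logprob`, `rank_answers`, `t_table`, `unigram`, `weight`, `floor`) are hypothetical choices made for illustration. The chunks are assumed to be given; the statistical chunker itself is not modeled.

```python
import math


def phrase_query(chunks):
    """Turn question chunks into an exact-match phrase query.

    Multi-word chunks are quoted; single words are left bare.
    The chunk boundaries are assumed to come from some chunker.
    """
    return " ".join(f'"{c}"' if " " in c else c for c in chunks)


def translation_logprob(question, answer, t_table, floor=1e-6):
    """IBM Model 1-style log p(question | answer) (an assumed model form).

    Each question word is generated by averaging t(q_word | a_word)
    over the answer words; unseen pairs fall back to a small floor.
    """
    total = 0.0
    for q in question:
        p = sum(t_table.get((q, a), 0.0) for a in answer) / max(len(answer), 1)
        total += math.log(max(p, floor))
    return total


def lm_logprob(answer, unigram, floor=1e-6):
    """Unigram answer language model: log p(answer) (an assumed model form)."""
    return sum(math.log(unigram.get(w, floor)) for w in answer)


def rank_answers(question, candidates, t_table, unigram, weight=1.0):
    """Rank candidate answers by translation score plus weighted LM score."""
    scored = [
        (translation_logprob(question, a, t_table)
         + weight * lm_logprob(a, unigram), a)
        for a in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]


# Toy usage with made-up probabilities (illustrative values only).
query = phrase_query(["how", "do herbal medications", "differ"])
toy_t = {("fix", "plumber"): 0.5, ("leak", "plumber"): 0.5}
toy_lm = {"call": 0.1, "plumber": 0.1, "eat": 0.1, "cake": 0.1}
ranking = rank_answers(
    ["fix", "leak"],
    [["eat", "cake"], ["call", "plumber"]],
    toy_t,
    toy_lm,
)
```

In this toy setup the translation score rewards answers whose words are likely to "generate" the question words, while the language model score penalizes ill-formed answers; `weight` balances the two and would need to be tuned in any real use.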
Soricut, R., Brill, E. Automatic question answering using the web: Beyond the Factoid. Inf Retrieval 9, 191–206 (2006). https://doi.org/10.1007/s10791-006-7149-y
DOI: https://doi.org/10.1007/s10791-006-7149-y