Abstract
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing, machine translation being one example. State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used or used in a specialised manner, owing to the limitations of the feature-based modelling techniques involved. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems can be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. We examine one-shot and incremental approaches to feature-set construction, applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translation from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that use only shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.
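The feature-construction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clause names, the example fields (`object_classes`, `subject_classes`, `next_pos`) and the toy data are all hypothetical, and the clauses are written by hand here, whereas in the paper they would be induced by an ILP system from background knowledge. Each clause is treated as a boolean test, and each occurrence of the target word becomes a binary vector that a standard learner such as a support vector machine could take as input.

```python
from typing import Any, Callable, Dict, List

# One occurrence of an ambiguous word, described by toy background facts.
# The field names are hypothetical and chosen only for illustration.
Example = Dict[str, Any]
Clause = Callable[[Example], bool]

# Hand-written stand-ins for clauses that an ILP system might induce.
# They mix semantic, syntactic and lexical tests, as the abstract describes.
CLAUSES: Dict[str, Clause] = {
    "object_is_abstract": lambda ex: "abstract" in ex["object_classes"],
    "subject_is_person": lambda ex: "person" in ex["subject_classes"],
    "followed_by_preposition": lambda ex: ex["next_pos"] == "IN",
}


def to_feature_vector(example: Example, clauses: Dict[str, Clause]) -> List[int]:
    """Map an example to a binary vector: 1 where a clause holds, else 0."""
    return [1 if test(example) else 0 for test in clauses.values()]


# Two toy occurrences of the same target verb (invented data).
EXAMPLES: List[Example] = [
    {"object_classes": {"abstract"}, "subject_classes": {"person"}, "next_pos": "DT"},
    {"object_classes": {"artifact"}, "subject_classes": {"person"}, "next_pos": "IN"},
]

# Rows are examples, columns are clause-based features, in CLAUSES order.
FEATURE_MATRIX = [to_feature_vector(ex, CLAUSES) for ex in EXAMPLES]
```

In this sketch the first example yields `[1, 1, 0]` and the second `[0, 1, 1]`. A matrix of this form, paired with sense labels, is the kind of propositional input a support vector machine expects.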
Additional information
Editors: Filip Zelezny and Nada Lavrac.
A.S. is also an Adjunct Professor at the Department of Computer Science and Engineering, University of New South Wales, and a Visiting Professor at the Computing Laboratory, University of Oxford.
Cite this article
Specia, L., Srinivasan, A., Joshi, S. et al. An investigation into feature construction to assist word sense disambiguation. Mach Learn 76, 109–136 (2009). https://doi.org/10.1007/s10994-009-5114-x
DOI: https://doi.org/10.1007/s10994-009-5114-x