Automatic Term Extraction Based on Perplexity of Compound Words

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

Many term extraction methods have been evaluated for their accuracy on huge corpora. However, when frequency-based methods are applied to a small corpus, they may not achieve sufficient accuracy because too little frequency information is available. This paper reports a new way of extracting terms that is tuned for a very small corpus. It focuses on the structure of compound terms and calculates perplexity on the left and right sides of each term unit. Our experiments showed that the proposed method on its own offered no clear advantage in accuracy. However, a method combining perplexity with frequency information obtained the highest average precision among the methods compared.
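To make the idea of left-side and right-side perplexity more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each compound term is already segmented into a list of term units, and that the perplexity of a side is 2 raised to the entropy of the empirical distribution of the units adjacent on that side. The function names `perplexity` and `side_perplexities` and the toy data are hypothetical, and the abstract does not specify how these values are turned into a term score or combined with frequency.

```python
import math
from collections import Counter, defaultdict


def perplexity(counts):
    """Return 2**H for the empirical distribution described by a Counter.

    An empty Counter (no neighbours observed) yields 1.0 by convention here;
    this convention is an assumption of the sketch.
    """
    total = sum(counts.values())
    if total == 0:
        return 1.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 ** entropy


def side_perplexities(compounds):
    """Map each term unit to (left perplexity, right perplexity).

    `compounds` is an iterable of compound terms, each given as a list of
    term units in order (assumed input format).
    """
    left = defaultdict(Counter)   # unit -> counts of units seen immediately before it
    right = defaultdict(Counter)  # unit -> counts of units seen immediately after it
    units_seen = set()
    for units in compounds:
        for i, unit in enumerate(units):
            units_seen.add(unit)
            if i > 0:
                left[unit][units[i - 1]] += 1
            if i + 1 < len(units):
                right[unit][units[i + 1]] += 1
    return {u: (perplexity(left[u]), perplexity(right[u])) for u in units_seen}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        ["information", "retrieval"],
        ["information", "extraction"],
        ["term", "extraction"],
        ["information", "system"],
    ]
    for unit, (lp, rp) in sorted(side_perplexities(toy).items()):
        print(f"{unit}: left={lp:.2f} right={rp:.2f}")
```

In this toy data, a unit such as "information" that is followed by many different units gets a high right-side perplexity, which is the kind of structural signal the abstract describes using in place of raw frequency on a very small corpus.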

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoshida, M., Nakagawa, H. (2005). Automatic Term Extraction Based on Perplexity of Compound Words. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_24

  • DOI: https://doi.org/10.1007/11562214_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
