Building translator-oriented English-Arabic physics glossary from domain corpus

International Journal of Speech Technology

Abstract

The recent growth of scientific web content has made it easier for translators to find the most widely used equivalent of a scientific term. However, searching for these equivalents manually remains demanding and time-consuming. The aim of this research is to automate the process by building a bilingual English-Arabic physics glossary using an automatic term extraction approach. A domain-specific corpus was built from six websites that publish recent Arabic scientific articles translated by specialists in their respective fields. A basic computational algorithm was then applied to the corpus to mine terms together with their translations. The resulting glossary was evaluated against existing physics glossaries and was found to outperform them in uniformity, recency, and the availability of authentic usage examples, although it contains fewer entries. The compiled corpus-based glossary can be a useful tool for lexicographers, translators, interpreters, and academics. It can also be used to update existing glossaries with terms that have proven to be of real value to users, and it spares terminologists from coining terms that specialists in the field might never use.
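The abstract describes the extraction step only as "a basic computational algorithm". As a hedged illustration, and not the authors' method, the Python sketch below mines English-Arabic term pairs under one stated assumption: that a translated article gives the English source term in parentheses directly after its Arabic rendering, e.g. الثقب الأسود (black hole). The helpers `candidate_pairs` and `build_glossary`, the word-count heuristic, the `min_count` threshold and the sample sentences are all hypothetical. For each English term the sketch keeps the most frequent Arabic equivalent across the corpus, echoing the abstract's goal of finding the most popular equivalent, and stores the first sentence it came from as a usage example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: a Latin-script term inside parentheses, e.g. "(black hole)".
PAREN_EN = re.compile(r"\(([A-Za-z][A-Za-z\- ]*[A-Za-z])\)")
# Arabic letters and diacritics (U+0621 to U+0652); excludes Arabic punctuation.
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")


def candidate_pairs(text):
    """Yield (english_term, arabic_term, sentence) candidates from one article."""
    for sentence in re.split(r"[.!?\u061F\n]+", text):
        for match in PAREN_EN.finditer(sentence):
            english = match.group(1).strip().lower()
            # Assumed heuristic: the Arabic equivalent has the same number of
            # words as the English term and ends right before the parenthesis.
            n_words = len(english.split())
            tokens = sentence[: match.start()].split()
            arabic_tokens = tokens[-n_words:]
            if len(arabic_tokens) == n_words and all(
                ARABIC_WORD.fullmatch(tok) for tok in arabic_tokens
            ):
                yield english, " ".join(arabic_tokens), sentence.strip()


def build_glossary(articles, min_count=2):
    """Map each English term to its most frequent Arabic equivalent."""
    counts = defaultdict(Counter)  # english -> Counter of Arabic variants
    examples = {}                  # (english, arabic) -> first usage sentence
    for article in articles:
        for english, arabic, sentence in candidate_pairs(article):
            counts[english][arabic] += 1
            examples.setdefault((english, arabic), sentence)

    glossary = {}
    for english, variants in counts.items():
        arabic, freq = variants.most_common(1)[0]
        if freq >= min_count:  # drop pairs seen too rarely to trust
            glossary[english] = {
                "arabic": arabic,
                "frequency": freq,
                "example": examples[(english, arabic)],
            }
    return glossary


# Made-up sample sentences, for illustration only.
SAMPLE_ARTICLES = [
    "يدرس العلماء الثقب الأسود (black hole) منذ عقود.",
    "يبتلع الثقب الأسود (black hole) الضوء.",
    "تدور الإلكترونات حول النواة (nucleus) في الذرة.",
]

if __name__ == "__main__":
    for term, entry in build_glossary(SAMPLE_ARTICLES).items():
        print(term, "->", entry["arabic"], entry["frequency"])
```

Choosing the majority translation is one simple way to favour uniformity; a real pipeline would likely also need Arabic text normalisation and manual review of the candidate pairs.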

Fig. 1
Fig. 2

Data availability

The English-Arabic physics glossary that resulted from this study is available on request from the corresponding author, H.A.

Notes

  1. WebBootCaT is a web service for quickly producing corpora for specialist areas, in a range of languages, from the web.

Author information

Correspondence to Hasna Awwad.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Awwad, H., Sawalha, M., Allawzi, A. et al. Building translator-oriented English-Arabic physics glossary from domain corpus. Int J Speech Technol 26, 151–162 (2023). https://doi.org/10.1007/s10772-022-10001-0

  • DOI: https://doi.org/10.1007/s10772-022-10001-0
