Abstract
The recent growth of scientific web content makes it easier for translators to find the most widely used equivalent of a scientific term. However, searching for such equivalents manually is demanding and time-consuming. This research aims to automate the process by building a bilingual glossary for physics using an automatic term extraction approach. A domain-specific corpus was built from six websites that publish recent Arabic scientific articles translated by specialists in their respective fields. A basic computational algorithm was then applied to the corpus to mine terms along with their translations. The resulting glossary was evaluated against existing physics glossaries and was found to outperform them in uniformity, recency, and availability of authentic usage examples, although it includes fewer entries. The compiled corpus-based glossary can be a useful tool for lexicographers, translators, interpreters, and academics. It can be used to update existing glossaries with terms that have proven to be of real value to users. It also spares terminologists from having to coin terms that specialists in the field might never use.
Data availability
The English-Arabic physics glossary that resulted from this study is available on request from the corresponding author, H.A.
Notes
WebBootCaT is a web service for quickly producing corpora for specialist areas, in a range of languages, from the web.
About this article
Cite this article
Awwad, H., Sawalha, M., Allawzi, A. et al. Building translator-oriented English-Arabic physics glossary from domain corpus. Int J Speech Technol 26, 151–162 (2023). https://doi.org/10.1007/s10772-022-10001-0
DOI: https://doi.org/10.1007/s10772-022-10001-0