Abstract
Statistical machine translation (SMT) has, in recent years, improved the accuracy of automated translations. However, SMT systems often fail to deliver human-quality translations, especially for complex sentences and distant language pairs. Current SMT systems typically translate single sentences and treat clauses in isolation, leading to a loss of contextual information. Discourse markers (DMs) are vital contextual links between discourse segments, and this paper examines the divergences in their usage across English and Mandarin Chinese. We highlight important structural differences in composite sentences extracted from a number of parallel corpora, and show examples of how popular SMT systems deal with these cases. We observed numerous significant divergences, such as contextual omissions, which can lead to incoherent automatic translations. Our objective is to use these findings to guide a proposed framework that addresses divergences in DM usage in order to improve SMT output quality.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Steele, D., Specia, L. (2014). Divergences in the Usage of Discourse Markers in English and Mandarin Chinese. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_24
DOI: https://doi.org/10.1007/978-3-319-10816-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science (R0)