1 Introduction

1.1 Machine-First Human-Optimized Model Versus Human Augmentation Model

Translating a large volume of content by human effort alone is not feasible. At the same time, machine translation (MT) output is not perfect: MT trades quality for speed, convenience, or cost. Rather than spending countless hours of human effort on full translations, or living with unsatisfactory machine-generated results, a machine-first, human-optimized model has become the status quo. In this model, post-editing is the process whereby humans amend machine-generated translations to achieve an acceptable result [1]. Human linguists understand context better and bring creativity that complements machine power, so that a large amount of content of reasonable quality can be generated in a short period of time.

The main problem with this approach is that human editors can make only minor edits: they can make the final text understandable, but style, terminology, grammar, and syntax might not be perfect. Studies found that most professional translators surveyed rated the post-editing experience negatively, citing “lack of creativity, tediousness of the task, and limited opportunity to create quality” [2]. An underlying cause is that standard post-editing interfaces violate basic precepts of human-computer interaction (HCI) design. It is also impossible for human translators to clean up all the problematic translations generated by the MT system [3].

Interactive and adaptive MT systems, such as the one developed by Lilt (https://labs.lilt.com/), have emerged on the market around the idea of an augmented translator made more productive by machine assistance, in which the machine continuously learns in real time from human feedback. This type of system tries to put the human back at the center of the translation process.

1.2 Human-Machine Symbiosis and Human-Human Collaboration

Human-machine symbiosis [4] has been identified as one of the primary challenges of HCI research. The ideal vision of human-machine symbiosis is one where humans are coupled to machines in a harmonious way. To achieve this, related technology needs to exhibit characteristics typically associated with human behavior and intelligence [4].

In this work, we study collaborations among human translators to help define potential Human-Computer Interaction (HCI) and Human-Machine Teaming (HMT) models for future human-centered machine translation systems in which an AI-based agent serves as a real-time partner. We are interested in answering the following questions: If we have an AI-based agent as a machine translation assistant, what are the main functions of the agent? How does the agent interact with a human translator so they can work holistically as partners? What human deficiencies can be augmented by an AI-based agent, and how? What kinds of human behavior should the AI-based agent mimic? What level of expertise should a human translator expect from an AI-based agent?

2 Methodologies

2.1 Translators’ Study Group

Human translators practice in order to gain competency in understanding the source language and generating appropriate target language based on that understanding. One way for translators to improve their skills is to hold a group dialog in which they: (1) explain and discuss their understanding of the source text and context; (2) explain and discuss their target language choices; (3) debate differing opinions, with the goal of building a consensus. The treasure trove of information encoded in these explanations, discussions, and dialogs can be of great use for defining potential human-machine interfaces and interactions.

We used a data set collected from an online translation study group. The group was composed of 32 members: 11 were American Translators Association (ATA) certified translators (specializing in the English-to-Chinese or Chinese-to-English language pairs), and 9 either held graduate degrees in translation, had multiple years of experience working for well-respected translation companies, or had experience translating published books. The goal of this study group was to improve translation skills and help the participants prepare for ATA certification exams. The format of the group study was as follows: every member translated an assigned piece, and all versions produced were then put into a shared document where everybody could comment on the other members’ results, so that a back-and-forth dialog could ensue from multiple comments on the same piece. In this paper, we studied 10 translation pieces (five English → Chinese and five Chinese → English) and analyzed 292 dialogs in which group members discussed the translation results for these pieces.

2.2 Research Questions

A dialog always started with an initial comment. The initial comments would focus on various categories such as:

  (1) a word, phrase, or terminology
  (2) grammar
  (3) sentence structure
  (4) punctuation
  (5) logic relation
  (6) misunderstanding on the source language content
  (7) faithfulness of target language delivery based on source language understanding
  (8) quality of target language delivery

Therefore, the first question we asked is which categories from the above list were discussed more frequently than the others, and why. For example, a human translator can easily identify a misused word, but may have a harder time identifying problematic sentence structures because of cognitive constraints and knowledge limits. Consequently, easily identifiable categories would be discussed more and hard-to-identify categories would be discussed less. From the perspective of human-machine symbiosis, machines can augment human capabilities on these hard tasks by automatically flagging the hard-to-identify categories for the human.

The next question we asked is whether a discussion (represented by a dialog) is effective in producing better source understanding or better target language delivery. For example, an initial comment might raise a question about the usage of a phrase but receive no follow-up discussion. This would indicate a discussion with no actual dialog move, or with an ineffective dialog move. We then summarized all the dialogs and determined what percentage of them were ineffective, effective, or very effective. From the perspective of human-machine symbiosis, we wanted to identify the defects of human-human dialog and see whether machines could augment humans in these problematic areas.
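
The two tallies described in this section (how often each category is discussed, and what share of dialogs is ineffective, effective, or very effective) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the `Dialog` record, its field names, and the `percentage_breakdown` helper are hypothetical, and the sample records are invented placeholders rather than data from the study group.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialog:
    """One annotated dialog (hypothetical record layout)."""
    category: str        # topic of the initial comment, e.g. "word/phrase"
    effectiveness: str   # "ineffective", "effective", or "very effective"


def percentage_breakdown(labels):
    """Return {label: percentage of all labels}, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Placeholder annotations for illustration only; NOT data from the study.
sample = [
    Dialog("word/phrase", "effective"),
    Dialog("word/phrase", "ineffective"),
    Dialog("sentence structure", "very effective"),
    Dialog("word/phrase", "effective"),
]

by_category = percentage_breakdown(d.category for d in sample)
by_effectiveness = percentage_breakdown(d.effectiveness for d in sample)
```

Keeping the category and the effectiveness label on the same record makes it straightforward to cross-tabulate the two later, for example to ask which categories tend to attract ineffective dialogs.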

2.3 Dialog Categories

Table 1 lists all the dialog categories defined at the sentence level and above. Table 2 lists all the dialog categories we defined at the word level.

Table 1. Dialog categories.
Table 2. Dialog categories (at the word-level).

2.4 Types of Collaborative Dialog Moves

We identify four types of collaborative dialog moves [5] towards an initial comment: (1) no dialog moves; (2) ineffective dialog moves; (3) simple but effective dialog moves; (4) constructive dialog moves such as argumentative moves (indirect check, challenge, and counterarguments), and constructive moves (adding information, explaining information, evaluating information, transforming information, summarizing information, etc.).

Explanation, in this context, serves as a mechanism through which the translators’ opinions can be connected and brought to the subject matter (e.g., “I used this word because I wanted to make the English translation as close to the style of an English paper’s editor”) and serves the broad function of guiding reasoning (e.g., “MO is usually used in reference to criminal acts, but it can be used within other contexts sometimes. What do you think?”). A well-explained dialog move is an effective move, while an ineffective dialog move is usually associated with a poor explanation (e.g., “I felt at that moment that I should use that word”).
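
One possible way to encode the four move types for annotation tooling is sketched below. The enum mirrors the four types named in this section, but everything else is an assumption made for illustration: the tag vocabulary (`"well explained"`, `"poorly explained"`), the `classify_thread` function, and its precedence rule are hypothetical and do not describe how the dialogs in this study were actually labeled.

```python
from enum import Enum


class MoveType(Enum):
    NO_MOVE = "no dialog moves"
    INEFFECTIVE = "ineffective dialog moves"
    SIMPLE_EFFECTIVE = "simple but effective dialog moves"
    CONSTRUCTIVE = "constructive dialog moves"


# Sub-moves that the text lists under the fourth type.
ARGUMENTATIVE = {"indirect check", "challenge", "counterargument"}
CONSTRUCTIVE = {
    "adding information", "explaining information", "evaluating information",
    "transforming information", "summarizing information",
}


def classify_thread(reply_tags):
    """Map annotator tags on the replies to one of the four move types.

    reply_tags holds one tag per reply to the initial comment. Each tag is
    a sub-move name from the sets above, "well explained", or
    "poorly explained" (hypothetical tag vocabulary).
    """
    if not reply_tags:
        return MoveType.NO_MOVE          # nobody responded to the comment
    if any(tag in ARGUMENTATIVE | CONSTRUCTIVE for tag in reply_tags):
        return MoveType.CONSTRUCTIVE     # assumed rule: any sub-move suffices
    if "well explained" in reply_tags:
        return MoveType.SIMPLE_EFFECTIVE
    return MoveType.INEFFECTIVE
```

For example, `classify_thread([])` yields `MoveType.NO_MOVE`, and a thread tagged `["well explained", "challenge"]` is treated as constructive under the assumed precedence rule.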

3 Analysis Results and Discussions

3.1 Text Sources for Translation Materials

Table 3 lists the titles and source links for the original text for the English-to-Chinese translation exercises. Table 4 lists the titles and source links for the original text for the Chinese-to-English translation exercises.

Table 3. Titles and links for the English-to-Chinese translation pieces (originals).
Table 4. Titles and links for Chinese-to-English translation pieces (originals).

3.2 Dialog Categories

Table 5 shows the dialog topic statistics for the five English-to-Chinese translation pieces, and since more than half of the dialogs focus on words/phrases, Table 6 shows the specific statistics for the word-level dialog topics regarding the same five English-to-Chinese translation pieces.

Table 5. Statistics on the dialog categories for five English-to-Chinese translation pieces (sentence level and above).
Table 6. Statistics on the dialog categories for five English-to-Chinese translation pieces (word/phrase level).

Table 7 shows the dialog topic statistics for the five Chinese-to-English translation pieces. Again, since more than half of the dialogs focus on words/phrases, Table 8 shows the specific statistics for the word-level dialog topics for the same five Chinese-to-English translation pieces. Figures 1 and 2 show the percentage comparison at the sentence level and above, and at the word level.

Table 7. Statistics on the dialog categories for five Chinese-to-English translation pieces (sentence level and above).
Table 8. Statistics on the dialog categories for five Chinese-to-English translation pieces (word/phrase level).
Fig. 1. Percentage comparison of various dialog categories between Chinese-to-English and English-to-Chinese translation pieces at sentence level and above.

Fig. 2. Percentage comparison of various dialog categories between Chinese-to-English and English-to-Chinese translation pieces at word level.

We can observe that word-level comments dominate (>50%). The next three largest categories, both at the sentence level and above and at the word level, are confirmation of good translation, source misunderstanding, and poor target expression delivery. At the word level, literal translation is also a significant category, although literal translation can also be classified as poor target expression delivery.

3.3 Dialog Types

Figures 3 and 4 show statistics on dialog move types for the English-to-Chinese translation sample pieces and the Chinese-to-English translation sample pieces. Across the different dialog categories, simple but effective dialog moves seem to be the dominant dialog move type. Constructive dialogs occurred occasionally but not very frequently.

Fig. 3. Statistics on dialog move types for the English-to-Chinese translation sample piece.

Fig. 4. Statistics on dialog move types for the Chinese-to-English translation sample piece.

3.4 Interpreting the Results and Implications for HCI/HMT

Based on the results listed in Sect. 3.3, we summarize the observations and the design implications for HCI/HMT in Table 9.

Table 9. Observations made based on collected statistics and their HCI design implications.

3.5 Discussion

A good translator does not want a machine translation engine that produces translation results on its own and leaves the human translator to edit the results afterwards. What is ideal for them is instead an AI-based translation agent that can serve as a real-time partner.

Even though it is important for this intelligent assistant to be able to produce some auto-translated contents on the fly, the assistant should also behave like a human partner (e.g. confirm good translation results or point out major problems with explanations and reasoning); have domain and linguistic knowledge similar to a human expert; have superior cognitive capability and unique analytic perspective to complement human deficiencies; have the ability to perform quick information search and retrieval to support real-time interaction; and have the ability to adapt to different characteristics of the human translator it works with (e.g. native/non-native speaker of certain language; level of knowledge in certain domain). Consequently, the assistant can help the human translator to understand the source content better and deliver better target expressions.

Leveraging machine understanding to complement human understanding and using human inputs to guide machine understanding are key building blocks for developing an AI-based machine translation agent.

Another challenging, but potentially rewarding, topic is how to facilitate in-depth constructive dialogs between a human translator and the machine partner. The goal is for the two to stimulate each other and push the understanding forward through conversation; this is the ideal symbiosis scenario between human understanding and machine understanding.

4 Conclusion

The objective of this study is to define a Human-Machine Teaming (HMT) model for AI-powered human-centered machine translation systems by learning from human-human group discussion. For this purpose, we used a data set collected from an online translation study group composed of expert and experienced translators. The data set had 10 translation pieces (five English → Chinese and five Chinese → English), and we analyzed 292 dialogs in which group members discussed the translation results for these pieces. We studied dialog categories and dialog moves. For dialog categories, word-level comments dominated the discussion (>50%), and the next three largest categories, both at the sentence level and above and at the word level, were confirmation of good translation, source misunderstanding, and poor target expression delivery. For dialog moves, simple but effective dialog moves seem to be the dominant dialog move type. Constructive dialog moves, such as argumentative moves (indirect check, challenge, and counterarguments) and constructive moves (adding information, explaining information, evaluating information, transforming information, summarizing information, etc.), occurred occasionally but not very frequently.
Based on these findings, we derive the HCI/HMT design implications for an AI-based agent: provide better capability beyond the word/phrase level to complement human deficiencies; focus on building algorithms to support better source understanding and target expression delivery; provide quick information search and retrieval to support real-time interaction; confirm a human partner’s good translations with reasons and explanations; provide help with source understanding and target language delivery based on the native language of the human partner; act in the role of a “lead translator” who has better domain and linguistic knowledge, superior cognitive capability, and a unique analytic perspective to complement human deficiencies; and conduct in-depth constructive dialogs with human partners in which each side stimulates the other’s thinking.