Contextual Machine Translation

Overview

Accurate machine translation of real-world conversational interactions requires an understanding of the context of the interaction. Context includes factors such as where and when the interaction takes place, objects in the real-world environment, the discourse history, the topic and purpose of the conversation, and the relationships between participants. This type of knowledge is typically not exploited in current statistical machine translation systems. In this pilot project we studied statistical machine translation of realistic meeting conversations. Our main goal was to analyze the impact of contextual modeling on machine translation performance.

Methods

One problem in translating conversational interactions is the lack of parallel conversational corpora. For this project we used a subset of the AMI corpus and produced English-to-German translations of the audio transcriptions, resulting in a small parallel corpus that was used for system tuning and evaluation. A baseline statistical MT system was built for this task using out-of-domain corpora. The errors made by this system on the meeting translation task were then analyzed by hand and classified into error categories. In particular, we were interested in errors due to the lack of contextual modeling. The analysis showed that most contextual translation errors are word sense errors caused by the lack of an explicit model of the topic or domain. To address these errors, we developed unsupervised translation disambiguation algorithms.
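To make the idea of context-based translation disambiguation concrete, here is a minimal toy sketch in Python. It is not the algorithm developed in this project. It only illustrates the general principle of choosing among candidate translations by looking at the surrounding discourse. The candidate table, the cue words, and the function name `disambiguate` are all hypothetical and were made up for this illustration.

```python
from collections import Counter

# Hypothetical toy data (an assumption, not taken from the project):
# candidate German translations of an ambiguous English word, each paired
# with English cue words that tend to co-occur with that sense.
CANDIDATES = {
    "remote": {
        "Fernbedienung": {"control", "button", "tv", "battery", "device"},
        "abgelegen": {"village", "area", "island", "far", "region"},
    },
}


def disambiguate(word, context_tokens, candidates=CANDIDATES):
    """Pick the translation whose cue words overlap most with the context."""
    options = candidates[word]
    # Count how often each lowercased token occurs in the discourse context.
    context = Counter(token.lower() for token in context_tokens)
    # Score each candidate by the number of cue-word occurrences in context.
    scores = {
        translation: sum(context[cue] for cue in cues)
        for translation, cues in options.items()
    }
    # max() returns the first maximal key, so ties fall back to table order.
    best = max(scores, key=scores.get)
    return best, scores


# Usage: the discourse history of a product design meeting.
history = "the remote control needs a bigger button and a battery".split()
best, scores = disambiguate("remote", history)
print(best)    # Fernbedienung
print(scores)  # {'Fernbedienung': 3, 'abgelegen': 0}
```

A hand-written cue table like this would not scale to a real system. An unsupervised approach would have to derive such associations from data rather than from manual annotation, and the sketch leaves that step out.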

Team Members



Publications

M. Yang & K. Kirchhoff, "Contextual modeling for meeting translation using unsupervised word sense disambiguation", Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010, pp. 1227-1235