Accurate machine translation of real-world conversational interactions requires an understanding of the context of the interaction.
Context includes such factors as where and when the interaction takes place, objects in the real-world environment, the discourse history,
the topic and purpose of the conversation, relationships between participants, etc. This type of knowledge is typically not exploited in current
statistical machine translation systems. In this pilot project we studied statistical machine translation of realistic meeting
conversations. Our main goal was to analyze the impact of contextual modeling on machine translation performance.
One of the problems of translating conversational interactions is the lack of
parallel conversational corpora. For this project, we used a subset of
the AMI corpus and produced English-to-German translations of
the audio transcriptions, resulting in a small parallel corpus that was used
for system tuning and evaluation. A baseline statistical MT system was built
for this task using out-of-domain corpora. The errors made by this system on
the meeting translation task were then analyzed by hand and classified into
different error categories. In particular, we were interested in errors due to
the lack of contextual modeling. The analysis showed that most contextual
translation errors are word sense errors caused by the lack of an explicit model of the
topic or domain. To address these errors, we have developed unsupervised
translation disambiguation algorithms.
- Katrin Kirchhoff (PI), Department of Electrical Engineering, UW
- Joyce Chai (PI), Department of Computer Science, Michigan
- Mei Yang, Department of Electrical Engineering, UW
M. Yang & K. Kirchhoff, "Contextual modeling for meeting translation using
unsupervised word sense disambiguation", Proceedings of the 23rd
International Conference on Computational Linguistics (COLING), 2010,
pp. 1227-1235