
UW-EE SSLI-LAB Machine Translation Reading Group

The ongoing saga of our quest to become experts in machine translation (MT) by reviewing and discussing both classic (and sometimes outdated) and recent techniques for MT, most of them statistical and some non-statistical. We will also cover other relevant papers from computational linguistics and machine learning. The group meetings will be informal, and we encourage creative discussion in any time that remains after a paper has been reviewed. See below for upcoming calls for papers and related links.

The group is part of SSLI-Lab. The Signal, Speech, and Language Interpretation (SSLI) Laboratory at the University of Washington Department of Electrical Engineering conducts research on all methods of working with time signals, in particular speech and language, but other forms of such signals as well.

To receive announcements for this group, send mail to katrin@ee.washington.edu and/or bilmes@ee.washington.edu.

If you would like to lead a discussion (and we encourage you to volunteer), please email. While the list of papers below will give you something to choose from, you are encouraged to suggest other relevant papers in this area.


In the Fall 2005 quarter, we will meet every week in room Sieg-424 on Wednesdays from 3:30-5:30pm.

Discussions, Fall 2005

Topic Readings Date/Time/Location Discussion leader Slides Notes
Word Alignment B. Taskar, S. Lacoste-Julien and D. Klein, A Discriminative Matching Approach to Word Alignment Wed, Oct 19th, 3:30-5:30 Sieg-424 Takahiro Shinozaki slides notes
Translation Model D. Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation Wed, Oct 26th, 3:30-5:30 Sieg-424 Karim Filali - -
Reordering S. Kanthak et al. Novel reordering approaches in phrase-based statistical machine translation Wed, Nov 2nd, 3:30-5:30 Sieg-424 Katrin Kirchhoff slides notes
Example-based MT Lavie et al. A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources Wed, Nov 9th, 3:30-5:30 Sieg-424 Marcus Sammer and Ethan Phelps slides notes
Reordering Michael Collins, Philipp Koehn, and Ivona Kucerova. Clause Restructuring for Statistical Machine Translation Wed, Nov 16th, 3:30-5:30 Sieg-424 Sarah Schwarm - -
- - Wed, Nov 23rd, 3:30-5:30 Sieg-424 - - -
- - Wed, Nov 30th, 3:30-5:30 Sieg-424 - - -
- - Wed, Dec 7th, 3:30-5:30 Sieg-424 - - -

Discussions, Spring 2005

Topic Readings Date/Time Location Discussion leader Notes
Example based MT this (from ACL91) and that (from ACL96) (paper 1 and 2 below) Thursday, April 21st, 2005 9:00am Sieg-424 Kevin Duh slides for today.
Algorithms for Syntax-Aware Statistical Machine Translation link (or see Melamed's #14 below) Thursday, April 28th, 2005 9:00am Sieg-424 Jeremy Kahn slides
PCFGs and the Inside/Outside Algorithm Charniak, "Statistical Language Learning", Chapters 5-7 (see email for reading). Thursday, May 12th, 2005 9:00am Sieg-424 Jeff Bilmes slides
Continuation of last week - Thursday, May 19th, 2005 9:00am Sieg-424 Jeff Bilmes -
UW/ISI meeting UW/ISI meeting Thursday, May 26th, 2005 9:00am Sieg-424 UW/ISI meeting -
Machine Translation with Inferred Stochastic Finite-State Transducers off-campus link and on-campus link Thursday, June 2nd, 2005 9:00am Sieg-424 Sarah Schwarm PDF slides

Discussions, Winter 2005

Topic Readings Date/Time Location Discussion leader Notes
The Web as a Parallel Corpus, by Resnik and Smith link Thursday, March 3rd, 2005 12:00pm Sieg-424 Kevin Duh -
CANCELLED CANCELLED Thursday, Feb 24th, 2005 12:00pm Sieg-424 Mari Ostendorf/Sarah Schwarm -
Phrase pair rescoring with term weightings for statistical machine translation, by B. Zhao, S. Vogel and A. Waibel link Thursday, Feb 17th, 2005 12:00pm Sieg-424 Katrin Kirchhoff notes from today.
Towards MRS-Based Norwegian-English MT, Oepen et al. from TMI 2004. link Thursday, Feb 10th, 2005 12:00pm Sieg-424 Emily Bender notes from today.
"Improving IBM Word-alignment Model 1" by Robert Moore link Thursday, Feb 3rd, 2005 12:00pm Sieg-424 Karim Filali -
"Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information" by Sonja Niessen and Hermann Ney link (you might need to be on campus to get this, or see here or here or here for alternatives). Thursday, Jan 27, 2005 12:00pm Sieg-424 Jeremy Kahn slides

Discussions, Fall 2004

Topic Readings Date/Time Location Discussion leader Notes
Orange: a Method for Evaluating Automatic Metrics for Machine Translation, link paper from Coling'04 (# 7 below) Wed, Oct 20, 2004 3:30pm Sieg-424 Jeff Bilmes PPT slides from today. notes from today.
Reordering Constraints for Phrase-based Statistical MT link (# 12 below) Wed, Oct 27, 2004 3:30pm Sieg-424 Karim Filali notes from today. slides from today.
Improving a Statistical MT System with Automatically Learned Rewrite Patterns (# 13 below) ssli-local link, external link Wed, Nov 3rd, 2004 3:30pm Sieg-424 Kevin Duh ppt slides from today's group.
Language Model Adaptation for Statistical Machine Translation via Structured Query Models link or (# 2 below) Wed, Nov 10th, 2004 3:30pm Sieg-424 Sarah Schwarm pdf slides from today.
Confidence Estimation for Machine Translation, Blatz et al. Coling'04. (# 10 below) link Wed, Nov 17th, 2004 3:30pm Sieg-424 Takahiro Shinozaki ppt slides from today.
MT as object in image recognition Object Recognition as Machine Translation, ... pdf link or here (# 11 below) Wed, Nov 24th, 2004 3:30pm Sieg-424 Katrin Kirchhoff notes and slides from today.
POSTPONED - Wed, Dec 1st, 2004 3:30pm Sieg-424 - -
Review of the July 2004 DARPA TIDES meeting. reading material for today. Wed, Dec 8th, 2004 3:30pm Sieg-424 Mari Ostendorf -

Discussions, Spring 2004

Topic Readings Date/Time Location Discussion leader Notes
Translation Templates for MT link (# 11 below) April 12th, 2004 3:00pm AE-108 Jeremy Kahn notes
- Postponed (thesis conflicts) April 19th, 2004 3:00pm MEB 251 - -
- Postponed (more thesis conflicts) April 26th, 2004 3:00pm MEB 251 - -
- - May 3rd, 2004 3:00pm No meeting HLT-NAACL'04 - -
Minimal Recursion Semantics link1 and link2 May 10th, 2004 3:00pm MEB 251 Emily Bender notes
TBD - May 17th, 2004 3:00pm No meeting ICASSP - -
"Improved machine translation performance via parallel sentence extraction from comparable corpora," by D. Munteanu, A. Fraser and D. Marcu, from HLT-NAACL04 link May 24th, 2004 3:00pm MEB 251 Mari Ostendorf -
TBD - May 31st, 2004 3:00pm MEB 251 Katrin Kirchhoff -
TBD - June 7th, 2004 3:00pm MEB 251 Gang Ji Last day of quarter
Next in order: - - - Karim Filali, Kevin, Taka, Sarah -

Discussions, Winter 2004

(Note that some of the dates below fall during the break; those meetings will be rescheduled for next quarter.)
Topic Readings Date/Time Location Discussion leader Notes
Word clustering Brown, class-based Jan 15th, 2004 1:00pm AE-108 Jeff Bilmes notes
Two papers on phrase-based translation Koehn et al. (#3) and Marcu and Wong (#4) below Jan 22nd, 2004 1:00pm AE-108 Sarah Schwarm discussion notes (updated Jan 23, 2004)
Syntax-based translation Yamada and Knight, paper #1 below Jan 29th, 2004 1:00pm AE-108 Kevin Duh discussion notes (updated Jan 30, 2004)
Tree-based alignment D. Gildea (paper #2 below) Feb 5th, 2004 1:00pm AE-108 Katrin Kirchhoff discussion notes (updated Feb 9, 2004)
More on MT Evaluation using string-to-string distance Paper by Leusch, Ueffing, and Ney link (#6 below) Feb 12th, 2004 1:00pm AE-108 Franz Pernkopf discussion notes (updated Feb 12, 2004)
Two papers on dynamic programming based search. For Statistical Machine Translation and Using Monotone Alignments in Statistical Translation (links #7 and #8 below) Feb 19th, 2004 1:00pm AE-108 Kwong Tim Ng -
No Meeting due to UW-ISI Kickoff meeting in LA. - Feb 26th, 2004 1:00pm AE-108 - -
Learning Dependency Transduction Models Paper by Alshawi and Douglas, pdf link (Paper #8 below) March 4th, 2004 1:00pm AE-108 Karim Filali notes and pdf_notes
Language Modeling Day "Statistical Language Modeling based on Variable-Length Sequences", Computer Speech and Language, (17)27-41, 2003. link March 11th, 2004 1:00pm AE-108 Ivan Bulyko Last meeting of the quarter.
TBD - March 18th, 2004 1:00pm AE-108 Jeremy Kahn -
TBD - March 25th, 2004 1:00pm AE-108 Mari Ostendorf -
TBD - April 1st, 2004 1:00pm AE-108 Emily Bender -

Discussions in Autumn, 2003

Topic Readings Date/Time Location Discussion leader Notes
Introduction, Overview K. Knight tutorial Oct 13, 2003, 3:30pm EE1-M306 Jeff Bilmes -
The mathematics of statistical MT, by Brown et al. link Oct 20, 2003, 3:30pm EE1-M306 Luca Giacinto Cazzanti and Jeff Bilmes (finished up to model 3, models 4 and 5 next week)
Finish math of stat MT models 4 and 5 (Luca); Katrin will discuss 2 papers on alignment (papers 4 and 5 below) and one on higher-level structure (number 10 below) link1, link2, and link3. Oct 27, 2003, 3:30pm AE-108 Luca Giacinto Cazzanti and Katrin Kirchhoff Finish up last week and continue. Note: Katrin's papers could be classified as either higher-level structure or alignment.
Och et al.'s "Improved Alignment Models.." paper, in addition to paper 3 from last week. link1(pdf), link2 Nov 3rd, 2003, 3:30pm AE-108 Katrin Kirchhoff -
Evaluation methods (BLEU point/counterpoint), papers 1, 2, and 3 (and we also did 5) below link1, link2, link3, and link4 Nov 10th, 2003, 3:30pm AE-108 Mari Ostendorf and Sarah Schwarm discussion notes (updated Nov 14, 2003)
Two papers on search methods for MT (2 and 3 below), one by Garcia-Varea et al. and another by Germann et al. link1 and link2 Nov 17th, 2003, 3:30pm AE-108 Karim Filali Paper 6 below (pdf link) is prerequisite reading. discussion notes (updated Nov 17, 2003)
Grammars of standard and more obscure languages, implications for MT. morphology, languages, and aspect. Nov 24th, 2003, 3:30pm AE-108 Emily M. Bender and Jeremy G Kahn notes and handout
No meeting due to ASRU'2003 - Dec 1st, 2003, 3:30pm AE-108 - -
Papers 1 and 2 below under Other Knowledge Sources. link1 and link2 Dec 8th, 2003, 3:30pm AE-105 Katrin Kirchhoff discussion notes (updated Dec 9, 2003) This is the last meeting of the quarter.

Partial (and growing) Lists of Suggested Papers

(please send email with any others that you think should be added to the list).

Introductory/Classical Papers in Statistical MT

  1. Chapter from Russell & Norvig's book "AI: A Modern Approach, 2nd Edition". The 2nd edition's chapter on MT is very good.
  2. Chapter on MT from Jurafsky & Martin's book "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition"
  3. A Statistical MT Tutorial Workbook, Kevin Knight, unpublished, August 1999 pdf link
  4. P.F. Brown et al., A statistical approach to machine translation, Computational Linguistics, 16, 1990, 79-85 link
  5. S. Della Pietra and M. Epstein and S. Roukos and T. Ward, Fertility models for statistical natural language understanding, Proceedings of ACL, 1997, 168-173 link
  6. The Candide System for Machine Translation (1994), Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Lubos Ures link
  7. P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19:263-312, June 1993. link
  8. Kenji Yamada's Intro to Statistical Machine Translation Notes link
  9. Knight and Koehn HLT tutorial: "What's New in Statistical Machine Translation" link

Evaluation of MT quality

  1. K. Papineni and S. Roukos and T. Ward and W. Zhu, Bleu: a method for automatic evaluation of machine translation, 2001, IBM Research Report RC22176 link
  2. Hovy E., King M. & Popescu-Belis A. (2002) - An Introduction to MT Evaluation. LREC 2002 Workshop "Machine Translation Evaluation: Human Evaluators Meet Automated Metrics", Las Palmas de Gran Canaria, Spain, p.1-7. link
  3. Culy, Christopher and Riehemann, Susanne. 2003. The Limits of N-Gram Translation Evaluation Metrics. Proceedings of MT Summit IX. link
  4. FEMTI, the Framework for Machine Translation Evaluation in ISLE, a structured repertoire of methods used to evaluate MT systems. link1 and link2
  5. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, NIST document. link
  6. A Novel String-to-String Distance Measure With Applications to Machine Translation Evaluation, by Leusch, Ueffing, and Ney. link
  7. Orange: a Method for Evaluating Automatic Metrics for Machine Translation, by Lim and Och, Coling04, link

Using higher level structure

  1. A Syntax-based Statistical Translation Model (2001) Kenji Yamada, Kevin Knight Meeting of the Association for Computational Linguistics link
  2. Modeling with Structures in Statistical Machine Translation (1997) Ye-Yi Wang, Alex Waibel COLING-ACL link
  3. Statistical Phrase-Based Translation, Philipp Koehn, Franz Josef Och, and Daniel Marcu, HLT/NAACL 2003 link
  4. A phrase-based joint probability model for statistical machine translation, D. Marcu, Proceedings of EMNLP, 2002 link
  5. Marcu, D. and Wong, W. (2002). A phrase-based, joint probability model for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP.
  6. Och, F. J., Tillmann, C., and Ney, H. (1999). Improved alignment models for statistical machine translation. In Proc. of the Joint Conf. of Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20-28. link
  7. "A Syntax-Based Statistical Translation Model" (K. Yamada and K. Knight), Proc. of the Conference of the Association for Computational Linguistics (ACL), 2001. link
  8. "A Decoder for Syntax-Based Statistical MT" (K. Yamada and K. Knight), Proc. of the Conference of the Association for Computational Linguistics (ACL), 2002. link
  9. Syntax-based Language Models for Machine Translation, E. Charniak, K. Knight, and K. Yamada, Proc. MT Summit IX, 2003. link
  10. Wang and Waibel, "Modeling with structures in statistical machine translation", Proceedings of COLING-ACL, 1998 link
  11. Inducing Translation Templates for Example-Based Machine Translation, by Michael Carl. link
  12. Reordering Constraints for Phrase-based Statistical MT, by Richard Zens, Hermann Ney, Taro Watanabe, and Eiichiro Sumita, from Coling'04 link
  13. Improving a Statistical MT System with Automatically Learned Rewrite Patterns, by Fei Xia and Michael McCord, COLING-2004. link
  14. Dan Melamed (2004). Algorithms for Syntax-Aware Statistical Machine Translation PS PDF Proceedings of the Conference on Theoretical and Methodological Issues in Machine Translation (TMI'04), Baltimore, MD.

Search/Decoding/Alignment

  1. Decoding Algorithm in Statistical Machine Translation, Ye-Yi Wang and Alex Waibel. Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics link
  2. Search algorithms for statistical machine translation based on dynamic programming and pruning techniques, I. Garcia-Varea and F. Casacuberta, Proceedings of the MT Summit VIII Workshop, 2001, link
  3. Germann, U., Jahr, M., Knight, K., Marcu, D., and Yamada, K. (2001). Fast decoding and optimal decoding for machine translation. In Proceedings of ACL 39. link
  4. Vogel, Ney and Tillmann, "HMM-based word alignment in statistical translation", Proceedings of COLING 1996 link
  5. Och and Ney, "A comparison of alignment models for statistical machine translation", Proceedings of COLING 2000 link
  6. An Iterative, DP-based Search Algorithm for Statistical Machine Translation, Ismael Garcia-Varea, Francisco Casacuberta, and Hermann Ney, ICSLP'98 link
  7. A DP based Search Using Monotone Alignments in Statistical Translation (1997) C. Tillmann, S. Vogel, H. Ney, A. Zubiaga link
  8. A DP based Search Algorithm for Statistical Machine Translation (1998) S. Niessen, S. Vogel, H. Ney, C. Tillmann link
  9. Accelerated DP Based Search For Statistical Translation (1997) C. Tillmann, S. Vogel, H. Ney, A. Zubiaga, H. Sawaf link
  10. A Polynomial-Time Algorithm for Statistical Machine Translation, Dekai Wu link

Other Knowledge Sources

  1. Knowledge Sources for Word-Level Translation Models, P. Koehn and K. Knight, link
  2. Morpho-Syntactic Analysis for Reordering in Statistical Machine Translation (2001), Sonja Niessen, Hermann Ney link
  3. Ismael Garcia Varea, Franz Josef Och, Hermann Ney, Francisco Casacuberta. "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach". In: "ACL 2001: Proc. of the 39th Annual Meeting of the Association for Computational Linguistics", pp. 204-211, Toulouse, France, July 2001. link
  4. Hearne, Mary and Way, Andy. 2003. Seeing the Wood for the Trees: Data-Oriented Translation. Proceedings of MT Summit IX. link

Alignment

  1. Improved Alignment Models for Statistical Machine Translation (1999), Franz Josef Och, Christoph Tillmann, Hermann Ney link
  2. Loosely Tree-Based Alignment for Machine Translation, by Dan Gildea link (also see here for a JHU'03 presentation on the subject).

Maxent MT models

  1. A Maximum Entropy/Minimum Divergence Translation Model (2000) George Foster, link

Statistical Language Models

  1. Class-based n-gram models of natural language by P. Brown, V. Della Pietra, P. deSouza, and J. Lai link
  2. Language Model Adaptation for Statistical Machine Translation via Structured Query Models, by Bing Zhao, Matthias Eck and Stephan Vogel link

Other/All of the above/General/Specific

  1. Statistical Multi-Source Translation, Franz Josef Och and Hermann Ney link
  2. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation, Franz Josef Och and Hermann Ney, Proc. 40th Annual Meeting of the ACL, Philadelphia, 2002 link
  3. Minimum Error Rate Training in Statistical Machine Translation pp. 160-167 Franz Josef Och, ACL-2003 link
  4. Franz Josef Och, Hermann Ney. What Can Machine Translation Learn from Speech Recognition? In: "Workshop: MT 2010 - Towards a Road Map for MT", pp. 26-31, Santiago de Compostela, Spain, September 2001. link
  5. Models of Translational Equivalence among Words, I. Dan Melamed link
  6. Towards a Unified Approach to Memory- and Statistical-Based Machine Translation, Daniel Marcu link
  7. Machine Transliteration, (K. Knight and J. Graehl), Computational Linguistics, 24(4), 1998. link
  8. Alshawi, H. and Douglas, S., "Learning Dependency Transduction Models from Unannotated Examples," Phil. Trans. of the Royal Society, Series A: Mathematical, Physical and Engineering Sciences, vol. 358, 2000, pp. 1357-1372. link, pdf link
  9. "Machine Transliteration," (K. Knight and J. Graehl), Proc. of the Conference of the Association for Computational Linguistics (ACL), 1997. link
  10. Confidence Estimation for Machine Translation, John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis and Nicola Ueffing, pp. 315-321, Coling 04, link
  11. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, P. Duygulu, K. Barnard, J.F.G. de Freitas and D.A. Forsyth, 7th European Conference on Computer Vision (Best paper award winner). link

Traditional Symbolic AI papers on MT (Example & Interlingual based MT)

  1. Examples and Prospects of Example Based MT E. Sumita, H. Iida, H. Kohyama (ATR). Proc. ACL 1991 link
  2. Example Based MT in the Pangloss system RD Brown (CMU). Proc. ACL 1996 link

Relevant Calls for Papers

The following sites list recent and future CFPs in the area of machine translation

Related links


Maintained by Jeff Bilmes/Katrin Kirchhoff
This material is based upon work supported by the National Science Foundation under Grant No. 0326276. Last updated: 2006/02/21 02:24:13