Automatic Speech Recognition and Computational Modeling of
Much of my work has been on acoustic-phonetic modeling in
automatic speech recognition. I am interested in how to find suitable
representations of acoustic-phonetic categories, how to model
phenomena such as pronunciation variation and coarticulation, and how
to best exploit training data. In the past I have primarily explored
articulatory feature representations for both ASR and automatic language
identification. More recently, I have developed an interest in semi-supervised
graph-based machine learning methods for acoustic
modeling (more info here).
Statistical Language Modeling
Motivated by the problem of language modeling for
morphologically rich languages, I have been working on the
development of Factored Language Models (FLMs), which
represent words as sets of feature vectors and derive more
robust probability estimates by way of a Generalized Backoff
procedure that makes use of the feature structure. A
relatively short description can be
found here; a more
extensive tutorial can be found
here. FLMs have been integrated into
the SRI language modeling toolkit
and have been applied to language modeling for a variety of
languages. Our own work has mostly focused
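The backoff idea described above can be illustrated with a small sketch. This is not the SRI toolkit implementation: the toy corpus, the choice of factors (word, stem, tag), the function names, the constant backoff weight `alpha`, and the use of the maximum over backoff paths are all assumptions made for illustration, and the scores are unnormalized rather than properly discounted probabilities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each token is a bundle of factors (word, stem, tag).
CORPUS = [
    ("walks", "walk", "VBZ"), ("home", "home", "NN"),
    ("walked", "walk", "VBD"), ("home", "home", "NN"),
    ("runs", "run", "VBZ"), ("away", "away", "RB"),
]

# Factors of the previous token that serve as conditioning context.
PARENTS = ("word", "stem", "tag")


def train(corpus):
    """Count (context, next word) events for every subset of parent factors."""
    joint, context = Counter(), Counter()
    for prev, cur in zip(corpus, corpus[1:]):
        factors = dict(zip(PARENTS, prev))
        for k in range(len(PARENTS) + 1):
            for subset in combinations(PARENTS, k):
                key = tuple((name, factors[name]) for name in subset)
                joint[key, cur[0]] += 1
                context[key] += 1
    return joint, context


def score(word, prev, joint, context, alpha=0.4):
    """Score `word` given the previous token, backing off over parent factors."""
    def recurse(key):
        if joint[key, word] > 0:
            # Context seen with this word: relative-frequency estimate.
            return joint[key, word] / context[key]
        if not key:
            # No parents left and the word was never seen.
            return 0.0
        # Try every way of dropping one parent factor and keep the best path
        # (an assumed combination rule; others, such as the mean, are possible).
        children = [key[:i] + key[i + 1:] for i in range(len(key))]
        return alpha * max(recurse(child) for child in children)

    return recurse(tuple(zip(PARENTS, prev)))


joint, context = train(CORPUS)
# "walking" never occurs in the corpus, but its stem "walk" does, so the
# model can still back off to the stem context instead of ignoring it.
print(score("home", ("walking", "walk", "VBG"), joint, context))
```

The point of the sketch is that an unseen word form does not force an immediate fallback to a context-free estimate: the model can drop the word factor and still condition on the stem or the tag, which is what makes the approach attractive for morphologically rich languages.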
My students and I have been active in the development of machine translation systems for spoken and written language.
In our research we have explored morphological and factored language models for statistical machine translation.
More recently I have become interested in modeling the global situational
and discourse context of a document or an interaction to improve translation performance
(see our past project on Contextual Machine Translation) and in
machine translation for applications in the health domain (see our current TransPHorm project).
Multilingual Speech and Language Processing
I am interested in all forms of statistical speech and language processing for non-English languages. My past work in this area includes automatic speech recognition for Arabic; language identification for a variety of languages; language modeling for Turkish and Arabic; machine translation for Spanish, Italian, German, Finnish, French, Arabic, and Chinese; and lexicon development for dialectal Arabic, among other projects.
Speech and Language Processing for Biomedical and Health Applications
Recently I have developed an interest in applying statistical speech
and language processing to improve access to health information and
to biomedical research problems. My main project in this area is the TransPHorm
project. I have also worked on applying natural language processing
representations to the problem of peptide identification.
Human and Machine Learning in Speech Processing
Another pet interest of mine is how human learners (i.e., infants) acquire speech, how machines can be made to learn models of speech, and what the parallels and differences between the two processes are.
Click on the links below to find out more about my past and current research projects.