Robust Speech Recognition

Recognition under observation uncertainty
Statistical Speech Processing
Blind Source Separation
Audiovisual Speech Recognition

Recognition under observation uncertainty

To recognize speech reliably in noisy or reverberant environments, the acoustic distortions must be modeled as precisely as possible. We achieve this with the help of multichannel information and by analyzing the spectro-temporal evolution of the microphone signal(s).

The distortion model is used first to improve the quality of the audio signal itself (Statistical Speech Processing). Despite this enhancement, residual errors remain in the signal, and, depending on the speech processing technique used, new artifacts may be introduced.
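One common statistical enhancement scheme of this kind is Wiener filtering of the short-time spectrum. The sketch below is purely illustrative and not the group's actual method: it computes a per-frequency Wiener gain from estimated noisy and noise power spectra, with a spectral floor to limit musical-noise artifacts (the function name, the floor value, and the toy numbers are all assumptions).

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.1):
    """Per-bin Wiener filter gain, with a spectral floor to limit
    musical-noise artifacts (simplified, illustrative only)."""
    # A-posteriori SNR estimate per frequency bin (clipped at zero).
    snr = np.maximum(noisy_psd - noise_psd, 0.0) / (noise_psd + 1e-12)
    gain = snr / (1.0 + snr)
    return np.maximum(gain, floor)

# Toy example: one bin dominated by speech, one dominated by noise.
noisy_psd = np.array([10.0, 1.2])   # hypothetical noisy power spectrum
noise_psd = np.array([1.0, 1.0])    # hypothetical noise estimate
g = wiener_gain(noisy_psd, noise_psd)
enhanced_psd = g ** 2 * noisy_psd   # attenuated output power
```

The speech-dominated bin keeps most of its energy, while the noise-dominated bin is strongly attenuated; the floor illustrates one source of the residual errors and artifacts mentioned above.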

To obtain the best possible speech recognition results despite the sub-optimal signal quality, we transmit an estimate of the remaining errors to the recognition engine. This allows the recognizer to focus on the more reliable components of the signal and to reduce the influence of distorted components accordingly.
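A standard way to realize this coupling is uncertainty decoding: the per-dimension error variance of the enhanced feature vector is added to the acoustic model's variances, so unreliable feature dimensions contribute less to the likelihood. The following sketch (my own minimal example, not the chapter's algorithm; the model parameters and observation values are invented) shows this for a diagonal-covariance Gaussian mixture:

```python
import numpy as np

def log_likelihood_uncertain(y, var_y, means, variances, log_weights):
    """Log-likelihood of observation y under a diagonal-covariance GMM,
    with the per-dimension observation uncertainty var_y added to each
    component variance (uncertainty decoding).  Dimensions with large
    var_y are effectively down-weighted."""
    var = variances + var_y                    # broadcast over components
    log_comp = (log_weights
                - 0.5 * np.sum(np.log(2 * np.pi * var)
                               + (y - means) ** 2 / var, axis=1))
    m = log_comp.max()                         # stable log-sum-exp
    return m + np.log(np.sum(np.exp(log_comp - m)))

# Toy 2-component, 2-dimensional model (all numbers illustrative).
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
log_w = np.log(np.array([0.5, 0.5]))
y = np.array([2.8, 0.1])                       # dimension 2 is distorted
var_y = np.array([0.0, 5.0])                   # large uncertainty in dim 2
ll = log_likelihood_uncertain(y, var_y, means, variances, log_w)
```

With the distorted dimension de-weighted, the observation is scored mainly on its reliable dimension, which is exactly the behavior described above.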

Several algorithms for realizing such a tight coupling between speech processing and speech recognition are described in Chapter_UncertaintyOfObservation.pdf (in: "Robust Speech Recognition of Uncertain or Missing Data - Theory and Applications", Springer Verlag, July 2011).