Robust Speech Recognition

Blind Source Separation

When multiple talkers are speaking simultaneously, blind source separation can be used to segregate their speech signals.

For this purpose, several microphone signals are recorded synchronously. Each microphone signal can be interpreted as a sum of all speech signals, each convolved with the room transfer function from that speaker to that microphone. Assuming that all sources are statistically independent, it is often possible to infer the relevant characteristics of these room transfer functions and to use this knowledge to estimate each speaker's signal in isolation.
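The independence principle can be illustrated with a minimal sketch that assumes the simplest, instantaneous case: each microphone signal is a fixed weighted sum x = A s of the sources, without any room reverberation. The code below implements symmetric FastICA with a tanh nonlinearity, a standard ICA algorithm chosen here for illustration; it is not necessarily the method described in the chapter referenced at the end of this page. The function name fast_ica, the mixing matrix, and the Laplacian test signals (a rough stand-in for the heavy-tailed amplitude distribution of speech) are hypothetical. Convolutive room mixtures are commonly handled by applying such an instantaneous method separately in each frequency bin of a short-time Fourier transform.

```python
import numpy as np


def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (illustrative sketch).

    x: observed mixtures, shape (n_mics, n_samples), assumed to follow the
       instantaneous model x = A s with a square, invertible A (an assumption;
       real rooms produce convolutive mixtures).
    Returns source estimates, shape (n_mics, n_samples), recovered only up
    to an unknown permutation and scaling.
    """
    rng = np.random.default_rng(seed)

    # Center the mixtures and whiten them so that their covariance is the identity.
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    z = e @ np.diag(1.0 / np.sqrt(d)) @ e.T @ x

    n, n_samples = z.shape
    w = rng.standard_normal((n, n))
    for _ in range(n_iter):
        # Fixed-point update for every row w_i:
        #   w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i,  with g = tanh
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        w_new = (g @ z.T) / n_samples - np.diag(g_prime.mean(axis=1)) @ w

        # Symmetric decorrelation w <- (w w^T)^(-1/2) w keeps the rows orthonormal,
        # so different rows cannot converge to the same source.
        s, u = np.linalg.eigh(w_new @ w_new.T)
        w_new = u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ w_new

        # Converged when each row points in (almost) the same direction as before.
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two independent, heavy-tailed "speaker" signals (hypothetical test data).
    s = rng.laplace(size=(2, 20000))
    # Hypothetical mixing matrix: each microphone hears both speakers.
    a = np.array([[1.0, 0.6], [0.4, 1.0]])
    y = fast_ica(a @ s)
    # |correlation| between each true source (rows) and each estimate (columns).
    print(np.round(np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:]), 3))
```

As with any ICA method, the outputs come in arbitrary order and scale, so they still have to be assigned to speakers; in the frequency-domain variant, this permutation must additionally be resolved consistently across all frequency bins.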

The quality of the separation depends on the relative positions of speakers and microphones, on any background noise, and on the reverberation time of the room. The best results are obtained in anechoic chambers, as in the following audio examples:

Mixture 1, Mixture 2
Separation Result 1, Separation Result 2

Separation under more realistic conditions, e.g. in a driving car, is the subject of ongoing work:

Mixture 3, Mixture 4
Separation Result 3, Separation Result 4
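The influence of the reverberation time can be made concrete with a simple simulation. The following Python sketch generates one synthetic room impulse response per speaker-microphone pair using a common statistical approximation (white Gaussian noise under an exponential envelope that decays by 60 dB within the reverberation time T60) and mixes the sources with them. The function names synthetic_rir and convolutive_mix, the 16 kHz sampling rate, the equal number of microphones and sources, and the random seeds are hypothetical choices for illustration. The sketch shows that each room transfer function h_ij needs roughly T60 · fs filter taps, e.g. 4800 taps at 16 kHz for T60 = 0.3 s, so a separation algorithm has far more parameters to infer in a reverberant room than in an anechoic chamber, where each response is close to a single delayed, attenuated impulse.

```python
import numpy as np


def synthetic_rir(t60, fs=16000, seed=0):
    """Crude room impulse response: Gaussian noise under an exponential decay.

    The amplitude envelope exp(-3 ln(10) t / t60) falls by a factor of 1000
    (i.e. 60 dB in energy) after t60 seconds, matching the definition of the
    reverberation time. Real rooms also have discrete early reflections,
    which this toy model ignores.
    """
    rng = np.random.default_rng(seed)
    n_taps = int(round(t60 * fs))  # the filter length grows linearly with t60
    t = np.arange(n_taps) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60)
    h = rng.standard_normal(n_taps) * envelope
    h[0] = 1.0  # direct-path component
    return h / np.linalg.norm(h)


def convolutive_mix(sources, t60, fs=16000):
    """Mix sources as x_i = sum_j (h_ij * s_j), one synthetic RIR per source/mic pair.

    Assumes as many microphones as sources (hypothetical setup).
    """
    n_src, n_samples = sources.shape
    mixtures = np.zeros((n_src, n_samples))
    for i in range(n_src):          # microphone index
        for j in range(n_src):      # source index
            h = synthetic_rir(t60, fs, seed=10 * i + j)
            # Full convolution, truncated to the original signal length.
            mixtures[i] += np.convolve(sources[j], h)[:n_samples]
    return mixtures


if __name__ == "__main__":
    for t60 in (0.05, 0.3, 0.6):
        print(f"T60 = {t60:.2f} s -> {len(synthetic_rir(t60))} taps per h_ij at 16 kHz")
    rng = np.random.default_rng(1)
    x = convolutive_mix(rng.laplace(size=(2, 16000)), t60=0.3)
    print(x.shape)
```

Such simulated mixtures give a controlled test bed for varying T60, but they reproduce neither the background noise nor the early reflections of a real environment such as a car cabin.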

Details, including the coupling of source separation and speech recognition, are described in Chapter_ICA.pdf (in: "Robust Speech Recognition of Uncertain or Missing Data - Theory and Applications", Springer Verlag, July 2011).