Statistical Modelling

Recent progress in speech enhancement relies to a large extent on improved statistical modeling. In recent years we have developed estimators based on supergaussian (leptokurtic) probability densities. When the span of correlation of a speech signal is larger than the length of the window used for spectral analysis, the spectral coefficients are not necessarily Gaussian distributed. Minimum mean square error (MMSE) estimators as well as soft-decision weighting functions were derived for various combinations of Gaussian, Laplacian, and Gamma densities of the speech and noise coefficients. Compared to the well-known Gaussian solutions, these estimators lead to improved performance.
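"Leptokurtic" means that a density has positive excess kurtosis: a sharper peak and heavier tails than a Gaussian of the same variance. The following Python sketch is only an illustration, not the original implementation. It computes the excess kurtosis of the two-sided Gamma family p(x) ∝ |x|^(a-1) exp(-b|x|), which contains the Laplacian (a = 1); taking a = 1/2 for the Gamma density is an assumption made here. It also gives a simple moment estimate that could be applied to measured DFT coefficients; the Laplacian samples it uses are synthetic.

```python
import random


def excess_kurtosis_two_sided_gamma(a: float) -> float:
    """Excess kurtosis of p(x) proportional to |x|**(a - 1) * exp(-b * |x|).

    With E[x^2] = a(a+1)/b^2 and E[x^4] = a(a+1)(a+2)(a+3)/b^4 the scale b
    cancels, so the result depends on the shape a only.
    """
    return (a + 2.0) * (a + 3.0) / (a * (a + 1.0)) - 3.0


def sample_excess_kurtosis(xs: list[float]) -> float:
    """Moment estimate m4 / m2**2 - 3, assuming zero-mean data."""
    n = len(xs)
    m2 = sum(x * x for x in xs) / n
    m4 = sum(x ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    print("Gaussian   :", 0.0)
    print("Laplacian  :", excess_kurtosis_two_sided_gamma(1.0))  # 3.0
    print("Gamma a=1/2:", excess_kurtosis_two_sided_gamma(0.5))  # 26/3, about 8.67
    rng = random.Random(1)
    # synthetic Laplacian samples: random sign times an exponential variable
    lap = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(200_000)]
    print("Laplacian, sample estimate:", sample_excess_kurtosis(lap))
```

A Gaussian has excess kurtosis 0, so both the Laplacian (3) and the shape-1/2 Gamma density (about 8.67) are leptokurtic, the Gamma density markedly more so.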

The two Figures below show a histogram (shaded areas) of the real part of the DFT coefficients of undisturbed speech together with three model densities. The dotted, dashed, and solid lines depict the Gaussian, Laplacian, and Gamma densities, respectively. The Figure on the right-hand side shows an enlargement of the range of positive DFT values. We conclude that the Gaussian density does not provide a good fit to the observed data; the speech data apparently follow a heavy-tailed distribution.

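[Figure: histogram of the real part of speech DFT coefficients with Gaussian, Laplacian, and Gamma model densities; x-axis: real part of speech DFT coefficient]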
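The visual comparison of histogram and model densities can be quantified by the average log-likelihood of each density, with its variance matched to the data. The Python sketch below assumes a Laplacian with scale b = σ/√2 and a two-sided Gamma density with shape 1/2, both matched to the data variance σ². No speech recordings are available here, so the hypothetical helper `synthetic_coefficients` draws a Gaussian whose variance fluctuates log-normally as a stand-in; its output is not measured speech data.

```python
import math
import random


def log_gauss(x: float, var: float) -> float:
    """Log-density of N(0, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)


def log_laplace(x: float, var: float) -> float:
    """Log-density of a zero-mean Laplacian with variance var (scale b, var = 2 b^2)."""
    b = math.sqrt(var / 2.0)
    return -math.log(2.0 * b) - abs(x) / b


def log_gamma_half(x: float, var: float) -> float:
    """Log-density of a two-sided Gamma density with shape 1/2 and variance var.

    p(x) = sqrt(b) / (2 sqrt(pi)) * |x|**(-1/2) * exp(-b |x|), variance 3 / (4 b^2).
    """
    b = math.sqrt(3.0 / (4.0 * var))
    ax = max(abs(x), 1e-300)  # the density is singular at x = 0
    return (0.5 * math.log(b) - math.log(2.0 * math.sqrt(math.pi))
            - 0.5 * math.log(ax) - b * ax)


def mean_log_likelihood(xs: list[float], log_pdf) -> float:
    """Average log-likelihood with the model variance matched to the (zero-mean) data."""
    var = sum(x * x for x in xs) / len(xs)
    return sum(log_pdf(x, var) for x in xs) / len(xs)


def synthetic_coefficients(n: int, spread: float = 2.0, seed: int = 0) -> list[float]:
    """Hypothetical stand-in for speech DFT real parts: a Gaussian whose variance
    fluctuates log-normally from sample to sample (a Gaussian scale mixture)."""
    rng = random.Random(seed)
    return [math.exp(0.5 * spread * rng.gauss(0.0, 1.0)) * rng.gauss(0.0, 1.0)
            for _ in range(n)]


if __name__ == "__main__":
    xs = synthetic_coefficients(50_000)
    for name, f in (("Gaussian", log_gauss), ("Laplacian", log_laplace),
                    ("Gamma", log_gamma_half)):
        print(f"{name:9s} mean log-likelihood {mean_log_likelihood(xs, f):.3f}")
```

On real data one would replace `synthetic_coefficients` with the real parts of short-time DFT coefficients of clean speech; a higher mean log-likelihood indicates a better fit. Matching only the variance keeps the comparison simple; maximum-likelihood fits of each model would be a natural refinement.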

Compared to the Wiener filter, i.e., the Gaussian model, estimators based on supergaussian densities show a number of interesting properties. The Figure below shows the gain function for three different a priori SNR values. For an a priori SNR of 0 dB we observe that the supergaussian model results in less attenuation than the Wiener filter when the input amplitudes are large. In this case it is highly likely that speech is present, and therefore less attenuation is beneficial. For small input amplitudes the supergaussian estimator provides more attenuation and thus a larger noise reduction.

[Figure: x-axis: real part of noise DFT coefficient]
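To make the gain behaviour concrete, the Python sketch below compares the Wiener gain ξ/(1+ξ) with the MMSE estimate of the real part of a DFT coefficient under a Laplacian speech prior and Gaussian noise, one of the combinations mentioned above. Assumptions: the noise variance is normalized to 1, so that the speech variance equals the a priori SNR ξ; real and imaginary parts are treated separately; and the closed form is obtained by splitting the posterior integral at zero and completing the square. This derivation is written for illustration and is expressed differently from the formulas in the publications; it has no overflow protection for very large |y|/b.

```python
import math


def _pdf(x: float) -> float:
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cdf, via erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def wiener_gain(xi: float) -> float:
    """E[S|Y] / Y when speech and noise are both Gaussian; xi = a priori SNR."""
    return xi / (1.0 + xi)


def laplace_mmse(y: float, xi: float) -> float:
    """E[S | Y = y] for Y = S + N, S Laplacian with variance xi, N ~ N(0, 1).

    Splitting the posterior at s = 0 and completing the square yields two
    truncated Gaussians centred at y - 1/b (s > 0) and y + 1/b (s < 0).
    """
    b = math.sqrt(xi / 2.0)                  # Laplacian scale, variance = 2 b^2
    mu_p, mu_m = y - 1.0 / b, y + 1.0 / b    # branch centres
    w_p, w_m = math.exp(-y / b), math.exp(y / b)
    num = (w_p * (mu_p * _cdf(mu_p) + _pdf(mu_p))
           + w_m * (mu_m * _cdf(-mu_m) - _pdf(mu_m)))
    den = w_p * _cdf(mu_p) + w_m * _cdf(-mu_m)
    return num / den


def laplace_gain(y: float, xi: float) -> float:
    """Gain E[S | Y = y] / y of the Laplacian-prior estimator (y != 0)."""
    return laplace_mmse(y, xi) / y


if __name__ == "__main__":
    xi = 1.0  # a priori SNR of 0 dB
    for y in (0.5, 1.0, 2.0, 4.0, 6.0):
        print(f"y={y:3.1f}  Wiener {wiener_gain(xi):.3f}  "
              f"Laplacian {laplace_gain(y, xi):.3f}")
```

At 0 dB the Wiener gain is constant at 0.5, whereas the Laplacian-prior gain falls below it for small inputs and rises above it for large inputs, in line with the behaviour described above.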


Martin, R.: Speech Enhancement Based on Minimum Mean Square Error Estimation and Supergaussian Priors. IEEE Trans. Speech and Audio Processing, vol. 13, no. 5, pp. 845-856, 2005.

Breithaupt, C.; Martin, R.: MMSE Estimation of Magnitude-Squared DFT Coefficients with Supergaussian Priors. Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2003.

Martin, R.; Breithaupt, C.: Speech Enhancement in the DFT Domain Using Laplacian Speech Priors. Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pp. 87-90, 2003.

Martin, R.: Speech Enhancement Using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors. Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), pp. 253-256, 2002.