Model-driven detection of Clean Speech Patches in Noise

Jonathan Laidler, Martin Cooke, Neil D. Lawrence, 2007.

Abstract

Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
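The pipeline the abstract describes (a spectro-temporal patch, a compact 2D discrete cosine transform representation, and Gaussian mixture models scoring speech against background) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `keep` truncation parameter, the diagonal covariances, and the use of a log-likelihood ratio as the score are all assumptions made here for illustration.

```python
import math

def dct2(patch):
    """Orthonormal 2D DCT-II of a patch given as a list of equal-length rows."""
    n_rows, n_cols = len(patch), len(patch[0])

    def basis(k, n, size):
        # k-th orthonormal DCT-II basis function evaluated at sample n
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        return scale * math.cos(math.pi * (n + 0.5) * k / size)

    return [[sum(patch[i][j] * basis(u, i, n_rows) * basis(v, j, n_cols)
                 for i in range(n_rows) for j in range(n_cols))
             for v in range(n_cols)]
            for u in range(n_rows)]

def compact_features(patch, keep=3):
    """Keep the keep x keep lowest-order DCT coefficients (assumed truncation)."""
    coeffs = dct2(patch)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def speech_score(patch, speech_gmm, background_gmm, keep=3):
    """Log-likelihood ratio; positive values favour the (hypothetical) speech model."""
    x = compact_features(patch, keep)
    return gmm_log_likelihood(x, *speech_gmm) - gmm_log_likelihood(x, *background_gmm)
```

Each model is passed as a `(weights, means, variances)` tuple. In practice the mixture parameters would be fitted to training patches, and the patch dimensions would follow the size discussed in the abstract; neither step is shown here.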

Cite this Paper


BibTeX
@InProceedings{pmlr-v-laidler-model07,
  title = {Model-driven detection of Clean Speech Patches in Noise},
  author = {Jonathan Laidler and Martin Cooke and Neil D. Lawrence},
  year = {2007},
  url = {http://inverseprobability.com/publications/laidler-model07.html}
}
APA
Laidler, J., Cooke, M. & Lawrence, N. D. (2007). Model-driven detection of Clean Speech Patches in Noise.
