# Model-driven detection of Clean Speech Patches in Noise

Jonathan Laidler, Plusnet, Sheffield
Martin Cooke, University of the Basque Country
Neil D. Lawrence, University of Sheffield

in Proceedings of Interspeech 2007

#### Abstract

Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
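The patch-modelling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 4x4 patch size, the choice of keeping the lowest 2x2 block of DCT coefficients, the diagonal covariances, the toy model parameters, and the use of a log-likelihood ratio against a separate background model are all assumptions made for illustration. The function names (`dct2`, `compact_features`, `gmm_log_likelihood`, `speech_log_ratio`) are hypothetical. The code uses only the Python standard library so that it runs without dependencies.

```python
import math


def dct2(patch):
    """Orthonormal 2D DCT-II of a patch given as a list of rows (frequency x time)."""
    n_rows, n_cols = len(patch), len(patch[0])

    def basis(k, n, size):
        # k-th orthonormal DCT-II basis function evaluated at sample n.
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        return scale * math.cos(math.pi * (n + 0.5) * k / size)

    return [
        [
            sum(
                patch[i][j] * basis(u, i, n_rows) * basis(v, j, n_cols)
                for i in range(n_rows)
                for j in range(n_cols)
            )
            for v in range(n_cols)
        ]
        for u in range(n_rows)
    ]


def compact_features(patch, keep=2):
    """Keep only the lowest keep x keep DCT coefficients (assumed truncation rule)."""
    coeffs = dct2(patch)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, s2 in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * s2) + (xi - m) ** 2 / s2)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(component_logs)
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


def speech_log_ratio(patch, speech_gmm, background_gmm, keep=2):
    """Speech-vs-background score: positive values favour the speech model."""
    x = compact_features(patch, keep)
    return gmm_log_likelihood(x, *speech_gmm) - gmm_log_likelihood(x, *background_gmm)


# Toy parameters (weights, means, variances), invented purely for illustration.
TOY_SPEECH_GMM = (
    [0.5, 0.5],
    [[4.0, 0.0, 0.0, 0.0], [8.0, 0.0, 0.0, 0.0]],
    [[1.0] * 4, [1.0] * 4],
)
TOY_BACKGROUND_GMM = ([1.0], [[0.0, 0.0, 0.0, 0.0]], [[1.0] * 4])

# A flat 4x4 patch standing in for a small block of spectro-temporal energies.
toy_patch = [[1.0] * 4 for _ in range(4)]
score = speech_log_ratio(toy_patch, TOY_SPEECH_GMM, TOY_BACKGROUND_GMM)
```

In a real system the mixture parameters would be estimated from training patches rather than written by hand, and the patch dimensions would follow from the chosen time-frequency representation. Truncating the DCT keeps the feature vector short, which is the sense in which the representation is compact.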

BibTeX:

    @InProceedings{laidler-model07,
      title = {Model-driven detection of Clean Speech Patches in Noise},
      author = {Jonathan Laidler and Martin Cooke and Neil D. Lawrence},
      booktitle = {Proceedings of Interspeech 2007},
      year = {2007},
      url = {http://inverseprobability.com/publications/laidler-model07.html},
      key = {Laidler:model07},
      linkpdf = {ftp://ftp.dcs.shef.ac.uk/home/neil/LaidlerInterspeech2007.pdf},
      group = {speech separation, glimpsing, model-driven, spectro-temporal patches}
    }

EndNote:

    %T Model-driven detection of Clean Speech Patches in Noise
    %A Jonathan Laidler and Martin Cooke and Neil D. Lawrence
    %C Proceedings of Interspeech 2007
    %F laidler-model07
    %U http://inverseprobability.com/publications/laidler-model07.html

RIS:

    TY - CPAPER
    TI - Model-driven detection of Clean Speech Patches in Noise
    AU - Jonathan Laidler
    AU - Martin Cooke
    AU - Neil D. Lawrence
    BT - Proceedings of Interspeech 2007
    PY - 2007/08/01
    DA - 2007/08/01
    ID - laidler-model07
    L1 - ftp://ftp.dcs.shef.ac.uk/home/neil/LaidlerInterspeech2007.pdf
    UR - http://inverseprobability.com/publications/laidler-model07.html
    ER -

Laidler, J., Cooke, M. & Lawrence, N.D. (2007). Model-driven detection of Clean Speech Patches in Noise. Proceedings of Interspeech 2007.