Model-driven detection of Clean Speech Patches in Noise

Jonathan Laidler, Plusnet, Sheffield
Martin Cooke, University of the Basque Country
Neil D. Lawrence, University of Sheffield

in Proceedings of Interspeech 2007

Abstract

Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
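The patch representation described in the abstract can be illustrated with a minimal sketch. It assumes a patch is a small 2D grid of log-energies, keeps only the low-order 2D DCT coefficients as a compact feature vector, and scores that vector under a diagonal-covariance Gaussian mixture model. The function names, the number of retained coefficients (`keep`), and the diagonal-covariance choice are assumptions made for illustration and are not taken from the paper.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(patch):
    """Separable 2D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in patch]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def compact_features(patch, keep=3):
    """Keep the low-order keep x keep block of DCT coefficients.

    `keep` is a hypothetical setting, not a value from the paper.
    """
    coeffs = dct_2d(patch)
    return [coeffs[i][j] for i in range(keep) for j in range(keep)]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance GMM (an assumption)."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    top = max(comp)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp))


def speech_score(patch, speech_gmm, background_gmm, keep=3):
    """Log-likelihood ratio; positive values favour the speech model."""
    feats = compact_features(patch, keep)
    return (gmm_log_likelihood(feats, *speech_gmm)
            - gmm_log_likelihood(feats, *background_gmm))
```

In use, `speech_gmm` and `background_gmm` would each be a `(weights, means, variances)` tuple fitted to feature vectors from clean-speech and background patches respectively. The fitting step (for example by expectation-maximisation) is omitted here.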


@InProceedings{laidler-model07,
  title = 	 {Model-driven detection of Clean Speech Patches in Noise},
  author = 	 {Jonathan Laidler and Martin Cooke and Neil D. Lawrence},
  booktitle = 	 {Proceedings of Interspeech 2007},
  year = 	 {2007},
  month = 	 {08},
  edit = 	 {https://github.com/lawrennd//publications/edit/gh-pages/_posts/2007-08-01-laidler-model07.md},
  url =  	 {http://inverseprobability.com/publications/laidler-model07.html},
  abstract = 	 {Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.},
  key = 	 {Laidler:model07},
  note = 	 {},
  linkpdf = 	 {ftp://ftp.dcs.shef.ac.uk/home/neil/LaidlerInterspeech2007.pdf},
  group = 	 {speech separation, glimpsing, model-driven, spectro-temporal patches}
}
%T Model-driven detection of Clean Speech Patches in Noise
%A Jonathan Laidler
%A Martin Cooke
%A Neil D. Lawrence
%B Proceedings of Interspeech 2007
%D 2007
%F laidler-model07
%U http://inverseprobability.com/publications/laidler-model07.html
%X Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
TY  - CPAPER
TI  - Model-driven detection of Clean Speech Patches in Noise
AU  - Jonathan Laidler
AU  - Martin Cooke
AU  - Neil D. Lawrence
BT  - Proceedings of Interspeech 2007
PY  - 2007/08/01
DA  - 2007/08/01
ID  - laidler-model07
SP  - 
EP  - 
L1  - ftp://ftp.dcs.shef.ac.uk/home/neil/LaidlerInterspeech2007.pdf
UR  - http://inverseprobability.com/publications/laidler-model07.html
AB  - Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
ER  -

Laidler, J., Cooke, M. & Lawrence, N.D. (2007). Model-driven detection of Clean Speech Patches in Noise. Proceedings of Interspeech 2007.