Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters

James Hensman; Neil D. Lawrence; Magnus Rattray

doi:10.1186/1471-2105-14-252

BMC Bioinformatics, 14(252), 2013.

Abstract

Background

Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications.

Results

We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method’s capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method’s ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications.

Conclusion

The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors’ website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
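The hierarchical idea summarised in the abstract can be illustrated with a small sketch. It assumes one common way of building such a model: a gene-level function drawn from a GP, with each replicate drawn from a GP centred on that gene-level function, so that two observations share a gene-level covariance term and gain a replicate-level term only when they come from the same replicate. The squared-exponential kernel, the parameter names (`gene_var`, `gene_len`, `rep_var`, `rep_len`, `noise_var`), the helper functions and the toy data are all assumptions made for illustration; this is not the authors' released code.

```python
import math

def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two scalar time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def hier_cov(a, b, params):
    """Covariance between observations a and b, each a (time, replicate) pair.

    The gene-level term is always present; the replicate-level term is
    added only when both observations belong to the same replicate.
    """
    (ta, ra), (tb, rb) = a, b
    k = rbf(ta, tb, params["gene_var"], params["gene_len"])
    if ra == rb:
        k += rbf(ta, tb, params["rep_var"], params["rep_len"])
    return k

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def predict(obs, values, query, params):
    """Posterior mean and variance of the latent function at query=(time, replicate)."""
    n = len(obs)
    K = [[hier_cov(obs[i], obs[j], params)
          + (params["noise_var"] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    k_star = [hier_cov(query, o, params) for o in obs]
    mean = sum(k * a for k, a in zip(k_star, chol_solve(L, values)))
    var = hier_cov(query, query, params) - sum(
        k * v for k, v in zip(k_star, chol_solve(L, k_star)))
    return mean, var

# Toy data (made up): two replicates sampled at different, irregular times.
obs = [(0.0, 0), (1.0, 0), (2.5, 0), (4.0, 0),   # replicate 0
       (0.5, 1), (3.0, 1), (3.5, 1)]             # replicate 1
values = [0.1, 0.8, 0.9, -0.2, 0.5, 0.4, 0.1]

# Hypothetical hyperparameters, fixed by hand rather than optimised.
params = {"gene_var": 1.0, "gene_len": 1.5,
          "rep_var": 0.25, "rep_len": 1.0, "noise_var": 0.05}

# Impute replicate 1 at t = 2.0, a time at which it was never sampled.
mean, var = predict(obs, values, (2.0, 1), params)
print(f"imputed mean = {mean:.3f}, variance = {var:.3f}")
```

Because the gene-level term couples every pair of observations, replicate 0 informs the prediction for replicate 1 even though the two were never sampled at the same times. Compared with a single GP, this sketch adds only the replicate-level variance and lengthscale; in practice the hyperparameters would be fitted, for example by maximising the marginal likelihood, rather than set by hand as here.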

Cite this Paper

BibTeX


@Article{Hensman-hierarchical13,
  title = 	 {Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters},
  author = 	 {Hensman, James and Lawrence, Neil D. and Rattray, Magnus},
  journal =      {BMC Bioinformatics},
  year = 	 {2013},
  volume = 	 {14},
  number =       {252},
  doi = 	 {10.1186/1471-2105-14-252},
  pdf = 	 {https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-14-252.pdf},
  url = 	 {/publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.html},
  abstract = 	 {**Background**

Time course data from microarrays and high-throughput
sequencing experiments require simple, computationally efficient and powerful statistical
models to extract meaningful biological signal, and for tasks such as data fusion
and clustering. Existing methodologies fail to capture either the temporal or replicated
nature of the experiments, and often impose constraints on the data collection process,
such as regularly spaced samples, or similar sampling schema across replications.

**Results**

We propose hierarchical Gaussian processes as a general model of gene expression time-series,
with application to a variety of problems. In particular, we illustrate the method's
capacity for missing data imputation, data fusion and clustering. The method can
impute data which is missing both systematically and at random: in a hold-out test
on real data, performance is significantly better than commonly used imputation
methods. The method's ability to model inter- and intra-cluster variance leads
to more biologically meaningful clusters. The approach removes the necessity for
evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset
with irregular replications.

**Conclusion**

The hierarchical Gaussian
process model provides an excellent statistical basis for several gene-expression
time-series tasks. It has only a few additional parameters over a regular GP, has
negligible additional complexity, is easily implemented and can be integrated into
several existing algorithms. Our experiments were implemented in Python, and are
available from the authors' website: .
}
}

Endnote

%0 Journal Article
%T Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters
%A James Hensman
%A Neil D. Lawrence
%A Magnus Rattray
%J BMC Bioinformatics
%D 2013
%F Hensman-hierarchical13
%R 10.1186/1471-2105-14-252
%U /publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.html
%V 14
%N 252
%X **Background**

Time course data from microarrays and high-throughput
sequencing experiments require simple, computationally efficient and powerful statistical
models to extract meaningful biological signal, and for tasks such as data fusion
and clustering. Existing methodologies fail to capture either the temporal or replicated
nature of the experiments, and often impose constraints on the data collection process,
such as regularly spaced samples, or similar sampling schema across replications.

**Results**

We propose hierarchical Gaussian processes as a general model of gene expression time-series,
with application to a variety of problems. In particular, we illustrate the method's
capacity for missing data imputation, data fusion and clustering. The method can
impute data which is missing both systematically and at random: in a hold-out test
on real data, performance is significantly better than commonly used imputation
methods. The method's ability to model inter- and intra-cluster variance leads
to more biologically meaningful clusters. The approach removes the necessity for
evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset
with irregular replications.

**Conclusion**

The hierarchical Gaussian
process model provides an excellent statistical basis for several gene-expression
time-series tasks. It has only a few additional parameters over a regular GP, has
negligible additional complexity, is easily implemented and can be integrated into
several existing algorithms. Our experiments were implemented in Python, and are
available from the authors' website: .

RIS


TY  - JOUR
TI  - Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters
AU  - James Hensman
AU  - Neil D. Lawrence
AU  - Magnus Rattray
DA  - 2013/08/20
ID  - Hensman-hierarchical13
VL  - 14
IS  - 252
DO  - 10.1186/1471-2105-14-252
L1  - https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-14-252.pdf
UR  - /publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.html
AB  - **Background**

Time course data from microarrays and high-throughput
sequencing experiments require simple, computationally efficient and powerful statistical
models to extract meaningful biological signal, and for tasks such as data fusion
and clustering. Existing methodologies fail to capture either the temporal or replicated
nature of the experiments, and often impose constraints on the data collection process,
such as regularly spaced samples, or similar sampling schema across replications.

**Results**

We propose hierarchical Gaussian processes as a general model of gene expression time-series,
with application to a variety of problems. In particular, we illustrate the method's
capacity for missing data imputation, data fusion and clustering. The method can
impute data which is missing both systematically and at random: in a hold-out test
on real data, performance is significantly better than commonly used imputation
methods. The method's ability to model inter- and intra-cluster variance leads
to more biologically meaningful clusters. The approach removes the necessity for
evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset
with irregular replications.

**Conclusion**

The hierarchical Gaussian
process model provides an excellent statistical basis for several gene-expression
time-series tasks. It has only a few additional parameters over a regular GP, has
negligible additional complexity, is easily implemented and can be integrated into
several existing algorithms. Our experiments were implemented in Python, and are
available from the authors' website: .

ER  -

APA


Hensman, J., Lawrence, N. D., & Rattray, M. (2013). Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters. BMC Bioinformatics, 14(252). doi:10.1186/1471-2105-14-252. Available from /publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.html.