Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation

Zhenwen Dai; Mudassar Iqbal; Neil D. Lawrence; Magnus Rattray

doi:10.1093/bioinformatics/btx508

Bioinformatics, 33(23):3776-3783, 2017.

Abstract

Motivation

Regulation of gene expression in prokaryotes relies on complex co-regulatory mechanisms involving large numbers of transcriptional regulatory proteins and their target genes. Uncovering these genome-scale interactions constitutes a major bottleneck in systems biology. Sparse latent factor models, which treat the activity of transcription factors (TFs) as unobserved, provide a biologically interpretable modelling framework that integrates gene expression and genome-wide binding data, but they also pose a hard computational inference problem. Existing probabilistic inference methods for such models rely on subjective filtering and suffer from scalability issues, and are thus not well suited to realistic genome-scale applications.

Results

We present a fast Bayesian sparse factor model that takes gene expression and binding-site data as input, the latter from either ChIP-seq experiments or motif predictions, and outputs active TF-gene links as well as latent TF activities. Our method employs an efficient variational Bayes scheme for inference, enabling application to large datasets, which was not feasible with existing MCMC-based inference methods for such models. We validate our method on synthetic data against a similar model from the literature that uses MCMC for inference, and obtain comparable results in a small fraction of the computational time. We also apply our method to large-scale data from Mycobacterium tuberculosis, comprising ChIP-seq data on 113 TFs and matched gene expression data for 3863 putative target genes. We evaluate our predictions using an independent transcriptomics experiment involving over-expression of TFs.
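
To make the model class concrete, here is a minimal sketch of how synthetic data could be drawn from a generic sparse latent factor model in which binding evidence restricts which TF-gene links may be active. The abstract does not specify the likelihood or priors, so the Gaussian weights and noise, the Bernoulli link indicators, the dimensions, and every name below (`simulate_sparse_factor_data`, `binding_density`, `active_prob`) are assumptions made for illustration, not the SITAR implementation.

```python
import numpy as np


def simulate_sparse_factor_data(n_genes=50, n_tfs=5, n_conditions=20,
                                binding_density=0.2, active_prob=0.5,
                                noise_std=0.1, seed=0):
    """Draw synthetic data from a generic sparse latent factor model.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Binding mask: True where binding evidence (e.g. ChIP-seq or a motif
    # prediction) connects a TF (column) to a gene (row).
    binding = rng.random((n_genes, n_tfs)) < binding_density

    # Only a subset of the bound TF-gene links is assumed to be active.
    active = binding & (rng.random((n_genes, n_tfs)) < active_prob)

    # Regulatory weights are nonzero only on active links (the sparsity).
    weights = rng.normal(0.0, 1.0, (n_genes, n_tfs)) * active

    # Latent (unobserved) TF activities across experimental conditions.
    activities = rng.normal(0.0, 1.0, (n_tfs, n_conditions))

    # Observed expression: linear mix of TF activities plus Gaussian noise.
    noise = rng.normal(0.0, noise_std, (n_genes, n_conditions))
    expression = weights @ activities + noise
    return expression, binding, active, weights, activities
```

In this setting, inference would take `expression` and `binding` as inputs and try to recover `active` and `activities`, which is the input/output relationship the abstract describes.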

Availability and implementation

An easy-to-use Jupyter notebook demo of our method with data is available at https://github.com/zhenwendai/SITAR.

Supplementary information

Supplementary data are available at Bioinformatics online.

Cite this Paper

BibTeX


@Article{Dai-sparse18,
  title = 	 {Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation},
  author = 	 {Dai, Zhenwen and Iqbal, Mudassar and Lawrence, Neil D. and Rattray, Magnus},
  journal =      {Bioinformatics},
  pages = 	 {3776--3783},
  year = 	 {2017},
  volume = 	 {33},
  number = 	 {23},
  doi = 	 {10.1093/bioinformatics/btx508},
  pdf = 	 {https://academic.oup.com/bioinformatics/article-pdf/33/23/3776/22031627/btx508.pdf},
  url = 	 {/publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html},
  abstract = 	 {Motivation

Regulation of gene expression in prokaryotes involves complex
co-regulatory mechanisms involving large numbers of transcriptional
regulatory proteins and their target genes. Uncovering these
genome-scale interactions constitutes a major bottleneck in systems
biology. Sparse latent factor models, assuming activity of
transcription factors (TFs) as unobserved, provide a biologically
interpretable modelling framework, integrating gene expression and
genome-wide binding data, but at the same time pose a hard
computational inference problem. Existing probabilistic inference
methods for such models rely on subjective filtering and suffer from
scalability issues, thus are not well-suited for realistic
genome-scale applications.

Results

We present a fast Bayesian sparse factor model, which takes input
gene expression and binding sites data, either from ChIP-seq
experiments or motif predictions, and outputs active TF-gene links
as well as latent TF activities. Our method employs an efficient
variational Bayes scheme for model inference enabling its
application to large datasets which was not feasible with existing
MCMC-based inference methods for such models. We validate our method
on synthetic data against a similar model in the literature,
employing MCMC for inference, and obtain comparable results with a
small fraction of the computational time. We also apply our method
to large-scale data from Mycobacterium tuberculosis involving
ChIP-seq data on 113 TFs and matched gene expression data for 3863
putative target genes. We evaluate our predictions using an
independent transcriptomics experiment involving over-expression of
TFs.

Availability and implementation

An easy-to-use Jupyter notebook demo of our method with data is
available at https://github.com/zhenwendai/SITAR.

Supplementary information

Supplementary data are available at Bioinformatics online.
}
}

Endnote

%0 Journal Article
%T Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation
%A Zhenwen Dai
%A Mudassar Iqbal
%A Neil D. Lawrence
%A Magnus Rattray
%J Bioinformatics
%D 2017
%F Dai-sparse18
%P 3776--3783
%R 10.1093/bioinformatics/btx508
%U /publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html
%V 33
%N 23
%X Motivation

Regulation of gene expression in prokaryotes involves complex
co-regulatory mechanisms involving large numbers of transcriptional
regulatory proteins and their target genes. Uncovering these
genome-scale interactions constitutes a major bottleneck in systems
biology. Sparse latent factor models, assuming activity of
transcription factors (TFs) as unobserved, provide a biologically
interpretable modelling framework, integrating gene expression and
genome-wide binding data, but at the same time pose a hard
computational inference problem. Existing probabilistic inference
methods for such models rely on subjective filtering and suffer from
scalability issues, thus are not well-suited for realistic
genome-scale applications.

Results

We present a fast Bayesian sparse factor model, which takes input
gene expression and binding sites data, either from ChIP-seq
experiments or motif predictions, and outputs active TF-gene links
as well as latent TF activities. Our method employs an efficient
variational Bayes scheme for model inference enabling its
application to large datasets which was not feasible with existing
MCMC-based inference methods for such models. We validate our method
on synthetic data against a similar model in the literature,
employing MCMC for inference, and obtain comparable results with a
small fraction of the computational time. We also apply our method
to large-scale data from Mycobacterium tuberculosis involving
ChIP-seq data on 113 TFs and matched gene expression data for 3863
putative target genes. We evaluate our predictions using an
independent transcriptomics experiment involving over-expression of
TFs.

Availability and implementation

An easy-to-use Jupyter notebook demo of our method with data is
available at https://github.com/zhenwendai/SITAR.

Supplementary information

Supplementary data are available at Bioinformatics online.

RIS


TY  - JOUR
TI  - Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation
AU  - Zhenwen Dai
AU  - Mudassar Iqbal
AU  - Neil D. Lawrence
AU  - Magnus Rattray
DA  - 2017/08/26
ID  - Dai-sparse18
VL  - 33
IS  - 23
SP  - 3776
EP  - 3783
DO  - 10.1093/bioinformatics/btx508
L1  - https://academic.oup.com/bioinformatics/article-pdf/33/23/3776/22031627/btx508.pdf
UR  - /publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html
AB  - Motivation

Regulation of gene expression in prokaryotes involves complex
co-regulatory mechanisms involving large numbers of transcriptional
regulatory proteins and their target genes. Uncovering these
genome-scale interactions constitutes a major bottleneck in systems
biology. Sparse latent factor models, assuming activity of
transcription factors (TFs) as unobserved, provide a biologically
interpretable modelling framework, integrating gene expression and
genome-wide binding data, but at the same time pose a hard
computational inference problem. Existing probabilistic inference
methods for such models rely on subjective filtering and suffer from
scalability issues, thus are not well-suited for realistic
genome-scale applications.

Results

We present a fast Bayesian sparse factor model, which takes input
gene expression and binding sites data, either from ChIP-seq
experiments or motif predictions, and outputs active TF-gene links
as well as latent TF activities. Our method employs an efficient
variational Bayes scheme for model inference enabling its
application to large datasets which was not feasible with existing
MCMC-based inference methods for such models. We validate our method
on synthetic data against a similar model in the literature,
employing MCMC for inference, and obtain comparable results with a
small fraction of the computational time. We also apply our method
to large-scale data from Mycobacterium tuberculosis involving
ChIP-seq data on 113 TFs and matched gene expression data for 3863
putative target genes. We evaluate our predictions using an
independent transcriptomics experiment involving over-expression of
TFs.

Availability and implementation

An easy-to-use Jupyter notebook demo of our method with data is
available at https://github.com/zhenwendai/SITAR.

Supplementary information

Supplementary data are available at Bioinformatics online.

ER  -

APA


Dai, Z., Iqbal, M., Lawrence, N. D., & Rattray, M. (2017). Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation. Bioinformatics, 33(23), 3776-3783. doi:10.1093/bioinformatics/btx508. Available from /publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html.