# Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities

Guido Sanguinetti, University of Edinburgh
Neil D. Lawrence, University of Sheffield
Magnus Rattray, University of Manchester

Bioinformatics 22, pp 2275-2281

### Errata

• Equation (5): the mean of the normal density for $y_n(t)$ is missing the term $+ \mu_n(t)$.
Thanks to: Kevin Sharp
• Equation (7): the two terms that involve $\mathbf{K}$ should be outside the sum over $n$, and there should be no $T$ in the term $Tq\log(\alpha^2)$.
Thanks to: Junfeng Chen
• Equation (8): $\alpha^2$ should be $\alpha^{-2}$.
Thanks to: Junfeng Chen
• Equation (10), second line: there should be a $T$ before the $\sigma^{-2}$.
Thanks to: Junfeng Chen
• Equation (11): the sums should be normalised, the first by $Nq$ and the second by $NT$.
Thanks to: Junfeng Chen
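
The corrected normalisation in Equation (11) turns sums of squares into averages. Equation (11) is not reproduced on this page, so the sketch below assumes, for illustration only, that the first sum runs over an $N \times q$ array (genes by transcription factors) and yields $\alpha^2$, and the second over an $N \times T$ array (genes by time points) and yields the noise variance $\sigma^2$. The helper `normalised_sum` and the simulated inputs are hypothetical.

```python
import random

def normalised_sum(terms):
    """Average of the entries of a rows-by-cols nested list.

    Dividing by rows * cols (N*q or N*T under the assumption stated above)
    keeps a variance estimate on the scale of a single entry instead of
    letting it grow with the size of the data set.
    """
    rows, cols = len(terms), len(terms[0])
    return sum(x for row in terms for x in row) / (rows * cols)

random.seed(0)
N, q, T = 300, 4, 12  # genes, transcription factors, time points (made-up sizes)

# Hypothetical per-entry squared terms, simulated with variances 2.0 and 0.25.
b_sq = [[random.gauss(0.0, 2.0 ** 0.5) ** 2 for _ in range(q)] for _ in range(N)]
res_sq = [[random.gauss(0.0, 0.5) ** 2 for _ in range(T)] for _ in range(N)]

alpha2 = normalised_sum(b_sq)    # first sum normalised by N*q; close to 2.0
sigma2 = normalised_sum(res_sq)  # second sum normalised by N*T; close to 0.25
print(alpha2, sigma2)
```

Without the division, both estimates would scale linearly with $N$, which is the error the erratum corrects.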

#### Abstract

**Motivation**: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques such as Chromatin Immunoprecipitation provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data.

**Results**: We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals with our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast data sets in which the network structure has previously been obtained using Chromatin Immunoprecipitation data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.

**Availability**: MATLAB code is available from http://umber.sbs.man.ac.uk/resources/puma.
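
The idea of using credibility intervals to flag binding events that lead to significant regulation can be illustrated with a short sketch. It assumes, for illustration only, that each gene-specific regulatory strength (written here as a hypothetical $b_{nm}$ for gene $n$ and transcription factor $m$) has a Gaussian approximate posterior summarised by a mean and a variance, and it flags a binding event when the central 95% interval excludes zero. The function names, gene and factor labels, and numbers are hypothetical; the paper's own significance criterion may differ.

```python
import math

def credible_interval(mean, var, z=1.96):
    """Central interval of a Gaussian marginal posterior (about 95% for z=1.96)."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width

def significant_bindings(posterior, z=1.96):
    """Return the (gene, tf) pairs whose credible interval excludes zero.

    posterior maps (gene, tf) -> (mean, variance) of the regulatory strength,
    assumed Gaussian. Keys are expected to be ChIP-supported binding events.
    """
    hits = []
    for (gene, tf), (mean, var) in posterior.items():
        lo, hi = credible_interval(mean, var, z)
        if lo > 0.0 or hi < 0.0:
            hits.append((gene, tf))
    return sorted(hits)

# Hypothetical posterior summaries for three binding events.
posterior = {
    ("gene_a", "tf_1"): (0.80, 0.04),   # interval roughly [0.41, 1.19]
    ("gene_b", "tf_2"): (0.10, 0.09),   # interval roughly [-0.49, 0.69]
    ("gene_c", "tf_1"): (-0.60, 0.01),  # interval roughly [-0.80, -0.40]
}
print(significant_bindings(posterior))  # [('gene_a', 'tf_1'), ('gene_c', 'tf_1')]
```

Raising `z` (for example to 2.58 for a 99% interval) makes the test more conservative, so fewer binding events are flagged.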

  @Article{sanguinetti-chipvar06, title = {Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities}, journal = {Bioinformatics}, author = {Guido Sanguinetti and Neil D. Lawrence and Magnus Rattray}, pages = {2275--2281}, year = {2006}, volume = {22}, number = {22}, url = {http://inverseprobability.com/publications/sanguinetti-chipvar06.html}, abstract = {**Motivation**: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques such as Chromatin Immunoprecipitation provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data. **Results**: We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals with our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast data sets in which the network structure has previously been obtained using Chromatin Immunoprecipitation data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell. **Availability**: MATLAB code is available from http://umber.sbs.man.ac.uk/resources/puma.}, key = {Sanguinetti-chipvar06}, doi = {10.1093/bioinformatics/btl473}, linkpdf = {http://bioinformatics.oxfordjournals.org/cgi/reprint/btl473v1}, linksoftware = {http://inverseprobability.com/chipvar/}, group = {shefml,puma,gene networks} }
 Sanguinetti, G., Lawrence, N.D. & Rattray, M. (2006). Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities. Bioinformatics 22(22):2275-2281