Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Nicoló Fusi, Oliver Stegle, Neil D. Lawrence
2011.

Abstract

Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is posed by hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model that accounts for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effects of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model to a variety of datasets and biological systems and compared it with existing methods. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and better reproduced across independent studies.
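To make the confounding argument concrete, here is a toy simulation in plain Python (standard library only). It is a sketch under stated assumptions, not PANAMA. Principal components computed by power iteration stand in for learned hidden factors. A simple two-step heuristic stands in for PANAMA's joint probabilistic inference: it regresses the candidate regulator out of expression before the factors are learned. The simulation settings and all function names (`simulate`, `raw_scores`, `naive_scores`, `regulator_aware_scores`) are hypothetical.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(y, basis):
    """Remove the projection of y onto each vector of an orthonormal basis."""
    r = list(y)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def corr(u, v):
    u, v = center(u), center(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def top_factors(genes, k, iters=50, seed=0):
    """Top-k sample-space principal directions of the gene vectors
    (power iteration on Y Y^T with Gram-Schmidt deflation)."""
    rng = random.Random(seed)
    n = len(genes[0])
    basis = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [0.0] * n
            for g in genes:  # w = Y (Y^T v); genes are the columns of Y
                c = dot(g, v)
                w = [wi + c * gi for wi, gi in zip(w, g)]
            w = residualize(w, basis)  # deflate: stay orthogonal to earlier factors
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def simulate(n=200, n_genes=50, n_affected=25, seed=1):
    """Hypothetical data: one hidden confounder acting on every gene plus one
    trans-acting SNP (genotype 0/1/2) that shifts the first n_affected genes."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    snp = [float(rng.randint(0, 1) + rng.randint(0, 1)) for _ in range(n)]
    genes = []
    for g in range(n_genes):
        loading = rng.gauss(0, 1)
        beta = 1.0 if g < n_affected else 0.0
        y = [loading * h + beta * s + rng.gauss(0, 1) for h, s in zip(hidden, snp)]
        genes.append(center(y))
    return snp, genes


def raw_scores(snp, genes):
    """|correlation| between genotype and expression, no correction."""
    return [abs(corr(g, snp)) for g in genes]


def naive_scores(snp, genes, k=2):
    """Learn k factors from expression alone, regress them out, then test."""
    factors = top_factors(genes, k)
    return [abs(corr(residualize(g, factors), snp)) for g in genes]


def regulator_aware_scores(snp, genes, k=2):
    """Regress the candidate regulator out first, so the factors learned from
    the residuals cannot absorb its trans effect; then test as above."""
    s = center(snp)
    s_norm = math.sqrt(dot(s, s))
    s_unit = [x / s_norm for x in s]
    factors = top_factors([residualize(g, [s_unit]) for g in genes], k)
    return [abs(corr(residualize(g, factors), snp)) for g in genes]


if __name__ == "__main__":
    snp, genes = simulate()
    for name, score_fn in [("uncorrected", raw_scores),
                           ("naive factors", naive_scores),
                           ("regulator-aware factors", regulator_aware_scores)]:
        affected = score_fn(snp, genes)[:25]
        print(f"{name}: mean |r| over affected genes = {sum(affected) / 25:.2f}")
```

In this toy setup the hidden factor weakens the uncorrected test. The naive correction treats the broad trans effect as just another factor and removes most of its signal. The regulator-aware variant keeps that signal because its factors are orthogonal to the genotype. The exact numbers depend on the hypothetical settings; the sketch illustrates only the qualitative argument.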

Cite this Paper


BibTeX
@InProceedings{pmlr-v-fusi-accurate11,
  title = {Accurate modeling of confounding variation in {eQTL} studies leads to a great increase in power to detect trans-regulatory effects},
  author = {Nicoló Fusi and Oliver Stegle and Neil D. Lawrence},
  year = {2011},
  url = {http://inverseprobability.com/publications/fusi-accurate11.html},
  abstract = {Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is posed by hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model that accounts for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effects of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation. We applied our model to a variety of datasets and biological systems and compared it with existing methods. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and better reproduced across independent studies.}
}
Endnote
%0 Conference Paper
%T Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects
%A Nicoló Fusi
%A Oliver Stegle
%A Neil D. Lawrence
%C Proceedings of Machine Learning Research
%D 2011
%F pmlr-v-fusi-accurate11
%I PMLR
%J Proceedings of Machine Learning Research
%R 10101/npre.2011.5995.1
%U http://inverseprobability.com
%W PMLR
%X Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is posed by hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model that accounts for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effects of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation. We applied our model to a variety of datasets and biological systems and compared it with existing methods. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and better reproduced across independent studies.
RIS
TY  - CPAPER
TI  - Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects
AU  - Nicoló Fusi
AU  - Oliver Stegle
AU  - Neil D. Lawrence
PY  - 2011
ID  - pmlr-v-fusi-accurate11
PB  - PMLR
DP  - PMLR
DO  - 10101/npre.2011.5995.1
UR  - http://inverseprobability.com/publications/fusi-accurate11.html
AB  - Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is posed by hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model that accounts for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effects of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation. We applied our model to a variety of datasets and biological systems and compared it with existing methods. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and better reproduced across independent studies.
ER  - 
APA
Fusi, N., Stegle, O. & Lawrence, N.D. (2011). Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects. In PMLR.