# Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Nicoló Fusi, Microsoft Research, New England
Oliver Stegle, European Bioinformatics Institute
Neil D. Lawrence, University of Sheffield

#### Abstract

Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in analyzing such studies is posed by hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model that accounts for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effects of prominent genetic regulators. As a result, PANAMA can more accurately distinguish true genetic association signals from confounding variation.

We applied our model to a variety of datasets and biological systems and compared it with existing methods. PANAMA consistently outperforms the alternatives and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and better reproduced across independent studies.
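To make the masking effect concrete, here is a toy NumPy simulation. It is a sketch under stated assumptions and is not PANAMA, which is a probabilistic model: a single hypothetical SNP `x` regulates 100 of 500 genes in trans, three hidden confounders add correlated variation, and association is tested gene by gene with ordinary least squares and a fixed |t| > 4 cutoff (roughly a Bonferroni threshold for 500 tests). The first correction estimates five principal components from the expression matrix alone, ignoring genotype. The second removes the SNP's effect before estimating factors, a crude two-step stand-in for learning factors jointly with a known regulator. All names, sizes, effect sizes, and thresholds are illustrative assumptions.

```python
import numpy as np


def top_pcs(M, k):
    """Return the first k left singular vectors (sample-space PCs) of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k]


def snp_t_stats(Y, x, factors):
    """OLS t-statistic of the SNP x for every gene (column of Y),
    adjusting for an intercept and the given factor covariates."""
    n = Y.shape[0]
    C = np.column_stack([np.ones(n), x, factors])
    beta, *_ = np.linalg.lstsq(C, Y, rcond=None)
    resid = Y - C @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - C.shape[1])
    var_scale = np.linalg.inv(C.T @ C)[1, 1]  # diagonal entry for the SNP column
    return beta[1] / np.sqrt(sigma2 * var_scale)


rng = np.random.default_rng(0)
n, g = 200, 500             # samples, genes (illustrative sizes)
k_true, k_fit = 3, 5        # confounders simulated vs. factors removed
n_trans, effect = 100, 0.5  # trans-regulated genes and their effect size
threshold = 4.0             # crude |t| cutoff standing in for multiple-testing control

# Standardized genotype of a single hypothetical trans regulator.
x = rng.binomial(2, 0.3, size=n).astype(float)
x = (x - x.mean()) / x.std()

# Hidden confounders, made exactly uncorrelated with the SNP for a clean comparison.
Z = rng.standard_normal((n, k_true))
Z -= Z.mean(axis=0)
Z -= np.outer(x, x @ Z) / (x @ x)
W = rng.standard_normal((k_true, g))

b = np.zeros(g)
b[:n_trans] = effect
Y = Z @ W + np.outer(x, b) + rng.standard_normal((n, g))
Y -= Y.mean(axis=0)

# (a) Genotype-blind correction: factors are PCs of the expression matrix alone.
t_blind = snp_t_stats(Y, x, top_pcs(Y, k_fit))

# (b) Regulator-aware correction (two-step stand-in for joint learning):
#     remove the SNP's effect first, then estimate factors from what is left.
Y_no_snp = Y - np.outer(x, x @ Y) / (x @ x)
t_aware = snp_t_stats(Y, x, top_pcs(Y_no_snp, k_fit))


def count_hits(t):
    """Return (hits among the n_trans true trans genes, hits among the null genes)."""
    hits = np.abs(t) > threshold
    return int(hits[:n_trans].sum()), int(hits[n_trans:].sum())


tp_blind, fp_blind = count_hits(t_blind)
tp_aware, fp_aware = count_hits(t_aware)
print(f"genotype-blind factors:  {tp_blind}/{n_trans} trans genes found, {fp_blind} false hits")
print(f"regulator-aware factors: {tp_aware}/{n_trans} trans genes found, {fp_aware} false hits")
```

A regulator that shifts many genes at once looks like a broad axis of co-expression, so the genotype-blind components pick it up as a "confounder" and regressing them out removes most of the SNP signal. Estimating factors only after accounting for the regulator keeps that axis attributed to the genotype. The simulated confounders are deliberately orthogonal to the SNP so that the comparison isolates this one effect; real data are not that tidy.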

    @TechReport{fusi-accurate11,
      title = {Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects},
      author = {Nicoló Fusi and Oliver Stegle and Neil D. Lawrence},
      year = {2011},
      institution = {Nature Precedings},
      url = {http://inverseprobability.com/publications/fusi-accurate11.html},
      key = {Fusi:accurate11},
      doi = {10101/npre.2011.5995.1}
    }