# Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data

Guido Sanguinetti, Marta Milo, Magnus Rattray, Neil D. Lawrence, *Bioinformatics* 21(19):3748-3754, 2005.

#### Abstract

**Motivation:** Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis.\
\
**Results:** We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.\
\
**Availability:** The software used in the paper is available from . The microarray data are deposited in the NCBI database.
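
The abstract only sketches the method. As a rough illustration of the general idea (not a reproduction of the paper's exact algorithm), the snippet below implements EM for a PCA-style latent variable model $y_n = Wx_n + \mu + \epsilon_n$ in which each observation element has its own *known* noise variance, as would be supplied by probe-level credibility intervals. The function name and synthetic setup are invented for the example.

```python
import numpy as np

def em_hetero_pca(Y, S, q, n_iter=50):
    """EM for y_n = W x_n + mu + eps_n with eps_n ~ N(0, diag(S[n])),
    where the per-element noise variances S (same shape as Y) are known.
    A minimal sketch of noise-aware PCA, not the paper's exact algorithm."""
    N, D = Y.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((D, q))
    mu = Y.mean(axis=0)
    for _ in range(n_iter):
        # E-step: the latent posterior is point-specific because the
        # noise covariance diag(S[n]) differs for every data point.
        Ex = np.zeros((N, q))
        Exx = np.zeros((N, q, q))
        for n in range(N):
            P = np.eye(q) + (W.T / S[n]) @ W      # posterior precision
            C = np.linalg.inv(P)                  # posterior covariance
            Ex[n] = C @ (W.T / S[n]) @ (Y[n] - mu)
            Exx[n] = C + np.outer(Ex[n], Ex[n])
        # M-step: with element-specific variances, each row of W has its
        # own weighted least-squares update (weights 1 / S[:, d]).
        for d in range(D):
            w_inv = 1.0 / S[:, d]
            A = np.einsum('n,nij->ij', w_inv, Exx)
            b = (w_inv * (Y[:, d] - mu[d])) @ Ex
            W[d] = np.linalg.solve(A, b)
            mu[d] = np.sum(w_inv * (Y[:, d] - Ex @ W[d])) / np.sum(w_inv)
    return W, mu, Ex
```

A denoised reconstruction is then `mu + Ex @ W.T`: each point is shrunk toward the principal subspace, with noisier measurements shrunk more strongly because they carry less weight in the posterior.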

#### Cite this Paper

BibTeX

```
@article{sanguinetti-accounting05,
  title = {Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data},
  author = {Guido Sanguinetti and Marta Milo and Magnus Rattray and Neil D. Lawrence},
  journal = {Bioinformatics},
  volume = {21},
  number = {19},
  pages = {3748--3754},
  year = {2005},
  doi = {10.1093/bioinformatics/bti617},
  url = {http://inverseprobability.com/publications/sanguinetti-accounting05.html},
  abstract = {Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained. Availability: The software used in the paper is available from . The microarray data are deposited in the NCBI database.}
}
```

Endnote

```
%0 Journal Article
%T Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data
%A Guido Sanguinetti
%A Marta Milo
%A Magnus Rattray
%A Neil D. Lawrence
%J Bioinformatics
%D 2005
%V 21
%N 19
%P 3748--3754
%R 10.1093/bioinformatics/bti617
%U http://inverseprobability.com/publications/sanguinetti-accounting05.html
%X Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained. Availability: The software used in the paper is available from . The microarray data are deposited in the NCBI database.
```

RIS

```
TY - JOUR
TI - Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data
AU - Guido Sanguinetti
AU - Marta Milo
AU - Magnus Rattray
AU - Neil D. Lawrence
JO - Bioinformatics
PY - 2005
VL - 21
IS - 19
SP - 3748
EP - 3754
DO - 10.1093/bioinformatics/bti617
UR - http://inverseprobability.com/publications/sanguinetti-accounting05.html
AB - Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to 'denoise' a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained. Availability: The software used in the paper is available from . The microarray data are deposited in the NCBI database.
ER -
```

APA

Sanguinetti, G., Milo, M., Rattray, M. & Lawrence, N. D. (2005). Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data. *Bioinformatics*, 21(19), 3748-3754.
