Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

Neil D. Lawrence
6:1783-1816, 2005.

Abstract

Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.
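
As a rough illustration of the idea in the abstract, the sketch below writes the objective as a Gaussian negative log-likelihood in which each data dimension shares one covariance matrix built from the latent points. With a linear covariance this corresponds to the dual probabilistic PCA view; swapping in a non-linear covariance gives a GP-LVM-style objective. The function names (`gplvm_nll`, `linear_kernel`, `rbf_kernel`), the noise precision `beta`, and the choice of an RBF covariance are assumptions made for this example and are not taken from the text above. It is not the paper's optimisation algorithm, only the objective evaluated at a fixed set of latent points.

```python
import numpy as np


def linear_kernel(X, beta):
    """Linear covariance X X^T plus white noise of precision beta (assumed form)."""
    return X @ X.T + np.eye(X.shape[0]) / beta


def rbf_kernel(X, beta, variance=1.0, lengthscale=1.0):
    """RBF covariance plus white noise; one possible non-linear choice."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    K = variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)
    return K + np.eye(X.shape[0]) / beta


def gplvm_nll(K, Y):
    """Negative log-likelihood of Y (N x D) when every column of Y is an
    independent zero-mean Gaussian with the same N x N covariance K."""
    N, D = Y.shape
    _, logdet = np.linalg.slogdet(K)
    trace_term = np.trace(np.linalg.solve(K, Y @ Y.T))
    return 0.5 * (D * N * np.log(2.0 * np.pi) + D * logdet + trace_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((20, 5))   # 20 points in 5 dimensions
    Y -= Y.mean(axis=0)                # the model assumes zero-mean data
    X = rng.standard_normal((20, 2))   # a candidate 2-D embedding
    print("linear kernel NLL:", gplvm_nll(linear_kernel(X, beta=10.0), Y))
    print("RBF kernel NLL:   ", gplvm_nll(rbf_kernel(X, beta=10.0), Y))
```

In practice the latent points `X` and the kernel parameters would be optimised jointly against this objective; here they are held fixed so that only the change of covariance function is visible.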

Cite this Paper


BibTeX
@InProceedings{pmlr-v-lawrence-pnpca05,
  title = {Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models},
  author = {Neil D. Lawrence},
  pages = {1783--1816},
  year = {2005},
  volume = {6},
  url = {http://inverseprobability.com/publications/lawrence-pnpca05.html},
  abstract = {Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.}
}
Endnote
%0 Conference Paper
%T Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
%A Neil D. Lawrence
%C Proceedings of Machine Learning Research
%D 2005
%F pmlr-v-lawrence-pnpca05
%I PMLR
%J Proceedings of Machine Learning Research
%P 1783--1816
%U http://inverseprobability.com
%V 6
%W PMLR
%X Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.
RIS
TY  - CPAPER
TI  - Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
AU  - Neil D. Lawrence
PY  - 2005
ID  - pmlr-v-lawrence-pnpca05
PB  - PMLR
SP  - 1783
DP  - PMLR
EP  - 1816
UR  - http://inverseprobability.com/publications/lawrence-pnpca05.html
AB  - Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.
ER  -
APA
Lawrence, N. D. (2005). Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models. In PMLR 6:1783-1816.
