# Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

Neil D. Lawrence, University of Sheffield

### Errata

• Page 9: just before Section 5. The inequality should be $N>>d$, not $N<<d$. Thanks to: Shaobo Hou

#### Abstract

Summarising a high dimensional data-set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). We develop a practical algorithm for GPLVMs which allow for non-linear mappings from the embedded space giving a non-linear probabilistic version of PCA. We develop the new algorithm to provide a principled approach to handling discrete valued data and missing attributes. We demonstrate the algorithm on a range of real-world and artificially generated data-sets and finally, through analysis of the GPLVM objective function, we relate the algorithm to popular spectral techniques such as kernel PCA and multidimensional scaling.
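The "dual" in dual probabilistic PCA refers to working with the $N \times N$ inner-product matrix of the data rather than the $d \times d$ covariance. As an illustration (not code from the report), the following NumPy sketch shows that the top principal-component embedding can be recovered from either eigendecomposition; all variable names here are our own.

```python
import numpy as np

# Illustration: PCA can be computed from the d x d matrix X^T X (primal)
# or the N x N matrix X X^T (dual). The dual view is the one DPPCA builds
# on, and is cheaper when there are fewer points than dimensions (N < d).

rng = np.random.default_rng(0)
N, d, q = 20, 50, 2                 # N points, d dimensions, q latent dims
X = rng.standard_normal((N, d))
X -= X.mean(axis=0)                 # centre the data

# Primal: top-q eigenvectors of X^T X give the principal directions W.
evals_p, evecs_p = np.linalg.eigh(X.T @ X)
W = evecs_p[:, ::-1][:, :q]         # eigh sorts ascending; reverse for top-q
Z_primal = X @ W                    # latent coordinates of the data

# Dual: top-q eigenvectors U of X X^T. If (lam, u) is an eigenpair of
# X X^T, then X u * sqrt(lam) reproduces the primal coordinates, so the
# dual embedding is simply U scaled by the square roots of the eigenvalues.
evals_d, evecs_d = np.linalg.eigh(X @ X.T)
U = evecs_d[:, ::-1][:, :q]
Z_dual = U * np.sqrt(evals_d[::-1][:q])

# The two embeddings agree up to an arbitrary sign per component.
```

Replacing the inner products in the dual view with a covariance (kernel) function over latent points is what turns this linear model into the GPLVM discussed in the paper.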

    @TechReport{lawrence-gplvmtech04,
      title       = {Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models},
      author      = {Neil D. Lawrence},
      year        = {2004},
      institution = {Department of Computer Science, University of Sheffield},
      number      = {CS-04-08},
      url         = {http://inverseprobability.com/publications/lawrence-gplvmtech04.html},
      key         = {Lawrence:gplvmTech04},
      linkpsgz    = {ftp://ftp.dcs.shef.ac.uk/home/neil/nlpca.ps.gz},
      linksoftware = {http://inverseprobability.com/gplvm/},
      group       = {shefml,dimensional reduction}
    }
Lawrence, N.D. (2004). Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models. Technical Report CS-04-08, Department of Computer Science, University of Sheffield.