Neil Lawrence's Publications

Bayesian learning via neural Schrödinger–Föllmer flows

Wed, 23 Nov 2022 00:00:00 +0000

In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control. We advocate stochastic control as a finite-time, low-variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics. Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to existing VI routines in SDE-based models.

Challenges in Machine Learning Deployment: A Survey of Case Studies

Sat, 30 Apr 2022 00:00:00 +0000

In recent years, machine learning has transitioned from a field of academic research interest to a field capable of solving real-world business problems. However, the deployment of machine learning models in production systems can present a number of issues and concerns. This survey reviews published reports of deploying machine learning solutions in a variety of use cases, industries and applications and extracts practical considerations corresponding to stages of the machine learning deployment workflow. By mapping found challenges to the steps of the machine learning deployment workflow we show that practitioners face issues at each stage of the deployment process. The goal of this paper is to lay out a research agenda to explore approaches addressing these challenges.

Differentially Private Regression and Classification with Sparse Gaussian Processes

Fri, 08 Oct 2021 00:00:00 +0000

A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy which has been combined with Gaussian processes through the previously published cloaking method, an approach that tackles the problem of providing privacy for the outputs of a training set. In this paper we solve several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. We finally explore the issue of hyperparameter selection and develop a method for their private selection. This paper and associated libraries provide a robust toolkit for combining differential privacy and Gaussian processes in a practical manner.

Solving Schrödinger Bridges via Maximum Likelihood

Tue, 31 Aug 2021 00:00:00 +0000

The Schrödinger bridge problem (SBP) finds the most likely stochastic evolution between two probability distributions given a prior stochastic evolution. As well as applications in the natural sciences, problems of this kind have important applications in machine learning such as dataset alignment and hypothesis testing. Whilst the theory behind this problem is relatively mature, scalable numerical recipes to estimate the Schrödinger bridge remain an active area of research. Our main contribution is the proof of equivalence between solving the SBP and an autoregressive maximum likelihood estimation objective. This formulation circumvents many of the challenges of density estimation and enables direct application of successful machine learning techniques. We propose a numerical procedure to estimate SBPs using Gaussian processes and demonstrate the practical usage of our approach in numerical simulations and experiments.

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Thu, 03 Jun 2021 00:00:00 +0000

Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.

Decision-making with Uncertainty

Wed, 02 Dec 2020 00:00:00 +0000

In emergency situations like the coronavirus pandemic, decisions must be made quickly, with only partial information. But good decisions are still possible using risk–benefit analysis.

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Sun, 26 Apr 2020 00:00:00 +0000

We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, even though we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. In addition, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient.

Bottom-up Data Trusts: Disturbing the 'One Size Fits All' Approach to Data Governance

Tue, 01 Oct 2019 00:00:00 +0000

From the friends we make to the foods we like, via our shopping and sleeping habits, most aspects of our quotidian lives can now be turned into machine-readable data points. For those able to turn these data points into models predicting what we will do next, this data can be a source of wealth. For those keen to replace biased, fickle human decisions, this data—sometimes misleadingly—offers the promise of automated, increased accuracy. For those intent on modifying our behaviour, this data can help build a puppeteer's strings. As we move from one way of framing data governance challenges to another, salient answers change accordingly. Just like the wealth redistribution way of framing those challenges tends to be met with a property-based, 'it's *our* data' answer, when one frames the problem in terms of manipulation potential, dignity-based, human rights answers rightly prevail (via fairness and transparency-based answers to contestability concerns). Positive data-sharing aspirations tend to be raised within altogether different conversations from those aimed at addressing the above concerns. Our data Trusts proposal challenges these boundaries.

Variational Information Distillation for Knowledge Transfer

Sat, 15 Jun 2019 00:00:00 +0000

Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms state-of-the-art methods and achieves similar performance to the CNN with a single convolutional layer.

Transferring Knowledge across Learning Processes

Fri, 26 Apr 2019 00:00:00 +0000

In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task with a manifold on which the training process travels from initialization to final parameters and construct a meta-learning objective that minimizes the expected length of this path. Our framework leverages only information obtained during training and can be computed on the fly at negligible cost. We demonstrate that our framework outperforms competing methods, both in meta-learning and transfer learning, on a set of computer vision tasks. Finally, we demonstrate that Leap can transfer knowledge across learning processes in demanding reinforcement learning environments (Atari) that involve millions of gradient steps.

Intrinsic Gaussian Processes on Complex Constrained Domains

Fri, 19 Apr 2019 00:00:00 +0000

We propose a class of intrinsic Gaussian processes (GPs) for interpolation, regression and classification on manifolds with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of $\Re$, $\Re^2$, $\Re^3$ and beyond. For example, intrinsic GPs can accommodate spatial domains arising as complex subsets of Euclidean space. Intrinsic GPs respect the potentially complex boundary or interior conditions as well as the intrinsic geometry of the spaces. The key novelty of the approach proposed is to utilize the relationship between heat kernels and the transition density of Brownian motion on manifolds for constructing and approximating valid and computationally feasible covariance kernels. This enables intrinsic GPs to be practically applied in great generality, whereas existing approaches for smoothing on constrained domains are limited to simple special cases. The broad utilities of the intrinsic GP approach are illustrated through simulation studies and data examples.

Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems

Mon, 08 Oct 2018 00:00:00 +0000

This paper is concerned with learning and stochastic control in physical systems that contain unknown input signals. These unknown signals are modeled as Gaussian processes (GPs) with certain parameterized covariance structures. The resulting latent force models can be seen as hybrid models that contain a first-principles physical model part and a nonparametric GP model part. We briefly review the statistical inference and learning methods for this kind of model, introduce stochastic control methodology for these models, and provide new theoretical observability and controllability results for them.

Structured Variationally Auto-encoded Optimization

Tue, 03 Jul 2018 00:00:00 +0000

We tackle the problem of optimizing a black-box objective function defined over a highly-structured input space. This problem is ubiquitous in science and engineering. In machine learning, inferring the structure of a neural network or the Automatic Statistician (AS), where the optimal kernel combination for a Gaussian process is selected, are two important examples. We use the AS as a case study to describe our approach, which can be easily generalized to other domains. We propose a Structure Generating Variational Auto-encoder (SG-VAE) to embed the original space of kernel combinations into a low-dimensional continuous manifold where Bayesian optimization (BO) ideas are used. This is possible when structural knowledge of the problem is available, which can be given via a simulator or any other means of generating potentially good solutions. The right exploration-exploitation balance is imposed by propagating into the search the uncertainty of the latent space of the SG-VAE, which is computed using variational inference. The key aspect of our approach is that the SG-VAE can be used to bias the search towards relevant regions, making it suitable for transfer learning tasks. Several experiments in various application domains are used to illustrate the utility and generality of the approach described in this work.

Differentially Private Regression with Gaussian Processes

Sat, 31 Mar 2018 00:00:00 +0000

A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Here we combine the provable privacy guarantees of the differential privacy framework with the flexibility of Gaussian processes (GPs). We propose a method using GPs to provide differentially private (DP) regression. We then improve this method by crafting the DP noise covariance structure to efficiently protect the training data, while minimising the scale of the added noise. We find that this cloaking method achieves the greatest accuracy, while still providing privacy guarantees, and offers practical DP for regression over multi-dimensional inputs. Together these methods provide a starter toolkit for combining differential privacy and GPs.

The Emergence of Organizing Structure in Conceptual Representation

Tue, 09 Jan 2018 00:00:00 +0000

Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form—where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach can learn intuitive organizations, including a tree for animals and a ring for the color circle, it assumes a strong inductive bias that considers only these particular forms, and each form is explicitly provided as initial knowledge. Here we introduce a new computational model of how organizing structure can be discovered, utilizing a broad hypothesis space with a preference for sparse connectivity. Given that the inductive bias is more general, the model's initial knowledge shows little qualitative resemblance to some of the discoveries it supports. As a consequence, the model can also learn complex structures for domains that lack intuitive description, as well as predict human property induction judgments without explicit structural forms. By allowing form to emerge from sparsity, our approach clarifies how both the richness and flexibility of human conceptual organization can coexist.

Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes

Tue, 05 Dec 2017 00:00:00 +0000

Often in machine learning, data are collected as a combination of multiple conditions, e.g., the voice recordings of multiple persons, each labeled with an ID. How could we build a model that captures the latent information related to these conditions and generalizes to a new one with few data points? We present a new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) that jointly models multiple conditions for regression and generalizes to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP, whose computational complexity is as low as that of sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.

Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation

Sat, 26 Aug 2017 00:00:00 +0000

Motivation: Regulation of gene expression in prokaryotes involves complex co-regulatory mechanisms involving large numbers of transcriptional regulatory proteins and their target genes. Uncovering these genome-scale interactions constitutes a major bottleneck in systems biology. Sparse latent factor models, assuming activity of transcription factors (TFs) as unobserved, provide a biologically interpretable modelling framework, integrating gene expression and genome-wide binding data, but at the same time pose a hard computational inference problem. Existing probabilistic inference methods for such models rely on subjective filtering and suffer from scalability issues, thus are not well-suited for realistic genome-scale applications. Results: We present a fast Bayesian sparse factor model, which takes input gene expression and binding sites data, either from ChIP-seq experiments or motif predictions, and outputs active TF-gene links as well as latent TF activities. Our method employs an efficient variational Bayes scheme for model inference enabling its application to large datasets which was not feasible with existing MCMC-based inference methods for such models. We validate our method on synthetic data against a similar model in the literature, employing MCMC for inference, and obtain comparable results with a small fraction of the computational time. We also apply our method to large-scale data from Mycobacterium tuberculosis involving ChIP-seq data on 113 TFs and matched gene expression data for 3863 putative target genes. We evaluate our predictions using an independent transcriptomics experiment involving over-expression of TFs. Availability and implementation: An easy-to-use Jupyter notebook demo of our method with data is available at https://github.com/zhenwendai/SITAR. Supplementary information: Supplementary data are available at Bioinformatics online.

Preferential Bayesian Optimization

Mon, 17 Jul 2017 00:00:00 +0000

Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimize black-box functions where direct queries of the objective are expensive. We consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO), which finds the optimum of a latent function that can only be queried through pairwise comparisons, so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments in which we show how the way correlations are modeled is the key ingredient to drastically reduce the number of comparisons to find the optimum of the latent function of interest.
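
The duel mechanism described in this abstract can be illustrated with a small sketch. It only simulates Bernoulli duel outcomes from a latent function; it does not fit a Gaussian process or define an acquisition function. The toy function `latent`, the logistic link, and the convention that lower latent values win are all assumptions made for illustration and are not taken from the abstract.

```python
import math
import random


def latent(x):
    """Hypothetical latent function to minimise (hidden from the optimiser in practice)."""
    return (x - 0.3) ** 2


def win_probability(x, x_prime):
    """Probability that x wins the duel [x, x_prime], using an assumed logistic link."""
    return 1.0 / (1.0 + math.exp(-(latent(x_prime) - latent(x))))


def duel(x, x_prime, rng):
    """Sample one Bernoulli duel outcome: True if x is preferred over x_prime."""
    return rng.random() < win_probability(x, x_prime)


# Repeated duels between the minimiser (0.3) and a worse point (0.9).
rng = random.Random(0)
outcomes = [duel(0.3, 0.9, rng) for _ in range(1000)]
win_rate = sum(outcomes) / len(outcomes)
```

In a full method of this kind, outcomes such as `outcomes` would be the only data available, and a probabilistic classifier over pairs of inputs would be trained on them.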

Living Together: Mind and Machine Intelligence

Wed, 24 May 2017 00:00:00 +0000

In this commentary we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to *embodiment factors*. We define embodiment factors as the ratio between an entity's ability to communicate information vs compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequence of such a system we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.

Data Readiness Levels

Fri, 05 May 2017 00:00:00 +0000

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches. In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.

Manifold Alignment Determination: finding correspondences across different data views

Thu, 12 Jan 2017 00:00:00 +0000

We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data points. The proposed method requires only a few aligned examples, from which it is able to recover a global alignment through a probabilistic model. The strong, yet flexible regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefit of the proposed approach.

Topslam: Waddington Landscape Recovery for Single Cell Experiments

Mon, 20 Jun 2016 00:00:00 +0000

We present an approach to estimating the nature of the Waddington (or epigenetic) landscape that underlies a population of individual cells. Through exploiting high resolution single cell transcription experiments we show that cells can be located on a landscape that reflects their differentiated nature. Our approach makes use of probabilistic non-linear dimensionality reduction that respects the topology of our estimated epigenetic landscape. In simulation studies and analyses of real data we show that the approach, known as Topslam, outperforms previous attempts to understand the differentiation landscape. The novelty of our approach lies in the correction of distances *before* extracting ordering information. This gives an advantage over other attempts, which have to correct extracted time lines by post-processing or additional data.

Differentially Private Gaussian Processes

Thu, 02 Jun 2016 00:00:00 +0000

A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Differential privacy is a framework which allows algorithms to have provable privacy guarantees. Gaussian processes are a widely used approach for dealing with uncertainty in functions. This paper explores differentially private mechanisms for Gaussian processes. We compare binning and adding noise before regression with adding noise post-regression. For the former we develop a new kernel for use with binned data. For the latter we show that using inducing inputs allows us to reduce the scale of the added perturbation. We find that, for the datasets used, adding noise to a binned dataset has superior accuracy. Together these methods provide a starter toolkit for combining differential privacy and Gaussian processes.

Chained Gaussian Processes

Mon, 02 May 2016 00:00:00 +0000

Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (like in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive: link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models “Chained Gaussian Processes”: the transformation of the GPs to the likelihood parameters will not generally be invertible, and that implies that linearisation would only be possible with multiple (localized) links, i.e., a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions.

Recurrent Gaussian Processes

Mon, 02 May 2016 00:00:00 +0000

We define Recurrent Gaussian Process (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In this context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results obtained indicate that our RGP model maintains its high flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available.

GLASSES: Relieving The Myopia Of Bayesian Optimisation

Mon, 02 May 2016 00:00:00 +0000

We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.

Variationally Auto-Encoded Deep Gaussian Processes

Mon, 02 May 2016 00:00:00 +0000

We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Sun, 17 Apr 2016 00:00:00 +0000

Detecting Periodicities with Gaussian processes

Wed, 13 Apr 2016 00:00:00 +0000

We consider the problem of detecting and quantifying the periodic component of a function given noise-corrupted observations of a limited number of input/output tuples. Our approach is based on Gaussian process regression which provides a flexible non-parametric framework for modelling periodic data. We introduce a novel decomposition of the covariance function as the sum of periodic and aperiodic kernels. This decomposition allows for the creation of sub-models which capture the periodic nature of the signal and its complement. To quantify the periodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the fitted sub-models. Although the method can be applied to many kernels, we give a special emphasis to the Matérn family, from the expression of the reproducing kernel Hilbert space inner product to the implementation of the associated periodic kernels in a Gaussian process toolkit. The proposed method is illustrated by considering the detection of periodically expressed genes in the Arabidopsis genome.
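
The idea of sub-models arising from a sum of periodic and aperiodic kernels can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the construction from the paper: the exp-sine-squared kernel stands in for the paper's Matérn-based periodic kernels, the squared-exponential kernel plays the aperiodic part, and the data, hyperparameters and function names are hypothetical. The periodicity ratio is not computed here.

```python
import numpy as np


def periodic_kernel(a, b, period=1.0, lengthscale=1.0):
    """Standard exp-sine-squared kernel (a stand-in for the paper's periodic kernels)."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)


def aperiodic_kernel(a, b, lengthscale=2.0):
    """Squared-exponential kernel, used here as the aperiodic part."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


# Toy data: a periodic signal plus a linear trend plus noise.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 4.0, 40)
y = np.sin(2 * np.pi * X) + 0.3 * X + 0.1 * rng.standard_normal(X.size)
Xs = np.linspace(0.0, 4.0, 200)
noise_variance = 0.1 ** 2

# The full covariance is the sum of the periodic and aperiodic parts.
K = periodic_kernel(X, X) + aperiodic_kernel(X, X) + noise_variance * np.eye(X.size)
alpha = np.linalg.solve(K, y)

# Each sub-model's posterior mean uses its own cross-covariance with shared weights.
mean_periodic = periodic_kernel(Xs, X) @ alpha
mean_aperiodic = aperiodic_kernel(Xs, X) @ alpha
mean_full = (periodic_kernel(Xs, X) + aperiodic_kernel(Xs, X)) @ alpha
```

Because the kernel is additive, `mean_periodic + mean_aperiodic` recovers `mean_full`, which is what lets the periodic component be inspected separately from its complement.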

Batch Bayesian Optimization via Local Penalization

Fri, 01 Jan 2016 00:00:00 +0000

The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These could either be computational or physical facets of the process being optimized. Batch methods, however, require the modeling of the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. We investigate this issue and propose a highly effective heuristic based on an estimate of the function's Lipschitz constant that captures the most important aspect of this interaction, local repulsion, at negligible computational overhead. A penalized acquisition function is used to collect batches of points minimizing the non-parallelizable computational effort. The resulting algorithm compares very well, in run-time, with much more elaborate alternatives.
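
The notion of local repulsion driven by a Lipschitz estimate can be sketched as follows. The abstract does not give the penalizer's form, so this is one plausible shape offered as an assumption: a soft exclusion ball around each pending batch point whose radius is the estimated gap to the best value divided by the Lipschitz constant. All names and parameters (`gap`, `var_j`, `lipschitz`, `pending`) are hypothetical.

```python
import math


def local_penalizer(x, x_j, gap, var_j, lipschitz):
    """Soft exclusion zone around a pending point x_j (assumed form).

    gap: estimated distance between the model mean at x_j and the best value.
    var_j: model variance at x_j, which controls how sharp the boundary is.
    Returns a value in [0, 1]: near 0 close to x_j, near 1 far away.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_j)))
    z = (lipschitz * dist - gap) / math.sqrt(2.0 * var_j)
    return 0.5 * math.erfc(-z)


def penalized_acquisition(x, acquisition_value, pending, lipschitz):
    """Multiply a non-negative acquisition value by one penalizer per pending point."""
    value = acquisition_value
    for x_j, gap, var_j in pending:
        value *= local_penalizer(x, x_j, gap, var_j, lipschitz)
    return value


# One pending point at the origin; candidates near it are strongly discouraged.
pending = [((0.0,), 1.0, 0.01)]
near = penalized_acquisition((0.0,), 1.0, pending, lipschitz=2.0)
far = penalized_acquisition((5.0,), 1.0, pending, lipschitz=2.0)
```

With a penalizer of this kind, the next batch element is chosen by maximising the penalized acquisition, so no model refit is needed between points in the same batch.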

Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes

Fri, 01 Jan 2016 00:00:00 +0000

The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to over-fitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.

Genome-wide Modeling of Transcription Kinetics Reveals Patterns of RNA Production Delays

Mon, 05 Oct 2015 00:00:00 +0000

Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor $\alpha$ activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.

A Reverse-Engineering Approach to Dissect Post-translational Modulators of Transcription Factor's Activity from Transcriptional Data

Thu, 03 Sep 2015 00:00:00 +0000

Background Transcription factors (TFs) act downstream of the major signalling pathways functioning as master regulators of cell fate. Their activity is tightly regulated at the transcriptional, post-transcriptional and post-translational level. Proteins modifying TF activity are not easily identified by experimental high-throughput methods. Results We developed a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI is built on the hypothesis that the modulator of a TF (i.e. kinase/phosphatases), when expressed in the cell, will cause the TF target genes to be co-expressed. On the contrary, when the modulator is not expressed, the TF will be inactive, resulting in a loss of co-regulation across its target genes. DMI detects the occurrence of changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information. We validated the DMI approach on a compendium of 5,372 GEPs, showing its predictive ability in correctly identifying kinases regulating the activity of 14 different transcription factors. Conclusions DMI can be used in combination with experimental approaches such as high-throughput screening to efficiently improve both pathway and target discovery. An on-line web-tool enabling the user to use DMI to identify post-transcriptional modulators of a transcription factor of interest can be found at http://dmi.tigem.it.

Semi-described and Semi-supervised Learning with Gaussian Processes

Sun, 12 Jul 2015 00:00:00 +0000

Propagating input uncertainty through non-linear Gaussian process (GP) mappings is intractable. This hinders the task of training GPs using uncertain and partially observed inputs. In this paper we refer to this task as 'semi-described learning'. We then introduce a GP framework that solves both the semi-described and the semi-supervised learning problems (where missing values occur in the outputs). Auto-regressive state space simulation is also recognised as a special case of semi-described learning. To achieve our goal we develop variational methods for handling semi-described inputs in GPs, and couple them with algorithms that allow for imputing the missing values while treating the uncertainty in a principled, Bayesian manner. Extensive experiments on simulated and real-world data study the problems of iterative forecasting and regression/classification with missing values. The results suggest that the principled propagation of uncertainty stemming from our framework can significantly improve performance in these tasks.

Malaria surveillance with multiple data sources using Gaussian process models

Tue, 09 Dec 2014 00:00:00 +0000

A statistical framework for monitoring the health of a population should ideally be able to combine data from a wide variety of sources, such as remote sensing, telecoms, and official health records, in a principled manner. Gaussian process regression is commonly used to visualise disease incidence by interpolating values across a map; in this article, we show how it can be extended to deal with many different types of information by introducing a flexible covariance structure across data sources. Combining many data sources in a single model provides a number of practical advantages, such as the ability to automatically determine the importance of each data source through likelihood optimisation, and to deal with missing values. We show the basic idea with an application of malaria density modeling across Uganda using administrative records and remote sensing vegetation index data, and then go on to describe further extensions such as the incorporation of human mobility data extracted from mobile phone call detail records (CDRs).

Gaussian Process Models with Parallelization and GPU acceleration

Sat, 18 Oct 2014 00:00:00 +0000

In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speed up. Combining both techniques allows applying Gaussian process models to millions of datapoints. The efficiency of our algorithm is demonstrated with a synthetic dataset. Its source code has been integrated into our popular software library GPy.

Consistent Mapping of Government Malaria Records Across a Changing Territory Delimitation

Mon, 22 Sep 2014 00:00:00 +0000

Background Health Management Information Systems (HMIS) are a crucial tool for supporting planning and decision-making. The benefits of such systems will depend on the quality of the data they provide and on the response capacity of the decision-makers [1]. The analysis of malaria incidence records of the HMIS, in Uganda, faces two main complications. First, artificial trends induced by a non-negligible and variable rate of non-reporting hospitals. Second, lack of comparability across time, due to changes in the district boundaries. Materials and methods We propose a method for estimating the incidence of malaria for the different district definitions across time. Although we have information for the whole country across many years, this task requires making estimates for periods where data is not available for a specific district delimitation. We provide disease maps based on HMIS information by exploiting the relationship with its environmental drivers. Our approach relies on the Gaussian process framework. In particular, we use multiple output kernel techniques [2] to achieve consistency between the totals and subtotals of incidence records at different levels of territory aggregations. In the case of map generation, this approach allows us to combine information from different sources at different spatial resolution. We use the HMIS malaria records from 2003 to 2013. The records consist of weekly information aggregated within districts. The information available also includes the number of hospitals reporting each week. We use the normalized difference vegetation index and land surface temperature measurements, both commonly used for identifying suitable habitats for mosquito breeding [3]. Results For recently created districts, our method allows comparability between the current malaria incidence and periods before they started reporting to the HMIS. The probabilistic model defined allows HMIS users to generate samples from an incidence distribution to develop further analysis. We also generate disease maps by combining administrative records with remote sensing data.

Warped Linear Mixed Models for the Genetic Analysis of Transformed Phenotypes

Fri, 19 Sep 2014 00:00:00 +0000

Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss in power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.

Variational Inference for Uncertainty on the Inputs of Gaussian Process Models

Sun, 14 Sep 2014 00:00:00 +0000

The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximizing an analytic lower bound on the exact marginal likelihood. We apply this method for learning a GP-LVM from i.i.d. observations and for learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to automatically select the dimensionality of the nonlinear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain inputs and semi-supervised Gaussian processes. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real world datasets, including high resolution video data.

Metrics for Probabilistic Geometries

Wed, 23 Jul 2014 00:00:00 +0000

We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data.

Inference of RNA Polymerase II Transcription Dynamics from Chromatin Immunoprecipitation Time Course Data

Wed, 14 May 2014 00:00:00 +0000

Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated using either maximum likelihood estimation or via Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ER) and FOXA1 binding in their proximal promoter regions.

Fast Nonparametric Clustering of Structured Time-Series

Fri, 18 Apr 2014 00:00:00 +0000

In this publication, we combine two Bayesian nonparametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is to introduce a variation on the GP prior which enables us to model structured time-series data, i.e. data containing groups where we wish to model inter- and intra-group variability. Our innovation in the DP model is an implementation of a new fast collapsed variational inference procedure which enables us to optimize our variational approximation significantly faster than standard VB approaches. In a biological time series application we show how our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a significant speed-up over EM-based variational inference.

Tilted Variational Bayes

Wed, 02 Apr 2014 00:00:00 +0000

We present a novel method for approximate inference. Using some of the constructs from expectation propagation (EP), we derive a lower bound of the marginal likelihood in a similar fashion to variational Bayes (VB). The method combines some of the benefits of VB and EP: it can be used with light-tailed likelihoods (where traditional VB fails), and it provides a lower bound on the marginal likelihood. We apply the method to Gaussian process classification, a situation where the Kullback-Leibler divergence minimized in traditional VB can be infinite, and to robust Gaussian process regression, where the inference process is dramatically simplified in comparison to EP. Code to reproduce all the experiments can be found at .

Hybrid Discriminative-Generative Approaches with Gaussian Processes

Wed, 02 Apr 2014 00:00:00 +0000

Machine learning practitioners are often faced with a choice between a discriminative and a generative approach to modelling. Here, we present a model based on a hybrid approach that breaks down some of the barriers between the discriminative and generative points of view, allowing continuous dimensionality reduction of hybrid discrete-continuous data, discriminative classification with missing inputs and manifold learning informed by class labels.

Nested Variational Compression in Deep Gaussian Processes

Wed, 01 Jan 2014 00:00:00 +0000

Deep Gaussian processes provide a flexible approach to probabilistic modelling of data using either supervised or unsupervised learning. For tractable inference, approximations to the marginal likelihood of the model must be made. The original approach to approximate inference in these models used variational compression to allow for approximate variational marginalization of the hidden variables, leading to a lower bound on the marginal likelihood of the model [Damianou and Lawrence, 2013]. In this paper we extend this idea with a nested variational compression. The resulting lower bound on the likelihood can be easily parallelized or adapted for stochastic variational inference.

Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters

Tue, 20 Aug 2013 00:00:00 +0000

**Background** Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. **Results** We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method’s capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method’s ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. **Conclusion** The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: .

Gaussian Processes for Big Data

Thu, 11 Jul 2013 00:00:00 +0000

We introduce stochastic variational inference for Gaussian process models. This enables the application of Gaussian process (GP) models to data sets containing millions of data points. We show how GPs can be variationally decomposed to depend on a set of globally relevant inducing variables which factorize the model in the necessary manner to perform variational inference. Our approach is readily extended to models with non-Gaussian likelihoods and latent variable models based around Gaussian processes. We demonstrate the approach on a simple toy problem and two real world data sets.

The Bigraphical Lasso

Sun, 26 May 2013 00:00:00 +0000

The i.i.d. assumption in machine learning is endemic, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs. A prominent product in spectral graph theory, this structure has appealing properties for regression, enhanced sparsity and interpretability. To deal with the parameter explosion we introduce L1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions.
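
The structure behind the Cartesian product of graphs can be made concrete with a short sketch. The adjacency pattern of a Cartesian product is given by the Kronecker sum of the two factors; the example below assumes the joint precision matrix takes that form, `kron(psi, I) + kron(I, theta)`. The symbols `psi` (instances) and `theta` (features) and their numerical values are illustrative choices, not taken from the paper, and the code only builds the structured matrix; it does not implement the flip-flop fitting algorithm.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    # Kronecker product of two dense matrices stored as lists of rows.
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def kron_sum(psi, theta):
    # Kronecker sum: psi (x) I_p + I_n (x) theta.
    n, p = len(psi), len(theta)
    left = kron(psi, identity(p))
    right = kron(identity(n), theta)
    return [[left[i][j] + right[i][j] for j in range(n * p)] for i in range(n * p)]


# Hypothetical precision matrices over 2 instances and 3 features.
psi = [[2.0, -1.0],
       [-1.0, 2.0]]
theta = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

omega = kron_sum(psi, theta)  # 6 x 6 joint precision over (instance, feature) pairs

n, p = len(psi), len(theta)
full_params = (n * p) * (n * p + 1) // 2                  # unconstrained symmetric matrix
structured_params = n * (n + 1) // 2 + p * (p + 1) // 2   # psi and theta only
print(full_params, structured_params)
```

Entries of `omega` that link pairs differing in both instance and feature are exactly zero, which is the sparsity the product structure imposes. The parameter count grows with n² + p² instead of (np)².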

Linear Latent Force Models Using Gaussian Processes

Mon, 13 May 2013 00:00:00 +0000

Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.

Deep Gaussian Processes

Mon, 29 Apr 2013 00:00:00 +0000

In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a deep belief network based on Gaussian process mappings. The data is modeled as the output of a multivariate GP. The inputs to that Gaussian process are then governed by another GP. A single layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). We perform inference in the model by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model which we use for model selection (number of layers and nodes per layer). Deep belief networks are typically applied to relatively large data sets using stochastic gradient descent for optimization. Our fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by our variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples.
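
The generative structure described above, where the inputs to one GP are themselves the output of another, can be sketched by drawing a prior sample layer by layer. This is a minimal illustration under stated assumptions: a one-dimensional hidden layer, a squared-exponential kernel with unit lengthscale, and a small `jitter` term for numerical stability are all choices made here, and nothing in the sketch performs the variational inference the paper develops.

```python
import math
import random


def rbf_gram(xs, lengthscale=1.0, jitter=1e-6):
    # Squared-exponential Gram matrix with jitter added to the diagonal.
    n = len(xs)
    return [
        [math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2) + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    # Lower-triangular Cholesky factor of a symmetric positive-definite matrix.
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(inputs, rng):
    # Draw one zero-mean GP sample evaluated at the given inputs.
    chol = cholesky(rbf_gram(inputs))
    z = [rng.gauss(0.0, 1.0) for _ in inputs]
    return [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(inputs))]


rng = random.Random(0)
x = [i / 10 for i in range(20)]   # observed inputs
hidden = sample_gp(x, rng)        # first layer: GP over x
output = sample_gp(hidden, rng)   # second layer: GP whose inputs are the hidden layer
print(output[:3])
```

Stopping after the first call to `sample_gp` corresponds to an ordinary single-layer GP draw; each further call adds one layer to the hierarchy.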

Unravelling the enigma of selective vulnerability in neurodegeneration: motor neurons resistant to degeneration in ALS show distinct gene expression characteristics and decreased susceptibility to excitotoxicity

Thu, 04 Apr 2013 00:00:00 +0000

A consistent clinical feature of amyotrophic lateral sclerosis (ALS) is the sparing of eye movements and the function of external sphincters, with corresponding preservation of motor neurons in the brainstem oculomotor nuclei, and of Onuf’s nucleus in the sacral spinal cord. Studying the differences in properties of neurons that are vulnerable and resistant to the disease process in ALS may provide insights into the mechanisms of neuronal degeneration, and identify targets for therapeutic manipulation. We used microarray analysis to determine the differences in gene expression between oculomotor and spinal motor neurons, isolated by laser capture microdissection from the midbrain and spinal cord of neurologically normal human controls. We compared these to transcriptional profiles of oculomotor nuclei and spinal cord from rat and mouse, obtained from the GEO omnibus database. We show that oculomotor neurons have a distinct transcriptional profile, with significant differential expression of 1,757 named genes ($q < 0.001$). Differentially expressed genes are enriched for the functional categories of synaptic transmission, ubiquitin-dependent proteolysis, mitochondrial function, transcriptional regulation, immune system functions, and the extracellular matrix. Marked differences are seen, across the three species, in genes with a function in synaptic transmission, including several glutamate and GABA receptor subunits. Using patch clamp recording in acute spinal and brainstem slices, we show that resistant oculomotor neurons show a reduced AMPA-mediated inward calcium current, and a higher GABA-mediated chloride current, than vulnerable spinal motor neurons. The findings suggest that reduced susceptibility to excitotoxicity, mediated in part through enhanced GABAergic transmission, is an important determinant of the relative resistance of oculomotor neurons to degeneration in ALS.

Detecting Regulatory Gene-Environment Interactions with Unmeasured Environmental Factors

Wed, 03 Apr 2013 00:00:00 +0000

Motivation: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing direct tests for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as human. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits.

Results: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, allowing the detection of genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability.

Availability: Software available at http://pmbio.github.io/envGPLVM/.

Contact: oliver.stegle@ebi.ac.uk, nicolo.fusi@sheffield.ac.uk

Fast variational inference in the Conjugate Exponential family

Tue, 04 Dec 2012 00:00:00 +0000

We present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic models optimized using our bound.

Mining Regulatory Network Connections by Ranking Transcription Factor Target Genes Using Time Series Expression Data

Sat, 08 Sep 2012 00:00:00 +0000

Reverse engineering the gene regulatory network is challenging because the amount of available data is very limited compared to the complexity of the underlying network. We present a technique addressing this problem through focussing on a more limited problem: inferring direct targets of a transcription factor from short expression time series. The method is based on combining Gaussian process priors and ordinary differential equation models, allowing inference on limited, potentially unevenly sampled data. The method is implemented as an R/Bioconductor package, and it is demonstrated by ranking candidate targets of the p53 tumour suppressor.

Manifold Relevance Determination

Tue, 26 Jun 2012 00:00:00 +0000

In this paper we present a fully Bayesian latent variable model which exploits conditional nonlinear (in)-dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation to the discrete segmentation and allow for a “softly” shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels. This also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model by prediction of human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors which incorporate the dynamic nature of the data.

Kernels for Vector-Valued Functions: A Review

Tue, 19 Jun 2012 00:00:00 +0000

Kernel methods are among the most popular techniques in machine learning. From a regularization perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a probabilistic perspective they are the key in the context of Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partially by frameworks like multitask learning. In this monograph, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.
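
One simple way to build a valid kernel for multiple outputs is a separable construction: a scalar kernel on the inputs multiplied by a positive semi-definite matrix that couples the outputs. The sketch below assumes that form; the names `coregionalization` and `separable_kernel`, the rank-one-plus-diagonal parameterisation of `B`, and all numerical values are illustrative choices rather than notation from the monograph.

```python
import math


def rbf(x, y, lengthscale=1.0):
    # Scalar squared-exponential kernel on the inputs.
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def coregionalization(w, kappa):
    # B = w w^T + diag(kappa) is positive semi-definite whenever kappa >= 0.
    d = len(w)
    return [[w[i] * w[j] + (kappa[i] if i == j else 0.0) for j in range(d)]
            for i in range(d)]


def separable_kernel(x, d, y, e, B):
    # Covariance between output d at input x and output e at input y.
    return B[d][e] * rbf(x, y)


B = coregionalization(w=[1.0, 0.5], kappa=[0.1, 0.1])

same_output = separable_kernel(0.2, 0, 0.2, 0, B)   # variance of output 0 at x = 0.2
cross_output = separable_kernel(0.0, 0, 1.0, 1, B)  # covariance across outputs and inputs
print(same_output, cross_output)
```

The off-diagonal entries of `B` control how much information is shared between outputs: setting them to zero recovers independent scalar-output models with a common input kernel.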

Modeling Meiotic Chromosomes Indicates a Size Dependent Contribution of Telomere Clustering and Chromosome Rigidity to Homologue Juxtaposition

Thu, 03 May 2012 00:00:00 +0000

Meiosis is the cell division that halves the genetic component of diploid cells to form gametes or spores. To achieve this, meiotic cells undergo a radical spatial reorganisation of chromosomes. This reorganisation is a prerequisite for the pairing of parental homologous chromosomes and the reductional division, which halves the number of chromosomes in daughter cells. Of particular note is the change from a centromere clustered layout (Rabl configuration) to a telomere clustered conformation (bouquet stage). The contribution of the bouquet structure to homologous chromosome pairing is uncertain. We have developed a new in silico model to represent the chromosomes of Saccharomyces cerevisiae in space, based on a worm-like chain model constrained by attachment to the nuclear envelope and clustering forces. We have asked how these constraints could influence chromosome layout, with particular regard to the juxtaposition of homologous chromosomes and potential nonallelic, ectopic, interactions. The data support the view that the bouquet may be sufficient to bring short chromosomes together, but the contribution to long chromosomes is less. We also find that persistence length is critical to how much influence the bouquet structure could have, both on pairing of homologues and avoiding contacts with heterologues. This work represents an important development in computer modeling of chromosomes, and suggests new explanations for why elucidating the functional significance of the bouquet by genetics has been so difficult.

Overlapping Mixtures of Gaussian Processes for the Data Association Problem

Wed, 04 Apr 2012 00:00:00 +0000

In this work we introduce a mixture of GPs to address the data association problem, i.e., to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a non-standard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings.

Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies

Thu, 05 Jan 2012 00:00:00 +0000

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at .

Genome-wide occupancy links Hoxa2 to Wnt-$\beta$-catenin signaling in mouse embryonic development

Mon, 02 Jan 2012 00:00:00 +0000

The regulation of gene expression is central to developmental programs and largely depends on the binding of sequence-specific transcription factors with cis-regulatory elements in the genome. Hox transcription factors specify the spatial coordinates of the body axis in all animals with bilateral symmetry, but a detailed knowledge of their molecular function in instructing cell fates is lacking. Here, we used chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) to identify Hoxa2 genomic locations in a time and space when it is actively instructing embryonic development in mouse. Our data reveal that Hoxa2 has large genome coverage and potentially regulates thousands of genes. Sequence analysis of Hoxa2-bound regions identifies high occurrence of two main classes of motifs, corresponding to Hox and Pbx–Hox recognition sequences. Examination of the binding targets of Hoxa2 faithfully captures the processes regulated by Hoxa2 during embryonic development; in addition, it uncovers a large cluster of potential targets involved in the Wnt-signaling pathway. In vivo examination of canonical Wnt-$\beta$-catenin signaling reveals activity specifically in the Hoxa2 domain of expression, and this is undetectable in Hoxa2 mutant embryos. The comprehensive mapping of Hoxa2-binding sites provides a framework to study Hox regulatory networks in vertebrate developmental processes.

Identifying Targets of Multiple Co-regulated Transcription Factors from Expression Time-series by Bayesian Model Comparison

Sun, 01 Jan 2012 00:00:00 +0000

**Background** Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes. **Results** We present a computational framework for Bayesian statistical inference of target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe transcription of target genes taking into account combinatorial regulation. The method consists of a training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interactions. Our method statistically significantly outperforms existing alternatives. **Conclusions** Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in the presence of unmodelled confounders and unreliable prior knowledge on training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.

Residual Component Analysis

Sun, 01 Jan 2012 00:00:00 +0000

Probabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = \sigma^2\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. conditional dependencies between the covariates, or temporal correlations leaving some residual variance. We decompose the residual variance into its components through a generalised eigenvalue problem, which we call residual component analysis (RCA). We explore a range of new algorithms that arise from the framework, including one that factorises the covariance of a Gaussian density into a low-rank and a sparse-inverse component. We illustrate the ideas on the recovery of a protein-signaling network, a gene expression time-series data set and the recovery of the human skeleton from motion capture 3-D cloud data.

Gaussian Processes for Big Data with Stochastic Variational Inference

Sun, 01 Jan 2012 00:00:00 +0000

A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

Sun, 01 Jan 2012 00:00:00 +0000

We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.

Efficient Inference in Matrix-Variate Gaussian Models with i.i.d. Observation Noise

Mon, 12 Dec 2011 00:00:00 +0000

Inference in matrix-variate Gaussian models has major applications for multi-output prediction and joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.

Markov Chain Monte Carlo Algorithms for Gaussian Processes

Thu, 11 Aug 2011 00:00:00 +0000

'What's going to happen next?' Time series data hold the answers, and Bayesian methods represent the cutting edge in learning what they have to say. This ambitious book is the first unified treatment of the emerging knowledge-base in Bayesian time series techniques. Exploiting the unifying framework of probabilistic graphical models, the book covers approximation schemes, both Monte Carlo and deterministic, and introduces switching, multi-object, non-parametric and agent-based models in a variety of application environments. It demonstrates that the basic framework supports the rapid creation of models tailored to specific applications and gives insight into the computational complexity of their implementation. The authors span traditional disciplines such as statistics and engineering and the more recently established areas of machine learning and pattern recognition. Readers with a basic understanding of applied probability, but no experience with time series analysis, are guided from fundamental concepts to the state-of-the-art in research and practice.

Linear Latent Force Models Using Gaussian Processes

Wed, 13 Jul 2011 00:00:00 +0000

Kernels for Vector-Valued Functions: a Review

Thu, 30 Jun 2011 00:00:00 +0000

Kernel methods are among the most popular techniques in machine learning. From a frequentist/discriminative perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a Bayesian/generative perspective they are the key in the context of Gaussian processes, where the kernel function is also known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partly by frameworks like multitask learning. In this paper, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.

Residual Component Analysis

Tue, 21 Jun 2011 00:00:00 +0000

Probabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = (\sigma^2)\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. covariates of interest, or temporal correlations leaving some residual variance. We decompose the residual variance into its components through a generalized eigenvalue problem, which we call residual component analysis (RCA). We show that canonical covariates analysis (CCA) is a special case of our algorithm and explore a range of new algorithms that arise from the framework. We illustrate the ideas on a gene expression time series data set and the recovery of human pose from silhouette.

Spectral Dimensionality Reduction via Maximum Entropy

Tue, 14 Jun 2011 00:00:00 +0000

We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs. The resulting model is a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We directly maximize the likelihood and show results that are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set.

Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Thu, 02 Jun 2011 00:00:00 +0000

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation. We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression

Fri, 20 May 2011 00:00:00 +0000

**Background** The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process. **Results** We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach which can be used to filter quiet genes, or for the case of time series in the form of expression ratios, quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). We compare on both simulated and experimental data showing that the proposed approach significantly outperforms the current state of the art. **Conclusions** Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of micro-array time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.

Computationally Efficient Convolved Multiple Output Gaussian Processes

Sun, 01 May 2011 00:00:00 +0000

Recently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data; in particular, we show results in school exams score prediction, pollution prediction and gene expression data.

tigre: Transcription Factor Inference through Gaussian Process Reconstruction of Expression for Bioconductor

Mon, 07 Feb 2011 00:00:00 +0000

**Summary**: tigre is an R/Bioconductor package for inference of transcription factor activity and ranking candidate target genes from gene expression time series. The underlying methodology is based on Gaussian process inference on a differential equation model that allows the use of short, unevenly sampled, time series. The method has been designed with efficient parallel implementation in mind, and the package supports parallel operation even without additional software. **Availability**: The tigre package is included in Bioconductor since release 2.6 for R 2.11. The package and a user's guide are available at http://www.bioconductor.org. **Contact**: antti.honkela@hiit.fi; m.rattray@sheffield.ac.uk; n.lawrence@dcs.shef.ac.uk

Gaussian Process Inference for Differential Equation Models of Transcriptional Regulation

Sat, 01 Jan 2011 00:00:00 +0000

Variational Gaussian Process Dynamical Systems

Sat, 01 Jan 2011 00:00:00 +0000

High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences.

A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction

Fri, 22 Oct 2010 00:00:00 +0000

We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs. The resulting model is a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We develop new algorithms that directly maximize the likelihood and show that these new algorithms are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.

TFInfer: a tool for probabilistic inference of transcription factor activities

Fri, 15 Oct 2010 00:00:00 +0000

Summary: TFInfer is a novel open access, standalone tool for genome-wide inference of transcription factor activities from gene expression data. Based on an earlier MATLAB version, the software has now been extended in a number of ways. It has been significantly optimised in terms of performance, and it has been given novel functionality by allowing the user to model both time series and data from multiple independent conditions. With full documentation and an intuitive graphical user interface, together with an in-built database of yeast and Escherichia coli transcription factors, the software does not require any mathematical or computational expertise to be used effectively.

Availability:

Contact: gsanguin@staffmail.ed.ac.uk

Model-based Method for Transcription Factor Target Identification with Limited Data

Tue, 27 Apr 2010 00:00:00 +0000

We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist, and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.

Bayesian Gaussian Process Latent Variable Model

Wed, 31 Mar 2010 00:00:00 +0000

We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.

Efficient Multioutput Gaussian Processes through Variational Inducing Kernels

Wed, 31 Mar 2010 00:00:00 +0000

Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence @Alvarez:convolved08 recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias @Titsias:variational09 to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.

Elementary properties of CaV1.3 Ca2+ channels expressed in mouse cochlear inner hair cells

Wed, 20 Jan 2010 00:00:00 +0000

Mammalian cochlear inner hair cells (IHCs) are specialized to process developmental signals during immature stages and sound stimuli in adult animals. These signals are conveyed onto auditory afferent nerve fibres. Neurotransmitter release at IHC ribbon synapses is controlled by L-type CaV1.3 Ca2+ channels, the biophysics of which are still unknown in native mammalian cells. We have investigated the localization and elementary properties of Ca2+ channels in immature mouse IHCs under near-physiological recording conditions. CaV1.3 Ca2+ channels at the cell pre-synaptic site co-localize with about half of the total number of ribbons present in immature IHCs. These channels activated at relatively hyperpolarized membrane potentials (about -70 mV), showed a relatively short first latency and weak inactivation, which would allow IHCs to generate and accurately encode spontaneous Ca2+ action potential activity characteristic of these immature cells. The CaV1.3 Ca2+ channels showed a very low open probability (about 0.15 at -20 mV: near the peak of an action potential). Comparison of elementary and macroscopic Ca2+ currents indicated that very few Ca2+ channels are associated with each docked vesicle at IHC ribbon synapses. Finally, we found that the open probability of Ca2+ channels, but not their opening time, was voltage dependent. This finding provides a possible correlation between presynaptic Ca2+ channel properties and the characteristic frequency/amplitude of EPSCs in auditory afferent fibres.

Introduction to Learning and Inference in Computational Systems Biology

Fri, 01 Jan 2010 00:00:00 +0000

Computational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks. The chapters offer practical approaches to biological inference problems ranging from genome-wide inference of genetic regulation to pathway-specific studies. Both deterministic models (based on ordinary differential equations) and stochastic models (which anticipate the increasing availability of data from small populations of cells) are considered. Several chapters emphasize Bayesian inference, so the editors have included an introduction to the philosophy of the Bayesian approach and an overview of current work on Bayesian inference. Taken together, the methods discussed by the experts in Learning and Inference in Computational Systems Biology provide a foundation upon which the next decade of research in systems biology can be built.

Gaussian Processes for Missing Species in Biochemical Systems

Fri, 01 Jan 2010 00:00:00 +0000

A Brief Introduction to Bayesian Inference

Fri, 01 Jan 2010 00:00:00 +0000

Switched Latent Force Models for Movement Segmentation

Fri, 01 Jan 2010 00:00:00 +0000

Latent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as a haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology.

Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian Processes

Wed, 16 Dec 2009 00:00:00 +0000

Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.

puma: a Bioconductor package for Propagating Uncertainty in Microarray Analysis

Thu, 09 Jul 2009 00:00:00 +0000

Background

Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied.

Results

puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation.

Conclusions

For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.

Non-Linear Matrix Factorization with Gaussian Processes

Mon, 01 Jun 2009 00:00:00 +0000

A popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.

Latent Force Models

Wed, 15 Apr 2009 00:00:00 +0000

Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modeling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.

Backing Off: Hierarchical Decomposition of Activity for 3D Novel Pose Recovery

Thu, 01 Jan 2009 00:00:00 +0000

For model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models.

Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

Thu, 01 Jan 2009 00:00:00 +0000

Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields; however, current methods are computationally expensive and prone to error, due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible *without* solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state-of-the-art methods.

Sparse Convolved Multiple Output Gaussian Processes

Thu, 01 Jan 2009 00:00:00 +0000

Recently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in pollution prediction, school exams score prediction and gene expression data.

Efficient Sampling for Gaussian Process Inference using Control Variables

Mon, 08 Dec 2008 00:00:00 +0000

Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by continuously minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation.

Sparse Convolved Gaussian Processes for Multi-output Regression

Mon, 08 Dec 2008 00:00:00 +0000

We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network.

Gaussian Process Modelling of Latent Chemical Species: Applications to Inferring Transcription Factor Activities

Fri, 15 Aug 2008 00:00:00 +0000

Motivation: Inference of latent chemical species in biochemical interaction networks is a key problem in estimation of the structure and parameters of the genetic, metabolic and protein interaction networks that underpin all biological processes. We present a framework for Bayesian marginalisation of these latent chemical species through Gaussian process priors.

Results: We demonstrate our general approach on three different biological examples of single input motifs, including both activation and repression of transcription. We focus in particular on the problem of inferring transcription factor activity when the concentration of active protein cannot easily be measured. We show how the uncertainty in the inferred transcription factor activity can be integrated out in order to derive a likelihood function that can be used for the estimation of regulatory model parameters. An advantage of our approach is that we avoid the use of a coarse-grained discretization of continuous-time functions, which would lead to a large number of additional parameters to be estimated. We develop efficient exact and approximate inference schemes, which are much more efficient than competing sampling-based schemes and therefore provide us with a practical toolkit for model-based inference.

Availability: The software and data for recreating all the experiments in this paper is available in MATLAB from

Contact: Neil Lawrence

Ambiguity Modeling in Latent Spaces

Sat, 09 Aug 2008 00:00:00 +0000
We are interested in the situation where we have two or more representations of an underlying phenomenon. In particular we are interested in the scenario where the representations are complementary. This implies that a single individual representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, explaining representation specific variance, separately model the variance in a representation ambiguous to the other. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas; we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.

Topologically-Constrained Latent Variable Models

Sat, 05 Jul 2008 00:00:00 +0000
In dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured by a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set enabling us to learn transitions between motion styles even though such transitions are not present in the data.

Gaussian Process Latent Variable Models For Human Pose Estimation

Tue, 01 Jan 2008 00:00:00 +0000
We describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) \[1\] encapsulating both pose and silhouette features. Our method is generative; this allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper, the suggested model is easily extended to multiple observation spaces without constraints on type.

Variational Optimisation by Marginal Matching

Fri, 07 Dec 2007 00:00:00 +0000

Model-driven detection of Clean Speech Patches in Noise

Wed, 01 Aug 2007 00:00:00 +0000
Listeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.

Learning for Larger Datasets with the Gaussian Process Latent Variable Model

Sun, 11 Mar 2007 00:00:00 +0000
In this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. Each approach is then implemented on a well known benchmark data set and compared with earlier attempts to sparsify the model.

Modelling transcriptional regulation using Gaussian Processes

Mon, 01 Jan 2007 00:00:00 +0000
Modelling the dynamics of transcriptional processes in the cell requires the knowledge of a number of key biological quantities. While some of them are relatively easy to measure, such as mRNA decay rates and mRNA abundance levels, it is still very hard to measure the active concentration levels of the transcription factor proteins that drive the process and the sensitivity of target genes to these concentrations. In this paper we show how these quantities for a given transcription factor can be inferred from gene expression levels of a set of known target genes. We treat the protein concentration as a latent function with a Gaussian Process prior, and include the sensitivities, mRNA decay rates and baseline expression levels as hyperparameters. We apply this procedure to a human leukemia dataset, focusing on the tumour repressor p53 and obtaining results in good accordance with recent biological studies.
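
The quantities named above can be related through a linear differential equation. The following is a sketch in notation the abstract itself does not define, assuming a first-order model for the mRNA abundance $x_j(t)$ of target gene $j$ driven by the transcription factor activity $f(t)$, with basal rate $B_j$, sensitivity $S_j$ and decay rate $D_j$:

```latex
\frac{\mathrm{d}x_j(t)}{\mathrm{d}t} = B_j + S_j f(t) - D_j x_j(t),
\qquad
x_j(t) = \frac{B_j}{D_j} + S_j \int_0^t e^{-D_j (t-u)} f(u)\,\mathrm{d}u ,
```

where the solution on the right omits the initial transient. Because $x_j(t)$ is a linear functional of $f$, a Gaussian process prior on $f$ would imply a joint Gaussian process over $f$ and all $x_j$ under this assumed model, which is what would allow $B_j$, $S_j$ and $D_j$ to be treated as hyperparameters of the covariance.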

Hierarchical Gaussian Process Latent Variable Models

Mon, 01 Jan 2007 00:00:00 +0000
The Gaussian process latent variable model (GP-LVM) is a powerful approach for probabilistic modelling of high dimensional data through dimensional reduction. In this paper we extend the GP-LVM through hierarchies. A hierarchical model (such as a tree) allows us to express conditional independencies in the data as well as the manifold structure. We first introduce Gaussian process hierarchies through a simple dynamical model; we then extend the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.

WiFi-SLAM Using Gaussian Process Latent Variable Models

Mon, 01 Jan 2007 00:00:00 +0000
WiFi localization, the task of determining the physical location of a mobile device from wireless signal strengths, has been shown to be an accurate method of indoor and outdoor localization and a powerful building block for location-aware applications. However, most localization techniques require a training set of signal strength readings labeled against a ground truth location map, which is prohibitive to collect and maintain as maps grow large. In this paper we propose a novel technique for solving the WiFi SLAM problem using the Gaussian Process Latent Variable Model (GP-LVM) to determine the latent-space locations of unlabeled signal strength data. We show how GP-LVM, in combination with an appropriate motion dynamics model, can be used to reconstruct a topological connectivity graph from a signal strength sequence which, in combination with the learned Gaussian Process signal strength model, can be used to perform efficient localization.

Gaussian Process Latent Variable Models for Fault Detection

Mon, 01 Jan 2007 00:00:00 +0000
The Gaussian process latent variable model (GPLVM) is a novel unsupervised approach to nonlinear low dimensional embedding proposed by Lawrence (2005). This paper presents the development of a framework for the implementation of the GPLVM for fault detection. A series of experiments have been carried out comparing and combining the GPLVM with the conventional and widely used linear dimension reduction technique of principal component analysis (PCA). The inclusion of the GPLVM for visualisation and data analysis led to a considerable improvement in the classification results.

Local Distance Preservation in the GP-LVM through Back Constraints

Sun, 25 Jun 2006 00:00:00 +0000
The Gaussian process latent variable model (GP-LVM) is a generative approach to non-linear low dimensional embedding that provides a smooth probabilistic mapping from latent to data space. It is also a non-linear generalization of probabilistic PCA (PPCA) (Tipping and Bishop, 1999). While most approaches to non-linear dimensionality reduction focus on preserving local distances in data space, the GP-LVM focusses on exactly the opposite. Being a smooth mapping from latent to data space, it focusses on keeping things apart in latent space that are far apart in data space. In this paper we first provide an overview of dimensionality reduction techniques, placing the emphasis on the kind of distance relation preserved. We then show how the GP-LVM can be generalized, through back constraints, to additionally preserve local distances. We give illustrative experiments on common data sets.

Missing Data in Kernel PCA

Mon, 19 Jun 2006 00:00:00 +0000
Kernel Principal Component Analysis (KPCA) is a widely used technique for visualisation and feature extraction. Despite its success and flexibility, the lack of a probabilistic interpretation means that some problems, such as handling missing or corrupted data, are very hard to deal with. In this paper we exploit the probabilistic interpretation of linear PCA together with recent results on latent variable models in Gaussian Processes in order to introduce an objective function for KPCA. This in turn allows a principled approach to the missing data problem. Furthermore, this new approach can be extended to reconstruct corrupted test data using fixed kernel feature extractors. The experimental results show strong improvements over widely used heuristics.

Large Scale Learning with the Gaussian Process Latent Variable Model

Fri, 17 Feb 2006 00:00:00 +0000
In this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. We briefly consider a GPR toy problem to highlight the strengths and weaknesses of the different approaches before studying the performance of these techniques on a benchmark visualisation data set.

Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis

Wed, 01 Feb 2006 00:00:00 +0000
In this paper we consider a novel Bayesian interpretation of Fisher’s discriminant analysis. We relate Rayleigh’s coefficient to a noise model that minimizes a cost based on the most probable class centres and that abandons the ‘regression to the labels’ assumption used by other algorithms. This yields a direction of discrimination equivalent to Fisher’s discriminant. We use Bayes’ rule to infer the posterior distribution for the direction of discrimination and in this process, priors and constraining distributions are incorporated to reach the desired result. Going further, with the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher’s discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through the optimisation of the marginal log-likelihood of the data. An added bonus of the new formulation is that it enables us to link the regularisation coefficient with the generalisation error.

The Gaussian Process Latent Variable Model

Fri, 27 Jan 2006 00:00:00 +0000
The Gaussian process latent variable model (GP-LVM) is a recently proposed probabilistic approach to obtaining a reduced dimension representation of a data set. In this tutorial we motivate and describe the GP-LVM, giving reviews of the model itself and some of the concepts behind it.

A Probabilistic Model to Integrate Chip and Microarray Data

Tue, 10 Jan 2006 00:00:00 +0000

Identifying submodules of cellular regulatory networks

Sun, 01 Jan 2006 00:00:00 +0000
Recent high throughput techniques in molecular biology have brought about the possibility of directly identifying the architecture of regulatory networks on a genome-wide scale. However, the computational task of estimating fine-grained models on a genome-wide scale is daunting. Therefore, it is of great importance to be able to reliably identify submodules of the network that can be effectively modelled as independent subunits. In this paper we present a procedure to obtain submodules of a cellular network by using information from gene-expression measurements. We integrate network architecture data with genome-wide gene expression measurements in order to determine which regulatory relations are actually confirmed by the expression data. We then use this information to obtain non-trivial submodules of the regulatory network using two distinct algorithms, a naive exhaustive algorithm and a spectral algorithm based on the eigendecomposition of an affinity matrix. We test our method on two yeast biological data sets, using regulatory information obtained from chromatin immunoprecipitation.

Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities

Sun, 01 Jan 2006 00:00:00 +0000
**Motivation**: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques such as Chromatin Immunoprecipitation provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data.

**Results**: We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals to our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast data sets in which the network structure has previously been obtained using Chromatin Immunoprecipitation data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.

**Availability**: MATLAB code is available from .

Propagating Uncertainty in Microarray Data Analysis

Sun, 01 Jan 2006 00:00:00 +0000
Microarray technology is associated with many sources of experimental uncertainty. In this review we discuss a number of approaches for dealing with this uncertainty in the processing of data from microarray experiments. We focus here on the analysis of high-density oligonucleotide arrays, such as the popular Affymetrix GeneChip® array, which contain multiple probes for each target. This set of probes can be used to determine an estimate for the target concentration and can also be used to determine the experimental uncertainty associated with this measurement. This measurement uncertainty can then be propagated through the downstream analysis using probabilistic methods. We give examples showing how these credibility intervals can be used to help identify differential expression, to combine information from replicated experiments and to improve the performance of principal component analysis.

Probe-level Measurement Error Improves Accuracy in Detecting Differential Gene Expression

Sun, 01 Jan 2006 00:00:00 +0000
**Motivation:** Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level variance provides useful information which can help in the identification of differentially expressed genes.

**Results:** We propose a Bayesian method to include probe-level variances into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in data set and a mouse time-course data set. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression.

**Availability:** The methods described in this paper have been implemented in an R package *pplr* that is currently available from .

**Contact:** Magnus Rattray
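
To make the PPLR statistic concrete, here is a minimal sketch. It assumes that the posterior for the mean log-expression in each condition has already been summarised as an independent Gaussian; the paper's own variational model is not reproduced, and the function name and arguments are hypothetical.

```python
import math


def pplr(mean_a, var_a, mean_b, var_b):
    """Probability that the log-ratio (condition a minus condition b) is positive.

    Assumption for illustration only: each condition's mean log-expression has
    an independent Gaussian posterior with the given mean and variance.
    """
    # The difference of two independent Gaussians is Gaussian,
    # with variances adding.
    diff_sd = math.sqrt(var_a + var_b)
    z = (mean_a - mean_b) / diff_sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# The same difference in means is less convincing when the
# measurement uncertainty is large.
confident = pplr(6.0, 0.01, 5.0, 0.01)
uncertain = pplr(6.0, 4.0, 5.0, 4.0)
```

Under these assumptions a value near 1 suggests up-regulation, a value near 0 suggests down-regulation, and a value near 0.5 indicates no evidence either way.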

Gaussian Processes and the Null-Category Noise Model

Sun, 01 Jan 2006 00:00:00 +0000
With Gaussian process classifiers (GPC) we aim to predict the posterior probability of the class labels given an input data point, $p(y_i|x_i)$. In general we find that this posterior distribution is unaffected by unlabeled data points during learning. Support vector machines are strongly related to GPCs, but one notable difference is that the decision boundary in an SVM can be influenced by unlabeled data. The source of this discrepancy is the SVM’s margin: a characteristic which is not shared with the GPC. The presence of the margin allows the support vector machine to seek low data density regions for the decision boundary, effectively allowing it to incorporate the cluster assumption (see Chapter 6). In this chapter we present the *null category noise model*, a probabilistic equivalent of the margin. By combining this noise model with a GPC we are able to incorporate the cluster assumption without explicitly modeling the input data density distributions and without a special choice of kernel.
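
The idea of a probabilistic margin can be illustrated with a short sketch. It assumes an ordered categorical (probit) likelihood with three categories, in which a "null" category of a chosen width sits between the two classes and unlabeled points are assumed never to fall in it. The exact parameterisation, the width value and all function names here are assumptions for illustration and are not taken from the chapter.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ncnm_likelihood(f, width=1.0):
    """Category probabilities given a latent function value f.

    Assumed form: the null category (label 0) occupies a band of the
    given width centred on f = 0, between class -1 and class +1.
    """
    half = width / 2.0
    p_neg = normal_cdf(-(f + half))
    p_pos = normal_cdf(f - half)
    p_null = normal_cdf(f + half) - normal_cdf(f - half)
    return {-1: p_neg, 0: p_null, 1: p_pos}


def unlabeled_likelihood(f, width=1.0):
    """Likelihood of an unlabeled point, assumed never to be in the null category."""
    probs = ncnm_likelihood(f, width)
    return probs[-1] + probs[1]


near_boundary = unlabeled_likelihood(0.0)
far_from_boundary = unlabeled_likelihood(2.0)
```

In this sketch an unlabeled point is least likely when its latent value is close to zero, so fitting the model discourages placing the decision boundary where unlabeled data are dense, which mirrors the role the margin plays in an SVM.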

Fast Variational Inference for Gaussian Process Models through KL-Correction

Sun, 01 Jan 2006 00:00:00 +0000
Variational inference is a flexible approach to solving problems of intractability in Bayesian models. Unfortunately the convergence of variational methods is often slow. We review a recently suggested variational approach for approximate inference in Gaussian process (GP) models and show how convergence may be dramatically improved through the use of a positive correction term to the standard variational bound. We refer to the modified bound as a KL-corrected bound. The KL-corrected bound is a lower bound on the true likelihood, but an upper bound on the original variational bound. Timing comparisons between optimisation of the two bounds show that optimisation of the new bound consistently improves the speed of convergence.

Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

Tue, 01 Nov 2005 00:00:00 +0000
Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.
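
For orientation, the objective can be sketched as follows, in notation the abstract itself does not define. Assume $Y \in \mathbb{R}^{N \times D}$ is the centred data, $X \in \mathbb{R}^{N \times q}$ holds the latent positions and $\beta$ is a noise precision. Marginalising the linear mapping under a Gaussian prior gives a likelihood that factorises over the $D$ data dimensions:

```latex
\log p(Y \mid X, \beta)
  = -\frac{DN}{2}\log 2\pi
    - \frac{D}{2}\log\lvert K \rvert
    - \frac{1}{2}\operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right),
\qquad
K = X X^{\top} + \beta^{-1} I .
```

Replacing the linear kernel $X X^{\top}$ with a non-linear covariance function evaluated at the rows of $X$ gives the non-linear model, and the latent positions are then found by maximising this expression with respect to $X$ and the kernel parameters.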

A Hybrid MaxEnt/HMM Based ASR System

Sun, 04 Sep 2005 00:00:00 +0000
The aim of this work is to develop a practical framework, which extends the classical Hidden Markov Model (HMM) for continuous speech recognition based on the Maximum Entropy (MaxEnt) principle. The MaxEnt models can estimate the posterior probabilities directly as with Hybrid NN/HMM connectionist speech recognition systems. In particular, a new acoustic modelling based on discriminative MaxEnt models is formulated and is being developed to replace the generative Gaussian Mixture Models (GMM) commonly used to model acoustic variability. Initial experimental results using the TIMIT phone task are reported.

A Tractable Probabilistic Model for Affymetrix Probe-level Analysis across Multiple Chips

Thu, 14 Jul 2005 00:00:00 +0000
**Motivation:** Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarisation methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large data sets. An alternative computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding with a latent variable to capture variations in probe affinity. Although promising, the main limitations of this model are that it does not use information from multiple chips and that it does not account for specific binding to the mismatch (MM) probes.

**Results:** We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark data sets and a real time-course data set, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions.

**Availability:** Both mgMOS and the new model multi-mgMOS have been implemented in an R package that is currently available from .

Variational inference for Student-$t$ models: Robust Bayesian interpolation and generalised component analysis

Sat, 01 Jan 2005 00:00:00 +0000
We demonstrate how a variational approximation scheme enables effective inference of key parameters in probabilistic signal models which employ the Student-t distribution. Using the two scenarios of robust interpolation and independent component analysis (ICA) as examples, we illustrate the key feature of the approach: that the form of the noise distribution in the interpolation case, and the source distributions in the ICA case, can be inferred from the data concurrently with all other model parameters.

Automatic Determination of the Number of Clusters Using Spectral Algorithms

Sat, 01 Jan 2005 00:00:00 +0000

Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data

Sat, 01 Jan 2005 00:00:00 +0000
**Motivation:** Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. **Results:** We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to ‘denoise’ a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained. **Availability:** The software used in the paper is available from . The microarray data are deposited in the NCBI database.

Semi-supervised Learning via Gaussian Processes

Sat, 01 Jan 2005 00:00:00 +0000
We present a probabilistic approach to learning a Gaussian Process classifier in the presence of unlabeled data. Our approach involves a “null category noise model” (NCNM) inspired by ordered categorical noise models. The noise model reflects an assumption that the data density is lower between the class-conditional densities. We illustrate our approach on a toy problem and present comparative results for the semi-supervised classification of handwritten digits.

MOCAP Toolbox for MATLAB

Sat, 01 Jan 2005 00:00:00 +0000

Extensions of the Informative Vector Machine

Sat, 01 Jan 2005 00:00:00 +0000
The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.

Variational Inference in Gaussian Processes via Probabilistic Point Assimilation

Sat, 01 Jan 2005 00:00:00 +0000
We introduce a novel variational approach for approximate inference in Gaussian process (GP) models. The key advantages of our approach are the ease with which different noise models can be incorporated and improved speed of convergence. We refer to the algorithm as probabilistic point assimilation (PPA). We introduce the algorithm firstly using the ‘weight space’ view and then through its Gaussian process formulation. We illustrate the approach on several benchmark data sets.

Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

Fri, 13 Aug 2004 00:00:00 +0000
Summarising a high dimensional data-set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). We develop a practical algorithm for GPLVMs which allow for non-linear mappings from the embedded space giving a non-linear probabilistic version of PCA. We develop the new algorithm to provide a principled approach to handling discrete valued data and missing attributes. We demonstrate the algorithm on a range of real-world and artificially generated data-sets and finally, through analysis of the GPLVM objective function, we relate the algorithm to popular spectral techniques such as kernel PCA and multidimensional scaling.

Reducing the Variability in cDNA Microarray Image Processing by Bayesian Inference

Thu, 22 Jan 2004 00:00:00 +0000
**Motivation:** Gene expression levels are obtained from microarray experiments through the extraction of pixel intensities from a scanned image of the slide. It is widely acknowledged that variabilities can occur in expression levels extracted from the same images by different users with the same software packages. These inconsistencies arise due to differences in the refinement of the placement of the microarray ‘grids’. We introduce a novel automated approach to the refinement of grid placements that is based upon the use of Bayesian inference for determining the size, shape and positioning of the microarray ‘spots’, capturing uncertainty that can be passed to downstream analysis. **Results:** Our experiments demonstrate that variability between users can be significantly reduced using the approach. The automated nature of the approach also saves hours of researchers’ time normally spent in refining the grid placement. **Availability:** A MATLAB implementation of the algorithm and an image of the slide used in our experiments, as well as the code necessary to recreate them are available for non-commercial use from .

Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant Analysis

Thu, 01 Jan 2004 00:00:00 +0000
In this paper we consider a Bayesian interpretation of Fisher's discriminant. By relating Rayleigh's coefficient to a likelihood function and through the choice of a suitable prior we use Bayes' rule to infer a posterior distribution over projections. Through the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher's discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through optimisation of the marginalised likelihood of the data.

Matching Kernels through Kullback-Leibler Divergence Minimisation

Thu, 01 Jan 2004 00:00:00 +0000
In this paper we study the general constrained minimisation of Kullback-Leibler (KL) divergences between two zero mean Gaussian distributions. We reduce the problem to an equivalent minimisation involving the eigenvectors of the two kernel matrices, and provide explicit solutions in some cases. We then focus, as an example, on the important case of constraining the approximating matrix to be block diagonal. We prove a stability result on the approximating matrix, and speculate on how these results may be used to give further theoretical foundation to widely used techniques such as spectral clustering.
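
As a rough illustration of the quantity discussed in the abstract above, the following sketch evaluates the standard closed-form KL divergence between two zero-mean Gaussians with covariance (kernel) matrices `K1` and `K2`. The function names, the example matrix, and the choice of KL direction are assumptions made for illustration; the abstract does not specify the paper's implementation, and simply zeroing the off-block entries is not claimed to be the constrained minimiser.

```python
import numpy as np


def kl_zero_mean_gaussians(K1, K2):
    """KL( N(0, K1) || N(0, K2) ) for positive-definite K1, K2 of equal size."""
    K1 = np.asarray(K1, dtype=float)
    K2 = np.asarray(K2, dtype=float)
    n = K1.shape[0]
    # tr(K2^{-1} K1), computed with a linear solve rather than an explicit inverse
    trace_term = np.trace(np.linalg.solve(K2, K1))
    # log-determinants via slogdet for numerical stability
    _, logdet1 = np.linalg.slogdet(K1)
    _, logdet2 = np.linalg.slogdet(K2)
    return 0.5 * (trace_term - n + logdet2 - logdet1)


def block_diagonal_part(K, block_sizes):
    """Hypothetical helper: keep only the diagonal blocks of K, zero the rest."""
    K = np.asarray(K, dtype=float)
    B = np.zeros_like(K)
    start = 0
    for size in block_sizes:
        stop = start + size
        B[start:stop, start:stop] = K[start:stop, start:stop]
        start = stop
    return B


# Illustrative 3x3 kernel matrix (made up), approximated by blocks of size 2 and 1
K = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.2],
              [0.1, 0.2, 1.5]])
B = block_diagonal_part(K, [2, 1])
divergence = kl_zero_mean_gaussians(K, B)
```

The divergence is zero when the two matrices coincide and grows as the discarded off-block entries become larger relative to the diagonal blocks.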

Learning to Learn with the Informative Vector Machine

Thu, 01 Jan 2004 00:00:00 +0000
This paper describes an efficient method for learning the parameters of a Gaussian process (GP). The parameters are learned from multiple tasks which are assumed to have been drawn independently from the same GP prior. An efficient algorithm is obtained by extending the informative vector machine (IVM) algorithm to handle the multi-task learning case. The multi-task IVM (MT-IVM) saves computation by greedily selecting the most informative examples from the separate tasks. The MT-IVM is also shown to be more efficient than sub-sampling on an artificial data-set and more effective than the traditional IVM in a speaker dependent phoneme recognition task.

The Informative Vector Machine: A Practical Probabilistic Alternative to the Support Vector Machine

Thu, 01 Jan 2004 00:00:00 +0000
We present a practical probabilistic alternative to the popular support vector machine (SVM). The algorithm is an approximation to a Gaussian process, and is probabilistic in the sense that it maintains the process variance that is implied by the use of a kernel function, which the SVM discards. We show that these variances may be tracked and made use of in the selection of an active set which gives a sparse representation for the model. For an active set size of $d$ our algorithm exhibits $O(d^{2}N)$ computational complexity and $O(dN)$ storage requirements. It has already been shown that the approach is competitive with the SVM in terms of performance and running time; here we give more details of the approach and demonstrate that kernel parameters may also be learned in a practical and effective manner.

Gaussian Process Models for Visualisation of High Dimensional Data

Thu, 01 Jan 2004 00:00:00 +0000
In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally our non-linear algorithm can be *further* kernelised leading to 'twin kernel PCA' in which a *mapping between feature spaces* occurs.

Acoustic Space Dimensionality Selection and Combination using the Maximum Entropy Principle

Thu, 01 Jan 2004 00:00:00 +0000
In this paper we propose a discriminative approach to acoustic space dimensionality selection based on maximum entropy modelling. We form a set of constraints by composing the acoustic space with the space of phone classes, and use a continuous feature formulation of maximum entropy modelling to select an optimal feature set. The suggested approach has two steps: (1) the selection of the best acoustic space that efficiently and economically represents the acoustic data and its variability; (2) the combination of selected acoustic features in the maximum entropy framework to estimate the posterior probabilities over the phonetic labels given the acoustic input. Specific contributions of this paper include a parameter estimation algorithm (generalized improved iterative scaling) that enables the use of negative features, the parameterization of constraint functions using Gaussian mixture models, and experimental results using the TIMIT database.

A Probabilistic Model for the Extraction of Expression Levels from Oligonucleotide Arrays

Mon, 01 Dec 2003 00:00:00 +0000
In this work we present a probabilistic model to estimate summaries of Affymetrix GeneChip probe level data. Comparisons with two different models were made both on a publicly available dataset and on a study performed in our laboratory, showing that our model performs better for consistency of fold change.

Bayesian Processing of Microarray Images

Wed, 17 Sep 2003 00:00:00 +0000
Gene expression measurements quantify the level of mRNA produced from each gene. Two principal methods exist for producing slides for extracting these levels: photolithography and spotted arrays. One difficulty with the spotted array format is determining the size and location of the spots on the array. In this paper we present a Bayesian approach to processing images produced by these arrays that seeks posterior distributions over the size and positions of the spots. This enables us to estimate expression ratios and their variances. Exact inference for the model we specify is intractable; we develop an approximate inference technique which combines importance sampling with variational inference. Our technique has already been shown to be more consistent than both manual processing and another automated technique @Lawrence:variability03. Here we present large-scale results for twenty-four microarray slides each representing 5760 genes and show the dramatic effects of incorporating variance in our downstream analysis. Software based on this algorithm is available for academic use.

Generalised Component Analysis

Fri, 23 May 2003 00:00:00 +0000
Principal component analysis is a well known approach for determining the principal sub-space of a data-set. Independent component analysis is a widely utilised technique for recovering the linearly embedded independent components of a data-set. In this paper we develop an algorithm that, for super-Gaussian sources, extracts the direction and number of independent components of a data-set and determines the principal sub-space of the remaining components. This is achieved through the use of a latent variable model. We refer to the approach as Generalised Component Analysis and demonstrate its ability to extract both independent and principal components, as well as to determine the number of independent components, on toy and real world data-sets.

Variational Inference for Visual Tracking

Wed, 01 Jan 2003 00:00:00 +0000
The likelihood models used in probabilistic visual tracking applications are often complex non-linear and/or non-Gaussian functions, leading to analytically intractable inference. Solutions then require numerical approximation techniques, of which the particle filter is a popular choice. Particle filters, however, degrade in performance as the dimensionality of the state space increases and the support of the likelihood decreases. As an alternative to particle filters this paper introduces a variational approximation to the tracking recursion. The variational inference is intractable in itself, and is combined with an efficient importance sampling procedure to obtain the required estimates. The algorithm is shown to compare favourably with particle filtering techniques on a synthetic example and two real tracking problems. The first involves the tracking of a designated object in a video sequence based on its colour properties, whereas the second involves contour extraction in a single image.

A Variational Approach to Robust Bayesian Interpolation

Wed, 01 Jan 2003 00:00:00 +0000
This paper details a robust Bayesian interpolation procedure for linear-in-the-parameter models. Robustness is achieved via a Student-$t$ noise model, defined hierarchically in terms of an inverse-Gamma prior distribution over individual Gaussian observation variances. Variational techniques are exploited to update this prior in light of the data, while also inferring all other model variables. The key to this approach is flexibility; it can infer Gaussian noise where appropriate but can adapt to accommodate heavier-tailed distributions in the presence of outliers.
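
The hierarchical construction mentioned in the abstract above can be written out as a short worked equation. The symbols ($\epsilon_n$ for the noise on observation $n$, $\sigma_n^2$ for its variance, and shape and scale hyperparameters $a$ and $b$) are assumptions chosen for illustration; the paper's own notation and parameterisation may differ. With a Gaussian likelihood and an inverse-Gamma prior on each variance,

$$
p(\epsilon_n \mid \sigma_n^2) = \mathcal{N}(\epsilon_n \mid 0, \sigma_n^2), \qquad
p(\sigma_n^2) = \frac{b^{a}}{\Gamma(a)} \left(\sigma_n^2\right)^{-a-1} \exp\!\left(-\frac{b}{\sigma_n^2}\right),
$$

marginalising the variance gives a Student-$t$ density with $\nu = 2a$ degrees of freedom and squared scale $b/a$:

$$
p(\epsilon_n) = \int_0^\infty \mathcal{N}(\epsilon_n \mid 0, \sigma_n^2)\, p(\sigma_n^2)\, \mathrm{d}\sigma_n^2
= \frac{\Gamma\!\left(a + \tfrac{1}{2}\right)}{\Gamma(a)\sqrt{2\pi b}} \left(1 + \frac{\epsilon_n^2}{2b}\right)^{-\left(a + \frac{1}{2}\right)}.
$$

As $a \to \infty$ with $b/a$ held fixed this tends to a Gaussian, while small $a$ gives heavy tails, which matches the flexibility the abstract describes.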

Fast Forward Selection to Speed Up Sparse Gaussian Process Regression

Wed, 01 Jan 2003 00:00:00 +0000
We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the “support” patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically. We demonstrate the model selection capabilities of the algorithm in a range of experiments. In line with the development of our method, we present a simple view on sparse approximations for GP models and their underlying assumptions and show relations to other methods.

Fast Sparse Gaussian Process Methods: The Informative Vector Machine

Wed, 01 Jan 2003 00:00:00 +0000
We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretical principles, previously suggested for active learning. In contrast to most previous work on sparse GPs, our goal is not only to learn sparse predictors (which can be evaluated in $O(d)$ rather than $O(n)$, $d<

Variational Inference Guide

Wed, 18 Dec 2002 00:00:00 +0000
This report is a brief introduction to variational inference for Bayesian models from the perspective of the Expectation Maximisation (EM) algorithm @Dempster:EM77. We start with an overview of the EM algorithm from the perspective of variational inference and then we show how approximate inference may also be performed. We discuss briefly when variational inference may be used and finally we mention the variational importance sampler as an alternative approach.

Optimising Synchronisation Times for Mobile Devices

Tue, 01 Jan 2002 00:00:00 +0000
With the increasing number of users of mobile computing devices (e.g. personal digital assistants) and the advent of third generation mobile phones, wireless communications are becoming increasingly important. Many applications rely on the device maintaining a *replica* of a data-structure which is stored on a server, for example news databases, calendars and e-mail. In this paper we explore the question of the optimal strategy for synchronising such replicas. We utilise probabilistic models to represent how the data-structures evolve and to model user behaviour. We then formulate objective functions which can be minimised with respect to the synchronisation timings. We demonstrate, using two real world data-sets, that a user can obtain more up-to-date information using our approach.

Sparse Bayesian Learning: The Informative Vector Machine

Tue, 01 Jan 2002 00:00:00 +0000

A Comparison of State-of-the-Art Classification Techniques with Application to Cytogenetics

Sun, 01 Apr 2001 00:00:00 +0000
Several state-of-the-art techniques: a neural network, Bayesian neural network, support vector machine and naive Bayesian classifier are experimentally evaluated in discriminating fluorescence in-situ hybridization (FISH) signals. Highly accurate classification of signals from real data and artefacts of two cytogenetic probes (colours) is required for detecting abnormalities in the data. More than 3,100 FISH signals are classified by the techniques into colour and as real or artefact with accuracies of around 98% and 88%, respectively. The results of the comparison also show a trade-off between simplicity represented by the naive Bayesian classifier and high classification performance represented by the other techniques.

Probabilistic Modelling of Replica Divergence

Mon, 01 Jan 2001 00:00:00 +0000
It is common in distributed systems to replicate data. In many cases this data evolves in a consistent fashion and this evolution can be modelled. A *probabilistic model* of the evolution allows us to estimate the divergence of the replicas and can be used by the application to alter its behaviour, for example to control synchronisation times, to determine the propagation of writes, and to convey to the user information about how much the data may have evolved. In this paper, we describe how the evolution of the data may be modelled and outline how the probabilistic model may be utilised in various applications, concentrating on a news database example.

The Structure of Neural Network Posteriors

Mon, 01 Jan 2001 00:00:00 +0000
Exact inference in Bayesian neural networks is analytically intractable and as a result approximate approaches such as the evidence procedure, Monte-Carlo sampling and variational inference have been proposed. In this paper we explore the structure of the posterior distributions in such a model through a new approximating distribution based on *mixtures* of Gaussian distributions and show how it may be implemented.

Node Relevance Determination

Mon, 01 Jan 2001 00:00:00 +0000
Hierarchical Bayesian inference in parameterised models offers an approach for controlling complexity. In this paper we utilise a novel prior for the learning of a model’s structure. We call the prior *node relevance determination*. It is applicable in a range of models including sigmoid belief networks and Boltzmann machines. We demonstrate how the approach may be applied to determine structure in a multi-layer perceptron.

Estimating a Kernel Fisher Discriminant in the Presence of Label Noise

Mon, 01 Jan 2001 00:00:00 +0000
Data noise is present in many machine learning problem domains; some of these are well studied but others have received less attention. In this paper we propose an algorithm for constructing a kernel Fisher discriminant (KFD) from training examples with *noisy labels*. The approach allows us to associate with each example a probability of the label being flipped. We utilise an expectation maximization (EM) algorithm for updating the probabilities. The E-step uses class conditional probabilities estimated as a by-product of the KFD algorithm. The M-step updates the flip probabilities and determines the parameters of the discriminant. We have applied the approach to two real-world data-sets. The results show the feasibility of the approach.

Variational Learning for Multi-layer networks of Linear Threshold Units

Mon, 01 Jan 2001 00:00:00 +0000
Linear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well known datasets.

A Sparse Bayesian Compression Scheme — The Informative Vector Machine

Mon, 01 Jan 2001 00:00:00 +0000
Kernel based learning algorithms allow the mapping of a data-set into an infinite dimensional feature space in which a classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods do suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inverse $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner we are interested in estimating the amount of information each data-point contains. In the framework presented here we make use of the Bayesian methodology to determine how much information is gained from each data-point.

Variational Learning for Multi-layer networks of Linear Threshold Units

Sat, 05 Feb 2000 00:00:00 +0000
Linear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well known datasets.

Variational Bayesian Independent Component Analysis

Sat, 01 Jan 2000 00:00:00 +0000
Blind separation of signals through the info-max algorithm may be viewed as maximum likelihood learning in a latent variable model. In this paper we present an alternative approach to maximum likelihood learning in these models, namely Bayesian inference. It has already been shown how Bayesian inference can be applied to determine latent dimensionality in principal component analysis models @Bishop:bayesPCA98. In this paper we derive a similar approach for removing unnecessary source dimensions in an independent component analysis model. We present results on a toy data-set and on some artificially mixed images.

A Variational Bayesian Committee of Neural Networks

Fri, 17 Sep 1999 00:00:00 +0000
Exact inference in Bayesian neural networks is analytically intractable and as a result approximate approaches such as the evidence procedure, Monte-Carlo sampling and variational inference have been proposed. In this paper we present a general overview of the Bayesian approach with a particular emphasis on the variational procedure. We then present a new approximating distribution based on *mixtures* of Gaussian distributions and show how it may be implemented. We present results on a simple toy problem and on two real world data-sets.

Mixture Representations for Inference and Learning in Boltzmann Machines

Thu, 01 Jan 1998 00:00:00 +0000
Boltzmann machines are undirected graphical models with two-state stochastic variables, in which the logarithms of the clique potentials are quadratic functions of the node states. They have been widely studied in the neural computing literature, although their practical applicability has been limited by the difficulty of finding an effective learning algorithm. One well-established approach, known as mean field theory, represents the stochastic distribution using a factorized approximation. However, the corresponding learning algorithm often fails to find a good solution. We conjecture that this is due to the implicit uni-modality of the mean field approximation which is therefore unable to capture multi-modality in the true distribution. In this paper we use variational methods to approximate the stochastic distribution using multi-modal *mixtures* of factorized distributions. We present results for both inference and learning to demonstrate the effectiveness of this approach.

Markovian inference in belief networks

Thu, 01 Jan 1998 00:00:00 +0000
Bayesian belief networks can represent the complicated probabilistic processes that form natural sensory inputs. Once the parameters of the network have been learned, nonlinear inferences about the input can be made by computing the posterior distribution over the hidden units (e.g., depth in stereo vision) given the input. Computing the posterior distribution exactly is not practical in richly-connected networks, but it turns out that by using a variational (a.k.a. mean field) method, it is easy to find a product-form distribution that approximates the true posterior distribution. This approximation assumes that the hidden variables are independent given the current input. In this paper, we explore a more powerful variational technique that models the posterior distribution using a Markov chain. We compare this method with inference using mean fields and mixtures of mean fields in randomly generated networks.

Approximating Posterior Distributions in Belief Networks using Mixtures

Thu, 01 Jan 1998 00:00:00 +0000
Exact inference in densely connected Bayesian networks is computationally intractable, and so there is considerable interest in developing effective approximation schemes. One approach which has been adopted is to bound the log likelihood using a mean-field approximating distribution. While this leads to a tractable algorithm, the mean field distribution is assumed to be factorial and hence unimodal. In this paper we demonstrate the feasibility of using a richer class of approximating distributions based on *mixtures* of mean field distributions. We derive an efficient algorithm for updating the mixture parameters and apply it to the problem of learning in sigmoid belief networks. Our results demonstrate a systematic improvement over simple mean field theory as the number of mixture components is increased.