Neil Lawrence's Publications

Publications from Neil Lawrence.
http://inverseprobability.com/publications/
Mon, 24 Aug 2020 23:18:01 +0000

Structured Variationally Auto-encoded Optimization

We tackle the problem of optimizing a black-box objective function
defined over a highly-structured input space. This problem is
ubiquitous in science and engineering. In machine learning,
inferring the structure of a neural network or the Automatic
Statistician (AS), where the optimal kernel combination for a
Gaussian process is selected, are two important examples. We use the
AS as a case study to describe our approach, which can be easily
generalized to other domains. We propose a Structure Generating
Variational Auto-encoder (SG-VAE) to embed the original space of
kernel combinations into some low-dimensional continuous manifold
where Bayesian optimization (BO) ideas are used. This is possible
when structural knowledge of the problem is available, which can be
given via a simulator or any other form of generating potentially
good solutions. The right exploration-exploitation balance is
imposed by propagating into the search the uncertainty of the latent
space of the SG-VAE, which is computed using variational
inference. The key aspect of our approach is that the SG-VAE can be
used to bias the search towards relevant regions, making it suitable
for transfer learning tasks. Several experiments in various
application domains are used to illustrate the utility and
generality of the approach described in this work.
Tue, 03 Jul 2018 00:00:00 +0000
http://inverseprobability.com/publications/lu18c.html
http://inverseprobability.com/publications/lu18c.html

Differentially Private Regression with Gaussian Processes

A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Here we combine the provable privacy guarantees of the differential privacy framework with the flexibility of Gaussian processes (GPs). We propose a method using GPs to provide differentially private (DP) regression. We then improve this method by crafting the DP noise covariance structure to efficiently protect the training data, while minimising the scale of the added noise. We find that this cloaking method achieves the greatest accuracy, while still providing privacy guarantees, and offers practical DP for regression over multi-dimensional inputs. Together these methods provide a starter toolkit for combining differential privacy and GPs.

Sat, 31 Mar 2018 00:00:00 +0000
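As a rough illustration of the simplest output-perturbation baseline that the cloaking method improves upon, the Gaussian mechanism adds i.i.d. noise calibrated to a sensitivity bound. The sketch below is ours, not the paper's API: the function names, the assumption that a global sensitivity is known, and the per-point independent noise are all simplifications of the paper's covariance-structured approach.

```python
import math
import random

def dp_noise_scale(sensitivity, epsilon, delta):
    # Gaussian mechanism: noise scale giving (epsilon, delta)-differential privacy
    # for a query whose outputs change by at most `sensitivity` between datasets.
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def privatise_predictions(means, sensitivity, epsilon=1.0, delta=1e-5, seed=0):
    # Add independent Gaussian noise to each GP posterior mean prediction.
    rng = random.Random(seed)
    sigma = dp_noise_scale(sensitivity, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in means]
```

A smaller epsilon (stronger privacy) forces a larger noise scale; the cloaking method in the paper shapes this noise with a covariance structure instead of adding it independently per point.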
http://inverseprobability.com/publications/smith18a.html
http://inverseprobability.com/publications/smith18a.html

The Emergence of Organizing Structure in Conceptual Representation

Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form—where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach can learn intuitive organizations, including a tree for animals and a ring for the color circle, it assumes a strong inductive bias that considers only these particular forms, and each form is explicitly provided as initial knowledge. Here we introduce a new computational model of how organizing structure can be discovered, utilizing a broad hypothesis space with a preference for sparse connectivity. Given that the inductive bias is more general, the model's initial knowledge shows little qualitative resemblance to some of the discoveries it supports. As a consequence, the model can also learn complex structures for domains that lack intuitive description, as well as predict human property induction judgments without explicit structural forms. By allowing form to emerge from sparsity, our approach clarifies how both the richness and flexibility of human conceptual organization can coexist.

Tue, 09 Jan 2018 00:00:00 +0000
http://inverseprobability.com/publications/the-emergence-of-organizing-structure-in-conceptual-representation.html
http://inverseprobability.com/publications/the-emergence-of-organizing-structure-in-conceptual-representation.html
Lake:emergence18

Efficient Modelling of Latent Information in Supervised Learning Using Gaussian Processes

Often in machine learning, data are collected as a combination of multiple conditions, e.g. the voice recordings of multiple persons, each labelled with an ID. How can we build a model that captures the latent information related to these conditions and generalizes to a new condition with few data? We present a new model, the Latent Variable Multiple Output Gaussian Process (LVMOGP), which allows us to jointly model multiple conditions for regression and to generalize to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP whose computational complexity is as low as that of sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.

Tue, 05 Dec 2017 00:00:00 +0000
http://inverseprobability.com/publications/efficient-modelling-of-latent-information-in-supervised-learning-using-gaussian-processes.html
http://inverseprobability.com/publications/efficient-modelling-of-latent-information-in-supervised-learning-using-gaussian-processes.html
Dai:supervised17

Efficient inference for sparse latent variable models of transcriptional regulation

Sat, 26 Aug 2017 00:00:00 +0000
http://inverseprobability.com/publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html
http://inverseprobability.com/publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html
Dai:sparse18

Preferential Bayesian Optimization

Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimizing black-box functions where direct queries of the objective are expensive. We consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario, called Preferential Bayesian Optimization (PBO), which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend the usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that the way correlations are modeled is the key ingredient in drastically reducing the number of comparisons needed to find the optimum of the latent function of interest.

Mon, 17 Jul 2017 00:00:00 +0000
http://inverseprobability.com/publications/gonzalez17a.html
http://inverseprobability.com/publications/gonzalez17a.html

Living Together: Mind and Machine Intelligence

In this commentary we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to *embodiment factors*. We define embodiment factors as the ratio between an entity's ability to communicate information and its ability to compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequences of such a system, we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.
Wed, 24 May 2017 00:00:00 +0000
http://inverseprobability.com/publications/living-together-mind-and-machine-intelligence.html
http://inverseprobability.com/publications/living-together-mind-and-machine-intelligence.html

Data Readiness Levels

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches.
In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.
Fri, 05 May 2017 00:00:00 +0000
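The three stages of data preparedness are often summarised as bands C (accessibility), B (faithfulness) and A (appropriateness for a task). The sketch below is our own illustrative encoding of that idea, not an artefact of the paper; the band descriptions are paraphrases and the rule-of-thumb checks are hypothetical.

```python
from enum import Enum

class ReadinessBand(Enum):
    # Band labels follow the C/B/A convention; the descriptions are our paraphrase.
    C = "accessibility: data exists and can be loaded"
    B = "faithfulness: data is validated, missing values and noise are understood"
    A = "appropriateness: data is matched to a specific task or question"

def assess(dataset_notes):
    # Toy assessor over a dict of boolean checks (illustrative only).
    if not dataset_notes.get("accessible", False):
        return None  # not even at band C yet
    if not dataset_notes.get("validated", False):
        return ReadinessBand.C
    if not dataset_notes.get("task_matched", False):
        return ReadinessBand.B
    return ReadinessBand.A
```

A common language like this would let a project plan say "reaching band B for this dataset is a deliverable" rather than discovering curation costs mid-project.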
http://inverseprobability.com/publications/data-readiness-levels.html
http://inverseprobability.com/publications/data-readiness-levels.html
Lawrence-readiness16

Manifold Alignment Determination: finding correspondences across different data views

We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data points. The proposed method requires only a few aligned examples, from which it can recover a global alignment through a probabilistic model. The strong, yet flexible, regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefits of the proposed approach.

Thu, 12 Jan 2017 00:00:00 +0000
http://inverseprobability.com/publications/manifold-alignment-determination.html
http://inverseprobability.com/publications/manifold-alignment-determination.html
Damianou:mad16

Topslam: Waddington Landscape Recovery for Single Cell Experiments

We present an approach to estimating the nature of the Waddington (or epigenetic) landscape that underlies a population of individual cells. By exploiting high-resolution single-cell transcription experiments we show that cells can be located on a landscape that reflects their differentiated nature. Our approach makes use of probabilistic non-linear dimensionality reduction that respects the topology of our estimated epigenetic landscape. In simulation studies and analyses of real data we show that the approach, known as Topslam, outperforms previous attempts to understand the differentiation landscape. The novelty of our approach lies in correcting distances *before* extracting ordering information. This gives it an advantage over other methods, which have to correct the extracted time lines by post-processing or with additional data.

Mon, 20 Jun 2016 00:00:00 +0000
http://inverseprobability.com/publications/zwiessele-topslam16.html
http://inverseprobability.com/publications/zwiessele-topslam16.html
Zwiessele:topslam16

Differentially Private Gaussian Processes

A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Differential privacy is a framework which allows algorithms to have provable privacy guarantees. Gaussian processes are a widely used approach for dealing with uncertainty in functions. This paper explores differentially private mechanisms for Gaussian processes. We compare binning and adding noise before regression with adding noise post-regression. For the former we develop a new kernel for use with binned data. For the latter we show that using inducing inputs allows us to reduce the scale of the added perturbation. We find that, for the datasets used, adding noise to a binned dataset has superior accuracy. Together these methods provide a starter toolkit for combining differential privacy and Gaussian processes.

Thu, 02 Jun 2016 00:00:00 +0000
http://inverseprobability.com/publications/smith-dpgp16.html
http://inverseprobability.com/publications/smith-dpgp16.html
Smith:dpgp16

Recurrent Gaussian Processes

We define Recurrent Gaussian Process (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states and distinct inference methods, and can be extended with deep structures. In this context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension whose variational parameters are greatly reduced by reparametrizing them through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results indicate that our RGP model maintains high flexibility while avoiding overfitting and remaining applicable even when larger datasets are not available.

Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/mattos-recurrent16.html
http://inverseprobability.com/publications/mattos-recurrent16.html
Mattos:recurrent16

GLASSES: Relieving The Myopia Of Bayesian Optimisation

We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.

Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/gonzalez-glasses16.html
http://inverseprobability.com/publications/gonzalez-glasses16.html
Gonzalez:glasses16

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.

Sun, 17 Apr 2016 00:00:00 +0000
http://inverseprobability.com/publications/damianou-ibfa16.html
http://inverseprobability.com/publications/damianou-ibfa16.html
Damianou:ibfa16

Detecting periodicities with Gaussian processes

We consider the problem of detecting and quantifying the periodic component of a function given noise-corrupted observations of a limited number of input/output tuples. Our approach is based on Gaussian process regression, which provides a flexible non-parametric framework for modelling periodic data. We introduce a novel decomposition of the covariance function as the sum of periodic and aperiodic kernels. This decomposition allows for the creation of sub-models which capture the periodic nature of the signal and its complement. To quantify the periodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the fitted sub-models. Although the method can be applied to many kernels, we give a special emphasis to the Matérn family, from the expression of the reproducing kernel Hilbert space inner product to the implementation of the associated periodic kernels in a Gaussian process toolkit. The proposed method is illustrated by considering the detection of periodically expressed genes in the Arabidopsis genome.

Wed, 13 Apr 2016 00:00:00 +0000
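The additive decomposition above can be sketched in a few lines. The paper emphasizes periodic kernels built from the Matérn family; for brevity this sketch substitutes the standard periodic kernel and a squared-exponential as the aperiodic complement, purely to illustrate the k = k_periodic + k_aperiodic structure (the function names and parameter defaults are ours).

```python
import math

def k_periodic(x, y, period=1.0, lengthscale=1.0):
    # Standard periodic kernel; a stand-in for the paper's Matern-based
    # periodic construction, which is more involved.
    r = math.sin(math.pi * (x - y) / period)
    return math.exp(-2.0 * r * r / lengthscale**2)

def k_aperiodic(x, y, lengthscale=1.0):
    # Squared-exponential kernel as the aperiodic complement (illustrative).
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def k_total(x, y):
    # The decomposition used in the paper: k = k_periodic + k_aperiodic,
    # so the GP posterior splits into periodic and aperiodic sub-models.
    return k_periodic(x, y) + k_aperiodic(x, y)
```

Because the covariance is a sum, the posterior mean of each sub-model can be read off separately, which is what makes the periodicity ratio computable.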
http://inverseprobability.com/publications/durrande-periodicities16.html
http://inverseprobability.com/publications/durrande-periodicities16.html
Durrande-periodicities16

Chained Gaussian Processes

Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (as in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive: link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models “Chained Gaussian Processes”: the transformation of the GPs to the likelihood parameters will not generally be invertible, which implies that linearisation would only be possible with multiple (localized) links, i.e. a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions.

Fri, 01 Jan 2016 00:00:00 +0000
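A minimal sketch of the chained idea, using a heteroscedastic Gaussian likelihood as the example: two latent function values parameterize the mean and variance, and the exponential keeping the variance positive means no single invertible link exists. The function names are ours and the example likelihood is one of several the framework covers.

```python
import math

def heteroscedastic_likelihood_params(f, g):
    # Chained-GP idea: each likelihood parameter gets its own latent process.
    # Two latent values (f, g) give the mean and variance of a Gaussian
    # likelihood; exp() keeps the variance positive, so the mapping from
    # (f, g) to the parameters is not a single invertible link.
    mean = f
    variance = math.exp(g)
    return mean, variance

def log_likelihood(y, f, g):
    # Gaussian log-density of an observation y under the chained parameters.
    mean, variance = heteroscedastic_likelihood_params(f, g)
    return -0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)
```

Because the likelihood factorizes across data points, an approximate inference scheme only needs expectations of such per-point log-densities under the two GP posteriors.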
http://inverseprobability.com/publications/saul-chained16.html
http://inverseprobability.com/publications/saul-chained16.html
Saul:chained16

Batch Bayesian Optimization via Local Penalization

The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These could either be computational or physical facets of the process being optimized. Batch methods, however, require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. We investigate this issue and propose a highly effective heuristic, based on an estimate of the function's Lipschitz constant, that captures the most important aspect of this interaction (local repulsion) at negligible computational overhead. A penalized acquisition function is used to collect batches of points, minimizing the non-parallelizable computational effort. The resulting algorithm compares very well, in run-time, with much more elaborate alternatives.

Fri, 01 Jan 2016 00:00:00 +0000
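The local-repulsion heuristic can be sketched for a one-dimensional minimisation problem: each pending evaluation x_j excludes a ball whose radius comes from the Lipschitz estimate, and the acquisition is multiplied by one penalizer per pending point. This is a simplified stand-in: the paper derives a probabilistic penalizer via the Gaussian CDF, whereas the logistic ramp and all names below are ours.

```python
import math

def local_penalizer(x, pending_x, lipschitz, best_m, pending_f):
    # Soft penalty around a pending evaluation x_j: for minimisation, the ball
    # in which x_j's result could still contain the optimum has radius
    # r_j = (f(x_j) - best_m) / L. Points inside the ball are down-weighted.
    r = max(pending_f - best_m, 0.0) / lipschitz
    d = abs(x - pending_x)
    # Smooth 0-to-1 ramp around the ball boundary (simplified stand-in for
    # the paper's Gaussian-CDF penalizer).
    return 1.0 / (1.0 + math.exp(-10.0 * (d - r)))

def penalized_acquisition(acq, x, batch, lipschitz, best_m, f_estimates):
    # Multiply the base acquisition by one penalizer per pending point, so the
    # next batch member is pushed away from locations already being evaluated.
    value = acq(x)
    for x_j, f_j in zip(batch, f_estimates):
        value *= local_penalizer(x, x_j, lipschitz, best_m, f_j)
    return value
```

Maximising the penalized acquisition repeatedly, adding each maximiser to the batch, yields the batch without any expensive joint modelling of the evaluations.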
http://inverseprobability.com/publications/gonzalez-batch16.html
http://inverseprobability.com/publications/gonzalez-batch16.html
Gonzalez:batch16

Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes

The Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear dimensionality reduction that has been widely applied. However, the current approach for training GP-LVMs is based on maximum likelihood, where the latent projection variables are maximised over rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard variational inference framework that allows us to approximately integrate out the latent variables and subsequently train a GP-LVM by maximising an analytic lower bound on the exact marginal likelihood. We apply this method to learning a GP-LVM from i.i.d. observations and to learning non-linear dynamical systems where the observations are temporally correlated. We show that a benefit of the variational Bayesian procedure is its robustness to over-fitting and its ability to automatically select the dimensionality of the non-linear latent space. The resulting framework is generic, flexible and easy to extend for other purposes, such as Gaussian process regression with uncertain or partially missing inputs. We demonstrate our method on synthetic data and standard machine learning benchmarks, as well as challenging real-world datasets, including high-resolution video data.

Fri, 01 Jan 2016 00:00:00 +0000
http://inverseprobability.com/publications/damianou-variational15.html
http://inverseprobability.com/publications/damianou-variational15.html
Damianou-variational15

Variationally Auto-Encoded Deep Gaussian Processes

We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters, which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.

Fri, 01 Jan 2016 00:00:00 +0000
http://inverseprobability.com/publications/dai-variationally16.html
http://inverseprobability.com/publications/dai-variationally16.html
Dai:variationally16

Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays

Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor α activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.

Mon, 05 Oct 2015 00:00:00 +0000
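The style of mechanistic model described, a production term driven by delayed transcriptional activity plus first-order degradation, can be sketched as a forward simulation. The parameter names and the Euler scheme below are our own; the paper infers these quantities with a nonparametric Bayesian model from ChIP-Seq and mRNA-Seq time courses rather than simulating them directly.

```python
def simulate_mrna(pol2, delay_steps, production_rate, degradation_rate, dt=1.0):
    # Toy forward simulation of a delayed-production model:
    #   dm/dt = beta * p(t - delta) - alpha * m(t)
    # where p is transcriptional activity (e.g. Pol-II occupancy), delta a
    # production delay and alpha a degradation rate.
    m = 0.0
    trajectory = []
    for t in range(len(pol2)):
        p_delayed = pol2[t - delay_steps] if t >= delay_steps else 0.0
        m += dt * (production_rate * p_delayed - degradation_rate * m)
        trajectory.append(m)
    return trajectory
```

With a constant activity signal, the mRNA level stays at zero until the delay elapses and then relaxes towards the steady state beta/alpha, which is the qualitative signature the inference exploits.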
http://inverseprobability.com/publications/honkela-genome15.html
http://inverseprobability.com/publications/honkela-genome15.html
Honkela-genome15

A reverse-engineering approach to dissect post-translational modulators of transcription factor's activity from transcriptional data

Background: Transcription factors (TFs) act downstream of the major signalling pathways, functioning as master regulators of cell fate. Their activity is tightly regulated at the transcriptional, post-transcriptional and post-translational levels. Proteins modifying TF activity are not easily identified by experimental high-throughput methods.

Results: We developed a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI is built on the hypothesis that the modulator of a TF (i.e. kinases/phosphatases), when expressed in the cell, will cause the TF target genes to be co-expressed. On the contrary, when the modulator is not expressed, the TF will be inactive, resulting in a loss of co-regulation across its target genes. DMI detects the occurrence of changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information. We validated the DMI approach on a compendium of 5,372 GEPs, showing its predictive ability in correctly identifying kinases regulating the activity of 14 different transcription factors.

Conclusions: DMI can be used in combination with experimental approaches, such as high-throughput screening, to efficiently improve both pathway and target discovery. An on-line web-tool enabling the user to apply DMI to identify post-translational modulators of a transcription factor of interest can be found at http://dmi.tigem.it.

Thu, 03 Sep 2015 00:00:00 +0000
http://inverseprobability.com/publications/gambardella-reverse15.html
http://inverseprobability.com/publications/gambardella-reverse15.html
Gambardella-reverse15

Semi-described and semi-supervised learning with Gaussian processes

Propagating input uncertainty through non-linear Gaussian process (GP) mappings is intractable. This hinders the task of training GPs using uncertain and partially observed inputs. In this paper we refer to this task as “semi-described learning”. We then introduce a GP framework that solves both the semi-described and the semi-supervised learning problems (where missing values occur in the outputs). Auto-regressive state space simulation is also recognised as a special case of semi-described learning. To achieve our goal we develop variational methods for handling semi-described inputs in GPs, and couple them with algorithms that allow for imputing the missing values while treating the uncertainty in a principled, Bayesian manner. Extensive experiments on simulated and real-world data study the problems of iterative forecasting and regression/classification with missing values. The results suggest that the principled propagation of uncertainty stemming from our framework can significantly improve performance in these tasks.

Thu, 01 Jan 2015 00:00:00 +0000
http://inverseprobability.com/publications/damianou-semi15.html
http://inverseprobability.com/publications/damianou-semi15.html
Damianou:semi15

Malaria surveillance with multiple data sources using Gaussian process models

A statistical framework for monitoring the health of a population should ideally be able to combine data from a wide variety of sources, such as remote sensing, telecoms, and official health records, in a principled manner. Gaussian process regression is commonly used to visualise disease incidence by interpolating values across a map; in this article, we show how it can be extended to deal with many different types of information by introducing a flexible covariance structure across data sources. Combining many data sources in a single model provides a number of practical advantages, such as the ability to automatically determine the importance of each data source through likelihood optimisation, and to deal with missing values. We show the basic idea with an application of malaria density modeling across Uganda using administrative records and remote sensing vegetation index data, and then go on to describe further extensions such as the incorporation of human mobility data extracted from mobile phone call detail records (CDRs).

Tue, 09 Dec 2014 00:00:00 +0000
http://inverseprobability.com/publications/mubangizi-malaria14.html
http://inverseprobability.com/publications/mubangizi-malaria14.html
Mubangizi:malaria14

Consistent mapping of government malaria records across a changing territory delimitation

Mon, 22 Sep 2014 00:00:00 +0000
http://inverseprobability.com/publications/andrade-consistent14.html
http://inverseprobability.com/publications/andrade-consistent14.html
Andrade-consistent14

Metrics for Probabilistic Geometries

We investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data.

Wed, 23 Jul 2014 00:00:00 +0000
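For a one-dimensional latent space, the expected metric tensor has a particularly simple form that can be sketched directly: the metric at a point is the sum of squared derivatives of the mapping over output dimensions, and taking its expectation under the GP adds the derivative variances. The function name and the 1-D restriction are ours; the paper handles full multi-dimensional metric tensors.

```python
def expected_metric_1d(grad_means, grad_vars):
    # 1-D latent space: the metric at a latent point is g = sum_d (df_d/dx)^2
    # over output dimensions d. Under a GP, df_d/dx is Gaussian with the given
    # mean and variance, so E[g] = sum_d (mean_d^2 + var_d).
    return sum(m * m + v for m, v in zip(grad_means, grad_vars))
```

Distances measured with this expectation automatically grow in regions where the mapping is uncertain, which is what pushes interpolating paths to stay near the data.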
http://inverseprobability.com/publications/tosi-metrics14.html
http://inverseprobability.com/publications/tosi-metrics14.htmlTosi:metrics14Inference of <span>RNA</span> Polymerase <span>II</span> Transcription Dynamics from Chromatin Immunoprecipitation Time Course DataWed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/maina-inference14.html
http://inverseprobability.com/publications/maina-inference14.htmlMaina-inference14Tilted Variational <span>B</span>ayesWe present a novel method for approximate inference. Using some of the constructs from expectation propagation (EP), we derive a lower bound of the marginal likelihood in a similar fashion to variational Bayes (VB). The method combines some of the benefits of VB and EP: it can be used with light-tailed likelihoods (where traditional VB fails), and it provides a lower bound on the marginal likelihood. We apply the method to Gaussian process classification, a situation where the Kullback-Leibler divergence minimized in traditional VB can be infinite, and to robust Gaussian process regression, where the inference process is dramatically simplified in comparison to EP.\
\
Code to reproduce all the experiments can be found at <http://github.com/SheffieldML/TVB>.Wed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/hensman-tvb14.html
http://inverseprobability.com/publications/hensman-tvb14.htmlHensman:tvb14Nested Variational Compression in Deep <span>G</span>aussian ProcessesWed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/hensman-nested14.html
http://inverseprobability.com/publications/hensman-nested14.htmlHensman:nested14Fast nonparametric clustering of structured time-seriesIn this publication, we combine two Bayesian nonparametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is to introduce a variation on the GP prior which enables us to model structured time-series data, i.e. data containing groups where we wish to model inter- and intra-group variability. Our innovation in the DP model is an implementation of a new fast collapsed variational inference procedure which enables us to optimize our variational approximation significantly faster than standard VB approaches. In a biological time series application we show how our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a significant speed-up over EM-based variational inference.Wed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/hensman-fast14.html
http://inverseprobability.com/publications/hensman-fast14.htmlHensman-fast14Warped linear mixed models for the genetic analysis of transformed phenotypesLinear mixed models (LMMs) are a powerful and established tool for studying genotype–phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss in power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.Wed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/fusi-warped14.html
http://inverseprobability.com/publications/fusi-warped14.htmlFusi-warped14Variational Inference for Uncertainty on the Inputs of <span>G</span>aussian Process ModelsWed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/damianou-variational14.html
http://inverseprobability.com/publications/damianou-variational14.htmlDamianou-variational14Gaussian Process Models with Parallelization and <span>GPU</span> accelerationWed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/dai-gpu14.html
http://inverseprobability.com/publications/dai-gpu14.htmlDai:gpu14Hybrid Discriminative-Generative Approaches with <span>G</span>aussian ProcessesMachine learning practitioners are often faced with a choice between a discriminative and a generative approach to modelling. Here, we present a model based on a hybrid approach that breaks down some of the barriers between the discriminative and generative points of view, allowing continuous dimensionality reduction of hybrid discrete-continous data, discriminative classification with missing inputs and manifold learning informed by class labels.Wed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/andrade-hybrid14.html
http://inverseprobability.com/publications/andrade-hybrid14.htmlAndrade:hybrid14Linear Latent Force Models Using <span>G</span>aussian ProcessesPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.Mon, 13 May 2013 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-llfm13.html
http://inverseprobability.com/publications/alvarez-llfm13.htmlAlvarez-llfm13Deep <span>G</span>aussian ProcessesIn this paper we introduce deep Gaussian process (GP) models. Deep GPs are a deep belief network based on Gaussian process mappings. The data is modeled as the output of a multivariate GP. The inputs to that Gaussian process are then governed by another GP. A single layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). We perform inference in the model by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model which we use for model selection (number of layers and nodes per layer). Deep belief networks are typically applied to relatively large data sets using stochastic gradient descent for optimization. Our fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by our variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples.Mon, 29 Apr 2013 00:00:00 +0000
http://inverseprobability.com/publications/damianou-deepgp13.html
http://inverseprobability.com/publications/damianou-deepgp13.htmlDamianou:deepgp13The Bigraphical LassoThe i.i.d. assumption in machine learning is endemic, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs. A prominent product in spectral graph theory, this structure has appealing properties for regression, enhanced sparsity and interpretability. To deal with the parameter explosion we introduce L1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions.Tue, 01 Jan 2013 00:00:00 +0000
http://inverseprobability.com/publications/kalaitzis-bigraphical13.html
http://inverseprobability.com/publications/kalaitzis-bigraphical13.htmlKalaitzis:bigraphical13Mining Regulatory Network Connections by Ranking Transcription Factor Target Genes Using Time Series Expression DataTue, 01 Jan 2013 00:00:00 +0000
http://inverseprobability.com/publications/honkela-mining12.html
http://inverseprobability.com/publications/honkela-mining12.htmlHonkela:mining12Hierarchical <span>B</span>ayesian modelling of gene expression time series across irregularly sampled replicates and clusters**Background**\
\
Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications.\
\
**Results**\
\
We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method’s capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method’s ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications.\
\
**Conclusion**\
\
The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors’ website: <http://staffwww.dcs.shef.ac.uk/people/J.Hensman/>.Tue, 01 Jan 2013 00:00:00 +0000
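The hierarchical structure above amounts to a simple sum-of-kernels covariance: observations in the same replicate share both a group-level and a replicate-level kernel, while observations from different replicates share only the group-level part. A minimal sketch (RBF kernels and hypothetical lengthscales assumed):

```python
import numpy as np

# Hierarchical GP covariance over time points t with replicate labels:
#   K[i, j] = k_group(t_i, t_j) + [rep_i == rep_j] * k_rep(t_i, t_j)

def rbf(t, s, lengthscale):
    return np.exp(-0.5 * (t[:, None] - s[None, :]) ** 2 / lengthscale**2)

def hierarchical_K(t, replicate, ls_group=2.0, ls_rep=1.0):
    same = (replicate[:, None] == replicate[None, :]).astype(float)
    return rbf(t, t, ls_group) + same * rbf(t, t, ls_rep)

t = np.array([0.0, 1.0, 0.0, 1.0])   # two replicates, same time points
rep = np.array([0, 0, 1, 1])
K = hierarchical_K(t, rep)

# Within-replicate covariance exceeds between-replicate covariance.
assert K[0, 1] > K[0, 3]
```

Because irregular `t` values and missing entries simply change which covariances are evaluated, the construction needs no evenly spaced sampling.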
http://inverseprobability.com/publications/hensman-hierarchical13.html
http://inverseprobability.com/publications/hensman-hierarchical13.htmlHensman-hierarchical13<span>G</span>aussian Processes for Big DataTue, 01 Jan 2013 00:00:00 +0000
http://inverseprobability.com/publications/hensman-bigdata13.html
http://inverseprobability.com/publications/hensman-bigdata13.htmlHensman:bigdata13Detecting Regulatory Gene-Environment Interactions with Unmeasured Environmental Factors**Motivation**: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing researchers to test directly for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as humans. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits.\
\
**Results**: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even when it is not given as an input, allowing us to detect genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability.\
\
**Availability**: Software available at <http://ml.sheffield.ac.uk/qtl/limmi>\
\
**Contact**: [oliver.stegle@ebi.ac.uk](oliver.stegle@ebi.ac.uk), [nicolo.fusi@sheffield.ac.uk](nicolo.fusi@sheffield.ac.uk)Tue, 01 Jan 2013 00:00:00 +0000
http://inverseprobability.com/publications/fusi-detecting13.html
http://inverseprobability.com/publications/fusi-detecting13.htmlFusi-detecting13Unravelling the enigma of selective vulnerability in neurodegeneration: motor neurons resistant to degeneration in <span>ALS</span> show distinct gene expression characteristics and decreased susceptibility to excitotoxicityA consistent clinical feature of amyotrophic lateral sclerosis (ALS) is the sparing of eye movements and the function of external sphincters, with corresponding preservation of motor neurons in the brainstem oculomotor nuclei, and of Onuf’s nucleus in the sacral spinal cord. Studying the differences in properties of neurons that are vulnerable and resistant to the disease process in ALS may provide insights into the mechanisms of neuronal degeneration, and identify targets for therapeutic manipulation. We used microarray analysis to determine the differences in gene expression between oculomotor and spinal motor neurons, isolated by laser capture microdissection from the midbrain and spinal cord of neurologically normal human controls. We compared these to transcriptional profiles of oculomotor nuclei and spinal cord from rat and mouse, obtained from the GEO omnibus database. We show that oculomotor neurons have a distinct transcriptional profile, with significant differential expression of 1,757 named genes (q < 0.001). Differentially expressed genes are enriched for the functional categories of synaptic transmission, ubiquitin-dependent proteolysis, mitochondrial function, transcriptional regulation, immune system functions, and the extracellular matrix. Marked differences are seen, across the three species, in genes with a function in synaptic transmission, including several glutamate and GABA receptor subunits. Using patch clamp recording in acute spinal and brainstem slices, we show that resistant oculomotor neurons exhibit a reduced AMPA-mediated inward calcium current, and a higher GABA-mediated chloride current, than vulnerable spinal motor neurons.
The findings suggest that reduced susceptibility to excitotoxicity, mediated in part through enhanced GABAergic transmission, is an important determinant of the relative resistance of oculomotor neurons to degeneration in ALS.Tue, 01 Jan 2013 00:00:00 +0000
http://inverseprobability.com/publications/brockington-unravelling13.html
http://inverseprobability.com/publications/brockington-unravelling13.htmlBrockington-unravelling13Identifying Targets of Multiple Co-regulated Transcription Factors from Expression Time-series by <span>B</span>ayesian Model Comparison**Background**\
\
Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes. **Results**\
\
We present a computational framework for Bayesian statistical inference of target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe transcription of target genes taking into account combinatorial regulation. The method consists of a training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interactions. Our method statistically significantly outperforms existing alternatives. **Conclusions**\
\
Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in the presence of unmodelled confounders and unreliable prior knowledge on training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/titsias-identifying12.html
http://inverseprobability.com/publications/titsias-identifying12.htmlTitsias-identifying12Modeling Meiotic Chromosomes Indicates a Size Dependent Contribution of Telomere Clustering and Chromosome Rigidity to Homologue JuxtapositionMeiosis is the cell division that halves the genetic component of diploid cells to form gametes or spores. To achieve this, meiotic cells undergo a radical spatial reorganisation of chromosomes. This reorganisation is a prerequisite for the pairing of parental homologous chromosomes and the reductional division, which halves the number of chromosomes in daughter cells. Of particular note is the change from a centromere clustered layout (Rabl configuration) to a telomere clustered conformation (bouquet stage). The contribution of the bouquet structure to homologous chromosome pairing is uncertain. We have developed a new in silico model to represent the chromosomes of Saccharomyces cerevisiae in space, based on a worm-like chain model constrained by attachment to the nuclear envelope and clustering forces. We have asked how these constraints could influence chromosome layout, with particular regard to the juxtaposition of homologous chromosomes and potential nonallelic, ectopic, interactions. The data support the view that the bouquet may be sufficient to bring short chromosomes together, but the contribution to long chromosomes is less. We also find that persistence length is critical to how much influence the bouquet structure could have, both on pairing of homologues and avoiding contacts with heterologues. This work represents an important development in computer modeling of chromosomes, and suggests new explanations for why elucidating the functional significance of the bouquet by genetics has been so difficult.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/penfold-meiotic12.html
http://inverseprobability.com/publications/penfold-meiotic12.htmlPenfold-meiotic12Improved linear mixed models for genome-wide association studiesSun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/listgarten-improved12.html
http://inverseprobability.com/publications/listgarten-improved12.htmlListgarten-improved12A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New ModelsWe introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-unifying12.html
http://inverseprobability.com/publications/lawrence-unifying12.htmlLawrence-unifying12Residual Component AnalysisProbabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = \sigma^2\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. conditional dependencies between the covariates, or temporal correlations leaving some residual variance. We decompose the residual variance into its components through a generalised eigenvalue problem, which we call residual component analysis (RCA). We explore a range of new algorithms that arise from the framework, including one that factorises the covariance of a Gaussian density into a low-rank and a sparse-inverse component. We illustrate the ideas on the recovery of a protein-signaling network, a gene expression time-series data set and the recovery of the human skeleton from motion capture 3-D cloud data.Sun, 01 Jan 2012 00:00:00 +0000
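The generalised eigenvalue problem at the heart of RCA can be shown on synthetic data: plant one low-rank direction on top of a known "already explained" covariance, then recover it with `scipy.linalg.eigh`. The data-generating choices below are hypothetical, purely for illustration.

```python
import numpy as np
from scipy.linalg import eigh

# RCA: directions explaining the sample covariance S *relative to* an
# already-explained part Sigma are the leading solutions of the
# generalised eigenvalue problem  S v = lambda * Sigma v.
rng = np.random.default_rng(0)
n, d = 2000, 5
W = 2.0 * rng.standard_normal((d, 1))       # planted residual direction
Sigma = np.diag(np.linspace(1.0, 2.0, d))   # variance already explained
L = np.linalg.cholesky(Sigma)
Y = rng.standard_normal((n, 1)) @ W.T + rng.standard_normal((n, d)) @ L.T
S = Y.T @ Y / n

lam, V = eigh(S, Sigma)                     # generalised eigendecomposition
v = V[:, -1]                                # top residual component
recovered = Sigma @ v                       # in population, Sigma v ∝ W
cos = abs(W[:, 0] @ recovered) / (np.linalg.norm(W) * np.linalg.norm(recovered))
assert cos > 0.9
```

Setting `Sigma` to spherical noise recovers ordinary PPCA as a special case.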
http://inverseprobability.com/publications/kalaitzis-rca12.html
http://inverseprobability.com/publications/kalaitzis-rca12.htmlKalaitzis:rca12Fast variational inference in the Conjugate Exponential familyWe present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic models optimized using our bound.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/hensman-fast12.html
http://inverseprobability.com/publications/hensman-fast12.htmlHensman:fast12<span>G</span>aussian Processes for Big Data with Stochastic Variational InferenceSun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/hensman-bigdata12.html
http://inverseprobability.com/publications/hensman-bigdata12.htmlHensman:bigdata12Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics StudiesExpression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at <http://ml.sheffield.ac.uk/qtl/>.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/fusi-genomics12.html
http://inverseprobability.com/publications/fusi-genomics12.htmlFusi-genomics12Genome-wide occupancy links Hoxa2 to Wnt-$\beta$-catenin signaling in mouse embryonic developmentThe regulation of gene expression is central to developmental programs and largely depends on the binding of sequence-specific transcription factors with cis-regulatory elements in the genome. Hox transcription factors specify the spatial coordinates of the body axis in all animals with bilateral symmetry, but a detailed knowledge of their molecular function in instructing cell fates is lacking. Here, we used chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) to identify Hoxa2 genomic locations in a time and space when it is actively instructing embryonic development in mouse. Our data reveals that Hoxa2 has large genome coverage and potentially regulates thousands of genes. Sequence analysis of Hoxa2-bound regions identifies high occurrence of two main classes of motifs, corresponding to Hox and Pbx–Hox recognition sequences. Examination of the binding targets of Hoxa2 faithfully captures the processes regulated by Hoxa2 during embryonic development; in addition, it uncovers a large cluster of potential targets involved in the Wnt-signaling pathway. In vivo examination of canonical Wnt–$\beta$-catenin signaling reveals activity specifically in Hoxa2 domain of expression, and this is undetectable in Hoxa2 mutant embryos. The comprehensive mapping of Hoxa2-binding sites provides a framework to study Hox regulatory networks in vertebrate developmental processes.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/donaldson-genome12.html
http://inverseprobability.com/publications/donaldson-genome12.htmlDonaldson-genome12Manifold Relevance DeterminationIn this paper we present a fully Bayesian latent variable model which exploits conditional nonlinear (in)-dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation to the discrete segmentation and allow for a “softly” shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels. This also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model by prediction of human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors which incorporate the dynamic nature of the data.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/damianou-manifold12.html
http://inverseprobability.com/publications/damianou-manifold12.htmlDamianou:manifold12Kernels for Vector-Valued Functions: A ReviewKernel methods are among the most popular techniques in machine learning. From a regularization perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a probabilistic perspective they are the key in the context of Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partially by frameworks like multitask learning. In this monograph, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.Sun, 01 Jan 2012 00:00:00 +0000
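One of the simplest valid multi-output kernels covered by such reviews is the intrinsic coregionalisation model, a Kronecker product of an output-coupling matrix and a scalar kernel. The sketch below is illustrative, with toy values for the coupling matrix and inputs.

```python
import numpy as np

# Intrinsic coregionalisation model (ICM):
#   K((x, i), (x', j)) = B[i, j] * k(x, x'),
# i.e. a Kronecker product of a PSD output-coupling matrix B with a
# scalar kernel evaluated on the inputs.

def rbf(X, Z, lengthscale=1.0):
    d2 = (X[:, None] - Z[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

X = np.linspace(0, 1, 10)
a = np.array([[1.0], [0.8]])
B = a @ a.T + 0.1 * np.eye(2)      # positive semi-definite coupling of 2 outputs
K = np.kron(B, rbf(X, X))          # full 20 x 20 multi-output covariance

# Positive semi-definiteness is inherited from B and the scalar kernel.
assert np.all(np.linalg.eigvalsh(K) > -1e-8)
```

Richer constructions, such as the linear model of coregionalisation or convolution processes, sum several such terms.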
http://inverseprobability.com/publications/alvarez-vector12.html
http://inverseprobability.com/publications/alvarez-vector12.htmlAlvarez-vector12Computationally Efficient Convolved Multiple Output <span>Gaussian</span> ProcessesRecently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in school exams score prediction, pollution prediction and gene expression data.Sun, 01 May 2011 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-computationally11.html
http://inverseprobability.com/publications/alvarez-computationally11.htmlAlvarez-computationally11Demodulation as Probabilistic InferenceSat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/turner-pad11.html
http://inverseprobability.com/publications/turner-pad11.htmlTurner-pad11Markov chain <span>M</span>onte <span>C</span>arlo algorithms for <span>G</span>aussian processes’What’s going to happen next?’ Time series data hold the answers, and Bayesian methods represent the cutting edge in learning what they have to say. This ambitious book is the first unified treatment of the emerging knowledge-base in Bayesian time series techniques. Exploiting the unifying framework of probabilistic graphical models, the book covers approximation schemes, both Monte Carlo and deterministic, and introduces switching, multi-object, non-parametric and agent-based models in a variety of application environments. It demonstrates that the basic framework supports the rapid creation of models tailored to specific applications and gives insight into the computational complexity of their implementation. The authors span traditional disciplines such as statistics and engineering and the more recently established areas of machine learning and pattern recognition. Readers with a basic understanding of applied probability, but no experience with time series analysis, are guided from fundamental concepts to the state-of-the-art in research and practice.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/titsias-mcmcgp11.html
http://inverseprobability.com/publications/titsias-mcmcgp11.htmlTitsias:mcmcgp11Efficient Inference in Matrix-Variate <span>G</span>aussian Models with i.i.d. Observation NoiseSat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/stegle-sparse11.html
http://inverseprobability.com/publications/stegle-sparse11.htmlStegle:sparse11Overlapping Mixtures of <span>G</span>aussian Processes for the Data Association ProblemIn this work we introduce a mixture of GPs to address the data association problem, i.e., to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following “trajectories” across input space. We use a non-standard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/lazaro-overlapping11.html
http://inverseprobability.com/publications/lazaro-overlapping11.htmlLazaro-overlapping11Spectral Dimensionality Reduction via Maximum EntropyWe introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle, which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs and provide a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We directly maximize the likelihood and show results that are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-spectral11.html
http://inverseprobability.com/publications/lawrence-spectral11.htmlLawrence:spectral11<span>G</span>aussian Process Inference for Differential Equation Models of Transcriptional RegulationSat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gpinference11.html
http://inverseprobability.com/publications/lawrence-gpinference11.htmlLawrence:gpinference11A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through <span>Gaussian</span> Process Regression<span>**Background**</span>\
\
The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process.\
\
<span>**Results**</span>\
\
We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach that can be used to filter quiet genes or, for time series in the form of expression ratios, to quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time-series (BATS). We compare on both simulated and experimental data, showing that the proposed approach significantly outperforms the current state of the art.\
\
<span>**Conclusions**</span>\
\
Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of micro-array time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.Sat, 01 Jan 2011 00:00:00 +0000
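As a hedged illustration of the ranking idea (not the paper's exact model: the kernel choice, noise model and score below are simplified assumptions), genes can be scored by the marginal-likelihood ratio of a smooth temporal GP model against a noise-only "quiet gene" model:

```python
import numpy as np

def rbf_kernel(t, lengthscale=2.0, variance=1.0):
    """Squared-exponential covariance between time points."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal(y, K):
    """Log marginal likelihood of y under a zero-mean Gaussian N(0, K)."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))

def activity_score(y, t, noise=0.1):
    """Likelihood ratio of a temporal GP model versus a noise-only
    model; higher scores indicate more temporal signal. Using the
    sample variance for the quiet model is a simplifying assumption."""
    K_gp = rbf_kernel(t) + noise * np.eye(len(t))
    K_quiet = (np.var(y) + noise) * np.eye(len(t))
    return log_marginal(y, K_gp) - log_marginal(y, K_quiet)

t = np.linspace(0, 10, 12)
rng = np.random.RandomState(0)
active = np.sin(t) + 0.05 * rng.randn(12)   # smooth temporal profile
quiet = 0.1 * rng.randn(12)                 # noise only
```

A smoothly varying profile scores above a noise-only one, which is the basis for filtering quiet genes before further analysis.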
http://inverseprobability.com/publications/kalaitzis-simple11.html
http://inverseprobability.com/publications/kalaitzis-simple11.htmlKalaitzis-simple11Residual Component AnalysisProbabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = \sigma^2\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. covariates of interest or temporal correlations, leaving some residual variance. We decompose the residual variance into its components through a generalized eigenvalue problem, which we call residual component analysis (RCA). We show that canonical covariates analysis (CCA) is a special case of our algorithm and explore a range of new algorithms that arise from the framework. We illustrate the ideas on a gene expression time series data set and the recovery of human pose from silhouette.Sat, 01 Jan 2011 00:00:00 +0000
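A minimal sketch of the generalized eigenvalue problem at the heart of RCA, on synthetic data (the dimensions, factor ranks and noise level below are illustrative assumptions, not values from the paper):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)
n, d = 2000, 5

# Covariance assumed already explained by known factors (a hypothetical
# stand-in for the covariates/temporal structure in the paper).
w_known = rng.randn(d, 1)
explained = w_known @ w_known.T + 0.1 * np.eye(d)

# Extra low-rank structure that RCA should recover from the residual.
w_res = rng.randn(d, 1)
X = rng.multivariate_normal(np.zeros(d), explained + w_res @ w_res.T, size=n)
S = X.T @ X / n  # sample covariance

# Generalized eigenvalue problem S v = lambda * explained * v:
# generalized eigenvalues well above 1 flag directions carrying
# variance beyond what the explained covariance accounts for.
lam, V = eigh(S, explained)
residual_direction = V[:, -1]  # proportional to explained^{-1} w_res
```

Directions whose generalized eigenvalue is close to 1 are already accounted for by the explained covariance; only the residual component stands out.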
http://inverseprobability.com/publications/kalaitzis-rca11.html
http://inverseprobability.com/publications/kalaitzis-rca11.htmlKalaitzis:rca11tigre: Transcription factor inference through <span>Gaussian</span> process reconstruction of expression for Bioconductor**Summary**: tigre is an R/Bioconductor package for inference of transcription factor activity and ranking candidate target genes from gene expression time series. The underlying methodology is based on Gaussian process inference on a differential equation model that allows the use of short, unevenly sampled, time series. The method has been designed with efficient parallel implementation in mind, and the package supports parallel operation even without additional software.\
\
**Availability**: The tigre package is included in Bioconductor since release 2.6 for R 2.11. The package and a user’s guide are available at http://www.bioconductor.org.\
\
**Contact**: antti.honkela@hiit.fi; m.rattray@sheffield.ac.uk; n.lawrence@dcs.shef.ac.ukSat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/honkela-tigre11.html
http://inverseprobability.com/publications/honkela-tigre11.htmlHonkela-tigre11Accurate modeling of confounding variation in <span>eQTL</span> studies leads to a great increase in power to detect trans-regulatory effectsExpression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals.\
\
Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.\
\
We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans-regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/fusi-accurate11.html
http://inverseprobability.com/publications/fusi-accurate11.htmlFusi:accurate11Variational <span>Gaussian</span> Process Dynamical SystemsHigh dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/damianou-vgpds11.html
http://inverseprobability.com/publications/damianou-vgpds11.htmlDamianou:vgpds11Linear Latent Force Models Using <span>G</span>aussian ProcessesPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-llfm11.html
http://inverseprobability.com/publications/alvarez-llfm11.htmlAlvarez:llfm11Kernels for Vector-Valued Functions: a ReviewKernel methods are among the most popular techniques in machine learning. From a frequentist/discriminative perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a Bayesian/generative perspective they are the key in the context of Gaussian processes, where the kernel function is also known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partly by frameworks like multitask learning. In this paper, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.Sat, 01 Jan 2011 00:00:00 +0000
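One of the simplest constructions in this literature, the intrinsic coregionalization model, builds a valid multi-output covariance as a Kronecker product; a sketch (the kernel and coregionalization choices below are illustrative assumptions):

```python
import numpy as np

def rbf(x, lengthscale=1.0):
    """Squared-exponential kernel matrix on a 1-D input set."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_kernel(x, B, lengthscale=1.0):
    """Intrinsic coregionalization model: K = B kron k(x, x).
    If B (outputs x outputs) is positive semi-definite, its Kronecker
    product with a valid scalar kernel is itself a valid
    multi-output covariance."""
    return np.kron(B, rbf(x, lengthscale))

x = np.linspace(0, 5, 10)
W = np.array([[1.0], [0.5]])
B = W @ W.T + 0.01 * np.eye(2)   # PSD coregionalization matrix
K = icm_kernel(x, B)             # 20 x 20 joint covariance over 2 outputs
```

The off-diagonal blocks of `K`, scaled by the entries of `B`, are what couple the outputs and let observations of one output inform predictions of another.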
http://inverseprobability.com/publications/alvarez-kernels11.html
http://inverseprobability.com/publications/alvarez-kernels11.htmlAlvarez:kernels11Model-based Method for Transcription Factor Target Identification with Limited DataWe present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable to, or superior to, ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.Tue, 27 Apr 2010 00:00:00 +0000
http://inverseprobability.com/publications/honkela-modelbased10.html
http://inverseprobability.com/publications/honkela-modelbased10.htmlHonkela-modelbased10Elementary properties of <span>CaV</span>1.3 <span>Ca</span>2+ channels expressed in mouse cochlear inner hair cellsMammalian cochlear inner hair cells (IHCs) are specialized to process developmental signals during immature stages and sound stimuli in adult animals. These signals are conveyed onto auditory afferent nerve fibres. Neurotransmitter release at IHC ribbon synapses is controlled by L-type CaV1.3 Ca2+ channels, the biophysics of which are still unknown in native mammalian cells. We have investigated the localization and elementary properties of Ca2+ channels in immature mouse IHCs under near-physiological recording conditions. CaV1.3 Ca2+ channels at the cell pre-synaptic site co-localize with about half of the total number of ribbons present in immature IHCs. These channels activated at relatively hyperpolarized membrane potentials (about -70 mV), showed a relatively short first latency and weak inactivation, which would allow IHCs to generate and accurately encode spontaneous Ca2+ action potential activity characteristic of these immature cells. The CaV1.3 Ca2+ channels showed a very low open probability (about 0.15 at -20 mV: near the peak of an action potential). Comparison of elementary and macroscopic Ca2+ currents indicated that very few Ca2+ channels are associated with each docked vesicle at IHC ribbon synapses. Finally, we found that the open probability of Ca2+ channels, but not their opening time, was voltage dependent. This finding provides a possible correlation between presynaptic Ca2+ channel properties and the characteristic frequency/amplitude of EPSCs in auditory afferent fibres.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/zampini-elementary09.html
http://inverseprobability.com/publications/zampini-elementary09.htmlZampini-elementary09Bayesian <span>G</span>aussian Process Latent Variable ModelWe introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/titsias-bayesgplvm10.html
http://inverseprobability.com/publications/titsias-bayesgplvm10.htmlTitsias:bayesGPLVM10A Unifying Probabilistic Perspective for Spectral Dimensionality ReductionWe introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle, which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs and provide a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We develop new algorithms that directly maximize the likelihood and show that these new algorithms are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally, the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-unifying10.html
http://inverseprobability.com/publications/lawrence-unifying10.htmlLawrence:unifying10Introduction to Learning and Inference in Computational Systems BiologyComputational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks. The chapters offer practical approaches to biological inference problems ranging from genome-wide inference of genetic regulation to pathway-specific studies. Both deterministic models (based on ordinary differential equations) and stochastic models (which anticipate the increasing availability of data from small populations of cells) are considered. Several chapters emphasize Bayesian inference, so the editors have included an introduction to the philosophy of the Bayesian approach and an overview of current work on Bayesian inference. Taken together, the methods discussed by the experts in Learning and Inference in Computational Systems Biology provide a foundation upon which the next decade of research in systems biology can be built.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbintro10.html
http://inverseprobability.com/publications/lawrence-licsbintro10.htmlLawrence:licsbintro10Gaussian Processes for Missing Species in Biochemical SystemsFri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbgp10.html
http://inverseprobability.com/publications/lawrence-licsbgp10.htmlLawrence:licsbgp10A Brief Introduction to <span>B</span>ayesian InferenceFri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbbayes10.html
http://inverseprobability.com/publications/lawrence-licsbbayes10.htmlLawrence:licsbbayes10TFInfer: a tool for probabilistic inference of transcription factor activities**Summary**: TFInfer is a novel open access, standalone tool for genome-wide inference of transcription factor activities from gene expression data. Based on an earlier MATLAB version, the software has now been extended in a number of ways: it has been significantly optimised for performance and given new functionality, allowing the user to model both time series and data from multiple independent conditions. With full documentation, an intuitive graphical user interface and a built-in database of yeast and Escherichia coli transcription factors, the software does not require any mathematical or computational expertise to be used effectively.\
**Availability**: <http://homepages.inf.ed.ac.uk/gsanguin/TFInfer.html> **Contact**: <gsanguin@staffmail.ed.ac.uk>Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/asif-tfinfer10.html
http://inverseprobability.com/publications/asif-tfinfer10.htmlAsif-tfinfer10Switched Latent Force Models for Movement SegmentationLatent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-switched10.html
http://inverseprobability.com/publications/alvarez-switched10.htmlAlvarez:switched10Efficient Multioutput <span>G</span>aussian Processes through Variational Inducing KernelsInterest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-efficient10.html
http://inverseprobability.com/publications/alvarez-efficient10.htmlAlvarez:efficient10Efficient Sampling for <span>G</span>aussian Process Inference using Control VariablesSampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by continuously minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/titsias-efficient08.html
http://inverseprobability.com/publications/titsias-efficient08.htmlTitsias:efficient08puma: a <span>B</span>ioconductor package for Propagating Uncertainty in Microarray Analysis<span>**Background**</span>\
\
Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages written in different languages; they have also been very costly to compute and, in the case of differential expression detection, limited in the experimental designs to which they can be applied.\
\
<span>**Results**</span>\
\
puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself and in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised, leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation.\
\
<span>**Conclusions**</span>\
\
For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/pearson-puma09.html
http://inverseprobability.com/publications/pearson-puma09.htmlPearson-puma09The variational <span>G</span>aussian approximation revisitedThu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/opper-variational09.html
http://inverseprobability.com/publications/opper-variational09.htmlOpper-variational09Non-Linear Matrix Factorization with <span>G</span>aussian ProcessesA popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.Thu, 01 Jan 2009 00:00:00 +0000
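The paper's factorization is nonlinear and GP-based; as a sketch of the SGD training loop it relies on, here is the linear analogue on synthetic ratings (the dimensions, learning rate and regularization below are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.RandomState(0)
n_users, n_items, k = 50, 40, 3

# Synthetic low-rank "ratings", with only ~30% of entries observed.
U_true = rng.randn(n_users, k)
V_true = rng.randn(n_items, k)
obs = [(i, j, float(U_true[i] @ V_true[j]))
       for i in range(n_users) for j in range(n_items)
       if rng.rand() < 0.3]

# Plain SGD: each step touches one observed rating, so the cost per
# update is independent of how many ratings the full data set holds.
U = 0.1 * rng.randn(n_users, k)
V = 0.1 * rng.randn(n_items, k)
lr, reg = 0.02, 0.01
for epoch in range(100):
    rng.shuffle(obs)
    for i, j, r in obs:
        err = r - U[i] @ V[j]
        u_old = U[i].copy()
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * u_old - reg * V[j])

rmse = np.sqrt(np.mean([(r - U[i] @ V[j]) ** 2 for i, j, r in obs]))
```

This per-rating update pattern is what lets SGD scale to millions of observations; the paper's contribution is replacing the linear map from latent factors to ratings with a Gaussian process.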
http://inverseprobability.com/publications/lawrence-nlmf09.html
http://inverseprobability.com/publications/lawrence-nlmf09.htmlLawrence:nlmf09Backing Off: Hierarchical Decomposition of Activity for 3D Novel Pose RecoveryFor model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data and then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/darby-backing09.html
http://inverseprobability.com/publications/darby-backing09.htmlDarby:backing09Accelerating <span>B</span>ayesian Inference over Nonlinear Differential Equations with <span>G</span>aussian ProcessesIdentification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields; however, current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible *without* solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state-of-the-art methods.Thu, 01 Jan 2009 00:00:00 +0000
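A toy version of the underlying idea, GP smoothing followed by derivative matching (the kernel, noise level and scalar ODE below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def gp_mean_and_deriv(t, y, ls=1.0, noise=1e-2):
    """Posterior mean of an RBF GP fitted to (t, y), and the mean's
    time derivative, obtained by differentiating the kernel."""
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / ls) ** 2)
    alpha = np.linalg.solve(K + noise * np.eye(len(t)), y)
    dK = -(d / ls ** 2) * K          # d k(t, t') / d t
    return K @ alpha, dK @ alpha

# Noisy observations of dx/dt = a * x with a = -0.5.
a_true = -0.5
t = np.linspace(0, 4, 40)
rng = np.random.RandomState(1)
y = np.exp(a_true * t) + 0.01 * rng.randn(len(t))

# Match GP-estimated derivatives to the ODE right-hand side by least
# squares; the ODE itself is never solved inside the fit.
x_hat, dx_hat = gp_mean_and_deriv(t, y)
a_hat = (dx_hat @ x_hat) / (x_hat @ x_hat)
```

Because the derivative estimates come from the GP rather than a numerical ODE solver, each candidate parameter can be scored without an expensive forward simulation, which is the source of the speed-up the abstract describes.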
http://inverseprobability.com/publications/calderhead-accelerating08.html
http://inverseprobability.com/publications/calderhead-accelerating08.htmlCalderhead:accelerating08Variational Inducing Kernels for Sparse Convolved Multiple Output <span>G</span>aussian ProcessesInterest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-viktech09.html
http://inverseprobability.com/publications/alvarez-viktech09.htmlAlvarez:vikTech09Sparse Convolved Multiple Output <span>G</span>aussian ProcessesRecently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in pollution prediction, school exams score prediction and gene expression data.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-multitech09.html
http://inverseprobability.com/publications/alvarez-multitech09.htmlAlvarez:multiTech09Latent Force ModelsPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modeling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-lfm09.html
http://inverseprobability.com/publications/alvarez-lfm09.htmlAlvarez:lfm09Sparse Convolved <span>G</span>aussian Processes for Multi-output RegressionWe present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network.Thu, 01 Jan 2009 00:00:00 +0000
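The convolution-process construction behind this line of work can be illustrated with a discretised sketch (our own toy version, not the paper's continuous formalism): a shared latent function on a grid is smoothed by two different kernels, and the resulting joint covariance couples the outputs. The grid, lengthscales and Gaussian smoothing kernels are illustrative assumptions:

```python
import numpy as np

# Discretised convolution process: latent u on a grid with covariance Ku,
# outputs f1 = G1 u and f2 = G2 u for smoothing matrices G1, G2. The joint
# covariance is [[G1 Ku G1^T, G1 Ku G2^T], [G2 Ku G1^T, G2 Ku G2^T]].

z = np.linspace(0, 1, 100)                                   # latent grid
Ku = np.exp(-0.5 * (z[:, None] - z[None, :])**2 / 0.05**2)   # latent covariance

def smoothing_matrix(width):
    """Row-normalised Gaussian smoothing kernel on the grid."""
    G = np.exp(-0.5 * (z[:, None] - z[None, :])**2 / width**2)
    return G / G.sum(axis=1, keepdims=True)

G1, G2 = smoothing_matrix(0.02), smoothing_matrix(0.1)       # two output kernels
K = np.block([[G1 @ Ku @ G1.T, G1 @ Ku @ G2.T],
              [G2 @ Ku @ G1.T, G2 @ Ku @ G2.T]])

# The joint covariance is symmetric and positive semi-definite, and the
# off-diagonal block is non-trivial: observing one output is informative
# about the other, which is the point of the construction.
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > -1e-8)
```

The sparse approximations in the papers above then reduce the cost of working with this joint covariance by conditioning on the latent function at a small set of inducing points.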
http://inverseprobability.com/publications/alvarez-convolved08.html
http://inverseprobability.com/publications/alvarez-convolved08.htmlAlvarez:convolved08Topologically-Constrained Latent Variable ModelsIn dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured by a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set enabling us to learn transitions between motion styles even though such transitions are not present in the data.Tue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/urtasun-topology08.html
http://inverseprobability.com/publications/urtasun-topology08.htmlUrtasun:topology08Probabilistic approach to detecting dependencies between data setsTue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/klami-probabilistic08.html
http://inverseprobability.com/publications/klami-probabilistic08.htmlKlami-probabilistic08<span>G</span>aussian Process Modelling of Latent Chemical Species: Applications to Inferring Transcription Factor Activities<span>**Motivation:**</span> Inference of *latent chemical species* in biochemical interaction networks is a key problem in estimation of the structure and parameters of the genetic, metabolic and protein interaction networks that underpin all biological processes. We present a framework for Bayesian marginalisation of these latent chemical species through Gaussian process priors.\
\
<span>**Results:**</span> We demonstrate our general approach on three different biological examples of single input motifs, including both activation and repression of transcription. We focus in particular on the problem of inferring transcription factor activity when the concentration of active protein cannot easily be measured. We show how the uncertainty in the inferred transcription factor activity can be integrated out in order to derive a likelihood function that can be used for the estimation of regulatory model parameters. An advantage of our approach is that we avoid the use of a coarse-grained discretization of continuous-time functions, which would lead to a large number of additional parameters to be estimated. We develop efficient exact and approximate inference schemes, which are much more efficient than competing sampling-based schemes and therefore provide us with a practical toolkit for model-based inference.\
\
<span>**Availability:**</span> The software and data for recreating all the experiments in this paper is available in MATLAB from <http://inverseprobability.com/gpsim>\
\
<span>**Contact:**</span> Neil LawrenceTue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/gao-latent08.html
http://inverseprobability.com/publications/gao-latent08.htmlGao-latent08<span>G</span>aussian Process Latent Variable Models For Human Pose EstimationWe describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) \[1\] encapsulating both pose and silhouette features. Our method is generative; this allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper the suggested model is easily extended to multiple observation spaces without constraints on type.Tue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/ek-pose07.html
http://inverseprobability.com/publications/ek-pose07.htmlEk:pose07Ambiguity Modeling in Latent SpacesWe are interested in the situation where we have two or more representations of an underlying phenomenon. In particular we are interested in the scenario where the representations are complementary. This implies that a single individual representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, explaining representation-specific variance, separately model the variance in a representation ambiguous to the other. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas, and we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.Tue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/ek-ambiguity08.html
http://inverseprobability.com/publications/ek-ambiguity08.htmlEk:ambiguity08Model-driven detection of Clean Speech Patches in NoiseListeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.Wed, 01 Aug 2007 00:00:00 +0000
http://inverseprobability.com/publications/laidler-model07.html
http://inverseprobability.com/publications/laidler-model07.htmlLaidler:model07Modelling transcriptional regulation using <span>G</span>aussian ProcessesModelling the dynamics of transcriptional processes in the cell requires the knowledge of a number of key biological quantities. While some of them are relatively easy to measure, such as mRNA decay rates and mRNA abundance levels, it is still very hard to measure the active concentration levels of the transcription factor proteins that drive the process and the sensitivity of target genes to these concentrations. In this paper we show how these quantities for a given transcription factor can be inferred from gene expression levels of a set of known target genes. We treat the protein concentration as a latent function with a Gaussian Process prior, and include the sensitivities, mRNA decay rates and baseline expression levels as hyperparameters. We apply this procedure to a human leukemia dataset, focusing on the tumour repressor p53 and obtaining results in good accordance with recent biological studies.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-transcriptionalgp06.html
http://inverseprobability.com/publications/lawrence-transcriptionalgp06.htmlLawrence:transcriptionalGP06Variational Optimisation by Marginal MatchingMon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nipsw07.html
http://inverseprobability.com/publications/lawrence-nipsw07.htmlLawrence:nipsw07Learning for Larger Datasets with the <span>G</span>aussian Process Latent Variable ModelIn this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. Each approach is then implemented on a well known benchmark data set and compared with earlier attempts to sparsify the model.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-larger07.html
http://inverseprobability.com/publications/lawrence-larger07.htmlLawrence:larger07Hierarchical <span>G</span>aussian Process Latent Variable ModelsThe Gaussian process latent variable model (GP-LVM) is a powerful approach for probabilistic modelling of high dimensional data through dimensional reduction. In this paper we extend the GP-LVM through hierarchies. A hierarchical model (such as a tree) allows us to express conditional independencies in the data as well as the manifold structure. We first introduce Gaussian process hierarchies through a simple dynamical model, we then extend the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-hgplvm07.html
http://inverseprobability.com/publications/lawrence-hgplvm07.htmlLawrence:hgplvm07<span>WiFi-SLAM</span> Using <span>G</span>aussian Process Latent Variable ModelsWiFi localization, the task of determining the physical location of a mobile device from wireless signal strengths, has been shown to be an accurate method of indoor and outdoor localization and a powerful building block for location-aware applications. However, most localization techniques require a training set of signal strength readings labeled against a ground truth location map, which is prohibitive to collect and maintain as maps grow large. In this paper we propose a novel technique for solving the WiFi SLAM problem using the Gaussian Process Latent Variable Model (GP-LVM) to determine the latent-space locations of unlabeled signal strength data. We show how GP-LVM, in combination with an appropriate motion dynamics model, can be used to reconstruct a topological connectivity graph from a signal strength sequence which, in combination with the learned Gaussian Process signal strength model, can be used to perform efficient localization.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/ferris-wifi07.html
http://inverseprobability.com/publications/ferris-wifi07.htmlFerris:wifi07<span>G</span>aussian Process Latent Variable Models for Fault DetectionThe Gaussian process latent variable model (GPLVM) is a novel unsupervised approach to nonlinear low dimensional embedding proposed by Lawrence (2005). This paper presents the development of a framework for the implementation of the GPLVM for fault detection. A series of experiments have been carried out comparing and combining the GPLVM to the conventional and widely used linear dimension reduction technique of principal component analysis (PCA). The inclusion of the GPLVM for the visualisation and data analysis, led to a considerable improvement in the classification resultsMon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/eciolaza-fault07.html
http://inverseprobability.com/publications/eciolaza-fault07.htmlEciolaza:fault07Probe-level Measurement Error Improves Accuracy in Detecting Differential Gene Expression**Motivation:** Finding differentially expressed genes is a
fundamental objective of a microarray experiment. Numerous methods
have been proposed to perform this task. Existing methods are based
on point estimates of gene expression level obtained from each
microarray experiment. This approach discards potentially useful
information about measurement error that can be obtained from an
appropriate probe-level analysis. Probabilistic probe-level models
can be used to measure gene expression and also provide a level of
uncertainty in this measurement. This probe-level variance provides
useful information which can help in the identification of
differentially expressed genes.
**Results:** We propose a Bayesian method to include probe-level
variances into the detection of differentially expressed genes from
replicated experiments. A variational approximation is used for
efficient parameter estimation. We compare this approximation with
MAP and MCMC parameter estimation in terms of computational
efficiency and accuracy. The method is used to calculate the
probability of positive log-ratio (PPLR) of expression levels
between conditions. Using the measurements from a recently developed
Affymetrix probe-level model, multi-mgMOS, we test PPLR on a
spike-in data set and a mouse time-course data set. Results show
that the inclusion of probe-level measurement error improves accuracy
in detecting differential gene expression.
**Availability:** The methods described in this paper have been
implemented in an R package *pplr* that is currently available from
<http://umber.sbs.man.ac.uk/resources/puma>.
**Contact:** Magnus Rattray
Fri, 01 Sep 2006 00:00:00 +0000
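The PPLR idea above has a simple closed form once the per-condition expression posteriors are treated as Gaussian. The sketch below is our illustration of that idea, not the *pplr* package's code; the means and variances are made up, with the probe-level measurement variance assumed to be folded into each variance term:

```python
from math import erf, sqrt

# Probability of positive log-ratio (PPLR) between two conditions, under
# independent Gaussian posteriors over log expression. The variances are
# assumed to already include probe-level measurement error, which is how
# that uncertainty enters the differential-expression call.

def pplr(mu1, var1, mu2, var2):
    """P(log-ratio > 0) for condition 1 versus condition 2."""
    z = (mu1 - mu2) / sqrt(var1 + var2)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF

# The same mean difference is called less confidently when the
# probe-level variance is large:
print(pplr(2.0, 0.1, 1.0, 0.1))   # precise measurement: near 1
print(pplr(2.0, 4.0, 1.0, 4.0))   # noisy measurement: nearer 0.5
```

Genes with PPLR near 1 (or near 0) are then the confident differential-expression calls, which is why propagating the probe-level variance improves accuracy over point estimates.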
http://inverseprobability.com/publications/liu-variances06.html
http://inverseprobability.com/publications/liu-variances06.htmlLiu-variances06Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant AnalysisIn this paper we consider a novel Bayesian interpretation of Fisher’s discriminant analysis. We relate Rayleigh’s coefficient to a noise model that minimizes a cost based on the most probable class centres and that abandons the ‘regression to the labels’ assumption used by other algorithms. This yields a direction of discrimination equivalent to Fisher’s discriminant. We use Bayes’ rule to infer the posterior distribution for the direction of discrimination and in this process, priors and constraining distributions are incorporated to reach the desired result. Going further, with the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher’s discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through the optimisation of the marginal log-likelihood of the data. An added bonus of the new formulation is that it enables us to link the regularisation coefficient with the generalisation error.Wed, 01 Feb 2006 00:00:00 +0000
http://inverseprobability.com/publications/pena-fbd04.html
http://inverseprobability.com/publications/pena-fbd04.htmlPena-fbd04Identifying submodules of cellular regulatory networksRecent high throughput techniques in molecular biology have brought about the possibility of directly identifying the architecture of regulatory networks on a genome-wide scale. However, the computational task of estimating fine-grained models on a genome-wide scale is daunting. Therefore, it is of great importance to be able to reliably identify submodules of the network that can be effectively modelled as independent subunits. In this paper we present a procedure to obtain submodules of a cellular network by using information from gene-expression measurements. We integrate network architecture data with genome-wide gene expression measurements in order to determine which regulatory relations are actually confirmed by the expression data. We then use this information to obtain non-trivial submodules of the regulatory network using two distinct algorithms, a naive exhaustive algorithm and a spectral algorithm based on the eigendecomposition of an affinity matrix. We test our method on two yeast biological data sets, using regulatory information obtained from chromatin immunoprecipitation.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-trento06.html
http://inverseprobability.com/publications/sanguinetti-trento06.htmlSanguinetti:trento06Missing Data in Kernel <span>PCA</span>Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-missingkpca06.html
http://inverseprobability.com/publications/sanguinetti-missingkpca06.htmlSanguinetti:missingkpca06A Probabilistic Model to Integrate Chip and Microarray DataSun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-integrate06.html
http://inverseprobability.com/publications/sanguinetti-integrate06.htmlSanguinetti:integrate06Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities<span>**Motivation**</span>: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques such as Chromatin Immunoprecipitation provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data.\
\
<span>**Results**</span>: We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals to our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast data sets in which the network structure has previously been obtained using Chromatin Immunoprecipitation data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.\
\
<span>**Availability**</span>: MATLAB code is available from <http://umber.sbs.man.ac.uk/resources/puma>.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-chipvar06.html
http://inverseprobability.com/publications/sanguinetti-chipvar06.htmlSanguinetti-chipvar06A probabilistic dynamical model for quantitative inference of the regulatory mechanism of transcription<span>**Motivation:**</span> Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. This task, however, is difficult for a number of reasons: transcription factors’ expression levels are often low and noisy, and many transcription factors are post-transcriptionally regulated. It is therefore useful to infer the activity of the transcription factors from the expression levels of their target genes.\
\
<span>**Results:**</span> We introduce a novel probabilistic model to infer transcription factor activities from microarray data when the structure of the regulatory network is known. The model is based on regression, retaining the computational efficiency to allow genome-wide investigation, but is rendered more flexible by sampling regression coefficients independently for each gene. This allows us to determine the strength with which a transcription factor regulates each of its target genes, therefore providing a quantitative description of the transcriptional regulatory network. The probabilistic nature of the model also means that we can associate credibility intervals to our estimates of the activities. We demonstrate our model on two yeast data sets. In both cases the network structure was obtained using Chromatin Immunoprecipitation data. We show how predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.\
\
<span>**Availability:**</span> MATLAB code is available from <http://umber.sbs.man.ac.uk/resources/puma>.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-chipdyno06.html
http://inverseprobability.com/publications/sanguinetti-chipdyno06.htmlSanguinetti-chipdyno06Propagating Uncertainty in Microarray Data AnalysisMicroarray technology is associated with many sources of experimental uncertainty. In this review we discuss a number of approaches for dealing with this uncertainty in the processing of data from microarray experiments. We focus here on the analysis of high-density oligonucleotide arrays, such as the popular Affymetrix GeneChip® array, which contain multiple probes for each target. This set of probes can be used to determine an estimate for the target concentration and can also be used to determine the experimental uncertainty associated with this measurement. This measurement uncertainty can then be propagated through the downstream analysis using probabilistic methods. We give examples showing how these credibility intervals can be used to help identify differential expression, to combine information from replicated experiments and to improve the performance of principal component analysis.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/rattray-propagating06.html
http://inverseprobability.com/publications/rattray-propagating06.htmlRattray-propagating06Large Scale Learning with the <span>G</span>aussian Process Latent Variable ModelIn this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. We briefly consider a GPR toy problem to highlight the strengths and weaknesses of the different approaches before studying the performance of these techniques on a benchmark visualisation data set.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-largescale06.html
http://inverseprobability.com/publications/lawrence-largescale06.htmlLawrence:largescale06Gaussian Processes and the Null-Category Noise ModelWith Gaussian process classifiers (GPC) we aim to predict the posterior probability of the class labels given an input data point, $p(y_i|x_i)$. In general we find that this posterior distribution is unaffected by unlabeled data points during learning. Support vector machines are strongly related to GPCs, but one notable difference is that the decision boundary in an SVM can be influenced by unlabeled data. The source of this discrepancy is the SVM’s margin: a characteristic which is not shared with the GPC. The presence of the margin allows the support vector machine to seek low data density regions for the decision boundary, effectively allowing it to incorporate the cluster assumption (see Chapter 6). In this chapter we present the *null category noise model*, a probabilistic equivalent of the margin. By combining this noise model with a GPC we are able to incorporate the cluster assumption without explicitly modeling the input data density distributions and without a special choice of kernel.Sun, 01 Jan 2006 00:00:00 +0000
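One reading of the null category construction described above, offered here as an illustrative sketch rather than the chapter's exact formulation, is an ordered-categorical likelihood with three categories {-1, null, +1}, where the null category has width w but is never observed, creating a margin-like exclusion zone around f = 0:

```python
from math import erf, sqrt

# Sketch of a null category noise model: three ordered categories, with
# the middle ("null") category of width w never appearing in the labels.

def ncnm_probs(f, w=1.0):
    """Return (p(y=-1|f), p(y=null|f), p(y=+1|f))."""
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    p_neg = Phi(-(f + w / 2))                  # y = -1
    p_null = Phi(f + w / 2) - Phi(f - w / 2)   # the unobservable category
    p_pos = Phi(f - w / 2)                     # y = +1
    return p_neg, p_null, p_pos

# Near f = 0 most probability mass falls in the null category, so any
# observed label pushes the latent function away from the boundary --
# the probabilistic analogue of the SVM margin described in the abstract.
p = ncnm_probs(0.0)
print(p, sum(p))
```

Because unlabeled points must also avoid the null category, they too repel the decision boundary from dense regions, which is how the cluster assumption enters without modelling the input density.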
http://inverseprobability.com/publications/lawrence-gpncnm05.html
http://inverseprobability.com/publications/lawrence-gpncnm05.htmlLawrence:gpncnm05The <span>G</span>aussian Process Latent Variable ModelThe Gaussian process latent variable model (GP-LVM) is a recently proposed probabilistic approach to obtaining a reduced dimension representation of a data set. In this tutorial we motivate and describe the GP-LVM, giving reviews of the model itself and some of the concepts behind it.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gplvmtut06.html
http://inverseprobability.com/publications/lawrence-gplvmtut06.htmlLawrence:gplvmtut06Local Distance Preservation in the <span>GP-LVM</span> through Back ConstraintsThe Gaussian process latent variable model (GP-LVM) is a generative approach to non-linear low dimensional embedding that provides a smooth probabilistic mapping from latent to data space. It is also a non-linear generalization of probabilistic PCA (PPCA) @Tipping:probpca99. While most non-linear dimensionality reduction methods focus on preserving local distances in data space, the GP-LVM focusses on exactly the opposite. Being a smooth mapping from latent to data space, it focusses on keeping things apart in latent space that are far apart in data space. In this paper we first provide an overview of dimensionality reduction techniques, placing the emphasis on the kind of distance relation preserved. We then show how the GP-LVM can be generalized, through back constraints, to additionally preserve local distances. We give illustrative experiments on common data sets.Sun, 01 Jan 2006 00:00:00 +0000
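The back-constraint mechanism can be sketched in a few lines: instead of optimising free latent points, each latent point is a smooth function of its data point, so points nearby in data space are forced to be nearby in latent space. The kernel mapping, dimensions and (fixed, random) weights below are illustrative assumptions; in the paper the weights would be optimised against the GP-LVM likelihood:

```python
import numpy as np

# Back-constraint sketch: latent coordinates are given by an RBF kernel
# mapping of the data, x_n = sum_m A_m k(y_n, y_m), rather than being
# free parameters. Local distance preservation then holds by construction,
# because the mapping from data space to latent space is smooth.

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 10))        # data: 50 points in 10-D
A = rng.normal(size=(50, 2))         # back-constraint weights (fixed here)

def back_constrain(Ynew):
    sq = ((Ynew[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-0.5 * sq / 10.0) @ A   # 2-D latent coordinates

X = back_constrain(Y)

# Perturb one data point slightly: its latent image moves only slightly,
# which free (unconstrained) latent points would not guarantee.
Yp = Y.copy()
Yp[0] += 1e-3
Xp = back_constrain(Yp)
print(np.linalg.norm(Xp[0] - X[0]))   # small: the mapping is smooth
```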
http://inverseprobability.com/publications/lawrence-backconstraints06.html
http://inverseprobability.com/publications/lawrence-backconstraints06.htmlLawrence:backconstraints06Fast Variational Inference for <span>G</span>aussian <span>P</span>rocess Models through <span>KL</span>-CorrectionVariational inference is a flexible approach to solving problems of intractability in Bayesian models. Unfortunately the convergence of variational methods is often slow. We review a recently suggested variational approach for approximate inference in Gaussian process (GP) models and show how convergence may be dramatically improved through the use of a positive correction term to the standard variational bound. We refer to the modified bound as a KL-corrected bound. The KL-corrected bound is a lower bound on the true likelihood, but an upper bound on the original variational bound. Timing comparisons between optimisation of the two bounds show that optimisation of the new bound consistently improves the speed of convergence.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/king-klcorrection06.html
http://inverseprobability.com/publications/king-klcorrection06.htmlKing:klcorrection06Probabilistic Non-linear Principal Component Analysis with <span>G</span>aussian Process Latent Variable ModelsSummarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.Tue, 01 Nov 2005 00:00:00 +0000
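The GP-LVM objective described in the tutorial above has a compact form: with latent positions X, each of the D data dimensions is modelled as an independent GP draw sharing the kernel matrix K(X, X), giving a negative log-likelihood of (D/2)log|K| + (1/2)tr(K⁻¹YYᵀ) + const. A minimal sketch in our own notation (random data, fixed kernel parameters; in practice X and the kernel parameters are jointly optimised):

```python
import numpy as np

# GP-LVM objective sketch: evaluate -log p(Y | X) for latent positions X,
# treating the D data dimensions as independent GPs with a shared kernel.

rng = np.random.default_rng(1)
N, D, Q = 20, 5, 2
Y = rng.normal(size=(N, D))      # observed data
X = rng.normal(size=(N, Q))      # latent positions (normally optimised)

def kernel(X, lengthscale=1.0, noise=0.1):
    sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))

def gplvm_neg_log_lik(X, Y):
    K = kernel(X)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (Y.shape[1] * logdet
                  + np.trace(np.linalg.solve(K, Y @ Y.T))
                  + Y.size * np.log(2 * np.pi))

# Sanity check: the trace term decomposes into one Gaussian quadratic
# form per data dimension, confirming the "D independent GPs" reading.
K = kernel(X)
_, logdet = np.linalg.slogdet(K)
per_dim = sum(0.5 * (logdet + Y[:, d] @ np.linalg.solve(K, Y[:, d])
                     + N * np.log(2 * np.pi)) for d in range(D))
print(np.isclose(gplvm_neg_log_lik(X, Y), per_dim))  # True
```

Minimising this objective over X (typically from a PCA initialisation) yields the low-dimensional embedding; the spectral connections to kernel PCA and MDS mentioned in the abstract come from analysing the same objective.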
http://inverseprobability.com/publications/lawrence-pnpca05.html
http://inverseprobability.com/publications/lawrence-pnpca05.htmlLawrence-pnpca05A Hybrid <span>MaxEnt/HMM</span> Based <span>ASR</span> SystemThe aim of this work is to develop a practical framework that extends the classical Hidden Markov Model (HMM) for continuous speech recognition based on the Maximum Entropy (MaxEnt) principle. The MaxEnt models can estimate the posterior probabilities directly, as with Hybrid NN/HMM connectionist speech recognition systems. In particular, a new acoustic model based on discriminative MaxEnt models is formulated and developed to replace the generative Gaussian Mixture Models (GMM) commonly used to model acoustic variability. Initial experimental results using the TIMIT phone task are reported.Sun, 04 Sep 2005 00:00:00 +0000
http://inverseprobability.com/publications/hifny-maxent05.html
http://inverseprobability.com/publications/hifny-maxent05.htmlHifny:maxent05A Tractable Probabilistic Model for <span>A</span>ffymetrix Probe-level Analysis across Multiple Chips<span>**Motivation:**</span> Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarisation methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large data sets. An alternative computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding with a latent variable to capture variations in probe affinity. Although promising, the main limitations of this model are that it does not use information from multiple chips and that it does not account for specific binding to the mismatch (MM) probes.\
\
<span>**Results:**</span> We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark data sets and a real time-course data set, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions.\
\
<span>**Availability:**</span> Both mgMOS and the new model multi-mgMOS have been implemented in an R package that is currently available from <http://umber.sbs.man.ac.uk/resources/puma>.Thu, 14 Jul 2005 00:00:00 +0000
http://inverseprobability.com/publications/liu-tractable04.html
http://inverseprobability.com/publications/liu-tractable04.htmlLiu-tractable04Variational inference for <span>S</span>tudent-$t$ models: Robust <span>B</span>ayesian interpolation and generalised component analysisWe demonstrate how a variational approximation scheme enables effective inference of key parameters in probabilistic signal models which employ the Student-$t$ distribution. Using the two scenarios of robust interpolation and independent component analysis (ICA) as examples, we illustrate the key feature of the approach: that the form of the noise distribution in the interpolation case, and the source distributions in the ICA case, can be inferred from the data concurrent with all other model parameters.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/tipping-variational05.html
http://inverseprobability.com/publications/tipping-variational05.htmlTipping-variational05Automatic Determination of the Number of Clusters Using Spectral AlgorithmsSat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-automatic05.html
http://inverseprobability.com/publications/sanguinetti-automatic05.htmlSanguinetti:automatic05Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data<span>**Motivation:**</span> Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis.\
\
<span>**Results:**</span> We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to ’denoise’ a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.\
\
<span>**Availability:**</span> The software used in the paper is available from <http://www.bioinf.manchester.ac.uk/resources/puma>. The microarray data are deposited in the NCBI database.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-accounting05.html
http://inverseprobability.com/publications/sanguinetti-accounting05.htmlSanguinetti-accounting05Semi-supervised Learning via <span>G</span>aussian ProcessesWe present a probabilistic approach to learning a Gaussian Process classifier in the presence of unlabeled data. Our approach involves a “null category noise model” (NCNM) inspired by ordered categorical noise models. The noise model reflects an assumption that the data density is lower between the class-conditional densities. We illustrate our approach on a toy problem and present comparative results for the semi-supervised classification of handwritten digits.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-semisuper04.html
http://inverseprobability.com/publications/lawrence-semisuper04.htmlLawrence:semisuper04Lawrence Mocap05Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-mocap05.html
http://inverseprobability.com/publications/lawrence-mocap05.htmlExtensions of the Informative Vector MachineThe informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-extensions05.html
http://inverseprobability.com/publications/lawrence-extensions05.htmlLawrence:extensions05Variational Inference in <span>G</span>aussian Processes via Probabilistic Point AssimilationWe introduce a novel variational approach for approximate inference in Gaussian process (GP) models. The key advantages of our approach are the ease with which different noise models can be incorporated and improved speed of convergence. We refer to the algorithm as probabilistic point assimilation (PPA). We introduce the algorithm firstly using the ‘weight space’ view and then through its Gaussian process formulation. We illustrate the approach on several benchmark data sets.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/king-ppa05.html
http://inverseprobability.com/publications/king-ppa05.htmlKing:ppa05Reducing the Variability in <span>cDNA</span> Microarray Image Processing by <span>B</span>ayesian Inference<span>**Motivation:**</span> Gene expression levels are obtained from microarray experiments through the extraction of pixel intensities from a scanned image of the slide. It is widely acknowledged that variabilities can occur in expression levels extracted from the same images by different users with the same software packages. These inconsistencies arise due to differences in the refinement of the placement of the microarray ‘grids’. We introduce a novel automated approach to the refinement of grid placements that is based upon the use of Bayesian inference for determining the size, shape and positioning of the microarray ‘spots’, capturing uncertainty that can be passed to downstream analysis.\
\
<span>**Results:**</span> Our experiments demonstrate that variability between users can be significantly reduced using the approach. The automated nature of the approach also saves hours of researchers’ time normally spent in refining the grid placement.\
\
<span>**Availability:**</span> A MATLAB implementation of the algorithm and an image of the slide used in our experiments, as well as the code necessary to recreate them are available for non-commercial use from <http://inverseprobability.com/vis>.Thu, 22 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-variability03.html
http://inverseprobability.com/publications/lawrence-variability03.htmlLawrence-variability03Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant AnalysisIn this paper we consider a Bayesian interpretation of Fisher’s discriminant. By relating Rayleigh’s coefficient to a likelihood function and through the choice of a suitable prior we use Bayes’ rule to infer a posterior distribution over projections. Through the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher’s discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through optimisation of the marginalised likelihood of the data.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/pena-fbd-tech04.html
http://inverseprobability.com/publications/pena-fbd-tech04.htmlPena:fbd-tech04Matching Kernels through <span>K</span>ullback-<span>L</span>eibler Divergence MinimisationIn this paper we study the general constrained minimisation of Kullback-Leibler (KL) divergences between two zero mean Gaussian distributions. We reduce the problem to an equivalent minimisation involving the eigenvectors of the two kernel matrices, and provide explicit solutions in some cases. We then focus, as an example, on the important case of constraining the approximating matrix to be block diagonal. We prove a stability result on the approximating matrix, and speculate on how these results may be used to give further theoretical foundation to widely used techniques such as spectral clustering.Thu, 01 Jan 2004 00:00:00 +0000
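The KL divergence between two zero-mean Gaussians that the abstract above minimises has a short closed form. The following sketch is my own illustration of that quantity, not code from the paper; the example matrices are arbitrary:

```python
import numpy as np

def gaussian_kl(K1, K2):
    """KL(N(0, K1) || N(0, K2)) between two zero-mean Gaussians:
    0.5 * (tr(K2^{-1} K1) - log|K1| + log|K2| - N)."""
    N = K1.shape[0]
    trace_term = np.trace(np.linalg.solve(K2, K1))
    _, logdet1 = np.linalg.slogdet(K1)
    _, logdet2 = np.linalg.slogdet(K2)
    return 0.5 * (trace_term - logdet1 + logdet2 - N)

# Identical covariances give zero divergence; mismatched ones do not.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)
print(abs(gaussian_kl(A, A)) < 1e-12)  # True (zero up to rounding)
print(gaussian_kl(A, B) > 0.0)         # True
```

Constraining the approximating matrix (e.g. to block-diagonal form, as in the paper) amounts to minimising this expression over a restricted family of `K2`.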
http://inverseprobability.com/publications/lawrence-matching04.html
http://inverseprobability.com/publications/lawrence-matching04.htmlLawrence:matching04Learning to Learn with the Informative Vector MachineThis paper describes an efficient method for learning the parameters of a Gaussian process (GP). The parameters are learned from multiple tasks which are assumed to have been drawn independently from the same GP prior. An efficient algorithm is obtained by extending the informative vector machine (IVM) algorithm to handle the multi-task learning case. The multi-task IVM (MT-IVM) saves computation by greedily selecting the most informative examples from the separate tasks. The MT-IVM is also shown to be more efficient than sub-sampling on an artificial data-set and more effective than the traditional IVM in a speaker dependent phoneme recognition task.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-learning04.html
http://inverseprobability.com/publications/lawrence-learning04.htmlLawrence:learning04The Informative Vector Machine: A Practical Probabilistic Alternative to the Support Vector MachineWe present a practical probabilistic alternative to the popular support vector machine (SVM). The algorithm is an approximation to a Gaussian process, and is probabilistic in the sense that it maintains the process variance that is implied by the use of a kernel function, which the SVM discards. We show that these variances may be tracked and made use of in the selection of an active set that gives a sparse representation for the model. For an active set size of $d$ our algorithm exhibits $O(d^{2}N)$ computational complexity and $O(dN)$ storage requirements. It has already been shown that the approach is competitive with the SVM in terms of performance and running time; here we give more details of the approach and demonstrate that kernel parameters may also be learned in a practical and effective manner.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ivmtech04.html
http://inverseprobability.com/publications/lawrence-ivmtech04.htmlLawrence:ivmTech04Probabilistic Non-linear Principal Component Analysis with <span>G</span>aussian Process Latent Variable ModelsSummarising a high dimensional data-set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). We develop a practical algorithm for GPLVMs which allows for non-linear mappings from the embedded space, giving a non-linear probabilistic version of PCA. We develop the new algorithm to provide a principled approach to handling discrete valued data and missing attributes. We demonstrate the algorithm on a range of real-world and artificially generated data-sets and, finally, through analysis of the GPLVM objective function, we relate the algorithm to popular spectral techniques such as kernel PCA and multidimensional scaling.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gplvmtech04.html
http://inverseprobability.com/publications/lawrence-gplvmtech04.htmlLawrence:gplvmTech04<span>G</span>aussian Process Models for Visualisation of High Dimensional DataIn this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior’s covariance function constrains the mappings to be linear the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow non-linear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally, our non-linear algorithm can be *further* kernelised leading to ‘twin kernel PCA’ in which a *mapping* *between feature spaces* occurs.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gplvm03.html
http://inverseprobability.com/publications/lawrence-gplvm03.htmlLawrence:gplvm03Acoustic Space Dimensionality Selection and Combination using the Maximum Entropy PrincipleIn this paper we propose a discriminative approach to acoustic space dimensionality selection based on maximum entropy modelling. We form a set of constraints by composing the acoustic space with the space of phone classes, and use a continuous feature formulation of maximum entropy modelling to select an optimal feature set. The suggested approach has two steps: (1) the selection of the best acoustic space that efficiently and economically represents the acoustic data and its variability; (2) the combination of selected acoustic features in the maximum entropy framework to estimate the posterior probabilities over the phonetic labels given the acoustic input. Specific contributions of this paper include a parameter estimation algorithm (generalized improved iterative scaling) that enables the use of negative features, the parameterization of constraint functions using Gaussian mixture models, and experimental results using the TIMIT database.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/abdelhaleem-acoustic04.html
http://inverseprobability.com/publications/abdelhaleem-acoustic04.htmlAbdelHaleem:acoustic04A Probabilistic Model for the Extraction of Expression Levels from Oligonucleotide ArraysIn this work we present a probabilistic model to estimate summaries of Affymetrix GeneChip probe level data. Comparisons with two different models were made both on a publicly available dataset and on a study performed in our laboratory, showing that our model performs better in terms of fold-change consistency.Mon, 01 Dec 2003 00:00:00 +0000
http://inverseprobability.com/publications/milo-probabilistic03.html
http://inverseprobability.com/publications/milo-probabilistic03.htmlMilo-probabilistic03Variational Inference for Visual TrackingThe likelihood models used in probabilistic visual tracking applications are often complex non-linear and/or non-Gaussian functions, leading to analytically intractable inference. Solutions then require numerical approximation techniques, of which the particle filter is a popular choice. Particle filters, however, degrade in performance as the dimensionality of the state space increases and the support of the likelihood decreases. As an alternative to particle filters this paper introduces a variational approximation to the tracking recursion. The variational inference is intractable in itself, and is combined with an efficient importance sampling procedure to obtain the required estimates. The algorithm is shown to compare favourably with particle filtering techniques on a synthetic example and two real tracking problems. The first involves the tracking of a designated object in a video sequence based on its colour properties, whereas the second involves contour extraction in a single image.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/vermaak-variational03.html
http://inverseprobability.com/publications/vermaak-variational03.htmlVermaak:variational03A Variational Approach to Robust <span>B</span>ayesian InterpolationThis paper details a robust Bayesian interpolation procedure for linear-in-the-parameter models. Robustness is achieved via a Student-$t$ noise model, defined hierarchically in terms of an inverse-Gamma prior distribution over individual Gaussian observation variances. Variational techniques are exploited to update this prior in light of the data, while also inferring all other model variables. The key to this approach is flexibility; it can infer Gaussian noise where appropriate but can adapt to accommodate heavier-tailed distributions in the presence of outliers.Wed, 01 Jan 2003 00:00:00 +0000
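The hierarchical construction in the abstract above (a Student-$t$ noise model built from Gaussians with an inverse-Gamma prior over their variances) can be checked numerically: sampling a Gamma-distributed precision and then a Gaussian given that precision yields Student-$t$ draws with heavier tails than a plain Gaussian. A minimal sketch of this standard fact, not code from the paper; the degrees of freedom and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 3.0       # degrees of freedom
n = 200_000

# Precision tau ~ Gamma(nu/2, rate=nu/2) is equivalent to an
# inverse-Gamma prior over the variance 1/tau.
tau = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
x = rng.normal(0.0, 1.0 / np.sqrt(tau))   # x | tau ~ N(0, 1/tau)

# Marginally x is Student-t with nu dof: heavier tails than Gaussian.
gauss = rng.normal(0.0, 1.0, size=n)
print((np.abs(x) > 4).mean() > (np.abs(gauss) > 4).mean())  # True
```

The variational scheme in the paper exploits exactly this conditional-Gaussian structure: given the per-observation variances, all other updates stay tractable.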
http://inverseprobability.com/publications/tipping-variational03.html
http://inverseprobability.com/publications/tipping-variational03.htmlTipping:variational03Fast Forward Selection to Speed Up Sparse <span>G</span>aussian Process RegressionWe present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the “support” patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically. We demonstrate the model selection capabilities of the algorithm in a range of experiments. In line with the development of our method, we present a simple view on sparse approximations for GP models and their underlying assumptions and show relations to other methods.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/seeger-fast03.html
http://inverseprobability.com/publications/seeger-fast03.htmlSeeger:fast03<span>B</span>ayesian Processing of Microarray ImagesGene expression measurements quantify the level of mRNA produced from each gene. Two principal methods exist for producing slides for extracting these levels: photolithography and spotted arrays. One difficulty with the spotted array format is determining the size and location of the spots on the array. In this paper we present a Bayesian approach to processing images produced by these arrays that seeks posterior distributions over the size and positions of the spots. This enables us to estimate expression ratios and their variances. Exact inference for the model we specify is intractable; we develop an approximate inference technique which combines importance sampling with variational inference. Our technique has already been shown to be more consistent than both manual processing and another automated technique @Lawrence:variability03. Here we present large-scale results for twenty-four microarray slides, each representing 5760 genes, and show the dramatic effects of incorporating variance in our downstream analysis. Software based on this algorithm is available for academic use.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-microarray03.html
http://inverseprobability.com/publications/lawrence-microarray03.htmlLawrence:microarray03Fast Sparse <span>G</span>aussian Process Methods: The Informative Vector MachineWe present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretical principles, previously suggested for active learning. In contrast to most previous work on sparse GPs, our goal is not only to learn sparse predictors (which can be evaluated in $O(d)$ rather than $O(n)$, $d \ll n$, $n$ the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most $O(nd^2)$, and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet it requires only a fraction of the training time. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection and is less complex in implementation.Wed, 01 Jan 2003 00:00:00 +0000
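The entropy-based forward selection described in the abstract above can be sketched for the regression case. This is an illustrative reconstruction under my own assumptions (RBF kernel, Gaussian noise, full covariance updates costing $O(dN^2)$ rather than the paper's $O(nd^2)$ scheme), not the authors' code:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def ivm_select(K, noise_var, d):
    """Greedy IVM-style selection for GP regression: repeatedly add the
    point whose inclusion gives the largest posterior-entropy reduction,
    0.5 * log(1 + s_i / noise_var), where s_i is the current posterior
    variance at candidate i."""
    A = K.copy()                  # posterior covariance; starts at the prior
    active, inactive = [], list(range(K.shape[0]))
    for _ in range(d):
        # The gain is monotone in A[i, i], so argmax over the diagonal suffices.
        j = max(inactive, key=lambda i: A[i, i])
        a_j = A[:, j].copy()
        # Rank-1 downdate: condition on a noisy observation at point j.
        A -= np.outer(a_j, a_j) / (A[j, j] + noise_var)
        active.append(j)
        inactive.remove(j)
    return active

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
active = ivm_select(rbf_kernel(X), noise_var=0.1, d=6)
print(active)  # six distinct indices into X
```

Each selected point shrinks the posterior variances of nearby candidates, so the greedy rule naturally spreads the active set across the data.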
http://inverseprobability.com/publications/lawrence-ivm02.html
http://inverseprobability.com/publications/lawrence-ivm02.htmlLawrence:ivm02Generalised Component AnalysisPrincipal component analysis is a well known approach for determining the principal sub-space of a data-set. Independent component analysis is a widely utilised technique for recovering the linearly embedded independent components of a data-set. In this paper we develop an algorithm that, for super-Gaussian sources, extracts the direction and number of independent components of a data-set and determines the principal sub-space of the remaining components. This is achieved through the use of a latent variable model. We refer to the approach as Generalised Component Analysis and demonstrate its ability to extract both independent and principal components, as well as to determine the number of independent components, on toy and real-world data-sets.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gca01.html
http://inverseprobability.com/publications/lawrence-gca01.htmlLawrence:GCA01Variational Inference GuideThis report is a brief introduction to variational inference for Bayesian models from the perspective of the Expectation Maximisation (EM) algorithm @Dempster:EM77. We start with an overview of the EM algorithm from the perspective of variational inference and then we show how approximate inference may also be performed. We discuss briefly when variational inference may be used and finally we mention the variational importance sampler as an alternative approach.Tue, 01 Jan 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-variationalguide02.html
http://inverseprobability.com/publications/lawrence-variationalguide02.htmllawrence:variationalguide02Optimising Synchronisation Times for Mobile DevicesWith the increasing number of users of mobile computing devices (e.g. personal digital assistants) and the advent of third generation mobile phones, wireless communications are becoming increasingly important. Many applications rely on the device maintaining a *replica* of a data-structure which is stored on a server, for example news databases, calendars and e-mail. In this paper we explore the question of the optimal strategy for synchronising such replicas. We utilise probabilistic models to represent how the data-structures evolve and to model user behaviour. We then formulate objective functions which can be minimised with respect to the synchronisation timings. We demonstrate, using two real world data-sets, that a user can obtain more up-to-date information using our approach.Tue, 01 Jan 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-sync01.html
http://inverseprobability.com/publications/lawrence-sync01.htmlLawrence:sync01Sparse <span>B</span>ayesian Learning: The Informative Vector MachineTue, 01 Jan 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-sparse02.html
http://inverseprobability.com/publications/lawrence-sparse02.htmllawrence:sparse02A Comparison of State-of-the-Art Classification Techniques with Application to CytogeneticsSeveral state-of-the-art techniques: a neural network, Bayesian neural network, support vector machine and naive Bayesian classifier are experimentally evaluated in discriminating fluorescence in-situ hybridization (FISH) signals. Highly accurate classification of signals from real data and artefacts of two cytogenetic probes (colours) is required for detecting abnormalities in the data. More than 3,100 FISH signals are classified by the techniques into colour and as real or artefact with accuracies of around 98% and 88%, respectively. The results of the comparison also show a trade-off between simplicity represented by the naive Bayesian classifier and high classification performance represented by the other techniques.Sun, 01 Apr 2001 00:00:00 +0000
http://inverseprobability.com/publications/lerner-comparison01.html
http://inverseprobability.com/publications/lerner-comparison01.htmlLerner-comparison01Probabilistic Modelling of Replica DivergenceIt is common in distributed systems to replicate data. In many cases this data evolves in a consistent fashion and this evolution can be modelled. A *probabilistic model* of the evolution allows us to estimate the divergence of the replicas and can be used by the application to alter its behaviour, for example to control synchronisation times, to determine the propagation of writes, and to convey to the user information about how much the data may have evolved. In this paper, we describe how the evolution of the data may be modelled and outline how the probabilistic model may be utilised in various applications, concentrating on a news database example.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/rowstron-sync01.html
http://inverseprobability.com/publications/rowstron-sync01.htmlRowstron:sync01The Structure of Neural Network PosteriorsExact inference in Bayesian neural networks is analytically intractable, and as a result approximate approaches such as the evidence procedure, Monte-Carlo sampling and variational inference have been proposed. In this paper we explore the structure of the posterior distributions in such a model through a new approximating distribution based on *mixtures* of Gaussian distributions and show how it may be implemented.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-structure01.html
http://inverseprobability.com/publications/lawrence-structure01.htmlLawrence:structure01Node Relevance DeterminationHierarchical Bayesian inference in parameterised models offers an approach for controlling complexity. In this paper we utilise a novel prior for the learning of a model’s structure. We call the prior *node relevance determination*. It is applicable in a range of models including sigmoid belief networks and Boltzmann machines. We demonstrate how the approach may be applied to determine structure in a multi-layer perceptron.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nrd01.html
http://inverseprobability.com/publications/lawrence-nrd01.htmlLawrence:nrd01Estimating a Kernel <span>F</span>isher Discriminant in the Presence of Label NoiseData noise is present in many machine learning problem domains; some of these are well studied but others have received less attention. In this paper we propose an algorithm for constructing a kernel Fisher discriminant (KFD) from training examples with *noisy labels*. The approach allows us to associate with each example a probability of its label being flipped. We utilise an expectation maximisation (EM) algorithm for updating the probabilities. The E-step uses class conditional probabilities estimated as a by-product of the KFD algorithm. The M-step updates the flip probabilities and determines the parameters of the discriminant. We have applied the approach to two real-world data-sets. The results show the feasibility of the approach.Mon, 01 Jan 2001 00:00:00 +0000
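The E-step in the abstract above amounts to a per-example Bayes update of the flip probability. The sketch below is a hypothetical illustration under my own notation (`epsilon` is the current flip rate, `p_observed_label` the discriminant's class-conditional probability for the observed label), not the paper's exact update:

```python
def flip_posterior(p_observed_label, epsilon):
    """Posterior probability that an example's label was flipped, given
    the model's probability for the observed label and a prior flip
    rate epsilon (illustrative Bayes update, not the paper's formula)."""
    num = epsilon * (1.0 - p_observed_label)
    den = num + (1.0 - epsilon) * p_observed_label
    return num / den

# A confidently correct prediction implies a low flip probability,
# while a confidently contradicted label implies a high one.
print(flip_posterior(0.95, 0.1))  # small
print(flip_posterior(0.05, 0.1))  # large
```

In the full EM loop these per-example posteriors would re-weight the KFD fit in the M-step, and `epsilon` itself would be re-estimated from their average.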
http://inverseprobability.com/publications/lawrence-noisy01.html
http://inverseprobability.com/publications/lawrence-noisy01.htmlLawrence:noisy01Variational Learning for Multi-layer networks of Linear Threshold UnitsLinear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well known datasets.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ltu01.html
http://inverseprobability.com/publications/lawrence-ltu01.htmlLawrence:noisy01A Sparse <span>B</span>ayesian Compression Scheme — The Informative Vector MachineKernel based learning algorithms allow the mapping of a data-set into an infinite dimensional feature space in which a classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inverse, $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples; loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner we are interested in estimating the amount of information each data-point contains. In the framework presented here we make use of the Bayesian methodology to determine how much information is gained from each data-point.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-informative01.html
http://inverseprobability.com/publications/lawrence-informative01.htmlLawrence:informative01Variational Learning for Multi-layer networks of Linear Threshold UnitsLinear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well-known datasets.Sat, 01 Jan 2000 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ltu_report00.html
http://inverseprobability.com/publications/lawrence-ltu_report00.htmlLawrence:ltu_report00Variational <span>B</span>ayesian Independent Component AnalysisBlind separation of signals through the info-max algorithm may be viewed as maximum likelihood learning in a latent variable model. In this paper we present an alternative approach to maximum likelihood learning in these models, namely Bayesian inference. It has already been shown how Bayesian inference can be applied to determine latent dimensionality in principal component analysis models @Bishop:bayesPCA98. In this paper we derive a similar approach for removing unnecessary source dimensions in an independent component analysis model. We present results on a toy data-set and on some artificially mixed images.Sat, 01 Jan 2000 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ica99.html
http://inverseprobability.com/publications/lawrence-ica99.htmlLawrence:ICA99A Variational Bayesian Committee of Neural NetworksExact inference in Bayesian neural networks is analytically intractable and,
as a result, approximate approaches such as the evidence procedure, Monte-Carlo sampling
and variational inference have been proposed. In this paper we present a general
overview of the Bayesian approach with a particular emphasis on the variational
procedure. We then present a new approximating distribution based on *mixtures*
of Gaussian distributions and show how it may be implemented. We present results
on a simple toy problem and on two real-world data-sets.
Fri, 01 Jan 1999 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nnmixtures99.html
http://inverseprobability.com/publications/lawrence-nnmixtures99.htmlLawrence:nnmixtures99Mixture Representations for Inference and Learning in <span>B</span>oltzmann MachinesBoltzmann machines are undirected graphical models with two-state stochastic variables, in which the logarithms of the clique potentials are quadratic functions of the node states. They have been widely studied in the neural computing literature, although their practical applicability has been limited by the difficulty of finding an effective learning algorithm. One well-established approach, known as mean field theory, represents the stochastic distribution using a factorized approximation. However, the corresponding learning algorithm often fails to find a good solution. We conjecture that this is due to the implicit uni-modality of the mean field approximation which is therefore unable to capture multi-modality in the true distribution. In this paper we use variational methods to approximate the stochastic distribution using multi-modal *mixtures* of factorized distributions. We present results for both inference and learning to demonstrate the effectiveness of this approach.Thu, 01 Jan 1998 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-mixtures98.html
http://inverseprobability.com/publications/lawrence-mixtures98.htmlLawrence:mixtures98Markovian inference in belief networksBayesian belief networks can represent the complicated probabilistic processes that form natural sensory inputs. Once the parameters of the network have been learned, nonlinear inferences about the input can be made by computing the posterior distribution over the hidden units (e.g., depth in stereo vision) given the input. Computing the posterior distribution exactly is not practical in richly-connected networks, but it turns out that by using a variational (a.k.a. mean field) method, it is easy to find a product-form distribution that approximates the true posterior distribution. This approximation assumes that the hidden variables are independent given the current input. In this paper, we explore a more powerful variational technique that models the posterior distribution using a Markov chain. We compare this method with inference using mean fields and mixtures of mean fields in randomly generated networks.Thu, 01 Jan 1998 00:00:00 +0000
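The contrast drawn in the abstract above, a product-form (independent) approximation versus a Markov-chain approximation, can be illustrated with a tiny sketch. This is our own toy example with assumed names (`chain_prob`, `init`, `trans`), not the paper's algorithm.

```python
import numpy as np

def chain_prob(states, init, trans):
    """P(states) under a Markov chain: init[s0] * prod_t trans[s_{t-1}, s_t]."""
    p = init[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= trans[prev, cur]
    return p

init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],   # strong persistence: neighbours correlated
                  [0.1, 0.9]])

# A Markov chain makes correlated runs likely and alternating states
# unlikely, even though every marginal here is uniform -- a distinction a
# fully factorized (independence) approximation cannot express.
assert chain_prob([0, 0, 0], init, trans) > chain_prob([0, 1, 0], init, trans)
```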
http://inverseprobability.com/publications/frey-markovian98.html
http://inverseprobability.com/publications/frey-markovian98.htmlFrey:Markovian98Approximating Posterior Distributions in Belief Networks using MixturesExact inference in densely connected Bayesian networks is
computationally intractable, and so there is considerable interest
in developing effective approximation schemes. One approach which
has been adopted is to bound the log likelihood using a mean-field
approximating distribution. While this leads to a tractable
algorithm, the mean field distribution is assumed to be factorial
and hence unimodal. In this paper we demonstrate the feasibility of
using a richer class of approximating distributions based on
*mixtures* of mean field distributions. We derive an efficient
algorithm for updating the mixture parameters and apply it to the
problem of learning in sigmoid belief networks. Our results
demonstrate a systematic improvement over simple mean field theory
as the number of mixture components is increased.
Thu, 01 Jan 1998 00:00:00 +0000
http://inverseprobability.com/publications/bishop-mixtures97.html
http://inverseprobability.com/publications/bishop-mixtures97.htmlBishop:mixtures97
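The point made in the mixture abstracts above, that a single factorized (mean-field) distribution is unimodal while a *mixture* of factorized distributions can capture several modes, can be sketched numerically. The names `factorized_prob` and `mixture_prob` and the specific component parameters are our own illustration, not the papers' method.

```python
import numpy as np

def factorized_prob(state, p):
    """P(state) under independent Bernoulli variables with means p."""
    state, p = np.asarray(state), np.asarray(p)
    return np.prod(np.where(state == 1, p, 1.0 - p))

# Mixture of two factorized distributions, each peaked on a different mode.
components = [np.array([0.95, 0.95, 0.05]), np.array([0.05, 0.05, 0.95])]
weights = [0.5, 0.5]

def mixture_prob(state):
    return sum(w * factorized_prob(state, p) for w, p in zip(weights, components))

# The mixture places substantial mass on two well-separated states,
# something no single factorized distribution can do.
assert mixture_prob([1, 1, 0]) > 0.4
assert mixture_prob([0, 0, 1]) > 0.4
```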