Neil Lawrence's Publications
Publications from Neil Lawrence.
http://inverseprobability.com/publications/
Sun, 27 Nov 2022 23:57:47 +0000
Bayesian learning via neural Schrödinger–Föllmer flows
In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control. We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics. Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
Wed, 23 Nov 2022 00:00:00 +0000
http://inverseprobability.com/publications/bayesian-learning-via-schroedinger-follmer-flows.html
Challenges in Machine Learning Deployment: A Survey of Case Studies
In recent years, machine learning has transitioned from a field of academic research interest to a field capable of solving real-world business problems. However, the deployment of machine learning models in production systems can present a number of issues and concerns. This survey reviews published reports of deploying machine learning solutions in a variety of use cases, industries and applications and extracts practical considerations corresponding to stages of the machine learning deployment workflow. By mapping the challenges found to the steps of the machine learning deployment workflow, we show that practitioners face issues at each stage of the deployment process. The goal of this paper is to lay out a research agenda to explore approaches addressing these challenges.
Sat, 30 Apr 2022 00:00:00 +0000
http://inverseprobability.com/publications/challenges-in-deploying-machine-learning-a-survey-of-case-studies.html
Differentially Private Regression and Classification with Sparse Gaussian Processes
A continuing challenge for machine learning is providing methods to perform computation on data while ensuring the data remains private. In this paper we build on the provable privacy guarantees of differential privacy, which has been combined with Gaussian processes through the previously published cloaking method, an approach that tackles the problem of providing privacy for the outputs of a training set. In this paper we solve several shortcomings of this method, starting with the problem of predictions in regions with low data density. We experiment with the use of inducing points to provide a sparse approximation and show that these can provide robust differential privacy in outlier areas and at higher dimensions. We then look at classification, and modify the Laplace approximation approach to provide differentially private predictions. We then combine this with the sparse approximation and demonstrate the capability to perform classification in high dimensions. We finally explore the issue of hyperparameter selection and develop a method for their private selection. This paper and associated libraries provide a robust toolkit for combining differential privacy and Gaussian processes in a practical manner.
Fri, 08 Oct 2021 00:00:00 +0000
http://inverseprobability.com/publications/differentially-private-regression-and-classificaiton-with-sparse-gaussian-processes.html
Solving Schrödinger Bridges via Maximum Likelihood
The Schrödinger bridge problem (SBP) finds the most likely stochastic evolution between two probability distributions given a prior stochastic evolution. As well as applications in the natural sciences, problems of this kind have important applications in machine learning such as dataset alignment and hypothesis testing. Whilst the theory behind this problem is relatively mature, scalable numerical recipes to estimate the Schrödinger bridge remain an active area of research. Our main contribution is the proof of equivalence between solving the SBP and an autoregressive maximum likelihood estimation objective. This formulation circumvents many of the challenges of density estimation and enables direct application of successful machine learning techniques. We propose a numerical procedure to estimate SBPs using Gaussian processes and demonstrate the practical usage of our approach in numerical simulations and experiments.
Tue, 31 Aug 2021 00:00:00 +0000
http://inverseprobability.com/publications/solving-schroedinger-bridges-via-maximum-likelihood.html
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize a given data set.
Inter-battery factor analysis extends this notion to multiple views of the data. In this paper
we show how a nonlinear, nonparametric version of these models can be recovered through the
Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning
where the latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks. Learning is
performed in a Bayesian manner through the formulation of a variational compression scheme which
gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong
regularization during training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to our knowledge)
published results of learning from dozens of views, even when data is scarce. We further show
experimental results on several different types of multi-view data sets and for different kinds
of tasks, including exploratory data analysis, generation, ambiguity modelling through latent
priors and classification.
Thu, 03 Jun 2021 00:00:00 +0000
http://inverseprobability.com/publications/multi-view-learning-as-a-nonparametric-nonlinear-inter-battery-factor-analysis.html
Decision-making with Uncertainty
In emergency situations like the coronavirus pandemic, decisions
must be made quickly, with only partial information. But good
decisions are still possible using risk–benefit analysis.
Wed, 02 Dec 2020 00:00:00 +0000
http://inverseprobability.com/publications/decision-making-with-uncertainty.html
Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
We propose a meta-learning approach that learns from multiple tasks
in a transductive setting, by leveraging the unlabeled query set in
addition to the support set to generate a more powerful model for
each task. To develop our framework, we revisit the empirical Bayes
formulation for multi-task learning. The evidence lower bound of
the marginal log-likelihood of empirical Bayes decomposes as a sum
of local KL divergences between the variational posterior and the
true posterior on the query set of each task. We derive a novel
amortized variational inference that couples all the variational
posteriors via a meta-model, which consists of a synthetic gradient
network and an initialization network. Each variational posterior is
derived from synthetic gradient descent to approximate the true
posterior on the query set, although we do not have access to
the true gradient. Our results on the Mini-ImageNet and CIFAR-FS
benchmarks for episodic few-shot classification outperform previous
state-of-the-art methods. In addition, we conduct two zero-shot learning
experiments to further explore the potential of the synthetic
gradient.
Sun, 26 Apr 2020 00:00:00 +0000
http://inverseprobability.com/publications/empirical-bayes-transductive-meta-learning-with-synthetic-gradients.html
Hu-empirical20
Bottom-up Data Trusts: Disturbing the 'One Size Fits All' Approach to Data Governance
From the friends we make to the foods we like, via our shopping and
sleeping habits, most aspects of our quotidian lives can now be
turned into machine-readable data points. For those able to turn
these data points into models predicting what we will do next, this
data can be a source of wealth. For those keen to replace biased,
fickle human decisions, this data—sometimes misleadingly—offers the
promise of automated, increased accuracy. For those intent on
modifying our behaviour, this data can help build a puppeteer's
strings. As we move from one way of framing data governance
challenges to another, salient answers change accordingly. Just like
the wealth redistribution way of framing those challenges tends to
be met with a property-based, 'it's *our* data' answer, when one
frames the problem in terms of manipulation potential,
dignity-based, human rights answers rightly prevail (via fairness
and transparency-based answers to contestability concerns). Positive
data-sharing aspirations tend to be raised within altogether
different conversations from those aimed at addressing the above
concerns. Our data Trusts proposal challenges these boundaries.
Tue, 01 Oct 2019 00:00:00 +0000
http://inverseprobability.com/publications/bottom-up-data-trusts-disturbing-the-one-size-fits-all-approach-to-data-governance.html
Delacroix-trusts19
Variational Information Distillation for Knowledge Transfer
Transferring knowledge from a teacher neural network pretrained on
the same or a similar task to a student neural network can
significantly improve the performance of the student neural
network. Existing knowledge transfer approaches match the
activations or the corresponding hand-crafted features of the
teacher and the student networks. We propose an
information-theoretic framework for knowledge transfer which
formulates knowledge transfer as maximizing the mutual information
between the teacher and the student networks. We compare our method
with existing knowledge transfer methods on both knowledge
distillation and transfer learning tasks and show that our method
consistently outperforms existing methods. We further demonstrate
the strength of our method on knowledge transfer across
heterogeneous network architectures by transferring knowledge from a
convolutional neural network (CNN) to a multi-layer perceptron (MLP)
on CIFAR-10. The resulting MLP significantly outperforms
state-of-the-art methods and achieves similar performance to
the CNN with a single convolutional layer.
Sat, 15 Jun 2019 00:00:00 +0000
http://inverseprobability.com/publications/variational-information-distillation-for-knowledge-transfer.html
Transferring Knowledge across Learning Processes
In complex transfer learning scenarios new tasks might not be
tightly linked to previous tasks. Approaches that transfer
information contained only in the final parameters of a source model
will therefore struggle. Instead, transfer learning at a higher
level of abstraction is needed. We propose Leap, a framework that
achieves this by transferring knowledge across learning
processes. We associate each task with a manifold on which the
training process travels from initialization to final parameters and
construct a meta-learning objective that minimizes the expected
length of this path. Our framework leverages only information
obtained during training and can be computed on the fly at
negligible cost. We demonstrate that our framework outperforms
competing methods, both in meta-learning and transfer learning, on a
set of computer vision tasks. Finally, we demonstrate that Leap can
transfer knowledge across learning processes in demanding
reinforcement learning environments (Atari) that involve millions of
gradient steps.
Fri, 26 Apr 2019 00:00:00 +0000
http://inverseprobability.com/publications/transferring-knowledge-across-learning-processes.html
Flennerhag-transferring19
Intrinsic Gaussian Processes on Complex Constrained Domains
We propose a class of intrinsic Gaussian processes (GPs) for
interpolation, regression and classification on manifolds with a
primary focus on complex constrained domains or irregularly shaped
spaces arising as subsets or submanifolds of $\Re$, $\Re^2$, $\Re^3$
and beyond. For example, intrinsic GPs can accommodate spatial
domains arising as complex subsets of Euclidean space. Intrinsic GPs
respect the potentially complex boundary or interior conditions as
well as the intrinsic geometry of the spaces. The key novelty of the
approach proposed is to utilize the relationship between heat
kernels and the transition density of Brownian motion on manifolds
for constructing and approximating valid and computationally
feasible covariance kernels. This enables intrinsic GPs to be
practically applied in great generality, whereas existing approaches
for smoothing on constrained domains are limited to simple special
cases. The broad utilities of the intrinsic GP approach are
illustrated through simulation studies and data examples.
Fri, 19 Apr 2019 00:00:00 +0000
http://inverseprobability.com/publications/intrinsic-gaussian-processes-on-complex-constrained-domains.html
Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems
This paper is concerned with learning and stochastic control in
physical systems that contain unknown input signals. These unknown
signals are modeled as Gaussian processes (GP) with certain
parameterized covariance structures. The resulting latent force
models can be seen as hybrid models that contain a first-principle
physical model part and a nonparametric GP model part. We briefly
review the statistical inference and learning methods for this kind
of model, introduce stochastic control methodology for these
models, and provide new theoretical observability and
controllability results for them.
Mon, 08 Oct 2018 00:00:00 +0000
http://inverseprobability.com/publications/gaussian-process-latent-force-models-for-learning-and-stochastic-control-of-physical-systems.html
Structured Variationally Auto-encoded Optimization
We tackle the problem of optimizing a black-box objective function
defined over a highly-structured input space. This problem is
ubiquitous in science and engineering. In machine learning,
inferring the structure of a neural network or the Automatic
Statistician (AS), where the optimal kernel combination for a
Gaussian process is selected, are two important examples. We use the
AS as a case study to describe our approach, which can be easily
generalized to other domains. We propose a Structure Generating
Variational Auto-encoder (SG-VAE) to embed the original space of
kernel combinations into some low-dimensional continuous manifold
where Bayesian optimization (BO) ideas are used. This is possible
when structural knowledge of the problem is available, which can be
given via a simulator or any other form of generating potentially
good solutions. The right exploration-exploitation balance is
imposed by propagating into the search the uncertainty of the latent
space of the SG-VAE, that is computed using variational
inference. The key aspect of our approach is that the SG-VAE can be
used to bias the search towards relevant regions, making it suitable
for transfer learning tasks. Several experiments in various
application domains are used to illustrate the utility and
generality of the approach described in this work.
Tue, 03 Jul 2018 00:00:00 +0000
http://inverseprobability.com/publications/lu18c.html
Differentially Private Regression with Gaussian Processes
A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Here we combine the provable privacy guarantees of the differential privacy framework with the flexibility of Gaussian processes (GPs). We propose a method using GPs to provide differentially private (DP) regression. We then improve this method by crafting the DP noise covariance structure to efficiently protect the training data, while minimising the scale of the added noise. We find that this cloaking method achieves the greatest accuracy, while still providing privacy guarantees, and offers practical DP for regression over multi-dimensional inputs. Together these methods provide a starter toolkit for combining differential privacy and GPs.
Sat, 31 Mar 2018 00:00:00 +0000
http://inverseprobability.com/publications/differentially-private-regression-with-gaussian-processes.html
The Emergence of Organizing Structure in Conceptual Representation
Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form, where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach can learn intuitive organizations, including a tree for animals and a ring for the color circle, it assumes a strong inductive bias that considers only these particular forms, and each form is explicitly provided as initial knowledge. Here we introduce a new computational model of how organizing structure can be discovered, utilizing a broad hypothesis space with a preference for sparse connectivity. Given that the inductive bias is more general, the model's initial knowledge shows little qualitative resemblance to some of the discoveries it supports. As a consequence, the model can also learn complex structures for domains that lack intuitive description, as well as predict human property induction judgments without explicit structural forms. By allowing form to emerge from sparsity, our approach clarifies how both the richness and flexibility of human conceptual organization can coexist.
Tue, 09 Jan 2018 00:00:00 +0000
http://inverseprobability.com/publications/the-emergence-of-organizing-structure-in-conceptual-representation.html
Lake:emergence18
Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes
Often in machine learning, data are collected as a combination of
multiple conditions, e.g., the voice recordings of multiple persons,
each labeled with an ID. How could we build a model that captures
the latent information related to these conditions and generalizes to
a new one with little data? We present a new model called Latent
Variable Multiple Output Gaussian Processes (LVMOGP) that allows
us to jointly model multiple conditions for regression and generalize
to a new condition with a few data points at test time. LVMOGP
infers the posteriors of Gaussian processes together with a latent
space representing the information about different conditions. We
derive an efficient variational inference method for LVMOGP whose
computational complexity is as low as that of sparse Gaussian
processes. We show that LVMOGP significantly outperforms related
Gaussian process methods on various tasks with both synthetic and
real data.
Tue, 05 Dec 2017 00:00:00 +0000
http://inverseprobability.com/publications/efficient-modelling-of-latent-information-in-supervised-learning-using-gaussian-processes.html
Dai:supervised17
Efficient Inference for Sparse Latent Variable Models of Transcriptional Regulation
Motivation
Regulation of gene expression in prokaryotes involves complex
co-regulatory mechanisms involving large numbers of transcriptional
regulatory proteins and their target genes. Uncovering these
genome-scale interactions constitutes a major bottleneck in systems
biology. Sparse latent factor models, assuming activity of
transcription factors (TFs) as unobserved, provide a biologically
interpretable modelling framework, integrating gene expression and
genome-wide binding data, but at the same time pose a hard
computational inference problem. Existing probabilistic inference
methods for such models rely on subjective filtering and suffer from
scalability issues, thus are not well-suited for realistic
genome-scale applications.
Results
We present a fast Bayesian sparse factor model, which takes input
gene expression and binding sites data, either from ChIP-seq
experiments or motif predictions, and outputs active TF-gene links
as well as latent TF activities. Our method employs an efficient
variational Bayes scheme for model inference, enabling its
application to large datasets, which was not feasible with existing
MCMC-based inference methods for such models. We validate our method
on synthetic data against a similar model in the literature,
employing MCMC for inference, and obtain comparable results with a
small fraction of the computational time. We also apply our method
to large-scale data from Mycobacterium tuberculosis involving
ChIP-seq data on 113 TFs and matched gene expression data for 3863
putative target genes. We evaluate our predictions using an
independent transcriptomics experiment involving over-expression of
TFs.
Availability and implementation
An easy-to-use Jupyter notebook demo of our method with data is
available at https://github.com/zhenwendai/SITAR.
Supplementary information
Supplementary data are available at Bioinformatics online.
Sat, 26 Aug 2017 00:00:00 +0000
http://inverseprobability.com/publications/efficient-inference-for-sparse-latent-variable-models-of-transcriptional-regulation.html
Dai:sparse18
Preferential Bayesian Optimization
Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimize black-box functions where direct queries of the objective are expensive. We consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO), which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments in which we show how the way correlations are modeled is the key ingredient to drastically reduce the number of comparisons to find the optimum of the latent function of interest.
Mon, 17 Jul 2017 00:00:00 +0000
http://inverseprobability.com/publications/preferential-bayesian-optimization.html
Living Together: Mind and Machine Intelligence
In this commentary we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to *embodiment factors*. We define embodiment factors as the ratio between an entity's ability to communicate information vs compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequence of such a system we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.
Wed, 24 May 2017 00:00:00 +0000
http://inverseprobability.com/publications/living-together-mind-and-machine-intelligence.html
Data Readiness Levels
Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches.
In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.
Fri, 05 May 2017 00:00:00 +0000
http://inverseprobability.com/publications/data-readiness-levels.html
Lawrence-readiness16
Manifold Alignment Determination: finding correspondences across different data views
We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data points. The proposed method requires only a few aligned examples, from which it is capable of recovering a global alignment through a probabilistic model. The strong, yet flexible regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefit of the proposed approach.
Thu, 12 Jan 2017 00:00:00 +0000
http://inverseprobability.com/publications/manifold-alignment-determination.html
Damianou:mad16
Topslam: Waddington Landscape Recovery for Single Cell Experiments
We present an approach to estimating the nature of the Waddington (or epigenetic) landscape that underlies a population of individual cells. Through exploiting high resolution single cell transcription experiments we show that cells can be located on a landscape that reflects their differentiated nature. Our approach makes use of probabilistic non-linear dimensionality reduction that respects the topology of our estimated epigenetic landscape. In simulation studies and analyses of real data we show that the approach, known as Topslam, outperforms previous attempts to understand the differentiation landscape. The novelty of our approach lies in the correction of distances *before* extracting ordering information. This gives an advantage over other attempts, which have to correct for extracted time lines by post processing or additional data.
Mon, 20 Jun 2016 00:00:00 +0000
http://inverseprobability.com/publications/zwiessele-topslam16.html
Zwiessele-topslam16
Differentially Private Gaussian Processes
A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Differential privacy is a framework which allows algorithms to have provable privacy guarantees. Gaussian processes are a widely used approach for dealing with uncertainty in functions. This paper explores differentially private mechanisms for Gaussian processes. We compare binning and adding noise before regression with adding noise post-regression. For the former we develop a new kernel for use with binned data. For the latter we show that using inducing inputs allows us to reduce the scale of the added perturbation. We find that, for the datasets used, adding noise to a binned dataset has superior accuracy. Together these methods provide a starter toolkit for combining differential privacy and Gaussian processes.
Thu, 02 Jun 2016 00:00:00 +0000
http://inverseprobability.com/publications/smith-dpgp16.html
Smith:dpgp16
Chained Gaussian Processes
Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (like in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive: link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models “Chained Gaussian Processes”: the transformation of the GPs to the likelihood parameters will not generally be invertible, and that implies that linearisation would only be possible with multiple (localized) links, i.e. a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions.
Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/saul-chained16.html
Saul-chained16
Recurrent Gaussian Processes
We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In this context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce an RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising results obtained indicate that our RGP model maintains its high flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available.
Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/mattos-recurrent16.html
http://inverseprobability.com/publications/mattos-recurrent16.htmlMattos:recurrent16GLASSES: Relieving The Myopia Of Bayesian OptimisationWe present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/gonzalez-glasses16.html
http://inverseprobability.com/publications/gonzalez-glasses16.htmlGonzalez:glasses16Variationally Auto-Encoded Deep Gaussian ProcessesWe develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.Mon, 02 May 2016 00:00:00 +0000
http://inverseprobability.com/publications/dai-variationally16.html
http://inverseprobability.com/publications/dai-variationally16.htmlDai-variationally16Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor AnalysisFactor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.Sun, 17 Apr 2016 00:00:00 +0000
http://inverseprobability.com/publications/damianou-ibfa16.html
http://inverseprobability.com/publications/damianou-ibfa16.htmlDamianou:ibfa16Detecting Periodicities with Gaussian processesWe consider the problem of detecting and quantifying the periodic component of a function given noise-corrupted observations of a limited number of input/output tuples. Our approach is based on Gaussian process regression which provides a flexible non-parametric framework for modelling periodic data. We introduce a novel decomposition of the covariance function as the sum of periodic and aperiodic kernels. This decomposition allows for the creation of sub-models which capture the periodic nature of the signal and its complement. To quantify the periodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the fitted sub-models. Although the method can be applied to many kernels, we give a special emphasis to the Matérn family, from the expression of the reproducing kernel Hilbert space inner product to the implementation of the associated periodic kernels in a Gaussian process toolkit. The proposed method is illustrated by considering the detection of periodically expressed genes in the Arabidopsis genome.Wed, 13 Apr 2016 00:00:00 +0000
http://inverseprobability.com/publications/durrande-periodicities16.html
http://inverseprobability.com/publications/durrande-periodicities16.htmlDurrande-periodicities16Batch Bayesian Optimization via Local PenalizationThe popularity of Bayesian optimization methods for efficient exploration
of parameter spaces has led to a series of papers applying Gaussian processes as
surrogates in the optimization of functions. However, most proposed approaches only
allow the exploration of the parameter space to occur sequentially. Often, it is
desirable to simultaneously propose batches of parameter values to explore. This
is particularly the case when large parallel processing facilities are available.
These could either be computational or physical facets of the process being optimized.
Batch methods, however, require the modeling of the interaction between the different
evaluations in the batch, which can be expensive in complex scenarios. We investigate
this issue and propose a highly effective heuristic based on an estimate of the
function's Lipschitz constant that captures the most important aspect of this
interaction---local repulsion---at negligible computational overhead. A penalized
acquisition function is used to collect batches of points minimizing the non-parallelizable
computational effort. The resulting algorithm compares very well, in run-time, with
much more elaborate alternatives.
Fri, 01 Jan 2016 00:00:00 +0000
http://inverseprobability.com/publications/gonzalez-batch16.html
http://inverseprobability.com/publications/gonzalez-batch16.htmlGonzalez:batch16Variational Inference for Latent Variables and Uncertain Inputs in Gaussian ProcessesThe Gaussian process latent variable model (GP-LVM) provides a flexible
approach for non-linear dimensionality reduction that has been widely applied. However,
the current approach for training GP-LVMs is based on maximum likelihood, where
the latent projection variables are maximised over rather than integrated out. In
this paper we present a Bayesian method for training GP-LVMs by introducing a non-standard
variational inference framework that allows us to approximately integrate out the latent
variables and subsequently train a GP-LVM by maximising an analytic lower bound
on the exact marginal likelihood. We apply this method for learning a GP-LVM from
i.i.d. observations and for learning non-linear dynamical systems where the observations
are temporally correlated. We show that a benefit of the variational Bayesian procedure
is its robustness to over-fitting and its ability to automatically select the dimensionality
of the non-linear latent space. The resulting framework is generic, flexible and
easy to extend for other purposes, such as Gaussian process regression with uncertain
or partially missing inputs. We demonstrate our method on synthetic data and standard
machine learning benchmarks, as well as challenging real world datasets, including
high resolution video data.
Fri, 01 Jan 2016 00:00:00 +0000
http://inverseprobability.com/publications/damianou-variational15.html
http://inverseprobability.com/publications/damianou-variational15.htmlDamianou-variational15Genome-wide Modeling of Transcription Kinetics Reveals Patterns of RNA Production DelaysGenes with similar transcriptional activation kinetics can display
very different temporal mRNA profiles because of differences in
transcription time, degradation rate, and RNA-processing
kinetics. Recent studies have shown that a splicing-associated RNA
production delay can be significant. To investigate this issue more
generally, it is useful to develop methods applicable to genome-wide
datasets. We introduce a joint model of transcriptional activation
and mRNA accumulation that can be used for inference of
transcription rate, RNA production delay, and degradation rate given
data from high-throughput sequencing time course experiments. We
combine a mechanistic differential equation model with a
nonparametric statistical modeling approach allowing us to capture a
broad range of activation kinetics, and we use Bayesian parameter
estimation to quantify the uncertainty in estimates of the kinetic
parameters. We apply the model to data from estrogen receptor $\alpha$
activation in the MCF-7 breast cancer cell line. We use RNA
polymerase II ChIP-Seq time course data to characterize
transcriptional activation and mRNA-Seq time course data to quantify
mature transcripts. We find that 11% of genes with a good signal in
the data display a delay of more than 20 min between completing
transcription and mature mRNA production. The genes displaying these
long delays are significantly more likely to be short. We also find
a statistical association between high delay and late intron
retention in pre-mRNA data, indicating significant
splicing-associated production delays in many genes.
Mon, 05 Oct 2015 00:00:00 +0000
http://inverseprobability.com/publications/honkela-genome15.html
http://inverseprobability.com/publications/honkela-genome15.htmlHonkela-genome15A Reverse-Engineering Approach to Dissect Post-translational Modulators of transcription Factor's Activity from Transcriptional DataBackground Transcription factors (TFs) act downstream of the major signalling pathways functioning as master regulators of cell fate. Their activity is tightly regulated at the transcriptional, post-transcriptional and post-translational level. Proteins modifying TF activity are not easily identified by experimental high-throughput methods. Results We developed a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI is built on the hypothesis that the modulator of a TF (i.e. kinase/phosphatases), when expressed in the cell, will cause the TF target genes to be co-expressed. On the contrary, when the modulator is not expressed, the TF will be inactive resulting in a loss of co-regulation across its target genes. DMI detects the occurrence of changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information. We validated the DMI approach on a compendium of 5,372 GEPs showing its predictive ability in correctly identifying kinases regulating the activity of 14 different transcription factors. Conclusions DMI can be used in combination with experimental approaches such as high-throughput screening to efficiently improve both pathway and target discovery. An on-line web-tool enabling the user to use DMI to identify post-transcriptional modulators of a transcription factor of interest can be found at http://dmi.tigem.it.Thu, 03 Sep 2015 00:00:00 +0000
http://inverseprobability.com/publications/gambardella-reverse15.html
http://inverseprobability.com/publications/gambardella-reverse15.htmlGambardella-reverse15Semi-described and Semi-supervised Learning with Gaussian ProcessesPropagating input uncertainty through non-linear Gaussian process (GP) mappings is intractable. This hinders the task of training GPs using uncertain and partially observed inputs. In this paper we refer to this task as 'semi-described learning'. We then introduce a GP framework that solves both the semi-described and the semi-supervised learning problems (where missing values occur in the outputs). Auto-regressive state space simulation is also recognised as a special case of semi-described learning. To achieve our goal we develop variational methods for handling semi-described inputs in GPs, and couple them with algorithms that allow for imputing the missing values while treating the uncertainty in a principled, Bayesian manner. Extensive experiments on simulated and real-world data study the problems of iterative forecasting and regression/classification with missing values. The results suggest that the principled propagation of uncertainty stemming from our framework can significantly improve performance in these tasks.Sun, 12 Jul 2015 00:00:00 +0000
http://inverseprobability.com/publications/damianou-semi15.html
http://inverseprobability.com/publications/damianou-semi15.htmlDamianou-semi15Malaria surveillance with multiple data sources using Gaussian process modelsA statistical framework for monitoring the health of a population should ideally be able to combine data from a wide variety of sources, such as remote sensing, telecoms, and official health records, in a principled manner. Gaussian process regression is commonly used to visualise disease incidence by interpolating values across a map; in this article, we show how it can be extended to deal with many different types of information by introducing a flexible covariance structure across data sources. Combining many data sources in a single model provides a number of practical advantages, such as the ability to automatically determine the importance of each data source through likelihood optimisation, and to deal with missing values. We show the basic idea with an application of malaria density modeling across Uganda using administrative records and remote sensing vegetation index data, and then go on to describe further extensions such as the incorporation of human mobility data extracted from mobile phone call detail records (CDRs).Tue, 09 Dec 2014 00:00:00 +0000
http://inverseprobability.com/publications/mubangizi-malaria14.html
http://inverseprobability.com/publications/mubangizi-malaria14.htmlMubangizi-malaria14Gaussian Process Models with Parallelization and GPU accelerationIn this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speed up. Combining both techniques allows applying Gaussian process models to millions of datapoints. The efficiency of our algorithm is demonstrated with a synthetic dataset. Its source code has been integrated into our popular software library GPy.
Sat, 18 Oct 2014 00:00:00 +0000
http://inverseprobability.com/publications/dai-gpu14.html
http://inverseprobability.com/publications/dai-gpu14.htmlDai-gpu14Consistent Mapping of Government Malaria Records Across a Changing Territory DelimitationBackground
Health Management Information Systems (HMIS) are a crucial tool for supporting planning and decision-making. The benefits of such systems will depend on the quality of the data they provide and on the response capacity of the decision-makers [1]. The analysis of malaria incidence records of the HMIS in Uganda faces two main complications. First, artificial trends induced by a non-negligible and variable rate of non-reporting hospitals. Second, lack of comparability across time, due to changes in the district boundaries.
Materials and methods
We propose a method for estimating the incidence of malaria for the different district definitions across time. Although we have information for the whole country across many years, this task requires making estimates for periods where data is not available for a specific district delimitation. We provide disease maps based on HMIS information by exploiting the relationship with its environmental drivers. Our approach relies on the Gaussian process framework. In particular, we use multiple output kernel techniques [2] to achieve consistency between the totals and subtotals of incidence records at different levels of territory aggregations. In the case of map generation, this approach allows us to combine information from different sources at different spatial resolution. We use the HMIS malaria records from 2003 to 2013. The records consist of weekly information aggregated within districts. The information available also includes the number of hospitals reporting each week. We use the normalized difference vegetation index and land surface temperature measurements, both commonly used for identifying suitable habitats for mosquito breeding [3].
Results
For recently created districts, our method allows comparability between the current malaria incidence and periods before they started reporting to the HMIS. The probabilistic model defined allows HMIS users to generate samples from an incidence distribution to develop further analysis. We also generate disease maps by combining administrative records with remote sensing data.
Mon, 22 Sep 2014 00:00:00 +0000
http://inverseprobability.com/publications/consistent-mapping-of-government-malaria-records-across-a-changing-territory-delimination.html
http://inverseprobability.com/publications/consistent-mapping-of-government-malaria-records-across-a-changing-territory-delimination.htmlAndrade-consistent14Warped Linear Mixed Models for the Genetic Analysis of Transformed PhenotypesLinear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss in power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.Fri, 19 Sep 2014 00:00:00 +0000
http://inverseprobability.com/publications/fusi-warped14.html
http://inverseprobability.com/publications/fusi-warped14.htmlFusi-warped14Variational Inference for Uncertainty on the Inputs of Gaussian Process ModelsThe Gaussian process latent variable model (GP-LVM) provides a flexible approach for non-linear
dimensionality reduction that has been widely applied. However, the current approach for training
GP-LVMs is based on maximum likelihood, where the latent projection variables are maximized over
rather than integrated out. In this paper we present a Bayesian method for training GP-LVMs by
introducing a non-standard variational inference framework that allows us to approximately integrate
out the latent variables and subsequently train a GP-LVM by maximizing an analytic lower bound on the
exact marginal likelihood. We apply this method for learning a GP-LVM from iid observations and for
learning non-linear dynamical systems where the observations are temporally correlated. We show that
a benefit of the variational Bayesian procedure is its robustness to overfitting and its ability to
automatically select the dimensionality of the nonlinear latent space. The resulting framework is
generic, flexible and easy to extend for other purposes, such as Gaussian process regression with
uncertain inputs and semi-supervised Gaussian processes. We demonstrate our method on synthetic data
and standard machine learning benchmarks, as well as challenging real world datasets, including high
resolution video data.
Sun, 14 Sep 2014 00:00:00 +0000
http://inverseprobability.com/publications/variational-inference-for-uncertainty-on-the-inputs-of-gaussian-process-models.html
http://inverseprobability.com/publications/variational-inference-for-uncertainty-on-the-inputs-of-gaussian-process-models.htmlDamianou-variational14Metrics for Probabilistic GeometriesWe investigate the geometrical structure of probabilistic generative dimensionality reduction models using the tools of Riemannian geometry. We explicitly define a distribution over the natural metric given by the models. We provide the necessary algorithms to compute expected metric tensors where the distribution over mappings is given by a Gaussian process. We treat the corresponding latent variable model as a Riemannian manifold and we use the expectation of the metric under the Gaussian process prior to define interpolating paths and measure distance between latent points. We show how distances that respect the expected metric lead to more appropriate generation of new data.Wed, 23 Jul 2014 00:00:00 +0000
http://inverseprobability.com/publications/tosi-metrics14.html
http://inverseprobability.com/publications/tosi-metrics14.htmlTosi:metrics14Inference of RNA Polymerase II Transcription Dynamics from Chromatin Immunoprecipitation Time Course DataGene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression.
The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene
expression. In this work, we present a probabilistic model of transcription dynamics which is fitted
to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate
transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model
parameters are estimated using either maximum likelihood estimation or via Bayesian inference
using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for
parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected
range of transcription speeds, based on previous experiments. The model describes the movement of
pol-II down the gene body and can be used to identify the time of induction for transcriptionally
engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine
which genes respond quickly to stimuli and group genes that share activity profiles and may therefore
be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure
pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2).
The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes
with the advantage that our approach can be applied genome-wide. We validate the biological
significance of the pol-II promoter activity clusters by investigating cluster-specific transcription
factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced
genes are enriched for both estrogen receptor alpha (ER) and FOXA1 binding in their proximal promoter
regions.
Wed, 14 May 2014 00:00:00 +0000
http://inverseprobability.com/publications/inference-of-rna-polymerase-ii-transcription-dynamics-from-chromatin-immunoprecipitation-time-course-data.html
http://inverseprobability.com/publications/inference-of-rna-polymerase-ii-transcription-dynamics-from-chromatin-immunoprecipitation-time-course-data.htmlMaina-inference14Fast Nonparametric Clustering of Structured Time-SeriesIn this publication, we combine two Bayesian nonparametric models: the Gaussian Process (GP) and the Dirichlet Process (DP). Our innovation in the GP model is to introduce a variation on the GP prior which enables us to model structured time-series data, i.e. data containing groups where we wish to model inter- and intra-group variability. Our innovation in the DP model is an implementation of a new fast collapsed variational inference procedure which enables us to optimize our variational approximation significantly faster than standard VB approaches. In a biological time series application we show how our model better captures salient features of the data, leading to better consistency with existing biological classifications, while the associated inference algorithm provides a significant speed-up over EM-based variational inference.Fri, 18 Apr 2014 00:00:00 +0000
http://inverseprobability.com/publications/hensman-fast14.html
http://inverseprobability.com/publications/hensman-fast14.htmlHensman-fast14Tilted Variational BayesWe present a novel method for approximate inference. Using some of the
constructs from expectation propagation (EP), we derive a lower bound of the marginal
likelihood in a similar fashion to variational Bayes (VB). The method combines some
of the benefits of VB and EP: it can be used with light-tailed likelihoods (where
traditional VB fails), and it provides a lower bound on the marginal likelihood.
We apply the method to Gaussian process classification, a situation where the Kullback-Leibler
divergence minimized in traditional VB can be infinite, and to robust Gaussian process
regression, where the inference process is dramatically simplified in comparison
to EP.
Code to reproduce all the experiments can be found at <http://github.com/SheffieldML/TVB>.
Wed, 02 Apr 2014 00:00:00 +0000
http://inverseprobability.com/publications/tilted-variational-bayes.html
http://inverseprobability.com/publications/tilted-variational-bayes.htmlHensman-tvb14Hybrid Discriminative-Generative Approaches with Gaussian ProcessesMachine learning practitioners are often faced with a choice between a discriminative and a generative approach to modelling. Here, we present a model based on a hybrid approach that breaks down some of the barriers between the discriminative and generative points of view, allowing continuous dimensionality reduction of hybrid discrete-continuous data, discriminative classification with missing inputs and manifold learning informed by class labels.Wed, 02 Apr 2014 00:00:00 +0000
http://inverseprobability.com/publications/andrade-hybrid14.html
http://inverseprobability.com/publications/andrade-hybrid14.htmlAndrade-hybrid14Nested Variational Compression in Deep Gaussian ProcessesDeep Gaussian processes provide a flexible approach to probabilistic modelling of data using either supervised or unsupervised learning. For tractable
inference, approximations to the marginal likelihood of the model must be made. The original approach to approximate inference in these models used
variational compression to allow for approximate variational marginalization of the hidden variables leading to a lower bound on the marginal likelihood
of the model [Damianou and Lawrence, 2013]. In this paper we extend this idea with a nested variational compression. The resulting lower bound on the
likelihood can be easily parallelized or adapted for stochastic variational inference.
Wed, 01 Jan 2014 00:00:00 +0000
http://inverseprobability.com/publications/hensman-nested14.html
http://inverseprobability.com/publications/hensman-nested14.htmlHensman-nested14Hierarchical Bayesian Modelling of Gene Expression Time Series Across Irregularly Sampled Replicates and Clusters**Background**
Time course data from microarrays and high-throughput
sequencing experiments require simple, computationally efficient and powerful statistical
models to extract meaningful biological signal, and for tasks such as data fusion
and clustering. Existing methodologies fail to capture either the temporal or replicated
nature of the experiments, and often impose constraints on the data collection process,
such as regularly spaced samples, or similar sampling schema across replications.
**Results**
We propose hierarchical Gaussian processes as a general model of gene expression time-series,
with application to a variety of problems. In particular, we illustrate the method's
capacity for missing data imputation, data fusion and clustering. The method can
impute data which is missing both systematically and at random: in a hold-out test
on real data, performance is significantly better than commonly used imputation
methods. The method's ability to model inter- and intra-cluster variance leads
to more biologically meaningful clusters. The approach removes the necessity for
evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset
with irregular replications.
**Conclusion**
The hierarchical Gaussian
process model provides an excellent statistical basis for several gene-expression
time-series tasks. It has only a few additional parameters over a regular GP, has
negligible additional complexity, is easily implemented and can be integrated into
several existing algorithms. Our experiments were implemented in Python, and are
available from the authors' website: <http://staffwww.dcs.shef.ac.uk/people/J.Hensman/>.
Tue, 20 Aug 2013 00:00:00 +0000
http://inverseprobability.com/publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.html
http://inverseprobability.com/publications/hierarchical-bayesian-modelling-of-gene-expression-time-series-across-irregularly-sampled-replicates-and-clusters.htmlHensman-hierarchical13Gaussian Processes for Big DataWe introduce stochastic variational inference for Gaussian process models. This enables the application of Gaussian process (GP) models to data sets containing millions of data points. We show how GPs can be variationally decomposed to depend on a set of globally relevant inducing variables which factorize the model in the necessary manner to perform variational inference. Our approach is readily extended to models with non-Gaussian likelihoods and latent variable models based around Gaussian processes. We demonstrate the approach on a simple toy problem and two real world data sets.
Thu, 11 Jul 2013 00:00:00 +0000
http://inverseprobability.com/publications/gaussian-processes-for-big-data.html
http://inverseprobability.com/publications/gaussian-processes-for-big-data.htmlHensman-bigdata13The Bigraphical LassoThe i.i.d. assumption in machine learning is endemic, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs. A prominent product in spectral graph theory, this structure has appealing properties for regression, enhanced sparsity and interpretability. To deal with the parameter explosion we introduce L1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions.Sun, 26 May 2013 00:00:00 +0000
http://inverseprobability.com/publications/the-bigraphical-lasso.html
http://inverseprobability.com/publications/the-bigraphical-lasso.htmlKalaitzis:bigraphical13Linear Latent Force Models Using Gaussian ProcessesPurely data driven approaches for machine learning present difficulties
when data is scarce relative to the complexity of the model or when the model is
forced to extrapolate. On the other hand, purely mechanistic approaches need to
identify and specify all the interactions in the problem at hand (which may not
be feasible) and still leave the issue of how to parameterize the system. In this
paper, we present a hybrid approach using Gaussian processes and differential equations
to combine data driven modelling with a physical model of the system. We show how
different, physically-inspired, kernel functions can be developed through sensible,
simple, mechanistic assumptions about the underlying system. The versatility of
our approach is illustrated with three case studies from motion capture, computational
biology and geostatistics.
Mon, 13 May 2013 00:00:00 +0000
http://inverseprobability.com/publications/linear-latent-force-models-using-gaussian-processes.html
http://inverseprobability.com/publications/linear-latent-force-models-using-gaussian-processes.htmlAlvarez-llfm13Deep Gaussian ProcessesIn this paper we introduce deep Gaussian process (GP) models. Deep GPs are
a deep belief network based on Gaussian process mappings. The data is modeled as
the output of a multivariate GP. The inputs to that Gaussian process are then governed
by another GP. A single layer model is equivalent to a standard GP or the GP latent
variable model (GP-LVM). We perform inference in the model by approximate variational
marginalization. This results in a strict lower bound on the marginal likelihood
of the model which we use for model selection (number of layers and nodes per layer).
Deep belief networks are typically applied to relatively large data sets using stochastic
gradient descent for optimization. Our fully Bayesian treatment allows for the application
of deep models even when data is scarce. Model selection by our variational bound
shows that a five layer hierarchy is justified even when modelling a digit data
set containing only 150 examples.
Mon, 29 Apr 2013 00:00:00 +0000
http://inverseprobability.com/publications/deep-gaussian-processes.html
http://inverseprobability.com/publications/deep-gaussian-processes.htmlDamianou:deepgp13Unravelling the enigma of selective vulnerability in neurodegeneration: motor neurons resistant to degeneration in ALS show distinct gene expression characteristics and decreased susceptibility to excitotoxicityA consistent clinical feature of amyotrophic lateral sclerosis (ALS) is the sparing of eye movements and the function of external sphincters, with corresponding preservation of motor neurons in the brainstem oculomotor nuclei, and of Onuf’s nucleus in the sacral spinal cord. Studying the differences in properties of neurons that are vulnerable and resistant to the disease process in ALS may provide insights into the mechanisms of neuronal degeneration, and identify targets for therapeutic manipulation. We used microarray analysis to determine the differences in gene expression between oculomotor and spinal motor neurons, isolated by laser capture microdissection from the midbrain and spinal cord of neurologically normal human controls. We compared these to transcriptional profiles of oculomotor nuclei and spinal cord from rat and mouse, obtained from the GEO omnibus database. We show that oculomotor neurons have a distinct transcriptional profile, with significant differential expression of 1,757 named genes ($q < 0.001$). Differentially expressed genes are enriched for the functional categories of synaptic transmission, ubiquitin-dependent proteolysis, mitochondrial function, transcriptional regulation, immune system functions, and the extracellular matrix. Marked differences are seen, across the three species, in genes with a function in synaptic transmission, including several glutamate and GABA receptor subunits. Using patch clamp recording in acute spinal and brainstem slices, we show that resistant oculomotor neurons show a reduced AMPA-mediated inward calcium current, and a higher GABA-mediated chloride current, than vulnerable spinal motor neurons.
The findings suggest that reduced susceptibility to excitotoxicity, mediated in part through enhanced GABAergic transmission, is an important determinant of the relative resistance of oculomotor neurons to degeneration in ALS.
Thu, 04 Apr 2013 00:00:00 +0000
http://inverseprobability.com/publications/brockington-unravelling13.html
http://inverseprobability.com/publications/brockington-unravelling13.htmlBrockington-unravelling13Detecting Regulatory Gene-Environment Interactions with Unmeasured Environmental Factors<b>Motivation</b>: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing to directly test for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as human. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits.<br><br>
<b>Results</b>: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it’s not given as an input, allowing to detect genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability.<br><br>
<b>Availability</b>: Software available at http://pmbio.github.io/envGPLVM/.<br><br>
<b>Contact</b>: [oliver.stegle@ebi.ac.uk](oliver.stegle@ebi.ac.uk), [nicolo.fusi@sheffield.ac.uk](nicolo.fusi@sheffield.ac.uk)Wed, 03 Apr 2013 00:00:00 +0000
http://inverseprobability.com/publications/fusi-detecting13.html
http://inverseprobability.com/publications/fusi-detecting13.htmlFusi-detecting13Fast variational inference in the Conjugate Exponential familyWe present a general method for deriving collapsed variational inference algorithms for probabilistic models in the conjugate exponential family. Our method unifies many existing approaches to collapsed variational inference. Our collapsed variational inference leads to a new lower bound on the marginal likelihood. We exploit the information geometry of the bound to derive much faster optimization methods based on conjugate gradients for these models. Our approach is very general and is easily applied to any model where the mean field update equations have been derived. Empirically we show significant speed-ups for probabilistic models optimized using our bound.Tue, 04 Dec 2012 00:00:00 +0000
http://inverseprobability.com/publications/hensman-fast12.html
http://inverseprobability.com/publications/hensman-fast12.htmlHensman:fast12Mining Regulatory Network Connections by Ranking Transcription Factor Target Genes Using Time Series Expression DataReverse engineering the gene regulatory network is challenging because the amount of available data is very limited compared to the complexity of the underlying network. We present a technique addressing this problem through focussing on a more limited problem: inferring direct targets of a transcription factor from short expression time series. The method is based on combining Gaussian process priors and ordinary differential equation models allowing inference on limited potentially unevenly sampled data. The method is implemented as an R/Bioconductor package, and it is demonstrated by ranking candidate targets of the p53 tumour suppressor.
Sat, 08 Sep 2012 00:00:00 +0000
http://inverseprobability.com/publications/honkela-mining12.html
http://inverseprobability.com/publications/honkela-mining12.htmlHonkela:mining12Manifold Relevance DeterminationIn this paper we present a fully Bayesian latent variable model which exploits conditional nonlinear (in)-dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation to the discrete segmentation and allow for a “softly” shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels. This also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model by prediction of human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors which incorporate the dynamic nature of the data.Tue, 26 Jun 2012 00:00:00 +0000
http://inverseprobability.com/publications/damianou-manifold12.html
http://inverseprobability.com/publications/damianou-manifold12.htmlDamianou-manifold12Kernels for Vector-Valued Functions: A ReviewKernel methods are among the most popular techniques in machine learning. From a regularization perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a probabilistic perspective they are the key in the context of Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partially by frameworks like multitask learning. In this monograph, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.Tue, 19 Jun 2012 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-vector12.html
http://inverseprobability.com/publications/alvarez-vector12.htmlAlvarez-vector12Modeling Meiotic Chromosomes Indicates a Size Dependent Contribution of Telomere Clustering and Chromosome Rigidity to Homologue JuxtapositionMeiosis is the cell division that halves the genetic component of diploid
cells to form gametes or spores. To achieve this, meiotic cells undergo a radical
spatial reorganisation of chromosomes. This reorganisation is a prerequisite for
the pairing of parental homologous chromosomes and the reductional division, which
halves the number of chromosomes in daughter cells. Of particular note is the change
from a centromere clustered layout (Rabl configuration) to a telomere clustered
conformation (bouquet stage). The contribution of the bouquet structure to homologous
chromosome pairing is uncertain. We have developed a new in silico model to represent
the chromosomes of Saccharomyces cerevisiae in space, based on a worm-like chain
model constrained by attachment to the nuclear envelope and clustering forces. We
have asked how these constraints could influence chromosome layout, with particular
regard to the juxtaposition of homologous chromosomes and potential nonallelic,
ectopic, interactions. The data support the view that the bouquet may be sufficient
to bring short chromosomes together, but the contribution to long chromosomes is
less. We also find that persistence length is critical to how much influence the
bouquet structure could have, both on pairing of homologues and avoiding contacts
with heterologues. This work represents an important development in computer modeling
of chromosomes, and suggests new explanations for why elucidating the functional
significance of the bouquet by genetics has been so difficult.
Thu, 03 May 2012 00:00:00 +0000
http://inverseprobability.com/publications/modeling-meiotic-chromosomes-indicates-a-size-dependent-contribution-of-telomere-clustering-and-chromosome-rigidity-to-homologue-juxtaposition.html
http://inverseprobability.com/publications/modeling-meiotic-chromosomes-indicates-a-size-dependent-contribution-of-telomere-clustering-and-chromosome-rigidity-to-homologue-juxtaposition.htmlPenfold-meiotic12Overlapping Mixtures of Gaussian Processes for the Data Association ProblemIn this work we introduce a mixture of GPs to address the data association problem, i.e., to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a non-standard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings.
Wed, 04 Apr 2012 00:00:00 +0000
http://inverseprobability.com/publications/lazaro-overlapping11.html
http://inverseprobability.com/publications/lazaro-overlapping11.htmlLazaro-overlapping11Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics StudiesExpression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies are hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at <http://ml.sheffield.ac.uk/qtl/>.Thu, 05 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/fusi-genomics12.html
http://inverseprobability.com/publications/fusi-genomics12.htmlFusi-genomics12Genome-wide occupancy links Hoxa2 to Wnt-$\beta$-catenin signaling in mouse embryonic developmentThe regulation of gene expression is central to developmental programs and largely depends on the binding of sequence-specific transcription factors with cis-regulatory elements in the genome. Hox transcription factors specify the spatial coordinates of the body axis in all animals with bilateral symmetry, but a detailed knowledge of their molecular function in instructing cell fates is lacking. Here, we used chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) to identify Hoxa2 genomic locations in a time and space when it is actively instructing embryonic development in mouse. Our data reveals that Hoxa2 has large genome coverage and potentially regulates thousands of genes. Sequence analysis of Hoxa2-bound regions identifies high occurrence of two main classes of motifs, corresponding to Hox and Pbx–Hox recognition sequences. Examination of the binding targets of Hoxa2 faithfully captures the processes regulated by Hoxa2 during embryonic development; in addition, it uncovers a large cluster of potential targets involved in the Wnt-signaling pathway. In vivo examination of canonical Wnt-$\beta$-catenin signaling reveals activity specifically in Hoxa2 domain of expression, and this is undetectable in Hoxa2 mutant embryos. The comprehensive mapping of Hoxa2-binding sites provides a framework to study Hox regulatory networks in vertebrate developmental processes.Mon, 02 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/donaldson-genome12.html
http://inverseprobability.com/publications/donaldson-genome12.htmlDonaldson-genome12Identifying Targets of Multiple Co-regulated Transcription Factors from Expression Time-series by Bayesian Model Comparison**Background**
Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.
**Results** We present a computational framework for Bayesian statistical inference of target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe transcription of target genes taking into account combinatorial regulation. The method consists of a training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interactions. Our method statistically significantly outperforms existing alternatives.
**Conclusions** Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in presence of unmodelled confounders and unreliable prior knowledge on training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/titsias-identifying12.html
http://inverseprobability.com/publications/titsias-identifying12.htmlTitsias-identifying12Residual Component AnalysisProbabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = \sigma^2\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. conditional dependencies between the covariates, or temporal correlations leaving some residual variance. We decompose the residual variance into its components through a generalised eigenvalue problem, which we call residual component analysis (RCA). We explore a range of new algorithms that arise from the framework, including one that factorises the covariance of a Gaussian density into a low-rank and a sparse-inverse component. We illustrate the ideas on the recovery of a protein-signaling network, a gene expression time-series data set and the recovery of the human skeleton from motion capture 3-D cloud data.Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/kalaitzis-rca12.html
http://inverseprobability.com/publications/kalaitzis-rca12.htmlKalaitzis:rca12Gaussian Processes for Big Data with Stochastic Variational InferenceSun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/hensman-bigdata12.html
http://inverseprobability.com/publications/hensman-bigdata12.htmlHensman:bigdata12A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New ModelsWe introduce a new perspective on spectral dimensionality reduction which
views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective
is based on the maximum entropy principle which is in turn inspired by maximum variance
unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is
a nonlinear generalization of principal component analysis. We relate the model
to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally
linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant
of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that
MEU and ALLE are competitive with the leading spectral approaches on a robot navigation
visualization and a human motion capture data set. Finally the maximum likelihood
perspective allows us to introduce a new approach to dimensionality reduction based
on L1 regularization of the Gaussian random field via the graphical lasso.
Sun, 01 Jan 2012 00:00:00 +0000
http://inverseprobability.com/publications/a-unifying-probabilistic-perspective-for-spectral-dimensionality-reduction-insights-and-new-models.html
http://inverseprobability.com/publications/a-unifying-probabilistic-perspective-for-spectral-dimensionality-reduction-insights-and-new-models.htmlLawrence-unifying12Efficient Inference in Matrix-Variate Gaussian Models with i.i.d. Observation NoiseInference in matrix-variate Gaussian models has major applications for multi-output prediction and
joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for
efficient inference in such models that explicitly account for iid observation noise. Computational
tractability can be retained by exploiting the Kronecker product between row and column covariance
matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse
inverse covariance between features while accounting for a low-rank confounding covariance between
samples. We show practical utility on applications to biology, where we model covariances with more than
100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to
better reconstruct the confounders.
Mon, 12 Dec 2011 00:00:00 +0000
http://inverseprobability.com/publications/stegle-sparse11.html
http://inverseprobability.com/publications/stegle-sparse11.htmlStegle:sparse11Markov Chain Monte Carlo Algorithms for Gaussian Processes'What's going to happen next?' Time series data hold the answers, and Bayesian methods represent the cutting edge in learning what they have to say. This ambitious book is the first unified treatment of the emerging knowledge-base in Bayesian time series techniques. Exploiting the unifying framework of probabilistic graphical models, the book covers approximation schemes, both Monte Carlo and deterministic, and introduces switching, multi-object, non-parametric and agent-based models in a variety of application environments. It demonstrates that the basic framework supports the rapid creation of models tailored to specific applications and gives insight into the computational complexity of their implementation. The authors span traditional disciplines such as statistics and engineering and the more recently established areas of machine learning and pattern recognition. Readers with a basic understanding of applied probability, but no experience with time series analysis, are guided from fundamental concepts to the state-of-the-art in research and practice.Thu, 11 Aug 2011 00:00:00 +0000
http://inverseprobability.com/publications/titsias-mcmcgp11.html
http://inverseprobability.com/publications/titsias-mcmcgp11.htmlTitsias:mcmcgp11Linear Latent Force Models Using Gaussian ProcessesPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.Wed, 13 Jul 2011 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-llfm11.html
http://inverseprobability.com/publications/alvarez-llfm11.htmlAlvarez-llfm11Kernels for Vector-Valued Functions: a ReviewKernel methods are among the most popular techniques in machine learning. From a frequentist/discriminative perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a Bayesian/generative perspective they are the key in the context of Gaussian processes, where the kernel function is also known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partly by frameworks like multitask learning. In this paper, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.Thu, 30 Jun 2011 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-kernels11.html
http://inverseprobability.com/publications/alvarez-kernels11.htmlAlvarez-kernels11Residual Component AnalysisProbabilistic principal component analysis (PPCA) seeks a low dimensional representation of a data set in the presence of independent spherical Gaussian noise, $\Sigma = (\sigma^2)\mathbf{I}$. The maximum likelihood solution for the model is an eigenvalue problem on the sample covariance matrix. In this paper we consider the situation where the data variance is already partially explained by other factors, e.g. covariates of interest, or temporal correlations leaving some residual variance. We decompose the residual variance into its components through a generalized eigenvalue problem, which we call residual component analysis (RCA). We show that canonical covariates analysis (CCA) is a special case of our algorithm and explore a range of new algorithms that arise from the framework. We illustrate the ideas on a gene expression time series data set and the recovery of human pose from silhouette.Tue, 21 Jun 2011 00:00:00 +0000
http://inverseprobability.com/publications/kalaitzis-rca11.html
http://inverseprobability.com/publications/kalaitzis-rca11.htmlKalaitzis:rca11Spectral Dimensionality Reduction via Maximum EntropyWe introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs. The resulting model is a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We directly maximize the likelihood and show results that are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set.Tue, 14 Jun 2011 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-spectral11.html
http://inverseprobability.com/publications/lawrence-spectral11.htmlLawrence:spectral11Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effectsExpression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies are hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious false associations or mask real genetic association signals.
Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.
We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, PANAMA not only identified a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.Thu, 02 Jun 2011 00:00:00 +0000
http://inverseprobability.com/publications/fusi-accurate11.html
http://inverseprobability.com/publications/fusi-accurate11.htmlFusi-accurate11A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression**Background**
The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process.

**Results**
We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time-series. We present a simple approach which can be used to filter quiet genes, or for the case of time series in the form of expression ratios, quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time-series (BATS). We compare on both simulated and experimental data showing that the proposed approach significantly outperforms the current state of the art.

**Conclusions**
Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of micro-array time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.Fri, 20 May 2011 00:00:00 +0000
http://inverseprobability.com/publications/kalaitzis-simple11.html
http://inverseprobability.com/publications/kalaitzis-simple11.htmlKalaitzis-simple11Computationally Efficient Convolved Multiple Output <span>Gaussian</span> ProcessesRecently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so-called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data; in particular, we show results in school exams score prediction, pollution prediction and gene expression data.Sun, 01 May 2011 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-computationally11.html
http://inverseprobability.com/publications/alvarez-computationally11.htmlAlvarez-computationally11tigre: Transcription Factor Inference through Gaussian Process Reconstruction of Expression for Bioconductor**Summary**: tigre is an R/Bioconductor package for inference of transcription
factor activity and ranking candidate target genes from gene expression time series.
The underlying methodology is based on Gaussian process inference on a differential
equation model that allows the use of short, unevenly sampled, time series. The
method has been designed with efficient parallel implementation in mind, and the
package supports parallel operation even without additional software.
**Availability**: The tigre package has been included in Bioconductor since release 2.6 for R 2.11. The
package and a user's guide are available at http://www.bioconductor.org.
**Contact**: antti.honkela@hiit.fi; m.rattray@sheffield.ac.uk; n.lawrence@dcs.shef.ac.uk
Mon, 07 Feb 2011 00:00:00 +0000
http://inverseprobability.com/publications/transcription-factor-inference-through-gaussian-process-reconstruction-of-expression-for-bioconductor.mdhonkela-tigre11.html
http://inverseprobability.com/publications/transcription-factor-inference-through-gaussian-process-reconstruction-of-expression-for-bioconductor.mdhonkela-tigre11.htmlHonkela-tigre11<span>G</span>aussian Process Inference for Differential Equation Models of Transcriptional RegulationSat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gpinference11.html
http://inverseprobability.com/publications/lawrence-gpinference11.htmlLawrence:gpinference11Variational Gaussian Process Dynamical SystemsHigh dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences.Sat, 01 Jan 2011 00:00:00 +0000
http://inverseprobability.com/publications/damianou-vgpds11.html
http://inverseprobability.com/publications/damianou-vgpds11.htmlDamianou:vgpds11A Unifying Probabilistic Perspective for Spectral Dimensionality ReductionWe introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs. The resulting model is a nonlinear generalization of principal component analysis. We show that parameter fitting in the locally linear embedding is approximate maximum likelihood in these models. We develop new algorithms that directly maximize the likelihood and show that these new algorithms are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.Fri, 22 Oct 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-unifying10.html
http://inverseprobability.com/publications/lawrence-unifying10.htmlLawrence:unifying10TFInfer: a tool for probabilistic inference of transcription factor activities<b>Summary</b>: TFInfer is a novel open access, standalone tool for genome-wide inference of transcription factor activities from gene expression data. Based on an earlier MATLAB version, the software has now been extended in a number of ways. It has been significantly optimised in terms of performance, and it was given novel functionality, by allowing the user to model both time series and data from multiple independent conditions. With full documentation and an intuitive graphical user interface, together with an in-built database of yeast and Escherichia coli transcription factors, the software does not require any mathematical or computational expertise to be used effectively.<br><br>
<b>Availability</b>: <http://homepages.inf.ed.ac.uk/gsanguin/TFInfer.html><br><br>
<b>Contact</b>: [gsanguin@staffmail.ed.ac.uk](gsanguin@staffmail.ed.ac.uk)Fri, 15 Oct 2010 00:00:00 +0000
http://inverseprobability.com/publications/asif-tfinfer10.html
http://inverseprobability.com/publications/asif-tfinfer10.htmlAsif-tfinfer10Model-based Method for Transcription Factor Target Identification with Limited DataWe present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist, and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.Tue, 27 Apr 2010 00:00:00 +0000
http://inverseprobability.com/publications/honkela-modelbased10.html
http://inverseprobability.com/publications/honkela-modelbased10.htmlHonkela-modelbased10Bayesian Gaussian Process Latent Variable ModelWe introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.Wed, 31 Mar 2010 00:00:00 +0000
http://inverseprobability.com/publications/titsias-bayesgplvm10.html
http://inverseprobability.com/publications/titsias-bayesgplvm10.htmlTitsias:bayesGPLVM10Efficient Multioutput Gaussian Processes through Variational Inducing KernelsInterest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence @Alvarez:convolved08 recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias @Titsias:variational09 to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.Wed, 31 Mar 2010 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-efficient10.html
http://inverseprobability.com/publications/alvarez-efficient10.htmlAlvarez:efficient10Elementary properties of CaV1.3 Ca2+ channels expressed in mouse cochlear inner hair cellsMammalian cochlear inner hair cells (IHCs) are specialized to process developmental signals during immature stages and sound stimuli in adult animals. These signals are conveyed onto auditory afferent nerve fibres. Neurotransmitter release at IHC ribbon synapses is controlled by L-type CaV1.3 Ca2+ channels, the biophysics of which are still unknown in native mammalian cells. We have investigated the localization and elementary properties of Ca2+ channels in immature mouse IHCs under near-physiological recording conditions. CaV1.3 Ca2+ channels at the cell pre-synaptic site co-localize with about half of the total number of ribbons present in immature IHCs. These channels activated at relatively hyperpolarized membrane potentials (about -70 mV), showed a relatively short first latency and weak inactivation, which would allow IHCs to generate and accurately encode spontaneous Ca2+ action potential activity characteristic of these immature cells. The CaV1.3 Ca2+ channels showed a very low open probability (about 0.15 at -20 mV: near the peak of an action potential). Comparison of elementary and macroscopic Ca2+ currents indicated that very few Ca2+ channels are associated with each docked vesicle at IHC ribbon synapses. Finally, we found that the open probability of Ca2+ channels, but not their opening time, was voltage dependent. This finding provides a possible correlation between presynaptic Ca2+ channel properties and the characteristic frequency/amplitude of EPSCs in auditory afferent fibres.Wed, 20 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/zampini-elementary09.html
http://inverseprobability.com/publications/zampini-elementary09.htmlZampini-elementary09Introduction to Learning and Inference in Computational Systems BiologyComputational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks. The chapters offer practical approaches to biological inference problems ranging from genome-wide inference of genetic regulation to pathway-specific studies. Both deterministic models (based on ordinary differential equations) and stochastic models (which anticipate the increasing availability of data from small populations of cells) are considered. Several chapters emphasize Bayesian inference, so the editors have included an introduction to the philosophy of the Bayesian approach and an overview of current work on Bayesian inference. Taken together, the methods discussed by the experts in Learning and Inference in Computational Systems Biology provide a foundation upon which the next decade of research in systems biology can be built.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbintro10.html
http://inverseprobability.com/publications/lawrence-licsbintro10.htmlLawrence:licsbintro10Gaussian Processes for Missing Species in Biochemical SystemsComputational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks. The chapters offer practical approaches to biological inference problems ranging from genome-wide inference of genetic regulation to pathway-specific studies. Both deterministic models (based on ordinary differential equations) and stochastic models (which anticipate the increasing availability of data from small populations of cells) are considered. Several chapters emphasize Bayesian inference, so the editors have included an introduction to the philosophy of the Bayesian approach and an overview of current work on Bayesian inference. Taken together, the methods discussed by the experts in Learning and Inference in Computational Systems Biology provide a foundation upon which the next decade of research in systems biology can be built.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbgp10.html
http://inverseprobability.com/publications/lawrence-licsbgp10.htmlLawrence:licsbgp10A Brief Introduction to <span>B</span>ayesian InferenceComputational systems biology aims to develop algorithms that uncover the structure and parameterization of the underlying mechanistic model—in other words, to answer specific questions about the underlying mechanisms of a biological system—in a process that can be thought of as learning or inference. This volume offers state-of-the-art perspectives from computational biology, statistics, modeling, and machine learning on new methodologies for learning and inference in biological networks. The chapters offer practical approaches to biological inference problems ranging from genome-wide inference of genetic regulation to pathway-specific studies. Both deterministic models (based on ordinary differential equations) and stochastic models (which anticipate the increasing availability of data from small populations of cells) are considered. Several chapters emphasize Bayesian inference, so the editors have included an introduction to the philosophy of the Bayesian approach and an overview of current work on Bayesian inference. Taken together, the methods discussed by the experts in Learning and Inference in Computational Systems Biology provide a foundation upon which the next decade of research in systems biology can be built.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-licsbbayes10.html
http://inverseprobability.com/publications/lawrence-licsbbayes10.htmlLawrence:licsbbayes10Switched Latent Force Models for Movement SegmentationLatent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as a haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology.Fri, 01 Jan 2010 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-switched10.html
http://inverseprobability.com/publications/alvarez-switched10.htmlAlvarez:switched10Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian ProcessesInterest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.Wed, 16 Dec 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-viktech09.html
http://inverseprobability.com/publications/alvarez-viktech09.htmlAlvarez:vikTech09puma: a Bioconductor package for Propagating Uncertainty in Microarray Analysis<b>Background</b><br><br>
Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied.<br><br>
<b>Results</b><br><br>
puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation.
<b>Conclusions</b>
For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.Thu, 09 Jul 2009 00:00:00 +0000
http://inverseprobability.com/publications/pearson-puma09.html
http://inverseprobability.com/publications/pearson-puma09.htmlPearson-puma09Non-Linear Matrix Factorization with Gaussian ProcessesA popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.Mon, 01 Jun 2009 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nlmf09.html
http://inverseprobability.com/publications/lawrence-nlmf09.htmlLawrence:nlmf09Latent Force ModelsPurely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modeling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from computational biology, motion capture and geostatistics.Wed, 15 Apr 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-lfm09.html
http://inverseprobability.com/publications/alvarez-lfm09.htmlAlvarez:lfm09Backing Off: Hierarchical Decomposition of Activity for 3D Novel Pose RecoveryFor model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/darby-backing09.html
http://inverseprobability.com/publications/darby-backing09.htmlDarby:backing09Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian ProcessesIdentification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields; however, current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible *without* solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state-of-the-art methods.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/calderhead-accelerating08.html
http://inverseprobability.com/publications/calderhead-accelerating08.htmlCalderhead:accelerating08Sparse Convolved Multiple Output <span>G</span>aussian ProcessesRecently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so-called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data; in particular, we show results in pollution prediction, school exams score prediction and gene expression data.Thu, 01 Jan 2009 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-multitech09.html
http://inverseprobability.com/publications/alvarez-multitech09.htmlAlvarez:multiTech09Efficient Sampling for Gaussian Process Inference using Control VariablesSampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by continuously minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation.Mon, 08 Dec 2008 00:00:00 +0000
http://inverseprobability.com/publications/titsias-efficient08.html
http://inverseprobability.com/publications/titsias-efficient08.htmlTitsias:efficient08Sparse Convolved Gaussian Processes for Multi-output RegressionWe present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network.Mon, 08 Dec 2008 00:00:00 +0000
http://inverseprobability.com/publications/alvarez-convolved08.html
http://inverseprobability.com/publications/alvarez-convolved08.htmlAlvarez:convolved08Gaussian Process Modelling of Latent Chemical Species: Applications to Inferring Transcription Factor Activities<b>Motivation:</b> Inference of <i>latent chemical species</i> in
biochemical interaction networks is a key problem in estimation of
the structure and parameters of the genetic, metabolic and protein
interaction networks that underpin all biological processes. We
present a framework for Bayesian marginalisation of these latent
chemical species through Gaussian process priors.<br><br>
<b>Results:</b> We demonstrate our general approach on three
different biological examples of single input motifs, including both
activation and repression of transcription. We focus in particular
on the problem of inferring transcription factor activity when the
concentration of active protein cannot easily be measured. We show
how the uncertainty in the inferred transcription factor activity
can be integrated out in order to derive a likelihood function that
can be used for the estimation of regulatory model parameters. An
advantage of our approach is that we avoid the use of a
coarse-grained discretization of continuous-time functions, which
would lead to a large number of additional parameters to be
estimated. We develop efficient exact and approximate inference
schemes, which are much more efficient than competing sampling-based
schemes and therefore provide us with a practical toolkit for
model-based inference.<br><br>
<b>Availability:</b> The software and data for recreating all the
experiments in this paper are available in MATLAB from
<http://inverseprobability.com/gpsim><br><br>
<b>Contact:</b> Neil Lawrence
Fri, 15 Aug 2008 00:00:00 +0000
http://inverseprobability.com/publications/gao-latent08.html
http://inverseprobability.com/publications/gao-latent08.htmlGao-latent08Ambiguity Modeling in Latent SpacesWe are interested in the situation where we have two or more representations of an underlying phenomenon. In particular we are interested in the scenario where the representations are complementary. This implies that a single individual representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, explaining representation specific variance, separately model the variance in a representation ambiguous to the other. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas; we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.Sat, 09 Aug 2008 00:00:00 +0000
http://inverseprobability.com/publications/ek-ambiguity08.html
http://inverseprobability.com/publications/ek-ambiguity08.htmlEk:ambiguity08Topologically-Constrained Latent Variable ModelsIn dimensionality reduction approaches, the data are typically embedded in a Euclidean latent space. However, for some data sets this is inappropriate. For example, in human motion data we expect latent spaces that are cylindrical or toroidal, which are poorly captured by a Euclidean space. In this paper, we present a range of approaches for embedding data in a non-Euclidean latent space. Our focus is the Gaussian Process latent variable model. In the context of human motion modeling this allows us to (a) learn models with interpretable latent directions enabling, for example, style/content separation, and (b) generalise beyond the data set enabling us to learn transitions between motion styles even though such transitions are not present in the data.Sat, 05 Jul 2008 00:00:00 +0000
http://inverseprobability.com/publications/urtasun-topology08.html
http://inverseprobability.com/publications/urtasun-topology08.htmlUrtasun:topology08Gaussian Process Latent Variable Models For Human Pose EstimationWe describe a method for recovering 3D human body pose from silhouettes. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM) \[1\] encapsulating both pose and silhouette features. Our method is generative, which allows us to model the ambiguities of a silhouette representation in a principled way. We learn a dynamical model over the latent space which allows us to disambiguate between ambiguous silhouettes by temporal consistency. The model has only two free parameters and has several advantages over both regression approaches and other generative methods. In addition to the application shown in this paper the suggested model is easily extended to multiple observation spaces without constraints on type.Tue, 01 Jan 2008 00:00:00 +0000
http://inverseprobability.com/publications/ek-pose07.html
http://inverseprobability.com/publications/ek-pose07.htmlEk:pose07Variational Optimisation by Marginal MatchingFri, 07 Dec 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nipsw07.html
http://inverseprobability.com/publications/lawrence-nipsw07.htmlLawrence:nipsw07Model-driven detection of Clean Speech Patches in NoiseListeners may be able to recognise speech in adverse conditions by glimpsing time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, using primitive cues. This paper describes a model-driven approach in which the likelihood of spectro-temporal patches of a noisy mixture representing speech is given by a generative model. The focus is on patch size and patch modelling. Small patches lead to a lack of discrimination, while large patches are more likely to contain contributions from other sources. A cleanness measure reveals that a good patch size is one which extends over a quarter of the speech frequency range and lasts for 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.Wed, 01 Aug 2007 00:00:00 +0000
http://inverseprobability.com/publications/laidler-model07.html
http://inverseprobability.com/publications/laidler-model07.htmlLaidler:model07Learning for Larger Datasets with the Gaussian Process Latent Variable ModelIn this paper we apply the latest techniques in sparse Gaussian process regression (GPR) to the Gaussian process latent variable model (GP-LVM). We review three techniques and discuss how they may be implemented in the context of the GP-LVM. Each approach is then implemented on a well known benchmark data set and compared with earlier attempts to sparsify the model.Sun, 11 Mar 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-larger07.html
http://inverseprobability.com/publications/lawrence-larger07.htmlLawrence:larger07Modelling transcriptional regulation using Gaussian ProcessesModelling the dynamics of transcriptional processes in the cell requires the knowledge of a number of key biological quantities. While some of them are relatively easy to measure, such as mRNA decay rates and mRNA abundance levels, it is still very hard to measure the active concentration levels of the transcription factor proteins that drive the process and the sensitivity of target genes to these concentrations. In this paper we show how these quantities for a given transcription factor can be inferred from gene expression levels of a set of known target genes. We treat the protein concentration as a latent function with a Gaussian Process prior, and include the sensitivities, mRNA decay rates and baseline expression levels as hyperparameters. We apply this procedure to a human leukemia dataset, focusing on the tumour repressor p53 and obtaining results in good accordance with recent biological studies.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-transcriptionalgp06.html
http://inverseprobability.com/publications/lawrence-transcriptionalgp06.htmlLawrence:transcriptionalGP06Hierarchical Gaussian Process Latent Variable ModelsThe Gaussian process latent variable model (GP-LVM) is a powerful approach for probabilistic modelling of high dimensional data through dimensional reduction. In this paper we extend the GP-LVM through hierarchies. A hierarchical model (such as a tree) allows us to express conditional independencies in the data as well as the manifold structure. We first introduce Gaussian process hierarchies through a simple dynamical model; we then extend the approach to a more complex hierarchy which is applied to the visualisation of human motion data sets.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-hgplvm07.html
http://inverseprobability.com/publications/lawrence-hgplvm07.htmlLawrence:hgplvm07WiFi-SLAM Using Gaussian Process Latent Variable ModelsWiFi localization, the task of determining the physical location of a mobile device from wireless signal strengths, has been shown to be an accurate method of indoor and outdoor localization and a powerful building block for location-aware applications. However, most localization techniques require a training set of signal strength readings labeled against a ground truth location map, which is prohibitive to collect and maintain as maps grow large. In this paper we propose a novel technique for solving the WiFi SLAM problem using the Gaussian Process Latent Variable Model (GP-LVM) to determine the latent-space locations of unlabeled signal strength data. We show how GP-LVM, in combination with an appropriate motion dynamics model, can be used to reconstruct a topological connectivity graph from a signal strength sequence which, in combination with the learned Gaussian Process signal strength model, can be used to perform efficient localization.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/ferris-wifi07.html
http://inverseprobability.com/publications/ferris-wifi07.htmlFerris:wifi07Gaussian Process Latent Variable Models for Fault DetectionThe Gaussian process latent variable model (GPLVM) is a novel unsupervised approach to nonlinear low dimensional embedding proposed by Lawrence (2005). This paper presents the development of a framework for the implementation of the GPLVM for fault detection. A series of experiments have been carried out comparing and combining the GPLVM with the conventional and widely used linear dimension reduction technique of principal component analysis (PCA). The inclusion of the GPLVM for the visualisation and data analysis led to a considerable improvement in the classification results.Mon, 01 Jan 2007 00:00:00 +0000
http://inverseprobability.com/publications/eciolaza-fault07.html
http://inverseprobability.com/publications/eciolaza-fault07.htmlEciolaza:fault07Local Distance Preservation in the GP-LVM through Back ConstraintsThe Gaussian process latent variable model (GP-LVM) is a generative approach to non-linear low dimensional embedding that provides a smooth probabilistic mapping from latent to data space. It is also a non-linear generalization of probabilistic PCA (PPCA) @Tipping:probpca99. While most approaches to non-linear dimensionality reduction focus on preserving local distances in data space, the GP-LVM focusses on exactly the opposite. Being a smooth mapping from latent to data space, it focusses on keeping things apart in latent space that are far apart in data space. In this paper we first provide an overview of dimensionality reduction techniques, placing the emphasis on the kind of distance relation preserved. We then show how the GP-LVM can be generalized, through back constraints, to additionally preserve local distances. We give illustrative experiments on common data sets.Sun, 25 Jun 2006 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-backconstraints06.html
http://inverseprobability.com/publications/lawrence-backconstraints06.htmlLawrence:backconstraints06Missing Data in Kernel PCAKernel Principal Component Analysis (KPCA) is a widely used
technique for visualisation and feature extraction. Despite its
success and flexibility, the lack of a probabilistic interpretation
means that some problems, such as handling missing or corrupted
data, are very hard to deal with. In this paper we exploit the
probabilistic interpretation of linear PCA together with recent
results on latent variable models in Gaussian Processes in order to
introduce an objective function for KPCA. This in turn allows a
principled approach to the missing data problem. Furthermore, this
new approach can be extended to reconstruct corrupted test data
using fixed kernel feature extractors. The experimental results show
strong improvements over widely used heuristics.
Mon, 19 Jun 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-missingkpca06.html
http://inverseprobability.com/publications/sanguinetti-missingkpca06.htmlSanguinetti:missingkpca06Large Scale Learning with the Gaussian Process Latent Variable ModelIn this paper we apply the latest techniques in sparse Gaussian
process regression (GPR) to the Gaussian process latent variable
model (GP-LVM). We review three techniques and discuss how they may
be implemented in the context of the GP-LVM. We briefly consider a
GPR toy problem to highlight the strengths and weaknesses of the
different approaches before studying the performance of these
techniques on a benchmark visualisation data set.
Fri, 17 Feb 2006 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-largescale06.html
http://inverseprobability.com/publications/lawrence-largescale06.htmlLawrence:largescale06Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant AnalysisIn this paper we consider a novel Bayesian interpretation of Fisher’s discriminant analysis. We relate Rayleigh’s coefficient to a noise model that minimizes a cost based on the most probable class centres and that abandons the ‘regression to the labels’ assumption used by other algorithms. This yields a direction of discrimination equivalent to Fisher’s discriminant. We use Bayes’ rule to infer the posterior distribution for the direction of discrimination and in this process, priors and constraining distributions are incorporated to reach the desired result. Going further, with the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher’s discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through the optimisation of the marginal log-likelihood of the data. An added bonus of the new formulation is that it enables us to link the regularisation coefficient with the generalisation error.Wed, 01 Feb 2006 00:00:00 +0000
http://inverseprobability.com/publications/pena-fbd04.html
http://inverseprobability.com/publications/pena-fbd04.htmlPena-fbd04The Gaussian Process Latent Variable ModelThe Gaussian process latent variable model (GP-LVM) is a recently proposed probabilistic approach to obtaining a reduced dimension representation of a data set. In this tutorial we motivate and describe the GP-LVM, giving reviews of the model itself and some of the concepts behind it.Fri, 27 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/the-gaussian-process-latent-variable-model.html
http://inverseprobability.com/publications/the-gaussian-process-latent-variable-model.htmlLawrence:gplvmtut06A Probabilistic Model to Integrate Chip and Microarray DataTue, 10 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-integrate06.html
http://inverseprobability.com/publications/sanguinetti-integrate06.htmlSanguinetti:integrate06Identifying submodules of cellular regulatory networksRecent high throughput techniques in molecular biology have brought about the possibility of directly identifying the architecture of regulatory networks on a genome-wide scale. However, the computational task of estimating fine-grained models on a genome-wide scale is daunting. Therefore, it is of great importance to be able to reliably identify submodules of the network that can be effectively modelled as independent subunits. In this paper we present a procedure to obtain submodules of a cellular network by using information from gene-expression measurements. We integrate network architecture data with genome-wide gene expression measurements in order to determine which regulatory relations are actually confirmed by the expression data. We then use this information to obtain non-trivial submodules of the regulatory network using two distinct algorithms, a naive exhaustive algorithm and a spectral algorithm based on the eigendecomposition of an affinity matrix. We test our method on two yeast biological data sets, using regulatory information obtained from chromatin immunoprecipitation.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-trento06.html
http://inverseprobability.com/publications/sanguinetti-trento06.htmlSanguinetti:trento06Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities<span>**Motivation**</span>: Quantitative estimation of the regulatory relationship between transcription factors and genes is a fundamental stepping stone when trying to develop models of cellular processes. Recent experimental high-throughput techniques such as Chromatin Immunoprecipitation provide important information about the architecture of the regulatory networks in the cell. However, it is very difficult to measure the concentration levels of transcription factor proteins and determine their regulatory effect on gene transcription. It is therefore an important computational challenge to infer these quantities using gene expression data and network architecture data.\
\
<span>**Results**</span>: We develop a probabilistic state space model that allows genome-wide inference of both transcription factor protein concentrations and their effect on the transcription rates of each target gene from microarray data. We use variational inference techniques to learn the model parameters and perform posterior inference of protein concentrations and regulatory strengths. The probabilistic nature of the model also means that we can associate credibility intervals to our estimates, as well as providing a tool to detect which binding events lead to significant regulation. We demonstrate our model on artificial data and on two yeast data sets in which the network structure has previously been obtained using Chromatin Immunoprecipitation data. Predictions from our model are consistent with the underlying biology and offer novel quantitative insights into the regulatory structure of the yeast cell.\
\
<span>**Availability**</span>: MATLAB code is available from <http://umber.sbs.man.ac.uk/resources/puma>.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-chipvar06.html
http://inverseprobability.com/publications/sanguinetti-chipvar06.htmlSanguinetti-chipvar06Propagating Uncertainty in Microarray Data AnalysisMicroarray technology is associated with many sources of experimental uncertainty. In this review we discuss a number of approaches for dealing with this uncertainty in the processing of data from microarray experiments. We focus here on the analysis of high-density oligonucleotide arrays, such as the popular Affymetrix GeneChip® array, which contain multiple probes for each target. This set of probes can be used to determine an estimate for the target concentration and can also be used to determine the experimental uncertainty associated with this measurement. This measurement uncertainty can then be propagated through the downstream analysis using probabilistic methods. We give examples showing how these credibility intervals can be used to help identify differential expression, to combine information from replicated experiments and to improve the performance of principal component analysis.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/rattray-propagating06.html
http://inverseprobability.com/publications/rattray-propagating06.htmlRattray-propagating06Probe-level Measurement Error Improves Accuracy in Detecting Differential Gene Expression<span>**Motivation:**</span> Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level variance provides useful information which can help in the identification of differentially expressed genes.\
\
<span>**Results:**</span> We propose a Bayesian method to include probe-level variances into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in data set and a mouse time-course data set. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression.\
\
<span>**Availability:**</span> The methods described in this paper have been implemented in an R package *pplr* that is currently available from <http://umber.sbs.man.ac.uk/resources/puma>.\
\
<span>**Contact:**</span> Magnus RattraySun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/liu-variances06.html
http://inverseprobability.com/publications/liu-variances06.htmlLiu-variances06Gaussian Processes and the Null-Category Noise ModelWith Gaussian process classifiers (GPC) we aim to predict the posterior probability of the class labels given an input data point, $p(y_i|x_i)$. In general we find that this posterior distribution is unaffected by unlabeled data points during learning. Support vector machines are strongly related to GPCs, but one notable difference is that the decision boundary in an SVM can be influenced by unlabeled data. The source of this discrepancy is the SVM’s margin: a characteristic which is not shared with the GPC. The presence of the margin allows the support vector machine to seek low data density regions for the decision boundary, effectively allowing it to incorporate the cluster assumption (see Chapter 6). In this chapter we present the *null category noise model*, a probabilistic equivalent of the margin. By combining this noise model with a GPC we are able to incorporate the cluster assumption without explicitly modeling the input data density distributions and without a special choice of kernel.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gpncnm05.html
http://inverseprobability.com/publications/lawrence-gpncnm05.htmlLawrence:gpncnm05Fast Variational Inference for Gaussian Process Models through KL-CorrectionVariational inference is a flexible approach to solving problems of intractability in Bayesian models. Unfortunately the convergence of variational methods is often slow. We review a recently suggested variational approach for approximate inference in Gaussian process (GP) models and show how convergence may be dramatically improved through the use of a positive correction term to the standard variational bound. We refer to the modified bound as a KL-corrected bound. The KL-corrected bound is a lower bound on the true likelihood, but an upper bound on the original variational bound. Timing comparisons between optimisation of the two bounds show that optimisation of the new bound consistently improves the speed of convergence.Sun, 01 Jan 2006 00:00:00 +0000
http://inverseprobability.com/publications/king-klcorrection06.html
http://inverseprobability.com/publications/king-klcorrection06.htmlKing:klcorrection06Probabilistic Non-linear Principal Component Analysis with <span>G</span>aussian Process Latent Variable ModelsSummarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.Tue, 01 Nov 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-pnpca05.html
http://inverseprobability.com/publications/lawrence-pnpca05.htmlLawrence-pnpca05A Hybrid <span>MaxEnt/HMM</span> Based <span>ASR</span> SystemThe aim of this work is to develop a practical framework, which extends the classical Hidden Markov Model (HMM) for continuous speech recognition based on the Maximum Entropy (MaxEnt) principle. The MaxEnt models can estimate the posterior probabilities directly as with Hybrid NN/HMM connectionist speech recognition systems. In particular, a new acoustic modelling based on discriminative MaxEnt models is formulated and is being developed to replace the generative Gaussian Mixture Models (GMM) commonly used to model acoustic variability. Initial experimental results using the TIMIT phone task are reported.Sun, 04 Sep 2005 00:00:00 +0000
http://inverseprobability.com/publications/hifny-maxent05.html
http://inverseprobability.com/publications/hifny-maxent05.htmlHifny:maxent05A Tractable Probabilistic Model for Affymetrix Probe-level Analysis across Multiple Chips**Motivation:** Affymetrix GeneChip arrays are currently the most
widely used microarray technology. Many summarisation methods have
been developed to provide gene expression levels from Affymetrix
probe-level data. Most of the currently popular methods do not
provide a measure of uncertainty for the expression level of each
gene. The use of probabilistic models can overcome this limitation.
A full hierarchical Bayesian approach requires the use of
computationally intensive MCMC methods that are impractical for
large data sets. An alternative computationally efficient
probabilistic model, mgMOS, uses Gamma distributions to model
specific and non-specific binding with a latent variable to capture
variations in probe affinity. Although promising, the main
limitations of this model are that it does not use information from
multiple chips and that it does not account for specific binding to
the mismatch (MM) probes.
**Results:** We extend mgMOS to model the binding affinity of
probe-pairs across multiple chips and to capture the effect of
specific binding to MM probes. The new model, multi-mgMOS, provides
improved accuracy, as demonstrated on some bench-mark data sets and
a real time-course data set, and is much more computationally
efficient than a competing hierarchical Bayesian approach that
requires MCMC sampling. We demonstrate how the probabilistic model
can be used to estimate credibility intervals for expression levels
and their log-ratios between conditions.
**Availability:** Both mgMOS and the new model multi-mgMOS have been
implemented in an R package that is currently available from <http://umber.sbs.man.ac.uk/resources/puma>.
Thu, 14 Jul 2005 00:00:00 +0000
http://inverseprobability.com/publications/liu-tractable04.html
http://inverseprobability.com/publications/liu-tractable04.htmlLiu-tractable04Variational inference for <span>S</span>tudent-$t$ models: Robust <span>B</span>ayesian interpolation and generalised component analysisWe demonstrate how a variational approximation scheme enables effective inference of key parameters in probabilistic signal models which employ the Student-t distribution. Using the two scenarios of robust interpolation and independent component analysis (ICA) as examples, we illustrate the key feature of the approach: that the form of the noise distribution in the interpolation case, and the source distributions in the ICA case, can be inferred from the data concurrently with all other model parameters.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/tipping-variational05.html
http://inverseprobability.com/publications/tipping-variational05.htmlTipping-variational05Automatic Determination of the Number of Clusters Using Spectral AlgorithmsSat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-automatic05.html
http://inverseprobability.com/publications/sanguinetti-automatic05.htmlSanguinetti:automatic05Accounting for Probe-level Noise in Principal Component Analysis of Microarray Data<span>**Motivation:**</span> Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis.\
\
<span>**Results:**</span> We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM-algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to ‘denoise’ a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.\
\
<span>**Availability:**</span> The software used in the paper is available from <http://www.bioinf.manchester.ac.uk/resources/puma>. The microarray data are deposited in the NCBI database.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/sanguinetti-accounting05.html
http://inverseprobability.com/publications/sanguinetti-accounting05.htmlSanguinetti-accounting05Semi-supervised Learning via <span>G</span>aussian ProcessesWe present a probabilistic approach to learning a Gaussian Process classifier in the presence of unlabeled data. Our approach involves a “null category noise model” (NCNM) inspired by ordered categorical noise models. The noise model reflects an assumption that the data density is lower between the class-conditional densities. We illustrate our approach on a toy problem and present comparative results for the semi-supervised classification of handwritten digits.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-semisuper04.html
http://inverseprobability.com/publications/lawrence-semisuper04.htmlLawrence:semisuper04MOCAP Toolbox for MATLABSat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-mocap05.html
http://inverseprobability.com/publications/lawrence-mocap05.htmlLawrence-mocap05Extensions of the Informative Vector MachineThe informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-extensions05.html
http://inverseprobability.com/publications/lawrence-extensions05.htmlLawrence:extensions05Variational Inference in <span>G</span>aussian Processes via Probabilistic Point AssimilationWe introduce a novel variational approach for approximate inference in Gaussian process (GP) models. The key advantages of our approach are the ease with which different noise models can be incorporated and improved speed of convergence. We refer to the algorithm as probabilistic point assimilation (PPA). We introduce the algorithm firstly using the ‘weight space’ view and then through its Gaussian process formulation. We illustrate the approach on several benchmark data sets.Sat, 01 Jan 2005 00:00:00 +0000
http://inverseprobability.com/publications/king-ppa05.html
http://inverseprobability.com/publications/king-ppa05.htmlKing:ppa05Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable ModelsSummarising a high dimensional data-set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GPLVM). We develop a practical algorithm for GPLVMs which allow for non-linear mappings from the embedded space giving a non-linear probabilistic version of PCA. We develop the new algorithm to provide a principled approach to handling discrete valued data and missing attributes. We demonstrate the algorithm on a range of real-world and artificially generated data-sets and finally, through analysis of the GPLVM objective function, we relate the algorithm to popular spectral techniques such as kernel PCA and multidimensional scaling.Fri, 13 Aug 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gplvmtech04.html
http://inverseprobability.com/publications/lawrence-gplvmtech04.htmlLawrence:gplvmTech04Reducing the Variability in cDNA Microarray Image Processing by Bayesian Inference**Motivation:** Gene expression levels are obtained from microarray experiments through the extraction of pixel intensities from a scanned image of the slide. It is widely acknowledged that variabilities can occur in expression levels extracted from the same images by different users with the same software packages. These inconsistencies arise due to differences in the refinement of the placement of the microarray ‘grids’. We introduce a novel automated approach to the refinement of grid placements that is based upon the use of Bayesian inference for determining the size, shape and positioning of the microarray ‘spots’, capturing uncertainty that can be passed to downstream analysis.\
\
**Results:** Our experiments demonstrate that variability between users can be significantly reduced using the approach. The automated nature of the approach also saves hours of researchers’ time normally spent in refining the grid placement.\
\
**Availability:** A MATLAB implementation of the algorithm and an image of the slide used in our experiments, as well as the code necessary to recreate them are available for non-commercial use from <http://inverseprobability.com/vis>.Thu, 22 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-variability03.html
http://inverseprobability.com/publications/lawrence-variability03.htmlLawrence-variability03Optimising Kernel Parameters and Regularisation Coefficients for Non-linear Discriminant AnalysisIn this paper we consider a Bayesian interpretation of Fisher's discriminant. By relating Rayleigh's coefficient to a likelihood function and through the choice of a suitable prior we use Bayes' rule to infer a posterior distribution over projections. Through the use of a Gaussian process prior we show the equivalence of our model to a regularised kernel Fisher's discriminant. A key advantage of our approach is the facility to determine kernel parameters and the regularisation coefficient through optimisation of the marginalised likelihood of the data.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/pena-fbd-tech04.html
http://inverseprobability.com/publications/pena-fbd-tech04.htmlPena:fbd-tech04Matching Kernels through Kullback-Leibler Divergence MinimisationIn this paper we study the general constrained minimisation of Kullback-Leibler (KL) divergences between two zero mean Gaussian distributions. We reduce the problem to an equivalent minimisation involving the eigenvectors of the two kernel matrices, and provide explicit solutions in some cases. We then focus, as an example, on the important case of constraining the approximating matrix to be block diagonal. We prove a stability result on the approximating matrix, and speculate on how these results may be used to give further theoretical foundation to widely used techniques such as spectral clustering.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-matching04.html
http://inverseprobability.com/publications/lawrence-matching04.htmlLawrence:matching04Learning to Learn with the Informative Vector MachineThis paper describes an efficient method for learning the parameters of a Gaussian process (GP). The parameters are learned from multiple tasks which are assumed to have been drawn independently from the same GP prior. An efficient algorithm is obtained by extending the informative vector machine (IVM) algorithm to handle the multi-task learning case. The multi-task IVM (MT-IVM) saves computation by greedily selecting the most informative examples from the separate tasks. The MT-IVM is also shown to be more efficient than sub-sampling on an artificial data-set and more effective than the traditional IVM in a speaker dependent phoneme recognition task.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-learning04.html
http://inverseprobability.com/publications/lawrence-learning04.htmlLawrence:learning04The Informative Vector Machine: A Practical Probabilistic Alternative to the Support Vector MachineWe present a practical probabilistic alternative to the popular support vector machine (SVM). The algorithm is an approximation to a Gaussian process, and is probabilistic in the sense that it maintains the process variance that is implied by the use of a kernel function, which the SVM discards. We show that these variances may be tracked and used to select an active set, which gives a sparse representation for the model. For an active set size of $d$ our algorithm exhibits $O(d^{2}N)$ computational complexity and $O(dN)$ storage requirements. It has already been shown that the approach is competitive with the SVM in terms of performance and running time; here we give more details of the approach and demonstrate that kernel parameters may also be learned in a practical and effective manner.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ivmtech04.html
http://inverseprobability.com/publications/lawrence-ivmtech04.htmlLawrence:ivmTech04Gaussian Process Models for Visualisation of High Dimensional DataIn this paper we introduce a new underlying probabilistic model for
principal component analysis (PCA). Our formulation interprets PCA
as a particular Gaussian process prior on a mapping from a latent
space to the observed data-space. We show that if the prior's
covariance function constrains the mappings to be linear the model
is equivalent to PCA, we then extend the model by considering less
restrictive covariance functions which allow non-linear
mappings. This more general Gaussian process latent variable model
(GPLVM) is then evaluated as an approach to the visualisation of
high dimensional data for three different data-sets. Additionally
our non-linear algorithm can be *further* kernelised leading to
'twin kernel PCA' in which a *mapping between feature spaces*
occurs.
Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gplvm03.html
http://inverseprobability.com/publications/lawrence-gplvm03.htmlLawrence:gplvm03Acoustic Space Dimensionality Selection and Combination using the Maximum Entropy PrincipleIn this paper we propose a discriminative approach to acoustic space dimensionality selection based on maximum entropy modelling. We form a set of constraints by composing the acoustic space with the space of phone classes, and use a continuous feature formulation of maximum entropy modelling to select an optimal feature set. The suggested approach has two steps: (1) the selection of the best acoustic space that efficiently and economically represents the acoustic data and its variability; (2) the combination of selected acoustic features in the maximum entropy framework to estimate the posterior probabilities over the phonetic labels given the acoustic input. Specific contributions of this paper include a parameter estimation algorithm (generalized improved iterative scaling) that enables the use of negative features, the parameterization of constraint functions using Gaussian mixture models, and experimental results using the TIMIT database.Thu, 01 Jan 2004 00:00:00 +0000
http://inverseprobability.com/publications/abdelhaleem-acoustic04.html
http://inverseprobability.com/publications/abdelhaleem-acoustic04.htmlAbdelHaleem:acoustic04A Probabilistic Model for the Extraction of Expression Levels from Oligonucleotide ArraysIn this work we present a probabilistic model to estimate summaries of Affymetrix GeneChip probe level data. Comparisons with two different models were made both on a publicly available dataset and on a study performed in our laboratory, showing that our model performs better for consistency of fold change.Mon, 01 Dec 2003 00:00:00 +0000
http://inverseprobability.com/publications/milo-probabilistic03.html
http://inverseprobability.com/publications/milo-probabilistic03.htmlMilo-probabilistic03Bayesian Processing of Microarray ImagesGene expression measurements quantify the level of mRNA produced from each gene. Two principal methods exist for producing slides for extracting these levels: photolithography and spotted arrays. One difficulty with the spotted array format is determining the size and location of the spots on the array. In this paper we present a Bayesian approach to processing images produced by these arrays that seeks posterior distributions over the size and positions of the spots. This enables us to estimate expression ratios and their variances. Exact inference for the model we specify is intractable; we develop an approximate inference technique which combines importance sampling with variational inference. Our technique has already been shown to be more consistent than both manual processing and another automated technique @Lawrence:variability03. Here we present large-scale results for twenty-four microarray slides each representing 5760 genes and show the dramatic effects of incorporating variance in our downstream analysis. Software based on this algorithm is available for academic use.Wed, 17 Sep 2003 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-microarray03.html
http://inverseprobability.com/publications/lawrence-microarray03.htmlLawrence:microarray03Generalised Component AnalysisPrincipal component analysis is a well known approach for determining the principal sub-space of a data-set. Independent component analysis is a widely utilised technique for recovering the linearly embedded independent components of a data-set. In this paper we develop an algorithm that, for super-Gaussian sources, extracts the direction and number of independent components of a data-set and determines the principal sub-space of the remaining components. This is achieved through the use of a latent variable model. We refer to the approach as Generalised Component Analysis and demonstrate its ability to both extract independent and principal components, as well as to determine the number of independent components, on toy and real world data-sets.Fri, 23 May 2003 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-gca01.html
http://inverseprobability.com/publications/lawrence-gca01.htmlLawrence:GCA01Variational Inference for Visual TrackingThe likelihood models used in probabilistic visual tracking applications are often complex non-linear and/or non-Gaussian functions, leading to analytically intractable inference. Solutions then require numerical approximation techniques, of which the particle filter is a popular choice. Particle filters, however, degrade in performance as the dimensionality of the state space increases and the support of the likelihood decreases. As an alternative to particle filters this paper introduces a variational approximation to the tracking recursion. The variational inference is intractable in itself, and is combined with an efficient importance sampling procedure to obtain the required estimates. The algorithm is shown to compare favourably with particle filtering techniques on a synthetic example and two real tracking problems. The first involves the tracking of a designated object in a video sequence based on its colour properties, whereas the second involves contour extraction in a single image.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/vermaak-variational03.html
http://inverseprobability.com/publications/vermaak-variational03.htmlVermaak:variational03A Variational Approach to Robust Bayesian InterpolationThis paper details a robust Bayesian interpolation procedure for linear-in-the-parameter models. Robustness is achieved via a Student-$t$ noise model, defined hierarchically in terms of an inverse-Gamma prior distribution over individual Gaussian observation variances. Variational techniques are exploited to update this prior in light of the data, while also inferring all other model variables. The key to this approach is flexibility; it can infer Gaussian noise where appropriate but can adapt to accommodate heavier-tailed distributions in the presence of outliers.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/tipping-variational03.html
http://inverseprobability.com/publications/tipping-variational03.htmlTipping:variational03Fast Forward Selection to Speed Up Sparse Gaussian Process RegressionWe present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the “support” patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically. We demonstrate the model selection capabilities of the algorithm in a range of experiments. In line with the development of our method, we present a simple view on sparse approximations for GP models and their underlying assumptions and show relations to other methods.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/seeger-fast03.html
http://inverseprobability.com/publications/seeger-fast03.htmlSeeger:fast03Fast Sparse Gaussian Process Methods: The Informative Vector MachineWe present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretical principles, previously suggested for active learning. In contrast to most previous work on sparse GPs, our goal is not only to learn sparse predictors (which can be evaluated in $O(d)$ rather than $O(n)$, $d \ll n$, $n$ the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most $O(nd^2)$, and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet it requires only a fraction of the training time. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.Wed, 01 Jan 2003 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ivm02.html
http://inverseprobability.com/publications/lawrence-ivm02.htmlLawrence:ivm02Variational Inference GuideThis report is a brief introduction to variational inference for Bayesian models from the perspective of the Expectation Maximisation (EM) algorithm @Dempster:EM77. We start with an overview of the EM algorithm from the perspective of variational inference and then we show how approximate inference may also be performed. We discuss briefly when variational inference may be used and finally we mention the variational importance sampler as an alternative approach.Wed, 18 Dec 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-variationalguide02.html
http://inverseprobability.com/publications/lawrence-variationalguide02.htmllawrence:variationalguide02Optimising Synchronisation Times for Mobile DevicesWith the increasing number of users of mobile computing devices (e.g. personal digital assistants) and the advent of third generation mobile phones, wireless communications are becoming increasingly important. Many applications rely on the device maintaining a *replica* of a data-structure which is stored on a server, for example news databases, calendars and e-mail. In this paper we explore the question of the optimal strategy for synchronising such replicas. We utilise probabilistic models to represent how the data-structures evolve and to model user behaviour. We then formulate objective functions which can be minimised with respect to the synchronisation timings. We demonstrate, using two real world data-sets, that a user can obtain more up-to-date information using our approach.Tue, 01 Jan 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-sync01.html
http://inverseprobability.com/publications/lawrence-sync01.htmlLawrence:sync01Sparse Bayesian Learning: The Informative Vector MachineTue, 01 Jan 2002 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-sparse02.html
http://inverseprobability.com/publications/lawrence-sparse02.htmllawrence:sparse02A Comparison of State-of-the-Art Classification Techniques with Application to CytogeneticsSeveral state-of-the-art techniques: a neural network, Bayesian neural network, support vector machine and naive Bayesian classifier are experimentally evaluated in discriminating fluorescence in-situ hybridization (FISH) signals. Highly accurate classification of signals from real data and artefacts of two cytogenetic probes (colours) is required for detecting abnormalities in the data. More than 3,100 FISH signals are classified by the techniques into colour and as real or artefact with accuracies of around 98% and 88%, respectively. The results of the comparison also show a trade-off between simplicity represented by the naive Bayesian classifier and high classification performance represented by the other techniques.Sun, 01 Apr 2001 00:00:00 +0000
http://inverseprobability.com/publications/lerner-comparison01.html
http://inverseprobability.com/publications/lerner-comparison01.htmlLerner-comparison01Probabilistic Modelling of Replica DivergenceIt is common in distributed systems to replicate data. In many cases this data evolves in a consistent fashion and this evolution can be modelled. A *probabilistic model* of the evolution allows us to estimate the divergence of the replicas and can be used by the application to alter its behaviour, for example to control synchronisation times, to determine the propagation of writes, and to convey to the user information about how much the data may have evolved. In this paper, we describe how the evolution of the data may be modelled and outline how the probabilistic model may be utilised in various applications, concentrating on a news database example.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/rowstron-sync01.html
http://inverseprobability.com/publications/rowstron-sync01.htmlRowstron:sync01The Structure of Neural Network PosteriorsExact inference in Bayesian neural networks is non analytic to compute and as a result approximate approaches such as the evidence procedure, Monte-Carlo sampling and variational inference have been proposed. In this paper we explore the structure of the posterior distributions in such a model through a new approximating distribution based on *mixtures* of Gaussian distributions and show how it may be implemented.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-structure01.html
http://inverseprobability.com/publications/lawrence-structure01.htmlLawrence:structure01Node Relevance DeterminationHierarchical Bayesian inference in parameterised models offers an approach for controlling complexity. In this paper we utilise a novel prior for the learning of a model’s structure. We call the prior *node relevance determination*. It is applicable in a range of models including sigmoid belief networks and Boltzmann machines. We demonstrate how the approach may be applied to determine structure in a multi-layer perceptron.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nrd01.html
http://inverseprobability.com/publications/lawrence-nrd01.htmlLawrence:nrd01Estimating a Kernel Fisher Discriminant in the Presence of Label NoiseData noise is present in many machine learning problem domains; some of these are well studied but others have received less attention. In this paper we propose an algorithm for constructing a kernel Fisher discriminant (KFD) from training examples with *noisy labels*. The approach allows us to associate with each example a probability of the label being flipped. We utilise an expectation maximization (EM) algorithm for updating the probabilities. The E-step uses class conditional probabilities estimated as a by-product of the KFD algorithm. The M-step updates the flip probabilities and determines the parameters of the discriminant. We have applied the approach to two real-world data-sets. The results show the feasibility of the approach.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-noisy01.html
http://inverseprobability.com/publications/lawrence-noisy01.htmlLawrence:noisy01Variational Learning for Multi-layer networks of Linear Threshold UnitsLinear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well known datasets.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ltu01.html
http://inverseprobability.com/publications/lawrence-ltu01.htmlLawrence:ltu01A Sparse Bayesian Compression Scheme — The Informative Vector MachineKernel based learning algorithms allow the mapping of a data-set into an infinite dimensional feature space in which a classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods do suffer from one unfortunate drawback: the Gram matrix contains m rows and columns where m is the number of data-points. Many operations are therefore precluded (e.g. matrix inverse $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner we are interested in estimating the amount of information each data-point contains. In the framework presented here we make use of the Bayesian methodology to determine how much information is gained from each data-point.Mon, 01 Jan 2001 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-informative01.html
http://inverseprobability.com/publications/lawrence-informative01.htmlLawrence:informative01Variational Learning for Multi-layer networks of Linear Threshold UnitsLinear threshold units were originally proposed as models of biological neurons. They were widely studied in the context of the perceptron @Rosenblatt:book62. Due to the difficulties of finding a general algorithm for networks with hidden nodes, they never passed into general use. We derive an algorithm in the context of graphical models and show how it may be applied in multi-layer networks of linear threshold units. We demonstrate the algorithm through three well known datasets.Sat, 05 Feb 2000 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ltu_report00.html
http://inverseprobability.com/publications/lawrence-ltu_report00.htmlLawrence:ltu_report00Variational Bayesian Independent Component AnalysisBlind separation of signals through the info-max algorithm may be viewed as maximum likelihood learning in a latent variable model. In this paper we present an alternative approach to maximum likelihood learning in these models, namely Bayesian inference. It has already been shown how Bayesian inference can be applied to determine latent dimensionality in principal component analysis models @Bishop:bayesPCA98. In this paper we derive a similar approach for removing unnecessary source dimensions in an independent component analysis model. We present results on a toy data-set and on some artificially mixed images.Sat, 01 Jan 2000 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-ica99.html
http://inverseprobability.com/publications/lawrence-ica99.htmlLawrence:ICA99A Variational Bayesian Committee of Neural NetworksExact inference in Bayesian neural networks is non analytic to compute and
as a result approximate approaches such as the evidence procedure, Monte-Carlo sampling
and variational inference have been proposed. In this paper we present a general
overview of the Bayesian approach with a particular emphasis on the variational
procedure. We then present a new approximating distribution based on *mixtures*
of Gaussian distributions and show how it may be implemented. We present results
on a simple toy problem and on two real world data-sets.
Fri, 17 Sep 1999 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-nnmixtures99.html
http://inverseprobability.com/publications/lawrence-nnmixtures99.htmlLawrence:nnmixtures99Mixture Representations for Inference and Learning in Boltzmann MachinesBoltzmann machines are undirected graphical models with two-state
stochastic variables, in which the logarithms of the clique
potentials are quadratic functions of the node states. They have
been widely studied in the neural computing literature, although
their practical applicability has been limited by the difficulty of
finding an effective learning algorithm. One well-established
approach, known as mean field theory, represents the stochastic
distribution using a factorized approximation. However, the
corresponding learning algorithm often fails to find a good
solution. We conjecture that this is due to the implicit
uni-modality of the mean field approximation which is therefore
unable to capture multi-modality in the true distribution. In this
paper we use variational methods to approximate the stochastic
distribution using multi-modal *mixtures* of factorized
distributions. We present results for both inference and learning to
demonstrate the effectiveness of this approach.
Thu, 01 Jan 1998 00:00:00 +0000
http://inverseprobability.com/publications/lawrence-mixtures98.html
http://inverseprobability.com/publications/lawrence-mixtures98.htmlLawrence:mixtures98Markovian inference in belief networksBayesian belief networks can represent the complicated probabilistic
processes that form natural sensory inputs. Once the parameters of
the network have been learned, nonlinear inferences about the input
can be made by computing the posterior distribution over the hidden
units (e.g., depth in stereo vision) given the input. Computing the
posterior distribution exactly is not practical in richly-connected
networks, but it turns out that by using a variational (a.k.a., mean
field) method, it is easy to find a product-form distribution that
approximates the true posterior distribution. This approximation
assumes that the hidden variables are independent given the current
input. In this paper, we explore a more powerful variational
technique that models the posterior distribution using a Markov
chain. We compare this method with inference using mean fields and
mixtures of mean fields in randomly generated networks.
Thu, 01 Jan 1998 00:00:00 +0000
http://inverseprobability.com/publications/frey-markovian98.html
http://inverseprobability.com/publications/frey-markovian98.htmlFrey:Markovian98Approximating Posterior Distributions in Belief Networks using MixturesExact inference in densely connected Bayesian networks is
computationally intractable, and so there is considerable interest
in developing effective approximation schemes. One approach which
has been adopted is to bound the log likelihood using a mean-field
approximating distribution. While this leads to a tractable
algorithm, the mean field distribution is assumed to be factorial
and hence unimodal. In this paper we demonstrate the feasibility of
using a richer class of approximating distributions based on
*mixtures* of mean field distributions. We derive an efficient
algorithm for updating the mixture parameters and apply it to the
problem of learning in sigmoid belief networks. Our results
demonstrate a systematic improvement over simple mean field theory
as the number of mixture components is increased.
Thu, 01 Jan 1998 00:00:00 +0000
http://inverseprobability.com/publications/bishop-mixtures97.html
http://inverseprobability.com/publications/bishop-mixtures97.htmlBishop:mixtures97