Setup

pods

In Sheffield we created a suite of software tools for ‘Open Data Science.’ Open data science is an approach to sharing code, models and data that should make it easier for companies, health professionals and scientists to gain access to data science techniques.

You can also check this blog post on Open Data Science.

The software can be installed using

%pip install --upgrade git+https://github.com/lawrennd/ods

in a Jupyter notebook cell. From a command prompt where you can access your Python installation, run the same command without the leading % sign.

The code is also available on GitHub: https://github.com/lawrennd/ods

Once pods is installed, it can be imported in the usual manner.

import pods

mlai

The mlai software is a suite of helper functions for teaching and demonstrating machine learning algorithms. It was first used in the Machine Learning and Adaptive Intelligence course in Sheffield in 2013.

The software can be installed using

%pip install --upgrade git+https://github.com/lawrennd/mlai.git

in a Jupyter notebook cell. From a command prompt where you can access your Python installation, run the same command without the leading % sign.

The code is also available on GitHub: https://github.com/lawrennd/mlai

Once mlai is installed, it can be imported in the usual manner.

import mlai

%pip install gpy

GPy: A Gaussian Process Framework in Python

Gaussian processes are a flexible tool for non-parametric analysis with uncertainty. The GPy software was started in Sheffield to provide an easy-to-use interface to GPs, one that allows the user to focus on the modelling rather than the mathematics.

Figure: GPy is a BSD licensed software code base for implementing Gaussian process models in Python. It is designed for teaching and modelling. We welcome contributions, which can be made through the GitHub repository https://github.com/SheffieldML/GPy

GPy is a BSD licensed software code base for implementing Gaussian process models in Python. Because it is written in Python, GPs can be combined with a wide variety of software libraries.

The software itself is available on GitHub and the team welcomes contributions.

The aim for GPy is to be a probabilistic-style programming language, i.e. you specify the model rather than the algorithm. As well as a large range of covariance functions, the software allows for non-Gaussian likelihoods, multivariate outputs, dimensionality reduction and approximations for larger data sets.
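To make the role of a covariance function concrete, here is a minimal sketch of Gaussian process regression written in plain NumPy. It is not GPy's implementation: the function names eq_cov and gp_posterior, the toy sine data and the parameter values are all assumptions chosen for illustration. It shows how choosing a covariance function and a noise level is enough to obtain a predictive mean and variance.

```python
import numpy as np


def eq_cov(X, X2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic (RBF) covariance between rows of X and X2."""
    sqdist = (np.sum(X**2, axis=1)[:, None]
              + np.sum(X2**2, axis=1)[None, :]
              - 2.0 * X @ X2.T)
    # Clip tiny negative values caused by floating point round-off.
    return variance * np.exp(-0.5 * np.maximum(sqdist, 0.0) / lengthscale**2)


def gp_posterior(X, y, X_star, noise_var=0.01, **kern_args):
    """Posterior mean and variance of a zero-mean GP at test inputs X_star."""
    K = eq_cov(X, X, **kern_args) + noise_var * np.eye(X.shape[0])
    K_star = eq_cov(X_star, X, **kern_args)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    prior_var = np.diag(eq_cov(X_star, X_star, **kern_args))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical toy data: eight samples of a sine function.
X = np.linspace(0, 5, 8)[:, None]
y = np.sin(X)

# One test point inside the data range, one far outside it.
X_star = np.array([[2.5], [10.0]])
mean, var = gp_posterior(X, y, X_star, lengthscale=1.0)
```

The predictive variance is small at the point surrounded by data and returns to the prior variance far from the data, which is the sense in which the model carries its own uncertainty.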

The documentation for GPy can be found here.

Robot Wireless Data

The robot wireless data is taken from an experiment run by Brian Ferris at the University of Washington. It consists of measurements of WiFi access point signal strengths recorded as Brian walked in a loop. It was published at IJCAI in 2007 (Ferris et al., 2007).

import pods
import numpy as np

data = pods.datasets.robot_wireless()

The ground truth is recorded in the data; the actual loop is shown in the plot below.

Robot Wireless Ground Truth

Figure: Ground truth movement for the position taken while recording the multivariate time-course of wireless access point signal strengths.

We will ignore this ground truth in making our predictions, but see if the model can recover something similar in one of the latent layers.

Robot WiFi Data

One challenge with the data is that the signal strength ‘drops out.’ The device tracks only a limited number of WiFi access points, so when an access point falls out of the tracked set its value disappears (in the plot below it reads -0.5). The data is missing, but it is not missing at random: a dropped access point must have been weak enough to fall off the list of those being tracked.
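A minimal sketch of how such dropouts might be flagged before further analysis. The matrix below is synthetic, and treating -0.5 as the placeholder value is an assumption based on how the dropout reads in the plot; the stored value in the actual data set should be checked before applying this.

```python
import numpy as np

# Assumed placeholder for a dropped reading, taken from the plot description.
DROPOUT_VALUE = -0.5

# Synthetic stand-in for the signal matrix:
# rows are time steps, columns are access points.
Y = np.array([
    [0.31, -0.50, 0.12],
    [0.28, -0.50, 0.15],
    [-0.50, 0.05, 0.19],
    [-0.50, 0.09, -0.50],
])

# Boolean mask marking the readings that dropped out.
dropped = np.isclose(Y, DROPOUT_VALUE)

# Replace the placeholder with NaN so it cannot be mistaken for a measurement.
Y_masked = np.where(dropped, np.nan, Y)

# Fraction of time steps at which each access point was untracked.
dropout_rate = dropped.mean(axis=0)

# Per access point mean over the observed readings only.
observed_mean = np.nanmean(Y_masked, axis=0)
```

Masking in this way discards the information that a dropped access point was weak, so it treats the values as if they were missing at random. That is a simplification, not a fix for the issue described above.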

Figure: Output dimension 1 from the robot wireless data. This plot shows signal strength changing over time.

Bayesian GP-LVM fit to the Robot Wireless Data

Set up a Bayesian GP-LVM with four latent dimensions.

import GPy

model = GPy.models.BayesianGPLVM(data['Y'], 4, num_inducing=25)
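As a point of reference for what four latent dimensions means in terms of array shapes, the sketch below computes a linear, PCA-style projection in plain NumPy. It is not the Bayesian GP-LVM, which learns a nonlinear mapping with uncertainty over the latent positions. The function pca_latents and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def pca_latents(Y, latent_dim):
    """Project the rows of Y onto their first latent_dim principal components."""
    Y_centred = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y_centred, full_matrices=False)
    X = U[:, :latent_dim] * s[:latent_dim]       # latent coordinates, N x latent_dim
    explained = s[:latent_dim]**2 / np.sum(s**2)  # variance fraction per component
    return X, explained


# Synthetic stand-in: 200 time steps of 30 outputs driven by 4 hidden signals.
rng = np.random.default_rng(0)
latent_true = rng.standard_normal((200, 4))
W = rng.standard_normal((4, 30))
Y = latent_true @ W + 0.01 * rng.standard_normal((200, 30))

X, explained = pca_latents(Y, 4)
```

Each row of Y (one time step of signal strengths) is summarised by one row of X with four entries. The GP-LVM plays the same role, with the linear map replaced by a Gaussian process.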

Optimize the model.

model.optimize(messages=True, max_f_eval=10000)

Figure: Visualisation of the latent space of the Bayesian GP-LVM model applied to the robot wireless data.

References

Ferris, B.D., Fox, D., Lawrence, N.D., 2007. WiFi-SLAM using Gaussian process latent variable models, in: Veloso, M.M. (Ed.), Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007). pp. 2480–2485.