## pods

In Sheffield we created a suite of software tools for ‘Open Data Science.’ Open data science is an approach to sharing code, models and data that should make it easier for companies, health professionals and scientists to gain access to data science techniques.

You can also check out this blog post on Open Data Science.

The software can be installed using

%pip install --upgrade git+https://github.com/lawrennd/ods

from within a Jupyter notebook. Outside a notebook, drop the leading % and run pip from the command prompt where you can access your Python installation.

The code is also available on GitHub: https://github.com/lawrennd/ods

Once pods is installed, it can be imported in the usual manner.

import pods

## mlai

The mlai software is a suite of helper functions for teaching and demonstrating machine learning algorithms. It was first used in the Machine Learning and Adaptive Intelligence course in Sheffield in 2013.

The software can be installed using

%pip install --upgrade git+https://github.com/lawrennd/mlai.git

from within a Jupyter notebook. Outside a notebook, drop the leading % and run pip from the command prompt where you can access your Python installation.

The code is also available on GitHub: https://github.com/lawrennd/mlai

Once mlai is installed, it can be imported in the usual manner.

import mlai

## GPy: A Gaussian Process Framework in Python

Gaussian processes are a flexible tool for non-parametric analysis with uncertainty. The GPy software was started in Sheffield to provide an easy-to-use interface to GPs, one that allows the user to focus on the modelling rather than the mathematics.

GPy is a BSD-licensed software code base for implementing Gaussian process models in Python. This allows GPs to be combined with a wide variety of software libraries.

The software can be installed using

%pip install gpy

The software itself is available on GitHub and the team welcomes contributions.

The aim for GPy is to be a probabilistic-style programming language, i.e., you specify the model rather than the algorithm. As well as a large range of covariance functions, the software allows for non-Gaussian likelihoods, multivariate outputs, dimensionality reduction and approximations for larger data sets.

The documentation for GPy can be found here.

## OSU Motion Capture Data: Run 1

Motion capture data from the Open Motion Data Project by The Ohio State University Advanced Computing Center for the Arts and Design, http://accad.osu.edu/research/mocap/mocap_data.htm.

import pods

You can download any subject and motion from the data set. Here we will download motion 01 from subject 35.

data = pods.datasets.osu_run1()

The data dictionary contains the keys ‘Y’ and ‘connect,’ which represent the data and connections that can be used to create the skeleton.

data['Y'].shape

The data has often been used in talks demonstrating GP-LVM models and comparing variants such as back constrained and temporal models.

print(data['citation'])

Extra information about the data is included, as standard, under the ‘details’ key.

print(data['details'])

## OSU Run 1 Motion Capture Data with Bayesian GP-LVM

import GPy
from GPy.models import BayesianGPLVM
import numpy as np

q = 6
kernel = GPy.kern.RBF(q, lengthscale=np.repeat(.5, q), ARD=True)
model = BayesianGPLVM(data['Y'], q,
                      init="PCA",
                      num_inducing=20,
                      kernel=kernel)

model.data = data

Variational methods decompose the lower bound on the log likelihood (the ELBO) into a term representing the expectation of the log likelihood under the approximate posterior and a KL divergence between the approximate posterior and the prior. A common local minimum is to ignore the log likelihood and set the approximate posterior equal to the prior. To avoid this we initialise with low Gaussian noise, which emphasises the expectation of the log likelihood under the posterior. Here the noise variance is set to 0.001.
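In symbols, writing $q(X)$ for the approximate posterior over the latent variables $X$, the decomposition is

$$
\log p(Y) \geq \mathbb{E}_{q(X)}\left[\log p(Y|X)\right] - \mathrm{KL}\left(q(X)\,\|\,p(X)\right),
$$

and the degenerate solution sets $q(X) = p(X)$, driving the KL term to zero while ignoring the data-fit term.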

model.likelihood.variance = 0.001
model.optimize('bfgs', messages=True, max_iters=5e3, bfgs_factor=10)