at St George’s Church Lecture Theatre, University of Sheffield on Sep 6, 2012
Neil D. Lawrence, Department of Computer Science, University of Sheffield

#### Abstract

What is Machine Learning? Why is it useful for us? Machine learning algorithms are the engines that are driving forward an intelligent internet. They are allowing us to uncover the causes of cancer and helping us understand the way the universe is put together. They are suggesting who your friends are on Facebook, enabling driverless cars and flagging potentially fraudulent transactions on your credit card. To put it simply, machine learning is about understanding data. In this lecture I will try to give a sense of the challenges we face in machine learning, with a particular focus on those that have inspired my research. We will look at applications of data modelling from the early 18th century to the present, and see how they relate to modern machine learning. There will be a particular focus on dealing with uncertainty: something humans are good at, but an area where computers have typically struggled. We will emphasize the role of uncertainty in data modelling and hope to persuade the audience that correct handling of uncertainty may be one of the keys to intelligent systems.

## Discovery of Ceres

On New Year’s Eve in 1800, Giuseppe Piazzi, an Italian priest, born in Lombardy but installed in a new observatory at the University of Palermo, viewed a faint smudge through his telescope.

Piazzi was building a star catalogue.

Unbeknownst to him, Piazzi was also participating in an international search, one that he’d been volunteered for by the Hungarian astronomer Franz von Zach. But without even knowing that he’d joined the search party, Piazzi had discovered their target: a new planet.

The planet’s location was a prediction. It was a missing planet: the other planets had been found to follow a formula, a law, that represented their distance from the sun,
$$a = 0.4 + 0.3 \times 2^m$$
where $m = -\infty, 0, 1, 2, \ldots$.

```python
import numpy as np

# Titius-Bode law: a = 0.4 + 0.3 * 2^m, with m = -inf, 0, 1, 2, ...
m = np.asarray([-np.inf, 0, 1, 2, 3, 4, 5, 6])
index = np.asarray(range(len(m)))
# '*' marks the predicted missing planet between Mars and Jupiter.
planets = ['Mercury', 'Venus', 'Earth', 'Mars', '*', 'Jupiter', 'Saturn', 'Uranus']
a = 0.4 + 0.3*2**m
```

When this law was published it fitted all the known planets: Mercury, Venus, Earth, Mars, Jupiter and Saturn, although there was a gap between the fourth and fifth planets (between Mars and Jupiter). In 1781 William Herschel discovered Uranus, located in the position predicted by the formula. One of the originators of the formula, Johann Elert Bode, urged astronomers to search for the missing planet, to be situated between Mars and Jupiter. Franz Xaver von Zach formed the United Astronomical Society, also known as the Celestial Police. But before the Celestial Police even managed to start their search, Piazzi, without knowing he was a member, completed it. Piazzi first observed the new planet in the early hours of January 1st, 1801. He continued to observe it over the following days. Initially he thought it might be a comet, but as he watched it he became convinced he’d found a planet. The international search was over before it had started.

Unfortunately, there was a problem. Once he’d found the planet, Piazzi promptly lost it. Piazzi was keen not just to discover the planet, but to be known as the determiner of its orbit. He took observations across the months of January and February, working to find the orbit. Unfortunately, he was unable to pin it down. He became ill, and by the time the data was revealed to the wider community through von Zach’s journal, Monatlicher Correspondenz, the new planet had been lost behind the sun.

```python
%pip install --upgrade git+https://github.com/sods/ods
import urllib.request
# Download a background image of the region of sky where Ceres was observed.
urllib.request.urlretrieve('http://server3.sky-map.org/imgcut?survey=DSS2&img_id=all&angle=4&ra=3.5&de=17.25&width=1600&height=1600&projection=tan&interpolation=bicubic&jpeg_quality=0.8&output_type=png',
                           'ceres-sky-background.png')
import pods
# Piazzi's observations of Ceres: right ascension and declination.
data = pods.datasets.ceres()
right_ascension = data['data']['Gerade Aufstig in Zeit']
declination = data['data']['Nordlich Abweich']
```

Piazzi was able to attempt to predict the orbit because of Kepler’s laws of planetary motion. Johannes Kepler had outlined the way in which planets move according to elliptical shapes, and comets move according to parabolic shapes.

Later, Isaac Newton was able to describe the underlying laws of motion that underpinned Kepler’s laws. This was the Enlightenment, an age of science and reason driven by reductionist approaches to the natural world. The Enlightenment scientists were able to read and understand each other’s work thanks to the printing press. Kepler died in 1630, 12 years before Newton was born in 1642. But Kepler’s ideas were able to influence Newton and his peers, and the understanding of gravity became an objective of the nascent Royal Society.

The sharing of information in printed form had evolved by the time of Piazzi, and the collected discoveries of the astronomical world were being shared in Franz von Zach’s monthly journal. It was here that Piazzi’s observations were eventually published, some 7 months after the planet was lost.

It was also here that a young German mathematician read about the international hunt for the lost planet. Carl Friedrich Gauss was a 23-year-old mathematician working from Göttingen. He combined Kepler’s laws with Piazzi’s data to make predictions about where the planet would be found. In doing so, he also developed the method of least squares and, incredibly, was able to fit the relatively complex model to the data with a high enough degree of accuracy that astronomers were able to look to the skies to try to recover the planet.

Almost exactly one year after it was lost, Ceres was recovered by Franz von Zach. Gauss had combined model with data to make a prediction, and in doing so a new planet was discovered (Gauss 1802; Gauss, n.d.).

It is this vital combination of model and data that underpins machine learning, but notice that here it has also been delivered through a mechanistic understanding of the way the planets move. This understanding is derived from natural laws that are explicitly incorporated into the model. Kepler’s laws derive from Newton’s mathematical representation of gravity.

But there was a problem. The laws don’t precisely fit the data.

Unfortunately, the story doesn’t end so well for Bode’s law. In 1846 Neptune was discovered, not in the place predicted by Bode’s law (it should have been closer to where Pluto was eventually found). And Ceres was found to be merely the largest object in the asteroid belt. It was recategorised as a dwarf planet.

## Overdetermined System

The challenge with a linear model is that it has two unknowns, $m$ and $c$. Observing data allows us to write down a system of simultaneous linear equations. So, for example, if we observe two data points, the first with the input value $\inputScalar_1 = 1$ and the output value $\dataScalar_1 = 3$, and a second data point, $\inputScalar_2 = 3$, $\dataScalar_2 = 1$, then we can write two simultaneous linear equations of the form:

point 1: $\inputScalar = 1$, $\dataScalar=3$
$$3 = m + c$$
point 2: $\inputScalar = 3$, $\dataScalar=1$
$$1 = 3m + c$$

The solution to these two simultaneous equations can be represented graphically as a single straight line passing exactly through both points.

The challenge comes when a third data point is observed and it doesn’t naturally fit on the straight line.

point 3: $\inputScalar = 2$, $\dataScalar=2.5$
$$2.5 = 2m + c$$

Now there are three candidate lines, one for each pair of equations we might choose to solve, and each is consistent with only two of the three data points.

This is known as an overdetermined system because there are more data than we need to determine our parameters. The problem arises because the model is a simplification of the real world, and the data we observe is therefore inconsistent with our model.
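Gauss’s method of least squares resolves the inconsistency by choosing the single line that minimizes the sum of squared errors over all the points. A minimal numerical sketch with the three points above, using NumPy’s least-squares solver:

```python
import numpy as np

# The three observed data points from above.
x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])

# Design matrix for the linear model y = m*x + c.
A = np.stack([x, np.ones_like(x)], axis=1)

# Least squares: minimize ||A @ [m, c] - y||^2.
(m, c), residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(m, c)  # no single line passes through all three points exactly
```

No choice of $m$ and $c$ satisfies all three equations simultaneously; the least-squares solution is the compromise that spreads the error across the observations.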

The solution was proposed by Pierre-Simon Laplace. His idea was to accept that the model was an incomplete representation of the real world and that the manner in which it was incomplete was unknown. Such unknowns, he suggested, could be dealt with through probability.

### Pierre-Simon Laplace

Famously, Laplace considered the idea of a deterministic Universe, one in which the model is known, or as the translation below refers to it, “an intelligence which could comprehend all the forces by which nature is animated”. He speculates on an “intelligence” that can submit this vast data to analysis and proposes that such an entity would be able to predict the future.

> Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective situation of the beings who compose it—an intelligence sufficiently vast to submit these data to analysis—it would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as the past, would be present in its eyes.

This notion is known as Laplace’s demon or Laplace’s superman.

Unfortunately, most analyses of his ideas stop at that point, whereas his real point is that such a notion is unreachable. Not so much superman as strawman. Just three pages later in the “Philosophical Essay on Probabilities” (Laplace 1814), Laplace goes on to observe:

> The curve described by a simple molecule of air or vapor is regulated in a manner just as certain as the planetary orbits; the only difference between them is that which comes from our ignorance.
>
> Probability is relative, in part to this ignorance, in part to our knowledge.

In other words, we can never make use of the idealistic deterministic Universe due to our ignorance about the world. Laplace’s suggestion, and his focus in this essay, is that we turn to probability to deal with this uncertainty. This is also our inspiration for using probability in machine learning.

The “forces by which nature is animated” is our model, the “situation of beings that compose it” is our data and the “intelligence sufficiently vast to submit these data to analysis” is our compute. The fly in the ointment is our ignorance about these aspects. And probability is the tool we use to incorporate this ignorance, leading to uncertainty or doubt in our predictions.

Laplace’s concept was that the reason that the data doesn’t match up to the model is because of unconsidered factors, and that these might be well represented through probability densities. He tackles the challenge of the unknown factors by adding a variable, $\noiseScalar$, that represents the unknown. In modern parlance we would call this a latent variable. But in the context Laplace uses it, the variable is so common that it has other names such as a “slack” variable or the noise in the system.

point 1: $\inputScalar = 1$, $\dataScalar=3$
$$3 = m + c + \noiseScalar_1$$
point 2: $\inputScalar = 3$, $\dataScalar=1$
$$1 = 3m + c + \noiseScalar_2$$
point 3: $\inputScalar = 2$, $\dataScalar=2.5$
$$2.5 = 2m + c + \noiseScalar_3$$

Laplace’s trick has converted the overdetermined system into an underdetermined system. He has now added three variables, $\{\noiseScalar_i\}_{i=1}^3$, which represent the unknown corruptions of the real world. Laplace’s idea is that we should represent that unknown corruption with a probability distribution.
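The underdetermined nature of the new system can be sketched numerically: once the noise variables are included, any choice of $m$ and $c$ becomes consistent with the data, because the $\noiseScalar_i$ absorb whatever mismatch remains. The first pair of values below is the least-squares solution for these points; the second is entirely arbitrary.

```python
import numpy as np

# The three observed data points from the overdetermined system.
x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])

def noise_for(m, c):
    # The slack variables epsilon_i = y_i - (m*x_i + c) make every
    # equation y_i = m*x_i + c + epsilon_i hold exactly.
    return y - (m*x + c)

# Two different choices of (m, c): both are consistent once noise is added.
for m, c in [(-1.0, 12.5/3), (0.0, 2.0)]:
    eps = noise_for(m, c)
    assert np.allclose(m*x + c + eps, y)
    print(m, c, eps)
```

Because infinitely many $(m, c, \noiseScalar_1, \noiseScalar_2, \noiseScalar_3)$ combinations now fit the data, something more is needed to prefer one over another; that is the role the probability distribution over the noise plays.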

## A Probabilistic Process

However, it was left to Carl Friedrich Gauss to develop a practical probability density for that purpose: he suggested that the Gaussian density (which at the time was unnamed!) should be used to represent the error.
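In modern notation, writing the noise variance as $\sigma^2$ (a symbol introduced here for the sketch, not used above), the zero-mean Gaussian density for the error takes the form
$$p(\noiseScalar) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\noiseScalar^2}{2\sigma^2}\right),$$
so small corruptions are considered likely and large ones increasingly improbable.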

The result is a noisy function, a function which has a deterministic part, and a stochastic part. This type of function is sometimes known as a probabilistic or stochastic process, to distinguish it from a deterministic process.
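A minimal sketch of such a noisy function, with slope, intercept and noise standard deviation chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic part: a straight line (illustrative values of m and c).
m, c = -1.0, 4.0
x = np.linspace(0, 4, 9)
f = m*x + c

# Stochastic part: Gaussian noise with an (assumed) standard deviation of 0.5.
sigma = 0.5
y = f + sigma*rng.standard_normal(x.shape)
print(y)
```

Each call produces a different draw from the process: the deterministic part is fixed, while the stochastic part changes with every sample.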

## Hydrodynamica

When Laplace spoke of the curve of a simple molecule of air, he may well have been thinking of Daniel Bernoulli (1700-1782). Daniel Bernoulli was one name in a prodigious family. His father and brother were both mathematicians. Daniel’s main work was known as “Hydrodynamica”.

Daniel Bernoulli described a kinetic theory of gases, but it wasn’t until 170 years later that these ideas were verified, after Einstein had proposed a model of Brownian motion which was experimentally confirmed by Jean Baptiste Perrin.

```python
import numpy as np

# Samples from a standard normal (e.g. for a histogram of the density).
p = np.random.randn(10000, 1)
# Evaluate the standard Gaussian density on a grid.
xlim = [-4, 4]
x = np.linspace(xlim[0], xlim[1], 200)
y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x*x)
```

James Clerk Maxwell, another important figure for Cambridge, was the first to derive the probability distribution that results from small balls banging together in this manner. In doing so, he founded the field of statistical physics.

Many of the ideas of the early statistical physicists were rejected by a cadre of physicists who didn’t believe in the notion of a molecule. The stress of trying to have his ideas established contributed to Ludwig Boltzmann’s suicide in 1906, only two years before the same ideas became widely accepted.

The important point about the uncertainty being represented here is that it is not genuine stochasticity, it is a lack of knowledge about the system. The techniques proposed by Maxwell, Boltzmann and Gibbs allow us to exactly represent the state of the system through a set of parameters that represent the sufficient statistics of the physical system. We know these values as the volume, temperature and pressure. The challenge for us, when approximating the physical world with the techniques we will use, is that we will have to sit somewhere between the deterministic and purely stochastic worlds that these different scientists described.

One ongoing characteristic of people who study probability and uncertainty is the confidence with which they hold opinions about it. Arthur Eddington, another leading Cambridge physicist, expressed his support of the second law of thermodynamics (which can be proven through the work of Gibbs and Boltzmann) with an emphatic statement at the beginning of his book (Eddington 1929).

The same Eddington is also, unfortunately, famous for dismissing the ideas of a young Subrahmanyan Chandrasekhar, who had come to Cambridge to study. Chandrasekhar demonstrated the limit at which a star would collapse under its own weight to a singularity, but when he presented the work to Eddington, the response was dismissive, suggesting that there “must be some natural law that prevents this abomination from happening”.

Presumably he meant that the creation of a black hole seemed to transgress the second law of thermodynamics, although later Hawking was able to show that black holes do evaporate; the time scales at which this evaporation occurs are simply many orders of magnitude slower than other processes in the universe.


## Data Fit Term

[Figure: artificial gene expression data for three target genes (left), and the transcription factor activity inferred from them, shown against the ground truth with error bars of two standard deviations (right).]

Eddington, Arthur Stanley. 1929. The Nature of the Physical World. Dent (London). https://doi.org/10.2307/2180099.

Gauss, Carl Friedrich. 1802. “Astronomische Untersuchungen und Rechnungen vornehmlich über die Ceres Ferdinandea.”

———. n.d. “Fortgesetzte Nachrichten über Den Längst Vermutheten Neuen Haupt-Planeten Unseres Sonnen-Systems.” In, 638–49.

Goodsell, David S. 1999. “The Molecular Perspective: P53 Tumor Suppressor.” The Oncologist 4 (2): 138–39.

Laplace, Pierre Simon. 1814. Essai Philosophique Sur Les Probabilités. 2nd ed. Paris: Courcier.

Mikhailov, G. K. n.d. “Daniel Bernoulli, Hydrodynamica (1738).” In.

Piazzi, Giuseppe. n.d. “Fortgesetzte Nachrichten über Den Längst Vermutheten Neuen Haupt-Planeten Unseres Sonnen-Systems.” In, 279–83.