Outlook for UK AI and Machine Learning

Neil D. Lawrence

2018-05-11

HM Treasury

The Gartner Hype Cycle

Gartner Hype Cycle

What is Machine Learning?

\[ \text{data} + \text{model} \xrightarrow{\text{compute}} \text{prediction}\]

data : observations, could be actively or passively acquired (meta-data).

model : assumptions, based on previous experience (other data! transfer learning etc), or beliefs about the regularities of the universe. Inductive bias.

prediction : an action to be taken or a categorization or a quality score.

Royal Society Report: Machine Learning: The Power and Promise of Computers that Learn by Example

What is Machine Learning?

\[\text{data} + \text{model} \xrightarrow{\text{compute}} \text{prediction}\]

To combine data with a model we need:
- a prediction function \(\mappingFunction (\cdot)\), which includes our beliefs about the regularities of the universe
- an objective function \(\errorFunction (\cdot)\), which defines the cost of misprediction.

Embodiment Factors


	computer	human
compute	\[\approx 100 \text{ gigaflops}\]	\[\approx 16 \text{ petaflops}\]
communicate	\[1 \text{ gigabit/s}\]	\[100 \text{ bit/s}\]
(compute/communicate)	\[10^{4}\]	\[10^{14}\]

See “Living Together: Mind and Machine Intelligence” Lawrence (2017)

Evolved Relationship

Societal Effects

This phenomenon has already revolutionised biology.
- Large scale data acquisition and distribution.
- Transcriptomics, genomics, epigenomics, ‘rich phenomics’.
Great promise for personalized health.

Societal Effects

Automated decision making within the computer based only on the data.
Subjective biases need to be better understood.
Particularly important where treatments are being prescribed.
- Interventions could be far more subtle.

Societal Effects

Shift in dynamic:
- from direct human-data to indirect human-computer-data
- modern data analysis is mediated by the machine
This change of dynamics gives us the modern and emerging domain of data science

Human Communication

For sale: baby shoes, never worn

Heider and Simmel (1944)

There are three types of lies: lies, damned lies and statistics

Benjamin Disraeli 1804-1881

There are three types of lies: lies, damned lies and ‘big data’

Neil Lawrence 1972-?

Mathematical Statistics

‘Mathematical Data Science’

Machine Learning

Driver of two different domains:
1. Data Science: arises from the fact that we now capture data by happenstance.
2. Artificial Intelligence: emulation of human behaviour.
Connection: Internet of Things

Machine Learning

Driver of two different domains:
1. Data Science: arises from the fact that we now capture data by happenstance.
2. Artificial Intelligence: emulation of human behaviour.
Connection: Internet of People

Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (1981/1/28)

What does Machine Learning do?

ML Automates through Data
- Strongly related to statistics.
- Field underpins revolution in data science and AI
With AI:
- logic, robotics, computer vision, speech
With Data Science:
- databases, data mining, statistics, visualization

What does Machine Learning do?

Automation scales by codifying processes and automating them.
Need:
- Interconnected components
- Compatible components
Early examples:
- cf. Colt 45, Ford Model T

Codify Through Mathematical Functions

How does machine learning work?
Jumper (jersey/sweater) purchase with logistic regression

\[ \text{odds} = \frac{p(\text{bought})}{p(\text{not bought})} \]

\[ \log \text{odds} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}.\]
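A minimal sketch of the odds and log-odds relationship above. The coefficient values and the example customer (age 40, latitude 52) are hypothetical, chosen only to make the arithmetic concrete; they are not fitted to any data.

```python
import math

def odds(p_bought):
    """Odds of a purchase: p(bought) / p(not bought)."""
    return p_bought / (1.0 - p_bought)

def log_odds(age, latitude, beta_0, beta_1, beta_2):
    """Linear model for the log odds of a purchase."""
    return beta_0 + beta_1 * age + beta_2 * latitude

# Hypothetical coefficients, for illustration only.
beta_0, beta_1, beta_2 = -4.0, 0.02, 0.05

z = log_odds(age=40, latitude=52.0, beta_0=beta_0, beta_1=beta_1, beta_2=beta_2)
purchase_odds = math.exp(z)                       # undo the log
p_bought = purchase_odds / (1.0 + purchase_odds)  # undo the odds ratio

print(f"log odds = {z:.3f}, odds = {purchase_odds:.3f}, p(bought) = {p_bought:.3f}")
```

Inverting the two steps (exponentiate, then normalise) recovers a probability from the linear model, which is the logistic sigmoid written out by hand.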

Codify Through Mathematical Functions

How does machine learning work?
Jumper (jersey/sweater) purchase with logistic regression

\[ p(\text{bought}) = \sigmoid{\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}}.\]

Codify Through Mathematical Functions

How does machine learning work?
Jumper (jersey/sweater) purchase with logistic regression

\[ p(\text{bought}) = \sigmoid{\boldsymbol{\beta}^\top \inputVector}.\]
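The vector form can be sketched in a few lines. This assumes the input vector carries a leading 1 so that the offset \(\beta_0\) is absorbed into the inner product; the coefficient values are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, beta):
    """Prediction function: sigmoid of the inner product beta^T x."""
    return sigmoid(sum(b * x_j for b, x_j in zip(beta, x)))

beta = [-4.0, 0.02, 0.05]  # hypothetical [beta_0, beta_1, beta_2]
x = [1.0, 40.0, 52.0]      # [1, age, latitude]; the leading 1 pairs with beta_0

p = predict(x, beta)
print(f"p(bought) = {p:.3f}")
```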

Codify Through Mathematical Functions

How does machine learning work?
Jumper (jersey/sweater) purchase with logistic regression

\[ \dataScalar = \mappingFunction\left(\inputVector, \boldsymbol{\beta}\right).\]

We call \(\mappingFunction(\cdot)\) the prediction function.

Fit to Data

Use an objective function

\[\errorFunction(\boldsymbol{\beta}, \dataMatrix, \inputMatrix)\]

E.g. least squares \[\errorFunction(\boldsymbol{\beta}, \dataMatrix, \inputMatrix) = \sum_{i=1}^\numData \left(\dataScalar_i - \mappingFunction(\inputVector_i, \boldsymbol{\beta})\right)^2.\]
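A sketch of fitting to data with the least squares objective above, assuming a sigmoid prediction function and plain gradient descent as the optimiser. The toy data set, step size and iteration count are illustrative assumptions, not taken from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, beta):
    """Prediction function f(x, beta)."""
    return sigmoid(sum(b * x_j for b, x_j in zip(beta, x)))

def objective(beta, y, X):
    """Least squares objective E(beta, y, X): sum of squared errors."""
    return sum((y_i - predict(x_i, beta)) ** 2 for y_i, x_i in zip(y, X))

def gradient(beta, y, X):
    """Gradient of the objective with respect to beta."""
    grad = [0.0] * len(beta)
    for y_i, x_i in zip(y, X):
        f_i = predict(x_i, beta)
        # Chain rule: d/dbeta_j (y - f)^2 = -2 (y - f) f (1 - f) x_j
        scale = -2.0 * (y_i - f_i) * f_i * (1.0 - f_i)
        for j, x_ij in enumerate(x_i):
            grad[j] += scale * x_ij
    return grad

# Toy data: one rescaled input feature plus a constant 1 for the offset.
X = [[1.0, v] for v in (-1.0, -0.6, -0.2, 0.2, 0.6, 1.0)]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

beta = [0.0, 0.0]
initial_error = objective(beta, y, X)
for _ in range(200):
    g = gradient(beta, y, X)
    beta = [b - 0.1 * g_j for b, g_j in zip(beta, g)]  # step size 0.1
final_error = objective(beta, y, X)

print(f"objective before: {initial_error:.3f}, after: {final_error:.3f}")
```

The two components are kept separate on purpose: `predict` could be swapped for a different prediction function, and `objective` for a different cost of misprediction, without changing the fitting loop's structure.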

Two Components

Prediction function, \(\mappingFunction(\cdot)\)
Objective function, \(\errorFunction(\cdot)\)

Deep Learning

These are interpretable models: vital for disease modeling etc.
Modern machine learning methods are less interpretable
Example: face recognition

DeepFace

Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Color illustrates feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.

Source: DeepFace (Taigman et al., 2014)

Deep Learning as Pinball

Uncertainty and Learning

In this “vanilla” form these machines “don’t know when they don’t know”.
Doubt is vital in real world decision making.
Incorporating this in systems is a long-standing focus of my technical research.
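A small illustration of "not knowing when you don't know", using a sigmoid prediction function with hypothetical coefficients. The model returns a single number for any input, however implausible, with no accompanying measure of doubt.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, beta):
    """Point prediction only: no measure of uncertainty is returned."""
    return sigmoid(sum(b * x_j for b, x_j in zip(beta, x)))

beta = [-4.0, 0.02, 0.05]  # hypothetical [beta_0, beta_1, beta_2]

typical = predict([1.0, 40.0, 52.0], beta)   # age 40: plausible input
absurd = predict([1.0, 4000.0, 52.0], beta)  # age 4000: far outside any data

print(f"typical input: {typical:.3f}")
print(f"absurd input:  {absurd:.3f}")
```

The second prediction is close to certainty even though no data could have supported it; the vanilla model has no way to express that the question is outside its experience.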

Comparison with Human Learning & Embodiment

The emulation of intelligence does not exhibit all the meta-modelling humans perform.

Data Science and Professionalisation

Industrial Revolution 4.0?
Industrial Revolution (1760-1840) term coined by Arnold Toynbee (1852-1883).
Maybe: But this one is dominated by data not capital
A revolution in information rather than energy.
That presents challenges and opportunities
Consider Apple vs Nokia: How you handle disruption.

compare digital oligarchy vs how Africa can benefit from the data revolution

A Time for Professionalisation?

New technologies historically led to new professions:
- Brunel (born 1806): Civil, mechanical, naval
- Tesla (born 1856): Electrical and power
- William Shockley (born 1910): Electronic
- Watts S. Humphrey (born 1927): Software

Why?

Codification of best practice.
Developing trust

Where are we?

Perhaps around the 1980s of programming.
- We understand if, for, and procedures
- But we don’t share best practice.
Let’s avoid the over-formalisation of software engineering.

The Software Crisis

The major cause of the software crisis is that the machines have become several orders of magnitude more powerful! To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.

Edsger Dijkstra (1930-2002), The Humble Programmer

The Data Crisis

The major cause of the data crisis is that machines have become more interconnected than ever before. Data access is therefore cheap, but data quality is often poor. What we need is cheap high-quality data. That implies that we develop processes for improving and verifying data quality that are efficient.

There would seem to be two ways for improving efficiency. Firstly, we should not duplicate work. Secondly, where possible we should automate work.

Me

Information Barons threaten our Privacy

Our current information infrastructure bears a close relation to feudal systems of government. In the feudal system a lord had a duty of care over his serfs and vassals, a duty to protect subjects. But in practice there was a power-asymmetry. In feudal days protection was against Viking raiders; today it is against information raiders. However, when there is an information leak, when there is some failure in protections, it is already too late.

Alternatively, our data is publicly shared, as in an information commons, akin to the common land of the medieval village. But just as commons were subject to overgrazing and poor management, so it is that much of our data cannot be managed in this way, in particular personal, sensitive data.

I explored this idea further in Lawrence (2015b).

Rest of the Talk

Importance of data infrastructure

Understanding Patient Data

WannaCry

Bush Pilot Model

The difference between capability and intent.

Motivation

Insidious decision-making that has downstream instrumental effects we don’t control.
A power-asymmetry between data-controllers and data-subjects
A loss of personhood in the re-representation of ourselves in the digital world.
The GDPR’s endeavour to curb contractual freedom cannot by itself reverse the power-asymmetry between data-controllers and data-subjects.

Analogy

Digital Democracy vs Digital Oligarchy Lawrence (2015a) or Information Feudalism Lawrence (2015b)
Data subjects, data controllers and data processors.

Legal Mechanism of Trusts

Fiduciary responsibility of Trustees.
- Burden of proof in negligence is reversed.
Trustees are data controllers
Beneficiaries are data subjects
Power of data accumulation wielded on the beneficiaries’ behalf
See Edwards (2004), Delacroix and Lawrence (2019) and Lawrence (2016)

Thanks!

twitter: @lawrennd
podcast: The Talking Machines
Guardian article on information barons threatening autonomy and privacy online
Guardian article on Data Trusts
Guardian article on Digital Oligarchies
Guardian article on Information Feudalism
Blog post on What is Machine Learning?
Blog post on System Zero
Blog post on Lies, Damned Lies and Big Data
Andrej Karpathy’s Medium Post

Delacroix, S., Lawrence, N.D., 2019. Bottom-up data trusts: Disturbing the “one size fits all” approach to data governance. International Data Privacy Law. https://doi.org/10.1093/idpl/ipz014

Edwards, L., 2004. The problem with privacy. International Review of Law Computers & Technology 18, 263–294.

Lawrence, N.D., 2017. Living together: Mind and machine intelligence. arXiv.

Lawrence, N.D., 2016. Data trusts could allay our privacy fears.

Lawrence, N.D., 2015a. Beware the rise of the digital oligarchy.

Lawrence, N.D., 2015b. The information barons threaten our autonomy and our privacy.

Taigman, Y., Yang, M., Ranzato, M., Wolf, L., 2014. DeepFace: Closing the gap to human-level performance in face verification, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.2014.220