Neil D. Lawrence
@lawrennd
inverseprobability.com
\[ \text{data} + \text{model} \rightarrow \text{prediction}\]
Normal ML (& stats?) focus: model
In real world need more focus on: data
motivation for data science
The pervasiveness of data brings forward particular challenges.
Emerging themes: Devolving compute onto device.
Data preprocessing: Internet of Intelligence.
|                                  | machine       | human             |
|----------------------------------|---------------|-------------------|
| compute                          | ~10 gigaflops | ~1000 teraflops?  |
| communicate                      | ~1 gigabit/s  | ~100 bit/s        |
| embodiment (compute/communicate) | 10            | ~\(10^{13}\)      |
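As a rough order-of-magnitude check of the embodiment factors, using the table's own estimates:

\[ \text{machine: } \frac{\sim 10^{10}\ \text{flop/s}}{\sim 10^{9}\ \text{bit/s}} \approx 10, \qquad \text{human: } \frac{\sim 10^{15}\ \text{flop/s}}{\sim 10^{2}\ \text{bit/s}} \approx 10^{13}. \]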
Paradoxes of the Data Society
Quantifying the Value of Data
Privacy, loss of control, marginalisation
Machine Learning Systems Design
Able to quantify to a greater and greater degree the actions of individuals
But less able to characterize society
As we measure more, we understand less
Perhaps greater preponderance of data is making society itself more complex
Therefore traditional approaches to measurement are failing
Curate’s egg of a society: it is only ‘measured in parts’
Election polls (UK 2015 elections, EU referendum, US 2016 elections)
Clinical trials and personalized medicine
Social media memes
Filter bubbles and echo chambers
\[ \mathbf{Y} = \begin{bmatrix} y_{1, 1} & y_{1, 2} &\dots & y_{1,p}\\ y_{2, 1} & y_{2, 2} &\dots & y_{2,p}\\ \vdots & \vdots &\dots & \vdots\\ y_{n, 1} & y_{n, 2} &\dots & y_{n,p} \end{bmatrix} \in \Re^{n\times p} \]
\[ \mathbf{Y} = \begin{bmatrix} \mathbf{y}^\top_{1, :} \\ \mathbf{y}^\top_{2, :} \\ \vdots \\ \mathbf{y}^\top_{n, :} \end{bmatrix} \in \Re^{n\times p} \]
\[ \mathbf{Y} = \begin{bmatrix} \mathbf{y}_{:, 1} & \mathbf{y}_{:, 2} & \dots & \mathbf{y}_{:, p} \end{bmatrix} \in \Re^{n\times p} \]
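A minimal illustration of this notation, using a random numpy array as a stand-in for \(\mathbf{Y}\):

```python
import numpy as np

n, p = 6, 4                # n data points, p features
Y = np.random.randn(n, p)  # stand-in design matrix, Y in R^{n x p}

y_row = Y[0, :]  # y_{1,:}: the first data point, a p-vector
y_col = Y[:, 0]  # y_{:,1}: the first feature across all n data points
assert y_row.shape == (p,) and y_col.shape == (n,)
```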
\[p(\mathbf{Y}|\boldsymbol{\theta}) = \prod_{i=1}^n p(\mathbf{y}_{i, :}|\boldsymbol{\theta})\]
\[\log p(\mathbf{Y}|\boldsymbol{\theta}) = \sum_{i=1}^n \log p(\mathbf{y}_{i, :}|\boldsymbol{\theta})\]
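A minimal sketch of this i.i.d. log likelihood, assuming (purely for illustration) a spherical Gaussian model with \(\boldsymbol{\theta}\) as the mean:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_rows(Y, theta, sigma2=1.0):
    """log p(Y | theta) = sum_i log p(y_{i,:} | theta), assuming the
    illustrative model y_{i,:} ~ N(theta, sigma2 * I), i.i.d. over rows."""
    n, p = Y.shape
    cov = sigma2 * np.eye(p)
    return sum(multivariate_normal.logpdf(Y[i, :], mean=theta, cov=cov)
               for i in range(n))
```

For this toy model the maximum likelihood estimate of \(\boldsymbol{\theta}\) is simply the sample mean, `Y.mean(axis=0)`.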
Typically \(\boldsymbol{\theta} \in \Re^{\mathcal{O}(p)}\)
Consistency relies on a large-sample approximation of the KL divergence
\[ \text{KL}(P(\mathbf{Y})|| p(\mathbf{Y}|\boldsymbol{\theta}))\]
Minimization is equivalent to maximization of likelihood.
A foundation stone of classical statistics.
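A one-line sketch of why, with expectations taken under the true distribution \(P\):

\[ \text{KL}(P(\mathbf{Y})\,\|\,p(\mathbf{Y}|\boldsymbol{\theta})) = \mathbb{E}_{P}\left[\log P(\mathbf{Y})\right] - \mathbb{E}_{P}\left[\log p(\mathbf{Y}|\boldsymbol{\theta})\right] \]

The first term does not depend on \(\boldsymbol{\theta}\), so minimizing the KL divergence maximizes \(\mathbb{E}_{P}[\log p(\mathbf{Y}|\boldsymbol{\theta})]\); replacing the expectation with the empirical average over the observed \(\mathbf{y}_{i,:}\) recovers maximization of the log likelihood.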
For large \(p\) the parameters are badly determined.
Large \(p\) small \(n\) problem.
Easily dealt with through definition: factorize across features rather than data points.
\[p(\mathbf{Y}|\boldsymbol{\theta}) = \prod_{j=1}^p p(\mathbf{y}_{:, j}|\boldsymbol{\theta})\]
\[\log p(\mathbf{Y}|\boldsymbol{\theta}) = \sum_{j=1}^p \log p(\mathbf{y}_{:, j}|\boldsymbol{\theta})\]
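The same sketch with the factorization swapped to run over the \(p\) columns, again assuming an independent Gaussian purely for illustration (here \(\boldsymbol{\theta}\) is an \(n\)-vector):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_columns(Y, theta, sigma2=1.0):
    """log p(Y | theta) = sum_j log p(y_{:,j} | theta), with the same
    illustrative Gaussian but factorized over the p columns instead."""
    n, p = Y.shape
    cov = sigma2 * np.eye(n)
    return sum(multivariate_normal.logpdf(Y[:, j], mean=theta, cov=cov)
               for j in range(p))
```

Now the number of independent terms in the sum grows with \(p\) rather than \(n\).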
Modern measurement deals with depth (lots of detail about each subject) … or breadth (many subjects).
Massively missing data.
Classical bias towards tables.
Streaming data.
A better characterization of the human (see later)
There’s a sea of data, but most of it is undrinkable
We require data-desalination before it can be consumed!
How do we measure value in the data economy?
How do we encourage data workers: curation and management
Incentivization for sharing and production.
Quantifying the value in the contribution of each actor.
Data Readiness Levels (see also arXiv)
Three Bands of Data Readiness:
Band C - accessibility
Band B - validity
Band A - usability
Band A may also require
active collection of new data.
annotation of data by human experts (Enrica)
revisiting the collection (and running through the appropriate stages again)
Encourage greater interaction between application domains and data scientists
Encourage visualization of data
Incentivise the delivery of data.
Analogies: for software development engineers (SDEs), describe data science as debugging.
Fog computing: barrier between cloud and device blurring.
Stuxnet: Adversarial and Security implications for intelligent systems.
Complex feedback between algorithm and implementation
Major new challenge for systems designers.
Internet of Intelligence, but currently:
Systems are built componentwise from ML capabilities.
Each capability is independently constructed and verified.
Road line detection
Important for verification purposes.
Whole systems are being deployed.
But they change their environment.
That experience evolves adversarial behaviour.
There is a massive need for rapid turnaround and update.
Interface between security engineering and machine learning.
Many solutions rely on education and awareness