Neil D. Lawrence
@lawrennd
inverseprobability.com
\[ \text{data} + \text{model} \rightarrow \text{prediction}\]
... of two different domains
Data Science: arises from the fact that we now capture data by happenstance.
Artificial Intelligence: emulation of human behaviour.
| compute | ~10 gigaflops | ~1000 teraflops? |
| communicate | ~1 gigabit/s | ~100 bit/s |
| (compute/communicate) | 10 | ~\(10^{13}\) |
How does machine learning work?
Jumper (jersey/sweater) purchase with logistic regression
\[ \text{odds} = \frac{\text{bought}}{\text{not bought}} \]
\[ \log \text{odds} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\]
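Inverting the log-odds recovers the purchase probability; spelling out this standard identity shows where the logistic prediction function below comes from:
\[ p(\text{bought}) = \frac{\text{odds}}{1 + \text{odds}} = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right)\right)}\]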
\[ p(\text{bought}) = {f}\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right)\]
\[ p(\text{bought}) = {f}\left(\boldsymbol{\beta}^\top {{\bf {x}}}\right)\]
We call \({f}(\cdot)\) the prediction function
\[{E}(\boldsymbol{\beta}, {\mathbf{Y}}, {{\bf X}})\]
\[{E}(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left({y}_i - {f}({{\bf {x}}}_i)\right)^2\]
Prediction function, \({f}(\cdot)\)
Objective function, \({E}(\cdot)\)
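A minimal sketch of this setup: the logistic prediction function \(f(\cdot)\) and the squared-error objective \(E(\boldsymbol{\beta})\) from the slides, fitted to synthetic age/latitude/purchase data (the data and parameter values are illustrative assumptions, not the talk's).

```python
# The jumper-purchase model from the slides: a logistic prediction function
# f(beta^T x) and the squared-error objective E(beta). The age/latitude/purchase
# data below are synthetic and purely illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 80, n)
latitude = rng.uniform(-60, 70, n)
X = np.column_stack([np.ones(n), age, latitude])   # constant column for beta_0
true_beta = np.array([-3.0, 0.02, 0.05])           # assumed generating parameters
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))  # 1 = bought a jumper

def f(z):
    """Prediction function: logistic link mapping beta^T x to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def E(beta):
    """Objective function: sum of squared errors between labels and predictions."""
    return np.sum((y - f(X @ beta)) ** 2)

beta_hat = minimize(E, np.zeros(3)).x
print("fitted beta:", beta_hat)
```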
These are interpretable models: vital for applications such as disease diagnosis.
Modern machine learning methods are less interpretable
Example: face recognition
Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Color illustrates feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.
Source: DeepFace
Image from Wikimedia Commons http://bit.ly/16kMKHQ
Can a Deep Gaussian process help?
A deep GP is one GP feeding into another.
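A minimal sketch of that idea, assuming nothing beyond NumPy: draw a sample function from one GP and use its outputs as the inputs of a second GP, giving a draw from a two-layer deep GP prior.

```python
# Composing two GPs: sample h ~ GP at inputs x, then sample f ~ GP at the
# values of h, giving a draw f(h(x)) from a two-layer deep GP prior.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Exponentiated quadratic covariance between 1-D input arrays a and b."""
    sqdist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 200)

# Layer 1: a sample from a GP prior over functions of x.
K1 = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))
h = rng.multivariate_normal(np.zeros(len(x)), K1)

# Layer 2: a GP whose inputs are the outputs of layer 1.
K2 = rbf_kernel(h, h, lengthscale=0.5) + 1e-8 * np.eye(len(h))
f = rng.multivariate_normal(np.zeros(len(h)), K2)
```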
Challenges in deploying AI.
Currently this is in the form of "machine learning systems"
Major new challenge for systems designers.
Deep nets are a powerful approach to images, speech, and language.
Proposal: deep GPs may also be a great approach, but they are better deployed according to their natural strengths.
Probabilistic numerics, surrogate modelling, emulation, and UQ.
Not a fan of AI as a term.
But we are faced with increasing amounts of algorithmic decision making.
When trading off decisions: do we spend compute, or acquire more data?
There is a critical need for quantifying uncertainty.
Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known.
Designing an F1 Car requires CFD, Wind Tunnel, Track Testing etc.
How to combine them?
\[{{\bf {x}}}_{t+1} = {f}({{\bf {x}}}_{t},\textbf{u}_{t})\]
where \(\textbf{u}_t\) is the action force and \({{\bf {x}}}_t = (p_t, v_t)\) is the vehicle state (position and velocity)
\[\pi({{\bf {x}}},\theta) = \theta_0 + \theta_p p + \theta_v v.\]
\[\theta^* = \arg\max_{\theta} R_T(\theta).\]
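A sketch of that optimization as a generic Bayesian optimization loop (GP surrogate plus expected improvement); `episode_return` is a hypothetical stand-in for rolling out the simulator with policy \(\pi({{\bf {x}}}, \theta)\) and returning \(R_T(\theta)\), not the talk's implementation.

```python
# A generic Bayesian optimization loop over the policy parameters theta.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def episode_return(theta):
    """Stand-in reward: a smooth synthetic function, NOT the real simulator return."""
    return -np.sum((theta - np.array([0.3, -0.2, 0.5])) ** 2)

rng = np.random.default_rng(0)
bounds = np.array([[-1.0, 1.0]] * 3)                      # search box for theta
Theta = rng.uniform(bounds[:, 0], bounds[:, 1], (5, 3))   # initial design
R = np.array([episode_return(t) for t in Theta])

for _ in range(30):
    # Fit a GP surrogate to the rewards observed so far.
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(Theta, R)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], (1000, 3))
    mu, sd = gp.predict(cand, return_std=True)
    # Expected improvement over the best reward seen so far (maximization).
    z = (mu - R.max()) / np.maximum(sd, 1e-9)
    ei = (mu - R.max()) * norm.cdf(z) + sd * norm.pdf(z)
    theta_next = cand[np.argmax(ei)]
    Theta = np.vstack([Theta, theta_next])
    R = np.append(R, episode_return(theta_next))

theta_star = Theta[np.argmax(R)]
print("best theta found:", theta_star)
```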
Standard Bayesian optimization ignores the dynamics of the car.
For more data efficiency, first emulate the dynamics.
Then do Bayesian optimization of the emulator.
Use a Gaussian process to model \[\Delta v_{t+1} = v_{t+1} - v_{t}\] and \[\Delta p_{t+1} = p_{t+1} - p_{t}\]
Two processes: one with mean \(v_{t}\), one with mean \(p_{t}\).
Used 500 randomly selected points to train emulators.
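A minimal sketch of such an emulator, assuming scikit-learn GPs and placeholder transition data in place of real simulator calls: two GPs map \((p_t, v_t, u_t)\) to the changes in position and velocity.

```python
# Two GP emulators of the car dynamics: one for the change in velocity, one for
# the change in position. The 500 transition samples below are placeholders
# standing in for calls to the real simulator.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
# Random (p, v, u) states and actions; the ranges are illustrative assumptions.
S = rng.uniform([-1.2, -0.07, -1.0], [0.6, 0.07, 1.0], (500, 3))
# Placeholder dynamics used only to generate training targets for this sketch.
delta_v = 0.0015 * S[:, 2] - 0.0025 * np.cos(3.0 * S[:, 0])
delta_p = S[:, 1] + delta_v

gp_v = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(S, delta_v)
gp_p = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(S, delta_p)

def emulate_step(p, v, u):
    """One-step prediction: mean p_t / v_t plus the GP-modelled change."""
    x = np.array([[p, v, u]])
    return p + gp_p.predict(x)[0], v + gp_v.predict(x)[0]

print(emulate_step(-0.5, 0.0, 1.0))
```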
Can make the process more efficient through experimental design.
Our emulator used only 500 calls to the simulator.
Optimizing the simulator directly required 37,500 calls to the simulator.
500 calls to the simulator vs 37,500 calls to the simulator
Linear multi-fidelity model:
\[{f}_i\left({{\bf {x}}}\right) = \rho{f}_{i-1}\left({{\bf {x}}}\right) + \delta_i\left({{\bf {x}}}\right)\]
Nonlinear multi-fidelity model:
\[{f}_i\left({{\bf {x}}}\right) = {g}_{i}\left({f}_{i-1}\left({{\bf {x}}}\right)\right) + \delta_i\left({{\bf {x}}}\right)\]
250 observations of the high-fidelity simulator and 250 of the low-fidelity simulator.
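A minimal sketch of the linear multi-fidelity model above, with synthetic placeholder functions standing in for the low- and high-fidelity simulators and a generic GP regressor in place of the talk's tooling.

```python
# The linear multi-fidelity model f_high(x) = rho * f_low(x) + delta(x), using
# generic GP regressors. f_low and f_high are synthetic placeholders standing
# in for the low- and high-fidelity simulators.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f_low = lambda x: np.sin(8.0 * x)                      # cheap, approximate
f_high = lambda x: 1.5 * np.sin(8.0 * x) + 0.2 * x     # expensive, accurate

X_low = rng.uniform(0.0, 1.0, (250, 1))
X_high = rng.uniform(0.0, 1.0, (250, 1))
y_low, y_high = f_low(X_low).ravel(), f_high(X_high).ravel()

# Step 1: emulate the low-fidelity simulator.
gp_low = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X_low, y_low)

# Step 2: estimate rho by least squares and emulate the residual delta(x).
low_at_high = gp_low.predict(X_high)
rho = low_at_high @ y_high / (low_at_high @ low_at_high)
gp_delta = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(
    X_high, y_high - rho * low_at_high)

def predict_high(x):
    """Multi-fidelity prediction: rho * (low-fidelity emulator) + residual emulator."""
    return rho * gp_low.predict(x) + gp_delta.predict(x)

print(predict_high(np.array([[0.5]])))
```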
Artificial Intelligence and Data Science are fundamentally different.
In one you are dealing with data collected by happenstance.
In the other you are trying to build systems in the real world, often by actively collecting data.
Through our approaches to systems design we are building powerful machines that will be deployed in evolving environments.