From Innovation to Deployment
Auto AI and Machine Learning Systems Design
Neil D. Lawrence
2019-10-24
Data Science Africa, Ashesi University
Project Description
It used to be true that computers only did what we programmed them to do, but today AI systems are learning from our data. This introduces new problems in how these systems respond to their environment.
We need to better monitor how data is influencing decision making and take corrective action as required.
Aim
Scale safe and reliable AI solutions.
Move from Auto ML to Auto AI
Bayesian Optimisation to Bayesian System Optimisation
SafeBoda
With road accidents set to match HIV/AIDS as the highest cause of death in low/middle income countries by 2030, SafeBoda’s aim is to modernise informal transportation and ensure safe access to mobility.
Turing AI Fellowship
and
Your Name Here
Inclusive Project
There is no way that the team we’re building will be able to deliver on this agenda alone, so please join us in addressing these challenges!
Announcement
Five year program in collaboration with
and
Ride Allocation Prediction
The Promise of AI
Automation forces humans to adapt: we serve the machine.
We can only automate by systemizing and controlling the environment.
AI promises to be the first wave of automation that adapts to us rather than us to it.
That Promise …
… will remain unfulfilled with current systems design.
Artificial vs Natural Systems
Consider natural intelligence, or natural systems
Contrast between an artificial system and a natural system.
The key difference between the two is that artificial systems are designed whereas natural systems are evolved.
Natural Systems are Evolved
Survival of the fittest
Herbert Spencer, 1864
Natural Systems are Evolved
Non-survival of the non-fit
Mistake we Make
Equate fitness with the objective function.
Assume static environment and known objective.
Technical Consequence
Classical systems design assumes decomposability.
Data-driven systems interfere with decomposability.
Bits and Atoms
The gap between the game and reality.
The need for extrapolation over interpolation.
Computer Science Paradigm Shift
Von Neumann architecture:
Code and data integrated in memory
Today:
Code and data separated for security
Computer Science Paradigm Shift
Machine learning:
Machine learning is a high level breach of the code/data separation.
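A minimal sketch of what that breach looks like in practice: the code below is identical in both cases, yet the decision it makes depends entirely on the data it was fitted to. The function names and the toy midpoint rule are illustrative assumptions, not part of the talk.

```python
def fit_threshold(examples):
    """Learn a 1-D decision threshold: the midpoint between the two class means."""
    positives = [x for x, label in examples if label]
    negatives = [x for x, label in examples if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2.0


def make_decider(examples):
    """The returned 'program' is determined by the data, not by the source code."""
    threshold = fit_threshold(examples)
    return lambda x: x > threshold


# Same code, two different datasets (hypothetical values).
data_a = [(1.0, False), (2.0, False), (8.0, True), (9.0, True)]
data_b = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]
decide_a = make_decider(data_a)
decide_b = make_decider(data_b)

print(decide_a(4.0))  # False: threshold learned from data_a is 5.0
print(decide_b(4.0))  # True: threshold learned from data_b is 2.5
```

Reviewing the source alone cannot tell you which of the two behaviours is deployed; the data has to be inspected as well.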
An Intelligent System
Joint work with M. Milo
Peppercorns
A new name for system failures which aren’t bugs.
Difference between finding a fly in your soup vs a peppercorn in your soup.
The Three Ds of Machine Learning Systems Design
Three primary challenges of Machine Learning Systems Design.
Decomposition
Data
Deployment
Premise
Our machine learning is based on a software systems view that is 20 years out of date.
Continuous Deployment
Deployment of modeling code.
Data-dependent models in production need continuous monitoring.
Continuous monitoring implies statistical tests rather than classic software tests.
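As one hedged illustration of a statistical test standing in for a classic pass/fail software test, the sketch below asks whether a newly deployed model's accuracy on a labelled sample is significantly below an agreed baseline. The one-sided normal approximation, the function name and the numbers are assumptions chosen for the example.

```python
import math


def accuracy_regression_pvalue(correct, total, baseline_accuracy):
    """One-sided p-value for H0: true accuracy >= baseline (normal approximation)."""
    observed = correct / total
    std_err = math.sqrt(baseline_accuracy * (1.0 - baseline_accuracy) / total)
    z = (observed - baseline_accuracy) / std_err
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical release gate: block deployment if accuracy has clearly regressed.
p_clear_drop = accuracy_regression_pvalue(correct=870, total=1000, baseline_accuracy=0.9)
p_small_drop = accuracy_regression_pvalue(correct=895, total=1000, baseline_accuracy=0.9)
print(p_clear_drop < 0.01)  # True: 87% on 1000 examples is unlikely under a 90% baseline
print(p_small_drop < 0.01)  # False: 89.5% is within sampling noise
```

Unlike an equality assertion, the outcome depends on sample size and on a significance level that has to be chosen.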
Continuous Monitoring
Continuous deployment:
We’ve changed the code, we should test the effect.
Continuous Monitoring:
The world around us is changing, we should monitor the effect.
Update our notions of testing: progression testing
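Monitoring a changing world can be sketched as a comparison between a reference sample and live data. The example below uses a two-sample Kolmogorov-Smirnov statistic as one possible drift check; the choice of test, the function names and the threshold (the usual large-sample critical value at roughly the 5% level) are assumptions, not something the talk prescribes.

```python
import bisect
import math


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    max_gap = 0.0
    for point in sorted(set(ref) | set(cur)):
        cdf_ref = bisect.bisect_right(ref, point) / len(ref)
        cdf_cur = bisect.bisect_right(cur, point) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drift_detected(reference, live, coefficient=1.36):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(live)
    critical = coefficient * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > critical


# Hypothetical feature values: training-time sample vs. two live samples.
training_sample = list(range(100))
unchanged_live = list(range(100))
shifted_live = [value + 50 for value in range(100)]

print(drift_detected(training_sample, unchanged_live))  # False
print(drift_detected(training_sample, shifted_live))    # True
```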
Data Oriented Architectures
Convert data to a first-class citizen.
View system as operations on data streams.
Expose data operations in a programmatic way.
Data Oriented Architectures
Historically we’ve been software first
A necessary but not sufficient condition for data first
Move from service oriented architectures to data oriented architectures
Streaming System
Move from pull updates to push updates.
Operate on rows rather than columns.
Leads to stateless logic: persistence handled by system.
Example: Apache Kafka + Apache Flink
Streaming Architectures
AWS Kinesis, Apache Kafka
Not just about streaming
Nodes in the architecture are stateless
They persist through storing state on streams
This brings the data inside out
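A plain-Python sketch of the idea that a node keeps no state of its own and instead recovers it from the stream. The event log here is an in-memory list standing in for a Kafka or Kinesis topic, and the account-balance events are hypothetical.

```python
from functools import reduce


def apply_event(balances, event):
    """Pure transition: returns a new state and never mutates the old one."""
    account, amount = event
    updated = dict(balances)
    updated[account] = updated.get(account, 0) + amount
    return updated


def current_state(event_stream):
    """A stateless node rebuilds its view by folding over the stream from the start."""
    return reduce(apply_event, event_stream, {})


# The stream is the source of truth; any node can be restarted and replay it.
log = [("alice", 10), ("bob", 5), ("alice", -3)]
print(current_state(log))      # {'alice': 7, 'bob': 5}
print(current_state(log[:2]))  # {'alice': 10, 'bob': 5}
```

Because the state is a function of the stream, two nodes reading the same stream agree by construction, and the history stays available for inspection.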
Apache Flink
Streams and transformations
a stream is a (potentially never-ending) flow of data records
a transformation: streams as input, produces transformed streams as output
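The two definitions above can be mimicked with Python generators: a generator that never terminates plays the role of the stream, and a function from generator to generator plays the role of the transformation. This is an analogy in plain Python rather than Flink code, and the record fields are made up for the example.

```python
import itertools


def price_stream():
    """A (potentially never-ending) flow of data records."""
    for tick in itertools.count():
        yield {"tick": tick, "price": 100.0 + tick}


def with_change(stream):
    """Transformation: takes a stream as input, produces a transformed stream."""
    previous = None
    for record in stream:
        if previous is not None:
            yield {"tick": record["tick"], "change": record["price"] - previous["price"]}
        previous = record


# Records are processed lazily, so an unbounded input is fine.
first_three = list(itertools.islice(with_change(price_stream()), 3))
print(first_three)
```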
Join
stream.join(otherStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(<WindowAssigner>)
    .apply(<JoinFunction>)
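To make the roles of the placeholders concrete, here is a small plain-Python sketch of what a keyed join over tumbling windows computes. It works on finite lists rather than live streams and does not use Flink; the argument names mirror the template loosely (`left_key` and `right_key` for the two key selectors, `window_size` for the window assigner, `join` for the join function), and the ride-request data is hypothetical.

```python
from collections import defaultdict


def tumbling_window_join(left, right, left_key, right_key, window_size, join):
    """Join records that share a key and fall in the same tumbling window.

    Each stream is a list of (timestamp, record) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, record in right:
        buckets[(timestamp // window_size, right_key(record))].append(record)
    joined = []
    for timestamp, record in left:
        window_and_key = (timestamp // window_size, left_key(record))
        for match in buckets.get(window_and_key, []):
            joined.append(join(record, match))
    return joined


requests = [(1, {"zone": "A", "rider": "r1"}), (7, {"zone": "A", "rider": "r2"})]
drivers = [(3, {"zone": "A", "driver": "d1"}), (4, {"zone": "B", "driver": "d2"})]
pairs = tumbling_window_join(
    requests,
    drivers,
    left_key=lambda request: request["zone"],
    right_key=lambda driver: driver["zone"],
    window_size=5,
    join=lambda request, driver: (request["rider"], driver["driver"]),
)
print(pairs)  # [('r1', 'd1')]
```

Only `r1` and `d1` share both a zone and a window; `r2` arrives in the next window and `d2` is in a different zone.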
Milan
Data Oriented Programming Language and runtime.
DSL embedded in Scala converts to an intermediate language.
Intermediate language for compilation on different platforms (currently Flink)
https://github.com/amzn/milan
Trading System
High frequency share trading.
Stream of prices with millisecond updates.
Trades required on a millisecond timeline.
Hypothetical Streams
Real stream: share prices
Derived hypothetical stream: share prices in the future.
Hypothetical constrained by
input constraints.
decision functional
computational requirements (latency)
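A rough sketch of a hypothetical stream derived from a real one. The predictor here is a deliberately simple linear extrapolation of the last two prices, chosen only because it is cheap enough to fit a tight latency budget; it is a stand-in assumption, not the model the talk has in mind, and the constraints listed above are not modelled.

```python
def hypothetical_prices(prices, horizon=1):
    """Derive a stream of predicted future prices from a stream of real prices.

    Prediction = last price + horizon * (last price - previous price).
    """
    previous = None
    for price in prices:
        if previous is not None:
            yield price + horizon * (price - previous)
        previous = price


real_prices = [100.0, 101.0, 103.0]
predicted = list(hypothetical_prices(real_prices, horizon=2))
print(predicted)  # [103.0, 107.0]
```

The derived stream has the same shape as the real one, so downstream components can consume it without knowing that a model produced it.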
Hypothetical Advantage
Modelling is now required.
But modelling is declared in the ecosystem.
If it’s manual, warnings can be used
calibration, fairness, dataset shift
Opens door to Auto AI.
Ride Sharing: Service Oriented
Ride Sharing: Data Oriented
Ride Sharing: Hypothetical
Bayesian System Optimization
Aim: maintain interpretable components.
Monitor downstream/upstream effects through emulation.
Optimize individual components considering upstream and downstream.
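One way to picture emulation is a cheap surrogate fitted to a handful of observations of a downstream metric as a function of an upstream setting. The sketch below uses the posterior mean of a Gaussian process with an RBF kernel, written in plain Python so it stays self-contained. It is an assumption-laden toy: a fuller Bayesian treatment would also use the posterior variance, and the settings and costs are invented.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_emulator(inputs, outputs, lengthscale=1.0, noise=1e-6):
    """Return the GP posterior mean as a function of a new input."""
    gram = [[rbf(a, b, lengthscale) + (noise if i == j else 0.0)
             for j, b in enumerate(inputs)] for i, a in enumerate(inputs)]
    weights = solve(gram, outputs)
    return lambda x: sum(w * rbf(x, xi, lengthscale) for w, xi in zip(weights, inputs))


# Hypothetical: downstream cost observed at four settings of an upstream component.
settings = [0.0, 1.0, 2.0, 3.0]
downstream_cost = [4.0, 1.0, 0.0, 1.0]
emulator = fit_emulator(settings, downstream_cost)

# Query the emulator instead of the real system when choosing a setting.
candidates = [i * 0.25 for i in range(13)]
best_setting = min(candidates, key=emulator)
print(best_setting)
```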
Auto AI
Auto ML is great but not sufficient
Interacting components in an ML system
Identify problems, and automatically deploy solutions
Conclusion
Challenges in decomposition, data and model deployment for ML.
Data oriented architectures and data first thinking are the solution.
Data oriented programming creates systems that are ready to deploy.
Opens the door to Auto AI and information dynamics analysis.