Data First Culture

Post-Digital Transformation and Intellectual Debt

Neil D. Lawrence

Advanced Leadership Programme, Judge Business School, Cambridge

Henry Ford’s Faster Horse

Introduction

Neil Lawrence
Professor of Machine Learning

The Gartner Hype Cycle

Cycle for ML Terms

What is Machine Learning?

\[ \text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

  • data : observations, could be actively or passively acquired (meta-data).
  • model : assumptions, based on previous experience (other data! transfer learning etc), or beliefs about the regularities of the universe. Inductive bias.
  • prediction : an action to be taken or a categorization or a quality score.

What is Machine Learning?

\[\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}\]

  • To combine data with a model we need:
    • a prediction function \(f(\cdot)\), which encodes our beliefs about the regularities of the universe
    • an objective function \(E(\cdot)\), which defines the cost of misprediction.
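
A minimal sketch of the two ingredients above, assuming a straight-line prediction function and a sum-of-squares objective. Both choices, the helper `fit`, and the four data points are hypothetical illustrations, not taken from the talk.

```python
def f(x, w):
    """Prediction function: a straight line, our assumed regularity."""
    return w[0] + w[1] * x


def E(w, xs, ys):
    """Objective function: sum of squared errors, the cost of misprediction."""
    return sum((y - f(x, w)) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys):
    """Compute step: closed-form minimiser of E for the straight-line model."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return (intercept, slope)


# Hypothetical, noise-free observations lying on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_fit = fit(xs, ys)          # data + model, combined by compute
prediction = f(4.0, w_fit)   # prediction at a new input
```

Here the data are `xs` and `ys`, the model is the pair `f` and `E`, the compute is `fit`, and the result is a prediction at an input that was not observed.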

Artificial Intelligence and Data Science

  • AI aims to equip computers with human capabilities
    • Image understanding
    • Computer vision
    • Speech recognition
    • Natural language understanding
    • Machine translation

Supervised Learning for AI

  • Dominant approach today:
    • Generate large labelled data set from humans.
    • Use supervised learning to emulate that data.
      • E.g. ImageNet (Russakovsky et al., 2015)
  • Significant advances due to deep learning
    • E.g. Alexa, Amazon Go
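
The recipe above can be sketched in a few lines, assuming a toy nearest-neighbour classifier in place of deep learning. The two-dimensional features, the labels and the function name `nearest_neighbour` are hypothetical, chosen only to show human-provided labels being emulated on new inputs.

```python
def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label


# Hypothetical data set: feature vectors paired with labels supplied by humans.
labelled = [
    ((0.0, 0.0), "cat"),
    ((0.2, 0.1), "cat"),
    ((1.0, 1.0), "dog"),
    ((0.9, 1.2), "dog"),
]

# The learner emulates the human labelling on inputs it has not seen.
guess_a = nearest_neighbour(labelled, (0.1, 0.0))
guess_b = nearest_neighbour(labelled, (1.1, 0.9))
```

The classifier can only reproduce the labelling it was given, which is why this approach depends on large, human-generated data sets.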

Data Science

  • Arises from happenstance data.
  • Differs from statistics in that the question comes after data collection.

Exercise: Score Yourself

  • I am a data science:
  1. follower (no visibility/influence)
  2. some visibility/influence
  3. visibility and some influence
  4. leader (lead on data and AI developments)

Intellectual Debt

Information and Embodiment

Claude Shannon

Embodiment Factors

                          Computer      Human
bits/min                  billions      2,000
billion calculations/s    ~100          a billion
embodiment                20 minutes    5 billion years

Evolved Relationship with Information

New Flow of Information

Evolved Relationship

There are three types of lies: lies, damned lies and statistics

Benjamin Disraeli 1804-1881

There are three types of lies: lies, damned lies and ‘big data’

Neil Lawrence 1972-?

Mathematical Statistics

‘Mathematical Data Science’

Heider and Simmel (1944)

The Big Data Paradox

  • We collect more data, but we understand less.

Wood or Tree

Big Model Paradox

  • Add complexity to the model to make it realistic.
  • Move the model “beyond human intuition”.
  • But the model still falls well short of the mark in representing reality.

Complexity in Action

Data Selective Attention Bias

BMI Steps Data

BMI Steps Data Analysis

A Hypothesis as a Liability

“ ‘When someone seeks,’ said Siddhartha, ‘then it easily happens that his eyes see only the thing that he seeks, and he is able to find nothing, to take in nothing. […] Seeking means: having a goal. But finding means: being free, being open, having no goal.’ ”

Hermann Hesse

The Scientific Process

Number Theatre

Data Theatre

The Art of Statistics

David Spiegelhalter

Conclusion

See the gorilla, don’t be the gorilla.

Thanks!

Heider, F., Simmel, M., 1944. An experimental study of apparent behavior. The American Journal of Psychology 57, 243–259. https://doi.org/10.2307/1416950
Lawrence, N.D., 2010. Introduction to learning and inference in computational systems biology.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L., 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 211–252. https://doi.org/10.1007/s11263-015-0816-y