Challenges and Opportunities in Machine Learning and Artificial Intelligence

Neil D. Lawrence


ARM Data Science Conference


Amazon and University of Sheffield


Gartner Hype Cycle

What about IoT?

Background: Big Data

  • The pervasiveness of data brings forward particular challenges.

  • Emerging themes: Devolving compute onto device.

  • Data preprocessing: Internet of Intelligence.

“Embodiment Factors”

                       machine          human
compute                ~10 gigaflops    ~1000 teraflops?
communicate            ~1 gigabit/s     ~100 bit/s
compute/communicate    10               ~\(10^{13}\)
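
The last row above appears to be the ratio of compute rate to communication rate. A minimal sketch of that arithmetic, assuming the first column of figures describes a machine and the second a human (the labels and the function name `embodiment_factor` are illustrative, not from the slides):

```python
# Rough estimates quoted on the slide; the "machine"/"human" labels are an
# assumed reading of the two columns.
ESTIMATES = {
    "machine": {"compute_flops": 10e9, "communicate_bits_per_s": 1e9},
    "human": {"compute_flops": 1000e12, "communicate_bits_per_s": 100.0},
}


def embodiment_factor(compute_flops, communicate_bits_per_s):
    """Ratio of compute rate to communication rate."""
    return compute_flops / communicate_bits_per_s


factors = {name: embodiment_factor(**v) for name, v in ESTIMATES.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.0e}")
```

On these estimates the two ratios differ by roughly twelve orders of magnitude, which is the gap the slide draws attention to.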

Evolved Relationship


  • This phenomenon has already revolutionised biology.

  • Large scale data acquisition and distribution.

  • What does it mean for IoT?

Internet of People

  • Fog computing: barrier between cloud and device blurring.

  • Stuxnet: Adversarial and Security implications for intelligent systems.

  • Complex feedback between algorithm and implementation.


  1. Paradoxes of the Data Society

  2. Quantifying the Value of Data

  3. Privacy, loss of control, marginalisation

Wood or Tree

  • Can either see a wood or a tree.


  • Election polls (UK 2015 elections, EU referendum, US 2016 elections)

  • Clinical trials vs personalized medicine: Obtaining statistical power where interventions are subtle. e.g. social media

Breadth vs Depth

  • Modern measurement deals with depth (many subjects) … or breadth (lots of detail about each subject).

  • But what about
    • \(p\approx n\)?
    • Stratification of populations: batch effects etc.
  • Will summarization be devolved to the device?

  • Advantages for privacy and latency.
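
One way summarization could be devolved to the device is to keep only running statistics locally and transmit those instead of raw readings. The sketch below uses Welford's streaming mean and variance algorithm as an example; the class name, the payload fields and the sensor readings are all hypothetical, and the slides do not prescribe any particular method.

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Streaming count, mean and variance (Welford's algorithm)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        """Fold one new reading into the summary without storing it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def report(self) -> dict:
        """The only payload that would leave the device in this sketch."""
        variance = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        return {"n": self.n, "mean": self.mean, "variance": variance}


summary = RunningSummary()
for reading in [20.0, 21.0, 19.0, 22.0]:  # hypothetical sensor readings
    summary.update(reading)
payload = summary.report()
print(payload)
```

Only three numbers are sent however many readings are taken, which is where the latency benefit would come from; the raw readings never leave the device, although a summary alone is not a privacy guarantee.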

Also need

  • More classical statistics!
    • Like the ‘paperless office’
  • A better characterization of human

Quantifying the Value of Data

There’s a sea of data, but most of it is undrinkable

We require data-desalination before it can be consumed!

Data — Quotes from NIPS Workshop on ML for Healthcare

  • 90% of our time is spent on validation and integration (Leo Anthony Celi)
  • “The Dirty Work We Don’t Want to Think About” (Eric Xing)
  • “Voodoo to get it decompressed” (Francisco Giminez)
  • In health care clinicians collect the data and often control the direction of research through guardianship of data.


  • How do we measure value in the data economy?
  • How do we encourage data workers in curation and management?
    • Incentivization for sharing and production.
    • Quantifying the value in the contribution of each actor.

Embodiment: Data Readiness Levels

  • Three Bands of Data Readiness:

  • Band C - accessibility

  • Band B - validity

  • Band A - usability

Accessibility: Band C

  • Hearsay data.
  • Availability: is it actually being recorded?
  • Privacy or legal constraints on the accessibility of the recorded data: have ethical constraints been alleviated?
  • Format: log books, PDF …
  • Limitations on access due to topology (e.g. it’s distributed across a number of devices).

Validity: Band B

  • Faithfulness and representation.
  • Visualisations.
  • Noise characterisation.
  • Missing values.
  • Example: was a column or columns accidentally perturbed (e.g. through a sort operation that missed one or more columns)? Or was a gene name accidentally converted to a date?
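
Two of the validity checks listed above, missing values and gene names converted to dates, can be automated. The sketch below is one possible check, not a method from the talk: it assumes rows arrive as dictionaries, that spreadsheet auto-formatting turns symbols such as SEPT2 or MARCH1 into strings like "2-Sep" or "1-Mar", and that a simple pattern is enough to flag them. The function name, column name and sample rows are hypothetical.

```python
import re

# Spreadsheet-style date strings such as "2-Sep" or "1-Mar", which is what
# gene symbols like SEPT2 or MARCH1 can become after auto-formatting.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def validity_report(rows, gene_column):
    """Count missing values and flag date-like entries in a gene column."""
    missing = 0
    suspicious = []
    for index, row in enumerate(rows):
        for value in row.values():
            if value is None or str(value).strip() == "":
                missing += 1
        gene = row.get(gene_column)
        if gene is not None and DATE_LIKE.match(str(gene).strip()):
            suspicious.append((index, gene))
    return {"missing": missing, "date_like_genes": suspicious}


rows = [  # hypothetical sample data
    {"gene": "BRCA1", "expression": 2.3},
    {"gene": "2-Sep", "expression": None},
    {"gene": "1-Mar", "expression": 0.7},
]
report = validity_report(rows, "gene")
print(report)
```

A check like this only flags candidates; deciding whether "2-Sep" was originally a gene symbol still needs someone who knows the data.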

Usability: Band A

  • The usability of data
  • Band A is about data in context.
  • Consider appropriateness of a given data set to answer a particular question or to be subject to a particular analysis.

Recursive Effects

  • Band A may also require
    • active collection of new data.
    • annotation of data by human experts
    • revisiting the collection (and running through the appropriate stages again)

Also …

  • Encourage greater interaction between application domains and data scientists

  • Encourage visualization of data

  • Incentivise the delivery of data.

See Also …

  • Data Joel Tests proposal by Damon Civin (ARM)

Privacy, Loss of Control and Marginalization

  • Society is becoming harder to monitor

  • Individual is becoming easier to monitor


  • Marketing can become more sinister when the target of the marketing is well understood and the target’s (digital) environment is also tightly controlled.

  • Potential for explicit and implicit discrimination on the basis of race, religion, sexuality, health status

  • All are prohibited under European law, but discrimination can pass unnoticed or be implicit.


  • Credit scoring, insurance, medical treatment
  • What if certain sectors of society are under-represented in our analysis?
  • What if Silicon Valley develops everything for us?

Digital Revolution and Inequality?


  • Work to ensure individual retains control of their own data
  • We accept privacy in our real lives; we need to accept it in our digital lives too.
  • Control of persona and ability to project

  • Need better technological solutions: trust and algorithms.

Finally: Machine Learning Systems Design

  • Major new challenge for systems designers.

  • Internet of Intelligence, but:

    • AI systems are currently fragile

Fragility of AI Systems

  • They are built componentwise from ML capabilities.

  • Each capability is independently constructed and verified.

  • Pedestrian detection
  • Road line detection

  • Important for verification purposes.

Rapid Reimplementation

  • Whole systems are being deployed.

  • But they change their environment.

  • They experience evolved adversarial behaviour.

Machine Learning Systems Design

Turnaround And Update

  • There is a massive need for turnaround and update.

  • A redeploy of the entire system.
    • This involves changing the way we design and deploy.
  • Early Example: Stuxnet.


  • Data science offers a great deal of promise for personalized health
  • There are challenges and pitfalls
  • It is incumbent on us to avoid them

Many solutions rely on education and awareness

  • There are particular challenges around the Internet of Intelligence.