Challenges and Opportunities in Machine Learning and Artificial Intelligence

Neil D. Lawrence

2017-03-13

ARM Data Science Conference

2017-02-28

Amazon and University of Sheffield

@lawrennd inverseprobability.com

Gartner Hype Cycle

What about IoT?

Background: Big Data

  • The pervasiveness of data brings forward particular challenges.

  • Emerging themes: Devolving compute onto device.

  • Data preprocessing: Internet of Intelligence.

“Embodiment Factors”

                                    machine           human
compute                             ~10 gigaflops     ~1000 teraflops?
communicate                         ~1 gigabit/s      ~100 bit/s
embodiment (compute/communicate)    10                ~\(10^{13}\)
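
The embodiment ratios quoted above follow from dividing the compute rate by the communication rate. A minimal sketch of that arithmetic, using the slide's rough figures; the helper name `embodiment_factor` and the reading of the two columns as a machine and a human are assumptions made for illustration:

```python
def embodiment_factor(compute_flops: float, communicate_bits_per_s: float) -> float:
    """Ratio of compute rate to communication rate (hypothetical helper)."""
    return compute_flops / communicate_bits_per_s


# Rough figures from the slide; which column is machine and which is human is assumed.
machine = embodiment_factor(10e9, 1e9)       # ~10 gigaflops, ~1 gigabit/s
human = embodiment_factor(1000e12, 100.0)    # ~1000 teraflops?, ~100 bit/s

print(f"machine embodiment factor: {machine:.0f}")
print(f"human embodiment factor:   {human:.0e}")
```

The gap of roughly twelve orders of magnitude between the two ratios is the point of the slide: the human is heavily compute-rich and bandwidth-poor relative to the machine.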

Evolved Relationship

Effects

  • This phenomenon has already revolutionised biology.

  • Large scale data acquisition and distribution.

  • What does it mean for IoT?

Internet of People

  • Fog computing: barrier between cloud and device blurring.

  • Stuxnet: Adversarial and Security implications for intelligent systems.

  • Complex feedback between algorithm and implementation

Challenges

  1. Paradoxes of the Data Society

  2. Quantifying the Value of Data

  3. Privacy, loss of control, marginalisation

Wood or Tree

  • We can see either a wood or a tree.

Examples

  • Election polls (UK 2015 elections, EU referendum, US 2016 elections)

  • Clinical trials vs personalized medicine: obtaining statistical power where interventions are subtle, e.g. social media

Breadth vs Depth

  • Modern measurement deals with depth (many subjects) … or breadth (lots of detail about each subject).

  • But what about
    • \(p\approx n\)?
    • Stratification of populations: batch effects etc.
  • Will summarization be devolved to the device?

  • Advantages for privacy and latency.
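
One way to picture summarization being devolved to the device is a streaming summary, where raw readings never leave the device and only a few statistics are transmitted. The sketch below uses Welford's running mean and variance algorithm as an illustrative choice; it is not a method named in the talk, and `RunningSummary` and the sensor readings are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Streaming mean/variance via Welford's algorithm (hypothetical on-device summarizer)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Incorporate one reading without storing the raw data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two readings.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")


summary = RunningSummary()
for reading in [20.5, 21.0, 19.5, 22.0]:  # made-up sensor readings
    summary.update(reading)

# Only these three numbers would need to leave the device.
payload = (summary.count, summary.mean, summary.variance)
print(payload)
```

Sending a fixed-size payload rather than every reading is where the privacy and latency advantages would come from, at the cost of losing whatever detail the summary discards.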

Also need

  • More classical statistics!
    • Like the ‘paperless office’
  • A better characterization of human

Quantifying the Value of Data

There’s a sea of data, but most of it is undrinkable

We require data-desalination before it can be consumed!

Data — Quotes from NIPS Workshop on ML for Healthcare

  • “90% of our time is spent on validation and integration” (Leo Anthony Celi)
  • “The Dirty Work We Don’t Want to Think About” (Eric Xing)
  • “Voodoo to get it decompressed” (Francisco Giminez)
  • In health care, clinicians collect the data and often control the direction of research through guardianship of data.

Value

  • How do we measure value in the data economy?
  • How do we encourage data workers: curation and management?
    • Incentivization for sharing and production.
    • Quantifying the value in the contribution of each actor.

Embodiment: Data Readiness Levels

  • Three Bands of Data Readiness:

  • Band C - accessibility

  • Band B - validity

  • Band A - usability

Accessibility: Band C

  • Hearsay data.
  • Availability: is it actually being recorded?
  • Privacy or legal constraints on the accessibility of the recorded data: have ethical constraints been alleviated?
  • Format: log books, PDF …
  • Limitations on access due to topology (e.g. it’s distributed across a number of devices).

Validity: Band B

  • Faithfulness and representation.
  • Visualisations.
  • Noise characterisation.
  • Missing values.
  • Example: was a column or columns accidentally perturbed (e.g. through a sort operation that missed one or more columns)? Or was a gene name accidentally converted to a date?
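
Two of the Band B checks listed above, missing values and gene names that have been turned into dates, can be automated in a few lines. This is a minimal sketch under stated assumptions: the function `validity_report`, the example column, and the date pattern (spreadsheet-style strings such as "2-Sep") are all hypothetical choices for illustration, not part of the data readiness proposal itself:

```python
import re

# Assumed pattern for spreadsheet-style dates such as "2-Sep" or "1-Mar".
DATE_LIKE = re.compile(r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")


def validity_report(gene_column):
    """Return indices of missing and date-like entries in a column of gene names."""
    missing = [i for i, v in enumerate(gene_column)
               if v is None or v.strip() == ""]
    date_like = [i for i, v in enumerate(gene_column)
                 if v is not None and DATE_LIKE.match(v.strip())]
    return {"missing": missing, "date_like": date_like}


# Made-up column: "2-Sep" is what a spreadsheet may produce from a name like SEPT2.
column = ["BRCA1", "2-Sep", "", "TP53", None, "1-Mar"]
report = validity_report(column)
print(report)
```

A check like this only flags suspects; deciding whether a flagged entry really is a corrupted gene name still needs someone who knows the data.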

Usability: Band A

  • The usability of data
  • Band A is about data in context.
  • Consider appropriateness of a given data set to answer a particular question or to be subject to a particular analysis.

Recursive Effects

  • Band A may also require
    • active collection of new data.
    • annotation of data by human experts
    • revisiting the collection (and running through the appropriate stages again)

Also …

  • Encourage greater interaction between application domains and data scientists

  • Encourage visualization of data

  • Incentivise the delivery of data.

See Also …

  • Data Joel Tests proposal by Damon Civin (ARM)

Privacy, Loss of Control and Marginalization

  • Society is becoming harder to monitor

  • The individual is becoming easier to monitor

Discrimination

  • Marketing can become more sinister when the target of the marketing is well understood and the (digital) environment of the target is also well controlled.

  • Potential for explicit and implicit discrimination on the basis of race, religion, sexuality, health status

  • All prohibited under European law, but can pass unawares, or be implicit

Marginalization

  • Credit scoring, insurance, medical treatment
  • What if certain sectors of society are under-represented in our analysis?
  • What if Silicon Valley develops everything for us?
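
The under-representation question can at least be made measurable by comparing each group's share of a data set with its share of the population. A minimal sketch, assuming population shares are known; the function `representation_gaps`, the group labels, the counts, and the 0.5 tolerance are all hypothetical:

```python
def representation_gaps(sample_counts, population_shares, tolerance=0.5):
    """Flag groups whose sample share falls below tolerance * population share."""
    total = sum(sample_counts.values())
    flagged = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        if sample_share < tolerance * pop_share:
            flagged[group] = (sample_share, pop_share)
    return flagged


# Made-up data: counts in the analysed sample and shares in the wider population.
sample_counts = {"group_a": 900, "group_b": 80, "group_c": 20}
population_shares = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}

flagged = representation_gaps(sample_counts, population_shares)
for group, (sample_share, pop_share) in flagged.items():
    print(f"{group}: {sample_share:.0%} of sample vs {pop_share:.0%} of population")
```

A check of this kind detects the gap but does not repair it; that still depends on collecting more data or reweighting with care.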

Digital Revolution and Inequality?

Amelioration

  • Work to ensure individual retains control of their own data
  • We accept privacy in our real lives; we need to accept it in our digital lives
  • Control of persona and ability to project

  • Need better technological solutions: trust and algorithms.

Finally: Machine Learning Systems Design

  • Major new challenge for systems designers.

  • Internet of Intelligence, but currently:

    • AI systems are fragile

Fragility of AI Systems

  • They are built componentwise from ML capabilities.

  • Each capability is independently constructed and verified.

  • Pedestrian detection
  • Road line detection

  • Important for verification purposes.

Rapid Reimplementation

  • Whole systems are being deployed.

  • But they change their environment.

  • They experience evolved adversarial behaviour.

Machine Learning Systems Design

Turnaround And Update

  • There is a massive need for turnaround and update.

  • A redeployment of the entire system.
    • This involves changing the way we design and deploy.
  • Early Example: Stuxnet.

Conclusion

  • Data science offers a great deal of promise for personalized health
  • There are challenges and pitfalls
  • It is incumbent on us to avoid them

  • Many solutions rely on education and awareness

  • There are particular challenges around the Internet of Intelligence.

Thanks!