Data Science: Where Computation and Statistics Meet?

RSS Meeting, September 6th

Neil D. Lawrence

Amazon Cambridge and University of Sheffield

@lawrennd inverseprobability.com

Background

  • Data is a pervasive phenomenon that affects all aspects of our activities
  • Data diffusiveness is both a challenge and an opportunity

Evolved Relationship

Societal Effects

  • Automated decision-making within the computer, based only on the data
  • A requirement to better understand our own subjective biases, to ensure that the human-to-computer interface draws the correct conclusions from the data

Societal Effects

  • This process has already revolutionised biology

  • Shift in dynamic from the direct pathway between human and data to an indirect pathway between human and data via the computer

  • This change of dynamics gives us the modern and emerging domain of data science

Challenges

  1. Paradoxes of the Data Society

  2. Quantifying the Value of Data

  3. Privacy, loss of control, marginalization

Paradoxes of the Data Society

Breadth vs Depth Paradox

  • Able to quantify the actions of individuals to an ever greater degree

  • But less able to characterize society

  • As we measure more, we understand less

Wood or Tree

  • We can see either a wood or a tree.

What?

  • Perhaps the greater preponderance of data is making society itself more complex

  • Therefore traditional approaches to measurement are failing

Wood or Tree

  • Examples
    • 2015 UK election polls
    • Clinical trials and personalized medicine

Rapidly Evolving Digital Society

  • Causes
    • Social media memes
    • Filter bubbles and echo chambers
    • Better stratification of populations, giving fewer trial subjects per stratum and therefore less statistical power
  • Curate’s egg of a society: it is only ‘measured in parts’
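The power point above can be made concrete with a small calculation. The sketch below is a minimal illustration and rests on several assumptions. It uses a normal-approximation (z-test) power formula for a two-arm trial with a known standardized effect size (Cohen's d). The fixed budget of 400 subjects per arm is illustrative, as is the effect size of 0.2. That budget is split evenly across an increasing number of strata. The function name two_sample_power is hypothetical and is not taken from any particular study or library.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumptions (illustrative only): equal arm sizes, known unit variance,
    and effect_size given as a standardized mean difference (Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Number of standard errors separating the true difference from zero.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    total_per_arm = 400  # hypothetical fixed recruitment budget per arm
    effect = 0.2         # hypothetical small standardized effect
    for strata in (1, 2, 4, 8, 16):
        n = total_per_arm // strata  # subjects per arm within one stratum
        power = two_sample_power(effect, n)
        print(f"{strata:>2} strata, {n:>3} per arm: power = {power:.2f}")
```

Each time the population is split more finely, the subjects available per stratum shrink. As a result, the chance of detecting the same effect within any one stratum drops sharply. This is one way of reading "better stratification ... less power".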

Solutions

  • More classical statistics!
    • Like the ‘paperless office’
  • A better characterization of human needs and flaws

  • Larger studies (e.g. the 100,000 Genomes Project)

Quantifying the Value of Data

There’s a sea of data, but most of it is undrinkable

We require data-desalination before it can be consumed!

Value

  • How do we measure value in the data economy?
  • How do we encourage data workers in curation and management?
    • Incentivization
    • Quantifying the value in their contribution

Credit Allocation

  • Direct work on data generates an enormous amount of ‘value’ in the data economy, but this value goes unaccounted for in the economy

  • Hard because data is difficult to ‘embody’

Solutions

  • Encourage greater interaction between application domains and data scientists

  • Encourage visualization of data

  • Adoption of ‘data readiness levels’

  • Implications for incentivization schemes

Privacy, Loss of Control and Marginalization

  • Society is becoming harder to monitor

  • Individual is becoming easier to monitor

Hate Speech or Political Dissent?

  • Social media monitoring for ‘hate speech’ can easily be turned to monitoring political dissent

Marketing

  • Marketing can become more sinister when its target is well understood and the target’s (digital) environment is also well controlled

Free Will

  • What does it mean if a computer can predict our individual behavior better than we ourselves can?

Discrimination

  • Potential for explicit and implicit discrimination on the basis of race, religion, sexuality or health status

  • All prohibited under European law, but such discrimination can pass unnoticed or be implicit

Marginalization

  • Credit scoring, insurance, medical treatment
  • What if certain sectors of society are under-represented in our analysis?
  • What if Silicon Valley develops everything for us?

Digital Revolution and Inequality?

Amelioration

  • Work to ensure individual retains control of their own data
  • We accept privacy in our real lives; we need to accept it in our digital lives too
  • Control of our persona and the ability to project it

Awareness

  • Need to increase awareness of the pitfalls among researchers
  • Need to ensure that technological solutions are being delivered not merely for the few (#FirstWorldProblems)
  • Address a wider set of challenges that the greater part of the world’s population is facing

Conclusion

  • Data science offers a great deal of promise
  • There are challenges and pitfalls
  • It is incumbent on us to avoid them

Thanks!