Data Science: Where Computation and Statistics Meet?
Neil D. Lawrence
Amazon Cambridge and University of Sheffield
@lawrennd
inverseprobability.com
Background
- Data is a pervasive phenomenon that affects all aspects of our activities
- Data diffusiveness is both a challenge and an opportunity
Evolved Relationship
Societal Effects
- Automated decision making within the computer based only on the data
- A requirement to better understand our own subjective biases, so that the human-to-computer interface draws the correct conclusions from the data
This process has already revolutionised biology
Shift in dynamic from a direct pathway between human and data to an indirect pathway between human and data via the computer
This change of dynamics gives us the modern and emerging domain of data science
Challenges
Paradoxes of the Data Society
Quantifying the Value of Data
Privacy, loss of control, marginalization
Paradoxes of the Data Society
Breadth vs Depth Paradox
Able to quantify to a greater and greater degree the actions of individuals
But less able to characterize society
As we measure more, we understand less
Wood or Tree
- We can see either a wood or a tree.
- Examples
- 2015 UK election polls
- Clinical trial and personalized medicine
Rapidly Evolving Digital Society
- Causes
- Social media memes
- Filter bubbles and echo chambers
- Better stratification of populations, giving fewer trial subjects per stratum and so less statistical power
- Curate’s egg of a society: it is only ‘measured in parts’
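The link between finer stratification and lost power can be illustrated with a rough calculation. The sketch below is an illustration only: it assumes a two-sided, two-sample z-test with a normal approximation, and the trial size (400 subjects per arm), the effect size (0.3) and the function name `approx_power` are hypothetical choices, not figures from the talk.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_arm: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    Uses the normal approximation and ignores the negligible
    contribution of the opposite rejection tail.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized mean difference scaled by the per-arm sample size.
    noncentrality = effect_size * sqrt(n_per_arm / 2)
    return std_normal.cdf(noncentrality - z_crit)


# Hypothetical trial (assumed numbers): 400 subjects per arm, effect size 0.3.
total_per_arm = 400
effect_size = 0.3

# Splitting the same population into more strata leaves fewer subjects
# per stratum, so each stratum-level comparison has less power.
for n_strata in (1, 4, 16):
    n = total_per_arm // n_strata
    power = approx_power(n, effect_size)
    print(f"{n_strata:2d} strata, {n:3d} per arm each: power ~ {power:.2f}")
```

Under these assumed numbers, the same effect that is detected almost surely in the pooled population is detected well under half the time once the population is split into 16 strata.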
Quantifying the Value of Data
There’s a sea of data, but most of it is undrinkable
We require data-desalination before it can be consumed!
Value
- How do we measure value in the data economy?
- How do we encourage data workers in curation and management?
- Incentivization
- Quantifying the value in their contribution
Solutions
Encourage greater interaction between application domains and data scientists
Encourage visualization of data
Adoption of ‘data readiness levels’
Implications for incentivization schemes
Privacy, Loss of Control and Marginalization
Hate Speech or Political Dissent?
- Social media monitoring for ‘hate speech’ can easily be turned to monitoring political dissent
Marketing
- Can become more sinister when the target of the marketing is well understood and the target's (digital) environment is also tightly controlled
Free Will
- What does it mean if a computer can predict our individual behavior better than we ourselves can?
Discrimination
Potential for explicit and implicit discrimination on the basis of race, religion, sexuality, health status
All prohibited under European law, but discrimination can pass unnoticed or be implicit
Marginalization
- Credit scoring, insurance, medical treatment
- What if certain sectors of society are under-represented in our analysis?
- What if Silicon Valley develops everything for us?
Digital Revolution and Inequality?
Amelioration
- Work to ensure individuals retain control of their own data
- We accept privacy in our real lives; we need to accept it in our digital lives too
- Control of persona and ability to project
Awareness
- Need to increase awareness of the pitfalls among researchers
- Need to ensure that technological solutions are being delivered not merely for the few (#FirstWorldProblems)
- Address a wider set of challenges that the greater part of the world’s population is facing
Conclusion
- Data science offers a great deal of promise
- There are challenges and pitfalls
- It is incumbent on us to avoid them