# Modelling in the Context of Massively Missing Data


**Max Planck Institute, Tübingen** on Mar 18, 2015

#### Abstract

In the age of large streaming data it seems appropriate to revisit the foundations of what we think of as data modelling. In this talk I’ll argue that traditional statistical approaches based on parametric models and i.i.d. assumptions are inappropriate for the type of large-scale machine learning we need to do in the age of massive streaming data sets, particularly when we realise that, however much data we have, it pales in comparison to the data we could have. This is the domain of *massively missing data*. I’ll be arguing for flexible non-parametric models as the answer. This presents a particular challenge: non-parametric models require storage of the entire data set, which is a problem for massive, streaming data. I will present a potential solution, but will perhaps end with more questions than we started with.