On Wednesday last week I attended an “Open Meeting” organised by the UK’s EPSRC Research Council on the Alan Turing Institute. The Turing Institute is a new government initiative that stems from a letter from our Chief Scientific Advisor to our Prime Minister about the “age of algorithms”. It aims to provide an international centre of excellence in data science.

The government has provided 42 million pounds of funding (about 60-70 million dollars) and Universities interested in partnering in the Turing Institute are expected to bring 5 million pounds (8 million dollars) to the initiative, to be spent over 5 years.

It seemed clear that the EPSRC will require that the institute is located in one place, and there was much talk of ‘critical mass’. That made me wonder what ‘critical mass’ means in data science. After all, we aren’t building a large hadron collider, and one of the most interesting challenges of the new age of data is its distributed nature. I asked a question about this and was given the answers you might expect: flagship international centre of excellence, stimulating environment, attracting the best of the best etc. Nothing was particularly specific to data science.

In my own area of machine learning the UK has a lot of international recognition, but one of the features I’ve always enjoyed is the distributed nature of the expertise. The groups that spring first to mind are Cambridge (Engineering), Edinburgh (Informatics), UCL (Computer Science and Gatsby) and recently Oxford has expanded significantly (CS, Engineering and Statistics). I’ve always enjoyed the robustness that such a network of leading groups brings. It’s evolved over a period of 20 years, and those of us that have watched it grow are incredibly proud of what the UK has been able to achieve with relatively few people.

Data science requires strong interactions between statisticians and computer scientists. It requires knowledge of classical techniques and modern computational capabilities. The pool of expertise is currently rather small relative to the demand. As a result I find myself constantly in demand within my own University, mainly to advise on the capabilities of current approaches to analysis. A recent xkcd comic cleverly reminded us of how hard it can be to explain the gap between those things that are easy and those things that are virtually impossible. In many cases where advice is needed, though, it’s not the full explanation that’s required, just the knowledge. Many expensive errors can be avoided by just a little access to this knowledge. Back in July I posted a position paper targeting exactly this problem, and in Sheffield we are pursuing with vigour the “Open Data Science” agenda I proposed. Indeed, I sometimes wonder if my group is not more useful for this advice (which rarely involves any intellectual novelty) than for the ideas we push forward in our research. However, our utility as advisors is much more difficult to quantify, particularly because it often won’t lead to a formal collaboration.

I like analogies, but I think that ‘critical mass’ here is the wrong one. To give better access to expertise, what is required is a higher surface area to volume ratio, not a greater mass. Communication between experts is important, but we are fortunate in the UK to have a geographically close network of well-connected Universities. Many international visitors take the time to visit two or three of the leading groups when they are here, so I think the analogy of a lung is a far better one for describing what is required for UK data science. I’m pleased the government has recognised the importance of data science. I just hope that in their rush to create a flagship institute, with a large headline-grabbing investment figure associated, they don’t switch off the incubator that sustains our developing lungs.

Update 10th May 2015: They announced the joint venture partners, and went with the minimum suggested number of five. They selected the four groups I mentioned above (Oxford, Cambridge, UCL and Edinburgh) and also Warwick, which has probably the leading statistics department in the UK and has invested heavily in this area. These are certainly the five I would have selected from the UK, so that’s an exciting start. It will be interesting to see how they manage to bootstrap the process and engage other institutions in the UK.