Data Analysis, NHS and Industrial Partners

Update: I tidied up some of these thoughts and on May 5th they were published in a Guardian article here

I was asked by samim and Sam Shead to pass comment on a recent development in the DeepMind Health agenda (in particular this article written by Sam Shead).

As I mentioned on Twitter, I’m not close to these sorts of assessments, so I only have general thoughts and concerns (some of which might be allayed if I knew how these processes were being conducted).

I don’t have a particular issue with DeepMind; they seem like nice people and they’ve got some great stuff going on there. I’ve also met Mustafa and I think he has a genuine interest in helping out here. They also interact very nicely with the academic community. But not all companies are DeepMind, and in the end DeepMind are subject to a wider regime through Google and the general demand for companies to deliver to shareholders.

There is some history to this. The NHS has a scheme, care.data, that suffered a disastrous launch as they only offered a one-time opt-out to patients. They had also been selling data to insurers.

You can read about the scheme here. I was optimistic about this idea and the potential of access to such data. The UK science/health journalist and doctor Ben Goldacre was also very optimistic about the potential for this scheme. However, this is his reaction to the scheme as it was being deployed. And I felt exactly the same. First the optimism and then the horror of the slow car crash.

Now I actually suspect that there are a number of lessons learned from that fiasco. But it was absolutely shocking the extent to which those in charge of the scheme were insensitive to the challenges.

It is one of my main research goals to be able to access such data and produce insights on disease. But I don’t want to do it under just any circumstances. The current system is that the NHS is the arbiter of our data, but they seem ill-equipped. We should have some control of our own data. I had hoped that care.data would draw people’s attention to the quantity of data the NHS held and raise their interest in ensuring something useful was done with it. I could then imagine individuals allowing private companies to access their data on the recommendation of their doctor or in response to particular developments in their own health status. It doesn’t seem to me that this is the way things are panning out.

There is another component to the challenge, one that is particularly difficult with health data. I believe that there is a moral duty to ensure that we are applying ‘best of class’ approaches to this data. This is vital to ensure we are obtaining the best individual outcomes. But as we’ve already seen from the field of computational biology, ‘best of class’ requires a community effort. In a rapidly moving field it’s highly unlikely that any individual company has all the correct people to provide the right solutions without any interaction with the wider international community. The wider community also provides oversight and comment. But this requires working in the open. However, there is a natural, and important, tension between that type of openness and the need for individual privacy.

This circle is difficult to square, but industrial companies in particular have no incentive to broach this problem. The very existence of privacy concerns allows them to lock down their activities; they can then market themselves without any oversight of what they are doing. This feels a bit like a regression to a world before drug trials. Claims of efficacy that are never subject to scientific scrutiny. Imagine if drug trials were done in private, without known results. It would be unacceptable. But it seems algorithmic deployment is unlikely to be subject to the same public scrutiny that drug trials are.

Big challenges, and maybe someone has half an eye on them. But past experience indicates that very unsophisticated thinking can dominate in large organisations.

So in general I’m concerned that there will be a lack of transparency about how our data is processed and what techniques are being used. In particular, I think many companies involved in this process do not have ‘best of class’ employees, or ‘best of class’ practices with regard to data analytics. There is a moral obligation once the data is inside the company to ensure that the very best analysis is done. I’m not sure how we are controlling for that with our current approaches.