Was at the allotment the other day, and my son Frederick asked how the seeds we plant could ever survive when it took so much work and preparation to plant and support them. I said it was because they’ve been selected (by breeding) to produce high yield, and that tends to make them less robust (in comparison to, e.g., weeds). So he asked why we don’t breed in robustness. I instinctively said that you can’t do that, because breeding involves selecting for a characteristic, whereas (I think) robustness implies performance under a range of different conditions, some of which will not even be known to us. Of course, I agree you can breed in resistance to a particular circumstance, but I think robustness is about resistance to many circumstances. A robust population will include wide variation in characteristics, whereas selection by breeding tends to refine the characteristics, reducing variation. My reply was instinctive, but I think it’s broadly correct, although it would be nice to find some counterexamples!
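The claim that selection erodes variation can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions of my own, not a model of real plant genetics. There is a single trait. Reproduction is asexual: each offspring copies one parent plus a small random "mutation". Breeding is modelled as keeping the top 20% each generation. The function names are hypothetical, and the sketch shows only the loss of variation, not robustness itself.

```python
import random
import statistics


def truncation_select(population, keep_fraction):
    """Keep the highest-scoring `keep_fraction` of the population (the 'breeding' step)."""
    ranked = sorted(population, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def reproduce(parents, size, mutation_sd, rng):
    """Toy inheritance: each offspring copies a random parent's trait plus small noise."""
    return [rng.choice(parents) + rng.gauss(0.0, mutation_sd) for _ in range(size)]


def simulate(generations=10, size=1000, keep_fraction=0.2, mutation_sd=0.1, seed=0):
    """Return the (mean, standard deviation) of the trait for every generation."""
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(size)]
    history = [(statistics.mean(population), statistics.pstdev(population))]
    for _ in range(generations):
        parents = truncation_select(population, keep_fraction)
        population = reproduce(parents, size, mutation_sd, rng)
        history.append((statistics.mean(population), statistics.pstdev(population)))
    return history


selected = simulate(keep_fraction=0.2)    # breed only from the top 20% each generation
unselected = simulate(keep_fraction=1.0)  # every individual gets to reproduce
print(f"with selection:    mean {selected[-1][0]:.2f}, spread {selected[-1][1]:.2f}")
print(f"without selection: mean {unselected[-1][0]:.2f}, spread {unselected[-1][1]:.2f}")
```

With selection, the average trait climbs steadily while the spread collapses to roughly the size of the mutation noise. Without selection, the spread stays close to where it started.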
The Guardian’s media network published a short article I wrote for them on 5th March. They commissioned an article of about 600 words, which appeared on the Guardian’s site, but the original version I wrote was around 1,400 words. I agreed a week’s exclusivity with the Guardian; now that it’s up, the longer version is below (it’s more than twice as long).
On a recent visit to Genova, my colleague Lorenzo and I took a walk through the town, and he pointed out what he said was the site of the world’s first commercial bank. The Bank of St George, located just outside the city’s old port, grew to be one of the most powerful institutions in Europe: it bankrolled Charles V and governed many of Genova’s possessions on the republic’s behalf. The trust that its clients placed in the bank is shown in the records of its account holders; there are letters from Christopher Columbus to the bank instructing them in the handling of his affairs. The influence of the bank was based on the power of accumulated capital, capital it could accumulate through the trust of a wealthy client base. The bank was so important in the medieval world that Machiavelli wrote that “if even more power was ceded by the Genovan republic to the bank, Genova would even outshine Venice amongst the Italian city states.” The Bank of St George was once one of the most influential private institutions in Europe.
Today the power wielded by accumulated capital can still dominate international affairs, but a new form of power is emerging: that of accumulated data. Like Hansel and Gretel trailing breadcrumbs into the forest, we now leave a trail of data-crumbs wherever we travel: supermarket loyalty cards, text messages, credit card transactions, web browsing and social networking. The power of this data emerges, like that of capital, when it’s accumulated. Data is the new currency.
Where does this power come from? Cross-linking different data sources can give deep insights into personality, health, commercial intent and risk. The aim is now to understand and characterize the population, perhaps down to the individual level. Personalization is the watchword for your search results, your social network news feed, your movie recommendations and even your friends. This is not a new phenomenon: psychologists and social scientists have always attempted to characterize the population, to better understand how to govern or whom to employ. They acquired their data through carefully constructed questionnaires designed to understand personality and intelligence. The difference is the granularity with which these characterizations are now made: instead of understanding groups and sub-groups in the population, the aim is to understand each person. There are wonderful possibilities: we should be able to better understand health, give earlier diagnoses for diseases such as dementia and provide better support to the elderly and otherwise incapacitated people. But there are also major ethical questions, and they don’t seem to be adequately addressed by our current legal frameworks.
For Columbus it was clear: he was the owner of the money in his accounts. His instructions to the bank tell them how to distribute it to friends and relations. The bank only held his capital under license, as a convenient storage facility. Ownership of data is less clear. Historically, acquiring data was expensive: questionnaires were painstakingly compiled and manually distributed. When answering, the risk of revealing too much of ourselves was small because the data never accumulated. Today we leave digital footprints in our wake, and acquiring this data is relatively cheap. It is the processing of the data that is more difficult.
I’m a professor of machine learning. Machine learning is the main technique at the heart of the current revolution in artificial intelligence. A major aim of our field is to develop algorithms that better understand data: algorithms that can reveal the underlying intent or state of health behind the information flow. Machine learning techniques are already used to recognise faces or make recommendations; as we develop algorithms that aggregate data more effectively, our understanding of the individual also improves.
What do we lose by revealing so much of ourselves? How are we exposed when so much of our digital soul is laid bare? Have we engaged in a Faustian pact with the internet giants? Like Faust, we might agree to the pact in moments of levity or despair, perhaps weakened by poor health. My father died last year, but there are still echoes of him online. Through his Facebook account I can be reminded of his birthday or told of mutual friends. Our digital souls may not be immortal, but they certainly outlive us. What we choose to share also affects our family: my wife and I may be happy to share information about our genetics, perhaps for altruistic reasons or just out of curiosity, but by doing so we are also sharing information about our children’s genomes. Using a supermarket loyalty card gains us discounts on our weekly shop, but it also gives the supermarket detailed information about our family diet. In this way we expose both the nature and the nurture of our children’s upbringing. Will our decisions to make this information available haunt our children in the future? Are we equipped to understand the trade-offs we make by this sharing?
There have been calls from Elon Musk, Stephen Hawking and others to regulate artificial intelligence research. They cite fears about autonomous and sentient artificial intelligence that could self-replicate beyond our control. Most of my colleagues believe that such breakthroughs are beyond the horizon of current research; sentient intelligence is still poorly understood. As Ryan Adams, a friend and colleague based at Harvard, tweeted:
The current "AI scare" going on feels a bit like kids playing with Legos and worrying about accidentally creating a nuclear bomb.
— Ryan Adams (@ryan_p_adams) February 5, 2015
Personally, I worry less about the machines and more about the humans with enhanced powers of data access. After all, most of our historic problems seem to have come from humans wielding too much power, either individually or through institutions of government or business. Whilst sentient AI does seem beyond our horizons, one aspect of it is closer to our grasp. An aspect of sentient intelligence is ‘knowing yourself’: predicting your own behaviour. It seems plausible to me that, through accumulation of data, computers may start to ‘know us’ even better than we know ourselves. I think one concern of Musk and Hawking is that the computers would act autonomously on this knowledge. My more immediate concern is that our fellow humans, through the modern equivalents of the Bank of St George, will exploit this knowledge, leading to a form of data-oligarchy. And in the manner of oligarchies, the power will be in the hands of very few but wielded to affect many.
How do we control for all this? First, we need to consider how to regulate the storage of data. We need better models of data-ownership. There was no question that Columbus was the owner of the money in his accounts: he gave it under license, and he could withdraw it at his pleasure. For the data repositories we interact with, we have no right of deletion. We can withdraw from the relationship, and in Europe data protection legislation gives us the right to examine what is stored about us, but we don’t have any right of removal. We cannot withdraw access to our historic data if we become concerned about the way it might be used. Second, we need to increase transparency. If an algorithm makes a recommendation for us, can we know which information in our historic data that prediction was based on? In other words, can we know how it arrived at that prediction? The first challenge is a legislative one; the second is both technical and social. It involves increasing people’s understanding of how data is processed and of the capabilities and limitations of our algorithms.
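To make the transparency question concrete, here is a minimal sketch of a recommender that can answer it. It is a toy item co-occurrence model over hypothetical shopping baskets; all data and function names are made up for illustration, and this is not how any particular service works. It suggests the unowned item that most often appears alongside things already in your history. It also returns the history items that contributed to that suggestion.

```python
from collections import defaultdict


def cooccurrence_counts(baskets):
    """Count how often each ordered pair of distinct items appears in the same basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        items = set(basket)
        for a in items:
            for b in items:
                if a != b:
                    counts[a][b] += 1
    return counts


def recommend_with_reasons(history, counts):
    """Return (recommended_item, reasons), where reasons lists the
    (history_item, co-occurrence count) pairs that produced its score."""
    owned = set(history)
    scores = defaultdict(int)
    reasons = defaultdict(list)
    for item in sorted(owned):
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
                reasons[other].append((item, n))
    if not scores:
        return None, []
    # Highest score wins; ties are broken alphabetically so the result is deterministic.
    best = min(scores, key=lambda k: (-scores[k], k))
    return best, sorted(reasons[best], key=lambda r: (-r[1], r[0]))


# Hypothetical loyalty-card baskets.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["nappies", "baby wipes", "bread"],
    ["nappies", "baby wipes"],
    ["butter", "jam"],
]
item, reasons = recommend_with_reasons(["bread", "nappies"], cooccurrence_counts(baskets))
print(item, reasons)  # baby wipes [('nappies', 2), ('bread', 1)]
```

For a model this simple the explanation comes for free. For the far more complex models used in practice, tracing a prediction back to specific items in someone's history is much harder, which is part of why the transparency challenge is a technical one.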
There are opportunities and risks with the accumulation of data, just as there were (and still are) with the accumulation of capital. I think there are many open questions, and we should be wary of anyone who claims to have all the answers. However, two directions seem clear: we need to increase the power of the people, and we need to develop their understanding of the processes. It is likely to be fraught, but we need to form a data-democracy: data governance for the people, by the people and with the people’s consent.
Neil Lawrence is a Professor of Machine Learning at the University of Sheffield. He is an advocate of “Open Data Science” and an advisor to a London-based startup, CitizenMe, that aims to allow users to “reclaim their digital soul”.
1) Are there applications that suit Gaussian processes well? Would they typically replace the neural network layers in a deep learning system or would they possibly be mixed and matched with neural layers, perhaps as preprocessors or using the neural layers for stuff like feature extraction (assuming that training algorithms allow for this)?
2) Are there training algorithms that allow Gaussian processes to be used today for deep-learning type applications or is this where work needs to be done?
3) Is the computational load similar to that of deep-learning neural networks or are the applications sufficiently different that a comparison is meaningless?
4) I think I saw a suggestion that GPs are reasonably robust when trained with small datasets – do they represent a way in for smaller organisations without bags of data? Is access to data a key problem when dealing with these data science techniques?
5) On a more general point that I think can be explored within this feature, are techniques such as Gaussian processes at a disadvantage in computer science because of their heavy mathematical basis? (I’ve had interviews with people like Donald Knuth and Erol Gelenbe in the past where the idea has come up that computer science and maths should, if not merge, interact a lot more).
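For readers less familiar with Gaussian processes, here is a minimal sketch of GP regression in pure Python. It assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters and Gaussian observation noise. All function names are my own, no hyperparameter fitting is done, and it follows the standard Cholesky-based formulation. Two points relate to the questions above. First, factorising the n × n kernel matrix costs O(n³) in the number of training points n (question 3). Second, the predictive variance grows as we move away from the data, so the model reports its own uncertainty when data are scarce (question 4).

```python
import math


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * (x1 - x2) ** 2 / lengthscale ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite). Costs O(n^3)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_solve(L, y):
    """Solve L^T x = y, given the lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise_var=1e-2, variance=1.0, lengthscale=1.0):
    """Posterior mean and variance of the latent function at each test input."""
    K = [[rbf(a, b, variance, lengthscale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, y_train))  # (K + noise * I)^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, a, variance, lengthscale) for a in x_train]
        means.append(sum(k * w for k, w in zip(k_star, alpha)))
        v = forward_solve(L, k_star)
        variances.append(rbf(xs, xs, variance, lengthscale) - sum(vi * vi for vi in v))
    return means, variances


# Five observations of sin(x), treated as slightly noisy via noise_var.
x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
means, variances = gp_predict(x_train, y_train, [0.5, 10.0])
for x, m, v in zip([0.5, 10.0], means, variances):
    print(f"x={x:5.1f}  mean={m:+.3f}  variance={v:.3f}")
```

Near the data the variance is small. At x = 10 the prediction falls back to the prior (mean 0, variance 1), which is the model saying it does not know.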
There are now quite a few blog posts on the NIPS experiment, so I just wanted to put together a place where I could link to them all. It’s a great set of posts from community mainstays, newcomers and those outside our research fields.
Just as a reminder, Corinna and I were extremely open about the entire review process, with a series of posts about how we engaged the reviewers and processed the data. All that background can be found through a separate post here.
At the time of writing there is also still quite a lot of twitter traffic on the experiment.
List of Blog Posts
- Eric Price’s original blog post, which seems to have had the largest impact in making the world aware of the experiment.
- John Langford, long-time ML blogger, has his say on the ACM site.
- Lance Fortnow from the computational complexity community adds his thoughts.
- Bert Huang, who was actually on the program committee, gives his perspective.
- A really early post from Aaron Defazio, one of the authors of a duplicated paper. He writes about his experience from before the results were widely known.
- The experiment triggers a set of broader musings on peer review from popsci.
- Boaz Barak, who has experience of chairing FOCS, a major CS Theory conference, brings his perspective here.
- Yisong Yue gives the perspective of one of the attendees.
Thanks to an introduction to the Sage Math team by Fernando Perez, I just had the pleasure of participating in a large-scale collaborative grant proposal construction exercise, co-ordinated by Nicolas Thiéry. I’ve collaborated on grants before, but for me this was a unique experience because the grant writing was carried out in the open, on github.
The proposal, ‘OpenDreamKit’, is principally about doing as much as possible to smooth collaboration between mathematicians so that advances in maths can be delivered as rapidly as possible to teachers, researchers, technologists etc. But, of course, I don’t have to tell you that, because you can read it on github.
It was a wonderful social experiment, and I think it really worked, although a lot of credit to that surely goes to the people involved (most of whom were there before I came aboard). I really hope this is funded, because collaborating with these people is going to be great.
For the first time on a proposal, I wasn’t the one who was most concerned about the LaTeX template (actually the second time … I once worked on a grant with Wolfgang Huber). But this took things to another level: as soon as a feature was required, the LaTeX template seemed to be updated almost in real time, I think mainly by Michael Kohlhase.
Socially it was very interesting, because the etiquette of how to interact (on the editing side) was not necessarily clear at the outset. For example, at one point I was tasked with proofreading a section, but ended up doing a lot of rephrasing. I was worried that people would be upset that their text had been changed, but there was actually a positive reaction (at least from Nicolas and Hans Fangohr!), which emboldened me to try more edits. As the deadline approached, I think others went through a similar transition, because the proposal really came together in the last few days. It was a little like a school dance: at the start we were all standing at the edge of the room, eyeing each other up, but as DJ Nicolas ramped things up and the music became a little more hardcore (as dawn drew near), barriers broke down and everyone went a little wild. Nicolas produced a YouTube video visualising the github commits.
As Alex Konovalov pointed out, we look like bees pollinating each other’s flowers!
I also discovered great new (for me) tools like appear.in that we used for brainstorming on ‘Excellence’ with Nicolas and Hans: much more convenient than Skype or Hangouts.
Many thanks to Nicolas, and all of the collaborators. I think it takes an impressive bunch of people to pull off such a thing, and regardless of outcome, which I very much hope will be positive, I look forward to further collaborations within this grouping.
Just back from NIPS, where it was really great to see the results of all the work everyone put in. I really enjoyed the program and thought the quality of all presented work was really strong. Both Corinna and I were particularly impressed by the work put in by oral presenters to make their work accessible to such a large and diverse audience.
We also released some of the figures from the NIPS experiment, and there was a lot of discussion at the conference about what the result meant.
As we announced at the conference, the inconsistency figure (the proportion of duplicated papers on which the two committees’ decisions differed) was 25.9%. I just wanted to confirm that, in the spirit of openness we’ve pursued across the entire conference process, Corinna and I will provide a full write-up of our analysis and conclusions in due course!
Some of the comment in the existing debate misses background information we’ve tried to generate, so I wanted to write a post that summarises that information and highlights its availability.
With the help of Nicolo Fusi, Charles Twardy and the entire Scicast team, we launched a Scicast question a week before the results were revealed. The comment thread for that question already had a fair amount of interesting discussion before the conference. Just for information: before we began reviewing, Corinna forecast that this figure would be 25% and I forecast it would be 20%. The box plot summary of predictions from Scicast is below.
Comment at the Conference
There was also a fair amount of debate at the conference about what the results mean. A few attempts to answer this question (based only on the inconsistency score and the expected accept rate for the conference) are available in this little Facebook discussion and on this blog post.
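A simple way to frame such attempts is to compare the observed figure with what two committees would produce if they accepted papers completely at random. This sketch is not our analysis, and it reads the 25.9% figure as the fraction of duplicated papers on which the two committees disagreed. If each committee independently accepts a paper with probability p (the accept rate), they disagree with probability 2p(1 − p). The function names and the accept rate below are illustrative assumptions, not the conference's actual numbers.

```python
def random_disagreement(accept_rate):
    """Probability that two committees disagree on a paper when each accepts it
    independently at random with probability `accept_rate`."""
    # Disagreement means one accepts and the other rejects: p(1 - p) + (1 - p)p.
    return 2 * accept_rate * (1 - accept_rate)


def fraction_of_random(observed_disagreement, accept_rate):
    """Observed disagreement as a fraction of the random baseline:
    0 means perfectly consistent committees, 1 means no better than chance."""
    return observed_disagreement / random_disagreement(accept_rate)


# Hypothetical accept rate, chosen only for illustration.
p = 0.25
print(f"random baseline:   {random_disagreement(p):.3f}")         # 0.375
print(f"observed / random: {fraction_of_random(0.259, p):.2f}")  # 0.69
```

The baseline assumes both committees share the same accept rate. A ratio between 0 and 1 on its own does not say which papers are consistently judged, only how far the pair of committees sits from chance.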
Background Information on the Process
To recap previous posts on this year’s conference, see below:
- NIPS Decision Time
- Reviewer Calibration for NIPS
- Reviewer Recruitment and Experience
- Paper Allocation for NIPS
Software on Github
Finally, there is a large amount of code available on a github site that allows our processes to be recreated. Much of it is tidied up, but the last sections on the analysis are not yet done, because it was always my intention to finish those when the experimental results are fully released.
On Wednesday last week I attended an “Open Meeting” on the Alan Turing Institute, organised by the UK’s EPSRC research council. The Turing Institute is a new government initiative that stems from a letter from our Chief Scientific Adviser to our Prime Minister about the “age of algorithms”. It aims to provide an international centre of excellence in data science.
The government has provided 42 million pounds of funding (about 60-70 million dollars), and Universities interested in partnering in the Turing Institute are expected to bring 5 million pounds (about 8 million dollars) to the initiative, to be spent over 5 years.
It seemed clear that the EPSRC will require the institute to be located in one place, and there was much talk of ‘critical mass’. This made me wonder what ‘critical mass’ means in data science: after all, we aren’t building a Large Hadron Collider, and one of the most interesting challenges of the new age of data is its distributed nature. I asked a question about this and was given the answers you might expect: flagship international centre of excellence, stimulating environment, attracting the best of the best, etc. None of it was particularly specific to data science.
In my own area of machine learning the UK has a lot of international recognition, but one of the features I’ve always enjoyed is the distributed nature of the expertise. The groups that spring first to mind are Cambridge (Engineering), Edinburgh (Informatics), UCL (Computer Science and Gatsby) and, more recently, Oxford, which has expanded significantly (CS, Engineering and Statistics). I’ve always enjoyed the robustness that such a network of leading groups brings. It’s evolved over a period of 20 years, and those of us who have watched it grow are incredibly proud of what the UK has been able to achieve with relatively few people.
Data science requires strong interactions between statisticians and computer scientists. It requires knowledge of classical techniques and modern computational capabilities. The pool of expertise is currently rather small relative to the demand. As a result I find myself constantly in demand within my own University, mainly to advise on the capabilities of current approaches to analysis. A recent xkcd comic cleverly reminded us of how hard it can be to explain the gap between those things that are easy and those things that are virtually impossible. In many cases where advice is needed, though, it’s not the full explanation that’s required, just the knowledge. Many expensive errors can be avoided with just a little access to this knowledge. Back in July I posted a position paper targeting exactly this problem, and in Sheffield we are pursuing the “Open Data Science” agenda I proposed with vigour. Indeed, I sometimes wonder whether my group is more useful for this advice (which rarely involves any intellectual novelty) than for the ideas we push forward in our research. However, our utility as advisors is much more difficult to quantify, particularly because it often won’t lead to a formal collaboration.
I like analogies, but I think that ‘critical mass’ is the wrong one here. To give better access to expertise, what is required is a higher surface-area-to-volume ratio, not a greater mass. Communication between experts is important, but we are fortunate in the UK to have a geographically close network of well-connected Universities. Many international visitors take the time to visit two or three of the leading groups when they are here, so I think the analogy of a lung is a far better one for describing what UK data science requires. I’m pleased the government has recognised the importance of data science; I just hope that in their rush to create a flagship institute, with a large headline-grabbing investment figure attached, they don’t switch off the incubator that sustains our developing lungs.
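To put rough numbers on the surface-area point, here is a back-of-the-envelope calculation under a deliberately crude assumption. Treat a fixed amount of expertise as a volume V and each group as a sphere; the spheres are purely illustrative and say nothing about how research groups actually interact. A single sphere of volume V has radius r = (3V/4π)^{1/3}. Dividing the same volume among n equal spheres multiplies the total surface by n^{1/3}:

$$
A = 4\pi r^2 = (36\pi)^{1/3}\, V^{2/3}, \qquad
A_n = n\,(36\pi)^{1/3} \left(\frac{V}{n}\right)^{2/3} = n^{1/3} A.
$$

So splitting the same resources across eight equal groups doubles the exposed surface, whereas concentrating them in one place minimises it.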