DALI Meeting + Views on Machine Learning and Artificial Intelligence

I just got back from the first DALI meeting, held in La Palma. I was a co-organiser with Zoubin Ghahramani, Thomas Hofmann and Bernhard Schoelkopf. The original vision was mainly driven by Bernhard, and the meeting is an attempt to recapture the spirit of some of the early NIPS conferences and the Snowbird meeting: a smaller meeting with some focus and a lot of informal debate, with a schedule designed to encourage discussion and to get people engaging across different fields and sub-fields.

The meeting was run as a day of workshops, followed by a day of plenary sessions and a further day of workshops. Zoubin organised the workshop schedule, and Thomas the plenary sessions. For the workshops we decided on topics and invited organisers, who in turn invited the attendees. We heard about Probabilistic Programming, Networks and Causality, Deep Learning for Vision, Probabilistic Numerics and Statistical Learning Theory. We had plenaries from experts in machine learning, as well as one by Metin Sitti on Mini/Micro/Nanorobotics. Thomas closed the plenary session by chairing a panel discussion with Alex Graves, Ralf Herbrich, Yann LeCun, Bernhard Schoelkopf, Zoubin Ghahramani and myself.

Thomas seeded the panel discussion by asking us to make three-minute statements. He asked about several things, but the one that caught my eye was machine learning and artificial intelligence. Everyone had interesting things to say, and I don’t want to paraphrase them too much, but being asked to summarise in three minutes distilled some of my own thinking, so I wanted to reflect that here.

I will only mention others’ views briefly, because I don’t want to misrepresent what they said, and that’s easy to do. But I’m happy for any of them to comment on what follows. They had many interesting things to say about these topics too (probably much more so than me!).

I only had two ‘notes’ for the discussion which I spoke to off the cuff, so I’ll split the thoughts into those two sections. Those who know me know I can talk for a long time, and I was trying to limit this tendency!

Note 1: Perception and Self Perception

This note meant to me that perception was an area where we’ve been successful, but self-perception less so. I’ll try and clarify.

I’m probably using these terms too loosely, so let me define what I mean by ‘perception’. I mean the sensing of objects and our environment. The particular recent success of deep learning has been in sensing the environment: categorising objects, locating pedestrians. I’ve always felt the mathematical theory of how we should aim to do this was fairly clear: it’s summarised by Bayes’ rule, which is widely used in robotics, vision, speech, etc. The big recent change from the deep learning community has been in the complexity of the mappings we use to form this perception, and in our ability to learn those mappings from data. So I see this as a success.
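To make the Bayes’ rule point concrete, here is a minimal Python sketch of updating a belief about whether a pedestrian is present after a detector fires. The prior, the detector likelihoods and the `posterior` helper are all hypothetical, chosen for illustration rather than taken from any real system.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior: dict mapping hypothesis -> p(hypothesis)
    likelihood: dict mapping hypothesis -> p(observation | hypothesis)
    Returns a dict mapping hypothesis -> p(hypothesis | observation).
    """
    # Numerator of Bayes' rule: p(observation | h) * p(h)
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    # Evidence p(observation): sum of the numerators over all hypotheses
    evidence = sum(unnormalised.values())
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypothetical numbers: pedestrians are rare, and the detector is fairly reliable.
prior = {"pedestrian": 0.1, "no pedestrian": 0.9}
likelihood_detector_fires = {"pedestrian": 0.9, "no pedestrian": 0.05}

post = posterior(prior, likelihood_detector_fires)
print(post)
```

With these made-up numbers the posterior probability of a pedestrian is 0.09 / 0.135, or about two thirds. In a deep learning system the hard part is learning the mapping from raw sensor data that plays the role of the likelihood here. The rule that combines it with prior beliefs stays the same.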

By self-perception I mean the sensing of ourselves: our prediction of our own interactions with the environment, of how what we might do could affect the environment, and of how we will react to those effects. This has an interesting flavour of infinite regress. If we try to model ourselves and the environment, we need a model that is larger than ourselves and the environment. However, that model is part of us, so we need another model on top of that. This is the infinite regress, and it’s non-convergent. It strikes me that the only way we can get around it is to use a ‘compression’ of ourselves, i.e. to have a model of ourselves within our model in order to predict our interactions with the environment. This compressed model of ourselves will not be entirely accurate, and may mis-predict our own behaviour, but it is necessary to make the problem tractable.

A further complication is that our environment also contains other complex intelligent entities that try to second-guess our behaviour. We need to model them too. I think one way we do this is by projecting our model of ourselves onto them, i.e. using our own model of our own motivations, with appropriate modifications, to incorporate other people in our predictions. I see this as a form of ‘self-sensing’ that extends to sensing others. I think doing it well may lead naturally to good planning algorithms, and planning was something Yann mentioned we do badly. I don’t think we’re very good at this yet, and I think we would benefit from more open interaction with cognitive scientists and neuroscientists in understanding how humans do it. I know there’s a lot of research on this in those fields, but I’m not an expert. Having a mathematical framework which shows how we can avoid this infinite regress through compression would be great.

These first thoughts were very much my thoughts about challenges for AI. The next thought tries to address AI in society.

Note 2: Creeping and Creepy AI

I think what we are seeing with successful AI is that it is emerging slowly, and without most people noticing. Many of our interactions with computers are dictated by machine learning algorithms. We were lucky to have Lars Backstrom at the meeting; he leads the team at Facebook that decides how to rank our news feed on the site. This is done by machine learning, but most people would be unaware that there is some ‘Artificial Intelligence’ underpinning it. Similarly, the ads we view across all sites are ranked by AI, and machine learning recommends products on Amazon. Machine learning is becoming a core computational technique: I was sitting next to Ralf when Amazon launched their new machine learning services on AWS. Driverless cars are another good example: they are underpinned by a lot of machine learning ideas, and those technologies are already appearing in normal cars. ‘Creeping AI’ is enhancing human abilities, improving us rather than replacing us, and allowing a seamless transition between what is human and what is computer. It demands better interaction between the human and the computer, and better understanding between them.

However, this leads to another effect that could be seen as ‘creepy AI’. When the transition between computer and human is done well, it can be difficult to see where the human stops and the machine learning starts. Learning systems are already very capable of understanding our personalities and desires. They do this in very different ways to how humans do it (see self-perception above!). They use large amounts of data about our previous behaviour, and that of other humans, to make predictions about our future behaviour. This can be seen as creepy. How do we avoid this? We need to improve people’s understanding of when AI is being used and what it is doing, and improve their ability to control it. Improving our control of our data and developing legislation to protect us are things I think we need to do to address that.

We can avoid AI being creepy by remaining open to debate, understanding what users want, but also giving them what they need. In the long term they need a better understanding of our methodologies and their implications, as well as better control of how their data is being used. This is one of the motivations of our open data science agenda.

Questions from the Audience

There were several questions from the audience, but the two that stuck out most for me were from Uli von Luxburg and Chris Watkins. Uli asked if we had a responsibility to worry about the moral side when developing these methods. I believe she phrased her question as to how much we should be worrying about ‘creepy AI’. I didn’t get my answer in initially, and before I could there was a follow up question from Chris about how we deal with the natural data monopoly. I’ve addressed these ideas before in the digital oligarchies post. Uli’s question is coming up more often, and a common answer to it is “this is something for society to decide”. I want to react strongly against that answer. Society is made up of people, who include experts. Those experts have a deeper understanding of the issues and implications than the general population. It’s true that there are philosophers and social scientists who can make important contributions to the debate, but it’s also true that amongst those with the best understanding of the implications of technology are those who are expert in it. If some of us don’t engage in the debate, then others will fill the vacuum. Uli’s question was probably more about whether an individual researcher should worry about these issues, rather than whether we should engage in debate. However, even if we don’t choose to contribute to the debate, I feel there is an obligation on us to be considering these issues in our research. In particular, the challenges we are creating by developing and sharing these technologies will require technological solutions as well as legislative change. These go hand in hand. Certainly those of us who are academics, and funded by the public, would not be doing our job well if we weren’t anticipating these needs and driving the technology towards answering them.

The good news is that meetings like DALI are excellent for having such debates and engaging with different communities. I think when Bernhard initially envisaged the meeting, this atmosphere was what he was hoping for. That is also what got Thomas, Zoubin and myself excited about it. I think the meeting really achieved that.

The Meeting as a Whole

I haven’t said much about the thoughts of others, because they were offered informally, and often as a means to developing debate, but if I’ve misrepresented anything above please feel free to comment below. I also apologise for omitting all the interesting ideas others spoke about, but again I didn’t want to endanger the open atmosphere of the meeting by mistakenly misrepresenting someone else’s point of view (which may also have been presented in the spirit of the devil’s advocate). I think the meeting was a great success, and we were already talking about a venue for next year.

Legislation for Personal Data: Magna Carta or Highway Code?

Karl Popper is perhaps one of the most important thinkers of the 20th century. Not purely for his philosophy of science, but for giving a definitive answer to a common conundrum: “Which comes first, the chicken or the egg?”. He says that they were simply preceded by an ‘earlier type of egg’. I take this to mean that the answer is neither: they actually co-evolved. What do I mean by co-evolved? Well, broadly speaking, there once were two primordial entities which weren’t very chicken-like or egg-like at all. Over time small changes occurred, supported by natural selection, transforming those entities into two of our most familiar foodstuffs of today, unrecognisable from their origins.

I find the process of co-evolution remarkable, and to some extent unimaginable; certainly it seems to me difficult to visualise the intermediate steps. Evolution occurs by natural selection: selection by the ‘environment’. But when we refer to co-evolution we are clarifying that this is a complex interaction. The primordial entities affect the environment around them, thereby changing the ‘rules of the game’ as far as survival is concerned. In such a convolved system, certainties about the right action disappear very quickly.

What use are chickens and eggs when talking about personal data? Well, Popper used the question to illustrate a point about scientific endeavour. He was talking about science and reflecting on how scientific theories co-evolve with experiments. However, that’s not the point I’d like to make here. Co-evolution is very general; one area where it arises is when technological advance changes society to such an extent that existing legislative frameworks become inappropriate. Tim Berners-Lee has called for a Magna Carta for the digital age, and I think this is a worthy idea, but is it the right idea? A digital bill of rights may be the right idea in the longer run, but I don’t think we are ready to draft it yet. My own research is machine learning, the main technology underpinning the current AI revolution. A combination of machine learning, fast computers and interconnected data means that the technological landscape is changing so fast that it is affecting society around us in ways that no one envisaged twenty years ago.

Even if we were to start with the primordial entities that presaged the chicken and the egg, and we knew all about the process of natural selection, could we have predicted or controlled the animal of the future that would emerge? We couldn’t have done. The chicken exists today as the product of its environmental experience, an experience that was unique to it. The end point we see is highly sensitive to very small perturbations that could have occurred at the beginning.

So should we be writing legislation today which ties down the behaviour of future generations? There is precedent for this from the past. Before the printing press was introduced, no one would have begrudged the monks’ right to laboriously transcribe the books of the day. Printing meant it was necessary to protect the “copy rights” of the originator of the material. No one could have envisaged that those copyright laws would also be used to protect software, or digital music. In the industrial revolution the legal mechanism of ‘letters patent’ evolved to protect creative insight. Patents became protection of intellectual property, ensuring that inventors’ ideas could be shared under license. These mechanisms also protect innovation in the digital world. In some jurisdictions they are now applied to software and even user interface designs. Of course even this legislation is stretched in the face of digital technology and may need to evolve, as it has done in the past.

The new legislative challenge is not in protecting what is innovative about people, but what is commonplace about them. The new value is in knowing the nature of people: predicting their needs and fulfilling them. This is the value of interconnection of personal data. It allows us to make predictions about an individual by comparing him or her to others. It is the mainstay of the modern internet economy: targeted advertising and recommendation systems. It underpins my own research ideas in personalisation of health treatments and early diagnosis of disease. But it leads to potential dangers, particularly where the uncontrolled storage and flow of an individual’s personal information is concerned. We are reaching the point where some studies are showing that computer prediction of our personality is more accurate than that of our friends and relatives. How long before an objective computer prediction of our personality can outperform our own subjective assessment of ourselves? Some argue those times are already upon us. It feels dangerous for such power to be wielded unregulated by a few powerful groups. So what is the answer? New legislation? But how should it come about?

In the long term, I think we need to develop a set of rules and legislation that include principles that protect our digital rights. I think we need new models of ownership that allow us to control our private data. One idea that appeals to me is extending data protection legislation with the right not only to view data held about us, but also to ask for it to be deleted. However, I can envisage many practical problems with that idea, and these need to be resolved so we can also enjoy the benefits of these personalised predictions.

As wonderful as some of the principles in the Magna Carta are, I don’t think it provides a good model for the introduction of modern legislation. It was actually signed under duress: under a threat of violent revolution. The revolution was threatened by a landed gentry, although the consequences would have been felt by all. Revolutions don’t always end well. They occur because people can become deadlocked: they envisage different futures for themselves and there is no way to agree on a shared path to different end points. The Magna Carta was also a deal between the king and his barons. Those barons were asking for rights that they had no intention of extending within their fiefdoms. These two characteristics, redistribution of power amongst a powerful minority with significant potential consequences for a disenfranchised majority, make the Magna Carta, for me, a poor analogy for how we would like things to proceed.

The chicken and the egg remind us that the actual future will likely be more remarkable than any of us can currently imagine. Even if we all seek a particular version of the future, this version of the future is unlikely to ever exist in the form that we imagine. Open, receptive and ongoing dialogue between the interested and informed parties is more likely to bring about a societal consensus. But can this happen in practice? Could we really evolve a set of rights and legislative principles which lets us achieve all our goals? I’d like to propose that rather than taking as our example a mediaeval document, written on vellum, we look to more recent changes in society and how they have been handled. In England, the Victorians may have done more than anyone to promote our romantic notion of the Magna Carta, but I think we can learn more by looking at how they dealt with their own legislative challenges.

I live in Sheffield, and cycle regularly in the Peak District national park. Enjoyment of the Peak Park is not restricted to our era. At 10:30 on Easter Monday in 1882 a Landau carriage, rented by a local cutler, was heading on a day trip from Sheffield to the village of Tideswell, in the White Peak. They’d left Sheffield via Ecclesall Road, and as they began to descend the road beneath Froggatt Edge, just before the Grouse Inn, they encountered a large traction engine towing two trucks of coal. The Landau carriage had two horses and had been moving at a brisk pace of four and a half miles an hour. They had already passed several engines on the way out of Sheffield. However, as they moved out to pass this one, it let out a continuous blast of steam and began to turn across their path into the entrance of the inn. One of the horses took fright, pulling the carriage up a bank and throwing Ben Deakin Littlewood and Mary Coke Smith from the carriage and under the wheels of the traction engine. I cycle to work past their graves every day. The event was remarkable at the time, so much so that it is chiselled into the inscription on Ben’s grave.

The traction engine was preceded, as legislation since 1865 had dictated, by a boy waving a red flag. It was restricted to two and a half miles an hour. However, the boy’s role was to warn oncoming traffic. The traction engine driver had turned without checking whether the road was clear of overtaking traffic. It’s difficult to blame the driver though. I imagine that there was quite a lot involved in driving a traction engine in 1882. It turned out that the driver was also preoccupied with a broken wheel on one of his carriages. He was turning into the Grouse to check the wheel before descending the road.

This example shows how legislation can sometimes be extremely restrictive, but still not achieve the desired outcome. Codification of the manner in which a vehicle should be overtaken came later, at a time when vehicles were travelling much faster. The Landau carriage was overtaking about 100 meters after a bend. The driver of the traction engine didn’t check over his shoulder immediately before turning, although he claimed he’d looked earlier. Today both drivers’ responsibilities are laid out in the “Highway Code”. There was no “Mirror, Signal, Manoeuvre” in 1882. That came later alongside other regulations such as road markings and turn indicators.

The shared use of our road network, and the development of the right legislative framework for it, might be a good analogy for how we should develop legislation for protecting our personal privacy. No analogy is ever perfect, but it is clear that our society both gained and lost through the introduction of motorised travel. Similarly, the digital revolution will bring advantages but also new challenges. We need to have mechanisms that allow for negotiated solutions. We need to be able to argue about the balance of current legislation and how it should evolve. Those arguments will be driven by our own personal perspectives. Our modern rules of the road are in the Highway Code. It lists responsibilities of drivers, motorcyclists, cyclists, mobility scooters, pedestrians and even animals. It gives legal requirements and standards of expected behaviour. The Highway Code co-evolved with transport technology: it has undergone 15 editions and is currently being rewritten to accommodate driverless cars. Even today we still argue about the balance of this document.

In the long term, when technologies have stabilised, I hope we will be able to distil our thinking into a bill of rights for the internet. But such a document has a finality about it which seems inappropriate in the face of technological uncertainty. Calls for a Magna Carta provide soundbites that resonate and provide rallying points. But they can polarise, presaging unhelpful battles. Between the Magna Carta and the foundation of the United States, the balance between the English monarch and his subjects was reassessed through the English Civil War and the American Revolution. I don’t think we can afford such discord when drafting the rights of the digital age. We need mechanisms that allow for open debate, rather than open battle. Before a bill of rights for the internet, I think we need a different document. I’d like to sound the less resonant call for a document that allows for dialogue, reflecting concerns as they emerge. It could summarise current law and express expected standards of behaviour. With regular updating it would provide an evolving social contract between all the users of the information highway: people, governments, businesses, hospitals, scientists, aid organisations. Perhaps instead of a Magna Carta for the internet we should start with something more humble: the rules of the digital road.

This blog post is an extended version of an article written for the Guardian’s media network: “Let’s learn the rules of the digital road before talking about a web Magna Carta”

NIPS Experiment Analysis

Sorry for the relative silence on the NIPS experiment. Corinna and I have both done some analysis on the data. Over the Christmas break I focused on an analysis of the ‘raw numbers’ that people have been discussing. In particular, I wanted to qualify the certainty people are placing on these numbers. There are a couple of different ways of doing this: the bootstrap, or a Bayesian analysis. I went for the latter. Corinna has also been doing a lot of work on how the scores correlate, and the ball is in my court to pick up on that. However, before doing that I wanted to make the initial Bayesian analysis of the data. In doing so, we’re also releasing a little bit more information on the numbers.

The headline figure is that if we re-ran the conference, we would expect anywhere between 38% and 64% of the same papers to be presented again. Several commentators have said that this is the figure attendees are really interested in. Of course, when you think about it, you also realise it is a difficult figure to estimate: it is based only on papers that received at least one accept, rather than on the full 168 papers used in the study, and that smaller sample reduces the power of the study.
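The effect of a smaller sample on the uncertainty can be illustrated with a minimal sketch. It assumes a simple Beta-binomial model with a uniform prior, which is not necessarily the model used in the actual analysis. The counts are hypothetical, not the NIPS experiment data. The only aim is to show that the same observed proportion gives a wider credible interval when fewer papers contribute to it.

```python
import random


def beta_credible_interval(successes, trials, level=0.95, n_samples=20_000, seed=0):
    """Equal-tailed credible interval for a proportion.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + trials - successes). Quantiles are
    estimated by Monte Carlo sampling using only the standard library.
    """
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + trials - successes
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    lower = samples[int((1 - level) / 2 * n_samples)]
    upper = samples[int((1 + level) / 2 * n_samples) - 1]
    return lower, upper


# Hypothetical counts: the same observed proportion (50%), but
# different numbers of papers contributing to the estimate.
small = beta_credible_interval(successes=20, trials=40)
large = beta_credible_interval(successes=80, trials=160)
print("40 papers: ", small)
print("160 papers:", large)
```

The interval from 40 papers comes out roughly twice as wide as the one from 160 papers. This is the sense in which restricting attention to a subset of the papers costs statistical power.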

Anyway, details of the Bayesian analysis are available in a Jupyter notebook on github.

Proceedings of Machine Learning Research

Back in 2006, the wider machine learning community was becoming aware of Gaussian processes (mainly through the publication of the Rasmussen and Williams book), and Joaquin Quinonero Candela, Anton Schwaighofer and I organised the Gaussian Processes in Practice workshop at Bletchley Park. We planned a short proceedings for the workshop, but when I contacted Springer’s LNCS proceedings, a rather dismissive note came back with an associated prohibitive cost. Given that the ranking of LNCS wasn’t (and never has been) that high, this seemed a little presumptuous on their part. In response I contacted JMLR and asked if they’d ever considered a proceedings track. The result was that I was asked by Leslie Pack Kaelbling to launch the proceedings track.

JMLR isn’t just open access; there is also no charge to authors. It is hosted on servers at MIT and managed by the community.

We launched the proceedings in March 2007 with the first volume from the Gaussian Processes in Practice workshop. Since then there have been 38 volumes including two volumes in the pipeline. The proceedings publishes several leading conferences in machine learning including AISTATS, COLT and ICML.

From the start we felt that it was important to share the branding of JMLR with the proceedings, to show that the publication was following the same ethos as JMLR. However, this led to the rather awkward name: JMLR Workshop and Conference Proceedings, or JMLR W&CP. Following discussion with the senior editorial board of JMLR we now feel the time is right to rebrand with the shorter “Proceedings of Machine Learning Research”.

As part of the rebranding process the editorial team for the Proceedings of Machine Learning Research (which consists of Mark Reid and myself) is launching a small consultation exercise looking for suggestions on how we can improve the service for the community. Please feel free to leave comments on this blog post or via Facebook or Twitter to let us have feedback!

Can you select for ‘robustness’?

My mum and son preparing the ground for non-robust seeds

Was at the allotment the other day, and my son Frederick asked how the seeds we plant could ever survive when it took so much work and preparation to plant and support them. I said it was because they’ve been selected (by breeding) to produce high yield, and that tends to make them less robust (in comparison to e.g. weeds). So he asked why don’t we breed in robustness. I instinctively said that you can’t do that, because breeding involves selecting for a characteristic, whereas (I think) robustness implies performance under a range of different conditions, some of which will not even be known to us. Of course, I agree you can breed in resistance to a particular circumstance, but I think robustness is about resistance to many circumstances. I think a robust population will include wide variation in characteristics, whereas selection by breeding tends to refine the characteristics, reducing variation. My reply was instinctive, but I think it’s broadly speaking correct, although it would be nice to find some counter examples!
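The intuition that selection narrows variation can be checked with a toy simulation. This is a sketch under strong assumptions: a single hypothetical trait drawn from a normal distribution, one round of truncation selection that keeps the top 20%, and no model of heritability or environment. The `truncation_selection` helper and all the numbers are illustrative.

```python
import random
import statistics


def truncation_selection(population, keep_fraction):
    """Keep the top `keep_fraction` of individuals, ranked by trait value."""
    n_keep = max(1, int(len(population) * keep_fraction))
    return sorted(population, reverse=True)[:n_keep]


rng = random.Random(42)
# Hypothetical trait (think of it as yield) drawn from a standard normal.
population = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
selected = truncation_selection(population, keep_fraction=0.2)

print("mean before/after:", statistics.mean(population), statistics.mean(selected))
print("variance before/after:", statistics.variance(population), statistics.variance(selected))
```

In this toy setting the selected group has a higher mean but a much smaller variance than the population it came from, roughly a fifth of it. The simulation says nothing about breeding over many generations, or about which unknown conditions a more variable population would survive.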

Beware the Rise of the Digital Oligarchy

The Guardian’s media network published a short article I wrote for them on 5th March. They commissioned an article of about 600 words, which appeared on the Guardian’s site, but the original version I wrote was around 1,400 words. I agreed a week’s exclusivity with the Guardian, but now that’s up, the longer version is below.

On a recent visit to Genova, during a walk through the town with my colleague Lorenzo, he pointed out what he said was the site of the world’s first commercial bank. The bank of St George, located just outside the city’s old port, grew to be one of the most powerful institutions in Europe: it bankrolled Charles V and governed many of Genova’s possessions on the republic’s behalf. The trust that its clients placed in the bank is shown in records of its account holders. There are letters from Christopher Columbus to the bank instructing them in the handling of his affairs. The influence of the bank was based on the power of accumulated capital, capital it could accumulate through the trust of a wealthy client base. The bank was so important in the medieval world that Machiavelli wrote that “if even more power was ceded by the Genovan republic to the bank, Genova would even outshine Venice amongst the Italian city states.” The Bank of St George was once one of the most influential private institutions in Europe.

Today the power wielded by accumulated capital can still dominate international affairs, but a new form of power is emerging: that of accumulated data. Like Hansel and Gretel trailing breadcrumbs into the forest, we now leave a trail of data-crumbs wherever we travel: supermarket loyalty cards, text messages, credit card transactions, web browsing and social networking. The power of this data emerges, like that of capital, when it’s accumulated. Data is the new currency.

Where does this power come from? Cross-linking of different data sources can give deep insights into personality, health, commercial intent and risk. The aim is now to understand and characterize the population, perhaps down to the individual level. Personalization is the watchword for your search results, your social network news feed, your movie recommendations and even your friends. This is not a new phenomenon: psychologists and social scientists have always attempted to characterize the population, to better understand how to govern or whom to employ. They acquired their data through carefully constructed questionnaires to better understand personality and intelligence. The difference is the granularity with which these characterizations are now made: instead of understanding groups and sub-groups in the population, the aim is to understand each person. There are wonderful possibilities: we should be able to better understand health, give earlier diagnoses for diseases such as dementia, and provide better support to the elderly and to otherwise incapacitated people. But there are also major ethical questions, and they don’t seem to be adequately addressed by our current legal frameworks. For Columbus it was clear: he was the owner of the money in his accounts. His instructions to the bank told them how to distribute it to friends and relations. They only held his capital under license, as a convenient storage facility. Ownership of data is less clear. Historically, acquiring data was expensive: questionnaires were painstakingly compiled and manually distributed. When answering, the risk of revealing too much of ourselves was small because the data never accumulated. Today we leave digital footprints in our wake, and acquisition of this data is relatively cheap. It is the processing of the data that is more difficult.

I’m a professor of machine learning. Machine learning is the main technique at the heart of the current revolution in artificial intelligence. A major aim of our field is to develop algorithms that better understand data: that can reveal the underlying intent or state of health behind the information flow. Machine learning techniques are already used to recognise faces or make recommendations. As we develop algorithms that better aggregate data, our understanding of the individual also improves.

What do we lose by revealing so much of ourselves? How are we exposed when so much of our digital soul is laid bare? Have we engaged in a Faustian pact with the internet giants? Like Faust, we might agree to the pact in moments of levity, or despair, perhaps weakened by poor health. My father died last year, but there are still echoes of him online. Through his account on Facebook I can be reminded of his birthday or told of common friends. Our digital souls may not be immortal, but they certainly outlive us. What we choose to share also affects our family: my wife and I may be happy to share information about our genetics, perhaps for altruistic reasons, or just out of curiosity. But by doing so we are also sharing information about our children’s genomes. Using a supermarket loyalty card gains us discounts on our weekly shop, but also gives the supermarket detailed information about our family diet. In this way we’d expose both the nature and nurture of our children’s upbringing. Will our decisions to make this information available haunt our children in the future? Are we equipped to understand the trade-offs we make by this sharing?

There have been calls from Elon Musk, Stephen Hawking and others to regulate artificial intelligence research. They cite fears about autonomous and sentient artificial intelligence that could self-replicate beyond our control. Most of my colleagues believe that such breakthroughs are beyond the horizon of current research. Sentient intelligence is still not at all well understood. As Ryan Adams, a friend and colleague based at Harvard, tweeted:

Personally, I worry less about the machines and more about the humans with enhanced powers of data access. After all, most of our historic problems seem to have come from humans wielding too much power, either individually or through institutions of government or business. Whilst sentient AI does seem beyond our horizons, one aspect of it is closer to our grasp. An aspect of sentient intelligence is ‘knowing yourself’, predicting your own behaviour. It does seem to me plausible that through accumulation of data computers may start to ‘know us’ even better than we know ourselves. I think that one concern of Musk and Hawking is that the computers would act autonomously on this knowledge. My more immediate concern is that our fellow humans, through the modern equivalents of the bank of St George, will exploit this knowledge, leading to a form of data-oligarchy. And in the manner of oligarchies, the power will be in the hands of very few but wielded to the effect of many.

How do we control for all this? Firstly, we need to consider how to regulate the storage of data. We need better models of data-ownership. There was no question that Columbus was the owner of the money in his accounts. He gave it under license, and he could withdraw it at his pleasure. For the data repositories we interact with, we have no right of deletion. We can withdraw from the relationship, and in Europe data protection legislation gives us the right to examine what is stored about us. But we don’t have any right of removal. We cannot withdraw access to our historic data if we become concerned about the way it might be used. Secondly, we need to increase transparency. If an algorithm makes a recommendation for us, can we know what information in our historic data that prediction was based on? In other words, can we know how it arrived at that prediction? The first challenge is a legislative one; the second is both technical and social. It involves increasing people’s understanding of how data is processed and what the capabilities and limitations of our algorithms are.

There are opportunities and risks with the accumulation of data, just as there were (and still are) for the accumulation of capital. I think there are many open questions, and we should be wary of anyone who claims to have all the answers. However, two directions seem clear: we need to increase the power of the people, and we need to develop their understanding of the processes. It is likely to be a fraught process, but we need to form a data-democracy: data governance for the people, by the people and with the people’s consent.

Neil Lawrence is a Professor of Machine Learning at the University of Sheffield. He is an advocate of “Open Data Science” and an advisor to a London based startup, CitizenMe, that aims to allow users to “reclaim their digital soul”.

Questions on Deep Gaussian Processes

I was recently contacted by Chris Edwards, who is putting together an article for Communications of the ACM on Deep Learning and had a few questions on deep Gaussian processes. He kindly agreed to let me use his questions and my answers in a blog post.
1) Are there applications that suit Gaussian processes well? Would they typically replace the neural network layers in a deep learning system or would they possibly be mixed and matched with neural layers, perhaps as preprocessors or using the neural layers for stuff like feature extraction (assuming that training algorithms allow for this)?
Yes, I think there are applications that suit Gaussian processes very well, in particular applications where data is scarce (this doesn’t necessarily mean small data sets, but data that is scarce relative to the complexity of the system being modeled). In these scenarios, handling uncertainty in the model appropriately becomes very important. Two examples which have exploited this characteristic in practice are GaussianFace by Lu & Tang, and Bayesian optimization (e.g. Snoek, Larochelle and Adams). Almost all my own group’s work also exploits this characteristic. A further manifestation of this effect is what I call “massively missing data”. Although we are getting a lot of data at the moment, when you think about it you realise that almost all the things we would like to know are still missing almost all of the time. Deep models have performed well in situations where data sets are very well characterised and labeled. However, one of the domains that inspires me is clinical data, where this isn’t the case. In clinical data most people haven’t had most clinical tests applied to them most of the time. Also, the nature of clinical tests evolves (as do the diseases that affect patients). This is an example of massively missing data. I think Gaussian processes provide a very promising approach to handling this data.
With regard to whether they are a replacement for deep neural networks, I think in the end they may well be mixed and matched. From a Gaussian process perspective the neural network layers could be seen as a type of ‘mean function’ (a Gaussian process is defined by its mean function and its covariance function). So they can be seen as part of the deep GP framework: deep Gaussian processes enhance the toolkit available, and there is no conceptual reason why they shouldn’t be mixed and matched. I think you’re quite right that the low-level feature extraction might still be done by parametric models like neural networks, but it’s certainly important that we use the right techniques in the right domains, and being able to interchange ideas enables that.
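To make the ‘mean function’ point concrete, here is a minimal pure-Python sketch of Gaussian process regression in which a fixed parametric function supplies the prior mean and the GP covariance models what that function misses. The linear mean (standing in for a trained neural network layer), the exponentiated quadratic covariance, its parameters, the noise level and the toy data are all assumptions for illustration; this is not code from the deep GP work mentioned here.

```python
import math

def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * (x1 - x2) ** 2 / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def back_sub_t(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, Xstar, mean_fn, noise=1e-2):
    """Posterior mean and variance of the latent f(x*) for a GP with prior mean mean_fn."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    resid = [yi - mean_fn(xi) for xi, yi in zip(X, y)]  # what the mean function misses
    alpha = back_sub_t(L, forward_sub(L, resid))         # (K + noise*I)^{-1} (y - m(X))
    means, variances = [], []
    for xs in Xstar:
        k_star = [rbf(xs, xi) for xi in X]
        means.append(mean_fn(xs) + sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)
        variances.append(rbf(xs, xs) - sum(vi * vi for vi in v))
    return means, variances

# Hypothetical fixed linear "mean function" standing in for a parametric layer.
linear_mean = lambda x: 0.5 * x
X = [-1.0, 0.0, 1.0]
y = [-0.2, 0.3, 0.4]
means, variances = gp_predict(X, y, [0.0, 10.0], linear_mean)
```

Near a training input (x* = 0) the prediction stays close to the observed value with a small variance; far from the data (x* = 10) it falls back on the parametric mean, 0.5 × 10 = 5, with the full prior variance of 1. That fallback is the kind of uncertainty handling that matters when data is scarce.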
2) Are there training algorithms that allow Gaussian processes to be used today for deep-learning type applications or is this where work needs to be done?
There are algorithms, yes: we have three different approaches right now, and it’s also clear that work in doubly stochastic variational inference (see for example Kingma and Welling or Rezende, Mohamed and Wierstra) could be applicable. But more work still needs to be done. In particular, a lot of the success of deep learning has been down to the engineering of the system: how to implement these models on GPUs and scale them to billions of data points. We’ve been starting to look at this (Dai, Damianou, Hensman and Lawrence), but there’s no doubt we are far behind and it’s a steep learning curve! We also don’t have quite the same computational resources as Facebook, Microsoft and Google!
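The doubly stochastic approaches cited above rest on the reparameterization trick: write a Gaussian sample as z = μ + σε with ε ~ N(0, 1), so that a Monte Carlo gradient of an expectation can be taken through the sample itself. The sketch below is a toy under stated assumptions, not code from those papers: it uses f(z) = z², whose expectation μ² + σ² has exact gradients 2μ and 2σ to check against, and it shows only the Monte Carlo half of the ‘double’ stochasticity (the other half, subsampling the data in minibatches, is omitted).

```python
import random

def f(z):
    """Toy integrand (assumption): E[z^2] under N(mu, sigma^2) is mu^2 + sigma^2."""
    return z * z

def df_dz(z):
    """Derivative of the toy integrand."""
    return 2.0 * z

def reparam_grad(mu, sigma, n_samples=100000, seed=0):
    """Monte Carlo estimates of d/dmu and d/dsigma of E_{z~N(mu, sigma^2)}[f(z)],
    using the reparameterization z = mu + sigma * eps, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        g_mu += df_dz(z)           # chain rule with dz/dmu = 1
        g_sigma += df_dz(z) * eps  # chain rule with dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples

g_mu, g_sigma = reparam_grad(1.5, 0.5)  # exact values: 2*1.5 = 3.0 and 2*0.5 = 1.0
```

Because the randomness enters only through ε, the same estimator works when f is a complicated model evaluated inside an automatic differentiation system, which is what makes the trick attractive for scaling.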
3) Is the computational load similar to that of deep-learning neural networks or are the applications sufficiently different that a comparison is meaningless?
We carry an additional algorithmic burden: that of propagating uncertainty around the network. This is where the algorithmic problems begin, but it is also where we’ve had most of the breakthroughs. Propagating this uncertainty will always come with an additional load for a particular network, but it has particular advantages, like dealing with the massively missing data I mentioned above and automatic regularisation of the system. This has allowed us to automatically determine aspects like the number of layers in the network and the number of hidden nodes in each layer. This type of structural learning is very exciting and was one of the original motivations for considering these models. It has also enabled us to develop variants of Gaussian processes that can be used for multiview learning (Damianou, Ek, Titsias and Lawrence), and we intend to apply these ideas to deep GPs too.
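One way to see what ‘propagating uncertainty around the network’ involves is to draw samples from a deep GP prior: each layer is a random function drawn from a GP, and its outputs become the next layer’s inputs. A minimal pure-Python sketch follows; the two-layer depth, the exponentiated quadratic covariance, the jitter value and the input grid are assumptions. It uses plain ancestral sampling from the prior, a swapped-in illustration rather than the variational inference used to train these models.

```python
import math
import random

def rbf_matrix(xs, variance=1.0, lengthscale=1.0, jitter=1e-6):
    """Exponentiated quadratic covariance matrix, plus jitter for numerical stability."""
    n = len(xs)
    return [[variance * math.exp(-0.5 * (xs[i] - xs[j]) ** 2 / lengthscale ** 2)
             + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]

def cholesky(A):
    """Lower-triangular L with A = L L^T; tiny negative pivots from rounding are clamped."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def sample_gp(xs, rng):
    """One zero-mean GP function draw, evaluated jointly at the inputs xs (f = L eps)."""
    L = cholesky(rbf_matrix(xs))
    eps = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(xs))]

def sample_deep_gp(xs, n_layers=2, seed=0):
    """Draw from a deep GP prior by feeding each layer's outputs into the next layer."""
    rng = random.Random(seed)
    h = list(xs)
    for _ in range(n_layers):
        h = sample_gp(h, rng)
    return h

xs = [-2.0 + 0.2 * i for i in range(21)]
draw = sample_deep_gp(xs, n_layers=2, seed=1)
```

Repeating the draw with different seeds and looking at the spread of outputs at a fixed input gives a Monte Carlo picture of the uncertainty that the training algorithms have to carry from layer to layer, which also hints at why the load is heavier than a single deterministic forward pass.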
4) I think I saw a suggestion that GPs are reasonably robust when trained with small datasets – do they represent a way in for smaller organisation without bags of data? Is access to data a key problem when dealing with these data science techniques?
I think it’s a very good question, and it’s an area we’re particularly interested in addressing. How can we bring data science to smaller organisations? I think it might relate to our ‘open data science’ initiative (see this blog post here). I refer to this idea as ‘analysis empowerment’. However, I hadn’t particularly thought of deep GPs in this way before, but can I hazard a possible yes to that? Certainly with GaussianFace we saw they could outperform DeepFace (from Facebook) with a small fraction of the data. For us it wasn’t the main motivation for developing deep GPs, but I’d like to think it might be a characteristic of the models. The motivating examples we have are more in the domain of applications that the current generation of supervised deep learning algorithms can’t address, like interconnection of data sets in health. Many of my group’s papers are about interconnecting different views of the patient (genotype, environmental background, clinical data, survival information … with luck even information from social networks and loyalty cards). We approach this through Gaussian process frameworks to ensure that we can build models that will be fully interconnected in application. We call this approach “deep health”. We aren’t there yet, but I feel there’s a lot of evidence so far that we’re working with a class of models that will do the job. My larger concern is the ethical implications of pulling this scale and diversity of information together. I find the idea of a world where computer models outperform humans in predicting their own behavior (perhaps down to the individual) quite disturbing. It seems to me that now the technology is coming within reach, we need to work hard to also address these ethical questions. And it’s important that this debate is informed by people who actually understand the technology.
5) On a more general point that I think can be explored within this feature, are techniques such as Gaussian processes at a disadvantage in computer science because of their heavy mathematical basis? (I’ve had interviews with people like Donald Knuth and Erol Gelenbe in the past where the idea has come up that computer science and maths should, if not merge, interact a lot more).
Yes, and no. It is true that people seem to have some difficulty with the concept of Gaussian processes. But it’s not that the mathematics is more complex than what people are using (at the cutting edge) for deep neural networks. Any of the researchers leading the deep revolution could easily turn their hands to Gaussian processes if they chose to do so. Perhaps at ‘entry’ the concepts seem simpler in deep neural networks, but as you peer ‘deeper’ (forgive the pun) into those models it actually becomes a lot harder to understand what’s going on. The leading people (Hinton, Bengio, LeCun, etc.) seem to have really good intuitions, but these are not always easy to teach. Certainly when Geoff Hinton explains something to me I always feel I’ve got a very good grasp of it at the time, but later, when I try and explain the same concept to someone else, I find I can’t always do it (i.e., he’s got better intuitions than me, and he’s better at explaining than I am). There may be similar issues for explaining deep GPs, but my hope is that once the conceptual hurdle of a GP is surmounted, the resulting models are much easier to analyze. Such analysis should also feed back into the wider deep learning community. I’m pleased that this is already starting to happen (see Duvenaud, Rippel, Adams and Ghahramani). Gaussian processes also generalise many different approaches to learning and signal processing (including neural networks), so understanding Gaussian processes well gives you an ‘in’ for many different areas. I agree, though, that the perception in the wider community matches your analysis. This is a major reason for the program of summer schools we’ve developed in Gaussian Processes. So far we’ve taught over 200 students, and we have two further schools planned for 2015 with a developing program for 2016. We’ve made material freely available online, including lectures (on YouTube) and lab notes.
So I hope we are doing something to address the perception that these models are harder mathematically!
I totally agree on the Maths/CS interface. It is, however, slightly frustrating (and perhaps inevitable) how much different academic disciplines become dominated by a particular culture of research. This can create barriers, particularly when it comes to formal publication (e.g. in the ‘leading’ journals). My group’s been working very hard over the last decade to combat this through organization of workshops and summer schools that bridge the domains. It always seems to me that meeting people face to face helps us gain a shared understanding. For example, a lot of confusion can be generated by the slightly different ways we use technical terminology; it leads to a surprising number of misunderstandings that take time to work through. However, through these meetings I’ve learned an enormous amount, particularly from the statistics community. Unfortunately, formal outlets and funding for this interface are still surprisingly difficult to find. This is not helped by the fact that the traditional professional societies don’t necessarily bridge the intellectual ground and sometimes engage in their own fights for territory. These cultural barriers also spill over into organization of funding. For example, in the UK it’s rare that my grant proposals are refereed by colleagues from the Maths/Stats community or that their grant proposals are refereed by me. They actually go to two totally separate parts of the relevant UK funding body. As a result both sets of proposals can be lost in the wider Maths and CS communities, which is not always conducive to expanding the interface. In the UK I’m hoping that the recent founding of the Alan Turing Institute will cause a bit of a shake-up in this area, and that some of these artificial barriers will fall away. But in summary, I totally agree with the point, but also recognize that on both sides of the divide we have created communities which can make collaboration harder.