OpenAI won't benefit humanity without data-sharing

An op-ed piece I wrote inspired by the launch of OpenAI appeared in the Guardian today. Here it is reposted below.

Artificial intelligence experts welcome the launch of the Elon Musk-backed venture, but open algorithms are only the first step

There is a common misconception about what drives the digital-intelligence revolution. People seem to have the idea that artificial intelligence researchers are directly programming an intelligence; telling it what to do and how to react. There is also the belief that when we interact with this intelligence we are processed by an “algorithm” – one that is subject to the whims of the designer and encodes his or her prejudices.

OpenAI, a new non-profit artificial intelligence company that was founded on Friday, wants to develop digital intelligence that will benefit humanity. By sharing its algorithms with all, the venture, backed by a host of Silicon Valley billionaires, including Elon Musk and Peter Thiel, wants to avoid the existential risks associated with the technology.

OpenAI’s launch announcement was timed to coincide with this year’s Neural Information Processing Systems conference, which I chaired. The conference is the main academic outlet for scientific advances in machine learning, the technology that underpins the new generation of AI breakthroughs.

One of OpenAI’s main ideas is to collaborate openly, publishing code and papers. This is admirable and the wider community is already excited by what the company could achieve.

OpenAI is not the first company to target digital intelligence, and certainly not the first to publish code and papers. Both Facebook and Google have already shared code. They were also present at the same conference. All three companies hosted parties with open bars, aiming to entice the latest and brightest minds.

However, the way machine learning works means that making algorithms available isn’t necessarily as useful as one might think. A machine-learning algorithm is subtly different from the popular perception of one.

Unlike the first generation of AI approaches, in machine learning, decisions aren’t encoded directly by a programmer in the algorithm. Machine decisions result from learning from data. The programmer gives the computer an approach to learning. The computer then combines this with data. This is a little like baking. The programmer provides a recipe, but you don’t turn it into a cake until you actually combine the raw materials.

Just as in baking we don’t have control over how the cake will emerge from the oven, in machine learning we don’t control every decision that the computer will make. In machine learning the quality of the ingredients, the quality of the data provided, has a massive impact on the intelligence that is produced.

For intelligent decision-making the recipe needs to be carefully applied to the data: this is the process we refer to as learning. The result is the combination of our data and the recipe. We need both to make predictions.
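The point can be made concrete with a small sketch. The recipe here is ordinary least squares for a straight line, chosen only because it is short; the function names and both data sets are invented for illustration and are not drawn from any company's code. The same recipe, given different ingredients, produces models that make different predictions.

```python
def fit_line(xs, ys):
    """The 'recipe': ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a learned model (slope, intercept) to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Two hypothetical data sets (the 'ingredients'), made up for this example.
xs = [0.0, 1.0, 2.0, 3.0]
ys_a = [1.0, 3.0, 5.0, 7.0]  # lies on y = 2x + 1
ys_b = [4.0, 3.0, 2.0, 1.0]  # lies on y = -x + 4

# Same recipe, different data, different 'cake'.
model_a = fit_line(xs, ys_a)
model_b = fit_line(xs, ys_b)

print(predict(model_a, 4.0))
print(predict(model_b, 4.0))
```

Publishing `fit_line` alone tells you nothing about what either model will predict. That is only settled once the data arrive.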

By sharing their algorithms, Facebook and Google are merely sharing the recipe. Someone has to provide the eggs and flour and provide the baking facilities (which in Google and Facebook’s case are vast data-computation facilities, often located near hydroelectric power stations for cheaper electricity).

So even before it starts, an open question for OpenAI is how it will ensure it has access to data on the scale necessary to make progress.

The machine-learning ideas that underpin the current revolution in digital intelligence were all academic innovations. What constrained progress was the lack of data and access to large computational facilities. Reading the recipe may whet the appetite, but it doesn’t satiate our hunger.

This is probably why Facebook and Google have so freely shared their methodologies: they know that the real value in their companies is the vast quantities of data they retain about each one of us.

OpenAI has so far focused on recruiting leading young researchers in a sub-field of machine learning known as deep learning. Deep-learning algorithms do involve very complex recipes, and the first round of appointments is already of the highest quality. But to be Michelin-starred you also need a source of high-quality local ingredients.

In practice, most machine learning in most companies is not as complex as deep learning. In terms of personnel, the requirement is more for pizza chefs than pastry chefs. The algorithms that make money, those that predict your interests and target you with adverts, are fed by vast amounts of data. Google and Facebook reassess your tastes on a daily basis.

It is early days yet, and OpenAI may announce new partnerships with data providers. But opening the benefits of AI to all requires that everyone has a source of high-quality data: something like a local farmers’ market.

For truly open AI, creating the right infrastructure for sharing data may prove much harder than creating an infrastructure for sharing recipes.

Neil Lawrence is a professor of machine learning. He was general chair of the 2015 NIPS conference and in 2014 founded the Open Data Science Initiative, which takes seriously the challenges of data availability alongside algorithm development.