Review of Superintelligence
Perhaps the most often quoted academic source on the potential dangers of artificial intelligence is Nick Bostrom, author of the book “Superintelligence”. Although I’ve never seen myself as an AI researcher, it seems machine learning is the new AI, and there are direct consequences of conflating these ideas, particularly in the wider public mind, where our technologies are seen as artificial intelligence. Therefore, whether he intended it or not, Nick’s book is, to a large extent, perceived as being about machine learning.
There are some consequences of this situation: Nick’s background is in philosophy, not social science or machine learning research, yet his written word seems to be quoted as gospel in this area. With this in mind it seemed important to read the book.
The book makes a lot of use of technical jargon. Given that I have a technical bent, this isn’t a particular problem for me, but it does make the book difficult to read, mainly because the jargon is often domain specific (e.g. “Bayesian rational agent”) yet wielded in general terms. A Bayesian rational agent is a particular technical idea, but when it is deployed in a more general context it can give the dressings of technical validation to an often woolly and poorly defined argument. Superintelligence has a repeated tendency to resort to this approach.
For the general reader this may have the effect of providing a layer of technical plausibility (although I suspect it also makes the book difficult to read). Conversely, for technical experts, it undermines the credibility of the technical arguments.
The themes are somewhat popular; indeed some of the ideas resonate in the manner of envisaged futures, just as Metropolis or Terminator resonated in their time. But the book fails to provide much more than a few resonant ideas with a superficial technical overlay. There is a pseudo-plausibility about some of the arguments, but when subjected to deeper scrutiny they collapse like a house of cards.
I may be being unfair, because I appreciate that Nick is looking further into the future than I would normally be comfortable with, but he does so by attempting to construct narrative threads that build on causal chains of events which seem to me particularly fragile.
I persevered with the book mainly in the hope that there would come a point in the work where the plausibility of the route by which we would travel would be rendered unnecessary by intelligent commentary on the nature of the destination. But the final destination Nick is interested in is a ‘cognitive galaxy’ populated by human intelligences. I began to suspect that the book is more about Bostrom’s envisaged future than about the tangible challenges of artificial intelligence in the near future and the likely actual future.
Potential Futures
As a first reflection, when we look to the past, we see that people were normally overly optimistic about how rapidly new advances would be assimilated. Xerox PARC, for example, had the idea that the office of the future would be paperless, a sensible projection, but before it came about (indeed it’s not quite here yet) there was an enormous proliferation in the use of paper, so demand increased. In a similar way, in computational biology researchers have suggested that computational techniques would obviate the need for biological experiment, whereas the reality is that predictions require validation and, as the complexity of predictions increases, we become more and more reliant on advances on the biological experiment side to verify them.
Technological progress occurs through a complex dance between research, development and application. Each feeds the others in determining what is currently implementable, near-term achievable and longer-term conceivable. The back story of Superintelligence is, instead, developed through ‘the singularity’, an idea that seems to have taken on an almost religious aura: a new religion for the areligious.
The notion of the singularity is that once we have a computer that is more intelligent than us, it will create other computers which are in turn more intelligent, which in turn will create still more intelligent machines. This leads to intelligence beyond our imagination.
This isn’t the first time such a phenomenon has existed. Early engines and machines had hand-made parts (screw threads were each hand turned, with only one nut fitting a bolt before Whitworth’s standardisation). As machining became mechanised and standardised we were able to develop higher precision tools, which in turn could machine at higher precision. A similar singularity could have been envisaged, where each successive machine approaches ‘zero tolerance’. Of course, in practice, you never get as close as you hope because the challenge becomes harder and harder. There are some initial easy wins, but you just can’t use a classical mechanical device to shave off an atom. There are typically physical limits. We may not yet know the precise nature of all those limits, just as 19th century machinists didn’t yet know about atoms or quantum indeterminacy.
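The geometry of that kind of ‘singularity’ is easy to make concrete. Here is a minimal sketch (my own toy numbers, nothing from the book) of machines improving machines against a hard physical limit:

```python
# A toy model (my own numbers, nothing from the book): each generation of
# machine halves the remaining gap to a hard physical limit. Improvement
# is rapid at first, then stalls: an asymptote, not a runaway.
limit = 0.0       # the unreachable 'zero tolerance' (say, mm of error)
tolerance = 1.0   # error of the first hand-made machine
for generation in range(1, 11):
    # each new machine halves the gap to the physical limit
    tolerance = limit + 0.5 * (tolerance - limit)
    print(f"generation {generation}: tolerance = {tolerance:.6f}")
# The tolerance approaches zero geometrically but never reaches it, and in
# reality the halving factor itself degrades as atomic effects dominate.
```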
So we might expect limits both to the capabilities of the intelligences we can create and to a computer’s ability to construct such intelligences. Some of these limitations arise from areas that are omissions in the book (for example uncertainty), but another challenge of the book is that the nature of intelligence itself is never properly defined. Superintelligence is defined, as a capability which dominates human intelligence,[^1] but intelligence itself is left undefined. This means that the initial premises that underpin the threads of the narrative Bostrom develops are extremely elusive. Some possibilities are discounted (such as hybrid human-computer systems) while others are exhaustively explored (such as whole brain emulation).
Whole Brain Emulation
This requires a premise that we would be able to simulate entire brains and yet be unable to develop direct interfaces to them. This significantly misunderstands the nature of a simulation. To simulate at the required level of abstraction for each part of the brain we would have to have an incredibly deep understanding of which parts of the brain are relevant to intelligence and which are not. Much of this understanding is only likely to come about through direct measurement and intervention on brains, which would require hybrid-like systems for experimentation. So hybrid systems are a necessary stop on the journey to developing whole brain emulation.
Something very similar is likely to happen with artificial intelligence technologies. As we develop them further, we will likely require more sophistication on the human side. For example, we won’t be able to replace doctors, but we will need doctors who have a more sophisticated understanding of the interpretation of high resolution genetic testing, and an ability to assimilate that understanding with their other knowledge.
One term that is regularly discussed is “AI Safety”. This seems to be quite an emotive term, with fears about embodied intelligences with their own independent ‘final goals’ dominating Bostrom’s thinking, along with the consequences of what happens when those goals are extrapolated.
Bostrom offers us a form of technopop philosophy, which builds on popular ideas (such as those explored by Asimov or Clarke) and combines them with a superficial technical basis. But the technical basis is often misdeployed, or sometimes not deployed at all, according to the convenience of the argument.
Let me give you an example: Bostrom hardly makes use of uncertainty in describing intelligence. In my own approaches uncertainty, and the correct handling of uncertainty, is critical. By ignoring it Bostrom can give the impression that a superintelligence would act with unnerving confidence. The only point where I recollect uncertainty really being mentioned is when Bostrom refers to how he thinks a rational Bayesian agent would respond to being given a goal. Bostrom suggests that due to uncertainty it would believe it had never achieved its goal and continue to consume the world’s resources in an effort to do so.
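That argument can at least be made concrete. A minimal sketch (my construction, not Bostrom’s formalism) of why a Bayesian agent’s belief that it has succeeded never quite reaches certainty:

```python
# A toy version (my construction, not Bostrom's formalism) of an agent with
# a noisy sensor for "goal achieved". Even when every reading is positive,
# Bayes' rule never delivers certainty, so an agent required to be *certain*
# of success never has a reason to stop.
prior = 0.5              # P(goal achieved) before any evidence
p_pos_given_done = 0.99  # P(positive reading | goal achieved)
p_pos_given_not = 0.01   # P(positive reading | goal not achieved)

belief = prior
for n in range(1, 11):
    # posterior after one more positive reading (Bayes' rule)
    belief = (p_pos_given_done * belief) / (
        p_pos_given_done * belief + p_pos_given_not * (1.0 - belief))
    print(f"after {n} positive readings: P(achieved) = {belief:.12f}")
# belief -> 1 but never equals 1: residual uncertainty always remains.
```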
This idea of an idiot savant is a convenient combination of aspects of human and machine brought together to terrifying consequence. It provides an interesting narrative, and in the manner of an Ian Fleming novel it’s littered with technical detail to increase its plausibility for the reader. However, in the same way that so many of Blofeld’s schemes turn out to be quite fragile when exposed to deeper analysis, so are many of Bostrom’s ideas.
I don’t really think Bostrom ever gives a satisfactory definition of intelligence. Superintelligence is defined as outperforming humans in every intelligent capability (what we would refer to as dominance in multi-objective optimisation). So a superintelligence is a dominating solution over human intelligence. However, if human intelligence isn’t defined then that definition is a little diffuse.
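Dominance has a crisp meaning in that setting. A minimal sketch, with hypothetical ‘facets’ of intelligence standing in for objectives (my example, not the book’s):

```python
# A minimal sketch of dominance from multi-objective optimisation (my toy
# example, not the book's). Scores are tuples of objective values, higher
# is better; `a` dominates `b` if it is at least as good on every objective
# and strictly better on at least one.
def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical 'facets' of intelligence, e.g. (recall, planning):
human = (0.8, 0.9)
machine_a = (0.9, 0.95)   # better on every facet: a dominating solution
machine_b = (0.95, 0.5)   # trades one facet for another: no dominance
print(dominates(machine_a, human))  # True
print(dominates(machine_b, human))  # False
```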
I like the following definition of intelligence: “use of information to take decisions which save energy”. Here by information I might mean data or facts or rules, and by saving energy I mean saving ‘free’ energy.
Accepting the lack of a definition of intelligence, we can still consider the routes to superintelligence Bostrom proposes. He is talking about 30 year timescales, which are difficult to predict over. I think it is admirable that Nick is trying to address this, but I’m also keen to ensure that particular ideas which seem very implausible don’t become memes in this debate.
Indeed, I think it’s worse than that, because many of the ideas, if not all, are implausible. They are implausible at different levels. Let me start with a detailed criticism, which highlights the way in which some of the threads of his narratives fall apart, and then turn to a more general criticism which I think does a lot to deflate the whole volume.
I’ve chosen one detailed criticism as an exemplar, although I think there could be many more. Bostrom dismisses hybrid human-computer systems while taking very seriously the idea of full emulation of the brain by computer (and achieving superintelligence by speeding up the emulation or running multiple copies).
If we had the level of understanding we need to fully emulate the brain, then we would be a long way towards being able to interface directly with the brain. It appears to me more likely, particularly given the presence of applications for patients with spinal or motor neurone problems, that we will have developed hybrid systems that interface directly with the brain a long time before we have managed a full emulation of the human brain.
This type of naive idea comes from a lack of understanding of what an emulation would involve. It would not involve an exact simulation of each neuron in the brain down to the quantum level (and if it did, it would be way more computationally demanding than is suggested). It would instead involve some level of abstraction: abstraction as to what is important in generating our intelligence, replacing mechanism with the summary measures that the brain finds useful. An understanding of this sort of abstraction is totally missing from the text, but it is vital in modelling and, I believe, in intelligence. Such abstractions require a deep understanding of how the brain is working, and such understandings are exactly what Bostrom says are impossible to determine for hybrid systems. So the argument starts by chasing its tail and ends up biting its own arse.
The hybrid systems are important, because they would change the nature of the way society would evolve. If we had such hybrid systems there would certainly be many social issues, but the threats that Bostrom talks about would be different. It wouldn’t be human vs computer, but augmented human vs computer (my understanding is that a skilled human with a computer can still beat the best chess computers at chess).
Machine intelligence is very different from human intelligence because it is disembodied. In particular, the rate at which computers can talk to each other far exceeds that at which humans can interact, and that constraint is what makes our intelligence special. I’ve referred to it as ‘locked in’ intelligence in a previous blog. Bostrom doesn’t seem to acknowledge this, talking about collective emulation of brains as one potential future. Much of what we do in our brains in terms of communication (which includes second guessing those around us, and embodying their intelligence) is not necessary for a machine intelligence.
This brings me to the second major omission, and this one is ironic, because it is the fuel for the current breakthroughs in artificial intelligence. Those breakthroughs are driven by machine learning, and machine learning is driven by data; very often our data. Machines do not need to exceed our capabilities in intelligence (become dominant) to have a highly significant social effect. They will already be able to second guess us through access to our data. The deep neural networks of today are not performant because someone did something new and clever. The neural network winter was not some form of oppression. Those methods did not work with the amount of data we had available then; they work with the quantity of data we have now, which is way above any level of data that a human uses to perform similar tasks. So already, the nature of the intelligence around us is data dominated. Any future advances will capitalise on this further.
That data comes about because of rapid interconnectivity and high storage. It is the consequence of the successes of the past and it will feed the successes of the future. Because it’s based on data, there is an opportunity for control through reformation of our rules on data. At the Future of AI meeting IBM publicly said they had access to over 100 million patient records, which they were then processing with ‘Watson’. They also said that they had obtained the data by purchasing companies. What control do those patients have over their data? Assuming this is the US, the answer is very little. What control do we have over how IBM is processing that data? Can we be sure they are extracting the best value out of it? Do we know they are protecting the privacy of the individuals? In actuality I suspect a very large portion of those patients are deceased, but what about the living patients? Where is the moral obligation on IBM to ensure they are using that data for the best good? To me IBM Watson is short for IBM “What’s going on?”, because it is so heavily marketed and conflated with a Jeopardy-winning computer that it is impossible for those of us outside to understand what is actually going on.
Superintelligence
The arguments are also woolly because the lack of a definition of intelligence means that at any given point Bostrom will anthropomorphise the intelligence, or embody it, to make a particular scenario appear more menacing, and yet later exploit the interconnectivity of the intelligence and the power that comes with it. These interchanges may not be purposeful, but they reflect a lack of clarity in the thinking in the book.
Oddly, the only mention or consideration of uncertainty seems to come when Bostrom proposes that the intelligence may not even stop when its goal is complete (better make more paper clips just in case I haven’t really succeeded).
This simplistic thinking may come from a lack of experience in deploying systems in practice. While most of our machine learning systems have objective functions, these do not map nicely to the idea of a ‘final goal’ and are only really effective for simplistic tasks, such as classification.
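To see how modest an ‘objective function’ really is, here is a minimal sketch (my example) of the sort of objective a classifier actually optimises:

```python
import numpy as np

# A minimal sketch (my example) of what an 'objective function' actually is:
# a loss evaluated on labelled data and minimised during training. Nothing
# here resembles an open-ended 'final goal' pursued out in the world.
def cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood for binary labels."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.8, 0.6])
print(cross_entropy(y_true, y_prob))
# Once training stops, the 'goal' is over: there is nothing left to pursue.
```

Perhaps reinforcement learning is closer with its mechanism of reward for doing well, but this also does not map cleanly to a ‘final goal’.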
My own belief is that if we are goal driven in our intelligence, then it is by sophisticated goals (akin to multi-objective optimisation), and each of us weights those goals according to sets of values that may themselves evolve. We are sophisticated in this way because our environment itself is evolving, and our ways of behaving need to evolve too.
Another aspect that isn’t explored is whether there is a fundamental limitation to intelligence. Singularities are often unsustainable in practice because the mechanisms they exploit are rapidly exhausted after their initial launch. My own belief is that we became intelligent through a need to model each other, and ourselves, to perform better planning. That would have evolved into collaborative planning and complex social interactions. The human social system, though, became continually more complex as we introduced more and more intelligence within the social group. As it becomes more complex it becomes difficult to compute. To project further ahead we need to compress that complexity and abstract it. That may well be going on at an abstract level. But those who have performed time series prediction will know how quickly uncertainty accumulates as you try to look forward. There is often a timeframe ahead of which things become too misty to compute any more. Further computational power doesn’t help in this instance; uncertainties in the system dominate. If intelligence is viewed as predictive, then this gives limits to how much computation is worth doing.
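A minimal sketch of that accumulation (my example, using the standard random walk result): the k-step-ahead predictive spread grows with the square root of the horizon, however much compute you throw at the forecast:

```python
import numpy as np

# The standard random walk result (my example): the k-step-ahead predictive
# standard deviation grows as sigma * sqrt(k). Beyond some horizon the
# forecast is dominated by accumulated uncertainty, and no amount of extra
# computation sharpens it.
sigma = 1.0  # standard deviation of a single step's noise
for k in [1, 10, 100, 1000]:
    print(f"{k:5d}-step-ahead predictive std: {sigma * np.sqrt(k):7.2f}")
```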
Bostrom assumes speeding up intelligences will necessarily make them well beyond our comprehension. But this may not be the case. For example, IBM Watson’s Jeopardy win simply stored a lot more knowledge than we can imagine storing, and then used some simplistic techniques from language processing to recover those facts. That is not beyond our comprehension; it is simply beyond our capacity for storage and recall. My own belief is that this may well be the case for humans’ ability to predict each other, given our data constraints and our storage constraints.
Speeding things up is also what led to convolutional and recurrent neural networks working: early examples of recurrent nets on auditory data, and convolutional nets on images and even text to speech. All our big successes are just more compute and more data.
Data
Doughnut centres! Not just in big data at Sheffield, but maybe also with the hype … it may knock out the jammy centre (nod to MIT and Harvard as Boston Kreme).
The need for data scientists will go up in the early stages of AI development, just as the demand for paper went up with computer printers (Xerox) and for biological experiments with simulation. We generate more hypotheses which need more testing. We generate more analyses which will still need interpretation.
IBM
Strengths: a very well regarded company with a large customer base and a history of achievement in AI with Deep Blue and Watson’s Jeopardy win.
Weaknesses: an overactive marketing campaign for Watson that makes it sound monolithic and implausible (see also this FT article). This makes them difficult to collaborate with and hard for customers to understand what they are buying into. Customers who are data aware will also be worried about loss of control.
Opportunities: there are many companies that would rather ignore the data revolution and have someone else take care of it on their behalf. IBM are probably the leaders in providing that service. HP’s struggles to be seen as a competitor in that domain are well documented.
Threats: in the longer term the business strategy of farming out data may prove to be a foolish one, and the market place might shrink.
Apple
Strengths: combines the trick of being a prestige brand with being a best seller. This means they have a large amount of cash available. They seem to be positioning themselves as uninterested in your data, an interesting bet in an era where many of their competitors are busy exploiting it. This could prove to be another feather in their prestigious cap if the public becomes more sensitive to how their data is being used. Despite struggling to recruit directly from ML, due to lack of investment in conferences and a very closed approach to their technology, they have enough money to buy the pick of the available start-ups, as evidenced by their recent purchases of VocalIQ and Emotient. The recent addition of ad blocking capability to their phones also may indicate that they see direct marketing through data farming as less a part of their future, and they are happy to spoil it for the other players.
Weaknesses: the current generation of artificial intelligence methods is based on machine learning and is almost entirely data dependent, which sits awkwardly with the next strand they are adding to their bow: the prestige of privacy, the offer that they don’t want your data. They are seen as a closed company that has only recently started officially appearing at the major conferences, although there are rumors they’ve been attending incognito for years.
Opportunities: Large
To a large extent we’ve heard these ideas before.
Validation of systems by statistics: compare with Google driverless cars and the evolving deployment of Tesla systems. Fairness and trust.
Highlights: Eric Schmidt: we want to solve hard problems like climate change. Thom Dietterich in discussion: deep learning isn’t going to solve climate change.
Magpies (fighting over shiny objects) vs crows (sitting intelligently on fences).
Murray Shanahan: disagreement over quality of SR talk. Murray read Nick Bostrom. Murray thought that he got Yann to concede a few points.
Gaussian diagram… things you might expect to hear, and things you actually hear.
Bertrand from Brown on the inaccessibility of the value function (over lunch).
Kelly from IBM harping on about Watson. (FT article)
Monologues at NIPS, lack of debate in the futures section. Yann on unsupervised learning, Rich Sutton speaking up for reinforcement learning.

[^1]: I’m using dominates in a technical sense here, although for once it is not a word that Bostrom uses. I’m extracting it from a field known as multi-objective optimisation. In multi-objective optimisation you try to improve a system to fulfil more than one objective. Sometimes these objectives can conflict, for example designing a car that is both light and strong. If a solution is dominated, it means that there is another car that is either just as light but stronger, or just as strong but lighter. An intelligence that dominates would be one that outperforms us in each facet of our intelligence. However, unlike in multi-objective optimisation, where definition of the objective functions is a key step, for intelligence we are left with these facets undefined.