Digested Thoughts on the Panel on the Future of Deep Learning
What a great privilege it was to be on the ICML panel on the future of Deep Learning on Saturday. It was a fantastic line-up: kudos to Max Welling, as the panel chair, and to Kyunghyun Cho, who I think did a lot of the work to pull it together. Thanks also to Durk Kingma, Yann, Yoshua and Geoff for pulling the workshop together.
The panel was Demis Hassabis, Yann LeCun, Yoshua Bengio, Kevin Murphy, Jurgen Schmidhuber and myself. To be honest, I think Kevin and I were there to bring some non-deep ballast to the proceedings, which was a good idea (not me in particular, but having other voices!). I dabble in deep, so I'd see myself as a critical friend.
I would have liked to summarise the panel afterwards, but the questions came thick and fast and I lost track of some of the details of what was said, so thanks to Cho for his summary here!
I enjoyed very much sitting next to Jurgen; I'd not met him before (nor Demis) but know the other panelists well. I was quite happy that Jurgen and I sat down together early on. Jurgen is probably the panelist whose views are most disparate from mine, which made him the most interesting for me (the other views are interesting, it's just that they are more familiar to me!). There was a well-documented little spat between Jurgen and some of the other panelists on the 'history of deep learning', and I'm certainly not an expert on that. However, I do know that for any story there are many narratives, and we would do well to take account of all of them before moving forward, though it can take time to familiarise oneself with them all!
Motor Control
So my main digested thought was that I should have put more emphasis on a point I made during the discussion. Jurgen foresaw simulation of a capuchin monkey as an imminent goal (I think he said 5 or 10 years). He might be right. He also saw that as a step to human intelligence. Again he could be right. However, when I envisage these monkeys … they are easy to envisage if you ever watched early episodes of "Friends", because Ross had one: it was called Marcel. They were also widely used by organ grinders … so think organ grinder's monkey. When I envisage an artificial Marcel, I always want him to be able to climb around in a tree (regardless of burden, if he has just pinched my sandwich, or injury, if he has been caught pinching someone else's sandwich), and I'm not sure that he can do that yet.
It was Carl Rasmussen who first pointed out to me (in a talk at the Bayes 250 Workshop in Edinburgh) that the motor task of moving chess pieces has proved much harder to replicate than the solution to the game itself (who would have predicted that in 1950?). I'm not an expert in this domain, but I've been keenly following Carl's work, as well as that of others in the area. The key, from Carl's point of view, is that you need to introduce uncertainty into the system identification, and I'd strongly agree. You need uncertainty because in control you are trying to perform system identification and develop policies simultaneously. In the presence of limited data, you need to integrate over all possible future outcomes to understand what your action should be, as in the sketch below. That makes your optimal control problem one of stochastic optimal control. It's a great field, and one that seems to me vital if our fully artificial Marcel is to be pinching our sandwiches.
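To make that concrete, here is a minimal sketch of the idea, assuming nothing about Carl's actual models: the dynamics, policy and cost below are made-up one-dimensional stand-ins. It evaluates a feedback controller by Monte Carlo, averaging its cost over futures sampled from an uncertain dynamics model, which is the "integrate over all possible future outcomes" step in miniature.

```python
# A minimal sketch of why uncertainty matters in stochastic optimal control:
# a policy is scored by averaging its cost over many rollouts sampled from a
# *probabilistic* dynamics model, rather than a single point-estimate rollout.
# All names (dynamics_mean, dynamics_std, policy, cost) are illustrative
# stand-ins, not taken from any particular system.
import numpy as np

rng = np.random.default_rng(0)

def dynamics_mean(x, u):
    # Illustrative learned mean dynamics: a mildly nonlinear 1-D system.
    return 0.9 * x + 0.5 * u + 0.1 * np.sin(x)

def dynamics_std(x, u):
    # Illustrative model uncertainty: grows away from the data-rich region
    # near the origin, much as it would for a GP fit to limited data.
    return 0.05 + 0.1 * abs(x)

def policy(x, gain):
    return -gain * x          # simple linear feedback controller

def cost(x, u):
    return x**2 + 0.1 * u**2  # penalise state error and control effort

def expected_cost(gain, x0=2.0, horizon=20, n_rollouts=500):
    """Monte Carlo estimate of expected trajectory cost under the uncertain
    dynamics: an integral over possible future outcomes."""
    total = 0.0
    for _ in range(n_rollouts):
        x = x0
        for _ in range(horizon):
            u = policy(x, gain)
            total += cost(x, u)
            # Sample the next state: this is where the model's uncertainty
            # about the dynamics enters the evaluation of the policy.
            x = rng.normal(dynamics_mean(x, u), dynamics_std(x, u))
    return total / n_rollouts

# Pick the feedback gain that minimises the *expected* cost.
gains = np.linspace(0.0, 2.0, 21)
best = min(gains, key=expected_cost)
print(f"best gain under uncertainty: {best:.2f}")
```

The point of the sketch is the sampling line: a point-estimate model would propagate a single trajectory and could happily commit to a policy that only looks good where the model has never seen data.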
Given that many animals have highly evolved motor systems, it feels to me that this type of uncertainty handling could be widely distributed across our neural systems, and replicating it could be a major component of developing Marcel's mind too, particularly the planning part. I feel that, with some exceptions, the current generation of deep learning algorithms is somewhat limited in its handling of uncertainty. I have felt this for a few years, and it is why we have positioned ourselves in this area.
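As one concrete, contemporaneous example of trying to get uncertainty out of a deep network (my illustration, not something raised on the panel): the "MC dropout" idea of Gal and Ghahramani keeps dropout switched on at prediction time and reads predictive uncertainty from the spread of the sampled outputs. A minimal NumPy sketch, with untrained placeholder weights standing in for a fitted model:

```python
# A minimal sketch of MC dropout: keep dropout active at test time and use
# the spread of repeated stochastic forward passes as a crude uncertainty
# estimate. The weights are random placeholders, not a trained network.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1)) / np.sqrt(50)

def forward(x, drop_rate=0.5):
    h = np.maximum(0.0, x @ W1)             # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate  # dropout kept on at test time
    h = h * mask / (1.0 - drop_rate)        # inverted-dropout rescaling
    return h @ W2

x = np.array([[2.5]])
samples = np.array([forward(x)[0, 0] for _ in range(200)])
print(f"prediction: {samples.mean():.3f} +/- {samples.std():.3f}")
```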
Emulated Intelligence
Another point I made, which might back up this thought further, was to do with where the big successes have been so far. In his write-up, Cho nicely summarised my message as the separation between the cognitive and non-cognitive applications of deep learning. I said we'd been successful in speech, vision and language, all things that humans are good at, and that it was a challenge to do something like health (where my research group's own interests lie). In fact, we don't actually have exemplar systems that do the kind of thing we'd like to do there: i.e. assimilate data from millions of patients to develop an understanding of disease state. I was asked afterwards whether I was referring to the fact that we had no systems to emulate, and whether that was why cognitive tasks were easier. I think I was referring more to the fact that for the successful systems we have enormous amounts of data (gathered from humans), and that's why things are working well. We are building systems that emulate human intelligence rather than systems that generate intelligence of their own, and we need the example of human output to achieve that emulation.
To move from intelligence emulation to intelligence production, I believe uncertainty handling is one of the key components. Thanks again to the organisers for setting it up!