Just back from Cambridge, where I attended a symposium in honour of the work of David MacKay. David was an incredible influence on my thinking, and on so many others’. It was really impressive to see the diversity of researchers he’s inspired talking about their work.

It was a fantastic two days with an excellent dinner at Trinity in David’s honour on Monday night. Several researchers shared their reminiscences of David. This was done in a broad range of styles, each of which was very effective. I scripted mine somewhat (although I also went off script). Here’s what I originally wrote.

MacKay Symposium: My Memories of the Inference Group in the late 1990s

David at the Trinity Dinner

In January 1998 I moved to the Computer Lab here in Cambridge to do my PhD. In those days there was very little machine learning here outside Microsoft Research and David’s “Inference Group”, so I became an adoptee of David’s group. I’d cycle out from the New Museums Site to the Cavendish Lab. I’d listen to talks on belief propagation, independent component analysis and low-density parity-check codes. On occasion (well, I don’t know why I say “on occasion”, because I remember exactly how many occasions there were: two) I also spoke myself.

David’s clarity of thought is unmatched by any other person I know. It is combined with a kindness and a depth of humanity that I think we have all experienced.

However, when presenting to David, you are mainly subjected to his clarity of thought.

My first presentation involved a lower bound on the mutual information of a mixture distribution.1 Midway through my talk, David said, “I think that this bound is only going to be tight when the components are far apart and you need it to be tight when they are close together, and because it isn’t, it’s not going to be useful.”

So, what I remember about that is that it actually took me about four weeks to understand the question, and about another four weeks to work out that his guess was wrong. However, his question was the key question I should have been asking myself. It was an extremely important lesson, and one that has stayed with me. David has a knack for getting to the core of the issue.

The second time I spoke, I was very excited, because this time I’d developed a method for learning the number of units in the hidden layer of a neural network.2 The approach was inspired by David’s own work on automatic relevance determination, and David had also previously written that learning the number of nodes in a neural network was one of the outstanding challenges of neural nets.

About halfway through the talk David interrupted. He said, “You’re doing the wrong thing. You shouldn’t be looking to find the number of hidden units; the number should be infinite and you should use them all.”

Slightly perturbed, I said, “But in this paper you said that finding the number of hidden nodes was one of the main outstanding challenges of neural networks.” David replied, “Well, I used to think that, but now I don’t.” It sounds silly, but that’s the first time I realized that you were actually allowed to change your mind about something in research. That discussion with David was also a major influence on me switching focus to Gaussian process models, a focus I’ve retained for the last 15 years. I’m looking forward to dinner tonight when David tells me he’s changed his mind again.
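(A technical aside for readers who want the grounding for David’s remark: it echoes Radford Neal’s result that a single-hidden-layer network with i.i.d. priors on its weights tends to a Gaussian process as the number of hidden units grows. A minimal sketch, assuming the standard $\sigma_v^2/H$ scaling of the output weights, with notation that is mine rather than David’s: take

$$f(x) = b + \sum_{j=1}^{H} v_j\, h(x; u_j), \qquad b \sim \mathcal{N}(0, \sigma_b^2), \quad v_j \sim \mathcal{N}(0, \sigma_v^2/H),$$

then by the central limit theorem, as $H \to \infty$, $f(x)$ converges to a zero-mean Gaussian process with covariance

$$k(x, x') = \sigma_b^2 + \sigma_v^2\, \mathbb{E}_{u}\left[h(x; u)\, h(x'; u)\right].$$

The “infinite” network David was advocating is, in this limit, exactly a Gaussian process, which is why the conversation pointed so directly at the models I went on to work with.)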

A later academic memory of David comes from the Gaussian Processes in Practice workshop we organised in 2006 at Bletchley Park. David kicked off the workshop with his talk “Gaussian Process Basics”, which you can still view on videolectures.net today. Later, Tony Sale kindly gave us a demonstration of his reconstruction of “Colossus”, the decoding system designed by Tommy Flowers for the German High Command’s codes. Watching David watch the demonstration was like seeing history in the present: his mind querying the design of a machine that minds like his had made. Throughout Tony’s talk I had half an eye on the whirling tapes of the machine and the other half on the whirling gears of David’s mind.

When I started in the field, David was the torch by which a very large section of the UK machine learning community was guided. But he has changed his focus to other areas, and his thoughts shine just as brightly there: Gallager codes, human-computer interfaces, and, most impressively, his change of focus to energy. It was a great pleasure for me that my Dad, in his retirement, was able to find as much inspiration in David as I had. He read David’s book on sustainable energy and devoted himself to renewable energy at his local church, using the book as his guide.

David has an urgent desire to do the right thing in all aspects of his life. As a result, in questions of both research and life, I still consult my mental model of David for advice. I’m not one for bumper stickers, but if I were, mine would say “What would David do?”. I know I could never emulate all his thoughts and opinions, but through the conversations I have with him, I’ve developed an approximation that I hope steers me in the right direction.

  1. Here’s the paper. The bound was due to Tommi Jaakkola; I originally implemented it in sigmoid belief networks, then later in Boltzmann machines and neural nets. Here’s the first paper (which was also my first publication).

  2. Here’s the paper that I eventually published, “Node Relevance Determination”.