Originally posted on my uSpace blog on 5th July 2012.

Just back from ICML 2012 this week. As usual it was good to see everyone, and as ever it was difficult to keep track of all the innovations across what is a very diverse field.

One talk, submitted as a paper but presented to the full conference, has triggered this blog entry. The talk was popular amongst many attendees and seemed to reflect concerns some researchers in the field have. However, it didn’t reflect my own perspective, and if it had done I wouldn’t have been at the conference in the first place. It was Kiri Wagstaff’s “Machine Learning that Matters”. Kiri made some good points and presented some evidence to back up her claims. Her main argument was that ICML doesn’t care enough about applications. Kiri’s paper can be found here: http://arxiv.org/pdf/1206.4656.pdf. A comment from one audience member also seemed to indicate that he felt we (the ICML conference) don’t do enough to engage with application areas.

As a researcher who spends a large amount of time working directly in application areas, I must admit to feeling a little peeved by what Kiri said. Whilst she characterized a portion of machine learning papers correctly, I believe that these papers are in a minority, and I suspect an even larger proportion of such papers are submitted to the conference and then rightly rejected. The reason I attend machine learning conferences is that a large number of researchers there are actively trying to make a real difference in important application areas.

It was ironic that the speaker immediately before Kiri was Yann Le Cun, who presented tailored machine learning hardware for real-time segmentation of video images. Rather than focussing on this aspect of Yann’s work, Kiri chose to mention the league table he maintains for the MNIST digits (something Yann does as a community service; I think he last published a paper on MNIST over 10 years ago). She presented the community’s use of the MNIST digits and UCI data sets as being indicative that we don’t care about ‘real applications’. Kiri found that 23% of ICML papers present results only on UCI and/or simulated data. However, given that ICML is a mainly methodological conference, I do not find this surprising at all. I did find it odd that Kiri focussed only on classification as an application. I attended no talks on ‘classical’ classification at the conference (i.e. discriminative classification of vectorial data without missing labels, missing inputs or any other distinguishing features); I see that very much as yesterday’s problem. An up-to-date critique might have focussed on deep learning, latent topic models, compressed sensing or Bayesian nonparametrics (and I’m sure we could make similar claims about those methods too).

However, even if the talk had focussed on more contemporary machine learning, I would still find Kiri’s criticisms misdirected. I’d like to use an analogy to explain my thinking. Machine learning is very much like the early days of engine design. From steam engines to jet engines, the aim of an engine is to convert heat into kinetic energy; the aim of a machine learning algorithm is to convert data into actionable knowledge. Engine designers are concerned with characteristics like power-to-weight ratio, and they test these characteristics through prescribed tests (such as maximum power output). These tests can only be indicative. For example, high power output for an internal combustion engine (as measured on a ‘rig’) doesn’t tell you the ‘drivability’ of that engine in a family car; that is much more difficult to gauge and involves user tests. The MNIST data is like a power test: it is indicative only (perhaps a necessary condition rather than a sufficient one), but it is still informative.

My own research is very much inspired by applications. I spend a large portion of my time working in computational biology and have always been inspired by challenges in computer vision. In my analogy this is like a particular engine designer being inspired by the challenges of aircraft engine design. Kiri’s talk seemed to be proposing that designing engines isn’t in itself worthwhile unless we simultaneously build the airplane around our engine. I’d think of such a system as a demonstrator for the engine, and building demonstrators is a very worthwhile endeavour (many early computers, such as the Manchester Baby, were built as demonstrators of an important component such as memory systems). In my group we do try to do this: we make our methods available immediately on submission, often via MATLAB, and later in a (slightly!) more polished fashion through software such as Bioconductor. These are our demonstrators (of varying quality). However, I’d argue that in many cases the necessary characteristics of the engine being designed (power, efficiency and weight for engines; speed, accuracy and storage for ML) are so well understood that you don’t need the demonstrator. This is why I think Kiri’s criticisms, whilst well meaning, were misdirected. They were equivalent to walking into an engine development laboratory and shouting at the designers for not producing finished cars. An engine development lab’s success is determined by the demand for its engines. Our success is determined by the demand for our methods, which is high and sustained. It is absolutely true that we could do more to explain our engines to our user community, but we are a relatively small field (in terms of numbers, 700 at our international conference) and the burden of understanding our engines will also, necessarily, fall upon our potential users.

I know that you can find poorly motivated and under-validated models in machine learning, but I try to avoid those papers. I would have preferred a presentation that focussed on successful machine learning work that makes a serious difference in the real world. I hope that is a characteristic of my work, but I know it is a characteristic of many of my colleagues’.