This post contains my answers from a Quora session I did on machine learning and artificial intelligence. Each section contains a link to the original Quora question; the overall session can be found here.
Think carefully about what you actually want to achieve with it. I normally ask people who express interest in (or even expertise in) machine learning to define whether they are (or want to be):
A machine learning generator
A machine learning consumer
Most fall into the latter camp, but it seems everyone fancies themselves as containing a bit of the former (particularly if they think they’re going to solve AI). To do the former well, in the international community, requires really good foundations (particularly in mathematics) followed by a PhD with a supervisor who has experience of how that community works.
Doing the second well is much easier from the perspective of learning machine learning. However, it leads me to a further question, are you (or do you want to be):
A data generator
A data consumer
A data generator would often be a scientist or company that is working in a particular application and wants answers. They need access to machine learning researchers or statisticians to give advice on how to answer those questions. They should try and collaborate with experts in data analytics and data science, but they should be careful, there is a lot of hype around the term ‘big data’ at the moment. It’s a difficult area to navigate. Data generators typically need an interface to consume machine learning (or statistics) effectively, if this interface is poorly chosen a lot of wasted resource can result (things get very expensive very quickly for a lot of data generators!).
A data consumer is where the largest demand is right at the moment, and should probably be the starting point for someone who wants to move in the right direction. An MSc in Data Science would be a good first step. You can also use this experience to see if you want to transition into being a machine learning generator (that’s basically what happened to me).
What are you passionate about? That is the route in to any subject. Is it a particular approach to learning or a particular application?
Read about what people are doing, look at blogs, follow interesting people on Twitter. Try and distill your interests into something personal.
Look for Masters programs and online courses (such as Coursera and Andrew Ng’s course, or even my course! Neil Lawrence’s Machine Learning and Adaptive Intelligence Course).
You may now be ready to apply what you know in the wider world, or you could consider studying a PhD in machine learning or you could consider studying a PhD where you apply your machine learning knowledge to a particular application domain.
I first became interested in machine learning in 1995 when I was working on oil rigs. I bought myself a laptop, a copy of Borland C++ and tried to implement neural networks when I was stationed on the rig. I had a friend, Alex Rogers, who worked on the rigs alongside me. He was also interested in the area.
When I left the oil rigs I joined a small company and implemented neural networks for them for a short period. Then I got in contact with Niranjan, who was at Cambridge at the time; he advised me to apply for a PhD with Chris Bishop.
So you can see how important passion and interest were in getting to do what I did! Niranjan’s advice was excellent and I was very lucky to arrive at the Aston research group where Chris worked at a very exciting time in machine learning (although in those days we just called it neural networks!).
It was there I heard about Gaussian processes for the first time (from Chris Williams), and later when I moved to Cambridge I used to go to David MacKay’s group meetings. Listening to David was also very inspiring and further persuaded me that Gaussian processes might be interesting models.
So you can see, to a large extent I followed my nose! I met interesting people and took advice from those I trusted. There are many others who would come to influence me, but listening to the people above was very important on setting the direction in my early career.
For a beginner in machine learning, is it important to learn the available libraries in say, python or R, or implement the algorithms from scratch?
A lot depends on whether you want to develop new algorithms or are trying to interpret data yourself.
As a learning exercise though it’s a very good idea to have some experience of implementation. You normally learn better by doing rather than reading. In my own course I get people to do a lot of implementation. It’s also very satisfying to program something yourself!
However, if you are doing some work for a client, or a company, then you want the best implementation available. And that’s unlikely to be the version you write from scratch (even if you are a very good programmer). For this you want to know the different libraries, but also get good experience of interpreting the results of your implementations correctly.
I think predicting anything over longer than a 10 year horizon is very difficult. But there are a few thoughts I have on this question.
A lot of debates seem centered on the idea that something dramatic has happened in terms of technology. But it’s not clear to me that our rate of change of innovation is increasing. To me it feels sometimes painfully slow!
If we go back just over 50 years ago, there were people whose job it was to be a computer. When electronic computers took over, those people became the first computer operators. Eventually the low-level tasks they performed were replaced by programming languages, so those jobs evolved too. They also became higher paid and more interesting. Depressingly, as they did so, the work moved from being predominantly women’s to mainly men’s.
I’m not saying that all jobs will get more interesting and higher paid (or that men will displace women!! I hope we’re evolving beyond that!), but there was some kind of bottleneck effect with programmers, as there is today with data scientists.
Early computers were invented to automate repetitive tasks (arithmetic, sorting), and yet most of the population’s experience is that computers today make us do more repetitive tasks: have you ever watched a biologist cutting and pasting data in Excel because they don’t know how to script? Or have you ever had to rename a bunch of files or reorganise things into folders?
I don’t believe there’s going to be a dramatic shift where people’s existing roles become redundant overnight. In terms of the way in which things will change it’s going to be ‘more of the same’ (if you see what I mean). The nature of our society will continue to change, and so will the nature of the jobs within that society.
With machine intelligence, as we do start to better emulate various characteristics that today are considered distinctly human, I think the way in which we connect with other humans will evolve. You hear people talking in rather simplistic terms about ‘productive’ jobs being done by machines and humans becoming focused on ‘entertainment’. But those terms are rarely defined satisfactorily: what is a ‘productive’ job and what classifies as ‘entertaining’? My own job often feels like both, but sometimes feels like neither. Maybe I’m too optimistic, but I hope we’ll evolve to better understand, respect and enjoy each other and that the work we find satisfying will evolve to build on that understanding.
I was talking with the owner of my local coffee shop (Upshot Espresso) on Monday about single origin coffees and the importance of a story behind the farm where that coffee is grown. People’s expectations when they are drinking ‘craft coffee’ are rather different to the assumptions encoded in macroeconomics. There are many similar movements: e.g. craft beer. So maybe the real answer is that we’re all going to become a bit more hipster.
I think it’s very exciting. Clearly expectations are very high at the moment. It’s clear that the current generation of algorithms will not be able to deliver on all those expectations, so we’ve now become involved in a race to deliver new technologies fast enough to satiate society’s appetite for advance.
That’s a very dangerous game, somewhat akin to a dog chasing its tail. The faster we go in terms of driving forward the technology the greater expectations will be.
So I don’t think there’s any danger in predicting that there will be a backlash at some point (a deep winter?). It seems systemic in society to build something up to knock it down. So when a field of academic research starts developing its own folklore, you should probably be getting worried.
Having said that, I think these types of hype are really important. They build enthusiasm and drive the field forward. Sometimes over ground that’s been covered before (indeed often over ground that’s been covered before) but with new eyes and different technologies. There are many more people involved in this revival of the neural network than there were in the previous one in the ’80s and ’90s. They come with faster computers and more data. However, the limitations of those models have not fundamentally changed!
So what do we do about that? Well, some of us will enjoy the ride while it lasts (particularly those who joined the queue early and are at the front of the rollercoaster) but the rest should probably be thinking about where we might need to go next (the queues at the log flume look short!).
I really dislike the term Bayesian, it makes it sound like some form of cult or religion. Indeed it originates with Fisher who intended it to have negative connotations and be dismissive of that approach to analysis. So the very development of the term is divisive.
I think Fisher may have had good reasons for this given the context of what he was trying to achieve, but those reasons don’t apply so much today because Fisher was broadly successful in what he was trying to do.
About three years ago I was involved in preparing an ultimately unsuccessful programme grant with many of the leading UK researchers and DeepMind (before they were bought by Google! What hipsters we were!). One of the most unhelpful things we had to deal with when writing our proposal was the feeling that, along with proposing some really far-reaching research (it was a really great proposal), we had to defend taking a ‘Bayesian approach’.
Laplace and Gauss naturally applied these approaches without much recourse to justification because it is the natural calculus of uncertainty. As Zoubin said when we were fretting over our programme grant: we don’t say ‘I’m interested in rates of change so I’m going to use a Newtonian approach or a Leibnizian approach’ (although note we do use the term Newtonian in a slightly condescending way when we are describing non-relativistic gravity).
Personally I’m deeply interested in the effect uncertainty has on decision making. That means that I should use the calculus of uncertainty: in particular, probability. There are three fundamental rules of probability: normalization, the sum rule and the product rule. When you combine them and do a bit of algebra you end up with a simple formula that allows you to evaluate how your uncertainty evolves as you gain data. What could be easier? The maths involves only adding, dividing and multiplying. Fantastic!
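Written out, that combination of the rules is just Bayes’ rule. As a sketch: the product rule factorizes the joint distribution both ways round, and the sum rule supplies the normalizer:

```latex
% Product rule: the joint written as conditional times marginal, both ways round.
p(x, y) = p(y \mid x)\, p(x) = p(x \mid y)\, p(y)

% Rearranging gives Bayes' rule: how belief in x evolves once data y arrives.
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}

% The sum rule provides the denominator (the normalization over the state space).
p(y) = \sum_{x} p(y \mid x)\, p(x)
```

The adding is in the sum rule, the multiplying in the product rule, and the dividing in the normalization.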
The challenges are twofold.
- Where does your (probabilistic) model come from?
- How do you compute the result of the sum rule when your state space (i.e. the space you are doing inference over) is very large?
I lack the imagination to develop totally new algorithmic approaches to every data analysis problem people mention to me, so the distillation of those analysis problems into those two questions is extremely helpful in giving me a ‘crutch’ to hobble forward within the vast landscape of potential solutions.
There are certainly limitations to framing every problem with those two questions, and that’s why I greatly value the contributions of those that bring different ideas to the machine learning community. Research is very hard, and having many different people attacking the challenges with a range of different techniques is vital for making progress. Just because Fisher chose to be divisive it doesn’t mean we should be.
This question is a little like asking me what my favourite food is. There is such a diversity of interesting people in machine learning!
Sometimes I imagine a machine learning village, a sort of research village where all the people who work in the field live together in an English countryside village, and we gather in the ‘church’ on Sundays to hear the sermon. In my mind, we are led by an inspirational local vicar who’s broad minded and welcoming. On Sundays he allows guest sermons, so we get to hear from a large range of different people.
There are many different streets in the village, and different areas of intellectual interest. Everyone in the community lives there (and by the last NIPS count, that was quite a lot of people!).
The interesting thing is that the community is much more important than any individual. So it’s vital that we retain the character of the village, which is open minded and welcoming, as new developments occur. It’s also important that we encourage regular trade with our neighboring villages, areas like computational biology, computer vision, speech and language. We are a market town for these related areas.
Perhaps the most exciting aspect is how the village is evolving and growing all the time. Fortunately our church is a broad church and it accepts many denominations (even if the vicar occasionally makes little jokes about some of the parishioners). A big topic for debate is when the current vicar retires, will we find a replacement who’s respected, broad minded and visionary enough to take his place? One who can manage all this in a period of rapid growth in our population. Or will we degrade into the partisanship that has been the undoing of so many other little villages?
I have at times found it frustrating that this class of models is not better understood by the wider community. For over a decade we’ve been running workshops and summer schools to try and correct this.
When I’m introducing them to non-technical audiences (or even technical audiences outside machine learning), I’ll typically use two slides: one which shows a bunch of sample functions from a Gaussian process prior, and one in which the samples that do not pass through a few example data points (typically three) have been discarded.
Actually, this idea was borrowed from one of my old post-docs, Nicolas Durrande. He may have borrowed it from elsewhere!
I saw Alex Forrester (who uses Gaussian processes in design) introduce them in a related way. He called it “The Kriging Game”. He said it’s originally due to Donald Jones, and below is a slightly modified version of it.
We play a game, where there are two identical decks of cards. Each card has a function printed on it (i.e. a curve, not a programming language function).
You choose 1 card.
I ask you to give me the value of the function on your card at a particular point (e.g. what is the y value for an x value of 1?).
You give me the answer.
I throw away all cards that don’t match your answer.
The initial deck of cards is my prior belief of the functions (and in the game the prior is correct because it matches the actual distribution of functions you will choose from). The new selection of cards is the posterior belief over functions. I can either ask more questions (and discard more cards) to become more confident over what the function is, or I can make predictions based on my expectations given my current belief of what the function is.
In practice, the game is played with infinite cards in our deck, and we actually play the game against ‘mother nature’, who rather frustratingly doesn’t necessarily hold the same deck of cards we do.
So a large part of the challenge is turning up to the game with the right pack of cards!
Actually the game isn’t unique to Gaussian processes, it’s describing the process of ‘Bayesian inference’, the Gaussian process is really a selection of a particular pack of cards to play with, and a choice that makes it rather easier to work out which cards to discard when you are given information by mother nature.
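The card game above can be sketched numerically: sample a finite ‘deck’ of functions from a Gaussian process prior, then discard the cards inconsistent with the revealed value. All the numbers here (grid, kernel parameters, observed value, tolerance) are made up for illustration:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Exponentiated quadratic (RBF) covariance between two sets of 1D inputs.
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
x_grid = np.linspace(-3, 3, 50)
K = rbf_kernel(x_grid, x_grid) + 1e-8 * np.eye(len(x_grid))  # jitter for stability

# The "deck of cards": 2000 functions drawn from the GP prior.
deck = rng.multivariate_normal(np.zeros(len(x_grid)), K, size=2000)

# "Mother nature" reveals the function's value at (roughly) x = 1.
idx = np.argmin(np.abs(x_grid - 1.0))
observed_y = 0.5

# Throw away every card whose value at that point doesn't (approximately) match.
tolerance = 0.1
posterior_deck = deck[np.abs(deck[:, idx] - observed_y) < tolerance]

print(f"{len(posterior_deck)} of {len(deck)} cards survive")
```

The surviving cards are (approximate) samples from the posterior over functions; asking more questions discards more cards and narrows the belief further. In practice Gaussian processes let you skip the rejection step entirely and compute the posterior in closed form.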
So while most members of the public have heard of AI, very few have heard of machine learning. So when I introduce machine learning in articles targeted at a wide audience I describe it as “the principal technology underpinning the recent advances in artificial intelligence”.
Artificial intelligence as a field actually includes a lot more than machine learning, it’s just that recently a lot of challenges that were considered very hard have been solved using ideas from machine learning. Machine learning is actually used for many applications that might not be thought of as artificial intelligence.
Machine learning is also one of the principal technologies underpinning data science.
And to the extent that AI underpins data science I think it does so through machine learning.
So there are two broad components to what machine learning does: AI and data science.
Data science itself also involves a lot more than just machine learning, as does AI.
Machine learning is a data driven approach to decision making, and it therefore overlaps a great deal with statistics. In the past, when asked to distinguish between statistics and machine learning I put it as follows:
Statistics is trying to turn humans into computers, while machine learning is trying to turn computers into humans. Neither task is currently possible so we meet somewhere in the middle.
What I mean by that is the following: the main aim of the field of statistics (at its inception) was to ensure that we weren’t misled by statistics. Statistics are just summary numbers and the field was called mathematical statistics. People were interested in proving things about particular statistics you could compute that would allow you to be confident in your conclusions about the world given the data (are people in London richer or poorer than people in Manchester?). Humans have natural inductive biases which cause us to see patterns where there are none. A major preoccupation of statistics is ensuring that a particular statistic is not just exploiting this tendency. In my quote the computer represents an idealised decision maker that wouldn’t have such a bias, and statisticians work towards trying to ensure that important decisions are made without such biases.
Machine learning researchers, on the other hand, are fascinated by all the things that humans can do that computers can’t. Many of these things are actually the positive side effects of our inductive biases (our tendency to see patterns). So we would actually like to have methods that encode these biases.
So the philosophy of statistics and machine learning (certainly at outset) is quite different and that leads to different emphasis.
Note in all these discussions there’s no one field that’s right and one field that’s wrong. We’re just interested in slightly different things. It’s very important to bear that in mind when involved in interdisciplinary discussion, otherwise frustration and argument results!
From a personal perspective I love the fact that data driven research is a pass to ‘access all areas’, so the interface between statistics and machine learning really excites me. And it’s fantastic that machine learning has been so influential that we are asked to contribute to debates in both data science and artificial intelligence. I hope we do so constructively!
A Gaussian process is a flexible, non-linear prior over functions. It enables us to express our beliefs about a class of functions before observing the data and, importantly, to compute our updated posterior belief about the functions that are consistent with our prior assumptions and the data we observed. It does this tractably, and gives us a full posterior distribution over functions.
All this is done with linear algebra which is often implemented very efficiently on a computer. So importantly, sustaining the uncertainty in your model is far easier with a Gaussian process than it is with many other approaches.
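As an illustration of that linear algebra, here is a minimal sketch of Gaussian process regression in Python. The kernel choice, data and noise level are all invented for the example; the posterior is just a Cholesky factorization and a couple of triangular solves:

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    # Exponentiated quadratic covariance between two sets of 1D inputs.
    sq = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale**2)

# Toy data: three noisy observations of an unknown function.
X = np.array([-1.0, 0.0, 1.5])
y = np.array([0.2, 0.8, -0.3])
noise = 0.01

# Test inputs where we want the posterior over function values.
X_star = np.linspace(-2, 2, 5)

K = rbf(X, X) + noise * np.eye(len(X))   # covariance of the observations
K_s = rbf(X, X_star)                     # cross-covariance, train vs test
K_ss = rbf(X_star, X_star)               # prior covariance at test inputs

# Posterior mean and covariance via a Cholesky solve: pure linear algebra.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = K_s.T @ alpha                     # posterior mean at X_star
v = np.linalg.solve(L, K_s)
cov = K_ss - v.T @ v                     # posterior covariance at X_star
```

Note that `cov` carries the full posterior uncertainty: its diagonal shrinks below the prior variance wherever the data have pinned the function down.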
I think people take against them because they have Gaussian in their name. And people think of Gaussians as overly-simplistic.
Unfortunately, those people are rather prone to believing they are doing something non-Gaussian when they apply other methods (generalised linear models, Kalman filters, classical signal processing, autoregressive models) which can easily be shown to be a special case of a Gaussian process.
They certainly have limitations, and a lot of my own research and that of my collaborators is focused on overcoming those limitations (e.g. deep GPs). But if you are interested in uncertainty, then as a first port of call for modelling functions, you often find you are a lot further down the road than with other techniques.
Again, I prefer to read this question as asking me how deep learning can benefit from principled handling of uncertainty (see related answer here).
Handling of uncertainty in deep learning will allow methods to be much more data efficient.
Current deep learning algorithms based on backpropagation and neural networks are very data inefficient relative to humans, and I think uncertainty propagation is the answer to fixing this. I wrote a longer blog post about that here:
OpenAI is an exciting development in the community with some extremely impressive early recruits. I feel very privileged to be working in a community that is gaining such high profile investment. My comments below should be read against the background of that very positive outlook, but just because we welcome it doesn’t mean we shouldn’t be steering carefully as it arrives.
OpenAI’s launch put me in mind of a larger phenomenon where it seems (to me) you don’t have any credibility as a Silicon Valley billionaire at the moment unless you have your own approach to solving AI. This is not specific to OpenAI, there is also FAIR, Google DeepMind, Google Brain and the Allen Institute for AI. OpenAI was marketed as a different approach, but I’m yet to understand how it is fundamentally different from what already exists.
OpenAI are promising openness, but FAIR, Google, the Allen Institute and Google DeepMind already engage in openness (sharing of code, frameworks and algorithms). As I’ve said before in this Guardian article: the challenge is to get access to the data. The power accumulates with data, not algorithms.
Of course, all this AI-activity is probably more worthy than launching attempts at winning the America’s Cup (and it’s probably more expensive too), but there are a lot of parallels.
Thinking of wider industry, I’m very excited by services (such as Azure or AWS) that are bringing powerful machine learning (at scale) to the mass market. Machine learning as a commodity. The utility of that approach for wider society may be far greater in the near term than the output of these institutes. One aspect that is missed in all the hype is the importance of infrastructures in dealing with data challenges that real people have. That’s the practical end of curing cancer, predicting stock markets, making transport systems more efficient, etc.
I’m more than a little worried about a circumstance where the main directions in our research community are decided over a latte in Palo Alto (of course, that’s a silly idea, I bet none of them drink lattes, it will be flat whites or something).
In my analogy of this previous answer all these organisations are like new tower blocks in the village of machine learning. They are less organically evolved than the old historic centre, but are benefiting from a lot of investment. However, as we know from failed experiments in social housing from the 1960s, bigger is not always better, and there is a risk of destroying the very aspects that made the community so interesting when we build on such a scale.
The current generation of personal assistants are combining a range of different technologies, some of which are outside the scope of machine learning. They are a major engineering challenge and the exact ingredients and how they are combined are largely beyond my own expertise.
One thing we can say is that they are very largely data driven and are likely to become more so!
To me the biggest existential threat to humans seems to be humans!
Machine learning is the main technology underpinning the recent advances in artificial intelligence, but at one extreme machine learning is merely a branch of statistics focussed on classification. So, does statistics represent an existential threat to humans?
To my mind statistics is a poor term, because actually that field is about data analysis. It’s just that when the field launched, the main way to do data analysis was to compute a statistic and then talk about that. Machine learning filled a gap in statistics by relying more heavily on the computer for inference.
It’s arguable that statistics does represent an existential threat because we are utterly reliant on statistics underpinning our modern intelligent decision making processes; if it were done badly we would be in a lot of trouble. However, traditionally those processes actually involve humans in the loop.
Because of our tendency to anthropomorphize, many conversations about AI can quickly depart from the reality of the challenges we face in the modern data-dominated era and into the domain of science fiction.
It’s true that we face many potential existential threats: AI, nuclear war, meteorite impact, supervolcano eruptions, climate change etc. Interestingly AI (or as I’d prefer to call it machine intelligence) is the only one of these potential existential threats that may help provide potential solutions for one or more of the other challenges.
My concern is much more about the imminent threat associated with losing control of our personal data. By relinquishing control of our personal data we are allowing a new form of power accumulation. I can envisage a dystopian society based on accumulated personal data that would be easy to implement with today’s technology. Over-hyped talk of AI existential threats is obscuring the very real dangers we are facing right now.
I think the biggest misconception arises from the tendency to embody AI.
At the recent Future of AI event hosted by NYU, Cynthia Breazeal was relating how much more effective intelligent systems are when they are embodied in some physical object.
It seems to me that many people (including experts) cannot help but begin to embody intelligence in some boxed in entity when thinking about it. This is a dramatic mistake.
As I’ve recently blogged about, our own intelligence is very locked in due to the limited bandwidth of our communication abilities. Indeed, our tendency to embody intelligence is a side effect of the intense modelling we do of other entities when we try to communicate with them so that we can use that bandwidth efficiently.
Real world intelligent systems are extremely diffuse and distributed. Where we do embody them we will be doing so principally to facilitate communication with people (pandering to their expectations of intelligence).
For this reason, I often find it easy to think of data science, machine learning and statistics when I think of future intelligent systems. People don’t tend to embody statistics … despite it being fundamental to our AI technologies …
I don’t think there is such a thing as a master algorithm. As I said in a recent tweet, referring to Strong AI: “There are eight ways to make coffee, why should we expect there to be only one way to do AI?”
Of course there are more than eight different ways to make coffee! But I had to try and find a pithy statement for a tweet.
I sometimes think of intelligence as the use of information to make decisions that save energy.
I don’t think there will be one algorithm for doing that, but I do think that there are key principles that are often important (like uncertainty!).
This is one of the topics we try and address in our summer schools in Gaussian processes. You can find old lectures via the website here: Gaussian Process Summer Schools. Or you can attend our September school on Gaussian processes (with focussed applications in Uncertainty Quantification)!
This is clearly a question for Mark Zuckerberg rather than me (he’s set himself the challenge of doing it!).
Free online courses (including mine! http://inverseprobability.com/ml…), participation in Kaggle competitions, use R, Python, Jupyter notebook and open source tools.
I think the main difference between the two is this ‘locked in’ idea I explored in the debate I blogged about here: Future Debates: This House Believes An Artificial Intelligence will Benefit Society
The idea is that our major constraint (as humans) is the low bandwidth communication we have between ourselves. That limits our ability to operate as a collective intelligence. As a result we spend a large amount of effort second guessing one another in an effort to (a) outwit each other or (b) collaborate better.
Artificial intelligences have no such constraints, indeed the fact that the technology is being driven now by machine learning is reflective of its current reliance on data. The systems we are developing (like the AI that is currently beating the leading human Go player in Korea!) are dependent on vast amounts of data, way more than the human opponent has access to.
So currently our AIs are much more stupid than us but make up for this fact by having access to much more information. An “idiot savant” if you will …
Well, I’m very excited by these models and their potential to solve some of the major challenges that researchers are facing in AI today. They are not the only approach, but I feel they are an extremely promising one.
To approach your question from a more general direction though, another way of interpreting it is ‘How can deep learning benefit from handling uncertainty in a mathematically principled manner?’.
There will be massive benefits in regimes where data is scarce relative to the complexity of the problem. Almost all the success we see in deep learning is founded on massive data sets which allow the very complex functions underpinning the learning to be well determined across a wide range of inputs.
As I’ve written about in recent blog posts, the calculus of uncertainty should allow us to make such complex functions robust to variation in the input that is not directly encoded in the data. This opens up important domains such as medical data and biological problems.
Interestingly, once you’ve done it, it also makes unsupervised learning trivial. Deep GPs have no difficulty with unsupervised learning. However, I’m slightly skeptical about the idea that unsupervised learning is some kind of global panacea that will solve all the challenges we face. It comes with its own problems (I’m actually more sympathetic to reinforcement learning as a global panacea … but also don’t give that idea too much credibility, global panaceas are notoriously rare … fortunately efficient reinforcement learning is also critically dependent on uncertainty).
However, principled propagation of probability does come with significant computational overhead. Resolving this overhead is a very important challenge. To my mind it is the most important challenge at the moment. In this respect I’ve put my money where my mouth is, because I’ve made a significant personal investment in a small company that is trying to address these issues.
I’m not sure I understand the question, ML is very widely used in both medicine and biology!!
There are certainly challenges in obtaining regulatory approval for machine learning applications in health, but people like Lionel Tarassenko have already been down this path and deployed monitoring systems in intensive care units that have been shown, in clinical trials, to save lives.
If you search for computational health you’ll find more! I’m also very interested in applications in this domain.
For biology, computational biology makes enormous use of machine learning. My own past record is dominated by this application area, often in collaboration with Magnus Rattray.
The Open Data Science Initiative is our attempt to make it easier for people to engage with data. Our aim is to make it easier for people to understand and apply methodologies to their own data.
The idea is inspired by a recognition that we don’t (ourselves) have the time to help everyone who needs help with their data processing problems. But if we can share our expertise widely we can empower others to do their own data analysis. One of the things I’m most proud of is that we instigated this module:
Which teaches 3rd year biology undergraduates to do data analysis in R. They had no prior programming experience. It was challenging for them, but they were all really inspired by it. This is exactly the sort of thing we need to do to ensure that important data analysis techniques are deployed as widely as possible.
I want to develop intelligent devices which can change human lives to a great extent. What should I learn in order to achieve this?
You can have most effect by delivering such solutions in developing countries, where many problems lack existing solutions because of a shortage of local infrastructure. I wrote an article about this in the Guardian here, and we are hosting another data science workshop in Kampala in June (in collaboration with Makerere University, UN Global Pulse, IBM Nairobi, Dedan Kimathi University of Technology).
I really enjoy those applications as they have significant potential to make a difference to human lives (as you ask in your question). The best thing you can do is educate people to address problems themselves by learning about data, smart phone apps and machine learning. That’s why we deliver our Data Science Schools in Africa alongside the workshops.
Are we witnessing the early stages of ML being used in industry or do you believe the application of ML towards industry is already widespread?
It’s already extremely widespread, particularly in the big four internet companies (Facebook, Google, Amazon, Microsoft).
Probably the application where it’s making the most money is Ad-click prediction.
Note also that most of the time very simple models are used; it is more about scaling the learning than applying complex models. These companies’ interest in deep learning is a relatively new phenomenon: their roots in machine learning go much further back (at least for Facebook, Google and Microsoft; Amazon are more recent arrivals, but have caught up fast) and historically they focussed on much more scalable systems.
I’m pretty sure that today most of their money is still being made by simpler algorithms (logistic regression, decision trees), very cleverly implemented with extremely intelligent feature selection. Their deep learning efforts are a cherry on that cake.
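As a rough illustration of that “simple models, cleverly implemented” point, here is a toy sketch of sparse logistic regression for click prediction using the hashing trick, the kind of device that keeps these models scalable. The feature names and data here are invented for the example:

```python
import math
import zlib

def hash_features(raw_features, n_bins=2**20):
    """The 'hashing trick': map arbitrary string features to a fixed
    number of weight indices, so the model size stays bounded no matter
    how many distinct features appear in the logs."""
    return [zlib.crc32(f.encode()) % n_bins for f in raw_features]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(samples, n_bins=2**20, lr=0.1, epochs=20):
    """Train sparse logistic regression with stochastic gradient descent.
    samples: list of (raw_features, label) pairs, label in {0, 1}."""
    w = [0.0] * n_bins
    for _ in range(epochs):
        for raw, y in samples:
            idx = hash_features(raw, n_bins)
            p = sigmoid(sum(w[i] for i in idx))
            g = p - y                      # gradient of the log loss w.r.t. the score
            for i in idx:
                w[i] -= lr * g
    return w

def predict(w, raw, n_bins=2**20):
    return sigmoid(sum(w[i] for i in hash_features(raw, n_bins)))

# Toy click log: ads on 'sports' pages get clicked, ads on 'news' pages don't.
data = [(["site=sports", "ad=shoes"], 1),
        (["site=news", "ad=shoes"], 0)] * 50
w = sgd_logistic(data)
print(predict(w, ["site=sports", "ad=shoes"]) > 0.5)  # True: predicted click
print(predict(w, ["site=news", "ad=shoes"]) < 0.5)    # True: predicted no click
```

At production scale the same idea is run over billions of hashed features with distributed SGD; the model stays a single sparse weight vector, which is what makes it so cheap to serve.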
Whether that changes in the next few years remains to be seen (perhaps it’s already changing now!).
There is a fantastic symbiosis between the two which is an important driver of innovation and change in the world we are experiencing today.
I worked briefly in an industrial lab when I finished my PhD, and indeed I strongly believed then that that would be a better match for my skill set. (I didn’t think I would be good enough to forge an independent career in research, I thought I was more suited to translation of academic ideas into practice). Unfortunately, at that time Microsoft Research weren’t interested in offering me a long term position, so I sort of ended up in academia by accident.
Back then I certainly believed there was much greater opportunity to deploy machine learning in practice for companies. Salaries were/are also much higher!
However, today the situation is a bit different. We all have access to much more data, and I have a great deal of freedom over what I say, do and work on. This doesn’t come immediately in academia; you are more constrained by climbing the ladder in the early days (or perhaps a better analogy is climbing an oily pole). However, now I think I can have a great deal of influence from an academic post on real world applications, perhaps greater than I would have within an individual company (unless I was very senior!). I love being able to work across a wide range of applications and advise many different organisations. It is an immense pleasure.
I’m also extremely proud of those that work with me in my research group (students and post-docs) and get particular joy from their achievements after they move on.
There are many attractions to industry, but it is clear (for me) that the opportunities to make a difference (today) in academia are far greater.
I suppose goals change over time and any individual goal may vary from the field’s main goal. But the simple answer is to automate tasks that humans do well and computers don’t!
Current research focuses heavily on ML as it is the most promising branch of AI. Should we be worried about local optima in this area of study?
Well, my own interest is more in machine learning than AI (I only started referring to myself as an AI researcher when everyone else started referring to machine learning as AI!).
AI is a very diffuse and evolving term that seems to mean different things to many people. There is always a danger with over focus on any single technology, but machine learning is a wide range of technologies.
To rephrase your question: is there a danger in AI focussing too much on data as the source of the intelligence?
I think the answer (for me) is no!
There is certainly a danger in focussing too much on any subfield of machine learning though!
Privacy, fairness and transparency in learning. These are extremely important areas that are not getting enough attention!
What is the relationship between Kernel Methods (KMs) and Gaussian Processes (GP)? Is it possible to use GP in non-gaussian data?
Kernel methods can often be seen as special cases of Gaussian processes where the uncertainty has been discarded. This normally makes them much simpler algorithmically, but for me they then lose the essence of why I’m considering the Gaussian process in the first place.
People often have an overly simplistic idea of what ‘non-Gaussian data’ is. I’ve known people to say that they are not applying a Gaussian process because their data won’t be Gaussian, but then they apply models which can be shown to be special cases of a Gaussian process!
There are certainly cases where the data is truly non-Gaussian (like classification) and the normal approach in these cases is to approximate the non-Gaussian likelihoods.
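The “kernel method as a GP with the uncertainty discarded” relationship can be made concrete. Here is a sketch (using NumPy, with an RBF kernel and invented toy data) showing that the GP posterior mean coincides with the kernel ridge regression prediction when the ridge regulariser equals the noise variance; the posterior variance is exactly the part the kernel method throws away:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance / kernel."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))               # toy training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
Xstar = np.linspace(-3, 3, 50)[:, None]            # test inputs

noise = 0.1**2
K = rbf(X, X)
Ks = rbf(Xstar, X)

# GP posterior mean: K_* (K + sigma^2 I)^{-1} y
gp_mean = Ks @ np.linalg.solve(K + noise * np.eye(20), y)

# Kernel ridge regression with regulariser lambda = sigma^2:
# alpha = (K + lambda I)^{-1} y, prediction = K_* alpha
alpha = np.linalg.solve(K + noise * np.eye(20), y)
krr_pred = Ks @ alpha

print(np.allclose(gp_mean, krr_pred))  # True: identical point predictions

# The GP additionally provides a posterior covariance, which KRR discards:
gp_var = rbf(Xstar, Xstar) - Ks @ np.linalg.solve(K + noise * np.eye(20), Ks.T)
```

The point predictions are identical; the difference is that the GP tells you how much to trust each of them.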
Personalized health is a massively important area of application of machine learning techniques.
Many of the ideas I work on with collaborators are inspired by challenges in personalized health. In particular: what are the new methodologies we need to drive forward applications in personalized health.
Perhaps the application I’m most inspired by is support for people with mental health issues (ranging from bipolar disorder and schizophrenia to dementia in the elderly). Support for these people is very poor in large parts of the world, and it seems that technology could do a lot to assist professionals in providing that support, or in providing early diagnosis.
Of course this could only be realised with close monitoring of individuals, with massive implications for their personal privacy. That presents a major challenge: how do we provide the support without infringing individual rights? This is a big motivation for a lot of my group’s current research.
With our European project in this area we organised a workshop on Machine Learning for Personalized Medicine, link is here:
And you can find videos of many of the lectures here:
I’m very pleased to announce that the University of Sheffield is launching an MSc in Data Analytics starting in September 2016!
So naturally that’s the best place to do an MSc in this subject ;-)
However, accepting that not everyone is able to come to Sheffield, and that there may be other universities with interesting offerings, then for the UK I’d suggest looking at:
Edinburgh, Cambridge and UCL.
I am less familiar with courses in other countries.
Is there a fundamental reason why explicitly enforcing prior knowledge often leads to problems which are computationally hard or intractable?
This is a really interesting question. I’m not sure I can give a fundamental reason (in terms of, for example, mathematics) but let me try and give you some of my intuitions.
I think the tendency is as follows: I’m trying to constrain my search space by including as much mechanistic information as I can. This is effectively trying to find a way to constrain my prior belief … i.e. change the deck of cards I use when playing the Kriging game (see this answer here). The more mechanism you put into the model, the closer you are getting to a simulation. Simulations are notoriously hard to do inference on, and we end up using techniques like ABC (which is effectively the Kriging game in algorithmic form!).
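A minimal sketch of the rejection-ABC idea: draw parameters from the prior, run the simulator, and keep only draws whose simulated summary statistic lands close to the observed one. The simulator, summary statistic and prior here are hypothetical toy choices:

```python
import random

def simulator(theta, n=50, rng=random):
    """A toy mechanistic model: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summary(data):
    """Summary statistic: here simply the sample mean."""
    return sum(data) / len(data)

def abc_rejection(observed, prior_sample, n_draws=5000, tol=0.1):
    """Rejection ABC: keep prior draws whose simulated summary
    lands within `tol` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        if abs(summary(simulator(theta)) - s_obs) < tol:
            accepted.append(theta)
    return accepted

random.seed(0)
observed = simulator(2.0)                 # 'data' generated with true theta = 2
post = abc_rejection(observed, lambda: random.uniform(-5, 5))
est = sum(post) / len(post)               # posterior mean estimate, near 2
print(est)
```

Notice the cost: most prior draws are rejected, and every draw requires a full run of the simulator, which is exactly why inference on mechanistic models gets expensive and why surrogate models are attractive.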
An exciting direction for trying to deal with this issue is the general area of surrogate modelling (or emulation). At Sheffield we are lucky to have a great deal of local expertise in these techniques. For this reason we are organising a special version of our Gaussian Process Summer School focussed on uncertainty quantification which is the broader area in which these ideas fall.
In machine learning the area of probabilistic numerics (championed by researchers such as Mark Girolami, Mike Osborne and Philipp Hennig) is closely related. I’m extremely excited by these ideas and am enjoying lots of workshops in this area.
I love this interface between the fields and I really enjoy collaborating with colleagues from statistics. I first engaged with colleagues in the area mainly when working in computational biology. There are very large overlaps between the fields. I get a lot out of understanding their perspectives and priorities.
I believe the (methodological) challenges of data science will be solved by the combination of these two communities. Certainly in those areas where researchers are collaborating and listening to each other.
Statistics is a much larger community than machine learning, and statistics is also a profession (you can become a statistician with only an undergraduate degree). As a profession it is required to apply its methods in particular ways to meet certain standards. This is important because the profession is really concerned with the safety and reliability of statistical conclusions, not just the statistics themselves.
This means it is applied in matters of great import and therefore the conclusions need to be conservatively applied.
For post-graduate work, the world of statistical departments is very different. The constraints of the profession apply less to researchers into new statistical methodology (at least when they are doing their imagining rather than their applying).
- Machine learning people can get a very misleading idea of academic statistics by speaking only to ‘jobbing’ statisticians.
- There are many qualified statisticians who have a great deal of scepticism about machine learning methodologies because they fall outside their realm of experience.
But we shouldn’t let these issues mislead us; as always, close collaboration and regular conversations between researchers with different perspectives drive us forward.
What is your advice for data scientists with no biology background who want to apply ML towards medicine/biology?
You need to develop your understanding of biology. Don’t be afraid of trying things (that’s a great way of learning), but don’t get carried away with a particular direction. Try and collaborate with people who can give guidance on your work and redirect you when things are going wrong.
It looks like my answer above should be helpful here too!
Do you think GP with approximation method is still useful for big data? Can it beat deep neural network in the future?
Yes, I’m very convinced that will be the case. In particular deep Gaussian processes will outperform deep neural networks in examples where the data is scarce relative to the complexity of the problem. This can happen even for ‘big data’ when the problem is very complex. I call this scenario “when big data is small”.
There will still be a place for deep neural network models (I don’t believe any single method will dominate in all applications) but I certainly think it will be the case that deep Gaussian processes will outperform deep neural networks in a variety of applications.
I’m going to answer this assuming that you are looking for a research career, if it was an industrial career then my answer would be different.
The last thing you should do is pursue a particular area because you think it will lead to a ‘great career’. You must pursue your passion and interest. That is particularly important in an intellectually intensive area such as machine learning. If you do your work only for personal gain, you won’t participate in the open and collaborative manner that will build you the right network of colleagues to inspire you and drive your career forward.
Of course, you can be inspired by a particular algorithm because it looks like it has the potential to have impact, and impact inspires you. But a cynical approach to career development in ML research will always come unstuck at some point!
I’m doing a lot of consulting nowadays with industry partners. Companies in mobile apps, pipeline infrastructure, the medical field and even an F1 team. There are a wide range of applications of machine learning in industry and a large demand for expertise! I find it very important to work with these industrial partners to ensure that our academic research is nicely aligned with the challenges that industrial colleagues are facing.
Well, the first thing for the ‘drop out’ to do would be to stop thinking of themselves as a ‘drop out’ and find something in the area to be passionate about. Any team is pleased to gain a member who is passionate about the subject!
This is a major challenge that we are working on solving right now. I hope we will solve it, and early indications for our new ideas are really good. If we manage to make them work (or someone else does) then I think these models will be a very important driver of progress.