Uncertainty and Probability
More Machine Learning Motivation
Neil’s Inaugural lecture is available here.
If you feel your basic probability needs brushing up, you might want to watch this lecture from 2012-13. It covers basic concepts of probability.
Reading for Probability Review
See also the appendix of the lecture notes below.
- Bishop: pg 12-17
- Bishop: Section 1.6 & 1.6.1, skip material on pg 50-51
- Bishop: Exercise 1.3
Reading for Week 1
- Rogers and Girolami: Chapter 2 up to page 62
- Bishop: Section 1.2.1 (pg 17-19)
- Bishop: Section 1.2.2 (pg 19-20)
- Bishop: Part of Section 1.2.4 (pg 24-25)
- Bishop: Rest of Section 1.2.4 (pg 26-28, don’t worry about material on bias)
Exercises for Week 1
- Bishop: Exercises 1.7 & 1.8 (look at and understand them; you don’t need to recreate them)
- Bishop: Exercise 1.9: Do it.
Probabilities with Python and the IPython Notebook
The notebook for the lab class can be downloaded from here.
To obtain the lab class notebook, first open the IPython notebook. Then paste the following code into a cell and run it:
# Python 2 syntax; in Python 3 use urllib.request.urlretrieve instead
import urllib
urllib.urlretrieve('https://github.com/SheffieldML/notebook/blob/master/lab_classes/machine_learning/MLAI_lab1.ipynb', 'MLAI_lab1.ipynb')
You should now be able to find the lab class through the IPython notebook menu.
Solutions for the lab class can be downloaded from this notebook here.
Additional Material: Lecture from 2012/13 on Maximum Likelihood
- Maximum Likelihood Lecture Slides
Learning Outcomes Week 1
Understand that machine learning combines data with assumptions to make predictions.
Understand that probability provides a calculus of uncertainty for us to deal with unknowns.
Understand the definition of entropy.
Understand that the KL-divergence is an asymmetric measure of similarity between probability distributions.
Understand sample-based approximations.
Be able to prove that the maximum likelihood solution approximately minimizes the KL-divergence.
Be able to derive an error function from the likelihood of a single data point, using the following steps:
- Independence of data points (data is i.i.d.), so the joint likelihood is a product over data points.
- Logarithm is monotonic, so maximizing the log likelihood gives the same solution as maximizing the likelihood.
- In optimization, by convention, we minimize, so take the negative.
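The outcomes above can be tried out numerically. The following is a minimal sketch, not taken from the lab class: the function names, the two example distributions and the coin-flip data are all assumptions made for illustration. It computes the entropy and the KL-divergence of discrete distributions, and builds an error function for a Bernoulli model by following the three steps listed above.

```python
import math

def entropy(p):
    """Entropy H(p) = -sum_i p_i log p_i of a discrete distribution (in nats)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bernoulli_error(theta, data):
    """Negative log likelihood of binary data under a Bernoulli(theta) model.

    Independence makes the joint likelihood a product over data points,
    the logarithm turns that product into a sum, and the minus sign
    turns maximization into minimization.
    """
    return -sum(y * math.log(theta) + (1 - y) * math.log(1 - theta)
                for y in data)

# Two hypothetical distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
kl_pq = kl_divergence(p, q)  # KL(p || q)
kl_qp = kl_divergence(q, p)  # KL(q || p), generally a different number

# Hypothetical i.i.d. coin flips: six ones out of eight.
data = [1, 0, 1, 1, 0, 1, 1, 1]

# Crude grid search for the theta that minimizes the error function.
grid = [i / 100 for i in range(1, 100)]
theta_hat = min(grid, key=lambda t: bernoulli_error(t, data))

print("H(p) =", entropy(p))
print("KL(p||q) =", kl_pq, " KL(q||p) =", kl_qp)
print("theta_hat =", theta_hat)
```

With these inputs the two KL values come out at roughly 0.511 and 0.368, which illustrates the asymmetry, and the grid search returns 0.75, the fraction of ones in the data. Dividing the error by the number of data points gives a sample-based approximation to the expected negative log probability under the data-generating distribution. That expectation differs from the KL-divergence only by the entropy of the data-generating distribution, which does not depend on theta.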