# Week 1: Uncertainty and Probability

### More Machine Learning Motivation

Neil’s inaugural lecture is available here.

### Probability Review

If you feel your basic probability knowledge needs brushing up, you might want to watch this lecture from 2012-13, which covers the basic concepts of probability.

#### Reading for Probability Review

See also the appendix of the lecture notes below.

- Bishop: pg 12-17 (the rules of probability; a short code sketch follows this list)
- Bishop: Section 1.6 & 1.6.1, skip material on pg 50-51
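
The sum and product rules and Bayes' rule from that reading can be checked numerically. Below is a minimal NumPy sketch assuming a small made-up discrete joint distribution; the numbers are purely illustrative and not from the lecture.

```
import numpy as np

# A made-up joint distribution p(X, Y) over two binary variables;
# rows index x, columns index y. Entries sum to 1.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)             # sum rule: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)             # sum rule: p(y) = sum_x p(x, y)
p_y_given_x = p_xy / p_x[:, None]  # product rule: p(x, y) = p(y|x) p(x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]

print(p_x.sum(), p_y.sum())        # both marginals normalise to 1
print(p_x_given_y.sum(axis=0))     # each conditional p(x|y) sums to 1 over x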

#### Exercise

- Bishop: Exercise 1.3

#### Lecture Notes

Uncertainty and Probability Lecture Slides.

#### Reading for Week 1

- Rogers and Girolami: Chapter 2 up to page 62
- Bishop: Section 1.2.1 (pg 17-19)
- Bishop: Section 1.2.2 (pg 19-20)
- Bishop: Part of Section 1.2.4 (pg 24-25)
- Bishop: Rest of Section 1.2.4 (pg 26-28, don’t worry about material on bias)

#### Exercises for Week 1

- Bishop: Exercises 1.7 & 1.8 (look at and understand them; you don't need to recreate them)
- Bishop: Exercise 1.9: Do it.

## Lab Class

Probabilities with Python and the IPython notebook.

The notebook for the lab class can be downloaded from here.

Alternatively, to obtain the lab class notebook from within IPython, first open the IPython notebook, then paste the following code into a cell and run it:

```
import urllib.request

# Download the notebook. Note the raw.githubusercontent.com address:
# the github.com/.../blob/... page returns HTML, not the notebook itself.
urllib.request.urlretrieve('https://raw.githubusercontent.com/SheffieldML/notebook/master/lab_classes/machine_learning/MLAI_lab1.ipynb', 'MLAI_lab1.ipynb')
```

You should now be able to find the lab class by clicking `File->Open` in the IPython notebook menu.

Solutions for the lab class can be downloaded as a notebook here.

## Additional Material: Lecture from 2012/13 on Maximum Likelihood

- Maximum Likelihood Lecture Slides

## Learning Outcomes Week 1

- Understand that machine learning combines data with assumptions to make predictions.
- Understand that probability provides a calculus of uncertainty for dealing with unknowns.
- Understand the definition of entropy (see the summary after this list).
- Understand that the KL-divergence is an asymmetric measure of the dissimilarity between two probability distributions.
- Understand sample-based approximations.
- Be able to prove that the maximum likelihood solution approximately minimizes the KL-divergence (a sketch of the argument follows this list).
- Be able to derive an error function from the likelihood of a single data point, using:
  - independence of the data points (the data are i.i.d.);
  - the monotonicity of the logarithm;
  - the convention that, in optimization, we minimize, so we take the negative (see the code sketch after this list).
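
For reference, the quantities above can be written out as follows. This is a standard summary in Bishop-style notation, not taken from the lecture slides:

```
H[p] = -\sum_x p(x)\,\log p(x)                                    % entropy

\mathrm{KL}(p\,\|\,q) = \sum_x p(x)\,\log\frac{p(x)}{q(x)} \ge 0  % asymmetric: KL(p||q) != KL(q||p)

\mathbb{E}_p[f(x)] \approx \frac{1}{N}\sum_{n=1}^{N} f(x_n),
  \quad x_n \sim p(x)                                             % sample-based approximation

\mathrm{KL}(p\,\|\,q_\theta)
  = -H[p] - \mathbb{E}_p[\log q_\theta(x)]
  \approx -H[p] - \frac{1}{N}\sum_{n=1}^{N} \log q_\theta(x_n)
```

Since the entropy term does not depend on the model parameters, maximizing the average log likelihood of the samples minimizes this sample-based approximation to the KL-divergence between the data distribution and the model.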
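
The last outcome can also be illustrated in code. Below is a minimal sketch assuming an i.i.d. Gaussian likelihood, a standard textbook example; the function name and the data are illustrative, not from the lab.

```
import numpy as np

def neg_log_likelihood(y, mu, sigma2=1.0):
    # Independence makes the joint likelihood a product over data points;
    # the (monotonic) logarithm turns the product into a sum; negating
    # gives a quantity to minimize: a sum-of-squares error plus terms
    # that are constant in mu.
    n = len(y)
    return 0.5 / sigma2 * np.sum((y - mu) ** 2) + 0.5 * n * np.log(2 * np.pi * sigma2)

y = np.array([0.9, 1.1, 1.3])
mus = np.linspace(0.0, 2.0, 201)
nlls = [neg_log_likelihood(y, mu) for mu in mus]
print(mus[np.argmin(nlls)], y.mean())  # the minimizer matches the sample mean, 1.1
```

The mu that minimizes the negative log likelihood coincides with the sample mean, which is the maximum likelihood solution for the mean of a Gaussian.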