Learning Outcomes Week 1

Understand that machine learning combines data with assumptions to make predictions

Understand that probability provides a calculus of uncertainty for us to deal with unknowns.

Understand the definition of entropy
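
As a reminder, for a discrete random variable the entropy of a distribution p can be written as

    H[p] = -\sum_{x} p(x) \log p(x),

with the sum replaced by an integral in the continuous case.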

Understand that the KL-divergence is an asymmetric measure of similarity between probability distributions
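
One common way to write the KL-divergence between distributions p and q is

    \mathrm{KL}(p\,\|\,q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},

and in general KL(p||q) is not equal to KL(q||p), which is why it is an asymmetric measure of similarity.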

Understand sample-based approximations
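
A sample-based (Monte Carlo) approximation replaces an expectation under p with an average over samples drawn from p:

    \mathbb{E}_{p(x)}[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad x_i \sim p(x).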

Be able to prove that the maximum likelihood solution approximately minimizes the KL-divergence between the true data-generating distribution and the model.
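
A sketch of the argument: using a sample-based approximation, the true distribution is replaced by the empirical distribution \hat{p} that places mass 1/N on each observed point, giving, for a model q_\theta,

    \mathrm{KL}(\hat{p}\,\|\,q_\theta) = -H[\hat{p}] - \frac{1}{N}\sum_{i=1}^{N} \log q_\theta(x_i).

The entropy term does not depend on \theta, so minimizing the KL-divergence over \theta is the same as maximizing the average log likelihood.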

Be able to derive an error function from the likelihood of a single data point.

  • Independence of data points (data is i.i.d.)
  • Logarithm is monotonic
  • In optimization, by convention, we minimize so take negative.
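
As an illustration of these three points, assume a model f(x; w) with i.i.d. Gaussian noise, y_i = f(x_i; w) + \epsilon_i, \epsilon_i \sim \mathcal{N}(0, \sigma^2). The product over independent points becomes a sum under the logarithm, and negation turns maximization into minimization:

    -\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f(\mathbf{x}_i; \mathbf{w})\bigr)^2 + \frac{N}{2}\log(2\pi\sigma^2),

so maximizing the likelihood is equivalent to minimizing a sum-of-squares error function.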

Learning Outcomes Week 2

Consolidate understanding of the stages of a basic probabilistic machine learning approach:

  • Write down model.
  • Make an assumption about the errors.
  • Use the combination of the mathematical model and the error assumptions to write down a likelihood.
  • Maximize the likelihood with respect to the parameters of the model.
  • Use the resulting model to make predictions.
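
A minimal sketch of these five stages for a straight-line model with Gaussian noise (the data, variable names and numbers here are illustrative, not taken from the course):

```python
import numpy as np

# Synthetic data to stand in for observations.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2.0 * x - 0.5 + rng.normal(scale=0.1, size=x.shape)

# 1. Write down the model: f(x) = m*x + c.
# 2. Assume Gaussian errors: y_i = f(x_i) + eps_i, eps_i ~ N(0, sigma^2).
# 3. Under those assumptions the negative log likelihood is, up to constants,
#    the sum-of-squares error.
# 4. Maximize the likelihood: for this model the maximum is the least squares fit.
Phi = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(m, c), *_ = np.linalg.lstsq(Phi, y, rcond=None)   # maximum likelihood m, c
sigma2 = np.mean((y - (m * x + c)) ** 2)           # maximum likelihood noise variance

# 5. Use the resulting model to make predictions at new inputs.
x_new = np.array([0.25, 0.75])
print(m * x_new + c, sigma2)
```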

Understand the principles of using gradient methods to derive fixed point equations for maximizing a likelihood.

Understand the weakness of coordinate descent methods when parameters are correlated.
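
A sketch (again illustrative rather than course code) of maximizing the same straight-line likelihood by fixed point updates of one parameter at a time. Because the inputs are far from zero the slope and offset are strongly correlated, so the coordinate updates zig-zag and make progress very slowly compared with solving for both parameters jointly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(10, 11, 50)            # inputs far from zero: m and c correlated
y = 2.0 * x - 0.5 + rng.normal(scale=0.1, size=x.shape)

m, c = 0.0, 0.0
for _ in range(1000):
    # Each update maximizes the log likelihood holding the other parameter fixed.
    m = np.sum((y - c) * x) / np.sum(x * x)
    c = np.mean(y - m * x)

# Joint least squares solution for comparison.
m_joint, c_joint = np.linalg.lstsq(np.column_stack([x, np.ones_like(x)]), y, rcond=None)[0]
print(m, c, "vs", m_joint, c_joint)    # still noticeably apart after 1000 sweeps
```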

Understand the advantages of using multivariate calculus to maximize the likelihood in linear regression.
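
For a model that is linear in its parameters, multivariate calculus gives the whole solution in one step: setting the gradient of the log likelihood (equivalently, of the sum-of-squares error) with respect to the full weight vector to zero yields the normal equations,

    \mathbf{w}^{*} = \bigl(\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}\bigr)^{-1}\boldsymbol{\Phi}^{\top}\mathbf{y},

where \boldsymbol{\Phi} is the design matrix, avoiding the zig-zagging of the coordinate updates above.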

Understand how basis functions can be used to go from linear models to non-linear models.
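
A short sketch of the idea (illustrative function and variable names): the model stays linear in the parameters, but becomes non-linear in the inputs once the design matrix is built from basis function evaluations.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map inputs to the basis [1, x, x**2, ..., x**degree]."""
    return np.column_stack([x ** d for d in range(degree + 1)])

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.shape)

Phi = polynomial_basis(x, degree=5)           # design matrix of basis functions
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # normal equations, as above
y_fit = Phi @ w                               # a non-linear fit in x
```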

Learning Outcomes Week 3

  • Understand the challenge of model selection.
  • Understand the difference between training set, test set and validation set.
  • Understand and be able to apply appropriately the following approaches to model validation:
    • hold out set,
    • leave one out cross validation,
    • k-fold cross validation (see the code sketch after this list).
  • Be able to identify the type of error that arises from bias and the type of error that arises from variance.
  • Be able to distinguish between different types of uncertainty: aleatoric and epistemic. Be able to give examples of each type.
  • Be able to derive Bayes’ rule from the product rule of probability (see the derivation after this list).
  • Understand the meaning of the terms prior, posterior and marginal likelihood.
    • Be able to identify these terms in Bayes’ rule.
    • Be able to describe what each of these terms represents (prior: belief before observation; posterior: belief after observation; likelihood: relationship between belief and observation; marginal likelihood: the model score).
  • Understand how to derive the marginal likelihood from the likelihood and the prior.
  • Understand the difference between the frequentist approach and the Bayesian approach, i.e. that in the Bayesian approach parameters are treated as random variables.
  • Be able to derive the maths to perform a simple Bayesian update on the offset parameter of a regression problem.
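
A minimal k-fold cross validation sketch (illustrative only; `fit` and `predict` are placeholders for whichever model is being validated):

```python
import numpy as np

def k_fold_error(x, y, k, fit, predict):
    """Average held-out squared error over k folds."""
    folds = np.array_split(np.arange(len(x)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[train], y[train])                             # train on k-1 folds
        errors.append(np.mean((predict(model, x[val]) - y[val]) ** 2))
    return np.mean(errors)                                          # average validation error
```

Setting k to the number of data points gives leave one out cross validation, while a single hold out set corresponds to using just one train/validation split.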
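
For reference, writing w for the parameters and y for the data, the product rule gives p(w, y) = p(y | w) p(w) = p(w | y) p(y), and rearranging yields Bayes’ rule:

    p(\mathbf{w} \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{y})}, \qquad p(\mathbf{y}) = \int p(\mathbf{y} \mid \mathbf{w})\, p(\mathbf{w})\, \mathrm{d}\mathbf{w},

with p(w) the prior, p(y | w) the likelihood, p(w | y) the posterior and p(y) the marginal likelihood, obtained by integrating the likelihood against the prior.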

Learning Outcomes Week 5

  • Understand the principle of integrating out parameters and how to use Bayes’ rule to do so.
  • Understand the role of the prior distribution.
  • In multivariate and univariate Gaussian examples, be able to combine the prior with the likelihood to form a posterior distribution (see the summary after this list).
  • Recognise the role of the marginal likelihood and know its form for Bayesian regression under Gaussian priors.
  • Be able to compute the expected output of the model and its covariance using the posterior distribution and the formula for the function.
  • Understand the effect of model averaging and its advantages when making predictions including:
    • Error bars
    • Regularized prediction (reduces variance)
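
A summary of the standard results, assuming a zero-mean Gaussian prior w ~ N(0, αI), Gaussian noise of variance σ² and design matrix Φ (this is one common convention, not necessarily the notation of the notes):

    \mathbf{C}_w = \bigl(\alpha^{-1}\mathbf{I} + \sigma^{-2}\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}\bigr)^{-1}, \qquad
    \boldsymbol{\mu}_w = \sigma^{-2}\mathbf{C}_w\boldsymbol{\Phi}^{\top}\mathbf{y}, \qquad
    p(\mathbf{w} \mid \mathbf{y}) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_w, \mathbf{C}_w),

    p(\mathbf{y}) = \mathcal{N}\bigl(\mathbf{y} \mid \mathbf{0},\; \alpha\boldsymbol{\Phi}\boldsymbol{\Phi}^{\top} + \sigma^{2}\mathbf{I}\bigr),

    \mathbb{E}[f(\mathbf{x}_*)] = \boldsymbol{\phi}(\mathbf{x}_*)^{\top}\boldsymbol{\mu}_w, \qquad
    \mathrm{var}[f(\mathbf{x}_*)] = \boldsymbol{\phi}(\mathbf{x}_*)^{\top}\mathbf{C}_w\,\boldsymbol{\phi}(\mathbf{x}_*),

with the averaging over the posterior providing both the error bars and the regularizing shrinkage of the predictions.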

Learning Outcomes Week 7

  • Understand how the marginal likelihood in a Gaussian Bayesian regression model can be computed using properties of the multivariate Gaussian.
  • Understand that Bayesian regression models put a joint Gaussian prior across the data.
  • Understand that we can specify the covariance function of that prior directly.
  • Understand that Gaussian process models generalize basis function models to allow infinite basis functions.
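
A short sketch (illustrative code, not from the course) of both ideas: for a basis function model the prior over the data is jointly Gaussian with the covariance given in the Week 5 summary above, but the covariance function can instead be specified directly, for example with the exponentiated quadratic, and the marginal likelihood is then just a multivariate Gaussian density evaluated at the observed outputs.

```python
import numpy as np

def exponentiated_quadratic(x, x2, variance=1.0, lengthscale=0.3):
    """k(x, x') = variance * exp(-(x - x')**2 / (2 * lengthscale**2))."""
    sqdist = (x[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

sigma2 = 0.01                                            # noise variance
K = exponentiated_quadratic(x, x) + sigma2 * np.eye(len(x))

# log N(y | 0, K): the log marginal likelihood of the data under this prior.
_, logdet = np.linalg.slogdet(K)
log_marginal = -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(x) * np.log(2 * np.pi))
print(log_marginal)
```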