@Rogers:book11
@Bishop:book06
data : observations, which could be actively or passively acquired (meta-data).
model : assumptions based on previous experience (other data! transfer learning etc.) or beliefs about the regularities of the universe. Inductive bias.
prediction : an action to be taken, a categorization, or a quality score.
import numpy as np
# Create some data: two points the line must pass through
x = np.array([1, 3])
y = np.array([3, 1])
# Gradient and offset of the line through the two points
xvals = np.linspace(0, 5, 2)
m = (y[1]-y[0])/(x[1]-x[0])
c = y[0]-m*x[0]
yvals = m*xvals + c
%matplotlib inline
import matplotlib.pyplot as plt
# Plot the line and label the axes
ylim = np.array([0, 5])
xlim = np.array([0, 5])
f, ax = plt.subplots(1, 1, figsize=(5, 5))
a = ax.plot(xvals, yvals, '-', linewidth=3)
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.xlabel('$x$', fontsize=30)
plt.ylabel('$y$', fontsize=30)
plt.text(4, 4, '$y=mx+c$', horizontalalignment='center', verticalalignment='bottom', fontsize=30)
plt.savefig('diagrams/straight_line1.svg')
# Annotate the offset c and mark the gradient m with a small triangle
ctext = ax.text(0.15, c+0.15, '$c$', horizontalalignment='center', verticalalignment='bottom', fontsize=20)
xl = np.array([1.5, 2.5])
yl = xl*m + c
mhand = ax.plot([xl[0], xl[1]], [yl.min(), yl.min()], color=[0, 0, 0])
mhand2 = ax.plot([xl.min(), xl.min()], [yl[0], yl[1]], color=[0, 0, 0])
mtext = ax.text(xl.mean(), yl.min()-0.2, '$m$', horizontalalignment='center', verticalalignment='bottom', fontsize=20)
plt.savefig('diagrams/straight_line2.svg')
# Add the two data points in red, then a third point just off the line in green
a2 = ax.plot(x, y, '.', markersize=20, linewidth=3, color=[1, 0, 0])
plt.savefig('diagrams/straight_line3.svg')
xs = 2
ys = m*xs + c + 0.3
ast = ax.plot(xs, ys, '.', markersize=20, linewidth=3, color=[0, 1, 0])
plt.savefig('diagrams/straight_line4.svg')
# Line through the new point and the second original point
m = (y[1]-ys)/(x[1]-xs)
c = ys - m*xs
yvals = m*xvals + c
# Hide the original line and the gradient/offset annotations
for i in a:
    i.set_visible(False)
for i in mhand:
    i.set_visible(False)
for i in mhand2:
    i.set_visible(False)
mtext.set_visible(False)
ctext.set_visible(False)
a3 = ax.plot(xvals, yvals, '-', linewidth=2, color=[0, 0, 1])
for i in ast:
    i.set_color([1, 0, 0])
plt.savefig('diagrams/straight_line5.svg')
# Line through the new point and the first original point
m = (ys-y[0])/(xs-x[0])
c = y[0]-m*x[0]
yvals = m*xvals + c
for i in a3:
    i.set_visible(False)
a4 = ax.plot(xvals, yvals, '-', linewidth=2, color=[0, 0, 1])
for i in ast:
    i.set_color([1, 0, 0])
plt.savefig('diagrams/straight_line6.svg')
# Show the candidate lines together: no single line passes through all three points
for i in a:
    i.set_visible(True)
for i in a3:
    i.set_visible(True)
plt.savefig('diagrams/straight_line7.svg')
import pods
pods.notebook.display_plots('straight_line{plot}.svg',
directory='./diagrams', plot=(1, 7))
point 1: $x = 1$, $y=3$ $$3 = m + c$$
point 2: $x = 3$, $y=1$ $$1 = 3m + c$$
point 3: $x = 2$, $y=2.5$ $$2.5 = 2m + c$$
point 1: $x = 1$, $y=3$ $$3 = m + c + \epsilon_1$$
point 2: $x = 3$, $y=1$ $$1 = 3m + c + \epsilon_2$$
point 3: $x = 2$, $y=2.5$ $$2.5 = 2m + c + \epsilon_3$$
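Three equations and only two unknowns: the system is overdetermined, and with the noise terms the natural move is to minimize the squared residuals. A minimal sketch with `np.linalg.lstsq`, using the three points above:

import numpy as np

# One row [x_i, 1] per point; the unknowns are [m, c].
A = np.array([[1., 1.],
              [3., 1.],
              [2., 1.]])
b = np.array([3., 1., 2.5])

# Least squares solution of the overdetermined system A @ [m, c] = b.
theta, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
m, c = theta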
Predict a real value, $y_i$ given some inputs $x_i$.
Predict quality of meat given spectral measurements (Tecator data).
Radiocarbon dating, the C14 calibration curve: predict age given quantity of C14 isotope.
Predict quality of different Go or Backgammon moves given expert-rated training data.
Image from Wikimedia Commons http://bit.ly/191adDC
# Load the Olympic 100m men's data and plot winning time against year
data = pods.datasets.olympic_100m_men()
f, ax = plt.subplots(figsize=(7, 7))
ax.plot(data['X'], data['Y'], 'ro', markersize=10)
Image from Wikimedia Commons http://bit.ly/16kMKHQ
Gold medal times for Olympic Marathon since 1896.
Marathons before 1924 didn’t have a standardised distance.
Present results using pace per km.
In 1904 the Marathon was badly organised, leading to very slow times.
# Load the Olympic marathon men's data and plot winning pace against year
data = pods.datasets.olympic_marathon_men()
f, ax = plt.subplots(figsize=(7, 7))
ax.plot(data['X'], data['Y'], 'ro', markersize=10)
$\text{data}$ : observations, which could be actively or passively acquired (meta-data).
$\text{model}$ : assumptions based on previous experience (other data! transfer learning etc.) or beliefs about the regularities of the universe. Inductive bias.
$\text{prediction}$ : an action to be taken, a categorization, or a quality score.
$y_i$ : winning time/pace.
$x_i$ : year of Olympics.
$m$ : rate of improvement over time.
$c$ : winning time at year 0.
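A minimal sketch of estimating $m$ and $c$ from the marathon data by least squares (assuming `data` is the dictionary loaded above, with `data['X']` the years and `data['Y']` the winning paces):

import numpy as np

x = data['X'].flatten()  # year of each Olympics
y = data['Y'].flatten()  # winning pace
# Design matrix with rows [x_i, 1] so that the model is y = m*x + c
Phi = np.stack([x, np.ones_like(x)], axis=1)
(m, c), _, _, _ = np.linalg.lstsq(Phi, y, rcond=None)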
point 1: $x = 1$, $y=3$ $$3 = m + c$$
point 2: $x = 3$, $y=1$ $$1 = 3m + c$$
point 3: $x = 2$, $y=2.5$ $$2.5 = 2m + c$$
point 1: $x = 1$, $y=3$ $$3 = m + c + \epsilon_1$$
point 2: $x = 3$, $y=1$ $$1 = 3m + c + \epsilon_2$$
point 3: $x = 2$, $y=2.5$ $$2.5 = 2m + c + \epsilon_3$$
The Gaussian PDF with $\mu=1.7$ and variance $\sigma^2= 0.0225$. Mean shown as red line. It could represent the heights of a population of students.
$\sigma^2$ is the variance of the density and $\mu$ is the mean.
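A sketch of evaluating this density at the stated parameters, $\mu=1.7$ and $\sigma^2=0.0225$:

import numpy as np
import matplotlib.pyplot as plt

mu, sigma2 = 1.7, 0.0225
h = np.linspace(1.2, 2.2, 200)  # candidate heights in metres
# Gaussian density: (2*pi*sigma^2)^(-1/2) * exp(-(h - mu)^2/(2*sigma^2))
p = np.exp(-(h - mu)**2/(2*sigma2))/np.sqrt(2*np.pi*sigma2)
plt.plot(h, p)
plt.axvline(mu, color='r')  # mean shown as red line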
The standard Gaussian is parameterized by its mean, $\mu$, and its variance, $\sigma^2$.
Make the mean a linear function of an input.
This leads to a regression model.
\begin{align*}
y_i =& f\left(x_i\right)+\epsilon_i,\\
\epsilon_i \sim& \mathcal{N}(0, \sigma^2).
\end{align*}
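A sketch of sampling from this model for illustrative parameter values (heights predicted from weights, matching the assumption below):

import numpy as np

m_true, c_true, sigma2 = 0.01, 1.0, 0.0025  # illustrative values, not fitted
x = np.linspace(40, 100, 20)  # weights in kg
# y_i = f(x_i) + epsilon_i with epsilon_i ~ N(0, sigma^2)
epsilon = np.random.normal(0., np.sqrt(sigma2), size=x.shape)
y = m_true*x + c_true + epsilon  # heights in metres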
Assume $y_i$ is height and $x_i$ is weight.
Likelihood of an individual data point $$p\left(y_i|x_i,m,c\right)=\frac{1}{\sqrt{2\pi \sigma^2}}\exp \left(-\frac{\left(y_i-mx_i-c\right)^{2}}{2\sigma^2}\right).$$
Parameters are gradient, $m$, offset, $c$ of the function and noise variance $\sigma^2$.
If the noise, $\epsilon_i$, is sampled independently for each data point, then each data point is independent (given $m$ and $c$).
For independent variables: $$p(\mathbf{y}) = \prod_{i=1}^n p(y_i)$$ $$p(\mathbf{y}|\mathbf{x}, m, c) = \prod_{i=1}^n p(y_i|x_i, m, c)$$
i.i.d. assumption
$$p(\mathbf{y}|\mathbf{x}, m, c) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}}\exp \left(-\frac{\left(y_i-mx_i-c\right)^{2}}{2\sigma^2}\right).$$ $$p(\mathbf{y}|\mathbf{x}, m, c) = \frac{1}{\left(2\pi \sigma^2\right)^{\frac{n}{2}}}\exp \left(-\frac{\sum_{i=1}^n\left(y_i-mx_i-c\right)^{2}}{2\sigma^2}\right).$$
The negative log likelihood is the error function; dropping the constant $\frac{n}{2}\log 2\pi$ gives $$E(m,c,\sigma^{2})=\frac{n}{2}\log \sigma^2 +\frac{1}{2\sigma^2}\sum _{i=1}^{n}\left(y_i-mx_i-c\right)^{2}.$$
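A minimal sketch of this error function as code:

import numpy as np

def error(m, c, sigma2, x, y):
    # Negative log likelihood, dropping the constant (n/2)*log(2*pi)
    n = y.size
    residuals = y - m*x - c
    return n/2*np.log(sigma2) + np.sum(residuals**2)/(2*sigma2)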
Learning proceeds by minimizing this error function for the data set provided.
Ignoring terms which don’t depend on $m$ and $c$ gives $$E(m, c) \propto \sum_{i=1}^n (y_i - f(x_i))^2$$ where $f(x_i) = mx_i + c$.
This is known as the sum of squares error function.
Commonly used, it is closely associated with the Gaussian likelihood.
What is the mathematical interpretation?
There is a cost function.
It expresses mismatch between your prediction and reality. $$E(m, c)=\sum_{i=1}^n \left(y_i - mx_i -c\right)^2$$
This is known as the sum of squares error.
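Setting the derivatives of $E(m, c)$ with respect to $m$ and $c$ to zero gives the minimizer in closed form; a minimal sketch:

import numpy as np

def fit_line(x, y):
    # Closed form minimizer of the sum of squares error for y = m*x + c
    m = ((x*y).mean() - x.mean()*y.mean())/((x**2).mean() - x.mean()**2)
    c = y.mean() - m*x.mean()
    return m, c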
Problem with Linear Regression—$\mathbf{x}$ may not be linearly related to $\mathbf{y}$.
Potential solution: create a feature space: define $\phi(\mathbf{x})$ where $\phi(\cdot)$ is a nonlinear function of $\mathbf{x}$.
Model for target is a linear combination of these nonlinear functions $$f(\mathbf{x}) = \sum_{j=1}^k w_j \phi_j(\mathbf{x})$$
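A sketch of building such a feature space with a polynomial basis, $\phi_j(x) = x^{j-1}$, and fitting $\mathbf{w}$ by least squares (`x` and `y` are assumed to hold the training inputs and targets):

import numpy as np

def polynomial_basis(x, k):
    # Design matrix with columns phi_j(x) = x**(j-1), j = 1, ..., k
    return np.stack([x**j for j in range(k)], axis=1)

Phi = polynomial_basis(x, 4)
w, _, _, _ = np.linalg.lstsq(Phi, y, rcond=None)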
pods.notebook.display_plots('polynomial_basis{num_basis}.svg', directory='./diagrams', num_basis=(1,3))
pods.notebook.display_plots('polynomial_function{func_num}.svg', directory='./diagrams', func_num=(1,3))
pods.notebook.display_plots('radial_basis{num_basis}.svg', directory='./diagrams', num_basis=(1,3))
pods.notebook.display_plots('radial_function{func_num}.svg', directory='./diagrams', func_num=(1,3))
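A sketch of a radial basis; the centres and width here are illustrative choices:

import numpy as np

def radial_basis(x, centres, width):
    # phi_j(x) = exp(-(x - mu_j)^2/(2*width^2)) for each centre mu_j
    return np.exp(-(x[:, None] - centres[None, :])**2/(2*width**2))

centres = np.linspace(1890, 2010, 4)  # spread over the Olympic years
Phi = radial_basis(x, centres, width=40.)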
Likelihood of an individual data point: $$p\left(y_i|x_i,\mathbf{w}\right)=\frac{1}{\sqrt{2\pi \sigma^2}}\exp \left(-\frac{\left(y_i-\mathbf{w}^{\top}\boldsymbol{\phi}_i\right)^{2}}{2\sigma^2}\right).$$
Log likelihood of the data set: $$L(\mathbf{w},\sigma^2)=-\frac{n}{2}\log \sigma^2 -\frac{n}{2}\log 2\pi -\frac{\sum_{i=1}^{n}\left(y_i-\mathbf{w}^{\top}\boldsymbol{\phi}_i\right)^{2}}{2\sigma^2}.$$
The negative log likelihood (dropping the constant) gives the error function $$E(\mathbf{w},\sigma^2)=\frac{n}{2}\log \sigma^2 + \frac{\sum_{i=1}^{n}\left(y_i-\mathbf{w}^{\top}\boldsymbol{\phi}_i\right)^{2}}{2\sigma^2}.$$
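Minimizing the error over $\sigma^2$ gives the maximum likelihood noise variance, $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\mathbf{w}^{\top}\boldsymbol{\phi}_i\right)^{2}$; a sketch, continuing from the basis function fit above:

import numpy as np

# Continuing from the fit above (Phi, w, y assumed defined)
residuals = y - Phi @ w
sigma2 = np.mean(residuals**2)  # maximum likelihood noise variance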
pods.notebook.display_plots('olympic_LM_polynomial{num_basis}.svg', directory='./diagrams', num_basis=(1,7))
pods.notebook.display_plots('olympic_LM_polynomial{num_basis}.svg',
directory='./diagrams', num_basis=(1, max_basis))
pods.notebook.display_plots('olympic_val_LM_polynomial{num_basis}.svg',
directory='./diagrams', num_basis=(1, max_basis))
pods.notebook.display_plots('olympic_val_inter_LM_polynomial{num_basis}.svg',
directory='./diagrams', num_basis=(1, max_basis))
pods.notebook.display_plots('olympic_loo{part}_inter_LM_polynomial{num_basis}.svg',
directory='./diagrams', num_basis=(1, max_basis), part=(0,len(partitions)))
pods.notebook.display_plots('olympic_5cv{part}_inter_LM_polynomial{num_basis}.svg',
directory='./diagrams', num_basis=(1, max_basis), part=(0,5))
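The plots above are pre-computed; a minimal sketch of the underlying 5-fold cross validation loop over the number of basis functions, reusing the `polynomial_basis` helper sketched earlier (in practice the years should be rescaled for numerical stability):

import numpy as np

def cv_error(x, y, max_basis, n_folds=5):
    # Mean held-out squared error for each number of basis functions
    folds = np.array_split(np.random.permutation(y.size), n_folds)
    errors = np.zeros(max_basis)
    for k in range(1, max_basis + 1):
        Phi = polynomial_basis(x, k)
        for fold in folds:
            train = np.setdiff1d(np.arange(y.size), fold)
            w, _, _, _ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
            errors[k-1] += np.sum((y[fold] - Phi[fold] @ w)**2)
    return errors/y.size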