MLAI Week 4: Basis Functions¶

Neil D. Lawrence¶

20th October 2015¶

Review¶

Last time: explored least squares for univariate and multivariate regression.
Introduced matrices, linear algebra and derivatives.
This time: introduce basis functions for non-linear regression models.

Nonlinear Regression¶

Problem with linear regression: $\mathbf{x}$ may not be linearly related to $\mathbf{y}$.
Potential solution: create a feature space by defining $\phi(\mathbf{x})$, where $\phi(\cdot)$ is a nonlinear function of $\mathbf{x}$.
The model for the target is a linear combination of these nonlinear functions: $$f(\mathbf{x}) = \sum_{j=1}^m w_j \phi_j(\mathbf{x})$$

Quadratic Basis¶

Basis functions can be global. E.g. quadratic basis: $$\mathbf{\phi} = [1, x, x^2]$$
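To make the quadratic basis concrete, here is a minimal sketch of how the basis vector $[1, x, x^2]$ might be stacked into a design matrix for several inputs. The function name `quadratic_basis` and the weight values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def quadratic_basis(x):
    """Return a design matrix with columns [1, x, x^2] for a 1-D input array."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(x), x, x**2])

x = np.array([-1.0, 0.0, 2.0])
Phi = quadratic_basis(x)          # one row per input, one column per basis function
w = np.array([1.0, -2.0, 0.5])    # illustrative weights, not fitted to any data
f = Phi @ w                       # f(x) = w_1 + w_2 x + w_3 x^2 at each input
```

Each basis function here is global: changing any one weight changes $f(x)$ across the whole input range.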

Functions Derived from Quadratic Basis¶

$$f(x) = {\color{\redColor}w_1} + {\color{\magentaColor}w_2x} + {\color{\blueColor}w_3 x^2}$$

Radial Basis Functions¶

Or they can be local. E.g. radial (or Gaussian) basis $$\phi_j(x) = \exp\left(-\frac{(x-\mu_j)^2}{\ell^2}\right)$$
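A minimal sketch of the radial basis as code may help. The function name `radial_basis` and its argument names are assumptions made for illustration. The centres $-1, 0, 1$ and $\ell^2 = 1/2$ are chosen so that each basis function has the form $e^{-2(x-\mu_j)^2}$ used elsewhere in these slides.

```python
import numpy as np

def radial_basis(x, centres, lengthscale):
    """Return the matrix whose (i, j) entry is exp(-(x_i - mu_j)^2 / lengthscale^2)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)               # column of inputs
    centres = np.asarray(centres, dtype=float).reshape(1, -1)   # row of centres mu_j
    return np.exp(-(x - centres)**2 / lengthscale**2)

x = np.linspace(-2, 2, 9)
Phi = radial_basis(x, centres=[-1, 0, 1], lengthscale=np.sqrt(0.5))
```

Each column peaks at 1 where $x = \mu_j$ and decays towards zero away from its centre, which is what makes the basis local.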

In [5]:

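import os

import numpy as np
import matplotlib.pyplot as plt

# x, phi and loc come from earlier notebook cells that are not shown here.
# The definitions below are assumptions matching the radial basis on the slides
# (centres -1, 0, 1 and exp(-2(x - mu)^2)); swap in the original values if you have them.
x = np.linspace(-1.3, 1.3, 100)[:, None]
phi = [np.exp(-2*(x - mu)**2) for mu in (-1, 0, 1)]
loc = [[-1, -3], [0, -3], [1, -3]]  # assumed positions for the weight labels

fig, ax = plt.subplots(figsize=(7, 7))

# Stack the basis functions as the columns of the design matrix Phi.
Phi = np.hstack(phi)

def weight_label(i, w):
    # Number the weights from 1 to match the formulae on the slides.
    return '$w_{} = {:.2f}$'.format(i + 1, w[i, 0])

# Sample random weights and plot the resulting function with the basis functions.
w = np.random.normal(size=(3, 1))
f = np.dot(Phi, w)
a, = ax.plot(x, f, color=[0, 0, 1], linewidth=3)
ax.plot(x, phi[0], color=[1, 0, 0], linewidth=1)
ax.plot(x, phi[1], color=[1, 0, 1], linewidth=1)
ax.plot(x, phi[2], color=[0, 0, 1], linewidth=1)
ax.set_ylim([-4, 3])
ax.set_xticks([-1, 0, 1])
ax.set_xlabel('$x$', fontsize=20)
ax.set_ylabel('$f(x)$', fontsize=20)
t = [ax.text(loc[i][0], loc[i][1], weight_label(i, w),
             horizontalalignment='center', fontsize=20)
     for i in range(w.shape[0])]

os.makedirs('./diagrams', exist_ok=True)
plt.savefig('./diagrams/radialFunction1.svg')

# Resample the weights twice more, updating the curve and labels in place.
for k in (2, 3):
    w = np.random.normal(size=(3, 1))
    f = np.dot(Phi, w)
    a.set_ydata(f.ravel())
    for i in range(3):
        t[i].set_text(weight_label(i, w))
    plt.savefig('./diagrams/radialFunction{}.svg'.format(k))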

Functions Derived from Radial Basis¶

$$f(x) = {\color{\redColor}w_1 e^{-2(x+1)^2}} + {\color{\magentaColor}w_2e^{-2x^2}} + {\color{\blueColor}w_3 e^{-2(x-1)^2}}$$

Basis Function Models¶

The prediction function is now defined as $$f(\mathbf{x}_i) = \sum_{j=1}^m w_j \phi_{i, j} + c$$

Vector Notation¶

Write in vector notation, $$f(\mathbf{x}_i) = \mathbf{w}^\top \mathbf{\phi}_i + c$$
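The sum and the inner product are the same computation, which a small numerical sketch can confirm. All values below are made up for illustration. The sketch also shows one common way to handle the offset $c$, under the assumption that a constant basis function $\phi = 1$ is appended so that $c$ becomes just another weight; this would be consistent with $c$ not appearing explicitly in the likelihood that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_i = rng.normal(size=4)   # basis vector for a single data point (made-up values)
w = rng.normal(size=4)       # weights (made-up values)
c = 0.3                      # offset (made-up value)

# Explicit sum over basis functions.
f_sum = sum(w[j] * phi_i[j] for j in range(4)) + c

# The same prediction as an inner product.
f_vec = w @ phi_i + c

# Absorb the offset: append a constant basis function and treat c as its weight.
phi_aug = np.append(phi_i, 1.0)
w_aug = np.append(w, c)
f_aug = w_aug @ phi_aug
```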

Log Likelihood for Basis Function Model¶

The likelihood of a single data point is $$p\left(y_i|x_i\right)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(y_i-\mathbf{w}^{\top}\mathbf{\phi}_i\right)^{2}}{2\sigma^2}\right).$$

Leading to a log likelihood for the data set of $$L(\mathbf{w},\sigma^2)= -\frac{n}{2}\log \sigma^2 -\frac{n}{2}\log 2\pi -\frac{\sum_{i=1}^{n}\left(y_i-\mathbf{w}^{\top}\mathbf{\phi}_i\right)^{2}}{2\sigma^2}.$$

Objective Function¶

And a corresponding objective function of the form $$E(\mathbf{w},\sigma^2)= \frac{n}{2}\log \sigma^2 + \frac{\sum_{i=1}^{n}\left(y_i-\mathbf{w}^{\top}\mathbf{\phi}_i\right)^{2}}{2\sigma^2}.$$
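The relationship between the log likelihood and the objective can be checked numerically. The sketch below assumes a design matrix `Phi` with one row $\mathbf{\phi}_i^\top$ per data point; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def log_likelihood(w, sigma2, Phi, y):
    """Gaussian log likelihood L(w, sigma^2) of the data set."""
    n = y.shape[0]
    resid = y - Phi @ w
    return (-0.5 * n * np.log(sigma2)
            - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum(resid**2) / sigma2)

def objective(w, sigma2, Phi, y):
    """Objective E(w, sigma^2): the negative log likelihood without the 2*pi constant."""
    n = y.shape[0]
    resid = y - Phi @ w
    return 0.5 * n * np.log(sigma2) + 0.5 * np.sum(resid**2) / sigma2

# Toy data (made up): three points with basis [1, x].
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.1, 0.9, 2.1])
w = np.array([0.0, 1.0])
sigma2 = 0.5

L = log_likelihood(w, sigma2, Phi, y)
E = objective(w, sigma2, Phi, y)
```

Because $E$ differs from $-L$ only by a term that does not depend on $\mathbf{w}$ or $\sigma^2$, minimising $E$ gives the same parameters as maximising $L$.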

Expand the Brackets¶

$$\begin{align} E(\mathbf{w},\sigma^2) = &\frac{n}{2}\log \sigma^2 + \frac{1}{2\sigma^2}\sum _{i=1}^{n}y_i^{2}-\frac{1}{\sigma^2}\sum _{i=1}^{n}y_i\mathbf{w}^{\top}\mathbf{\phi}_i\\ &+\frac{1}{2\sigma^2}\sum _{i=1}^{n}\mathbf{w}^{\top}\mathbf{\phi}_i\mathbf{\phi}_i^{\top}\mathbf{w} +\text{const}.\end{align}$$

Expand the Brackets¶

$$ \begin{align} E(\mathbf{w}, \sigma^2) = & \frac{n}{2}\log \sigma^2 + \frac{1}{2\sigma^2}\sum _{i=1}^{n}y_i^{2}-\frac{1}{\sigma^2} \mathbf{w}^\top\sum_{i=1}^{n}\mathbf{\phi}_i y_i\\ & +\frac{1}{2\sigma^2} \mathbf{w}^{\top}\left[\sum _{i=1}^{n}\mathbf{\phi}_i\mathbf{\phi}_i^{\top}\right]\mathbf{w} +\text{const}.\end{align}$$

Multivariate Derivatives Reminder¶

We will need some multivariate calculus. $$\frac{\text{d}\mathbf{a}^{\top}\mathbf{w}}{\text{d}\mathbf{w}}=\mathbf{a}$$ and $$\frac{\text{d}\mathbf{w}^{\top}\mathbf{A}\mathbf{w}}{\text{d}\mathbf{w}}=\left(\mathbf{A}+\mathbf{A}^{\top}\right)\mathbf{w}$$ or if $\mathbf{A}$ is symmetric (i.e. $\mathbf{A}=\mathbf{A}^{\top}$) $$\frac{\text{d}\mathbf{w}^{\top}\mathbf{A}\mathbf{w}}{\text{d}\mathbf{w}}=2\mathbf{A}\mathbf{w}.$$
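These identities can be sanity-checked with finite differences. The helper `numerical_gradient` below is a hypothetical utility written for this check, and the vectors and matrix are random placeholders.

```python
import numpy as np

def numerical_gradient(func, w, eps=1e-6):
    """Central-difference approximation to the gradient of a scalar function at w."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (func(w + step) - func(w - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
a = rng.normal(size=3)
A = rng.normal(size=(3, 3))   # deliberately not symmetric
w = rng.normal(size=3)

grad_linear = numerical_gradient(lambda v: a @ v, w)      # compare with a
grad_quad = numerical_gradient(lambda v: v @ A @ v, w)    # compare with (A + A^T) w
```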

Differentiate¶

Differentiating the objective with respect to the vector $\mathbf{w}$ we obtain $$\frac{\text{d} E\left(\mathbf{w},\sigma^2 \right)}{\text{d} \mathbf{w}}=-\frac{1}{\sigma^2} \sum _{i=1}^{n}\mathbf{\phi}_iy_i+\frac{1}{\sigma^2} \left[\sum _{i=1}^{n}\mathbf{\phi}_i\mathbf{\phi}_i^{\top}\right]\mathbf{w}.$$ Setting this gradient to zero leads to $$\mathbf{w}^{*}=\left[\sum _{i=1}^{n}\mathbf{\phi}_i\mathbf{\phi}_i^{\top}\right]^{-1}\sum _{i=1}^{n}\mathbf{\phi}_iy_i.$$

Matrix Notation¶

Rewrite in matrix notation: $$\sum _{i=1}^{n}\mathbf{\phi}_i\mathbf{\phi}_i^\top = \mathbf{\Phi}^\top \mathbf{\Phi}$$ $$\sum _{i=1}^{n}\mathbf{\phi}_iy_i = \mathbf{\Phi}^\top \mathbf{y}$$
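A quick numerical check of these two identities, assuming $\mathbf{\Phi}$ is the design matrix whose $i$th row is $\mathbf{\phi}_i^\top$. The data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(5, 3))   # n = 5 data points, 3 basis functions
y = rng.normal(size=5)

# Sum of outer products phi_i phi_i^T over the data points.
outer_sum = sum(np.outer(phi_i, phi_i) for phi_i in Phi)

# Sum of phi_i y_i over the data points.
vec_sum = sum(phi_i * y_i for phi_i, y_i in zip(Phi, y))
```

Both sums agree with the matrix products `Phi.T @ Phi` and `Phi.T @ y` up to floating-point rounding, and the matrix form avoids the explicit Python loop.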

Update Equations¶

Update for $\mathbf{w}^{*}$. $$\mathbf{w}^{*} = \left(\mathbf{\Phi}^\top \mathbf{\Phi}\right)^{-1} \mathbf{\Phi}^\top \mathbf{y}$$
The equation for $\left.\sigma^2\right.^{*}$ may also be found $$\left.\sigma^2\right.^{{*}}=\frac{\sum _{i=1}^{n}\left(y_i-\left.\mathbf{w}^{*}\right.^{\top}\mathbf{\phi}_i\right)^{2}}{n}.$$
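As a sketch of how the two updates might be computed, the example below fits a quadratic basis to synthetic data. The data, the true weights and the noise level are all assumptions made up for illustration. The linear system is solved with `np.linalg.solve` rather than by forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (assumed for illustration): a noisy quadratic.
x = np.linspace(-1, 1, 50)
Phi = np.column_stack([np.ones_like(x), x, x**2])
w_true = np.array([0.5, -1.0, 2.0])
y = Phi @ w_true + 0.1 * rng.normal(size=x.shape)

# Update for w: solve (Phi^T Phi) w = Phi^T y.
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Update for sigma^2: mean squared residual at w_star.
resid = y - Phi @ w_star
sigma2_star = np.mean(resid**2)
```

At the solution the residuals are orthogonal to every column of $\mathbf{\Phi}$, which is another way of stating that the gradient with respect to $\mathbf{w}$ is zero.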

Avoid Direct Inverse¶

Avoid forming the inverse explicitly. Instead, solve the linear system for $\mathbf{w}$: $$\left(\mathbf{\Phi}^\top \mathbf{\Phi}\right)\mathbf{w} = \mathbf{\Phi}^\top \mathbf{y}$$
See np.linalg.solve.
In practice use a $\mathbf{Q}\mathbf{R}$ decomposition (see lab class notes).
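A minimal sketch comparing the two routes, on made-up data. The QR route shown here is one standard formulation and may differ in detail from the lab class notes: with $\mathbf{\Phi} = \mathbf{Q}\mathbf{R}$ and $\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}$, the normal equations reduce to $\mathbf{R}\mathbf{w} = \mathbf{Q}^\top\mathbf{y}$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up data with a quadratic basis.
x = np.linspace(-1, 1, 30)
Phi = np.column_stack([np.ones_like(x), x, x**2])
y = rng.normal(size=x.shape)

# Route 1: solve the normal equations directly.
w_solve = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Route 2: reduced QR decomposition of Phi, then solve R w = Q^T y.
# R is upper triangular; np.linalg.solve is used here for simplicity,
# although a dedicated triangular solver would exploit that structure.
Q, R = np.linalg.qr(Phi)
w_qr = np.linalg.solve(R, Q.T @ y)
```

The QR route never forms $\mathbf{\Phi}^\top\mathbf{\Phi}$, whose condition number is the square of that of $\mathbf{\Phi}$, so it tends to be more accurate when the basis functions are nearly collinear.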

Polynomial Fits to Olympic Data¶

Reading¶

Section 1.4 of @Rogers:book11.
Chapter 1, pg 1-6 of @Bishop:book06.
Chapter 3, Section 3.1 of @Bishop:book06 up to pg 143.