Problem with linear regression: \(\inputVector\) may not be linearly related to \(\dataVector\).
Potential solution: create a feature space by defining \(\basisFunc(\inputVector)\), where \(\basisFunc(\cdot)\) is a nonlinear function of \(\inputVector\).
Model for target is a linear combination of these nonlinear functions \[\mappingFunction(\inputVector) = \sum_{j=1}^\numBasisFunc \mappingScalar_j \basisFunc_j(\inputVector)\]
Basis functions can be local e.g. radial (or Gaussian) basis \[\basisFunc_j(\inputScalar) = \exp\left(-\frac{(\inputScalar-\mu_j)^2}{\lengthScale^2}\right)\]
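As a concrete sketch, here is a minimal NumPy version of this radial basis; the helper name radial_basis, the choice of centres, and the length scale are illustrative, not fixed by the lecture.

```python
import numpy as np

def radial_basis(x, centres, lengthscale):
    """Gaussian (radial) basis: entry [i, j] is
    exp(-(x_i - mu_j)^2 / lengthscale^2), matching the formula above."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)               # inputs as a column
    centres = np.asarray(centres, dtype=float).reshape(1, -1)   # centres mu_j as a row
    return np.exp(-(x - centres) ** 2 / lengthscale ** 2)

# Five inputs, three basis functions spread across [0, 1].
Phi = radial_basis(np.linspace(0, 1, 5), centres=[0.0, 0.5, 1.0], lengthscale=0.25)
```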
Now we are going to consider how these basis functions can be adjusted to fit a particular data set. We will return to the Olympic marathon data from last time. First we will scale the output data to zero mean and unit variance, as sketched below.
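A two-line sketch of that standardisation, assuming the marathon targets have already been loaded into a NumPy array y:

```python
# Standardise targets to zero mean and unit variance; y is assumed to
# already hold the Olympic marathon outputs as a NumPy array.
y = (y - y.mean()) / y.std()
```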
Basis Function Models
The prediction function is now defined as \[
\mappingFunction(\inputVector_i) = \sum_{j=1}^\numBasisFunc \mappingScalar_j \basisFunc_{i, j},
\] where \(\basisFunc_{i, j} = \basisFunc_j(\inputVector_i)\) is the \(j\)th basis function evaluated at the \(i\)th input.
Vector Notation
Write in vector notation, \[
\mappingFunction(\inputVector_i) = \mappingVector^\top \basisVector_i,
\] where \(\basisVector_i\) is the vector formed by evaluating all \(\numBasisFunc\) basis functions at \(\inputVector_i\).
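Stacking the vectors \(\basisVector_i\) as rows of a design matrix turns every prediction into a single matrix-vector product. A sketch reusing the radial_basis helper from above; the inputs and weights here are placeholders:

```python
import numpy as np

x = np.linspace(0, 1, 10)                  # placeholder inputs
Phi = radial_basis(x, centres=[0.0, 0.5, 1.0], lengthscale=0.25)
w = np.ones(Phi.shape[1])                  # placeholder weight vector
f = Phi @ w                                # f[i] = w^T phi_i for every i at once
```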
Log Likelihood for Basis Function Model
The likelihood of a single data point is \[
p\left(\dataScalar_i|\inputScalar_i\right)=\frac{1}{\sqrt{2\pi\dataStd^2}}\exp\left(-\frac{\left(\dataScalar_i-\mappingVector^{\top}\basisVector_i\right)^{2}}{2\dataStd^2}\right).
\]
Leading to a log likelihood for the data set of \[
L(\mappingVector,\dataStd^2)= -\frac{\numData}{2}\log \dataStd^2-\frac{\numData}{2}\log 2\pi -\frac{\sum_{i=1}^{\numData}\left(\dataScalar_i-\mappingVector^{\top}\basisVector_i\right)^{2}}{2\dataStd^2}.
\]
Objective Function
And a corresponding objective function of the form \[
\errorFunction(\mappingVector,\dataStd^2)= \frac{\numData}{2}\log\dataStd^2 + \frac{\sum_{i=1}^{\numData}\left(\dataScalar_i-\mappingVector^{\top}\basisVector_i\right)^{2}}{2\dataStd^2}.
\]
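As a check on the algebra, the objective is straightforward to code. A sketch (the function name objective is an illustrative choice), dropping the constant \(\frac{\numData}{2}\log 2\pi\) exactly as the expression above does:

```python
import numpy as np

def objective(w, sigma2, Phi, y):
    """E(w, sigma^2) = (N/2) log sigma^2 + sum of squared residuals / (2 sigma^2)."""
    residuals = y - Phi @ w
    return 0.5 * len(y) * np.log(sigma2) + residuals @ residuals / (2.0 * sigma2)
```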
We will need some multivariate calculus: \[\frac{\text{d}\mathbf{a}^{\top}\mappingVector}{\text{d}\mappingVector}=\mathbf{a}\] and \[\frac{\text{d}\mappingVector^{\top}\mathbf{A}\mappingVector}{\text{d}\mappingVector}=\left(\mathbf{A}+\mathbf{A}^{\top}\right)\mappingVector,\] or, if \(\mathbf{A}\) is symmetric (i.e. \(\mathbf{A}=\mathbf{A}^{\top}\)), \[\frac{\text{d}\mappingVector^{\top}\mathbf{A}\mappingVector}{\text{d}\mappingVector}=2\mathbf{A}\mappingVector.\]
Differentiate
Differentiating with respect to the vector \(\mappingVector\) we obtain \[\frac{\text{d} \errorFunction\left(\mappingVector,\dataStd^2 \right)}{\text{d}\mappingVector}=-\frac{1}{\dataStd^2} \sum_{i=1}^{\numData}\basisVector_i\dataScalar_i+\frac{1}{\dataStd^2} \left[\sum_{i=1}^{\numData}\basisVector_i\basisVector_i^{\top}\right]\mappingVector.\] Setting this gradient to zero and solving for \(\mappingVector\) gives \[\mappingVector^{*}=\left[\sum_{i=1}^{\numData}\basisVector_i\basisVector_i^{\top}\right]^{-1}\sum_{i=1}^{\numData}\basisVector_i\dataScalar_i.\]
Update for \(\mappingVector^{*}\)
In matrix notation, with the design matrix \(\basisMatrix\) whose \(i\)th row is \(\basisVector_i^\top\), this becomes \[
\mappingVector^{*} = \left(\basisMatrix^\top \basisMatrix\right)^{-1} \basisMatrix^\top \dataVector.
\]
The equation for \(\left.\dataStd^2\right.^{*}\) may also be found \[
\left.\dataStd^2\right.^{{*}}=\frac{\sum_{i=1}^{\numData}\left(\dataScalar_i-\left.\mappingVector^{*}\right.^{\top}\basisVector_i\right)^{2}}{\numData}.
\]
Avoid Direct Inverse
E.g. solve for \(\mappingVector\) in \[
\left(\basisMatrix^\top \basisMatrix\right)\mappingVector = \basisMatrix^\top \dataVector.
\]
See np.linalg.solve.
In practice use \(\mathbf{Q}\mathbf{R}\) decomposition (see lab class notes).
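A sketch of both routes, assuming the design matrix Phi and targets y from earlier; the variable names are illustrative:

```python
import numpy as np

# Normal equations via np.linalg.solve: never forms the inverse explicitly.
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# QR route: factor Phi itself, avoiding the squared condition number of
# Phi^T Phi. With Phi = QR, the normal equations reduce to R w = Q^T y.
Q, R = np.linalg.qr(Phi)
w_star_qr = np.linalg.solve(R, Q.T @ y)

# Maximum likelihood noise variance from the fitted residuals,
# matching the expression for sigma^2* above.
residuals = y - Phi @ w_star
sigma2_star = residuals @ residuals / len(y)
```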
Polynomial Fits to Olympic Data
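A polynomial fit can be sketched as follows; the degree, the year rescaling, and the arrays x (years) and y (standardised times) are assumptions for illustration.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return x ** np.arange(degree + 1)

# Rescale the years before raising to powers, so the columns of the
# design matrix stay well conditioned (an assumed preprocessing choice).
x_scaled = (x - x.mean()) / x.std()
Phi = polynomial_basis(x_scaled, degree=4)
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
f = Phi @ w_star   # fitted values at the training years
```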
Non-linear but Linear in the Parameters
Model is non-linear, but linear in parameters \[
\mappingFunction(\inputVector) = \mappingVector^\top \basisVector(\inputVector)
\]
\(\inputVector\) is inside the non-linearity, but \(\mappingVector\) is outside. \[
\mappingFunction(\inputVector) = \mappingVector^\top \basisVector(\inputVector;
\boldsymbol{\theta}),
\] where \(\boldsymbol{\theta}\) collects any parameters of the basis functions themselves, such as the centres \(\mu_j\) and length scale \(\lengthScale\) of the radial basis.
Further Reading
Section 1.4 of Rogers and Girolami (2011)
Chapter 1, pages 1–6 of Bishop (2006)
Chapter 3, Section 3.1 up to page 143 of Bishop (2006)