import numpy as np
import pandas as pd
import pods
import matplotlib.pyplot as plt
%matplotlib inline
The logit function, $$g^{-1}(\pi_i) = \log\frac{\pi_i}{1-\pi_i},$$ is known as a link function.
For a standard regression we take, $$f(\mathbf{x}_i) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i),$$
We have defined the link function as taking the form $g^{-1}(\cdot)$ implying that the inverse link function is given by $g(\cdot)$. Since we have defined, $$ g^{-1}(\pi(\mathbf{x})) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) $$ we can write $\pi$ in terms of the inverse link function, $g(\cdot)$ as $$ \pi(\mathbf{x}) = g(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})). $$
fig, ax = plt.subplots(figsize=(14,7))
f = np.linspace(-8, 8, 100)  # range of latent function values f_i
g = 1/(1+np.exp(-f))         # logistic sigmoid, the inverse link g(f_i)
ax.plot(f, g, 'r-')
ax.set_title('Logistic Function', fontsize=20)
ax.set_xlabel('$f_i$', fontsize=20)
ax.set_ylabel('$g_i$', fontsize=20)
plt.savefig('./diagrams/logistic.svg')
We can now write $\pi$ as a function of the input and the parameter vector as $$\pi(\mathbf{x},\mathbf{w}) = \frac{1}{1+ \exp\left(-\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})\right)}.$$
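A minimal sketch of this composition in code (the basis function `phi` is a placeholder for whatever feature mapping is in use, not something defined in these notes):

def sigmoid(f):
    # inverse link g: map a real-valued score to a probability in (0, 1)
    return 1/(1+np.exp(-f))

def pi(x, w, phi):
    # pi(x, w) = g(w' phi(x)): compose the linear model with the inverse link
    return sigmoid(np.dot(w, phi(x)))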
Compute the output of a standard linear basis function composition ($\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$, as we did for linear regression)
Apply the inverse link function, $g(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}))$.
Use this value in a Bernoulli distribution to form the likelihood.
From last time $$P(y_i|\mathbf{w}, \mathbf{x}_i) = \pi_i^{y_i} (1-\pi_i)^{1-y_i}$$
Trick for switching between probabilities
def bernoulli(y, pi):
    # Bernoulli probability: pi if y == 1, otherwise 1 - pi
    if y == 1:
        return pi
    else:
        return 1-pi
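The same trick can be written without the branch, using the exponent form of the likelihood above; this sketch also works elementwise on numpy arrays of labels:

def bernoulli_vec(y, pi):
    # pi**y * (1-pi)**(1-y) selects pi when y = 1 and 1 - pi when y = 0
    return pi**y * (1-pi)**(1-y)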
Probability of positive outcome for the $i$th data point $$\pi_i = g\left(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\right),$$ where $g(\cdot)$ is the inverse link function
Objective function of the form \begin{align} E(\mathbf{w}) = & -\sum_{i=1}^n y_i \log g\left(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\right) \\ & -\sum_{i=1}^n (1-y_i)\log \left(1-g\left(\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i)\right)\right). \end{align}
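A direct transcription of this objective (a sketch for clarity, without safeguards against $\log 0$; the names `Phi`, `y` and `w` are illustrative, with `Phi` a design matrix whose rows are $\boldsymbol{\phi}(\mathbf{x}_i)^\top$):

def objective(w, Phi, y):
    # negative log likelihood E(w) for logistic regression
    g = 1/(1+np.exp(-Phi @ w))  # pi_i = g(w' phi(x_i)) for every data point
    return -np.sum(y*np.log(g) + (1-y)*np.log(1-g))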
Can't find a stationary point of the objective function analytically.
Optimization has to proceed by numerical methods.
Similarly to matrix factorization, for large data stochastic gradient descent (the Robbins-Monro optimization procedure) works well.
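A minimal stochastic gradient descent sketch for this objective (the per-point gradient of $E(\mathbf{w})$ is $\left(g(\mathbf{w}^\top\boldsymbol{\phi}(\mathbf{x}_i)) - y_i\right)\boldsymbol{\phi}(\mathbf{x}_i)$; the learning rate and epoch count here are illustrative, not tuned):

def sgd_logistic(Phi, y, learn_rate=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for epoch in range(epochs):
        for i in rng.permutation(len(y)):        # visit points in random order
            g_i = 1/(1+np.exp(-Phi[i] @ w))      # pi_i for the current point
            w -= learn_rate*(g_i - y[i])*Phi[i]  # step down the per-point gradient
    return w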
This approach is used in many internet companies.
Example: ad matching for Facebook.
Logistic regression is used in combination with decision trees.
Logistic regression is part of a family known as generalized linear models.
They all take the form $$g^{-1}(f(\mathbf{x}_i)) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_i).$$
As another example let's look at Poisson regression.
The Poisson distribution is used for 'count data'. For non-negative integers, $y$, $$P(y) = \frac{\lambda^y}{y!}\exp(-\lambda)$$
Here $\lambda$ is a rate parameter that can be thought of as the number of arrivals per unit time.
Poisson distributions can be used for disease count data, e.g. the number of incidences of malaria in a district.
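As a sanity check, the formula above can be transcribed directly (a sketch; the helper name `poisson_pmf` is mine, and its output should match `scipy.stats.poisson.pmf` as used below):

from scipy.special import factorial

def poisson_pmf(y, lam):
    # direct transcription of P(y) = lambda**y / y! * exp(-lambda)
    return lam**y/factorial(y)*np.exp(-lam)

print(poisson_pmf(3, 1.0))  # same value as poisson.pmf(3, mu=1.) below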
from scipy.stats import poisson
fig, ax = plt.subplots(figsize=(14,7))
y = np.arange(0, 16)         # support of the Poisson: non-negative integers
p1 = poisson.pmf(y, mu=1.)   # scipy calls the rate parameter lambda 'mu'
p3 = poisson.pmf(y, mu=3.)
p10 = poisson.pmf(y, mu=10.)
ax.plot(y, p1, 'r.-', markersize=20, label=r'$\lambda=1$')
ax.plot(y, p3, 'g.-', markersize=20, label=r'$\lambda=3$')
ax.plot(y, p10, 'b.-', markersize=20, label=r'$\lambda=10$')
ax.set_title('Poisson Distribution', fontsize=20)
ax.set_xlabel('$y_i$', fontsize=20)
ax.set_ylabel('$p(y_i)$', fontsize=20)
ax.legend(fontsize=20)
plt.savefig('./diagrams/poisson.svg')
In Poisson regression we make the rate a function of space/time, $$\log \lambda(\mathbf{x}, t) = \mathbf{w}_x^\top \boldsymbol{\phi}_x(\mathbf{x}) + \mathbf{w}_t^\top \boldsymbol{\phi}_t(t).$$
This is known as a log-linear or log-additive model.
The link function is a logarithm.
We can rewrite such a function as $$\log \lambda(\mathbf{x}, t) = f_x(\mathbf{x}) + f_t(t)$$
Be careful though ... a log additive model is really multiplicative. $$\log \lambda(\mathbf{x}, t) = f_x(\mathbf{x}) + f_t(t)$$
Becomes $$\lambda(\mathbf{x}, t) = \exp(f_x(\mathbf{x}) + f_t(t))$$
Which is equivalent to $$\lambda(\mathbf{x}, t) = \exp(f_x(\mathbf{x}))\exp(f_t(t))$$
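A one-line numerical check of this identity (the values standing in for $f_x(\mathbf{x})$ and $f_t(t)$ are arbitrary):

fx, ft = 0.5, -1.2  # arbitrary values standing in for f_x(x) and f_t(t)
print(np.exp(fx + ft), np.exp(fx)*np.exp(ft))  # identical: the sum becomes a product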
Link functions can be deceptive in this way.