Perceptron Algorithm Tutorial
The perceptron is one of the earliest and simplest machine learning algorithms for binary classification. This tutorial will walk you through the mathematical foundations, implementation, and practical usage of the perceptron algorithm using MLAI.
Mathematical Background
The perceptron algorithm is a linear classifier that learns to separate two classes of data points using a hyperplane. Given input data \(\mathbf{x} \in \mathbb{R}^d\) and binary labels \(y \in \{-1, +1\}\), the perceptron learns a weight vector \(\mathbf{w} \in \mathbb{R}^d\) and bias \(b \in \mathbb{R}\) such that:
\[f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)\]
predicts the correct label for every training point. The algorithm works by iteratively updating the weights whenever it misclassifies a point \((\mathbf{x}_i, y_i)\):
\[\mathbf{w} \leftarrow \mathbf{w} + \alpha\, y_i \mathbf{x}_i, \qquad b \leftarrow b + \alpha\, y_i\]
where \(\alpha\) is the learning rate.
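To make the update rule concrete, here is a minimal sketch of one training pass in plain NumPy. It is independent of MLAI, and the names (perceptron_epoch, X, y, w, b, alpha) are illustrative rather than part of any library:
import numpy as np

def perceptron_epoch(X, y, w, b, alpha=0.1):
    """One pass over the data, applying the perceptron update on each mistake."""
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:  # prediction disagrees with the label
            w = w + alpha * y_i * x_i        # move the hyperplane towards the point
            b = b + alpha * y_i
    return w, b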
Implementation in MLAI
MLAI provides a simple implementation of the perceptron algorithm. Let’s explore how to use it:
import numpy as np
import mlai.mlai as mlai
import matplotlib.pyplot as plt
# Generate some linearly separable data
np.random.seed(42)
n_samples = 100
n_features = 2
# Create two classes
X_plus = np.random.randn(n_samples//2, n_features) + np.array([2, 2])
X_minus = np.random.randn(n_samples//2, n_features) + np.array([-2, -2])
# Combine the data
X = np.vstack([X_plus, X_minus])
y = np.hstack([np.ones(n_samples//2), -np.ones(n_samples//2)])
# Initialize perceptron weights
# init_perceptron picks a random data point to set the initial weight vector,
# and returns that point alongside the weights and bias
w, b, x_select = mlai.init_perceptron(X_plus, X_minus, seed=42)
print(f"Initial weights: {w}")
print(f"Initial bias: {b}")
Training the Perceptron
Now let’s train the perceptron on our data:
# Training parameters
max_iterations = 100
learning_rate = 0.1
# Training loop
for iteration in range(max_iterations):
    # Pick a random training point and update the weights if it is misclassified
    w, b, x_selected, updated = mlai.update_perceptron(w, b, X_plus, X_minus, learning_rate)
    if not updated:
        # The sampled point was already classified correctly
        print(f"Converged after {iteration} iterations")
        break
print(f"Final weights: {w}")
print(f"Final bias: {b}")
Visualizing the Results
Instead of manually plotting the decision boundary, you can use MLAI’s built-in perceptron visualization tools for a more informative plot. These tools show the decision boundary, the weight vector, and histograms of the projections for each class.
import mlai.plot as plot
import matplotlib.pyplot as plt
# Prepare the figure and axes
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
# Re-initialise the weights so the animation below starts from scratch
w, b, x_select = mlai.init_perceptron(X_plus, X_minus, seed=42)
# Initialize the perceptron plot with the starting weights
# (no plt.show() yet: in a script it would block before the animation below runs)
handles = plot.init_perceptron(fig, axes, X_plus, X_minus, w, b, fontsize=16)
# During training, update the plot after each weight update:
for iteration in range(max_iterations):
    w, b, x_selected, updated = mlai.update_perceptron(w, b, X_plus, X_minus, learning_rate)
    handles = plot.update_perceptron(handles, fig, axes, X_plus, X_minus, iteration, w, b)
    plt.pause(0.1)  # pause briefly so each update is rendered
    if not updated:
        print(f"Converged after {iteration} iterations")
        break
plt.show()
This approach provides a dynamic and educational visualization of how the perceptron algorithm updates its decision boundary and weight vector during training.
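Note that the animation relies on matplotlib's event loop: in a plain Python script, plt.pause drives the redraws, while in Jupyter you will generally need an interactive backend (for example the ipympl widget backend) for the figure to update in place rather than rendering a static image.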
Key Concepts
Linear Separability: The perceptron only converges when the data are linearly separable, that is, when some hyperplane places every positive example on one side and every negative example on the other. If no such hyperplane exists, the algorithm keeps updating forever (see the XOR example at the end of this section).
Convergence: The perceptron convergence theorem (Novikoff) states that if the data is linearly separable, the perceptron will converge in a finite number of steps; with the bias absorbed into the weights, if every input satisfies \(\|\mathbf{x}\| \le R\) and some unit-norm weight vector separates the data with margin \(\gamma\), the number of updates is at most \((R/\gamma)^2\).
Learning Rate: The learning rate \(\alpha\) scales the size of each update. Starting from zero weights it only rescales the final weight vector, leaving the decision boundary unchanged; with a non-zero initialisation (as MLAI uses) it controls how strongly each mistake pulls the boundary around.
Bias Term: The bias \(b\) allows the decision boundary to sit away from the origin, making the model more flexible.
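To illustrate the separability requirement, the XOR pattern below cannot be split by any line, so the perceptron updates never stop. This is a plain NumPy sketch (not MLAI), and the cap of 1000 passes is an arbitrary illustrative choice:
# XOR: no straight line separates the two classes
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([-1, 1, 1, -1], dtype=float)
w_xor, b_xor = np.zeros(2), 0.0
for _ in range(1000):  # cap the number of passes; on XOR the mistakes never stop
    made_mistake = False
    for x_i, y_i in zip(X_xor, y_xor):
        if y_i * (np.dot(w_xor, x_i) + b_xor) <= 0:  # misclassified point
            w_xor = w_xor + y_i * x_i
            b_xor = b_xor + y_i
            made_mistake = True
    if not made_mistake:
        print("Converged")
        break
else:
    print("No convergence after 1000 passes")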
Limitations
Only works for linearly separable data
Sensitive to the order of training examples
May converge to any separating hyperplane, not necessarily the maximum-margin one
Binary classification only
Extensions
The perceptron algorithm has inspired many modern machine learning techniques:
Support Vector Machines (SVMs): Find the optimal separating hyperplane
Neural Networks: Multi-layer perceptrons for complex decision boundaries
Online Learning: Update weights incrementally as new data arrives
Summary
This tutorial demonstrates the fundamental concepts of linear classification and provides a foundation for understanding more advanced machine learning algorithms.