# CIP-0002: Comprehensive Test Framework with pytest
- author: Neil Lawrence
- created: 2025-07-09
- id: 0002
- last_updated: 2025-07-09
- status: proposed
- tags: ["cip", "testing", "pytest", "quality", "ci-cd"]
- title: Comprehensive Test Framework with pytest
## Summary
This CIP proposes a comprehensive pytest-based test framework for the MLAI package. The framework will include unit tests, integration tests, property-based tests, and automated test execution with coverage reporting to ensure code quality and reliability.
## Motivation
The current MLAI package lacks any testing infrastructure, which poses several risks:

- **Code Quality**: no automated verification that code changes work correctly
- **Regression Prevention**: no way to detect when new changes break existing functionality
- **Documentation**: no tests serving as living documentation of expected behavior
- **Confidence**: developers and users lack confidence in the reliability of the codebase
- **Educational Value**: as a teaching package, MLAI should demonstrate best practices, including testing
A comprehensive test framework will provide:

- Automated verification of all functionality
- Protection against regressions
- Clear examples of expected behavior
- Quality metrics through coverage reporting
- Integration with CI/CD pipelines
- Educational value for students learning ML
## Detailed Description
### Current State Analysis

- No test directory or test files exist
- No testing dependencies configured in pyproject.toml
- No CI/CD integration for automated testing
- No coverage reporting
- No test documentation or examples
### Proposed Test Framework Architecture

#### 1. Test Structure
```text
tests/
├── unit/                     # Unit tests for individual functions/classes
│   ├── test_mlai.py          # Core ML functionality tests
│   ├── test_plot.py          # Plotting functionality tests
│   ├── test_gp_tutorial.py   # GP tutorial tests
│   └── test_mountain_car.py  # Mountain car tests
├── integration/              # Integration tests
│   ├── test_tutorials.py     # End-to-end tutorial tests
│   └── test_examples.py      # Complete example tests
├── property/                 # Property-based tests
│   └── test_properties.py    # Mathematical property tests
├── fixtures/                 # Shared test fixtures
│   ├── conftest.py           # pytest configuration
│   └── data/                 # Test data files
└── docs/                     # Test documentation
    └── testing_guide.md      # How to write and run tests
```
#### 2. Testing Categories

**Unit Tests:**
- Individual function behavior
- Class method functionality
- Edge cases and error conditions
- Input validation

**Integration Tests:**
- Complete tutorial workflows
- End-to-end examples
- Cross-module interactions
- Real-world usage scenarios

**Property-Based Tests:**
- Mathematical properties (e.g., GP kernel properties; see the sketch after this list)
- Invariants that should always hold
- Statistical properties of ML algorithms

**Performance Tests:**
- Execution time benchmarks
- Memory usage monitoring
- Scalability testing
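
As an illustration of the property-based category, here is a minimal Hypothesis sketch. It assumes the `eq_cov` kernel mentioned in the Completed Work section below, called as `eq_cov(x, x_prime)` on 1-D arrays; the actual signature in `mlai.mlai` may differ.

```python
import numpy as np
from hypothesis import given, strategies as st

from mlai.mlai import eq_cov  # assumed import path and signature

coords = st.floats(min_value=-10.0, max_value=10.0,
                   allow_nan=False, allow_infinity=False)

@given(coords, coords)
def test_eq_cov_symmetry(a, b):
    # A valid covariance function must be symmetric: k(x, x') == k(x', x).
    x, x_prime = np.array([a]), np.array([b])
    assert np.isclose(eq_cov(x, x_prime), eq_cov(x_prime, x))

@given(coords)
def test_eq_cov_diagonal_nonnegative(a):
    # k(x, x) is the variance at x and must be non-negative.
    x = np.array([a])
    assert eq_cov(x, x) >= 0.0
```

Hypothesis generates many random inputs per test, so properties like symmetry are exercised far more broadly than hand-picked examples would allow.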
#### 3. Test Infrastructure

**Dependencies:**
- `pytest`: core testing framework
- `pytest-cov`: coverage reporting
- `pytest-benchmark`: performance testing
- `hypothesis`: property-based testing
- `numpy.testing`: numerical testing utilities
- `matplotlib.testing`: plot testing utilities

**Configuration:**
- `pytest.ini`: test discovery and execution settings (sketched below)
- `.coveragerc`: coverage reporting configuration
- GitHub Actions workflow for CI/CD
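
A plausible starting point for the two configuration files; the exact settings are to be finalized during Phase 1.

```ini
# pytest.ini -- hypothetical starting point
[pytest]
testpaths = tests
python_files = test_*.py
addopts = --cov=mlai --cov-report=term-missing
```

```ini
# .coveragerc -- hypothetical starting point
[run]
source = mlai
omit = tests/*

[report]
show_missing = True
```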
## Implementation Plan

### Phase 1: Foundation Setup

1. Set up testing infrastructure:
   - Add testing dependencies to pyproject.toml
   - Create tests/ directory structure
   - Configure pytest.ini and .coveragerc
   - Set up basic test discovery
2. Create test configuration:
   - Write conftest.py with shared fixtures (sketched after this list)
   - Set up test data generation utilities
   - Configure test environment variables
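
A sketch of what the shared conftest.py could contain; fixture names such as `regression_data` are illustrative, not an existing API.

```python
# tests/conftest.py -- illustrative shared fixtures
import numpy as np
import pytest

@pytest.fixture
def rng():
    """Seeded generator so every test run is deterministic."""
    return np.random.default_rng(42)

@pytest.fixture
def regression_data(rng):
    """Small synthetic regression data set reusable across unit tests."""
    X = np.linspace(0.0, 1.0, 20)[:, np.newaxis]
    y = np.sin(2.0 * np.pi * X) + 0.1 * rng.standard_normal(X.shape)
    return X, y
```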
### Phase 2: Core Unit Tests

1. Implement unit tests for core modules (see the example after this list):
   - Test mlai.py core functionality
   - Test plot.py plotting functions
   - Test mathematical operations and algorithms
   - Test error handling and edge cases
2. Add property-based tests:
   - Test GP kernel mathematical properties
   - Test statistical invariants
   - Test numerical stability properties
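
As an example of the unit-test style intended here, a sketch testing a basis function. It assumes mlai exposes a `polynomial(x, num_basis)` basis whose first column is the constant term; the real signature may differ.

```python
import numpy as np

from mlai.mlai import polynomial  # assumed import path and signature

def test_polynomial_basis_shape():
    # One row per input point, one column per basis function.
    x = np.linspace(-1.0, 1.0, 10)[:, np.newaxis]
    Phi = polynomial(x, num_basis=4)
    assert Phi.shape == (10, 4)

def test_polynomial_basis_includes_constant_term():
    # The first column is assumed to be the constant basis function.
    x = np.linspace(-1.0, 1.0, 10)[:, np.newaxis]
    Phi = polynomial(x, num_basis=4)
    assert np.allclose(Phi[:, 0], 1.0)
```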
### Phase 3: Integration and Tutorial Tests

1. Create integration tests:
   - Test complete tutorial workflows
   - Test end-to-end examples
   - Test cross-module interactions
   - Test real-world usage scenarios
2. Add performance benchmarks (see the sketch after this list):
   - Benchmark key algorithms
   - Monitor memory usage
   - Test scalability with different data sizes
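
A minimal pytest-benchmark sketch, again assuming the `eq_cov` kernel; the helper `gram_matrix` and the data size are illustrative.

```python
import numpy as np

from mlai.mlai import eq_cov  # assumed import path

def gram_matrix(X, kernel):
    """Naive O(n^2) Gram matrix construction, used as the benchmark target."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def test_gram_matrix_benchmark(benchmark):
    # pytest-benchmark injects the `benchmark` fixture, which times the call
    # over repeated rounds and returns the function's result.
    X = np.random.default_rng(0).standard_normal((100, 1))
    K = benchmark(gram_matrix, X, eq_cov)
    assert K.shape == (100, 100)
```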
### Phase 4: CI/CD and Documentation

1. Set up automated testing:
   - Configure GitHub Actions workflow (see the sketch after this list)
   - Set up coverage reporting
   - Add test status badges to README
   - Configure automated test execution
2. Create test documentation:
   - Write testing guide for contributors
   - Document test writing conventions
   - Create examples of good test practices
   - Add test running instructions
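
A hypothetical workflow sketch; the file name, Python version, and the `test` extra are assumptions, not the actual configuration.

```yaml
# .github/workflows/tests.yml -- hypothetical sketch
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumes test dependencies are installable from pyproject.toml;
      # the extra name ("test") is illustrative.
      - run: pip install -e ".[test]"
      - run: pytest --cov=mlai --cov-report=xml
      - uses: codecov/codecov-action@v4
```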
## Backward Compatibility

This change is purely additive and will not affect any existing functionality. All existing code will continue to work as before. The only changes are:

- Addition of test files and directories
- Addition of testing dependencies
- Addition of CI/CD configuration files
## Testing Strategy

- **Test-Driven Development**: Write tests before implementing new features
- **Comprehensive Coverage**: Aim for >90% code coverage
- **Continuous Integration**: Run tests on every commit and PR
- **Performance Monitoring**: Track performance regressions
- **Documentation**: Tests serve as executable documentation
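
For reference, a typical local invocation that runs the full suite with coverage (assuming the pytest-cov dependency listed above):

```bash
pytest tests/ --cov=mlai --cov-report=term-missing
```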
## Implementation Status

- [x] Set up testing infrastructure
- [x] Create test configuration
- [x] Implement unit tests for core modules (gp_tutorial, __init__.py, deepgp_tutorial)
- [x] Implement comprehensive unit tests for mlai.py
- [x] Set up automated testing with GitHub Actions
- [x] Add test status badges to README
- [x] Add tests for abstract base classes and neural networks
- [x] Improve code robustness with input validation
- [x] Add comprehensive tests for Logistic Regression and Gaussian Process methods
- [x] Create integration tests for tutorial workflows
- [x] Fix logistic regression optimization issues
- [ ] Add comprehensive tests for plot.py (35 unit tests created, integration tests needed)
- [ ] Add property-based tests
- [ ] Add performance benchmarks
- [ ] Create test documentation
## Completed Work (2025-07-15)
- **Testing Infrastructure**: Added pytest, pytest-cov, hypothesis, and pytest-benchmark dependencies to pyproject.toml
- **Test Configuration**: Created tests/ directory structure with conftest.py, pytest.ini, and .coveragerc
- **Unit Tests**: Implemented comprehensive unit tests for:
  - `gp_tutorial.py`: 100% coverage with mocked matplotlib dependencies
  - `__init__.py`: 73% coverage, testing imports and GPy availability
  - `deepgp_tutorial.py`: 83% coverage with circular import fixes and NumPy compatibility improvements
  - `mlai.py`: 58% coverage with 39 comprehensive test cases covering utility functions, perceptron, basis functions, linear models, neural networks, kernels, Gaussian processes, and noise models
- **Coverage Improvements**: Increased mlai.py coverage from 15% to 86% through comprehensive testing
- **Abstract Base Classes**: Added tests for Model, ProbModel, and MapModel NotImplementedError behavior
- **Neural Networks**: Added comprehensive tests for SimpleNeuralNetwork, SimpleDropoutNeuralNetwork, and NonparametricDropoutNeuralNetwork
- **Input Validation**: Added parameter validation to neural network constructors for robustness
- **Edge Cases**: Added comprehensive tests for LM error handling, Kernel repr_html, and utility function edge cases
- **Kernel Functions**: Added tests for 10+ additional kernel functions (exponentiated_quadratic, eq_cov, ou_cov, matern32_cov, matern52_cov, mlp_cov, icm_cov, slfm_cov, add_cov, prod_cov)
- **Basis Functions**: Added edge case tests for polynomial and radial basis functions
- **Logistic Regression**: Added comprehensive tests for LR gradient, compute_g, update_g, and objective methods
- **Gaussian Process**: Added tests for posterior_f and update_inverse functions
- **Gaussian Noise Model**: Added tests for grad_vals method
- **Overall Project Coverage**: Achieved 27% overall project coverage (up from 0%)
- **Code Quality Improvements**: Fixed circular import in deepgp_tutorial by changing the import from `mlai` to `mlai.mlai`
- **Robustness Enhancements**: Made the scale_data function handle both 1D and 2D arrays
- **NumPy Compatibility**: Fixed compatibility issues with newer NumPy versions by using explicit np.min()/np.max() calls
- **CI/CD Integration**: Set up GitHub Actions workflows for testing and linting with codecov integration
- **Overall Progress**: Achieved 21% total code coverage (up from 15%) with 70 total tests (60 passed, 10 skipped)
## Completed Work (2025-07-16)
- **Integration Tests**: Created comprehensive integration tests for tutorial workflows including basis functions, linear regression, logistic regression, perceptron, and Bayesian linear regression
- **Logistic Regression Optimization**: Fixed parameter flattening during gradient descent while maintaining 2D internal storage for proper matrix operations
- **Coverage Improvements**: Increased mlai.py coverage from 86% to 88% through comprehensive testing
- **Overall Project Coverage**: Achieved 30% overall project coverage:
  - `mlai.py`: 88% coverage with comprehensive test cases
  - `gp_tutorial.py`: 100% coverage with mocked matplotlib dependencies
  - `__init__.py`: 73% coverage, testing imports and GPy availability
  - `deepgp_tutorial.py`: 83% coverage with circular import fixes and NumPy compatibility improvements
- **Test Infrastructure**: 134 total tests (124 passed, 10 skipped) with comprehensive coverage of core functionality
## Completed Work (2025-07-16) - plot.py Testing
- **Foundational plot.py Testing**: Created 35 unit tests covering core plotting infrastructure
- **Plot Utilities**: Tests for constants and the pred_range function with various parameters
- **Matrix Plotting**: Tests for all matrix display types (values, entries, image, patch, colorpatch)
- **Network Visualization**: Tests for network and layer classes with properties and methods
- **Model Output Functions**: Tests for model_output and model_sample with proper mocking
- **Error Handling**: Tests for invalid inputs and edge cases
- **Performance**: Tests for large matrices and high point counts
- **Integration**: Tests for matplotlib, IPython, and optional dependencies (daft, GPy)
- **Bug Fixes in plot.py**:
  - **String Formatting**: Fixed `'{val:{prec}}'` failing with integer values by converting to float
  - **Return Values**: Made model_output and model_sample return axis objects for better composability
  - **Parameter Handling**: More robust handling of highlight_row, highlight_col, zoom_row, and zoom_col
  - **Backward Compatibility**: Maintained support for the `":"` string parameter while adding None/int/list support
- **Test Coverage**: All 35 tests passing (100% success rate)
- **Code Quality**: Improved error handling, consistent return values, and better parameter validation
- **Overall Impact**: Significant improvement to plot.py reliability and maintainability
- **Status**: Still in progress; roughly 6-8 of the 60+ functions in plot.py have been tested so far