CIP-0002: Comprehensive Test Framework with pytest

author: Neil Lawrence

created: 2025-07-09

id: 0002

last_updated: 2025-07-09

status: proposed

tags: ["cip", "testing", "pytest", "quality", "ci-cd"]

title: Comprehensive Test Framework with pytest

Summary

This CIP proposes the implementation of a comprehensive test framework using pytest for the MLAI package. The framework will include unit tests, integration tests, property-based tests, and automated test execution with coverage reporting to ensure code quality and reliability.

Motivation

The current MLAI package lacks any testing infrastructure, which poses several risks:

  • Code Quality: No automated verification that code changes work correctly

  • Regression Prevention: No way to detect when new changes break existing functionality

  • Documentation: Without tests, there is no living documentation of expected behavior

  • Confidence: Developers and users lack confidence in the reliability of the codebase

  • Educational Value: As a teaching package, MLAI should demonstrate best practices including testing

A comprehensive test framework will provide:

  • Automated verification of all functionality

  • Protection against regressions

  • Clear examples of expected behavior

  • Quality metrics through coverage reporting

  • Integration with CI/CD pipelines

  • Educational value for students learning ML

Detailed Description

Current State Analysis

  • No test directory or test files exist

  • No testing dependencies configured in pyproject.toml

  • No CI/CD integration for automated testing

  • No coverage reporting

  • No test documentation or examples

Proposed Test Framework Architecture

1. Test Structure

```
tests/
├── unit/                    # Unit tests for individual functions/classes
│   ├── test_mlai.py         # Core ML functionality tests
│   ├── test_plot.py         # Plotting functionality tests
│   ├── test_gp_tutorial.py  # GP tutorial tests
│   └── test_mountain_car.py # Mountain car tests
├── integration/             # Integration tests
│   ├── test_tutorials.py    # End-to-end tutorial tests
│   └── test_examples.py     # Complete example tests
├── property/                # Property-based tests
│   └── test_properties.py   # Mathematical property tests
├── fixtures/                # Shared test fixtures
│   ├── conftest.py          # pytest configuration
│   └── data/                # Test data files
└── docs/                    # Test documentation
    └── testing_guide.md     # How to write and run tests
```
2. Testing Categories

Unit Tests:

  • Individual function behavior

  • Class method functionality

  • Edge cases and error conditions

  • Input validation
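
As an illustration of the intended style, the sketch below tests a single basis function. It assumes a `polynomial` basis in `mlai.mlai` taking a data matrix and a number of basis functions; the exact signature may differ.

```python
import numpy as np
import pytest

from mlai import mlai


def test_polynomial_basis_shape():
    """A polynomial basis should return one column per basis function."""
    x = np.linspace(-1, 1, 10).reshape(-1, 1)
    # Hypothetical signature: data matrix plus the number of basis functions.
    Phi = mlai.polynomial(x, num_basis=4)
    assert Phi.shape == (10, 4)


def test_polynomial_basis_rejects_bad_input():
    """Invalid input should raise rather than fail silently."""
    with pytest.raises((ValueError, TypeError)):
        mlai.polynomial("not an array", num_basis=4)
```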

Integration Tests:

  • Complete tutorial workflows

  • End-to-end examples

  • Cross-module interactions

  • Real-world usage scenarios
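
An integration test exercises a whole workflow rather than one function. A minimal sketch, assuming the `LM` linear model mentioned under Completed Work below; the constructor, basis argument, and `predict` return convention are all illustrative assumptions:

```python
import numpy as np

from mlai import mlai


def test_linear_regression_workflow():
    """End to end: generate data, fit a linear model, check the fit is sane."""
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50).reshape(-1, 1)
    y = 2.0 * x + 0.1 * rng.standard_normal(x.shape)

    # Hypothetical API: a linear model taking data, targets, and a basis.
    basis = mlai.Basis(mlai.polynomial, number=2)
    model = mlai.LM(x, y, basis)
    model.fit()

    # Predictions on training inputs should track the targets closely.
    # Adjust indexing to the actual return convention of predict().
    y_pred = model.predict(x)
    assert np.mean((np.asarray(y_pred) - y) ** 2) < 0.1
```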

Property-Based Tests:

  • Mathematical properties (e.g., GP kernel properties)

  • Invariants that should always hold

  • Statistical properties of ML algorithms
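
For example, any valid covariance function must be symmetric in its arguments. A minimal Hypothesis sketch, assuming the `eq_cov` kernel listed under Completed Work takes two points and returns a scalar (the signature is an assumption):

```python
import numpy as np
from hypothesis import given, strategies as st

from mlai import mlai

finite_floats = st.floats(min_value=-10.0, max_value=10.0,
                          allow_nan=False, allow_infinity=False)


@given(finite_floats, finite_floats)
def test_eq_cov_is_symmetric(a, b):
    """k(x, x') == k(x', x) must hold for any covariance function."""
    x = np.array([a])
    x_prime = np.array([b])
    # Hypothetical signature: eq_cov(x, x_prime) returning a scalar.
    assert np.isclose(mlai.eq_cov(x, x_prime), mlai.eq_cov(x_prime, x))
```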

Performance Tests:

  • Execution time benchmarks

  • Memory usage monitoring

  • Scalability testing
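
pytest-benchmark provides a `benchmark` fixture that runs a callable repeatedly and records timing statistics; a sketch (the kernel name and signature are assumptions, as above):

```python
import numpy as np

from mlai import mlai


def test_eq_cov_benchmark(benchmark):
    """Track the cost of a single kernel evaluation across releases."""
    x = np.ones(100)
    x_prime = np.zeros(100)
    # benchmark(fn, *args) times fn and returns its result.
    result = benchmark(mlai.eq_cov, x, x_prime)
    assert np.isfinite(result)
```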

3. Test Infrastructure

Dependencies:

  • pytest: Core testing framework

  • pytest-cov: Coverage reporting

  • pytest-benchmark: Performance testing

  • hypothesis: Property-based testing

  • numpy.testing: Numerical testing utilities (bundled with NumPy)

  • matplotlib.testing: Plot testing utilities (bundled with Matplotlib)
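
A sketch of how these could be declared as an optional dependency group in pyproject.toml (the group name and version pins are illustrative; numpy.testing and matplotlib.testing need no separate entries):

```toml
[project.optional-dependencies]
test = [
    "pytest>=7.0",
    "pytest-cov",
    "pytest-benchmark",
    "hypothesis",
]
```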

Configuration:

  • pytest.ini: Test discovery and execution settings

  • .coveragerc: Coverage reporting configuration

  • GitHub Actions workflow for CI/CD
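
A minimal sketch of the two configuration files (paths and options are illustrative):

```ini
# pytest.ini
[pytest]
testpaths = tests
addopts = --cov=mlai --cov-report=term-missing

# .coveragerc
[run]
source = mlai
omit = tests/*
```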

Implementation Plan

Phase 1: Foundation Setup

  1. Set up testing infrastructure:

    • Add testing dependencies to pyproject.toml

    • Create tests/ directory structure

    • Configure pytest.ini and .coveragerc

    • Set up basic test discovery

  2. Create test configuration:

    • Write conftest.py with shared fixtures

    • Set up test data generation utilities

    • Configure test environment variables
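
A sketch of the kind of shared fixtures conftest.py could provide (fixture names and data are illustrative):

```python
# tests/fixtures/conftest.py
import numpy as np
import pytest


@pytest.fixture
def rng():
    """Seeded generator so every test sees the same random stream."""
    return np.random.default_rng(42)


@pytest.fixture
def regression_data(rng):
    """Small synthetic regression problem shared across tests."""
    x = np.linspace(0, 1, 30).reshape(-1, 1)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)
    return x, y
```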

Phase 2: Core Unit Tests

  1. Implement unit tests for core modules:

    • Test mlai.py core functionality

    • Test plot.py plotting functions

    • Test mathematical operations and algorithms

    • Test error handling and edge cases

  2. Add property-based tests:

    • Test GP kernel mathematical properties

    • Test statistical invariants

    • Test numerical stability properties

Phase 3: Integration and Tutorial Tests

  1. Create integration tests:

    • Test complete tutorial workflows

    • Test end-to-end examples

    • Test cross-module interactions

    • Test real-world usage scenarios

  2. Add performance benchmarks:

    • Benchmark key algorithms

    • Monitor memory usage

    • Test scalability with different data sizes

Phase 4: CI/CD and Documentation

  1. Set up automated testing:

    • Configure GitHub Actions workflow

    • Set up coverage reporting

    • Add test status badges to README

    • Configure automated test execution
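
A minimal workflow sketch, assuming the optional test dependency group proposed earlier (file name, Python versions, and action versions are illustrative):

```yaml
# .github/workflows/tests.yml
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[test]"
      - run: pytest --cov=mlai --cov-report=xml
      - uses: codecov/codecov-action@v4
```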

  2. Create test documentation:

    • Write testing guide for contributors

    • Document test writing conventions

    • Create examples of good test practices

    • Add test running instructions

Backward Compatibility

This change is purely additive and will not affect any existing functionality. All existing code will continue to work as before. The only changes are:

  • Addition of test files and directories

  • Addition of testing dependencies

  • Addition of CI/CD configuration files

Testing Strategy

  • Test-Driven Development: Write tests before implementing new features

  • Comprehensive Coverage: Aim for >90% code coverage

  • Continuous Integration: Run tests on every commit and PR

  • Performance Monitoring: Track performance regressions

  • Documentation: Tests serve as executable documentation
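
Day to day, contributors would run the suite with pytest-cov's standard flags, for example:

```bash
# Full suite with a line-by-line coverage summary in the terminal.
pytest --cov=mlai --cov-report=term-missing

# Only the fast unit tests, stopping at the first failure.
pytest tests/unit -x
```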

Implementation Status

  • [x] Set up testing infrastructure

  • [x] Create test configuration

  • [x] Implement unit tests for core modules (gp_tutorial, __init__.py, deepgp_tutorial)

  • [x] Implement comprehensive unit tests for mlai.py

  • [x] Set up automated testing with GitHub Actions

  • [x] Add test status badges to README

  • [x] Add tests for abstract base classes and neural networks

  • [x] Improve code robustness with input validation

  • [x] Add comprehensive tests for Logistic Regression and Gaussian Process methods

  • [x] Create integration tests for tutorial workflows

  • [x] Fix logistic regression optimization issues

  • [ ] Add comprehensive tests for plot.py (35 unit tests created, integration tests needed)

  • [ ] Add property-based tests

  • [ ] Add performance benchmarks

  • [ ] Create test documentation

Completed Work (2025-07-15)

  • Testing Infrastructure: Added pytest, pytest-cov, hypothesis, and pytest-benchmark dependencies to pyproject.toml

  • Test Configuration: Created tests/ directory structure with conftest.py, pytest.ini, and .coveragerc

  • Unit Tests: Implemented comprehensive unit tests for the core modules (per-module coverage detailed below)

  • Coverage Improvements: Increased mlai.py coverage from 15% to 86% through comprehensive testing

  • Abstract Base Classes: Added tests for Model, ProbModel, and MapModel NotImplementedError behavior

  • Neural Networks: Added comprehensive tests for SimpleNeuralNetwork, SimpleDropoutNeuralNetwork, and NonparametricDropoutNeuralNetwork

  • Input Validation: Added parameter validation to neural network constructors for robustness

  • Edge Cases: Added comprehensive tests for LM error handling, Kernel repr_html, and utility function edge cases

  • Kernel Functions: Added tests for 10+ additional kernel functions (exponentiated_quadratic, eq_cov, ou_cov, matern32_cov, matern52_cov, mlp_cov, icm_cov, slfm_cov, add_cov, prod_cov)

  • Basis Functions: Added edge case tests for polynomial and radial basis functions

  • Logistic Regression: Added comprehensive tests for LR gradient, compute_g, update_g, and objective methods

  • Gaussian Process: Added tests for posterior_f and update_inverse functions

  • Gaussian Noise Model: Added tests for grad_vals method

  • Overall Project Coverage: Achieved 27% overall project coverage (up from 0%)

    • gp_tutorial.py: 100% coverage with mocked matplotlib dependencies

    • __init__.py: 73% coverage testing imports and GPy availability

    • deepgp_tutorial.py: 83% coverage with circular import fixes and NumPy compatibility improvements

    • mlai.py: 58% coverage with 39 comprehensive test cases covering utility functions, perceptron, basis functions, linear models, neural networks, kernels, Gaussian processes, and noise models

  • Code Quality Improvements: Fixed circular import in deepgp_tutorial by changing import from mlai to mlai.mlai

  • Robustness Enhancements: Made scale_data function handle both 1D and 2D arrays

  • NumPy Compatibility: Fixed compatibility issues with newer NumPy versions by using explicit np.min()/np.max() calls

  • CI/CD Integration: Set up GitHub Actions workflows for testing and linting with codecov integration

  • Overall Progress: Achieved 21% total code coverage (up from 15%) with 70 total tests (60 passed, 10 skipped)

Completed Work (2025-07-16)

  • Integration Tests: Created comprehensive integration tests for tutorial workflows including basis functions, linear regression, logistic regression, perceptron, and Bayesian linear regression

  • Logistic Regression Optimization: Fixed parameter flattening during gradient descent while maintaining 2D internal storage for proper matrix operations

  • Coverage Improvements: Increased mlai.py coverage from 86% to 88% through comprehensive testing

  • Overall Project Coverage: Achieved 30% overall project coverage

    • mlai.py: 88% coverage with comprehensive test cases

    • gp_tutorial.py: 100% coverage with mocked matplotlib dependencies

    • __init__.py: 73% coverage testing imports and GPy availability

    • deepgp_tutorial.py: 83% coverage with circular import fixes and NumPy compatibility improvements

  • Test Infrastructure: 134 total tests (124 passed, 10 skipped) with comprehensive coverage of core functionality

Completed Work (2025-07-16) - Plot.py Testing

  • Foundational Plot.py Testing: Created 35 unit tests covering core plotting infrastructure

    • Plot Utilities: Tests for constants, pred_range function with various parameters

    • Matrix Plotting: Tests for all matrix display types (values, entries, image, patch, colorpatch)

    • Network Visualization: Tests for network and layer classes with properties and methods

    • Model Output Functions: Tests for model_output and model_sample with proper mocking

    • Error Handling: Tests for invalid inputs and edge cases

    • Performance: Tests for large matrices and high point counts

    • Integration: Tests for matplotlib, IPython, and optional dependencies (daft, GPy)

  • Bug Fixes in plot.py:

    • String Formatting: Fixed '{val:{prec}}' not working with integer values by converting to float

    • Return Values: Made model_output and model_sample return axis objects for better composability

    • Parameter Handling: Improved robust handling of highlight_row, highlight_col, zoom_row, zoom_col

    • Backward Compatibility: Maintained support for ":" string parameter while adding None/int/list support

  • Test Coverage: All 35 tests passing (100% success rate)

  • Code Quality: Improved error handling, consistent return values, and better parameter validation

  • Overall Impact: Significant improvement to plot.py reliability and maintainability

  • Status: Still in progress; roughly 6-8 of the 60+ functions in plot.py are covered so far
