# CIP-0002: Comprehensive Test Framework with pytest
- author: Neil Lawrence
- created: 2025-07-09
- id: 0002
- last_updated: 2025-07-09
- status: proposed
- tags: ["cip", "testing", "pytest", "quality", "ci-cd"]
- title: Comprehensive Test Framework with pytest
## Summary
This CIP proposes a comprehensive pytest-based test framework for the MLAI package. The framework will include unit tests, integration tests, property-based tests, and automated test execution with coverage reporting to ensure code quality and reliability.
## Motivation
The current MLAI package lacks any testing infrastructure, which poses several risks:

- **Code Quality**: no automated verification that code changes work correctly
- **Regression Prevention**: no way to detect when new changes break existing functionality
- **Documentation**: no tests serving as living documentation of expected behavior
- **Confidence**: developers and users lack confidence in the reliability of the codebase
- **Educational Value**: as a teaching package, MLAI should demonstrate best practices, including testing
A comprehensive test framework will provide:

- Automated verification of all functionality
- Protection against regressions
- Clear examples of expected behavior
- Quality metrics through coverage reporting
- Integration with CI/CD pipelines
- Educational value for students learning ML
## Detailed Description
### Current State Analysis

- No test directory or test files exist
- No testing dependencies configured in pyproject.toml
- No CI/CD integration for automated testing
- No coverage reporting
- No test documentation or examples
### Proposed Test Framework Architecture

#### 1. Test Structure
```text
tests/
├── unit/                     # Unit tests for individual functions/classes
│   ├── test_mlai.py          # Core ML functionality tests
│   ├── test_plot.py          # Plotting functionality tests
│   ├── test_gp_tutorial.py   # GP tutorial tests
│   └── test_mountain_car.py  # Mountain car tests
├── integration/              # Integration tests
│   ├── test_tutorials.py     # End-to-end tutorial tests
│   └── test_examples.py      # Complete example tests
├── property/                 # Property-based tests
│   └── test_properties.py    # Mathematical property tests
├── fixtures/                 # Shared test fixtures
│   ├── conftest.py           # pytest configuration
│   └── data/                 # Test data files
└── docs/                     # Test documentation
    └── testing_guide.md      # How to write and run tests
```
#### 2. Testing Categories

**Unit Tests:**
- Individual function behavior
- Class method functionality
- Edge cases and error conditions
- Input validation

**Integration Tests:**
- Complete tutorial workflows
- End-to-end examples
- Cross-module interactions
- Real-world usage scenarios

**Property-Based Tests:**
- Mathematical properties (e.g., GP kernel properties; see the sketch after this list)
- Invariants that should always hold
- Statistical properties of ML algorithms

**Performance Tests:**
- Execution time benchmarks
- Memory usage monitoring
- Scalability testing
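
As an illustration of the property-based category, here is a minimal Hypothesis sketch. It assumes the `eq_cov` kernel mentioned in the Completed Work section below, called as `eq_cov(x, x_prime)` on 1-D arrays; the actual signature in `mlai.mlai` may differ.

```python
import numpy as np
from hypothesis import given, strategies as st

from mlai.mlai import eq_cov  # assumed import path and signature

coords = st.floats(min_value=-10.0, max_value=10.0,
                   allow_nan=False, allow_infinity=False)

@given(coords, coords)
def test_eq_cov_symmetry(a, b):
    # A valid covariance function must be symmetric: k(x, x') == k(x', x).
    x, x_prime = np.array([a]), np.array([b])
    assert np.isclose(eq_cov(x, x_prime), eq_cov(x_prime, x))

@given(coords)
def test_eq_cov_diagonal_nonnegative(a):
    # k(x, x) is the variance at x and must be non-negative.
    x = np.array([a])
    assert eq_cov(x, x) >= 0.0
```

Hypothesis generates many random inputs per test, so properties like symmetry are exercised far more broadly than hand-picked examples would allow.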
#### 3. Test Infrastructure

**Dependencies:**
- `pytest`: core testing framework
- `pytest-cov`: coverage reporting
- `pytest-benchmark`: performance testing
- `hypothesis`: property-based testing
- `numpy.testing`: numerical testing utilities
- `matplotlib.testing`: plot testing utilities

**Configuration:**
- `pytest.ini`: test discovery and execution settings (sketched below)
- `.coveragerc`: coverage reporting configuration
- GitHub Actions workflow for CI/CD
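
A plausible starting point for the two configuration files; the exact settings are to be finalized during Phase 1.

```ini
# pytest.ini -- hypothetical starting point
[pytest]
testpaths = tests
python_files = test_*.py
addopts = --cov=mlai --cov-report=term-missing
```

```ini
# .coveragerc -- hypothetical starting point
[run]
source = mlai
omit = tests/*

[report]
show_missing = True
```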
## Implementation Plan

### Phase 1: Foundation Setup

1. Set up testing infrastructure:
   - Add testing dependencies to pyproject.toml
   - Create tests/ directory structure
   - Configure pytest.ini and .coveragerc
   - Set up basic test discovery
2. Create test configuration:
   - Write conftest.py with shared fixtures (sketched after this list)
   - Set up test data generation utilities
   - Configure test environment variables
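
A sketch of what the shared conftest.py could contain; fixture names such as `regression_data` are illustrative, not an existing API.

```python
# tests/conftest.py -- illustrative shared fixtures
import numpy as np
import pytest

@pytest.fixture
def rng():
    """Seeded generator so every test run is deterministic."""
    return np.random.default_rng(42)

@pytest.fixture
def regression_data(rng):
    """Small synthetic regression data set reusable across unit tests."""
    X = np.linspace(0.0, 1.0, 20)[:, np.newaxis]
    y = np.sin(2.0 * np.pi * X) + 0.1 * rng.standard_normal(X.shape)
    return X, y
```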
### Phase 2: Core Unit Tests

1. Implement unit tests for core modules (see the example after this list):
   - Test mlai.py core functionality
   - Test plot.py plotting functions
   - Test mathematical operations and algorithms
   - Test error handling and edge cases
2. Add property-based tests:
   - Test GP kernel mathematical properties
   - Test statistical invariants
   - Test numerical stability properties
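
As an example of the unit-test style intended here, a sketch testing a basis function. It assumes mlai exposes a `polynomial(x, num_basis)` basis whose first column is the constant term; the real signature may differ.

```python
import numpy as np

from mlai.mlai import polynomial  # assumed import path and signature

def test_polynomial_basis_shape():
    # One row per input point, one column per basis function.
    x = np.linspace(-1.0, 1.0, 10)[:, np.newaxis]
    Phi = polynomial(x, num_basis=4)
    assert Phi.shape == (10, 4)

def test_polynomial_basis_includes_constant_term():
    # The first column is assumed to be the constant basis function.
    x = np.linspace(-1.0, 1.0, 10)[:, np.newaxis]
    Phi = polynomial(x, num_basis=4)
    assert np.allclose(Phi[:, 0], 1.0)
```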
### Phase 3: Integration and Tutorial Tests

1. Create integration tests:
   - Test complete tutorial workflows
   - Test end-to-end examples
   - Test cross-module interactions
   - Test real-world usage scenarios
2. Add performance benchmarks (see the sketch after this list):
   - Benchmark key algorithms
   - Monitor memory usage
   - Test scalability with different data sizes
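
A minimal pytest-benchmark sketch, again assuming the `eq_cov` kernel; the helper `gram_matrix` and the data size are illustrative.

```python
import numpy as np

from mlai.mlai import eq_cov  # assumed import path

def gram_matrix(X, kernel):
    """Naive O(n^2) Gram matrix construction, used as the benchmark target."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def test_gram_matrix_benchmark(benchmark):
    # pytest-benchmark injects the `benchmark` fixture, which times the call
    # over repeated rounds and returns the function's result.
    X = np.random.default_rng(0).standard_normal((100, 1))
    K = benchmark(gram_matrix, X, eq_cov)
    assert K.shape == (100, 100)
```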
### Phase 4: CI/CD and Documentation

1. Set up automated testing:
   - Configure GitHub Actions workflow (see the sketch after this list)
   - Set up coverage reporting
   - Add test status badges to README
   - Configure automated test execution
2. Create test documentation:
   - Write testing guide for contributors
   - Document test writing conventions
   - Create examples of good test practices
   - Add test running instructions
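
A hypothetical workflow sketch; the file name, Python version, and the `test` extra are assumptions, not the actual configuration.

```yaml
# .github/workflows/tests.yml -- hypothetical sketch
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumes test dependencies are installable from pyproject.toml;
      # the extra name ("test") is illustrative.
      - run: pip install -e ".[test]"
      - run: pytest --cov=mlai --cov-report=xml
      - uses: codecov/codecov-action@v4
```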
## Backward Compatibility

This change is purely additive and will not affect any existing functionality. All existing code will continue to work as before. The only changes are:

- Addition of test files and directories
- Addition of testing dependencies
- Addition of CI/CD configuration files
## Testing Strategy

- **Test-Driven Development**: Write tests before implementing new features
- **Comprehensive Coverage**: Aim for >90% code coverage
- **Continuous Integration**: Run tests on every commit and PR
- **Performance Monitoring**: Track performance regressions
- **Documentation**: Tests serve as executable documentation
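
For reference, a typical local invocation that runs the full suite with coverage (assuming the pytest-cov dependency listed above):

```bash
pytest tests/ --cov=mlai --cov-report=term-missing
```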
## Implementation Status

- [x] Set up testing infrastructure
- [x] Create test configuration
- [x] Implement unit tests for core modules (gp_tutorial, __init__.py, deepgp_tutorial)
- [x] Implement comprehensive unit tests for mlai.py
- [x] Set up automated testing with GitHub Actions
- [x] Add test status badges to README
- [x] Add tests for abstract base classes and neural networks
- [x] Improve code robustness with input validation
- [x] Add comprehensive tests for Logistic Regression and Gaussian Process methods
- [x] Create integration tests for tutorial workflows
- [x] Fix logistic regression optimization issues
- [ ] Add comprehensive tests for plot.py (35 unit tests created, integration tests needed)
- [ ] Add property-based tests
- [ ] Add performance benchmarks
- [ ] Create test documentation
## Completed Work (2025-07-15)
- **Testing Infrastructure**: Added pytest, pytest-cov, hypothesis, and pytest-benchmark dependencies to pyproject.toml
- **Test Configuration**: Created tests/ directory structure with conftest.py, pytest.ini, and .coveragerc
- **Unit Tests**: Implemented comprehensive unit tests for:
  - `gp_tutorial.py`: 100% coverage with mocked matplotlib dependencies
  - `__init__.py`: 73% coverage, testing imports and GPy availability
  - `deepgp_tutorial.py`: 83% coverage with circular import fixes and NumPy compatibility improvements
  - `mlai.py`: 58% coverage with 39 comprehensive test cases covering utility functions, perceptron, basis functions, linear models, neural networks, kernels, Gaussian processes, and noise models
- **Coverage Improvements**: Increased mlai.py coverage from 15% to 86% through comprehensive testing
- **Abstract Base Classes**: Added tests for Model, ProbModel, and MapModel NotImplementedError behavior
- **Neural Networks**: Added comprehensive tests for SimpleNeuralNetwork, SimpleDropoutNeuralNetwork, and NonparametricDropoutNeuralNetwork
- **Input Validation**: Added parameter validation to neural network constructors for robustness
- **Edge Cases**: Added comprehensive tests for LM error handling, Kernel repr_html, and utility function edge cases
- **Kernel Functions**: Added tests for 10+ additional kernel functions (exponentiated_quadratic, eq_cov, ou_cov, matern32_cov, matern52_cov, mlp_cov, icm_cov, slfm_cov, add_cov, prod_cov)
- **Basis Functions**: Added edge case tests for polynomial and radial basis functions
- **Logistic Regression**: Added comprehensive tests for LR gradient, compute_g, update_g, and objective methods
- **Gaussian Process**: Added tests for posterior_f and update_inverse functions
- **Gaussian Noise Model**: Added tests for grad_vals method
- **Overall Project Coverage**: Achieved 27% overall project coverage (up from 0%)
- **Code Quality Improvements**: Fixed circular import in deepgp_tutorial by changing the import from `mlai` to `mlai.mlai`
- **Robustness Enhancements**: Made the scale_data function handle both 1D and 2D arrays
- **NumPy Compatibility**: Fixed compatibility issues with newer NumPy versions by using explicit np.min()/np.max() calls
- **CI/CD Integration**: Set up GitHub Actions workflows for testing and linting with codecov integration
- **Overall Progress**: Achieved 21% total code coverage (up from 15%) with 70 total tests (60 passed, 10 skipped)
## Completed Work (2025-07-16)
- **Integration Tests**: Created comprehensive integration tests for tutorial workflows including basis functions, linear regression, logistic regression, perceptron, and Bayesian linear regression
- **Logistic Regression Optimization**: Fixed parameter flattening during gradient descent while maintaining 2D internal storage for proper matrix operations
- **Coverage Improvements**: Increased mlai.py coverage from 86% to 88% through comprehensive testing
- **Overall Project Coverage**: Achieved 30% overall project coverage:
  - `mlai.py`: 88% coverage with comprehensive test cases
  - `gp_tutorial.py`: 100% coverage with mocked matplotlib dependencies
  - `__init__.py`: 73% coverage, testing imports and GPy availability
  - `deepgp_tutorial.py`: 83% coverage with circular import fixes and NumPy compatibility improvements
- **Test Infrastructure**: 134 total tests (124 passed, 10 skipped) with comprehensive coverage of core functionality
## Completed Work (2025-07-16) - plot.py Testing
- **Foundational plot.py Testing**: Created 35 unit tests covering core plotting infrastructure
- **Plot Utilities**: Tests for constants and the pred_range function with various parameters
- **Matrix Plotting**: Tests for all matrix display types (values, entries, image, patch, colorpatch)
- **Network Visualization**: Tests for network and layer classes with properties and methods
- **Model Output Functions**: Tests for model_output and model_sample with proper mocking
- **Error Handling**: Tests for invalid inputs and edge cases
- **Performance**: Tests for large matrices and high point counts
- **Integration**: Tests for matplotlib, IPython, and optional dependencies (daft, GPy)
- **Bug Fixes in plot.py**:
  - **String Formatting**: Fixed `'{val:{prec}}'` failing with integer values by converting to float
  - **Return Values**: Made model_output and model_sample return axis objects for better composability
  - **Parameter Handling**: More robust handling of highlight_row, highlight_col, zoom_row, and zoom_col
  - **Backward Compatibility**: Maintained support for the `":"` string parameter while adding None/int/list support
- **Test Coverage**: All 35 tests passing (100% success rate)
- **Code Quality**: Improved error handling, consistent return values, and better parameter validation
- **Overall Impact**: Significant improvement to plot.py reliability and maintainability
- **Status**: Still in progress; roughly 6-8 of the 60+ functions in plot.py have been tested so far