Timebased Cross Validation

timebasedcv is a Python codebase that provides a cross validation strategy based on time.


Documentation | Repository | Issue Tracker


Alpha Notice

This codebase is experimental and works for my use cases. There are very likely cases that it does not cover and on which it breaks (badly). If you find one, please feel free to open an issue on the repository's issue tracker.

Description

The current implementation of scikit-learn TimeSeriesSplit lacks the flexibility of having multiple samples within the same time period/unit.

This codebase addresses this problem by providing a cross validation strategy based on a time period rather than on the number of samples. This is useful when the data is time dependent and the model should be trained on past data and tested on future data, regardless of how many observations fall within a given time period.

Temporal data leakage is an issue and we want to prevent that from happening!
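
As a minimal illustration of the problem (a standard-library sketch with made-up dates, not code from timebasedcv), the snippet below compares a split by sample count with a split by time cutoff when one day holds several observations:

```python
from datetime import date

# Six observations over three days; the middle day has three samples.
dates = [
    date(2023, 1, 1),
    date(2023, 1, 2),
    date(2023, 1, 2),
    date(2023, 1, 2),
    date(2023, 1, 3),
    date(2023, 1, 3),
]

# Split by sample count: first three rows train, last three rows test.
count_train, count_test = dates[:3], dates[3:]
# Days that appear on both sides of the count-based split.
shared_days = set(count_train) & set(count_test)

# Split by time: everything strictly before the cutoff trains.
cutoff = date(2023, 1, 3)
time_train = [d for d in dates if d < cutoff]
time_test = [d for d in dates if d >= cutoff]

print("days shared by the count-based split:", shared_days)
print("days shared by the time-based split:", set(time_train) & set(time_test))
```

The count-based split cuts through 2 January, so that day ends up in both train and test, while the time-based split keeps every day on one side only.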

We introduce two main classes:

  • TimeBasedSplit allows you to define a time based split with a given frequency, train size, test size, gap, stride and window type. Its core method, split, requires a time series as input, from which it creates the boolean masks for train and test according to the instance settings defined above. It is therefore not compatible with scikit-learn CV Splitters.
  • TimeBasedCVSplitter conforms with scikit-learn CV Splitters, but requires the time series to be passed as input to the instance. That is because a CV Splitter needs to know the number of splits a priori, and its split method shouldn't take any extra arguments as input other than the arrays to split.
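
The idea behind the two classes can be sketched with the standard library alone. This is not the timebasedcv implementation or its API: the function time_based_masks, the class MaskSplitter, their parameter names, and the fixed daily frequency are all assumptions made for illustration.

```python
from datetime import date, timedelta


def time_based_masks(times, train_size, test_size, gap=0, stride=None, window="rolling"):
    """Yield (train_mask, test_mask) boolean lists, one pair per fold.

    Hypothetical helper: all sizes are expressed in days.
    """
    if window not in ("rolling", "expanding"):
        raise ValueError("window must be 'rolling' or 'expanding'")
    stride = test_size if stride is None else stride
    first, last = min(times), max(times)

    train_start = first
    train_end = first + timedelta(days=train_size)
    while True:
        test_start = train_end + timedelta(days=gap)
        if test_start > last:
            break  # no observations left to test on
        test_end = test_start + timedelta(days=test_size)
        # Masks depend on timestamps only, not on how many samples a day has.
        train_mask = [train_start <= t < train_end for t in times]
        test_mask = [test_start <= t < test_end for t in times]
        yield train_mask, test_mask
        train_end += timedelta(days=stride)
        if window == "rolling":
            train_start += timedelta(days=stride)  # expanding keeps the start fixed


class MaskSplitter:
    """Hypothetical wrapper: stores the time series so split() needs only the arrays."""

    def __init__(self, times, **kwargs):
        # Folds are computed up front, so the number of splits is known a priori.
        self._folds = list(time_based_masks(times, **kwargs))

    def get_n_splits(self, X=None, y=None, groups=None):
        return len(self._folds)

    def split(self, X=None, y=None, groups=None):
        for train_mask, test_mask in self._folds:
            train_idx = [i for i, keep in enumerate(train_mask) if keep]
            test_idx = [i for i, keep in enumerate(test_mask) if keep]
            yield train_idx, test_idx


# Ten days of data with two observations per day.
times = [date(2023, 1, 1) + timedelta(days=i // 2) for i in range(20)]
folds = list(time_based_masks(times, train_size=4, test_size=2, gap=1, stride=2))
splitter = MaskSplitter(times, train_size=4, test_size=2, gap=1, stride=2, window="expanding")
print(len(folds), splitter.get_n_splits())
```

With these settings both variants produce three folds. The first function mirrors the first bullet (the time series is passed to the splitting call and boolean masks come back), while the wrapper mirrors the second (the time series is given to the instance, so the fold count is available before split is called).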

Installation

timebasedcv is published on PyPI, so it can be installed directly via pip, from source using pip and git, or from a local clone:

From PyPI, via pip:

python -m pip install timebasedcv

From source, via pip and git:

python -m pip install git+https://github.com/FBruzzesi/timebasedcv.git

From a local clone:

git clone https://github.com/FBruzzesi/timebasedcv.git
cd timebasedcv
python -m pip install .

Getting Started

Please refer to the dedicated page Getting Started.

License

The project is released under the MIT License.