
timebasedcv.core

timebasedcv.core._CoreTimeBasedSplit

Base class for time-based splits. This class is not meant to be used directly.

_CoreTimeBasedSplit implements all the logic needed to set up a time-based splits class.

In particular, it implements _splits_from_period, which generates splits over a given time period (from a start date to an end date) according to the class arguments (frequency, train_size, forecast_horizon, gap, stride, window type and mode).
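To make the period walk concrete, here is a standalone sketch that re-implements the loop from the `_splits_from_period` source listed on this page using plain `datetime` arithmetic. `Split` and `splits_from_period` are hypothetical names (not part of timebasedcv), and the deltas are passed in directly instead of being derived from `frequency`; treat it as an illustration of the logic, not as the library's implementation.

```python
from datetime import datetime, timedelta
from typing import Iterator, NamedTuple, Optional


class Split(NamedTuple):
    """Hypothetical stand-in for timebasedcv's SplitState."""

    train_start: datetime
    train_end: datetime
    forecast_start: datetime
    forecast_end: datetime


def splits_from_period(
    time_start: datetime,
    time_end: datetime,
    *,
    train: timedelta,
    forecast: timedelta,
    gap: timedelta = timedelta(0),
    stride: Optional[timedelta] = None,
    window: str = "rolling",
    mode: str = "forward",
) -> Iterator[Split]:
    """Sketch of the `_splits_from_period` loop (not the library code)."""
    if time_start >= time_end:
        raise ValueError("`time_start` must be before `time_end`.")
    stride = stride or forecast  # None or timedelta(0) falls back to the forecast length

    if mode == "forward":
        # Anchor the first split at time_start and move towards time_end
        train_start = time_start
        train_end = time_start + train
        forecast_start = train_end + gap
        forecast_end = forecast_start + forecast
    else:
        # Anchor the first split at time_end and move towards time_start with negated deltas
        train, forecast, gap, stride = -train, -forecast, -gap, -stride
        forecast_end = time_end
        forecast_start = forecast_end + forecast
        train_end = forecast_start + gap
        train_start = train_end + train if window == "rolling" else time_start

    while forecast_start <= time_end and time_start <= train_start <= train_end + train:
        yield Split(train_start, train_end, forecast_start, forecast_end)
        if window == "rolling":  # an expanding window keeps its original train_start
            train_start += stride
        train_end += stride
        forecast_start += stride
        forecast_end += stride


jan1, jan31 = datetime(2024, 1, 1), datetime(2024, 1, 31)
for s in splits_from_period(jan1, jan31, train=timedelta(days=7), forecast=timedelta(days=3)):
    print(s.train_start.date(), s.train_end.date(), s.forecast_start.date(), s.forecast_end.date())
```

In forward mode the loop only checks `forecast_start <= time_end`, so, as in the source, the last forecast window may extend past `time_end` (here it ends on 2024-02-01). In backward mode the first split is anchored at `time_end`, and the loop stops once the train window would start before `time_start`.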

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| frequency | FrequencyUnit | The frequency (or time unit) of the time series. Must be one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours", "weeks". These are the only valid values for the unit argument of timedelta from the Python datetime standard library. | required |
| train_size | int | Defines the minimum number of time units required to be in the train set. | required |
| forecast_horizon | int | Specifies the number of time units to forecast. | required |
| gap | int | Sets the number of time units to skip between the end of the train set and the start of the forecast set. | 0 |
| stride | Union[int, None] | How many time units to move forward after each split. If None (or set to 0), the stride is equal to forecast_horizon. | None |
| window | WindowType | The type of window to use, either "rolling" or "expanding". | 'rolling' |
| mode | ModeType | Determines the order in which the splits are generated, either "forward" (start to end) or "backward" (end to start). | 'forward' |

All parameters are keyword-only.
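The restriction on frequency comes from how the value is used: the class turns it into a keyword argument of `datetime.timedelta`. The sketch below, built around a hypothetical `to_timedelta` helper, shows that the seven listed units are accepted while a calendar unit such as "months" is rejected by `timedelta` itself.

```python
from datetime import timedelta

# The seven values accepted for `frequency`, i.e. the keyword units of timedelta
FREQUENCY_UNITS = ("days", "seconds", "microseconds", "milliseconds", "minutes", "hours", "weeks")


def to_timedelta(frequency: str, n: int) -> timedelta:
    """Hypothetical helper: pass the unit name as a keyword, like the *_delta properties."""
    return timedelta(**{frequency: n})


for unit in FREQUENCY_UNITS:
    print(f"{unit:>12}: {to_timedelta(unit, 2)!r}")

# Any other unit name (e.g. "months") is not a timedelta keyword and fails
try:
    to_timedelta("months", 1)
except TypeError as err:
    print("rejected:", err)
```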

Raises:

ValueError
  • If frequency is not one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours", "weeks".
  • If window is not one of "rolling" or "expanding".
  • If mode is not one of "forward" or "backward".
  • If train_size, forecast_horizon or stride is not strictly positive, or if gap is negative.
TypeError
  • If train_size, forecast_horizon, gap or stride is not of type int.
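The integer checks amount to one lower bound per argument. The hypothetical helper `check_sizes` below mirrors the rules in `_validate_arguments` as shown in the source on this page; it is a sketch, and the library raises its own messages, which differ from the ones here.

```python
from typing import Optional, Tuple

_NAMES = ("train_size", "forecast_horizon", "gap", "stride")
_LOWER_BOUNDS = (1, 1, 0, 1)  # gap may be 0; the other three must be at least 1


def check_sizes(
    train_size: int, forecast_horizon: int, gap: int = 0, stride: Optional[int] = None
) -> Tuple[int, int, int, int]:
    """Hypothetical helper mirroring the integer checks in `_validate_arguments`."""
    stride = stride or forecast_horizon  # None or 0 falls back to forecast_horizon
    values = (train_size, forecast_horizon, gap, stride)

    # Exact type check, as in the source: bool and float values are rejected
    if not all(type(v) is int for v in values):
        found = ", ".join(type(v).__name__ for v in values)
        raise TypeError(f"{_NAMES} must all be int, found ({found})")

    for name, value, lower in zip(_NAMES, values, _LOWER_BOUNDS):
        if value < lower:
            raise ValueError(f"`{name}` must be >= {lower}, found {value}")
    return values


print(check_sizes(7, 3))            # (7, 3, 0, 3): stride defaults to forecast_horizon
print(check_sizes(7, 3, stride=0))  # (7, 3, 0, 3): stride=0 also falls back
```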

Although _CoreTimeBasedSplit is not meant to be used directly, it can be used as a template to create new time-based split classes.

Examples:

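from timebasedcv.core import _CoreTimeBasedSplit


class MyTimeBasedSplit(_CoreTimeBasedSplit):
    def split(self, X, timeseries):
        '''Yield one (X_train, X_forecast) pair per split.'''
        # Assumes SplitState exposes train_start/train_end/forecast_start/forecast_end
        # and that X and timeseries support boolean-mask indexing (e.g. pandas objects)
        for split in self._splits_from_period(timeseries.min(), timeseries.max()):
            # Keep the rows inside the half-open train and forecast intervals
            train_mask = (timeseries >= split.train_start) & (timeseries < split.train_end)
            forecast_mask = (timeseries >= split.forecast_start) & (timeseries < split.forecast_end)
            yield X[train_mask], X[forecast_mask]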
Source code in timebasedcv/core.py
class _CoreTimeBasedSplit:
    """Base class for time based splits. This class is not meant to be used directly.

    `_CoreTimeBasedSplit` implements all the logic needed to set up a time-based splits class.

    In particular it implements `_splits_from_period` which is used to generate splits from a given time period (from
    start to end dates) from the given arguments of the class (frequency, train_size, forecast_horizon, gap, stride and
    window type).

    Arguments:
        frequency: The frequency (or time unit) of the time series. Must be one of "days", "seconds", "microseconds",
            "milliseconds", "minutes", "hours", "weeks". These are the only valid values for the `unit` argument of
            `timedelta` from python `datetime` standard library.
        train_size: Defines the minimum number of time units required to be in the train set.
        forecast_horizon: Specifies the number of time units to forecast.
        gap: Sets the number of time units to skip between the end of the train set and the start of the forecast set.
        stride: How many time units to move forward after each split. If `None` (or set to 0), the stride is equal to the
            `forecast_horizon` quantity.
        window: The type of window to use, either "rolling" or "expanding".
        mode: Determines the order in which the splits are generated, either "forward" (start to end) or "backward"
            (end to start).

    Raises:
        ValueError:
            - If `frequency` is not one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours",
            "weeks".
            - If `window` is not one of "rolling" or "expanding".
            - If `mode` is not one of "forward" or "backward".
            - If `train_size`, `forecast_horizon` or `stride` are not strictly positive, or if `gap` is negative.
        TypeError: If `train_size`, `forecast_horizon`, `gap` or `stride` are not of type `int`.

    Although `_CoreTimeBasedSplit` is not meant to be used directly, it can be used as a template to create new
    time-based split classes.

    Examples:
        ```python
        from timebasedcv.core import _CoreTimeBasedSplit


        class MyTimeBasedSplit(_CoreTimeBasedSplit):
            ...

            def split(self, X, timeseries):
                '''Implement the split method to return a generator'''

                for split in self._splits_from_period(timeseries.min(), timeseries.max()):
                    # Do something with the split to compute the train and forecast sets
                    ...
                    yield X_train, X_forecast
        ```
    """

    def __init__(  # noqa: PLR0913
        self: Self,
        *,
        frequency: FrequencyUnit,
        train_size: int,
        forecast_horizon: int,
        gap: int = 0,
        stride: Union[int, None] = None,
        window: WindowType = "rolling",
        mode: ModeType = "forward",
    ) -> None:
        self.frequency_ = frequency
        self.train_size_ = train_size
        self.forecast_horizon_ = forecast_horizon
        self.gap_ = gap
        self.stride_ = stride or forecast_horizon
        self.window_ = window
        self.mode_ = mode

        self._validate_arguments()

    def _validate_arguments(self: Self) -> None:
        """Post-init check used to validate the `_CoreTimeBasedSplit` attributes."""
        # Validate frequency
        if self.frequency_ not in _frequency_values:
            msg = f"`frequency` must be one of {_frequency_values}. Found {self.frequency_}"
            raise ValueError(msg)

        # Validate window
        if self.window_ not in _window_values:
            msg = f"`window` must be one of {_window_values}. Found {self.window_}"
            raise ValueError(msg)

        # Validate mode
        if self.mode_ not in _mode_values:
            msg = f"`mode` must be one of {_mode_values}. Found {self.mode_}"
            raise ValueError(msg)

        # Validate positive integer arguments
        _slot_names = ("train_size_", "forecast_horizon_", "gap_", "stride_")
        _values = tuple(getattr(self, _attr) for _attr in _slot_names)
        _lower_bounds = (1, 1, 0, 1)

        _types = tuple(type(v) for v in _values)

        if not all(t is int for t in _types):
            msg = (
                f"(`{'`, `'.join(_slot_names)}`) arguments must be of type `int`. "
                f"Found (`{'`, `'.join(str(t) for t in _types)}`)"
            )
            raise TypeError(msg)

        if not all(v >= lb for v, lb in zip(_values, _lower_bounds)):
            msg = (
                f"(`{'`, `'.join(_slot_names)}`) must be greater than or equal to "
                f"({', '.join(map(str, _lower_bounds))}).\n"
                f"Found ({', '.join(str(v) for v in _values)})"
            )
            raise ValueError(msg)

    @property
    def name_(self: Self) -> str:
        return self.__class__.__name__

    def __repr__(self: Self) -> str:
        """Custom repr method."""
        _attrs = (
            "frequency_",
            "train_size_",
            "forecast_horizon_",
            "gap_",
            "stride_",
            "window_",
        )
        _values = tuple(getattr(self, _attr) for _attr in _attrs)
        _new_line_tab = "\n    "

        return f"{self.name_}" "(\n    " f"{_new_line_tab.join(f'{s} = {v}' for s, v in zip(_attrs, _values))}" "\n)"

    @property
    def train_delta(self: Self) -> timedelta:
        """Returns the `timedelta` object corresponding to the `train_size`."""
        return timedelta(**{str(self.frequency_): self.train_size_})

    @property
    def forecast_delta(self: Self) -> timedelta:
        """Returns the `timedelta` object corresponding to the `forecast_horizon`."""
        return timedelta(**{str(self.frequency_): self.forecast_horizon_})

    @property
    def gap_delta(self: Self) -> timedelta:
        """Returns the `timedelta` object corresponding to the `gap` and `frequency`."""
        return timedelta(**{str(self.frequency_): self.gap_})

    @property
    def stride_delta(self: Self) -> timedelta:
        """Returns the `timedelta` object corresponding to `stride`."""
        return timedelta(**{str(self.frequency_): self.stride_})

    def _splits_from_period(
        self: Self,
        time_start: DateTimeLike,
        time_end: DateTimeLike,
    ) -> Generator[SplitState, None, None]:
        """Generate splits from `time_start` to `time_end` based on the parameters passed to the class instance.

        This is the core iteration that generates splits. It is used by the `split` method to generate splits from the
        time series.

        Arguments:
            time_start: The start of the time period.
            time_end: The end of the time period.

        Returns:
            A generator of `SplitState` instances.
        """
        if time_start >= time_end:
            msg = "`time_start` must be before `time_end`."
            raise ValueError(msg)

        if self.mode_ == "forward":
            train_delta = self.train_delta
            forecast_delta = self.forecast_delta
            gap_delta = self.gap_delta
            stride_delta = self.stride_delta

            train_start = time_start
            train_end = time_start + train_delta
            forecast_start = train_end + gap_delta
            forecast_end = forecast_start + forecast_delta

        else:
            train_delta = -self.train_delta
            forecast_delta = -self.forecast_delta
            gap_delta = -self.gap_delta
            stride_delta = -self.stride_delta

            forecast_end = time_end
            forecast_start = forecast_end + forecast_delta
            train_end = forecast_start + gap_delta
            train_start = train_end + train_delta if self.window_ == "rolling" else time_start

        while (forecast_start <= time_end) and (train_start >= time_start) and (train_start <= train_end + train_delta):
            yield SplitState(train_start, train_end, forecast_start, forecast_end)

            # Update state values
            train_start = train_start + stride_delta if self.window_ == "rolling" else train_start
            train_end = train_end + stride_delta
            forecast_start = forecast_start + stride_delta
            forecast_end = forecast_end + stride_delta

    def n_splits_of(
        self: Self,
        *,
        time_series: Union[SeriesLike[DateTimeLike], None] = None,
        start_dt: NullableDatetime = None,
        end_dt: NullableDatetime = None,
    ) -> int:
        """Returns the number of splits that can be generated from `time_series`.

        Arguments:
            time_series: The time series data. If provided, it must support `.min()` and `.max()`.
            start_dt: The start date and time of the time series. If not provided, it will be inferred from
                `time_series`.
            end_dt: The end date and time of the time series. If not provided, it will be inferred from
                `time_series`.

        Returns:
            The number of splits that can be generated from the given time series.

        Raises:
            ValueError:
                - If both `start_dt` and `end_dt` are provided and `start_dt` is greater than or equal to `end_dt`.
                - If neither `time_series` nor (`start_dt`, `end_dt`) pair is provided.
        """
        if (start_dt is not None) and (end_dt is not None):
            if start_dt >= end_dt:
                msg = "`start_dt` must be before `end_dt`."
                raise ValueError(msg)
            else:
                time_start, time_end = start_dt, end_dt
        elif time_series is not None:
            time_start, time_end = time_series.min(), time_series.max()
        else:
            msg = "Either `time_series` or (`start_dt`, `end_dt`) pair must be provided."
            raise ValueError(msg)

        return len(tuple(self._splits_from_period(time_start, time_end)))

forecast_delta: timedelta property

Returns the timedelta object corresponding to the forecast_horizon.

gap_delta: timedelta property

Returns the timedelta object corresponding to the gap and frequency.

stride_delta: timedelta property

Returns the timedelta object corresponding to stride.

train_delta: timedelta property

Returns the timedelta object corresponding to the train_size.
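All four properties follow the same pattern, timedelta(**{frequency: size}). A small sketch with an assumed configuration (hourly data, 24-hour train window, 6-hour horizon, 2-hour gap, no explicit stride) shows the resulting values; the variable names are illustrative, not attributes of the class.

```python
from datetime import timedelta

# Hypothetical configuration standing in for the instance attributes
frequency = "hours"
train_size, forecast_horizon, gap, stride = 24, 6, 2, None

# Each property builds timedelta(**{frequency: <size>})
train_delta = timedelta(**{frequency: train_size})
forecast_delta = timedelta(**{frequency: forecast_horizon})
gap_delta = timedelta(**{frequency: gap})
# The class stores `stride or forecast_horizon`, so None becomes 6 hours here
stride_delta = timedelta(**{frequency: stride or forecast_horizon})

# Time covered by a single split: train window, then gap, then forecast window
print(train_delta + gap_delta + forecast_delta)  # 1 day, 8:00:00
print(stride_delta)  # 6:00:00
```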

__repr__()

Custom repr method.
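The repr lists the trailing-underscore attributes one per line. The hypothetical format_repr function below reproduces the same layout from a plain dictionary; note that, in the source shown on this page, mode_ is not among the attributes included in the repr.

```python
def format_repr(name: str, attrs: dict) -> str:
    """Rebuild the layout of `_CoreTimeBasedSplit.__repr__` from a name and attribute mapping."""
    new_line_tab = "\n    "
    body = new_line_tab.join(f"{key} = {value}" for key, value in attrs.items())
    return f"{name}(\n    {body}\n)"


text = format_repr(
    "MyTimeBasedSplit",
    {"frequency_": "days", "train_size_": 7, "forecast_horizon_": 3, "gap_": 0, "stride_": 3, "window_": "rolling"},
)
print(text)
```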


n_splits_of(*, time_series=None, start_dt=None, end_dt=None)

Returns the number of splits that can be generated from time_series.

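Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| time_series | Union[SeriesLike[DateTimeLike], None] | The time series data. If provided, it must support .min() and .max(). | None |
| start_dt | NullableDatetime | The start date and time of the time series. If not provided, it will be inferred from time_series. | None |
| end_dt | NullableDatetime | The end date and time of the time series. If not provided, it will be inferred from time_series. | None |

All arguments are keyword-only. start_dt and end_dt take effect only when both are provided, in which case they take precedence over time_series; otherwise both bounds are inferred from time_series.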

Returns:

| Type | Description |
|---|---|
| int | The number of splits that can be generated from the given time series. |

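Raises:

ValueError
  • If both start_dt and end_dt are provided and start_dt is greater than or equal to end_dt.
  • If neither time_series nor the (start_dt, end_dt) pair is provided.
  • If the period inferred from time_series is empty (its minimum is not before its maximum); this error is raised by _splits_from_period.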
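The argument handling of n_splits_of boils down to choosing a period and then counting the splits it yields. The sketch below separates those two steps into hypothetical helpers: resolve_period follows the same precedence as the method, and count_forward_rolling counts splits for the simplest case only (forward mode, rolling window, no gap, stride equal to the forecast horizon). It uses the built-in min and max on a list in place of a series' .min() and .max().

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def resolve_period(
    time_series: Optional[Sequence[datetime]] = None,
    start_dt: Optional[datetime] = None,
    end_dt: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: pick (start, end) with the same precedence as `n_splits_of`."""
    if start_dt is not None and end_dt is not None:
        # A complete (start_dt, end_dt) pair wins over time_series
        if start_dt >= end_dt:
            raise ValueError("`start_dt` must be before `end_dt`.")
        return start_dt, end_dt
    if time_series is not None:
        # Built-ins stand in for the `.min()` / `.max()` methods of a series
        return min(time_series), max(time_series)
    raise ValueError("Either `time_series` or (`start_dt`, `end_dt`) pair must be provided.")


def count_forward_rolling(start: datetime, end: datetime, train: timedelta, forecast: timedelta) -> int:
    """Hypothetical helper: count forward, rolling, gap-free splits with stride == forecast."""
    count, forecast_start = 0, start + train
    while forecast_start <= end:  # same stopping rule as the forward loop
        count += 1
        forecast_start += forecast
    return count


days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(31)]
print(count_forward_rolling(*resolve_period(days), timedelta(days=7), timedelta(days=3)))  # 8
```

The method itself does not use a closed-form count: it materializes every SplitState with len(tuple(self._splits_from_period(...))), which keeps the count consistent with the generator at the cost of building all splits.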