`timebasedcv.core`¶

timebasedcv.core._CoreTimeBasedSplit ¶

Base class for time based splits. This class is not meant to be used directly.

_CoreTimeBasedSplit implements all the logic needed to set up a time-based split class.

In particular, it implements _splits_from_period, which generates splits over a given time period (from start to end dates) according to the arguments of the class (frequency, train_size, forecast_horizon, gap, stride, window type and mode).

Parameters:

Name	Type	Description	Default
`frequency`	`FrequencyUnit`	The frequency (or time unit) of the time series. Must be one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours", "weeks", "months" or "years". The value is used as the keyword argument name when building `relativedelta` objects from the Python `dateutil` library.	required
`train_size`	`int`	Defines the minimum number of time units required to be in the train set.	required
`forecast_horizon`	`int`	Specifies the number of time units to forecast.	required
`gap`	`int`	Sets the number of time units to skip between the end of the train set and the start of the forecast set.	`0`
`stride`	`int \| None`	How many time units to move forward after each split. If `None` (or set to 0), the stride equals `forecast_horizon`.	`None`
`window`	`WindowType`	The type of window to use, either "rolling" or "expanding".	`'rolling'`
`mode`	`ModeType`	Determines the order in which the splits are generated, either "forward" (start to end) or "backward" (end to start).	`'forward'`

Raises:

Type	Description
`ValueError`	If `frequency` is not one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours", "weeks". If `window` is not one of "rolling" or "expanding". If `mode` is not one of "forward" or "backward". If `train_size`, `forecast_horizon` or `stride` is less than 1, or if `gap` is negative.
`TypeError`	If `train_size`, `forecast_horizon`, `gap` or `stride` are not of type `int`.
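The order of the integer checks (types first, then lower bounds) can be illustrated with a standalone sketch. It follows the `_validate_arguments` logic shown in the source listing on this page, but `validate_sizes` is a hypothetical helper written for illustration, not a library function.

```python
def validate_sizes(train_size, forecast_horizon, gap=0, stride=None):
    """Mimic the integer checks: exact `int` type first, then lower bounds."""
    stride = stride or forecast_horizon  # None or 0 falls back to forecast_horizon
    names = ("train_size", "forecast_horizon", "gap", "stride")
    values = (train_size, forecast_horizon, gap, stride)
    lower_bounds = (1, 1, 0, 1)  # gap is the only argument allowed to be 0

    # `type(v) is int` is an exact check, so floats and bools are rejected
    if not all(type(v) is int for v in values):
        found = ", ".join(type(v).__name__ for v in values)
        raise TypeError(f"{names} must be of type int. Found ({found})")

    if not all(v >= lb for v, lb in zip(values, lower_bounds)):
        raise ValueError(f"{names} must be >= {lower_bounds}. Found {values}")

    return values


print(validate_sizes(7, 3))            # stride falls back to forecast_horizon
print(validate_sizes(7, 3, stride=0))  # 0 is treated like None
```

Because the type check runs first, a call such as `validate_sizes(7.0, 3)` raises `TypeError` even though the value itself is within bounds, while `validate_sizes(7, 3, gap=-1)` raises `ValueError`.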

Although _CoreTimeBasedSplit is not meant to be used directly, it can be used as a template to create new time based splits classes.

Examples:

from timebasedcv.core import _CoreTimeBasedSplit


class MyTimeBasedSplit(_CoreTimeBasedSplit):
    ...

    def split(self, X, timeseries):
        '''Implement the split method to return a generator'''

        for split in self._splits_from_period(timeseries.min(), timeseries.max()):
            # Do something with the split to compute the train and forecast sets
            ...
            yield X_train, y_test
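The placeholder step in the template ("do something with the split") usually means selecting the rows whose timestamps fall inside each window. A minimal standalone sketch of that step is given below. It assumes half-open intervals `[start, end)` for both the train and forecast windows, which is an assumption of this example rather than something stated on this page, and `take_between` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def take_between(values, timestamps, start, end):
    """Return the values whose timestamp lies in the half-open interval [start, end)."""
    return [v for v, t in zip(values, timestamps) if start <= t < end]


# Ten daily observations, valued 0..9
timestamps = [datetime(2023, 1, 1) + timedelta(days=i) for i in range(10)]
X = list(range(10))

# Boundaries of a single split: 7 days of training, no gap, 3 days of forecast
train_start = timestamps[0]
train_end = train_start + timedelta(days=7)
forecast_start = train_end
forecast_end = forecast_start + timedelta(days=3)

X_train = take_between(X, timestamps, train_start, train_end)
X_forecast = take_between(X, timestamps, forecast_start, forecast_end)
print(X_train, X_forecast)
```

With half-open intervals, an observation that sits exactly on `train_end` belongs to the forecast set when the gap is zero, so no row appears in both sets.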

Source code in timebasedcv/core.py

class _CoreTimeBasedSplit:
    """Base class for time based splits. This class is not meant to be used directly.

    `_CoreTimeBasedSplit` implements all the logic needed to set up a time-based split class.

    In particular it implements `_splits_from_period` which is used to generate splits from a given time period (from
    start to end dates) from the given arguments of the class (frequency, train_size, forecast_horizon, gap, stride and
    window type).

    Arguments:
        frequency: The frequency (or time unit) of the time series. Must be one of "days", "seconds", "microseconds",
            "milliseconds", "minutes", "hours", "weeks", "months" or "years". These are the valid values for the
            `unit` argument of `relativedelta` from python `dateutil` library.
        train_size: Defines the minimum number of time units required to be in the train set.
        forecast_horizon: Specifies the number of time units to forecast.
        gap: Sets the number of time units to skip between the end of the train set and the start of the forecast set.
        stride: How many time units to move forward after each split. If `None` (or set to 0), the stride is equal to the
            `forecast_horizon` quantity.
        window: The type of window to use, either "rolling" or "expanding".
        mode: Determines the order in which the splits are generated, either "forward" (start to end) or "backward"
            (end to start).

    Raises:
        ValueError:
            - If `frequency` is not one of "days", "seconds", "microseconds", "milliseconds", "minutes", "hours",
            "weeks".
            - If `window` is not one of "rolling" or "expanding".
            - If `mode` is not one of "forward" or "backward".
            - If `train_size`, `forecast_horizon` or `stride` is less than 1, or if `gap` is negative.
        TypeError: If `train_size`, `forecast_horizon`, `gap` or `stride` are not of type `int`.

    Although `_CoreTimeBasedSplit` is not meant to be used directly, it can be used as a template to create new time
    based splits classes.

    Examples:
        ```python
        from timebasedcv.core import _CoreTimeBasedSplit


        class MyTimeBasedSplit(_CoreTimeBasedSplit):
            ...

            def split(self, X, timeseries):
                '''Implement the split method to return a generator'''

                for split in self._splits_from_period(timeseries.min(), timeseries.max()):
                    # Do something with the split to compute the train and forecast sets
                    ...
                    yield X_train, y_test
        ```
    """

    def __init__(  # noqa: PLR0913
        self: Self,
        *,
        frequency: FrequencyUnit,
        train_size: int,
        forecast_horizon: int,
        gap: int = 0,
        stride: int | None = None,
        window: WindowType = "rolling",
        mode: ModeType = "forward",
    ) -> None:
        self.frequency_ = frequency
        self.train_size_ = train_size
        self.forecast_horizon_ = forecast_horizon
        self.gap_ = gap
        self.stride_ = stride or forecast_horizon
        self.window_ = window
        self.mode_ = mode

        self._validate_arguments()

    def _validate_arguments(self: Self) -> None:
        """Post init used to validate the TimeSpacedSplit attributes."""
        # Validate frequency
        if self.frequency_ not in _frequency_values:
            msg = f"`frequency` must be one of {_frequency_values}. Found {self.frequency_}"
            raise ValueError(msg)

        # Validate window
        if self.window_ not in _window_values:
            msg = f"`window` must be one of {_window_values}. Found {self.window_}"
            raise ValueError(msg)

        # Validate mode
        if self.mode_ not in _mode_values:
            msg = f"`mode` must be one of {_mode_values}. Found {self.mode_}"
            raise ValueError(msg)

        # Validate positive integer arguments
        _slot_names = ("train_size_", "forecast_horizon_", "gap_", "stride_")
        _values = tuple(getattr(self, _attr) for _attr in _slot_names)
        _lower_bounds = (1, 1, 0, 1)

        _types = tuple(type(v) for v in _values)

        if not all(t is int for t in _types):
            msg = (
                f"(`{'`, `'.join(_slot_names)}`) arguments must be of type `int`. "
                f"Found (`{'`, `'.join(str(t) for t in _types)}`)"
            )
            raise TypeError(msg)

        if not all(v >= lb for v, lb in zip(_values, _lower_bounds)):
            msg = (
                f"(`{'`, `'.join(_slot_names)}`) must be greater or equal than "
                f"({', '.join(map(str, _lower_bounds))}).\n"
                f"Found ({', '.join(str(v) for v in _values)})"
            )
            raise ValueError(msg)

    @property
    def name_(self: Self) -> str:
        return self.__class__.__name__

    def __repr__(self: Self) -> str:
        """Custom repr method."""
        _attrs = (
            "frequency_",
            "train_size_",
            "forecast_horizon_",
            "gap_",
            "stride_",
            "window_",
        )
        _values = tuple(getattr(self, _attr) for _attr in _attrs)
        _new_line_tab = "\n    "

        return f"{self.name_}(\n    {_new_line_tab.join(f'{s} = {v}' for s, v in zip(_attrs, _values))}\n)"

    @property
    def train_delta(self: Self) -> relativedelta:
        """Returns the `relativedelta` object corresponding to the `train_size`."""
        return relativedelta(**{str(self.frequency_): self.train_size_})  # type: ignore[arg-type]

    @property
    def forecast_delta(self: Self) -> relativedelta:
        """Returns the `relativedelta` object corresponding to the `forecast_horizon`."""
        return relativedelta(**{str(self.frequency_): self.forecast_horizon_})  # type: ignore[arg-type]

    @property
    def gap_delta(self: Self) -> relativedelta:
        """Returns the `relativedelta` object corresponding to the `gap` and `frequency`."""
        return relativedelta(**{str(self.frequency_): self.gap_})  # type: ignore[arg-type]

    @property
    def stride_delta(self: Self) -> relativedelta:
        """Returns the `relativedelta` object corresponding to `stride`."""
        return relativedelta(**{str(self.frequency_): self.stride_})  # type: ignore[arg-type]

    def _splits_from_period(
        self: Self,
        time_start: DateTimeLike,
        time_end: DateTimeLike,
    ) -> Generator[SplitState, None, None]:
        """Generate splits from `time_start` to `time_end` based on the parameters passed to the class instance.

        This is the core iteration that generates splits. It is used by the `split` method to generate splits from the
        time series.

        Arguments:
            time_start: The start of the time period.
            time_end: The end of the time period.

        Returns:
            A generator of `SplitState` instances.
        """
        if time_start >= time_end:
            msg = "`time_start` must be before `time_end`."
            raise ValueError(msg)

        is_rolling_window = self.window_ == "rolling"

        if self.mode_ == "forward":
            train_delta = self.train_delta
            forecast_delta = self.forecast_delta
            gap_delta = self.gap_delta
            stride_delta = self.stride_delta

            train_start = time_start
            train_end = time_start + train_delta
            forecast_start = train_end + gap_delta
            forecast_end = forecast_start + forecast_delta

        else:
            train_delta = -self.train_delta
            forecast_delta = -self.forecast_delta
            gap_delta = -self.gap_delta
            stride_delta = -self.stride_delta

            forecast_end = time_end
            forecast_start = forecast_end + forecast_delta
            train_end = forecast_start + gap_delta
            train_start = train_end + train_delta if is_rolling_window else time_start

        while (forecast_start <= time_end) and (train_start >= time_start) and (train_start <= train_end + train_delta):
            yield SplitState(train_start, train_end, forecast_start, forecast_end)

            # Update state values
            train_start = train_start + stride_delta if is_rolling_window else train_start
            train_end = train_end + stride_delta
            forecast_start = forecast_start + stride_delta
            forecast_end = forecast_end + stride_delta

    def n_splits_of(
        self: Self,
        *,
        time_series: SeriesLike[DateTimeLike] | None = None,
        start_dt: NullableDatetime = None,
        end_dt: NullableDatetime = None,
    ) -> int:
        """Returns the number of splits that can be generated from `time_series`.

        Arguments:
            time_series: Time series data. If provided, it must support `.min()` and `.max()`.
            start_dt: The start date and time of the time series. If not provided, it will be inferred from
                `time_series`.
            end_dt: The end date and time of the time series. If not provided, it will be inferred from
                `time_series`.

        Returns:
            The number of splits that can be generated from the given time series.

        Raises:
            ValueError:
                - If both `start_dt` and `end_dt` are provided and `start_dt` is greater than or equal to `end_dt`.
                - If neither `time_series` nor (`start_dt`, `end_dt`) pair is provided.
        """
        if (start_dt is not None) and (end_dt is not None):
            if start_dt >= end_dt:
                msg = "`start_dt` must be before `end_dt`."
                raise ValueError(msg)
            else:
                time_start, time_end = start_dt, end_dt
        elif time_series is not None:
            time_start, time_end = time_series.min(), time_series.max()
        else:
            msg = "Either `time_series` or (`start_dt`, `end_dt`) pair must be provided."
            raise ValueError(msg)

        return len(tuple(self._splits_from_period(time_start, time_end)))

forecast_delta `property` ¶

forecast_delta: relativedelta

Returns the relativedelta object corresponding to the forecast_horizon.

gap_delta `property` ¶

gap_delta: relativedelta

Returns the relativedelta object corresponding to the gap and frequency.

stride_delta `property` ¶

stride_delta: relativedelta

Returns the relativedelta object corresponding to stride.

train_delta `property` ¶

train_delta: relativedelta

Returns the relativedelta object corresponding to the train_size.
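All four properties follow the same pattern: the frequency string becomes the keyword argument name and the size becomes its value. The sketch below illustrates that pattern with the standard library `timedelta` as a stand-in for `relativedelta` (a substitution made so the example has no third-party dependency); `make_delta` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def make_delta(frequency: str, size: int) -> timedelta:
    """Build a delta by using the frequency name as the keyword argument."""
    return timedelta(**{frequency: size})


train_delta = make_delta("days", 7)
gap_delta = make_delta("hours", 12)

# The deltas are added to datetimes to move between split boundaries
train_end = datetime(2023, 1, 1) + train_delta
forecast_start = train_end + gap_delta
print(train_end, forecast_start)
```

An unsupported frequency name fails at construction time, since it is passed through as an unknown keyword argument.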

__repr__ ¶

__repr__() -> str

Custom repr method.

Source code in timebasedcv/core.py

def __repr__(self: Self) -> str:
    """Custom repr method."""
    _attrs = (
        "frequency_",
        "train_size_",
        "forecast_horizon_",
        "gap_",
        "stride_",
        "window_",
    )
    _values = tuple(getattr(self, _attr) for _attr in _attrs)
    _new_line_tab = "\n    "

    return f"{self.name_}(\n    {_new_line_tab.join(f'{s} = {v}' for s, v in zip(_attrs, _values))}\n)"
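To make the resulting layout concrete, the sketch below reproduces the same formatting on a small stand-in class. `ReprSketch` is hypothetical and only copies the string-building logic from the listing above; it is not the library class.

```python
class ReprSketch:
    """Hypothetical class that copies the repr layout shown in the listing."""

    _attrs = ("frequency_", "train_size_", "forecast_horizon_", "gap_", "stride_", "window_")

    def __init__(self, **attrs):
        for name, value in attrs.items():
            setattr(self, name, value)

    def __repr__(self) -> str:
        # One "name = value" pair per line, indented by four spaces
        body = "\n    ".join(f"{a} = {getattr(self, a)}" for a in self._attrs)
        return f"{type(self).__name__}(\n    {body}\n)"


cv = ReprSketch(
    frequency_="days",
    train_size_=7,
    forecast_horizon_=3,
    gap_=0,
    stride_=3,
    window_="rolling",
    mode_="forward",
)
print(cv)
```

Two details follow from the listing: values are interpolated with `str`, so string attributes appear without quotes, and `mode_` is not among the listed attributes, so it does not appear in the output.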

n_splits_of ¶

n_splits_of(*, time_series: SeriesLike[DateTimeLike] | None = None, start_dt: NullableDatetime = None, end_dt: NullableDatetime = None) -> int

Returns the number of splits that can be generated from time_series.

Parameters:

Name	Type	Description	Default
`time_series`	`SeriesLike[DateTimeLike] \| None`	Time series data. If provided, it must support `.min()` and `.max()`.	`None`
`start_dt`	`NullableDatetime`	The start date and time of the time series. If not provided, it will be inferred from `time_series`.	`None`
`end_dt`	`NullableDatetime`	The end date and time of the time series. If not provided, it will be inferred from `time_series`.	`None`

Returns:

Type	Description
`int`	The number of splits that can be generated from the given time series.

Raises:

Type	Description
`ValueError`	If both `start_dt` and `end_dt` are provided and `start_dt` is greater than or equal to `end_dt`. If neither `time_series` nor (`start_dt`, `end_dt`) pair is provided.
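For the forward mode, the count can also be worked out by hand from the loop condition `forecast_start <= time_end` in the source listing: the first forecast window starts `train_size + gap` units after the start, and each further split moves it by one stride. The sketch below does that arithmetic in days with the standard library; `n_forward_splits` is a hypothetical helper covering only the forward case, not the library method.

```python
from datetime import datetime, timedelta


def n_forward_splits(start, end, *, train_size, forecast_horizon, gap=0, stride=None):
    """Count forward-mode splits, with all sizes expressed in days (an assumption)."""
    if start >= end:
        raise ValueError("`start` must be before `end`.")

    step = timedelta(days=stride or forecast_horizon)
    first_forecast_start = start + timedelta(days=train_size + gap)
    if first_forecast_start > end:
        return 0  # not even one forecast window starts inside the period

    # Whole strides that fit between the first forecast start and the end, plus the first split
    return (end - first_forecast_start) // step + 1


start, end = datetime(2023, 1, 1), datetime(2023, 1, 31)
print(n_forward_splits(start, end, train_size=7, forecast_horizon=3))
print(n_forward_splits(start, end, train_size=7, forecast_horizon=3, stride=7))
```

In the first call the forecast windows start on January 8, 11, 14, 17, 20, 23, 26 and 29, which gives 8 splits. With a stride of 7 they start on January 8, 15, 22 and 29, which gives 4.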

Source code in timebasedcv/core.py

def n_splits_of(
    self: Self,
    *,
    time_series: SeriesLike[DateTimeLike] | None = None,
    start_dt: NullableDatetime = None,
    end_dt: NullableDatetime = None,
) -> int:
    """Returns the number of splits that can be generated from `time_series`.

    Arguments:
        time_series: Time series data. If provided, it must support `.min()` and `.max()`.
        start_dt: The start date and time of the time series. If not provided, it will be inferred from
            `time_series`.
        end_dt: The end date and time of the time series. If not provided, it will be inferred from
            `time_series`.

    Returns:
        The number of splits that can be generated from the given time series.

    Raises:
        ValueError:
            - If both `start_dt` and `end_dt` are provided and `start_dt` is greater than or equal to `end_dt`.
            - If neither `time_series` nor (`start_dt`, `end_dt`) pair is provided.
    """
    if (start_dt is not None) and (end_dt is not None):
        if start_dt >= end_dt:
            msg = "`start_dt` must be before `end_dt`."
            raise ValueError(msg)
        else:
            time_start, time_end = start_dt, end_dt
    elif time_series is not None:
        time_start, time_end = time_series.min(), time_series.max()
    else:
        msg = "Either `time_series` or (`start_dt`, `end_dt`) pair must be provided."
        raise ValueError(msg)

    return len(tuple(self._splits_from_period(time_start, time_end)))
