
anyschema: From Type Specifications to Dataframe Schemas

anyschema is a Python library that enables conversions from type specifications (such as Pydantic models) to native dataframe schemas (such as PyArrow, Polars, and Pandas).

Development Status

anyschema is still in early development and may be unstable.

Installation

anyschema is available on PyPI and can be installed with any Python package manager. For instance:

python -m pip install anyschema
# or
uv pip install anyschema

We suggest also installing pydantic or attrs to follow along with the examples.

  • anyschema interoperability with pydantic models requires pydantic>=2.0.0.
  • anyschema interoperability with attrs classes requires attrs>=24.0.0.
python -m pip install "anyschema[pydantic]"
# or
python -m pip install "anyschema[attrs]"
uv pip install "anyschema[pydantic]"
# or
uv pip install "anyschema[attrs]"

Quick Start

Here's a simple example showing how anyschema works.

First define the type specification via a Pydantic model and create an AnySchema instance from it:

from anyschema import AnySchema
from pydantic import BaseModel, PositiveInt


class Student(BaseModel):
    name: str
    age: PositiveInt  # (1)
    classes: list[str]


schema = AnySchema(spec=Student)
  1. Using PositiveInt instead of a plain Python int guarantees that age is always a positive integer, and anyschema can translate this constraint into the resulting dataframe schema (an unsigned integer type, as the output below shows).

Then, you can convert it to different dataframe schemas:

pa_schema = schema.to_arrow()
print(pa_schema)
name: string
age: uint64
classes: list<item: string>
  child 0, item: string
pl_schema = schema.to_polars()
print(pl_schema)
Schema({'name': String, 'age': UInt64, 'classes': List(String)})
pd_schema = schema.to_pandas()
print(pd_schema)
{'name': <class 'str'>, 'age': 'uint64', 'classes': list<item: string>[pyarrow]}
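The translation of the positive-integer constraint into `uint64`/`UInt64` can be sketched with the standard library alone. This sketch assumes the constrained type is a plain `int` annotated with a "greater than 0" marker (in Pydantic v2, `PositiveInt` is defined along these lines using `annotated_types.Gt`). The `Gt` class and `integer_dtype` function are hypothetical stand-ins, not anyschema's implementation.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin


@dataclass(frozen=True)
class Gt:
    """Hypothetical stand-in for a 'greater than' constraint, similar in spirit to annotated_types.Gt."""

    value: int


# Hypothetical positive-integer alias: a plain int annotated with a "> 0" constraint.
PositiveInt = Annotated[int, Gt(0)]


def integer_dtype(annotation) -> str:
    """Return a Polars-style integer dtype name for an int annotation (hypothetical helper)."""
    metadata = []
    if get_origin(annotation) is Annotated:
        # get_args on Annotated[T, m1, m2, ...] returns (T, m1, m2, ...).
        annotation, *metadata = get_args(annotation)
    if annotation is not int:
        raise TypeError(f"not an integer annotation: {annotation!r}")
    # Gt(v) with v >= 0 means every value is strictly positive, so an unsigned type is safe.
    if any(isinstance(m, Gt) and m.value >= 0 for m in metadata):
        return "UInt64"
    return "Int64"


print(integer_dtype(int))          # Int64
print(integer_dtype(PositiveInt))  # UInt64
```

The key point is that the constraint travels with the annotation as metadata, so a converter can inspect it without running any validation.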

When to use anyschema

anyschema is designed for scenarios where a type specification (e.g. a Pydantic model) should serve as the single source of truth for both data validation and dataframe schema generation.

Typical use cases include data pipelines, API-to-database workflows, schema generation, and type-safe data processing.

Key Features

  • Multiple Input Formats: Support for Pydantic models, attrs classes, TypedDict, dataclasses, Python mappings, and sequences of field specifications.
  • Multiple Output Formats: Convert to PyArrow, Polars, or Pandas schemas.
  • Modular Architecture: Extensible parser pipeline for custom type handling.
  • Rich Type Support: Handles complex types including Optional, Union, List, nested structures, Pydantic-specific types, and attrs classes.
  • Narwhals Integration: Uses Narwhals dtypes as the intermediate representation between input specifications and output schemas.
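The role of an intermediate representation can be illustrated with a toy model. The `DType` enum, the lookup tables, and `convert` below are hypothetical and stand in for Narwhals dtypes; only the backend dtype names come from the Quick Start output (plus the standard `int64`/`Int64` names).

```python
from enum import Enum


class DType(Enum):
    """Hypothetical backend-agnostic dtypes standing in for the intermediate representation."""

    STRING = "string"
    INT64 = "int64"
    UINT64 = "uint64"


# Input side: how Python annotations map into the intermediate representation.
PYTHON_TO_IR = {str: DType.STRING, int: DType.INT64}

# Output side: one table per backend, mapping IR dtypes to that backend's dtype names.
IR_TO_BACKEND = {
    "pyarrow": {DType.STRING: "string", DType.INT64: "int64", DType.UINT64: "uint64"},
    "polars": {DType.STRING: "String", DType.INT64: "Int64", DType.UINT64: "UInt64"},
}


def convert(fields: dict, backend: str) -> dict:
    """Translate {name: python_type} into {name: backend dtype name} via the IR."""
    names = IR_TO_BACKEND[backend]
    return {name: names[PYTHON_TO_IR[tp]] for name, tp in fields.items()}


print(convert({"name": str, "age": int}, "pyarrow"))  # {'name': 'string', 'age': 'int64'}
print(convert({"name": str, "age": int}, "polars"))   # {'name': 'String', 'age': 'Int64'}
```

The design payoff: with N input formats and M output backends, routing everything through one intermediate representation needs N + M translations rather than N × M direct converters.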

Core Components

Type Parsers (Steps)

Parser steps are modular components that convert type annotations to Narwhals dtypes. Each step handles a specific family of type patterns.

Learn more about how these work together in the Architecture section.
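One way to picture a pipeline of parser steps is a chain of responsibility: each step either returns a dtype for an annotation it recognizes or returns `None` to defer to the next step, and steps can call back into the pipeline for nested types. The step functions, the `parse` driver, and the string dtype names below are hypothetical; anyschema's actual steps produce Narwhals dtypes and may be organized differently.

```python
import types
from typing import Optional, Union, get_args, get_origin

# Origins that represent a union: typing.Union, plus X | Y syntax on Python 3.10+.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def optional_step(annotation, parse):
    """Unwrap Optional[X] (i.e. Union[X, None]) to X; this sketch ignores nullability."""
    if get_origin(annotation) in UNION_ORIGINS:
        non_none = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(non_none) == 1:
            return parse(non_none[0])
    return None


def list_step(annotation, parse):
    """Handle list[X] by recursively parsing the element type."""
    if get_origin(annotation) is list:
        (inner,) = get_args(annotation)
        return f"List({parse(inner)})"
    return None


def primitive_step(annotation, parse):
    """Map plain Python scalars to dtype names; return None for anything else."""
    return {str: "String", int: "Int64", float: "Float64", bool: "Boolean"}.get(annotation)


STEPS = (optional_step, list_step, primitive_step)


def parse(annotation):
    """Try each step in order; the first non-None result wins."""
    for step in STEPS:
        result = step(annotation, parse)
        if result is not None:
            return result
    raise TypeError(f"no parser step handles {annotation!r}")


print(parse(list[str]))            # List(String)
print(parse(Optional[list[int]]))  # List(Int64)
```

Adding support for a new type pattern in this design means adding one step to `STEPS` without touching the others.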

Spec Adapters

Adapters convert input specifications into a common format that the parser pipeline can process.

See the API Reference for detailed documentation.
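A spec adapter can be sketched as a function that normalizes each supported input format into a list of `(name, annotation)` pairs for the parser pipeline. The `spec_to_fields` function below is hypothetical, covers only the standard-library formats from the feature list (TypedDict, dataclasses, mappings, and sequences of pairs), and is not anyschema's adapter API.

```python
import dataclasses
from collections.abc import Mapping
from typing import TypedDict, get_type_hints


def spec_to_fields(spec) -> list:
    """Hypothetical adapter: normalize several spec formats into (name, annotation) pairs."""
    if isinstance(spec, Mapping):
        # e.g. {"name": str, "age": int}
        return list(spec.items())
    if isinstance(spec, type) and dataclasses.is_dataclass(spec):
        # get_type_hints resolves string annotations; fields() preserves declaration order.
        hints = get_type_hints(spec)
        return [(f.name, hints[f.name]) for f in dataclasses.fields(spec)]
    if isinstance(spec, type) and issubclass(spec, dict):
        # TypedDict classes are dict subclasses that declare their keys as annotations.
        return list(get_type_hints(spec).items())
    # Otherwise assume an iterable of (name, annotation) pairs.
    return [(name, tp) for name, tp in spec]


class StudentTD(TypedDict):
    name: str
    age: int


@dataclasses.dataclass
class StudentDC:
    name: str
    age: int


for spec in (StudentTD, StudentDC, {"name": str, "age": int}, [("name", str), ("age", int)]):
    print(spec_to_fields(spec))  # [('name', <class 'str'>), ('age', <class 'int'>)]
```

Because every adapter emits the same shape, the parser pipeline never needs to know which input format a field came from.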

Next Steps

Learning Path

We recommend following this order to get the most out of anyschema:

  1. Getting Started: Learn the basics.
  2. Architecture: Understand the internal design and how components work together.
  3. Advanced Usage: Create custom parser steps and adapters for your specific needs.
  4. Best Practices: Learn patterns and anti-patterns for custom components.
  5. End-to-End Example: See a complete real-world example.

Reference Materials

Contributing

Contributions are welcome! Please check out the GitHub repository to get started.

License

This project is licensed under the Apache-2.0 license.

Why anyschema?

The project was inspired by a Talk Python podcast episode featuring the creator of LanceDB, who mentioned the need to convert from Pydantic models to PyArrow schemas.

This challenge led to a realization: such a conversion could be generalized to many dataframe libraries by using Narwhals as an intermediate representation. anyschema makes this conversion seamless and extensible.