anyschema: From Type Specifications to Dataframe Schemas¶
anyschema is a Python library that enables conversions from type specifications (such as Pydantic models) to native
dataframe schemas (such as PyArrow, Polars, and Pandas).
Development Status
anyschema is still in early development and possibly unstable.
Installation¶
anyschema is available on PyPI and can be installed with any package manager, for instance with pip:
pip install anyschema
We also suggest installing pydantic or attrs to follow along with the examples.
anyschema interoperability with Pydantic models requires pydantic>=2.0.0. anyschema interoperability with attrs classes requires attrs>=24.0.0.
Quick Start¶
Here's a simple example showing how anyschema works.
First define the type specification via a Pydantic model and create an AnySchema instance from it:
from anyschema import AnySchema
from pydantic import BaseModel, PositiveInt


class Student(BaseModel):
    name: str
    age: PositiveInt  # (1)
    classes: list[str]


schema = AnySchema(spec=Student)
(1) By using PositiveInt instead of Python's int, we ensure that age is always a positive integer, and this constraint can be carried over into the resulting dataframe schema.
Then, you can convert it to different dataframe schemas:
When to use anyschema¶
anyschema is designed for scenarios where a type specification (e.g. a Pydantic model) should serve as the single
source of truth for both data validation and dataframe schema generation.
Typical use cases include data pipelines, API-to-database workflows, schema generation, and type-safe data processing.
Key Features¶
- Multiple Input Formats: Support for Pydantic models, attrs classes, TypedDict, dataclasses, Python mappings, and sequences of field specifications.
- Multiple Output Formats: Convert to PyArrow, Polars, or Pandas schemas.
- Modular Architecture: Extensible parser pipeline for custom type handling.
- Rich Type Support: Handles complex types including Optional, Union, List, nested structures, Pydantic-specific types, and attrs classes.
- Narwhals Integration: Leverages Narwhals as the intermediate representation.
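To illustrate why a single intermediate representation helps, here is a toy sketch in plain Python. It is not Narwhals and not anyschema's code: the Scalar and ListOf classes and the render_pyarrow / render_polars functions are hypothetical, and the renderers emit PyArrow and Polars constructor expressions as strings so the example runs without either library installed.

```python
from dataclasses import dataclass
from typing import Union


# Toy intermediate representation (IR): NOT Narwhals, just enough to show the idea.
@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "Int64", "String"


@dataclass(frozen=True)
class ListOf:
    inner: "IRDtype"


IRDtype = Union[Scalar, ListOf]

# PyArrow type constructors, written as strings so pyarrow need not be installed.
_PYARROW = {
    "Int64": "pa.int64()",
    "UInt64": "pa.uint64()",
    "Float64": "pa.float64()",
    "String": "pa.string()",
    "Boolean": "pa.bool_()",
}


def render_pyarrow(dtype: IRDtype) -> str:
    """Translate one IR dtype into a PyArrow constructor expression."""
    if isinstance(dtype, ListOf):
        return f"pa.list_({render_pyarrow(dtype.inner)})"
    return _PYARROW[dtype.name]


def render_polars(dtype: IRDtype) -> str:
    """Translate one IR dtype into a Polars dtype expression (names match 1:1 here)."""
    if isinstance(dtype, ListOf):
        return f"pl.List({render_polars(dtype.inner)})"
    return f"pl.{dtype.name}"


# Any input format only needs to produce the IR once...
ir_schema = {
    "name": Scalar("String"),
    "age": Scalar("Int64"),
    "classes": ListOf(Scalar("String")),
}

# ...and every backend renders it independently.
for render in (render_pyarrow, render_polars):
    print({field: render(dtype) for field, dtype in ir_schema.items()})
```

With N input formats and M backends, this layout needs N + M translations rather than one per (input, backend) pair.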
Core Components¶
Type Parsers (Steps)¶
Parser steps are modular components that convert type annotations to Narwhals dtypes. Each parser handles specific type patterns:
- ForwardRefStep: Resolves forward references.
- UnionTypeStep: Handles Union and Optional types.
- AnnotatedStep: Extracts metadata from typing.Annotated.
- AnnotatedTypesStep: Refines types based on constraints from the annotated-types library.
- PydanticTypeStep: Handles Pydantic-specific types.
- AttrsTypeStep: Handles attrs classes.
- PyTypeStep: Handles basic Python types (fallback).
Learn more about how these work together in the Architecture section.
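The chain-of-responsibility idea behind the parser steps can be sketched in plain Python. This is a minimal, hypothetical sketch, not anyschema's implementation: the step functions only loosely mirror UnionTypeStep, AnnotatedStep, AnnotatedTypesStep and PyTypeStep, dtypes are plain strings loosely modeled on Narwhals dtype names, and Positive is a made-up stand-in for an annotated-types constraint such as Gt(0).

```python
import types
from typing import Annotated, Any, Callable, Optional, Union, get_args, get_origin

# A step gets (annotation, metadata, recurse) and returns a dtype name, or None to pass.
Recurse = Callable[[Any, tuple], str]
Step = Callable[[Any, tuple, Recurse], Optional[str]]

# typing.Union covers Optional[X]; types.UnionType covers X | None (Python 3.10+).
_UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}


class Positive:
    """Hypothetical stand-in for an annotated-types constraint such as Gt(0)."""


def union_step(tp: Any, metadata: tuple, recurse: Recurse) -> Optional[str]:
    # Unwrap Optional[X] / X | None; this sketch ignores nullability itself.
    if get_origin(tp) in _UNION_ORIGINS:
        args = [arg for arg in get_args(tp) if arg is not type(None)]
        if len(args) != 1:
            raise TypeError(f"unsupported union: {tp!r}")
        return recurse(args[0], metadata)
    return None


def annotated_step(tp: Any, metadata: tuple, recurse: Recurse) -> Optional[str]:
    # Strip Annotated[...] and carry its metadata down the chain.
    if get_origin(tp) is Annotated:
        base, *extra = get_args(tp)
        return recurse(base, metadata + tuple(extra))
    return None


def constraint_step(tp: Any, metadata: tuple, recurse: Recurse) -> Optional[str]:
    # Refine int to an unsigned dtype when the Positive marker is present.
    if tp is int and any(isinstance(m, Positive) for m in metadata):
        return "UInt64"
    return None


_SCALARS = {int: "Int64", float: "Float64", str: "String", bool: "Boolean"}


def py_type_step(tp: Any, metadata: tuple, recurse: Recurse) -> Optional[str]:
    # Fallback for builtin scalars and list[X].
    if get_origin(tp) is list:
        (inner,) = get_args(tp)
        return f"List({recurse(inner, ())})"
    if tp in _SCALARS:
        return _SCALARS[tp]
    return None


class ParserPipeline:
    """Try each step in order; the first non-None result wins."""

    def __init__(self, steps: list[Step]) -> None:
        self.steps = steps

    def parse(self, tp: Any, metadata: tuple = ()) -> str:
        for step in self.steps:
            result = step(tp, metadata, self.parse)
            if result is not None:
                return result
        raise TypeError(f"no step could handle {tp!r}")


pipeline = ParserPipeline([union_step, annotated_step, constraint_step, py_type_step])

print(pipeline.parse(Optional[str]))                     # String
print(pipeline.parse(Annotated[int, Positive()]))        # UInt64
print(pipeline.parse(list[Annotated[int, Positive()]]))  # List(UInt64)
```

Because each step returns None for annotations it does not recognize, order matters: wrapper steps (union, Annotated) come first so they can unwrap and re-enter the pipeline, and the fallback step goes last.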
Spec Adapters¶
Adapters convert input specifications into a common format that the parser pipeline can process:
- into_ordered_dict_adapter: Handles Python dicts and sequences.
- typed_dict_adapter: Extracts field information from TypedDict classes.
- dataclass_adapter: Extracts field information from dataclasses.
- attrs_adapter: Extracts field information from attrs classes.
- pydantic_adapter: Extracts field information from Pydantic models.
See the API Reference for detailed documentation.
Next Steps¶
Learning Path¶
We recommend following this order to get the most out of anyschema:
- Getting Started: Learn the basics.
- Architecture: Understand the internal design and how components work together.
- Advanced Usage: Create custom parser steps and adapters for your specific needs.
- Best Practices: Learn patterns and anti-patterns for custom components.
- End-to-End Example: See a complete real-world example.
Reference Materials¶
- API Reference: Complete API documentation.
- Troubleshooting: Common issues and solutions.
Contributing¶
Contributions are welcome! Please check out the GitHub repository to get started.
License¶
This project is licensed under the Apache-2.0 license.
Why anyschema?¶
The project was inspired by a Talk Python podcast episode featuring the creator of LanceDB, who mentioned the need to convert from Pydantic models to PyArrow schemas.
This challenge led to a realization: such a conversion could be generalized to many dataframe libraries by using Narwhals
as an intermediate representation. anyschema makes this conversion seamless and extensible.