Typing overview¶

PydanTable supports two end-user strategies for DataFrameModel static typing, plus a third checker used to validate the library itself:

| Strategy | Checkers | Schema-evolving chains |
| --- | --- | --- |
| Inferred chains | mypy with `pydantable.mypy_plugin` | Return types refine from literals / conservative plugin rules. |
| Explicit after-model | Pyright, Pylance, Astral `ty`, and any checker without the plugin | Shipped `.pyi` stubs; after a transform, use `as_model(...)` / `try_as_model(...)` / `assert_model(...)`. |

Astral ty does not load mypy plugins. For application code type-checked with ty, treat it like Pyright/Pylance: use the explicit after-model pattern, not plugin inference.

The pydantable repo runs ty check on first-party trees in CI (make check-python). That validates annotations and APIs; it is not a substitute for running mypy with the plugin in your project if you rely on inferred chains.

This page consolidates the typing story and links to the relevant contracts.

The typing contract (nominal model, derived row type, structural helpers)¶

Nominal table type: users name subclasses of DataFrameModel (for example class Users(DataFrameModel): ...).
Row type is derived: each DataFrameModel subclass generates a per-row Pydantic model exposed as Users.RowModel.
Generics are for relationships / helpers: for cross-model helpers, prefer structural typing rather than pretending DataFrameModel[Row] “is” a particular subclass.

Structural helper types (`pydantable.typing`)¶

For reusable helpers that accept any model with a given row type, use the Protocol:

from typing import TypeVar

from pydantable.typing import DataFrameModelWithRow

RowT = TypeVar("RowT")  # row type carried by the model passed in

def materialize_rows(m: DataFrameModelWithRow[RowT]) -> list[RowT]:
    return m.rows()
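
To show the structural idea without depending on pydantable, here is a minimal standard-library sketch. `HasRows`, `UserRow`, and `FakeUsers` are hypothetical stand-ins, not pydantable names, and the real `DataFrameModelWithRow` may declare more members than a single `rows()` method.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar

RowT = TypeVar("RowT")


class HasRows(Protocol[RowT]):
    """Hypothetical stand-in for a protocol such as DataFrameModelWithRow."""

    def rows(self) -> list[RowT]: ...


@dataclass
class UserRow:
    id: int
    name: str


class FakeUsers:
    """Unrelated to HasRows by inheritance; it only has a matching rows()."""

    def __init__(self, data: list[UserRow]) -> None:
        self._data = data

    def rows(self) -> list[UserRow]:
        return list(self._data)


def materialize_rows(m: HasRows[RowT]) -> list[RowT]:
    # Accepts any object whose rows() returns list[RowT]; no base class needed.
    return m.rows()


users = FakeUsers([UserRow(1, "ada"), UserRow(2, "lin")])
result = materialize_rows(users)  # a static checker infers list[UserRow] here
```

Because the helper is typed against the protocol rather than a concrete subclass, the row type flows through to the return value without any cast.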

`SupportsLazyAsyncMaterialize` (async `acollect`)¶

Use DataFrameModelWithRow[RowT] when the helper needs sync row APIs (rows, collect, …) tied to a known RowModel.
Use SupportsLazyAsyncMaterialize[Any] (or SupportsLazyAsyncMaterialize[RowT] if you track the row type) when the helper only awaits acollect and must accept both a concrete DataFrameModel and a lazy AwaitableDataFrameModel (for example after aread_* or chained select / filter / …).

SupportsLazyAsyncMaterialize describes the acollect contract. It does not include sync collect: synchronous APIs should take DataFrameModel (or a subclass) instead.

The core DataFrame type also implements a compatible acollect; static typing treats the protocol as structural, so anything with a matching acollect is a candidate.

Deprecation (2.0): avoid acollect(..., as_polars=...) and collect(..., as_polars=...); they emit DeprecationWarning and will be removed in pydantable 2.0. Prefer ato_polars() / to_polars(), or columnar to_dict() / collect(as_lists=True). See VERSIONING (planned removals).

Example — shared async materialization

from typing import Any

from pydantable.typing import SupportsLazyAsyncMaterialize


async def materialize_async(m: SupportsLazyAsyncMaterialize[Any]) -> Any:
    return await m.acollect()

Example — endpoint or callback (caller passes either UserDF(...) or UserDF.aread_parquet(...) then transforms)

from typing import Any

from pydantable.typing import SupportsLazyAsyncMaterialize


async def handle(m: SupportsLazyAsyncMaterialize[Any]) -> Any:
    return await m.acollect()

At runtime, SupportsLazyAsyncMaterialize is @runtime_checkable, so isinstance(x, SupportsLazyAsyncMaterialize) succeeds when x has a callable acollect (duck typing). That check does not validate coroutine return types or argument kinds; use mypy, Pyright, or ty for that.
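
The limits of the runtime check can be seen with a small standard-library sketch. `SupportsAcollect` below is a hypothetical stand-in that mimics the shape of `SupportsLazyAsyncMaterialize`; the other class names are illustrative only.

```python
import asyncio
from typing import Any, Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class SupportsAcollect(Protocol[T_co]):
    """Hypothetical stand-in for SupportsLazyAsyncMaterialize."""

    async def acollect(self) -> T_co: ...


class Lazy:
    async def acollect(self) -> list[int]:
        return [1, 2, 3]


class SyncImpostor:
    # A plain function, not a coroutine function; isinstance() cannot tell.
    def acollect(self) -> list[int]:
        return [1, 2, 3]


class NoAcollect:
    pass


# isinstance() only looks for a callable attribute named acollect.
checks = {
    "Lazy": isinstance(Lazy(), SupportsAcollect),
    "SyncImpostor": isinstance(SyncImpostor(), SupportsAcollect),
    "NoAcollect": isinstance(NoAcollect(), SupportsAcollect),
}


async def materialize(m: SupportsAcollect[Any]) -> Any:
    return await m.acollect()


collected = asyncio.run(materialize(Lazy()))
```

`SyncImpostor` passes the `isinstance` check even though awaiting its `acollect()` result would fail, which is the kind of mismatch a static checker is expected to catch.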

Static checkers: Stubs may not list every lazy aread_* classmethod on each DataFrameModel subclass. If mypy (no plugin), Pyright/Pylance, or ty complains on MyModel.aread_parquet(...), assign via typing.cast(SupportsLazyAsyncMaterialize[Any], MyModel.aread_parquet(...)), bind _aread = MyModel.aread_parquet # type: ignore[attr-defined], or enable the pydantable mypy plugin (mypy only) where applicable.

Pyright, Pylance, and Astral `ty` (explicit after-model)¶

Pyright, Pylance, and Astral ty cannot apply the mypy plugin, so they follow the same stub-based pattern: chained transforms are loosely typed until you assert an after-model. The examples below say “Pyright”; use the identical as_model / try_as_model / assert_model workflow with ty check on your project.

Pyright cannot express dependent “schema evolution” from transform chains, so the ergonomic pattern is:

from pydantable import DataFrameModel

class Before(DataFrameModel):
    id: int
    age: int

class After(DataFrameModel):
    id: int
    age2: int

def pipeline(df: Before) -> After:
    out = df.with_columns(age2=df.age * 2).select("id", "age2")
    return out.as_model(After)

Safer variants:

try_as_model(After) is typed After | None: it returns None on mismatch instead of raising.
assert_model(After) raises with a richer schema diff (missing/extra/type mismatches).
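
The difference between the two variants can be pictured with a toy schema comparison. This is only a sketch under stated assumptions: schemas are plain `dict[str, type]` values, and `SchemaDiff`, `diff_schema`, `try_match`, and `assert_match` are hypothetical names, not pydantable APIs or its actual matching rules.

```python
from dataclasses import dataclass, field
from typing import Optional

Schema = dict[str, type]


@dataclass
class SchemaDiff:
    missing: list[str] = field(default_factory=list)
    extra: list[str] = field(default_factory=list)
    mismatched: dict[str, tuple[type, type]] = field(default_factory=dict)

    def ok(self) -> bool:
        return not (self.missing or self.extra or self.mismatched)


def diff_schema(actual: Schema, expected: Schema) -> SchemaDiff:
    diff = SchemaDiff()
    diff.missing = sorted(expected.keys() - actual.keys())
    diff.extra = sorted(actual.keys() - expected.keys())
    for name in sorted(expected.keys() & actual.keys()):
        if actual[name] is not expected[name]:
            # Stored as (actual type, expected type).
            diff.mismatched[name] = (actual[name], expected[name])
    return diff


def try_match(actual: Schema, expected: Schema) -> Optional[Schema]:
    # "try" flavour: None on mismatch, no exception.
    return expected if diff_schema(actual, expected).ok() else None


def assert_match(actual: Schema, expected: Schema) -> Schema:
    # "assert" flavour: raise with the full diff in the message.
    diff = diff_schema(actual, expected)
    if not diff.ok():
        raise TypeError(
            f"schema mismatch: missing={diff.missing}, "
            f"extra={diff.extra}, mismatched={sorted(diff.mismatched)}"
        )
    return expected


expected = {"id": int, "age2": int}
matching = {"id": int, "age2": int}
drifted = {"id": int, "age": int, "age2": float}

diff = diff_schema(drifted, expected)
```

The "try" flavour suits call sites that can branch on `None`; the "assert" flavour suits pipelines where a mismatch is a bug and the diff should appear in the traceback.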

Typed escape hatches for aggregations (Pyright / `ty`)¶

Some operations are inherently schema-changing (for example grouped aggregations and rolling aggregations). For Pyright/ty users, prefer the explicit *_as_model helpers so the return type is your declared after-model.

from pydantable import DataFrameModel

class Events(DataFrameModel):
    g: int
    v: int
    ts: str

class ByGroup(DataFrameModel):
    g: int
    total: int

class WithRolling(DataFrameModel):
    ts: str
    roll: int

def grouped(df: Events) -> ByGroup:
    return df.group_by("g").agg_as_model(ByGroup, total=("sum", "v"))

def rolling(df: Events) -> WithRolling:
    return df.rolling_agg_as_model(
        WithRolling,
        on="ts",
        column="v",
        window_size=3,
        op="sum",
        out_name="roll",
    )

Typed escape hatches for schema-changing transforms (Pyright / `ty`)¶

For other deterministic schema-changing transforms, use the dedicated helpers:

melt_as_model(...) / melt_try_as_model(...) / melt_assert_model(...)
unpivot_as_model(...) / unpivot_try_as_model(...) / unpivot_assert_model(...)
join_as_model(...) / join_try_as_model(...) / join_assert_model(...)

mypy workflow (plugin-based inference)¶

If you use Astral ty or Pyright on your project instead of mypy, use the explicit after-model section above — the plugin applies only to mypy.

Enabling the plugin¶

Add the plugin to your mypy config:

[tool.mypy]
plugins = ["pydantable.mypy_plugin"]

What the plugin can infer¶

Inference is intentionally conservative: it refines return types when arguments are literal enough.

- Schema-evolving transforms (when literal column names / literal config are provided):
  - with_columns(...) (best-effort type inference from mypy’s expression types + literals)
  - select(...), drop(...) (string/list/tuple literals)
  - rename({...}) (dict literal)
  - join(..., on=..., suffix=...)
  - group_by(...).agg(out=("op","col"), ...) (tuple literals; some ops map to int/float)
  - melt(...), unpivot(...) (literal id_vars/index, plus literal variable_name/value_name)
  - rolling_agg(..., op=..., out_name=...)
- Schema-preserving transforms (kept as the same model type):
  - fill_null, drop_nulls, explode, unnest
- Not inferred / intentionally skipped:
  - dynamic/computed column name lists (variables, comprehensions, f-strings, unpacking)
  - pivot(...) (output columns depend on data values)

When the plugin can’t infer safely, it falls back to the original model type (and you can still use as_model(...)).
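
To see why a transform such as pivot is out of reach for any static analysis, consider this toy pivot over plain dictionaries. `toy_pivot` and the sample rows are hypothetical and unrelated to pydantable's implementation; the point is only that the output column names come from the data, not from the source code.

```python
from collections import defaultdict


def toy_pivot(rows, index, columns, values):
    """Spread the distinct values of `columns` into output column names."""
    out = defaultdict(dict)
    for row in rows:
        out[row[index]][row[columns]] = row[values]
    return [{index: key, **cells} for key, cells in out.items()]


q1 = [
    {"id": 1, "metric": "clicks", "n": 10},
    {"id": 1, "metric": "views", "n": 50},
]
q2 = [
    {"id": 1, "metric": "clicks", "n": 7},
    {"id": 1, "metric": "purchases", "n": 2},
]

# Same call, same literals, different output schema.
cols_q1 = sorted(toy_pivot(q1, "id", "metric", "n")[0])
cols_q2 = sorted(toy_pivot(q2, "id", "metric", "n")[0])
```

A call like `select("id", "n")` is different in kind: its output columns are spelled out as literals at the call site, which is what makes conservative inference possible there.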

1.2.0 column types (Literal, IP, WKB, `Annotated[str, ...]`)¶

These scalars are ordinary fields on your DataFrameModel subclass: the plugin still matches transform outputs by field name and static field type from the class body (Literal[...], ipaddress classes, WKB, and plain or Annotated strings show up in mypy’s analysis like int / str).

Users without the mypy plugin (Pyright, Pylance, ty, and so on) keep the same workflow as other scalars: chained methods are typed as DataFrameModel[Any] in stubs, so use as_model(After) / try_as_model / assert_model when you need an explicit After type after select / with_columns / rename.

Contract coverage lives in:

tests/test_extended_scalar_dtypes_v12.py (runtime + schema helpers)
tests/test_typing_engine_parity.py (Rust plan descriptors vs runtime schema_fields)
tests/test_mypy_dataframe_model_return_types.py (test_mypy_accepts_literal_ip_wkb_...)
tests/test_pyright_dataframe_model_return_types.py (test_pyright_accepts_literal_ip_wkb_...)

Stubs and drift prevention¶

PydanTable ships py.typed and .pyi stubs for the public surface. In the repo:

scripts/generate_typing_artifacts.py regenerates committed typing artifacts.
scripts/generate_typing_artifacts.py --check fails if stubs are out of date.
make check-typing runs: generator drift check → ty → typing snippet tests.
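
A drift check of this kind usually follows a "regenerate, compare, never write in check mode" pattern. The sketch below illustrates that general pattern only; `render_stub`, `is_stale`, and `run` are hypothetical names and do not describe what scripts/generate_typing_artifacts.py actually does.

```python
import tempfile
from pathlib import Path


def render_stub(fields: dict[str, str]) -> str:
    """Deterministically render a tiny stub so repeated runs are comparable."""
    lines = ["class Users:"]
    lines += [f"    {name}: {tp}" for name, tp in sorted(fields.items())]
    return "\n".join(lines) + "\n"


def is_stale(path: Path, content: str) -> bool:
    return not path.exists() or path.read_text() != content


def run(path: Path, content: str, check: bool) -> int:
    """Return an exit code: 1 only when check mode finds drift."""
    if not is_stale(path, content):
        return 0
    if check:
        return 1  # report drift and leave the committed file untouched
    path.write_text(content)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    stub = Path(tmp) / "users.pyi"
    v1 = render_stub({"id": "int"})
    first_check = run(stub, v1, check=True)    # no file yet, so drift
    regenerate = run(stub, v1, check=False)    # writes the file
    second_check = run(stub, v1, check=True)   # now up to date
    v2 = render_stub({"id": "int", "name": "str"})
    drift_check = run(stub, v2, check=True)    # source changed, stub did not
    untouched = stub.read_text() == v1
```

Deterministic rendering (sorted fields, fixed trailing newline) matters here: without it the comparison would report drift on every run.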

Contributor workflow (static typing)¶

Which checker does what¶

| Tool | Role |
| --- | --- |
| Astral `ty` | Primary checker for `python/pydantable`, `pydantable-protocol`, and `pydantable-native` (see `[tool.ty]` in `pyproject.toml`). Used in `make check-python` / CI. No mypy plugins — for `DataFrameModel`, it matches the stub + `as_model` story (same as Pyright), not plugin inference. |
| mypy + `pydantable.mypy_plugin` | Optional schema-evolving `DataFrameModel` chains for mypy users; run via `tests/test_mypy_.py` or `mypy` with the repo config. `tests.` is ignored by mypy by design. |
| Pyright | Narrow config (`pyrightconfig.json`) targets typing contract tests under `tests/` plus `typings/`. Same explicit-`as_model` contract as `ty` for app code. Optional `pyrightconfig-strict.json` type-checks the full `python/pydantable` tree for maintainers (`make pyright-check-strict`); expect noise and optional deps. |

Public vs internal API (pragmatic `Any`)¶

Public surface — imports from pydantable, pydantable.dataframe, and documented I/O helpers: prefer concrete types, Protocols, PathLike, Mapping/Sequence, and TYPE_CHECKING imports for types that would create cycles.
Internal modules — engine plans, Rust handles, and dynamic adapters may keep Any where the runtime type is opaque or checker-specific; narrow with NewType / small Protocols only when it reduces real bugs without lying.

Policy: `typing.Any` must be justified¶

Any disables static checking for that value. Do not use it because a precise type is merely inconvenient. Prefer, in order:

Concrete types (str, Path, bytes, models, DataFrame[...]).
TypeVar / Generic when the same function is polymorphic in a known way.
Protocol for structural APIs (duck typing with a name).
object when the code only needs identity, repr, or a few isinstance branches (unknown cell values, opaque scan objects).
Mapping[str, ...] / Sequence[...] instead of untyped dict / list.
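
As an illustration of the preference order, the sketch below types three small helpers without `Any`. All names (`SupportsClose`, `describe_cell`, `column_lengths`, `close_all`, `Handle`) are hypothetical and chosen for the example; none of them are pydantable APIs.

```python
from collections.abc import Mapping, Sequence
from typing import Protocol


class SupportsClose(Protocol):
    """Structural API: anything with close() qualifies."""

    def close(self) -> None: ...


def describe_cell(value: object) -> str:
    # object (not Any) forces an isinstance check before type-specific use.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    return type(value).__name__


def column_lengths(columns: Mapping[str, Sequence[object]]) -> dict[str, int]:
    # Mapping/Sequence accept dicts, lists, and tuples without untyped containers.
    return {name: len(values) for name, values in columns.items()}


def close_all(resources: Sequence[SupportsClose]) -> int:
    for resource in resources:
        resource.close()
    return len(resources)


class Handle:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


handles = [Handle(), Handle()]
closed_count = close_all(handles)
lengths = column_lengths({"id": [1, 2], "name": ("a", "b", "c")})
```

With `Any` in place of `object`, a call such as `value.upper()` inside `describe_cell` would pass the checker silently; with `object` it is rejected until the value is narrowed.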

When Any is justified (document the category in PRs for new hotspots; legacy code is covered by the table below):

| Category | Where it shows up | Why `Any` instead of lying |
| --- | --- | --- |
| Opaque Rust / PyO3 | `rust_engine.py`, `pydantable_native/*`, plan handles, `execute_plan(plan, data, …)` | Runtime objects are defined in Rust; Python sees untyped or generated bindings. Mirrors `ExecutionEngine` in `pydantable_protocol` (which uses `Any` for plan/data until a portable IR exists). |
| Optional / heavy deps | `io/extras.py`, SQL, Kafka, cloud clients | Third-party libraries may be absent or thinly stubbed; signatures stay permissive at boundaries. |
| Dynamic adapters | `pandas.py`, `pyspark/*`, plugin surfaces | APIs mimic other ecosystems; parameters are intentionally wide. |
| Schema / Pydantic internals | `schema/_impl.py`, `dataframe_model.py` | `TypeAdapter`, `create_model`, and validation hooks use dynamic types from Pydantic. |
| Mypy plugin | `mypy_plugin.py` | Operates on mypy’s internal IR (`Any` is required by the plugin API). |
| Public “column dict” | `dict[str, list[Any]]` for materialized columns | Column element types vary by dtype; a precise `Union` would be enormous and still incomplete. Prefer documenting invariants in `SUPPORTED_TYPES.md`. |
| Explicit escape hatch | Rare | Only with a short comment at the definition site: why a `Protocol` or `TypeVar` is not yet possible (e.g. circular import, pending refactor). |

Review rule: If you add new Any on a public symbol, add a sentence in the docstring or link to this section. If you can use object or a Protocol without changing behavior, use that instead.

Phased strictness (`ty`)¶

[tool.ty.rules] in pyproject.toml enables some rule families gradually. Currently enforced at error (where clean): unknown-argument, invalid-argument-type, invalid-return-type, unsupported-operator. Others (for example not-iterable) stay ignore until Astral ty handles async generators and async for over imported helpers without false positives. When tightening a rule, fix callsites or add a narrow suppression with a short comment; prefer fixing types over broad [tool.ty.analysis] overrides.

Hotspots for future annotation work (rough counts in python/pydantable/**/*.py, subject to churn): Any appears most often in dataframe/_impl.py, dataframe_model.py, pandas.py, rust_engine.py, and io/__init__.py; # type: ignore is concentrated in io/extras.py, pandas.py, and schema/_impl.py; cast( is common in pyspark/dataframe.py and dataframe_model.py.

Local commands¶

make ty-check              # Astral ty (matches main CI config)
make ty-check-minimal      # ty in a minimal venv (optional imports stay sound)
make check-typing          # stub drift check + ty + mypy/pyright contract tests
make pyright-check-strict  # optional full-package Pyright (see pyrightconfig-strict.json)

Environment notes¶

Use a single-Python virtualenv for local runs (for example only lib/python3.10 under .venv). If ty reports unresolved imports for core deps like pydantic, recreate the venv or align the interpreter ty resolves with the one that has your dependencies installed (make ty-check-minimal uses a dedicated minimal venv).

DATAFRAMEMODEL.md: end-user guide with typing examples.
SUPPORTED_TYPES.md: dtype/nullable contract and per-method Expr rules (what dtypes each method accepts, null behavior, Polars vs stub execution).
INTERFACE_CONTRACT.md: engine capabilities (Polars-backed vs row-wise stub).
TROUBLESHOOTING.md: common typing pitfalls.
