Mental model

This page is the “map” of pydantable: the core concepts and how they relate.

If you’re new, read this once, then jump to the Five-minute tour or DataFrameModel.

The four core nouns

`Schema` (row shape)

Schema is a Pydantic-style row model used to describe the shape and types of a table.

- Used with DataFrame[RowSchema]
- Used as an output type for materialized rows (e.g. collect())

See: DataFrameModel (it covers Schema as part of the typed table story).
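As a mental-model aid only, a row schema can be pictured as an annotated class whose fields name the columns and declare their types. The sketch below uses a stdlib dataclass as a stand-in; `RowSchema` and `column_types` are hypothetical names, not pydantable's API, and a real Pydantic-style Schema also validates values.

```python
from dataclasses import dataclass
from typing import get_type_hints


# Hypothetical stand-in for a pydantable Schema: one annotated field per column.
@dataclass
class RowSchema:
    id: int
    name: str
    score: float


def column_types(schema: type) -> dict[str, type]:
    """Read the column names and declared types from a row model's annotations."""
    return get_type_hints(schema)


print(column_types(RowSchema))
# {'id': <class 'int'>, 'name': <class 'str'>, 'score': <class 'float'>}
```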

`DataFrame[T]` (typed table)

DataFrame[T] is a typed table whose columns match the schema T.

You can create a DataFrame from columnar data, from I/O helpers, or from optional engine-specific sources.
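For the columnar case, the guarantee that columns match the schema T can be sketched with a toy class that checks column names, lengths, and value types up front. `TypedTable` and `RowSchema` are hypothetical, stdlib-only stand-ins; they illustrate the idea, not pydantable's constructor or implementation.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, get_type_hints

T = TypeVar("T")


@dataclass
class RowSchema:  # hypothetical row model, not a real pydantable Schema
    id: int
    score: float


class TypedTable(Generic[T]):
    """Toy typed table (hypothetical): columns must match the schema T."""

    def __init__(self, schema: type[T], columns: dict[str, list]) -> None:
        expected = get_type_hints(schema)
        # Column names must match the schema's fields exactly.
        if set(columns) != set(expected):
            raise ValueError(f"columns {sorted(columns)} do not match {sorted(expected)}")
        # Columnar data: every column has the same number of values.
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        # Every value must have its column's declared type.
        for name, values in columns.items():
            for value in values:
                if not isinstance(value, expected[name]):
                    wanted = expected[name].__name__
                    raise TypeError(f"{name}: expected {wanted}, got {value!r}")
        self.schema = schema
        self.columns = columns


table = TypedTable(RowSchema, {"id": [1, 2], "score": [9.5, 7.0]})
```

In this sketch, passing `{"id": ["x"], "score": [1.0]}` raises `TypeError`, and a missing or extra column raises `ValueError`.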

Start here:

`Expr` (typed expressions)

Expr is how you refer to and compute columns (e.g., df.score > 8.0).

Expressions are designed to remain type-aware so transforms can stay typed and composable.
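A rough sketch of the idea behind `df.score > 8.0`: comparing a column reference builds an expression object instead of computing values, and that object carries a result type. `Col` and `Compare` are hypothetical stdlib-only classes for illustration, not pydantable's `Expr`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Col:
    """Hypothetical typed column reference (not pydantable's Expr)."""

    name: str
    dtype: type

    def __gt__(self, other: Any) -> Compare:
        # Type check at expression-build time: only numeric columns support ">".
        if self.dtype not in (int, float):
            raise TypeError(f"cannot compare {self.dtype.__name__} column {self.name!r}")
        # Nothing is computed here; we just return a new expression node.
        return Compare(self, other)


@dataclass(frozen=True)
class Compare:
    """Expression node for `column > literal`; its result type is always bool."""

    left: Col
    right: Any
    dtype: type = bool

    def evaluate(self, columns: dict[str, list]) -> list[bool]:
        # Only now are values read from the data.
        return [value > self.right for value in columns[self.left.name]]


score = Col("score", float)
expr = score > 8.0  # an expression object, not a list of results
print(expr.dtype)  # <class 'bool'>
print(expr.evaluate({"score": [9.5, 7.0, 8.5]}))  # [True, False, True]
```

Because the type check happens when the expression is built, `Col("name", str) > 1` fails in this sketch before any data is read.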

Start here:

`DataFrameModel` (a “table model”)

DataFrameModel is the higher-level “SQLModel-style” concept: a reusable typed table definition with methods for:

- ingest/validation rules
- typed transforms
- structured I/O entrypoints

If you’re building real pipelines (especially in services), this is usually the best place to start.

Start here: DataFrameModel
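To illustrate the "one class owns the table" idea (in the spirit of SQLModel), here is a hypothetical stdlib-only model that bundles a schema, an ingest rule, a typed transform, and a CSV entrypoint. `Scores`, `from_csv`, `above`, and the 0 to 10 score rule are invented for this sketch and are not pydantable's DataFrameModel API.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical table model: schema, validation, transforms, and I/O in one class."""

    id: list[int]
    score: list[float]

    def __post_init__(self) -> None:
        # Ingest/validation rules (illustrative): equal-length columns, scores in [0, 10].
        if len(self.id) != len(self.score):
            raise ValueError("columns must have the same length")
        if any(not 0.0 <= s <= 10.0 for s in self.score):
            raise ValueError("score must be between 0 and 10")

    @classmethod
    def from_csv(cls, text: str) -> Scores:
        # Structured I/O entrypoint: parse CSV, then coerce to the declared column types.
        rows = list(csv.DictReader(io.StringIO(text)))
        return cls(id=[int(r["id"]) for r in rows], score=[float(r["score"]) for r in rows])

    def above(self, threshold: float) -> Scores:
        # Typed transform: returns another Scores, so validation runs again.
        keep = [i for i, s in enumerate(self.score) if s > threshold]
        return Scores(id=[self.id[i] for i in keep], score=[self.score[i] for i in keep])


model = Scores.from_csv("id,score\n1,9.5\n2,7.0\n")
print(model.above(8.0))  # Scores(id=[1], score=[9.5])
```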

Execution: plans and materialization

pydantable is designed so that many operations can remain lazy until you explicitly choose to materialize.

Two key pages:

- Execution: what runs when, and what costs what
- Materialization: the “how” of turning a plan into output
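The lazy-until-materialize idea can be sketched with a toy frame that records transforms as a plan and only runs them in `collect()`. `LazyFrame` is hypothetical and stdlib-only; it shows the evaluation order, not pydantable's planner or cost model.

```python
from __future__ import annotations

from typing import Callable

Step = Callable[[list], list]


class LazyFrame:
    """Toy lazy frame (hypothetical): transforms build a plan; collect() runs it."""

    def __init__(self, rows: list[dict], plan: tuple[Step, ...] = ()) -> None:
        self._rows = rows
        self._plan = plan

    def filter(self, predicate: Callable[[dict], bool]) -> LazyFrame:
        def step(rows: list[dict]) -> list[dict]:
            return [r for r in rows if predicate(r)]

        # Return a new frame with one more planned step; no rows are touched yet.
        return LazyFrame(self._rows, self._plan + (step,))

    def collect(self) -> list[dict]:
        # Materialization: execute every planned step in order.
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows


calls: list[int] = []


def is_high(row: dict) -> bool:
    calls.append(row["id"])  # records when the predicate actually runs
    return row["score"] > 8.0


lf = LazyFrame([{"id": 1, "score": 9.5}, {"id": 2, "score": 7.0}]).filter(is_high)
print(calls)  # [] (nothing has executed yet)
print(lf.collect())  # [{'id': 1, 'score': 9.5}]
print(calls)  # [1, 2]
```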

Materialization outputs (what you get at the end)

Common end states include:

- Pydantic rows (a list of row models)
- Columnar dicts (dict[str, list])
- Engine-native objects (e.g. Polars/Arrow) when the relevant extras are installed

See: Execution (copy/interchange + display) and Materialization (modes).
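The two pure-Python end states hold the same data in different layouts. The sketch below converts a columnar dict into a list of row objects and back; `Row`, `to_rows`, and `to_columns` are hypothetical helpers (a dataclass stands in for a Pydantic row model), and engine-native outputs are left out because they depend on optional extras.

```python
from dataclasses import dataclass, fields


@dataclass
class Row:  # hypothetical row model
    id: int
    score: float


def to_rows(columns: dict[str, list]) -> list[Row]:
    # zip(*values) walks all columns in lockstep, yielding one tuple per row.
    names = list(columns)
    return [Row(**dict(zip(names, values))) for values in zip(*columns.values())]


def to_columns(rows: list[Row]) -> dict[str, list]:
    # One list per declared field, in schema order.
    return {f.name: [getattr(r, f.name) for r in rows] for f in fields(Row)}


columns = {"id": [1, 2], "score": [9.5, 7.0]}
rows = to_rows(columns)
print(rows)  # [Row(id=1, score=9.5), Row(id=2, score=7.0)]
print(to_columns(rows) == columns)  # True
```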

A common early gotcha: `shape` is not always “executed rows”

After lazy transforms, df.shape follows root-buffer semantics and may not reflect the number of rows that will materialize after execution.
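A toy model makes the gotcha concrete. It assumes that "root-buffer semantics" means `shape` is computed from the data the plan starts from, ignoring pending transforms; `LazyTable` is hypothetical and not a description of pydantable's internals. In the sketch, only `collect()` gives the executed row count.

```python
from __future__ import annotations

from typing import Callable

Predicate = Callable[[dict], bool]


class LazyTable:
    """Toy model (assumption): shape describes the root buffer, not the executed plan."""

    def __init__(
        self, root: dict[str, list], predicates: tuple[Predicate, ...] = ()
    ) -> None:
        self._root = root
        self._predicates = predicates

    @property
    def shape(self) -> tuple[int, int]:
        # Counts rows in the original columns; pending filters are ignored.
        n_rows = len(next(iter(self._root.values()), []))
        return (n_rows, len(self._root))

    def filter(self, predicate: Predicate) -> LazyTable:
        return LazyTable(self._root, self._predicates + (predicate,))

    def collect(self) -> list[dict]:
        # Execution: build rows from the root columns, then apply every filter.
        names = list(self._root)
        rows = [dict(zip(names, values)) for values in zip(*self._root.values())]
        return [r for r in rows if all(p(r) for p in self._predicates)]


t = LazyTable({"id": [1, 2, 3], "score": [9.5, 7.0, 8.5]}).filter(lambda r: r["score"] > 8.0)
print(t.shape)  # (3, 2): rows in the root buffer
print(len(t.collect()))  # 2: rows actually produced by execution
```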

This is documented as part of the compatibility contract:

Engines and backends (what “engine” means here)

There are two related ideas:

1) Execution backend: where the plan runs (the default native engine is Polars-backed inside the Rust extension).
2) Data sources / sinks: how you read/write data (files, HTTP, SQL, etc.).
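The split between these two ideas can be sketched as two independent interfaces: a backend that runs steps, and a source that produces data. `Backend`, `PurePythonBackend`, `inline_csv_source`, `parse_scores`, and `execute` are hypothetical names in a stdlib-only sketch; the point is only that either piece can be swapped without touching the other.

```python
from typing import Callable, Protocol

Columns = dict[str, list]
Step = Callable[[Columns], Columns]


class Backend(Protocol):
    """Hypothetical interface for *where* a plan runs."""

    def run(self, data: Columns, steps: list[Step]) -> Columns: ...


class PurePythonBackend:
    """One possible backend: applies each step in order, in plain Python."""

    def run(self, data: Columns, steps: list[Step]) -> Columns:
        for step in steps:
            data = step(data)
        return data


def inline_csv_source(text: str) -> Callable[[], Columns]:
    """Hypothetical *source*: describes how data is read, independent of any backend."""

    def read() -> Columns:
        header, *lines = text.strip().splitlines()
        names = header.split(",")
        rows = [line.split(",") for line in lines]
        return {name: [row[i] for row in rows] for i, name in enumerate(names)}

    return read


def execute(backend: Backend, source: Callable[[], Columns], steps: list[Step]) -> Columns:
    # Swapping `backend` or `source` independently leaves the other untouched.
    return backend.run(source(), steps)


def parse_scores(cols: Columns) -> Columns:
    # A plan step: convert the score strings read from CSV into floats.
    return {**cols, "score": [float(v) for v in cols["score"]]}


source = inline_csv_source("id,score\n1,9.5\n2,7.0")
result = execute(PurePythonBackend(), source, [parse_scores])
print(result)  # {'id': ['1', '2'], 'score': [9.5, 7.0]}
```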

Default execution

Out of the box, pydantable executes via the native extension.

If you want to understand the runtime and cost model:

Execution

Optional swap-in engines

pydantable also supports optional engines that keep the DataFrame API but use different backends:

I/O is a separate story (choose an entrypoint)

Even if you stay on the default execution engine, you still need to choose I/O entrypoints:

Where pydantable fits in the ecosystem

If you’re deciding between tools, start here: