Mental model¶
This page is the “map” of pydantable: the core concepts and how they relate.
If you’re new, read this once, then jump to the Five-minute tour or DataFrameModel.
The four core nouns¶
Schema (row shape)¶
Schema is a Pydantic-style row model used to describe the shape and types of a table.
- Used with DataFrame[RowSchema]
- Used as an output type for materialized rows (e.g. collect())
See: DataFrameModel (it covers Schema as part of the typed table story).
DataFrame[T] (typed table)¶
DataFrame[T] is a typed table whose columns match the schema T.
You can create a DataFrame from columnar data, from I/O helpers, or from optional engine-specific sources.
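To make "columns match the schema T" concrete, here is a minimal sketch of that check in plain Python. It does not use pydantable: `RowSchema` is a standard-library dataclass standing in for a Pydantic-style row model, and `check_columns` is a hypothetical helper written for this illustration only.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class RowSchema:
    # Hypothetical row shape: one field per column.
    name: str
    score: float


def check_columns(schema: type, columns: dict[str, list]) -> None:
    """Raise if columnar data does not match the row schema (illustration only)."""
    expected = get_type_hints(schema)
    if set(columns) != set(expected):
        raise ValueError(
            f"column names {sorted(columns)} != schema fields {sorted(expected)}"
        )
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns have different lengths")
    for name, typ in expected.items():
        for value in columns[name]:
            if not isinstance(value, typ):
                raise TypeError(f"column {name!r}: {value!r} is not {typ.__name__}")


data = {"name": ["ada", "bob"], "score": [9.5, 7.0]}
check_columns(RowSchema, data)  # matching data passes without raising
```

A real typed table would likely validate and coerce values rather than use a strict `isinstance` test; the point is only that the schema describes one row while the data arrives as columns.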
Start here:
Expr (typed expressions)¶
Expr is how you refer to and compute columns (e.g., df.score > 8.0).
Expressions are designed to remain type-aware so transforms can stay typed and composable.
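The idea that a comparison such as `df.score > 8.0` yields an expression rather than an immediate result can be sketched with operator overloading. The `Col` and `Compare` classes below are hypothetical stand-ins written for this page, not pydantable's Expr implementation.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Col:
    """A reference to a column by name (hypothetical stand-in for an expression)."""

    name: str

    def __gt__(self, other: Any) -> "Compare":
        # Return a description of the comparison instead of a boolean.
        return Compare(self, operator.gt, other)


@dataclass(frozen=True)
class Compare:
    """A deferred comparison: nothing is computed until evaluate() is called."""

    column: Col
    op: Callable[[Any, Any], bool]
    value: Any

    def evaluate(self, columns: dict[str, list]) -> list[bool]:
        return [self.op(v, self.value) for v in columns[self.column.name]]


expr = Col("score") > 8.0  # builds an expression object, not a result
mask = expr.evaluate({"score": [9.5, 7.0, 8.5]})
```

Because the expression is an ordinary object, it can be inspected, combined with other expressions, or handed to an engine before any data is touched.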
Start here:
DataFrameModel (a “table model”)¶
DataFrameModel is the higher-level “SQLModel-style” concept: a reusable typed table definition with methods for:
- ingest/validation rules
- typed transforms
- structured I/O entrypoints
If you’re building real pipelines (especially in services), this is usually the best place to start.
Start here: DataFrameModel
Execution: plans and materialization¶
pydantable is designed so that many operations can remain lazy until you explicitly choose to materialize.
Two key pages:
- Execution: what runs when, and what costs what
- Materialization: the “how” of turning a plan into output
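As a rough mental picture of "lazy until you materialize", the sketch below records transforms and runs them only when `collect()` is called. `LazyPlan` is a toy class invented for this illustration; pydantable's actual plan representation and method signatures may differ.

```python
from typing import Callable

Columns = dict[str, list]


class LazyPlan:
    """Toy lazy table: transforms are recorded, not run (illustration only)."""

    def __init__(self, root: Columns, steps: tuple = ()):
        self._root = root
        self._steps = steps

    def filter(self, column: str, predicate: Callable[[object], bool]) -> "LazyPlan":
        def step(cols: Columns) -> Columns:
            keep = [predicate(v) for v in cols[column]]
            return {k: [v for v, ok in zip(vals, keep) if ok] for k, vals in cols.items()}

        # Return a new plan with one more pending step; no data is touched yet.
        return LazyPlan(self._root, self._steps + (step,))

    def collect(self) -> Columns:
        # Materialization: run every recorded step in order.
        cols = self._root
        for step in self._steps:
            cols = step(cols)
        return cols


calls = []


def is_high(value):
    calls.append(value)  # record when the predicate actually runs
    return value > 8.0


plan = LazyPlan({"name": ["ada", "bob"], "score": [9.5, 7.0]}).filter("score", is_high)
calls_before_collect = list(calls)  # still empty: nothing has executed
result = plan.collect()
```

The predicate only runs inside `collect()`, which is the property that lets a real engine reorder or fuse steps before doing any work.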
Materialization outputs (what you get at the end)¶
Common end states include:
- Pydantic rows (a list of row models)
- Columnar dicts (dict[str, list])
- Engine-native objects (e.g. Polars/Arrow) when the relevant extras are installed
See: Execution (copy/interchange + display) and Materialization (modes).
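The first two end states hold the same data in different layouts. The following sketch converts between them in plain Python, using a standard-library dataclass (`Row`) in place of a Pydantic row model; `to_rows` and `to_columns` are hypothetical helper names, not pydantable functions.

```python
from dataclasses import dataclass, fields


@dataclass
class Row:
    # Stand-in for a Pydantic row model.
    name: str
    score: float


def to_rows(columns: dict[str, list]) -> list[Row]:
    """Columnar dict -> list of row objects."""
    names = list(columns)
    return [Row(**dict(zip(names, values))) for values in zip(*columns.values())]


def to_columns(rows: list[Row]) -> dict[str, list]:
    """List of row objects -> columnar dict, with one key per schema field."""
    return {f.name: [getattr(r, f.name) for r in rows] for f in fields(Row)}


columns = {"name": ["ada", "bob"], "score": [9.5, 7.0]}
rows = to_rows(columns)
round_trip = to_columns(rows)
```

Rows are convenient when each record is handled individually (for example, returned from a service endpoint), while the columnar form avoids building one object per row.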
A common early gotcha: shape is not always “executed rows”¶
After lazy transforms, df.shape follows root-buffer semantics and may not reflect the number of rows that will materialize after execution.
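The toy class below shows how such a mismatch can arise. It assumes that "root-buffer semantics" means the shape is read from the original input data and ignores pending transforms; `LazyTable` is a hypothetical illustration, not pydantable's implementation.

```python
from typing import Callable

Columns = dict[str, list]


class LazyTable:
    """Toy lazy table whose shape is read from the root (input) buffer."""

    def __init__(self, root: Columns, predicates: tuple = ()):
        self._root = root
        self._predicates = predicates  # pending (column, predicate) filters

    @property
    def shape(self) -> tuple[int, int]:
        # Assumption: only the original input is measured; pending filters are ignored.
        n_rows = len(next(iter(self._root.values()), []))
        return (n_rows, len(self._root))

    def filter(self, column: str, predicate: Callable[[object], bool]) -> "LazyTable":
        return LazyTable(self._root, self._predicates + ((column, predicate),))

    def collect(self) -> Columns:
        cols = self._root
        for column, predicate in self._predicates:
            keep = [predicate(v) for v in cols[column]]
            cols = {k: [v for v, ok in zip(vals, keep) if ok] for k, vals in cols.items()}
        return cols


df = LazyTable({"score": [9.5, 7.0, 8.5]}).filter("score", lambda v: v > 8.0)
root_rows = df.shape[0]  # counts the unfiltered input
executed_rows = len(df.collect()["score"])  # counts rows after the filter runs
```

In this sketch `root_rows` and `executed_rows` differ, so code that needs the post-transform row count should materialize first rather than rely on the shape.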
This is documented as part of the compatibility contract:
Engines and backends (what “engine” means here)¶
There are two related ideas:
1) Execution backend: where the plan runs (the default native engine is Polars-backed inside the Rust extension).
2) Data sources / sinks: how you read/write data (files, HTTP, SQL, etc.).
Default execution¶
Out of the box, pydantable executes via the native extension.
If you want to understand the runtime and cost model:
Optional swap-in engines¶
pydantable also supports optional engines that keep the DataFrame API but use different backends:
I/O is a separate story (choose an entrypoint)¶
Even if you stay on the default execution engine, you still need to choose I/O entrypoints:
Where pydantable fits in the ecosystem¶
If you’re deciding between tools, start here: