Five-minute tour¶
This page is the RTD-friendly version of the optional notebook in the repository
at notebooks/five_minute_tour.ipynb.
It follows the same steps: build a typed DataFrame, inspect it, summarize, filter, and materialize.
Note
Requires a working pydantable install with the Rust extension (pip install . or wheels).
If you’re deciding between tools
If you’re choosing between pydantable and a general-purpose DataFrame library, start here:
1. Define a table model¶
from pydantable import DataFrameModel


class Sales(DataFrameModel):
    id: int
    score: float
    label: str


df = Sales({
    "id": [1, 2, 3],
    "score": [10.0, 20.5, 7.0],
    "label": ["a", "b", "a"],
})
Other patterns
- DataFrame[Schema]: same engine, generic API (Typing guide)
- Avoid plain pydantic.BaseModel unless you intentionally skip DataFrameModel helpers
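To make the idea of a typed table concrete, here is a minimal standard-library sketch of checking column data against class annotations. It is not how pydantable validates data; SalesSchema and check_columns are hypothetical names used only for illustration, and the checks (matching names, equal lengths, isinstance per element) are assumptions about what a typed column container would reasonably enforce.

```python
from typing import get_type_hints


class SalesSchema:
    """Hypothetical stand-in for the Sales model: annotations only."""

    id: int
    score: float
    label: str


def check_columns(schema: type, columns: dict[str, list]) -> int:
    """Check column names, lengths, and element types; return the row count."""
    hints = get_type_hints(schema)
    if set(columns) != set(hints):
        raise ValueError(f"expected columns {sorted(hints)}, got {sorted(columns)}")
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    for name, expected in hints.items():
        for i, value in enumerate(columns[name]):
            if not isinstance(value, expected):
                raise TypeError(f"{name}[{i}]={value!r} is not {expected.__name__}")
    return lengths.pop() if lengths else 0


n_rows = check_columns(
    SalesSchema,
    {"id": [1, 2, 3], "score": [10.0, 20.5, 7.0], "label": ["a", "b", "a"]},
)
print(n_rows)  # 3
```

A real validation layer would also handle coercion, nullability, and nested types, which this sketch deliberately leaves out.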
2. String repr and HTML (Jupyter)¶
In a terminal, repr(df) shows the schema and column dtypes (no row count—plans may be lazy).
In Jupyter / VS Code, the last expression in a cell can render as HTML via _repr_html_() (bounded preview; same cost class as head() + to_dict() for the slice). See EXECUTION Jupyter / HTML and Display options.
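The two display paths described above can be illustrated with a toy class. This is a sketch of the general Jupyter display protocol (a `_repr_html_` method on the object), not pydantable's renderer; PreviewTable and its max_rows parameter are hypothetical.

```python
from html import escape
from itertools import islice


class PreviewTable:
    """Toy column store with a schema-only repr and a bounded HTML preview."""

    def __init__(self, columns: dict[str, list], max_rows: int = 20) -> None:
        self.columns = columns
        self.max_rows = max_rows

    def __repr__(self) -> str:
        # Report each column name with the type of its first value; no row count.
        dtypes = ", ".join(
            f"{name}: {type(values[0]).__name__ if values else '?'}"
            for name, values in self.columns.items()
        )
        return f"PreviewTable({dtypes})"

    def _repr_html_(self) -> str:
        # Jupyter looks for this hook when rendering the last expression in a cell.
        header = "".join(f"<th>{escape(name)}</th>" for name in self.columns)
        body = []
        # zip(*columns) walks the data row by row; islice stops after max_rows rows.
        for row in islice(zip(*self.columns.values()), self.max_rows):
            cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
            body.append(f"<tr>{cells}</tr>")
        return f"<table><tr>{header}</tr>{''.join(body)}</table>"


table = PreviewTable(
    {"id": [1, 2, 3], "score": [10.0, 20.5, 7.0], "label": ["a", "b", "a"]},
    max_rows=2,
)
print(repr(table))  # PreviewTable(id: int, score: float, label: str)
html_preview = table._repr_html_()
```

Bounding the preview keeps the cost of displaying an object proportional to the slice shown rather than to the full table.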
3. Discovery helpers¶
df.columns
df.shape # root-buffer semantics after lazy transforms—see [INTERFACE_CONTRACT](../semantics/interface-contract/)
df.info()
print(df.describe())
Runnable script:
from __future__ import annotations

from pydantable import DataFrame
from pydantic import BaseModel


class Row(BaseModel):
    id: int
    score: float
    label: str


def main() -> None:
    # Same three rows as in step 1
    df = DataFrame[Row]({
        "id": [1, 2, 3],
        "score": [10.0, 20.5, 7.0],
        "label": ["a", "b", "a"],
    })
    print("df.columns =", df.columns)
    print("df.shape =", df.shape)
    print()
    print(df.info())
    print()
    print(df.describe())


if __name__ == "__main__":
    main()
Output¶
df.columns = ['id', 'score', 'label']
df.shape = (3, 3)
DataFrame[Row]
schema: Row
columns: 3
shape (root buffer): 3 x 3
Note: after lazy transforms (e.g. filter), root row count may not match materialized rows; use to_dict() or collect() for true count.
dtypes:
id: int
score: float
label: str
describe() — one to_dict(); int/float/bool/str/date/datetime columns.
id: count=3 mean=2 std=1 min=1 max=3
score: count=3 mean=12.5 std=7.08872 min=7.0 max=20.5
label: count=3 n_unique=2 min_len=1 max_len=1 null=0
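The summary figures above can be cross-checked with the standard library. The sketch below assumes the std column is the sample standard deviation (n - 1 denominator); the source does not name the estimator, but that choice reproduces both printed values.

```python
import statistics

ids = [1, 2, 3]
scores = [10.0, 20.5, 7.0]
labels = ["a", "b", "a"]

score_mean = statistics.mean(scores)   # (10.0 + 20.5 + 7.0) / 3
score_std = statistics.stdev(scores)   # sample standard deviation, sqrt(100.5 / 2)
id_std = statistics.stdev(ids)         # sqrt(2 / 2)
label_unique = len(set(labels))        # distinct label values

print(f"score: mean={score_mean} std={score_std:.5f}")  # score: mean=12.5 std=7.08872
print(f"label: n_unique={label_unique}")                # label: n_unique=2
```

The population standard deviation (statistics.pstdev) would give roughly 5.79 for score, which does not match the output shown.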
4. Filter and materialize¶
filtered = df.filter(df.score > 8.0)
rows = filtered.collect() # list[Pydantic row models]
cols = filtered.to_dict() # dict[str, list]
Use to_polars() / to_arrow() when the optional extras are installed (EXECUTION Copy as / interchange).
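The difference between a lazy filter, row materialization, and column materialization can be sketched with a toy class. ToyLazyFrame is hypothetical and takes a plain callable as its predicate instead of a column expression such as df.score > 8.0; it only illustrates why the root-buffer shape can differ from the materialized row count, and is not pydantable's engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    score: float
    label: str


class ToyLazyFrame:
    """Records filters without running them until collect() or to_dict()."""

    def __init__(self, columns: dict[str, list], predicates: tuple = ()) -> None:
        self._columns = columns
        self._predicates = predicates

    @property
    def shape(self) -> tuple[int, int]:
        # Describes the root buffer only; pending filters are ignored.
        n_rows = len(next(iter(self._columns.values()), []))
        return (n_rows, len(self._columns))

    def filter(self, predicate) -> "ToyLazyFrame":
        # No work happens here: the predicate is appended to the plan.
        return ToyLazyFrame(self._columns, self._predicates + (predicate,))

    def collect(self) -> list[Row]:
        # Row-oriented result: build each row, keep those passing every filter.
        names = list(self._columns)
        rows = (Row(**dict(zip(names, vals))) for vals in zip(*self._columns.values()))
        return [row for row in rows if all(p(row) for p in self._predicates)]

    def to_dict(self) -> dict[str, list]:
        # Column-oriented result built from the same filtered rows.
        rows = self.collect()
        return {name: [getattr(row, name) for row in rows] for name in self._columns}


df = ToyLazyFrame({"id": [1, 2, 3], "score": [10.0, 20.5, 7.0], "label": ["a", "b", "a"]})
filtered = df.filter(lambda row: row.score > 8.0)

print(filtered.shape)           # (3, 3): still the root buffer
print(len(filtered.collect()))  # 2
print(filtered.to_dict())       # {'id': [1, 2], 'score': [10.0, 20.5], 'label': ['a', 'b']}
```

Under these assumptions, the true row count is only known after materializing, which matches the note in the info() output above.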
5. Join, group, and window (optional)¶
For a longer analytics walkthrough (join + groupby + window), run:
Where to read next¶
- DataFrameModel: validation, transforms, service patterns
- PANDAS_UI: optional pydantable.pandas import (assign, merge, cleaning helpers)
- EXECUTION: materialization cost, async, display limits
- INTERFACE_CONTRACT: semantics (joins, nulls, shape vs executed rows)
- IO_DECISION_TREE: pick lazy vs eager I/O; prefer DataFrameModel / DataFrame[Schema] classmethods over raw pydantable.io
- IO_OVERVIEW: per-format tables (Parquet, CSV, NDJSON, JSON, IPC, HTTP, SQL)
- MONGO_ENGINE / BEANIE: optional pydantable[mongo] (lazy MongoDataFrame, eager fetch_mongo / afetch_mongo, Beanie ODM helpers)