Performance notes
Benchmarking the native extension
Editable installs often build a debug extension. For comparable numbers, use a release build (for example, `maturin develop --release`).
See also benchmarks/run_release.sh and DEVELOPER.md.
Where time goes (typical)
End-to-end work splits roughly into:
1. Python ingestion validation: `validate_columns_strict` runs Pydantic `TypeAdapter` per cell when `trusted_mode="off"` (default).
2. Rust ingest: `root_data_to_polars_df` copies Python column lists into Polars `Series`, or ingests NumPy/PyArrow buffers, or (with `trusted_mode="shape_only"` or `"strict"`) ingests a Polars `DataFrame` via Arrow IPC (see `execute_polars.rs`).
3. Polars execution: lazy plan `collect()` inside Rust (always synchronous inside the extension).
4. Rust → Python: results are materialized as Python column lists (`dict[str, list]`) for the default Python API; `collect()` wraps rows as Pydantic models, and `to_dict()` exposes the columnar dict directly. Optional `to_polars()` builds a Polars `DataFrame` when the `polars` extra is installed.
Async handlers (0.15.0+): acollect / ato_dict / ato_polars offload steps (3)+(4) to a thread pool so the asyncio loop stays free; 0.16.0 adds the same for ato_arrow. Synchronous pydantable.io.materialize_parquet / materialize_ipc block the current thread. See EXECUTION and FASTAPI.
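The offloading pattern described above can be illustrated with the standard library alone. This is a minimal sketch, not pydantable's implementation: `blocking_collect` and `acollect_sketch` are hypothetical stand-ins for a synchronous execute-and-materialize call and its async wrapper, and `asyncio.to_thread` is assumed as the offload mechanism.

```python
import asyncio
import time


def blocking_collect(n: int) -> dict[str, list[int]]:
    """Hypothetical stand-in for a synchronous execute + materialize step."""
    time.sleep(0.05)  # simulates blocking work done outside the event loop
    return {"x": list(range(n))}


async def acollect_sketch(n: int) -> dict[str, list[int]]:
    # asyncio.to_thread runs the blocking call in the default thread pool,
    # so other coroutines keep being scheduled while it runs.
    return await asyncio.to_thread(blocking_collect, n)


async def main() -> tuple[dict[str, list[int]], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the loop gets control while the worker thread runs.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await acollect_sketch(3)
    beat.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
print(result, "heartbeat ticks while collecting:", ticks)
```

If `blocking_collect` were called directly inside the coroutine, the heartbeat would not advance until it returned, which is the stall the async methods are meant to avoid.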
Ratios vs raw Polars/pandas in benchmarks/pydantable_vs_*.py reflect this stack, not only step (3).
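To see why an end-to-end ratio can differ from an execution-only ratio, it can help to time each stage separately. The sketch below assumes four hypothetical stand-in functions (`validate`, `ingest`, `execute`, `egress`) that mimic the stages listed above in pure Python; they are not pydantable functions and their relative costs are not representative.

```python
import time
from typing import Callable


def time_stages(
    stages: list[tuple[str, Callable[[object], object]]], data: object
) -> dict[str, float]:
    """Run stages in order, feeding each output to the next; return seconds per stage."""
    timings: dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for the four stages; swap in real calls to measure them.
def validate(cols):
    for name, values in cols.items():
        for v in values:
            if not isinstance(v, int):
                raise TypeError(f"column {name!r} holds a non-int value")
    return cols


def ingest(cols):
    return {k: tuple(v) for k, v in cols.items()}


def execute(cols):
    return {k: tuple(x * 2 for x in v) for k, v in cols.items()}


def egress(cols):
    return {k: list(v) for k, v in cols.items()}


timings = time_stages(
    [("validate", validate), ("ingest", ingest), ("execute", execute), ("egress", egress)],
    {"a": list(range(10_000))},
)
total = sum(timings.values())
shares = {name: t / total for name, t in timings.items()}
for name, share in shares.items():
    print(f"{name:<9}{share:6.1%}")
```

A comparison that only times the `execute` stand-in would miss whatever share the other three stages contribute.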
FastAPI and bulk ingest
HTTP handlers that build DataFrameModel from large or pre-validated tables often use trusted_mode="shape_only" or strict to skip per-cell Pydantic while keeping shape (and, with strict, dtype) guarantees. Threat modeling—who may skip RowModel validation—and Polars/Arrow patterns are covered in FASTAPI.md (“Large tables, Polars, Arrow, and trust boundaries”).
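The trade-off between shape checks and per-cell validation can be sketched in plain Python. The functions below are hypothetical illustrations of the two kinds of check, not pydantable's code: `check_shape` looks only at keys and column lengths, while `check_cells` visits every value.

```python
def check_shape(columns: dict[str, list], expected: set[str]) -> int:
    """Check keys and column lengths only; return the row count."""
    if set(columns) != expected:
        raise ValueError(f"column mismatch: {sorted(set(columns) ^ expected)}")
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    return lengths.pop() if lengths else 0


def check_cells(columns: dict[str, list], types: dict[str, type]) -> None:
    """Per-cell check: Python-level work proportional to rows * columns."""
    for name, values in columns.items():
        for i, value in enumerate(values):
            if not isinstance(value, types[name]):
                raise TypeError(f"{name}[{i}] is not {types[name].__name__}")


table = {"id": [1, 2, 3], "name": ["a", "b", 3]}  # last name is malformed

n_rows = check_shape(table, {"id", "name"})  # passes: shape is fine
try:
    check_cells(table, {"id": int, "name": str})
    cell_error = None
except TypeError as exc:
    cell_error = str(exc)

print(n_rows, cell_error)
```

The malformed cell passes the shape check and is caught only by the per-cell pass, which is why skipping per-cell validation is a decision about who is trusted to supply the data.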
Profiling scripts
| Script | Purpose |
|---|---|
| `benchmarks/profile_breakdown.py` | Wall-time split: validation vs DataFrame construction vs transform + `collect()` |
| `benchmarks/micro_collect_only.py` | Mean time for `collect()` only on a pre-built DataFrame (execution + egress) |
| `benchmarks/framed_window_bench.py` | Mean time for `collect()` on a framed `rowsBetween` + `window_sum` pipeline |
| `benchmarks/trusted_polars_ingest_bench.py` | Polars root + `trusted_mode="strict"` ingest plus trivial select + `collect()` |
| `python -m cProfile` / `py-spy` | Deeper Python stacks; Rust: `perf` / Instruments on the `_core` shared library |
Run profile_breakdown.py --cprofile for a cumulative profile of one pipeline.
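For ad hoc profiling outside the benchmark scripts, a cumulative profile can also be produced directly with the standard library. This is a generic sketch: `pipeline` is a hypothetical workload standing in for one validate + transform + collect run, and it does not reproduce what the `--cprofile` flag does internally.

```python
import cProfile
import io
import pstats


def pipeline(n: int) -> int:
    """Hypothetical workload standing in for one pipeline run."""
    rows = [{"x": i, "y": i * 2} for i in range(n)]
    return sum(r["x"] + r["y"] for r in rows)


profiler = cProfile.Profile()
profiler.enable()
total = pipeline(50_000)
profiler.disable()

# Sort by cumulative time so callers that dominate the run appear first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats(pstats.SortKey.CUMULATIVE)
stats.print_stats(10)  # keep only the top 10 entries
report = buffer.getvalue()
print(report)
```

Note that cProfile sees only Python frames; time spent inside a native extension shows up as a single call, which is where `perf` or Instruments become useful.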
Tuning knobs (see code)
- `trusted_mode="shape_only"` on `DataFrame` / `DataFrameModel`: skips per-cell Pydantic validation when you trust inputs; keys and column lengths are still checked. Use `trusted_mode="strict"` when you want additional Polars dtype / nested-shape checks. NumPy and PyArrow column buffers can be preserved for a lower-copy Rust ingest path (numeric/bool dtypes that match the schema). A Polars `DataFrame` can be passed as the root table on trusted paths. See DATAFRAMEMODEL and SUPPORTED_TYPES.
- `collect()` (default): returns a `list` of Pydantic row models (validated against the current schema).
- `to_dict()` / `collect(as_lists=True)`: columnar `dict[str, list]` (common for tests and column-shaped responses).
- `to_polars()`: optional; requires `pip install 'pydantable[polars]'`.
- `collect(as_numpy=True)`: returns `dict[str, numpy.ndarray]` from the columnar lists.
- NumPy / PyArrow columns: with `trusted_mode="shape_only"` (or `strict`), compatible `numpy.ndarray` and `pyarrow.Array` / `ChunkedArray` columns are converted in Rust without a Python per-element loop where dtypes match.
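The difference between the row-shaped and columnar outputs listed above can be sketched as follows. This is an illustration under stated assumptions: `Row` is a hypothetical dataclass used in place of a Pydantic row model so the example needs only the standard library, and `columns_to_rows` is not a pydantable function.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical row type standing in for a Pydantic row model."""

    id: int
    score: float


def columns_to_rows(columns: dict[str, list]) -> list[Row]:
    """Build one object per row from a columnar dict (extra per-row work)."""
    names = list(columns)
    return [Row(**dict(zip(names, values))) for values in zip(*columns.values())]


columns = {"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]}

rows = columns_to_rows(columns)  # row-shaped: one object allocated per row
as_columns = columns             # column-shaped: the dict is used as-is

print(rows[0], len(as_columns["id"]))
```

When the consumer is already column-shaped, such as a test assertion on one column or a columnar JSON response, staying with the columnar form avoids the per-row construction step.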
Release profile
Wheels and maturin develop --release use Cargo’s release profile. Optional thin LTO is enabled in pydantable-core/Cargo.toml to trade longer compile time for slightly faster native code.
0.19.0 validation
0.19.0 spot-checked benchmarks/profile_breakdown.py, benchmarks/micro_collect_only.py, benchmarks/framed_window_bench.py, and benchmarks/trusted_polars_ingest_bench.py under a release extension build on the supported Polars stack; 0.18.x did not change materialization or grouped execution hot paths in ways that require refreshed headline numbers here. Re-run these scripts locally when upgrading Polars or changing the Rust execution path.