Custom execution engine packages¶
This guide is for authors of separate Python packages that implement
pydantable.engine.protocols.ExecutionEngine (defined in
pydantable_protocol, re-exported under pydantable.engine.protocols) so end users can run
PydanTable’s typed DataFrame / DataFrameModel API on top of your backend
(for example a SQL engine, a remote service, or another dataframe library).
For design rationale and guardrails inside this repository, see ADR-engines. For day-to-day contributor setup, see DEVELOPER.
Dependencies¶
Minimum (protocol only):
- Add pydantable-protocol as a normal dependency, with a version pin that matches the PydanTable releases you support (same 1.x.y as pydantable on PyPI; see VERSIONING).
- You do not need to depend on pydantable to define your engine class: implement the structural protocol from pydantable_protocol and raise pydantable_protocol.UnsupportedEngineOperationError (or a subclass) when an operation is not available.
Recommended for development and tests:
- pydantable as a dev / test dependency so you can build DataFrame / DataFrameModel instances, run materialization, and run integration tests.
End-user installs:
- Applications install pydantable and your package (and any DB drivers, HTTP clients, etc.). They might also install pydantable-native if they want the default Polars-backed engine for file I/O or mixed stacks.
Implementing ExecutionEngine¶
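The interface is a typing.Protocol: no inheritance required. Static type
checkers can verify your class against
pydantable.engine.protocols.ExecutionEngine, and isinstance (with
@runtime_checkable) can check it at runtime. A runtime isinstance check only
confirms that the members exist, not that their signatures match.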
- Plan construction: Implement make_plan, plan_*, and the expression helpers (make_literal, expr_is_global_agg, …) for the operations you intend to support. PydanTable passes opaque plan handles between methods; the native engine wraps Rust objects. Your backend might use SQL strings, ORM query objects, or client-specific handles, as long as you keep types consistent across execute_* and plan_* for a given frame.
- Execution: Implement execute_plan, async_execute_plan, collect_batches, async_collect_plan_batches, and any execute_join, execute_groupby_agg, … helpers that your users will trigger. Match documented semantics where you claim parity (INTERFACE_CONTRACT).
- Sinks: Implement write_parquet, write_csv, write_ipc, write_ndjson if users can export through your backend; otherwise raise UnsupportedEngineOperationError with a clear message.
- Capabilities: Expose a capabilities property returning EngineCapabilities. Set backend to "custom". Set feature flags (has_execute_plan, has_sink_parquet, …) to reflect reality so UIs and tests can probe support without trying every API.
- Async honesty: has_async_execute_plan and has_async_collect_plan_batches must match whether the async_* methods truly work.
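The points above can be sketched in a few lines. The block below does not import pydantable_protocol; SketchEngineProtocol, SketchCapabilities, and SketchUnsupportedError are hypothetical stand-ins for ExecutionEngine, EngineCapabilities, and UnsupportedEngineOperationError, reduced to three members with simplified signatures that are assumptions, not the real ones. It only illustrates structural typing, honest capability flags, and raising on an unsupported sink.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


class SketchUnsupportedError(Exception):
    """Stand-in for pydantable_protocol.UnsupportedEngineOperationError."""


@dataclass(frozen=True)
class SketchCapabilities:
    """Stand-in for EngineCapabilities; the real class has more fields."""

    backend: str
    has_execute_plan: bool = False
    has_sink_parquet: bool = False


@runtime_checkable
class SketchEngineProtocol(Protocol):
    """Stand-in for ExecutionEngine; signatures here are assumptions."""

    @property
    def capabilities(self) -> SketchCapabilities: ...

    def execute_plan(self, plan: Any) -> Any: ...

    def write_parquet(self, plan: Any, path: str) -> None: ...


class SqlSketchEngine:
    """Hypothetical engine: no inheritance, it matches the protocol structurally."""

    @property
    def capabilities(self) -> SketchCapabilities:
        # Flags mirror what the methods below can really do.
        return SketchCapabilities(backend="custom", has_execute_plan=True)

    def execute_plan(self, plan: Any) -> Any:
        # The opaque plan handle is a SQL string in this sketch.
        return {"sql": plan}

    def write_parquet(self, plan: Any, path: str) -> None:
        raise SketchUnsupportedError(
            "SqlSketchEngine cannot write Parquet; export from the database instead"
        )


def export_if_supported(engine: SketchEngineProtocol, plan: Any, path: str) -> bool:
    """Probe the capability flag instead of calling the sink and catching."""
    if not engine.capabilities.has_sink_parquet:
        return False
    engine.write_parquet(plan, path)
    return True


engine = SqlSketchEngine()
# Passes because all three members exist; signatures are not compared.
assert isinstance(engine, SketchEngineProtocol)
```

Keeping the flag and the method in agreement is the point: a caller that checks has_sink_parquet first never needs to trigger the exception path.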
Reference implementations:
- StubExecutionEngine in this repo (python/pydantable/engine/stub.py): minimal surface, raises on most calls; good for typing tests.
- NativePolarsEngine in pydantable-native: full implementation over pydantable_native._core.
- Third-party: the lazy-SQL optional stack (SQLAlchemy bridge + ExecutionEngine) ships alongside pydantable’s SqlDataFrame / SqlDataFrameModel path; see the bridge’s PyPI page and docs/PYDANTABLE_ENGINE.md. PydanTable user guide: SQL_ENGINE (SqlDataFrame, SqlDataFrameModel, pydantable[sql]).
When PydanTable adds new protocol members, contract tests in this project
(exercising typing_extensions.get_protocol_members) and release notes
will flag required updates — pin pydantable-protocol accordingly.
Wiring your engine in applications¶
Per frame (preferred for mixing backends):
- Pass engine= when constructing DataFrame or DataFrameModel so the inner DataFrame uses your implementation (see the DataFrameModel docstring in the source tree).
Process-wide default:
- Call pydantable.engine.set_default_engine(your_engine) before code that uses get_default_engine(). If pydantable-native is not installed, get_default_engine() cannot fall back to NativePolarsEngine, so set_default_engine (or explicit engine= everywhere) is required.
Global state: the default engine and expression runtime (below) are process-wide; document thread and testing implications for your users if they use multiple engines.
Expressions (Expr)¶
Expression trees used by filter / with_columns / etc. are built via
pydantable.engine.get_expression_runtime(), which defaults to the native
Rust core only when the default engine is NativePolarsEngine.
For a non-native default, either:
- call pydantable.engine.set_expression_runtime(lambda: ...) to supply an object compatible with how Expr is built in your stack, or
- steer users away from Expr-heavy APIs until a portable expression IR exists (ADR-engines, Track B).
Otherwise PydanTable raises UnsupportedEngineOperationError when building expressions.
File I/O vs execution engine¶
Many lazy read_* entry points (see IO_OVERVIEW) can use pydantable-native for
fast local scans. That path is separate from DataFrame._engine: shipping
a custom ExecutionEngine does not automatically redirect Parquet/CSV reads
through your backend. If your product needs “everything goes to SQL”, you
typically expose your own ingestion APIs and then construct
DataFrame instances already bound to your engine=.
Errors¶
- pydantable_protocol.UnsupportedEngineOperationError: raise for unsupported plan_*, execute_*, or sinks. pydantable.errors.UnsupportedEngineOperationError subclasses it, so isinstance(exc, pydantable_protocol.UnsupportedEngineOperationError) catches both library and third-party raises.
- pydantable_protocol.MissingRustExtensionError: reserved for missing or incomplete native extension scenarios; custom engines normally do not raise it.
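The subclass relationship described above can be illustrated with stand-in classes. Nothing below is imported from pydantable: ProtocolUnsupportedError and LibraryUnsupportedError are hypothetical stand-ins for the protocol-level and library-level errors, and FooEngineUnsupportedError is a hypothetical engine-specific subclass. The base class Exception is an assumption made for the sketch.

```python
from typing import Callable


class ProtocolUnsupportedError(Exception):
    """Stand-in for pydantable_protocol.UnsupportedEngineOperationError."""


class LibraryUnsupportedError(ProtocolUnsupportedError):
    """Stand-in for pydantable.errors.UnsupportedEngineOperationError."""


class FooEngineUnsupportedError(ProtocolUnsupportedError):
    """Hypothetical error raised by a third-party engine package."""


def raise_from_library() -> None:
    raise LibraryUnsupportedError("plan_join is not available")


def raise_from_third_party() -> None:
    raise FooEngineUnsupportedError("write_csv is not available")


def describe(operation: Callable[[], None]) -> str:
    """Catch the protocol-level base class to handle both origins uniformly."""
    try:
        operation()
    except ProtocolUnsupportedError as exc:
        return f"unsupported: {exc}"
    return "ok"
```

Application code that catches only the protocol-level class therefore does not need to know which package raised the error.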
Testing checklist¶
- Unit tests: Engine methods with fake plan/root handles.
- Integration tests: pip install pydantable + your package; build a small DataFrameModel, run collect / to_dict, select, etc.
- Protocol drift: Periodically assert your class satisfies all ExecutionEngine members (mirror tests/test_engine_contract.py in this repo).
- Version matrix: CI over the pydantable / pydantable-protocol versions you claim to support.
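A protocol-drift check could look roughly like the sketch below. It is not a copy of tests/test_engine_contract.py, whose contents are not shown here. DriftSketchProtocol is a hypothetical three-member stand-in for ExecutionEngine, and the fallback branch (used only when neither typing nor typing_extensions provides get_protocol_members) is a simplification that just collects public names from the protocol class body.

```python
from typing import Any, List, Protocol, Set


class DriftSketchProtocol(Protocol):
    """Stand-in for ExecutionEngine; the real protocol has many more members."""

    def make_plan(self, root: Any) -> Any: ...

    def execute_plan(self, plan: Any) -> Any: ...

    async def async_execute_plan(self, plan: Any) -> Any: ...


def protocol_member_names(proto: type) -> Set[str]:
    """Return the member names a protocol requires."""
    try:
        from typing import get_protocol_members  # Python 3.13+
    except ImportError:
        try:
            from typing_extensions import get_protocol_members
        except ImportError:
            get_protocol_members = None
    if get_protocol_members is not None:
        return set(get_protocol_members(proto))
    # Simplified fallback: public names defined directly on the class body.
    names = {name for name in vars(proto) if not name.startswith("_")}
    names.update(getattr(proto, "__annotations__", {}))
    return names


def missing_members(engine_cls: type, proto: type = DriftSketchProtocol) -> List[str]:
    """List protocol members that the engine class does not define."""
    required = protocol_member_names(proto)
    return sorted(name for name in required if not hasattr(engine_cls, name))


class CompleteEngine:
    def make_plan(self, root: Any) -> Any:
        return root

    def execute_plan(self, plan: Any) -> Any:
        return plan

    async def async_execute_plan(self, plan: Any) -> Any:
        return plan


class DriftedEngine:
    """Written before async_execute_plan was added to the protocol."""

    def make_plan(self, root: Any) -> Any:
        return root

    def execute_plan(self, plan: Any) -> Any:
        return plan
```

In a real test you would pass your engine class and the imported ExecutionEngine protocol, and assert that the returned list is empty for every pydantable-protocol version in your CI matrix.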
Publishing¶
- One PyPI project per engine, e.g. your-org-pydantable-foo.
- Pin pydantable-protocol to the minor line you test against; relax or tighten as upstream releases new protocol methods.
- Document whether pydantable-native is optional, recommended, or unsupported for your integration.
- Read-only note: this repo’s scripts/check_engine_bypass.py applies to python/pydantable/ only; your package is not bound by that allowlist, but avoiding direct imports of pydantable_native._core in pydantable itself keeps alternative engines viable.
See also¶
- SQL_ENGINE: SqlDataFrame / SqlDataFrameModel with the lazy-SQL stack (pydantable[sql]).
- MONGO_ENGINE: MongoDataFrame / MongoDataFrameModel (MongoPydantableEngine in pydantable.mongo_dataframe_engine, MongoRoot from the Mongo plan stack; façade pydantable.mongo_dataframe, pydantable[mongo]). Eager fetch_mongo / iter_mongo / write_mongo and afetch_mongo / aiter_mongo / awrite_mongo (PyMongo column dicts) are separate from ExecutionEngine and are not a third-party engine package.
- ADR-engines: architecture decisions and extension checklist.
- DEVELOPER: repository layout and native packaging.
- EXECUTION: how materialization uses the engine.
- INTERFACE_CONTRACT: behavioural guarantees users may expect.
- VERSIONING: aligning pydantable, pydantable-protocol, and native versions.