ADR: Modular execution engines
Status
Accepted — documents the Python-side refactor introducing pydantable.engine.
Context
PydanTable executes typed DataFrame operations via the PyO3 extension shipped in pydantable-native (pydantable_native._core, Polars-backed). Call sites previously imported pydantable.rust_engine helpers and _require_rust_core() from many modules, which made an alternate backend (for example SQL generation) hard to isolate.
Decision
- NativePolarsEngine (in the pydantable-native distribution) owns all calls into pydantable_native._core for plan execution, eager ops (execute_*), and sinks (write_* to Parquet/CSV/IPC/NDJSON).
- get_default_engine() returns a process-wide default engine typed as ExecutionEngine; set_default_engine(None) resets it to a lazily constructed NativePolarsEngine (primarily for tests).
- DataFrame holds _engine and routes plan transforms and materialization through self._engine only (no direct rust_core access from the DataFrame implementation). _rust_plan remains the opaque logical plan handle (unchanged public shape).
- ExecutionEngine is defined in the zero-dependency pydantable-protocol distribution (re-exported from python/pydantable/engine/protocols.py for convenience). It is the single structural protocol for a drop-in backend: PlanExecutor + SinkWriter + plan helpers (make_plan, plan_*, …) + capabilities. Third-party engine packages can depend only on pydantable-protocol for typing and pydantable_protocol.UnsupportedEngineOperationError; they do not need to install pydantable. Implementations raise that error (or a subclass) for unsupported operations.
- MissingRustExtensionError also lives in pydantable-protocol so pydantable-native can load without importing pydantable.
- pydantable-native depends only on pydantable-protocol (not pydantable). pydantable declares pydantable-native as a required dependency so pip install pydantable installs the full stack.
- EngineCapabilities includes backend: Literal["native", "stub", "custom"] plus feature flags derived from the native module when applicable.
- rust_engine.py remains a compatibility shim: free functions delegate to get_default_engine() so existing imports keep working.
- Expressions: get_expression_runtime() supplies the object used to build Expr trees when the default engine is NativePolarsEngine; otherwise callers must use set_expression_runtime(...) or expression APIs will raise UnsupportedEngineOperationError.
- Protocols (PlanExecutor, SinkWriter, ExecutionEngine) describe the intended seam; they do not require inheritance.
- StubExecutionEngine (python/pydantable/engine/stub.py) is an in-tree reference for registry and typing tests.
- Plan helpers that build lazy plans (including plan_rolling_agg) live on ExecutionEngine so façades use DataFrame._engine instead of get_default_engine().rust_core.
- Lazy ScanFileRoot construction in pydantable.io uses pydantable_native.require_rust_core() rather than importing the extension directly, keeping extension loading in one place.
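The seam described in this list can be illustrated with a small, self-contained sketch. It does not import pydantable: FakeNativeEngine, the trimmed ExecutionEngine protocol, the simplified EngineCapabilities, and the method signatures are all assumptions made for illustration, not the project's real definitions. It only shows the shape of the design: a structural protocol, a lazily constructed process-wide default, and a frame that talks to self._engine only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal, Optional, Protocol


@dataclass(frozen=True)
class EngineCapabilities:
    # Simplified stand-in: the real class also carries feature flags.
    backend: Literal["native", "stub", "custom"]


class ExecutionEngine(Protocol):
    # Structural protocol: engines match by shape, not by inheritance.
    # Only two methods are shown; signatures here are hypothetical.
    capabilities: EngineCapabilities

    def make_plan(self, columns: list[str]) -> Any: ...

    def execute_plan(self, plan: Any, data: dict[str, list]) -> dict[str, list]: ...


class FakeNativeEngine:
    """Hypothetical stand-in for NativePolarsEngine (no Rust extension)."""

    capabilities = EngineCapabilities(backend="native")

    def make_plan(self, columns):
        return tuple(columns)  # opaque plan handle

    def execute_plan(self, plan, data):
        return {name: list(data[name]) for name in plan}


_default_engine: Optional[ExecutionEngine] = None


def get_default_engine() -> ExecutionEngine:
    # Construct the default lazily, on first use.
    global _default_engine
    if _default_engine is None:
        _default_engine = FakeNativeEngine()
    return _default_engine


def set_default_engine(engine: Optional[ExecutionEngine]) -> None:
    # Passing None resets the default; the next get rebuilds it lazily.
    global _default_engine
    _default_engine = engine


class DataFrame:
    def __init__(self, data, engine=None):
        self._data = data
        self._engine = engine if engine is not None else get_default_engine()
        # The plan handle stays opaque to the frame.
        self._rust_plan = self._engine.make_plan(list(data))

    def collect(self):
        # Materialization is routed through self._engine only.
        return self._engine.execute_plan(self._rust_plan, self._data)
```

Because the frame never reaches past self._engine, swapping in another object with the same members changes the backend without touching DataFrame.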
Consequences
- New engine work should implement ExecutionEngine where possible and document gaps via EngineCapabilities and UnsupportedEngineOperationError.
- Tests that monkeypatched pydantable.dataframe._impl.execute_plan should patch NativePolarsEngine.execute_plan (or the frame’s engine class) so DataFrame dispatch stays consistent.
- Tests that monkeypatched rust_engine._RUST_CORE should patch pydantable_native._binding._RUST_CORE instead so NativePolarsEngine sees the fake.
- scripts/check_engine_bypass.py (run via make engine-bypass-check and in CI) fails if new code under python/pydantable/ imports the native extension directly, uses get_default_engine().rust_core, or performs a similar bypass outside the allowlist. The allowlist covers the whole python/pydantable/engine/ tree, python/pydantable/_extension.py, and python/pydantable/rust_engine.py.
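A bypass check of the kind described in the last item could be sketched with the standard ast module. This is not the contents of scripts/check_engine_bypass.py; the function name find_bypasses, the message format, and the exact matching rules are assumptions, and only the allowlist paths and the two bypass patterns come from the text above.

```python
from __future__ import annotations

import ast

# Patterns named in the text; the real script's rules may differ.
FORBIDDEN_MODULE = "pydantable_native._core"
ALLOWLIST_PREFIXES = (
    "python/pydantable/engine/",
    "python/pydantable/_extension.py",
    "python/pydantable/rust_engine.py",
)


def _imported_modules(node: ast.AST) -> list[str]:
    """Return dotted module names that an import node may bind."""
    if isinstance(node, ast.Import):
        return [alias.name for alias in node.names]
    if isinstance(node, ast.ImportFrom):
        base = node.module or ""
        # "from pkg import _core" can import the submodule pkg._core.
        return [base] + [f"{base}.{alias.name}" for alias in node.names]
    return []


def find_bypasses(path: str, source: str) -> list[str]:
    """List bypass findings for one file, ordered by line number."""
    if path.startswith(ALLOWLIST_PREFIXES):
        return []  # allowlisted files may touch the extension
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        names = _imported_modules(node)
        if any(
            name == FORBIDDEN_MODULE or name.startswith(FORBIDDEN_MODULE + ".")
            for name in names
        ):
            found.append((node.lineno, "direct import of native extension"))
        if isinstance(node, ast.Attribute) and node.attr == "rust_core":
            found.append((node.lineno, ".rust_core access"))
    return [f"{path}:{line}: {message}" for line, message in sorted(found)]
```

A driver script would walk python/pydantable/, call find_bypasses on each file, and exit non-zero if any findings come back.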
Extension checklist (custom backend)
End-to-end guide for shipping a third-party engine on PyPI: CUSTOM_ENGINE_PACKAGE. In-repo optional integrations: SQL_ENGINE (lazy-SQL stack), MONGO_ENGINE (MongoPydantableEngine in pydantable, MongoRoot from the Mongo plan stack for MongoDataFrame). Eager Mongo column-dict helpers (fetch_mongo / write_mongo, afetch_mongo / awrite_mongo, … — PyMongo) are normal I/O, not a third-party ExecutionEngine package.
- Implement ExecutionEngine (see pydantable_protocol, re-exported under pydantable.engine.protocols); mirror NativePolarsEngine for the operations you support.
- Return accurate capabilities (set backend to "custom" when not native/stub).
- For unsupported calls, raise UnsupportedEngineOperationError with a clear message.
- If users should build Expr against your default engine, call set_expression_runtime(...) to register a runtime (native does this implicitly via get_expression_runtime()).
- Keep StubExecutionEngine and tests/test_engine_contract.py in sync when ExecutionEngine gains new members (contract tests use typing_extensions.get_protocol_members).
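The first three checklist items can be sketched as follows. Everything here is a local stand-in so the example runs without pydantable-protocol installed: the error class, the trimmed protocol, the supports_rolling flag, the ListEngine backend, and the method signatures are hypothetical. A real package would import ExecutionEngine and UnsupportedEngineOperationError from pydantable_protocol instead. The conformance check uses a runtime_checkable protocol with isinstance as a simple substitute for the contract tests mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal, Protocol, runtime_checkable


class UnsupportedEngineOperationError(Exception):
    """Local stand-in; real code imports this from pydantable_protocol."""


@dataclass(frozen=True)
class EngineCapabilities:
    backend: Literal["native", "stub", "custom"]
    supports_rolling: bool = False  # hypothetical feature flag


@runtime_checkable
class ExecutionEngine(Protocol):
    # Trimmed to three members; signatures are assumptions.
    capabilities: EngineCapabilities

    def execute_plan(self, plan: Any) -> Any: ...

    def plan_rolling_agg(self, plan: Any, window: int) -> Any: ...


class ListEngine:
    """Hypothetical custom backend; note there is no inheritance."""

    # Advertise the gap so callers can check before calling.
    capabilities = EngineCapabilities(backend="custom", supports_rolling=False)

    def execute_plan(self, plan):
        return list(plan)

    def plan_rolling_agg(self, plan, window):
        # Unsupported operations fail loudly with a clear message.
        raise UnsupportedEngineOperationError(
            "ListEngine does not support rolling aggregations"
        )


engine = ListEngine()
# isinstance only checks that the protocol members are present.
conforms = isinstance(engine, ExecutionEngine)
```

The isinstance check confirms member presence only, not signatures or behavior, so it complements rather than replaces a contract test suite.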
Track B (optional, not scheduled)
A portable Python expression IR shared by multiple backends would require a separate roadmap (Rust bridge, typing, and parity tests). Eager file I/O helpers are provided by pydantable-native alongside the extension today; routing them through ExecutionEngine would be a separate phase if a single choke point for all native entry points is desired.