
Engine API parity (typed surface)

This page tracks API parity across execution engines for the typed pydantable surface (DataFrame[Schema] / DataFrameModel), independent of alternate naming surfaces (pydantable.pandas, pydantable.pyspark).

It is intentionally smaller than the full Interface contract: this is a checklist-style matrix used to prevent drift between engines.

Contract: typed-first vs engine-native expressions

PydanTable supports two kinds of expression inputs:

  • Typed expressions: pydantable.expressions.Expr (built by the Rust expression runtime).
  • Engine-native expressions: backend-specific expression objects (e.g. pyspark.sql.Column).

Policy (current):

  • Core typed API (pydantable.DataFrame): typed expressions (Expr) everywhere.
  • Engine-backed DataFrames may accept engine-native expressions when their engine executes outside the native Rust expression runtime.
  • Alternate surfaces (pydantable.pandas, pydantable.pyspark) provide ergonomic wrappers and may accept typed expressions even when the underlying engine-backed class uses engine-native expressions.

In other words: typed-first for the core, engine-native where required, with explicit wrappers/adapters for consistency.
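As a rough illustration of this policy, the sketch below checks which kind of expression a given surface accepts. The classes TypedExpr and NativeColumn and the function check_filter_arg are hypothetical stand-ins invented for this example; they are not part of pydantable, and the real library may enforce the policy differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedExpr:
    """Hypothetical stand-in for pydantable.expressions.Expr."""
    text: str


@dataclass(frozen=True)
class NativeColumn:
    """Hypothetical stand-in for an engine-native object such as pyspark.sql.Column."""
    text: str


def check_filter_arg(surface: str, condition: object) -> str:
    """Return the accepted expression kind, or raise if the policy rejects it.

    surface is "core" (typed expressions only) or "engine-native"
    (an engine-backed class that executes outside the typed runtime).
    """
    if surface == "core":
        if isinstance(condition, TypedExpr):
            return "typed"
        raise TypeError("core typed API expects a typed expression")
    if surface == "engine-native":
        if isinstance(condition, NativeColumn):
            return "native"
        raise TypeError("this engine-backed class expects an engine-native expression")
    raise ValueError(f"unknown surface: {surface!r}")


print(check_filter_arg("core", TypedExpr("age > 18")))
print(check_filter_arg("engine-native", NativeColumn("age > 18")))
```

Failing loudly on the wrong expression kind, rather than coercing silently, is one way to keep the typed and engine-native paths from being mixed by accident.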

Transform surface matrix (high-signal core ops)

Legend:

  • Y: supported with typed expressions (Expr)
  • N: not supported
  • Native: supported but expects engine-native expressions

Transform            Core (native)   SQL   Mongo   Spark
select               Y               Y     Y       Y
with_columns         Y               Y     Y       Native
filter               Y               Y     Y       Native
join                 Y               Y     Y       Y
group_by             Y               Y     Y       Y
sort                 Y               Y     Y       Y
limit                Y               Y     Y       Y
distinct / unique    Y               Y     Y       Y
pivot                Y               Y     Y       Y
explode              Y               Y     Y       Y
concat               Y               Y     Y       Y

Notes:

  • Spark (SparkDataFrame) is backed by raikou-core and expects Spark-native expressions for filter / with_columns. Use the PySpark-shaped wrapper export to keep the typed Expr experience (see below).
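One way to use the matrix as a drift check is to encode it as data and query it from a test. The sketch below copies the statuses from the matrix on this page into a plain dictionary; the names ENGINES, MATRIX, validate and cells_with are assumptions made for this example, not pydantable APIs.

```python
ENGINES = ("core", "sql", "mongo", "spark")
ALLOWED = {"Y", "N", "Native"}

# One row per transform, one status per engine, in ENGINES order.
MATRIX = {
    "select": ("Y", "Y", "Y", "Y"),
    "with_columns": ("Y", "Y", "Y", "Native"),
    "filter": ("Y", "Y", "Y", "Native"),
    "join": ("Y", "Y", "Y", "Y"),
    "group_by": ("Y", "Y", "Y", "Y"),
    "sort": ("Y", "Y", "Y", "Y"),
    "limit": ("Y", "Y", "Y", "Y"),
    "distinct / unique": ("Y", "Y", "Y", "Y"),
    "pivot": ("Y", "Y", "Y", "Y"),
    "explode": ("Y", "Y", "Y", "Y"),
    "concat": ("Y", "Y", "Y", "Y"),
}


def validate(matrix: dict[str, tuple[str, ...]]) -> None:
    """Fail if a row has the wrong width or uses an unknown status."""
    for transform, row in matrix.items():
        if len(row) != len(ENGINES):
            raise ValueError(f"{transform}: expected {len(ENGINES)} statuses")
        bad = set(row) - ALLOWED
        if bad:
            raise ValueError(f"{transform}: unknown status {sorted(bad)}")


def cells_with(matrix: dict[str, tuple[str, ...]], status: str) -> list[tuple[str, str]]:
    """Return sorted (transform, engine) pairs that have the given status."""
    return sorted(
        (transform, engine)
        for transform, row in matrix.items()
        for engine, cell in zip(ENGINES, row)
        if cell == status
    )


validate(MATRIX)
print(cells_with(MATRIX, "Native"))
```

A test that pins the expected list of Native and N cells would then fail whenever an engine's behavior changes without the matrix being updated.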

When you want a consistent UI across engines, prefer these wrappers:

  • Pandas-shaped: pydantable.pandas.{SqlDataFrame,SparkDataFrame,MongoDataFrame}
  • PySpark-shaped: pydantable.pyspark.{SqlDataFrame,SparkDataFrame,MongoDataFrame}

These wrappers exist to keep method names and (where feasible) expression types consistent.
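The adapter idea behind such wrappers can be sketched as follows: the wrapper accepts a typed expression, lowers it to whatever the engine-backed class expects, and delegates. Everything here (TypedExpr, NativeBackedFrame, lower, WrappedFrame, and the use of a plain string as the "native" form) is a hypothetical simplification and does not reflect how pydantable.pandas or pydantable.pyspark are implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypedExpr:
    """Hypothetical typed comparison: <column> <op> <value>."""
    column: str
    op: str
    value: object


@dataclass
class NativeBackedFrame:
    """Hypothetical engine-backed frame that only accepts native (string) conditions."""
    applied: list[str] = field(default_factory=list)

    def filter(self, condition: str) -> "NativeBackedFrame":
        if not isinstance(condition, str):
            raise TypeError("expected an engine-native condition")
        self.applied.append(condition)
        return self


def lower(expr: TypedExpr) -> str:
    """Translate the typed expression into the engine's native form."""
    return f"{expr.column} {expr.op} {expr.value!r}"


class WrappedFrame:
    """Wrapper that keeps the typed-expression experience on top of a native engine."""

    def __init__(self, inner: NativeBackedFrame) -> None:
        self._inner = inner

    def filter(self, condition: object) -> "WrappedFrame":
        # Typed expressions are lowered; native conditions pass through unchanged.
        native = lower(condition) if isinstance(condition, TypedExpr) else condition
        self._inner.filter(native)
        return self


inner = NativeBackedFrame()
WrappedFrame(inner).filter(TypedExpr("age", ">", 18)).filter("country = 'NZ'")
print(inner.applied)
```

Keeping the lowering step in one function means the wrapper's method names can match the core API while the engine-backed class stays unchanged.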

Engine-specific typed methods (additive)

Each engine may also expose additive methods that preserve PydanTable’s typing by validating against the schema and keeping raw-string escape hatches explicit.

  • SQL (SqlDataFrame):
      • from_sql(selectable, ...): build a lazy root from an existing SQLAlchemy selectable.
      • where(whereclause): push down a SQLAlchemy WHERE clause with schema validation.
  • Mongo (MongoDataFrame):
      • match(filter: dict[str, Any]): $match-style filter pushdown with schema key validation.
      • project(fields: Sequence[str] | dict[str, int]): typed projection (schema update).
  • Spark (SparkDataFrame):
      • where_native(condition): explicit engine-native filter (PySpark Column).
      • select_native(*cols): explicit engine-native projection for simple column refs.
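To make "schema key validation" concrete, here is a minimal sketch of how a $match-style filter could be checked against a set of schema field names before being pushed down. The helper unknown_match_keys is hypothetical, and it assumes the schema is available as a plain set of field names; the actual validation in MongoDataFrame.match may differ.

```python
from typing import Any


def unknown_match_keys(filter_doc: dict[str, Any], schema_fields: set[str]) -> list[str]:
    """Return the filter keys that do not name a schema field.

    Top-level operators that hold a list of sub-filters (such as $and or $or)
    are checked recursively. Dotted paths are validated by their first segment.
    """
    unknown: list[str] = []
    for key, value in filter_doc.items():
        if key.startswith("$"):
            if isinstance(value, list):
                for sub_filter in value:
                    if isinstance(sub_filter, dict):
                        unknown.extend(unknown_match_keys(sub_filter, schema_fields))
            continue
        if key.split(".", 1)[0] not in schema_fields:
            unknown.append(key)
    return unknown


fields = {"name", "age", "address"}
print(unknown_match_keys({"age": {"$gt": 18}, "nmae": "Ada"}, fields))
print(unknown_match_keys({"$or": [{"address.city": "Oslo"}, {"city": "Oslo"}]}, fields))
```

Rejecting a filter when this list is non-empty catches misspelled field names before the query reaches the database, where an unknown key would otherwise match nothing without raising an error.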