Polars Parity Scorecard
This scorecard tracks practical parity for the currently implemented pydantable
surface.
Status definitions:
- Implemented: available and covered by contract/parity tests.
- Partial: available with explicit constraints or reduced semantics.
- Missing: not yet exposed as a stable API.
1.8.0 parity targets (reference)
1.8.0 shipped the selector/ergonomics work below (see POLARS_PARITY_1_8 and CHANGELOG 1.8.0). This table remains the checklist-style reference for what landed in that minor release.
| Area | Target | Status | Notes |
|---|---|---|---|
| Core | select_all() / select_prefix() / select_suffix() | Implemented | Schema-driven selectors (no wildcard/regex DSL). |
| Core | select supports explicit expression aliasing | Implemented | select((expr).alias("x")) for computed expressions; plain Expr requires ColumnRef or global agg. |
| Core | with_columns positional aliased expressions | Implemented | Keep kwargs; add with_columns(expr.alias("x"), ...). |
| Core | sort(..., maintain_order=...) | Implemented | maintain_order=True uses stable sort semantics in the Polars engine. |
| Core | drop(..., strict=...) | Implemented | strict=False ignores missing columns (no-op if all are missing). |
| Core | rename(..., strict=...) | Implemented | strict=False ignores missing rename keys. |
| Core | unique/distinct(..., maintain_order=...) | Implemented | maintain_order=True uses stable-unique semantics in the Polars engine. |
| GroupBy | Group-by convenience methods (sum/mean/min/max/count/len) | Implemented | Deterministic naming (<col>_sum, etc.) and len via synthetic constant column. |
| GroupBy | group_by(..., maintain_order=..., drop_nulls=...) | Implemented | maintain_order=True is stable; drop_nulls=False retains null-key groups. |
| Join | join(..., coalesce=...) | Partial | Implemented for left_on/right_on name keys (incl. multi-key) on inner/left/right/semi/anti, plus typed-safe full (matching key base dtypes only). Expression keys supported only for simple ColumnRef refs; computed expr keys remain unsupported. coalesce=False preserves both keys when schema-safe (may raise for collisions). |
| Join | join(..., validate=...) | Implemented | Cardinality checks supported for in-memory roots and scan roots (explicit cost). |
| Join | join(..., join_nulls=..., maintain_order=...) | Implemented | join_nulls controls null key matching (Polars nulls_equal). maintain_order supports 'none', 'left', or 'right' plus bool mapping; both work for scan roots. |
| Join | join(..., allow_parallel=..., force_parallel=...) | Partial | Arguments are accepted but currently raise NotImplementedError (engine API not exposed in pinned Polars Rust join args). |
| Reshape | pivot(..., sort_columns=..., separator=...) | Implemented | sort_columns=True sorts pivot-value column generation; separator controls generated output names. |
| Utilities | sample, shift, null_count, is_empty | Implemented | Eager helpers (materialize via to_dict()), returning a new DataFrame (or dict for null_count). |
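The strict=False rows for drop and rename in the table above can be illustrated with a plain-Python sketch over a columnar dict (the shape to_dict() is described as returning). This is not pydantable's implementation: the `drop` and `rename` functions below are hypothetical stand-ins written only to show the documented behavior, and raising KeyError in strict mode is an assumption.

```python
from typing import Any

# Columnar data: column name -> list of values (hypothetical stand-in for a table).
Columns = dict[str, list[Any]]


def drop(data: Columns, *names: str, strict: bool = True) -> Columns:
    """Return a copy of `data` without the named columns."""
    missing = [n for n in names if n not in data]
    if strict and missing:
        # Assumption: strict mode reports unknown columns as an error.
        raise KeyError(f"columns not found: {missing}")
    # With strict=False unknown names are skipped; if every name is unknown,
    # the result equals the input (a no-op).
    return {k: v for k, v in data.items() if k not in names}


def rename(data: Columns, mapping: dict[str, str], *, strict: bool = True) -> Columns:
    """Return a copy of `data` with columns renamed according to `mapping`."""
    missing = [old for old in mapping if old not in data]
    if strict and missing:
        raise KeyError(f"rename keys not found: {missing}")
    # Only columns that exist are looked up, so missing keys are ignored.
    return {mapping.get(k, k): v for k, v in data.items()}


table = {"id": [1, 2], "score": [0.5, 0.9]}
dropped = drop(table, "score", "ghost", strict=False)   # "ghost" is skipped
renamed = rename(table, {"score": "s", "ghost": "g"}, strict=False)
```

The sketch leaves out name collisions after a rename; the matrix notes that collisions raise explicit errors in the real API.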
| Area | Method/Capability | Status | Notes |
|---|---|---|---|
| Core | select, with_columns, filter | Implemented | Typed expression validation and SQL-like null filter behavior. |
| Materialization | collect() (default), to_dict(), to_polars() | Implemented | Default collect() → list[BaseModel]; to_dict() columnar dict; to_polars() requires optional Python polars (pydantable[polars]). |
| Core | sort, unique/distinct, drop, rename, slice/head/tail, concat | Implemented | Contract-tested; deterministic schema propagation. |
| Null/type | fill_null, drop_nulls, cast, is_null, is_not_null | Implemented | Includes error contracts and nullable schema derivation. |
| Core | limit/first/last/top_k/bottom_k (schema-first helpers) | Implemented | Convenience wrappers over slice/sort with deterministic schemas. |
| Core | Selection/rename ergonomics (select(exclude=...), reorder helpers, rename prefix/suffix/replace) | Implemented | Schema-driven column selection and naming helpers; collisions and empty selector matches raise explicit errors. |
| Core | Selector coverage expansions (cast/fill/rename/projection helpers) | Implemented | with_columns_cast, with_columns_fill_null, select_schema, and selector-driven rename conveniences (upper/lower/title/strip). |
| Core | with_row_count / clip / drop_nulls(how=..., threshold=...) / pipe | Implemented | Schema-first helpers: row numbering via plan step, numeric clamping via typed expressions, and row filtering with Polars-style null rules. |
| Join | inner/left/right/full/semi/anti/cross | Implemented | Includes expression key support and suffix collision policy. |
| GroupBy | count/sum/mean/min/max/median/std/var/first/last/n_unique | Implemented | SQL-like all-null-group behavior documented/tested. |
| Reshape | melt/unpivot, pivot, pivot_longer/pivot_wider | Implemented | Deterministic output naming and validation rules; pivot_longer/pivot_wider are aliases. Selectors supported across reshape args (schema-driven). |
| Pandas UI | duplicated, drop_duplicates(keep=False), get_dummies, cut/qcut, factorize_column, ewm().mean(), façade pivot | Partial / Implemented | Duplicate mask + drop-duplicate-groups are plan steps on the Polars engine (is_unique / is_first_distinct features); encoding/binning/ewm paths are eager and may require pandas at runtime. Tests: tests/test_pandas_ui.py, tests/test_pandas_ui_popular_features.py. |
| Reshape | explode, unnest, explode_all/unnest_all | Implemented | Polars-backed; multi-column explode, empty lists, struct unnest naming, and mismatch errors are contract-tested. explode_all/unnest_all are schema-driven helpers (list/struct dtype groups). Typed-schema rules (homogeneous lists, nested models as structs) are the intentional boundary vs raw Polars. |
| Window/time | row_number/rank/dense_rank/window_sum/window_mean/window_min/window_max/lag/lead + WindowSpec, rolling_agg, group_by_dynamic(...).agg(...) | Implemented | Window.orderBy(..., nulls_last=...) (NULLS FIRST/LAST); row_number requires order_by; lag/lead require order_by; generic Expr.over(partition_by=..., order_by=...) raises TypeError (use named window fns + WindowSpec). rowsBetween / rangeBetween framed windows use the Rust executor path; rangeBetween uses the first orderBy column as the range axis (WINDOW_SQL_SEMANTICS.md). Unframed multi-key .over: Polars accepts one SortOptions for all order columns, so mixed per-key ascending / nulls_last raises ValueError; use matching options on every key or a framed window. |
| Temporal typing | datetime, date, duration, time (+ nullable) | Implemented | End-to-end descriptor roundtrip and execution materialization paths. |
| Globals in select | sum/mean/count/min/max over a column, global_row_count / count(*) | Implemented | Single-row DataFrame.select; see INTERFACE_CONTRACT. |
| Expr helpers | strptime, unix_timestamp, from_unix_time, dt_dayofyear, cast(str→date/datetime), isin/is_in, matches, string empty/blank helpers, list/map predicate conveniences, map_len/map_get/map_contains_key, binary_len, dt_nanosecond | Implemented | Rust ExprNode + composed Python Expr helpers; contract tests. |
| Performance | Guardrails for major transforms | Implemented | Lightweight regression checks in test suite. |
| Ecosystem | Optional interfaces pandas and pyspark | Implemented | Alternate import/naming surfaces; execution is the same Rust core as default (not native pandas/Spark). 0.17.0: PySpark sql.functions adds string/list/bytes helpers (str_replace, strip_*, strptime, binary_len, list_*) as thin wrappers over core Expr. 0.20.0: PySpark UI DataFrame.show() / summary(); core + façades share columns / shape / info / describe (including date / datetime stats; see INTERFACE_CONTRACT Introspection). |
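Two null-handling notes in the matrix ("SQL-like null filter behavior" and "SQL-like all-null-group behavior") are easier to read with a concrete case. The sketch below is plain Python, not pydantable code. It assumes "SQL-like" means standard SQL semantics: a filter keeps only rows whose predicate is true (an unknown result from a null input is dropped), and SUM over a group whose values are all null yields null. The helpers `sql_filter`, `gt`, and `sql_sum` are hypothetical names.

```python
from typing import Callable, Optional

Row = dict[str, object]


def gt(column: str, threshold: float) -> Callable[[Row], Optional[bool]]:
    """Null-propagating `column > threshold`: a None input gives None."""
    def predicate(row: Row) -> Optional[bool]:
        value = row[column]
        return None if value is None else value > threshold
    return predicate


def sql_filter(rows: list[Row], predicate: Callable[[Row], Optional[bool]]) -> list[Row]:
    """Keep rows whose predicate is exactly True; False and None are dropped."""
    return [row for row in rows if predicate(row) is True]


def sql_sum(values: list[Optional[float]]) -> Optional[float]:
    """SUM that skips nulls and returns None when every input is null."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


rows = [
    {"team": "a", "score": 3.0},
    {"team": "a", "score": None},
    {"team": "b", "score": None},
]

# Only the first row survives: the other predicates evaluate to None.
kept = sql_filter(rows, gt("score", 1.0))

# Group by team, then aggregate each group with the null-aware SUM.
groups: dict[str, list[Optional[float]]] = {}
for row in rows:
    groups.setdefault(row["team"], []).append(row["score"])
totals = {team: sql_sum(values) for team, values in groups.items()}
```

Here `totals` is `{"a": 3.0, "b": None}`: team "b" has only null scores, so its sum is null rather than 0. The project's contract documentation remains the authority on the exact rules per aggregation.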
Remaining parity gaps
- Arbitrary Polars nested/list dtypes without a matching Pydantic list[T] / struct annotation are out of scope; the engine stays schema-first.
- Window frame semantics match the documented PostgreSQL-style RANGE rules for multi-key orderBy, not every SQL dialect; see WINDOW_SQL_SEMANTICS.md.
- Additional advanced analytical APIs outside the current roadmap scope.
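To make the window-frame item concrete, here is a minimal plain-Python sketch of a RANGE-framed sum. It takes the range axis to be the first orderBy column, which is what the matrix says of rangeBetween, and it simplifies to a single partition, ascending order, and no nulls. The function `range_sum` and its parameters are hypothetical and do not reproduce the Rust executor path. See WINDOW_SQL_SEMANTICS.md for the actual rules.

```python
from typing import Any


def range_sum(
    rows: list[dict[str, Any]],
    order_key: str,
    value_key: str,
    preceding: float,
    following: float,
) -> list[float]:
    """For each row, sum `value_key` over rows whose `order_key` lies within
    [current - preceding, current + following] (a value-based RANGE frame)."""
    result = []
    for current in rows:
        low = current[order_key] - preceding
        high = current[order_key] + following
        result.append(
            sum(r[value_key] for r in rows if low <= r[order_key] <= high)
        )
    return result


events = [
    {"t": 1, "v": 10},
    {"t": 2, "v": 20},
    {"t": 2, "v": 5},   # peer of the previous row: same order value
    {"t": 5, "v": 1},
]

# Frame: from 1 unit before the current order value up to the current value.
sums = range_sum(events, "t", "v", preceding=1, following=0)
```

The two rows with t == 2 are peers, so they share the same frame and both get 35. A row-count (ROWS) frame would treat them differently, which is one reason dialects diverge once several order keys are involved.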
0.18.0: No new rows for table methods or PySpark functions. This release focused on internals (clearer group_by/Polars error context) and documentation, and deferred non-string map keys; see ROADMAP.md, Shipped in 0.18.0.
0.19.0: Scorecard matrix unchanged—pre-1.0 doc consolidation, VERSIONING.md, and CI-stable grouped tests; see ROADMAP.md Shipped in 0.19.0.
0.20.0: One ecosystem row update (see table)—UX / discovery on core + PySpark show / summary, plus value_counts, pydantable.display, _repr_mimebundle_, optional verbose plan errors; see ROADMAP.md Shipped in 0.20.0.