Polars Parity Scorecard

This scorecard tracks practical Polars parity for the currently implemented pydantable API surface.

Status definitions:

- Implemented: available and covered by contract/parity tests.
- Partial: available with explicit constraints or reduced semantics.
- Missing: not yet exposed as a stable API.

1.8.0 parity targets (reference)

1.8.0 shipped the selector/ergonomics work below (see POLARS_PARITY_1_8 and CHANGELOG 1.8.0). This table remains the checklist-style reference for what landed in that minor release.

| Area | Target | Status | Notes |
| --- | --- | --- | --- |
| Core | `select_all()` / `select_prefix()` / `select_suffix()` | Implemented | Schema-driven selectors (no wildcard/regex DSL). |
| Core | `select` supports explicit expression aliasing | Implemented | `select((expr).alias("x"))` for computed expressions; an un-aliased `Expr` must be a `ColumnRef` or a global aggregation. |
| Core | `with_columns` positional aliased expressions | Implemented | Keyword arguments still work; positional aliased expressions are also accepted: `with_columns(expr.alias("x"), ...)`. |
| Core | `sort(..., maintain_order=...)` | Implemented | `maintain_order=True` uses stable sort semantics in the Polars engine. |
| Core | `drop(..., strict=...)` | Implemented | `strict=False` ignores missing columns (a no-op if all are missing). |
| Core | `rename(..., strict=...)` | Implemented | `strict=False` ignores missing rename keys. |
| Core | `unique/distinct(..., maintain_order=...)` | Implemented | `maintain_order=True` uses stable-unique semantics in the Polars engine. |
| GroupBy | Group-by convenience methods (`sum/mean/min/max/count/len`) | Implemented | Deterministic output names (`<col>_sum`, etc.); `len` is computed via a synthetic constant column. |
| GroupBy | `group_by(..., maintain_order=..., drop_nulls=...)` | Implemented | `maintain_order=True` keeps groups in input order; `drop_nulls=False` retains null-key groups. |
| Join | `join(..., coalesce=...)` | Partial | Implemented for `left_on`/`right_on` name keys (including multi-key) on `inner`/`left`/`right`/`semi`/`anti`, plus type-safe `full` (only when key base dtypes match). Expression keys are supported only as simple `ColumnRef`s; computed expression keys remain unsupported. `coalesce=False` keeps both key columns when the schema allows it (may raise on name collisions). |
| Join | `join(..., validate=...)` | Implemented | Cardinality checks are supported for both in-memory roots and scan roots (at an explicit cost). |
| Join | `join(..., join_nulls=..., maintain_order=...)` | Implemented | `join_nulls` controls whether null keys match (Polars `nulls_equal`). `maintain_order` accepts `'none'\|'left'\|'right'` plus a bool mapping; both options also work for scan roots. |
| Join | `join(..., allow_parallel=..., force_parallel=...)` | Partial | The arguments are accepted but currently raise `NotImplementedError` (the engine API is not exposed in the pinned Polars Rust join args). |
| Reshape | `pivot(..., sort_columns=..., separator=...)` | Implemented | `sort_columns=True` sorts the columns generated from pivot values; `separator` controls the generated output names. |
| Utilities | `sample`, `shift`, `null_count`, `is_empty` | Implemented | Eager helpers (materialize via `to_dict()`), returning a new `DataFrame` (or a `dict` for `null_count`). |
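To make the group-by rows concrete, here is a minimal pure-Python sketch of the naming and null rules they describe, operating on a columnar dict like the one `to_dict()` returns. It does not call pydantable: `group_by_sum` is a hypothetical helper, and sorting the groups when `maintain_order=False` is an arbitrary choice of this sketch (no order is promised in that case). Returning `None` for an all-null group assumes the SQL-like all-null-group behavior listed under GroupBy in the scorecard.

```python
from typing import Any

# Columnar layout: column name -> list of values (None marks a null).
Columns = dict[str, list[Any]]


def group_by_sum(
    data: Columns,
    key: str,
    value_cols: list[str],
    maintain_order: bool = True,
    drop_nulls: bool = True,
) -> Columns:
    """Hypothetical sketch: sum `value_cols` per key, naming outputs `<col>_sum`."""
    groups: dict[Any, list[int]] = {}
    for i, k in enumerate(data[key]):
        if k is None and drop_nulls:
            continue  # drop_nulls=True discards rows whose key is null
        groups.setdefault(k, []).append(i)  # dicts keep first-appearance order

    keys = list(groups)
    if not maintain_order:
        # No order is implied; this sketch simply sorts, with a null key last.
        keys.sort(key=lambda k: (k is None, k))

    out: Columns = {key: keys}
    for col in value_cols:
        sums: list[Any] = []
        for k in keys:
            present = [data[col][i] for i in groups[k] if data[col][i] is not None]
            # SQL-like: a group whose values are all null sums to None, not 0.
            sums.append(sum(present) if present else None)
        out[f"{col}_sum"] = sums
    return out


data = {"k": ["a", "b", "a", None], "v": [1, None, 3, 5]}
print(group_by_sum(data, "k", ["v"]))                    # {'k': ['a', 'b'], 'v_sum': [4, None]}
print(group_by_sum(data, "k", ["v"], drop_nulls=False))  # {'k': ['a', 'b', None], 'v_sum': [4, None, 5]}
```

Keeping the null-key row as its own group under `drop_nulls=False` mirrors the "retains null-key groups" note; the `b` group shows the all-null case.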

| Area | Method/Capability | Status | Notes |
| --- | --- | --- | --- |
| Core | `select`, `with_columns`, `filter` | Implemented | Typed expression validation and SQL-like null filter behavior. |
| Materialization | `collect()` (default), `to_dict()`, `to_polars()` | Implemented | Default `collect()` → `list[BaseModel]`; `to_dict()` returns a columnar dict; `to_polars()` requires the optional Python `polars` package (`pydantable[polars]`). |
| Core | `sort`, `unique/distinct`, `drop`, `rename`, `slice/head/tail`, `concat` | Implemented | Contract-tested; deterministic schema propagation. |
| Null/type | `fill_null`, `drop_nulls`, `cast`, `is_null`, `is_not_null` | Implemented | Includes error contracts and nullable schema derivation. |
| Core | `limit/first/last/top_k/bottom_k` (schema-first helpers) | Implemented | Convenience wrappers over `slice`/`sort` with deterministic schemas. |
| Core | Selection/rename ergonomics (`select(exclude=...)`, reorder helpers, rename prefix/suffix/replace) | Implemented | Schema-driven column selection and naming helpers; collisions and empty selector matches raise explicit errors. |
| Core | Selector coverage expansions (cast/fill/rename/projection helpers) | Implemented | `with_columns_cast`, `with_columns_fill_null`, `select_schema`, and selector-driven rename conveniences (upper/lower/title/strip). |
| Core | `with_row_count` / `clip` / `drop_nulls(how=..., threshold=...)` / `pipe` | Implemented | Schema-first helpers: row numbering via a plan step, numeric clamping via typed expressions, and row filtering with Polars-style null rules. |
| Join | `inner/left/right/full/semi/anti/cross` | Implemented | Includes expression key support and a suffix collision policy. |
| GroupBy | `count/sum/mean/min/max/median/std/var/first/last/n_unique` | Implemented | SQL-like all-null-group behavior, documented and tested. |
| Reshape | `melt/unpivot`, `pivot`, `pivot_longer/pivot_wider` | Implemented | Deterministic output naming and validation rules; `pivot_longer/pivot_wider` are aliases. Selectors are supported across reshape arguments (schema-driven). |
| Pandas UI | `duplicated`, `drop_duplicates(keep=False)`, `get_dummies`, `cut`/`qcut`, `factorize_column`, `ewm().mean()`, façade `pivot` | Partial / Implemented | The duplicate mask and drop-duplicate-groups are plan steps on the Polars engine (`is_unique` / `is_first_distinct` features); encoding/binning/ewm paths are eager and may require pandas at runtime. Tests: `tests/test_pandas_ui.py`, `tests/test_pandas_ui_popular_features.py`. |
| Reshape | `explode`, `unnest`, `explode_all/unnest_all` | Implemented | Polars-backed; multi-column explode, empty lists, struct `unnest` naming, and mismatch errors are contract-tested. `explode_all/unnest_all` are schema-driven helpers (list/struct dtype groups). Typed-schema rules (homogeneous lists, nested models as structs) are the intentional boundary versus raw Polars. |
| Window/time | `row_number`/`rank`/`dense_rank`/`window_sum`/`window_mean`/`window_min`/`window_max`/`lag`/`lead` + `WindowSpec`, `rolling_agg`, `group_by_dynamic(...).agg(...)` | Implemented | `Window.orderBy(..., nulls_last=...)` (NULLS FIRST/LAST); `row_number`, `lag`, and `lead` require `order_by`; generic `Expr.over(partition_by=..., order_by=...)` raises `TypeError` (use the named window functions + `WindowSpec`). `rowsBetween` / `rangeBetween` framed windows use the Rust executor path; `rangeBetween` uses the first `orderBy` column as the range axis (`WINDOW_SQL_SEMANTICS.md`). Unframed multi-key `.over`: Polars accepts a single `SortOptions` for all order columns, so mixed per-key `ascending` / `nulls_last` raises `ValueError`; use matching options on every key or a framed window. |
| Temporal typing | `datetime`, `date`, `duration`, `time` (+ nullable) | Implemented | End-to-end descriptor round-trip and execution materialization paths. |
| Globals in `select` | `sum`/`mean`/`count`/`min`/`max` over a column, `global_row_count` / `count(*)` | Implemented | Single-row `DataFrame.select`; see `INTERFACE_CONTRACT`. |
| Expr helpers | `strptime`, `unix_timestamp`, `from_unix_time`, `dt_dayofyear`, `cast(str→date/datetime)`, `isin/is_in`, `matches`, string empty/blank helpers, list/map predicate conveniences, `map_len`/`map_get`/`map_contains_key`, `binary_len`, `dt_nanosecond` | Implemented | Rust `ExprNode` + composed Python `Expr` helpers; contract tests. |
| Performance | Guardrails for major transforms | Implemented | Lightweight regression checks in the test suite. |
| Ecosystem | Optional interfaces `pandas` and `pyspark` | Implemented | Alternate import/naming surfaces; execution uses the same Rust core as the default (not native pandas/Spark). 0.17.0: PySpark `sql.functions` adds string/list/bytes helpers (`str_replace`, `strip_`, `strptime`, `binary_len`, `list_`) as thin wrappers over core `Expr`. 0.20.0: PySpark UI `DataFrame.show()` / `summary()`; core and façades share `columns` / `shape` / `info` / `describe` (including `date` / `datetime` stats; see `INTERFACE_CONTRACT` Introspection). |
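The "SQL-like null filter behavior" noted for `filter` in the first Core row matches SQL three-valued logic: a comparison against a null is unknown, and a row survives only when its predicate is exactly true. The pure-Python sketch below assumes that reading; `gt`, `not_`, and `filter_rows` are hypothetical helpers, not pydantable APIs. It shows why negating a predicate does not bring null rows back.

```python
from typing import Any, Optional

Columns = dict[str, list[Any]]
Mask = list[Optional[bool]]  # True / False / None (unknown)


def gt(values: list[Any], threshold: Any) -> Mask:
    # A comparison with a null operand is unknown (None), not False.
    return [None if v is None else v > threshold for v in values]


def not_(mask: Mask) -> Mask:
    # NOT unknown is still unknown; only True/False flip.
    return [None if m is None else not m for m in mask]


def filter_rows(data: Columns, mask: Mask) -> Columns:
    # WHERE-style semantics: keep a row only when its predicate is True;
    # both False and None (unknown) drop the row.
    keep = [i for i, m in enumerate(mask) if m is True]
    return {name: [vals[i] for i in keep] for name, vals in data.items()}


data = {"id": [1, 2, 3], "x": [5, None, 1]}
mask = gt(data["x"], 2)               # [True, None, False]
print(filter_rows(data, mask))        # {'id': [1], 'x': [5]}
print(filter_rows(data, not_(mask)))  # {'id': [3], 'x': [1]}
```

Row 2 (null `x`) is excluded by both the predicate and its negation; selecting it requires an explicit `is_null` test.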

Remaining parity gaps

- Arbitrary Polars nested/list dtypes without a matching Pydantic `list[T]` / struct annotation are out of scope; the engine stays schema-first.
- Window frame semantics follow the documented PostgreSQL-style RANGE rules for multi-key `orderBy`, not those of every SQL dialect; see `WINDOW_SQL_SEMANTICS.md`.
- Additional advanced analytical APIs outside the current roadmap scope.
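The practical difference between `rowsBetween` and `rangeBetween` frames is how ties on the ordering key are handled. The sketch below assumes a single partition already sorted ascending by one non-null numeric key (the range axis), which sidesteps the null-ordering and multi-key rules in `WINDOW_SQL_SEMANTICS.md`. `rows_sum` and `range_sum` are hypothetical helpers mirroring PostgreSQL's `ROWS BETWEEN p PRECEDING AND f FOLLOWING` and `RANGE BETWEEN p PRECEDING AND f FOLLOWING`.

```python
from typing import Iterable, Optional


def _sql_sum(vals: Iterable[Optional[float]]) -> Optional[float]:
    # SQL SUM skips nulls and returns NULL when no non-null value remains.
    present = [v for v in vals if v is not None]
    return sum(present) if present else None


def rows_sum(values: list[Optional[float]], preceding: int, following: int) -> list[Optional[float]]:
    # ROWS frame: a fixed number of physical rows around the current row.
    return [
        _sql_sum(values[max(0, i - preceding): i + following + 1])
        for i in range(len(values))
    ]


def range_sum(keys: list[float], values: list[Optional[float]],
              preceding: float, following: float) -> list[Optional[float]]:
    # RANGE frame: every row whose key lies in [key - preceding, key + following],
    # so rows that tie on the key (peers) always share the same frame.
    out = []
    for k in keys:
        lo, hi = k - preceding, k + following
        out.append(_sql_sum(v for kk, v in zip(keys, values) if lo <= kk <= hi))
    return out


keys = [1, 2, 2, 5]
values = [10, 20, 30, 40]
print(rows_sum(values, 1, 0))         # [10, 30, 50, 70]
print(range_sum(keys, values, 1, 0))  # [10, 60, 60, 40]
```

With `ROWS`, the two rows keyed `2` get different sums (30 and 50) depending on physical position; with `RANGE` they are peers and both see 60.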

0.18.0: No new table-method or PySpark-function rows. This release focused on internals (clearer `group_by`/Polars error context) and documentation, and deferred non-string map keys; see ROADMAP.md (Shipped in 0.18.0).

0.19.0: Scorecard matrix unchanged. This release covered pre-1.0 documentation consolidation, VERSIONING.md, and CI-stable grouped tests; see ROADMAP.md (Shipped in 0.19.0).

0.20.0: One ecosystem row updated (see table): UX and discovery improvements on the core API plus PySpark `show` / `summary`, along with `value_counts`, `pydantable.display`, `_repr_mimebundle_`, and optional verbose plan errors; see ROADMAP.md (Shipped in 0.20.0).