
Polars Parity Scorecard

This scorecard tracks practical parity for the currently implemented pydantable surface.

Status definitions:
  • Implemented: available and covered by contract/parity tests.
  • Partial: available with explicit constraints or reduced semantics.
  • Missing: not yet exposed as a stable API.

1.8.0 parity targets (reference)

1.8.0 shipped the selector and ergonomics work listed below (see POLARS_PARITY_1_8 and CHANGELOG 1.8.0). This table remains the checklist-style reference for what landed in that minor release.

| Area | Target | Status | Notes |
| --- | --- | --- | --- |
| Core | select_all() / select_prefix() / select_suffix() | Implemented | Schema-driven selectors (no wildcard/regex DSL). |
| Core | select supports explicit expression aliasing | Implemented | select((expr).alias("x")) for computed expressions; an unaliased Expr must be a ColumnRef or a global aggregation. |
| Core | with_columns positional aliased expressions | Implemented | Keyword arguments still work; positional with_columns(expr.alias("x"), ...) is also accepted. |
| Core | sort(..., maintain_order=...) | Implemented | maintain_order=True uses stable sort semantics in the Polars engine. |
| Core | drop(..., strict=...) | Implemented | strict=False ignores missing columns (a no-op if all named columns are missing). |
| Core | rename(..., strict=...) | Implemented | strict=False ignores missing rename keys. |
| Core | unique/distinct(..., maintain_order=...) | Implemented | maintain_order=True uses stable-unique semantics in the Polars engine. |
| GroupBy | Group-by convenience methods (sum/mean/min/max/count/len) | Implemented | Deterministic output naming (`<col>_sum`, etc.); len is computed via a synthetic constant column. |
| GroupBy | group_by(..., maintain_order=..., drop_nulls=...) | Implemented | maintain_order=True is stable; drop_nulls=False retains null-key groups. |
| Join | join(..., coalesce=...) | Partial | Implemented for left_on/right_on name keys (including multi-key) on inner/left/right/semi/anti joins, plus type-safe full joins (only when key base dtypes match). Expression keys are supported only as simple ColumnRef references; computed expression keys remain unsupported. coalesce=False preserves both key columns when schema-safe (and may raise on name collisions). |
| Join | join(..., validate=...) | Implemented | Cardinality checks are supported for in-memory roots and scan roots (validation on scan roots has an explicit cost). |
| Join | join(..., join_nulls=..., maintain_order=...) | Implemented | join_nulls controls whether null keys match each other (Polars nulls_equal). maintain_order accepts 'none', 'left', or 'right', plus a bool mapping; both options work for scan roots. |
| Join | join(..., allow_parallel=..., force_parallel=...) | Partial | Arguments are accepted but currently raise NotImplementedError (the pinned Polars Rust join arguments do not expose this engine API). |
| Reshape | pivot(..., sort_columns=..., separator=...) | Implemented | sort_columns=True sorts the generated pivot-value columns; separator controls generated output names. |
| Utilities | sample, shift, null_count, is_empty | Implemented | Eager helpers (materialize via to_dict()) that return a new DataFrame (or a dict for null_count). |
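The two GroupBy rows above (deterministic `<col>_sum` naming, stable maintain_order=True, and drop_nulls=False keeping null-key groups) can be illustrated with a pure-Python model over lists of dicts. This is a sketch of the documented semantics, not pydantable's implementation: grouped_sum is a hypothetical helper, its drop_nulls=True default is an assumption, and returning None for a group whose values are all null assumes the SQL-like all-null-group behavior noted for GroupBy.

```python
from typing import Any

Row = dict[str, Any]


def grouped_sum(
    rows: list[Row], key: str, value: str, *, drop_nulls: bool = True
) -> list[Row]:
    """Hypothetical model of a grouped sum with maintain_order=True semantics."""
    # dicts keep insertion order, so groups come out in first-appearance order
    sums: dict[Any, Any] = {}
    for row in rows:
        k = row[key]
        if k is None and drop_nulls:
            continue  # drop_nulls=True discards rows whose group key is null
        sums.setdefault(k, None)  # register the group even if all its values are null
        v = row[value]
        if v is not None:
            # nulls are skipped; an all-null group keeps None (assumed SQL-like)
            sums[k] = v if sums[k] is None else sums[k] + v
    # the output column follows the documented "<col>_sum" naming rule
    return [{key: k, f"{value}_sum": s} for k, s in sums.items()]


rows = [
    {"city": "b", "sales": 2},
    {"city": "a", "sales": 5},
    {"city": None, "sales": 1},
    {"city": "b", "sales": 3},
    {"city": "c", "sales": None},
]
print(grouped_sum(rows, "city", "sales"))
# → [{'city': 'b', 'sales_sum': 5}, {'city': 'a', 'sales_sum': 5}, {'city': 'c', 'sales_sum': None}]
print(grouped_sum(rows, "city", "sales", drop_nulls=False))
# → [{'city': 'b', 'sales_sum': 5}, {'city': 'a', 'sales_sum': 5}, {'city': None, 'sales_sum': 1}, {'city': 'c', 'sales_sum': None}]
```

Using an insertion-ordered dict keeps the first-appearance order for free, which is what "stable" means for group output here.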
Scorecard matrix

| Area | Method/Capability | Status | Notes |
| --- | --- | --- | --- |
| Core | select, with_columns, filter | Implemented | Typed expression validation and SQL-like null filter behavior. |
| Materialization | collect() (default), to_dict(), to_polars() | Implemented | By default, collect() returns list[BaseModel]; to_dict() returns a columnar dict; to_polars() requires the optional Python polars package (pydantable[polars]). |
| Core | sort, unique/distinct, drop, rename, slice/head/tail, concat | Implemented | Contract-tested; deterministic schema propagation. |
| Null/type | fill_null, drop_nulls, cast, is_null, is_not_null | Implemented | Includes error contracts and nullable schema derivation. |
| Core | limit/first/last/top_k/bottom_k (schema-first helpers) | Implemented | Convenience wrappers over slice/sort with deterministic schemas. |
| Core | Selection/rename ergonomics (select(exclude=...), reorder helpers, rename prefix/suffix/replace) | Implemented | Schema-driven column selection and naming helpers; collisions and empty selector matches raise explicit errors. |
| Core | Selector coverage expansions (cast/fill/rename/projection helpers) | Implemented | with_columns_cast, with_columns_fill_null, select_schema, and selector-driven rename conveniences (upper/lower/title/strip). |
| Core | with_row_count / clip / drop_nulls(how=..., threshold=...) / pipe | Implemented | Schema-first helpers: row numbering via a plan step, numeric clamping via typed expressions, and row filtering with Polars-style null rules. |
| Join | inner/left/right/full/semi/anti/cross | Implemented | Includes expression key support and a suffix collision policy. |
| GroupBy | count/sum/mean/min/max/median/std/var/first/last/n_unique | Implemented | SQL-like all-null-group behavior is documented and tested. |
| Reshape | melt/unpivot, pivot, pivot_longer/pivot_wider | Implemented | Deterministic output naming and validation rules; pivot_longer/pivot_wider are aliases. Selectors are supported across reshape arguments (schema-driven). |
| Pandas UI | duplicated, drop_duplicates(keep=False), get_dummies, cut/qcut, factorize_column, ewm().mean(), façade pivot | Partial / Implemented | The duplicate mask and drop-duplicate-groups operations are plan steps on the Polars engine (is_unique / is_first_distinct features); encoding, binning, and ewm paths are eager and may require pandas at runtime. Tests: tests/test_pandas_ui.py, tests/test_pandas_ui_popular_features.py. |
| Reshape | explode, unnest, explode_all/unnest_all | Implemented | Polars-backed; multi-column explode, empty lists, struct unnest naming, and mismatch errors are contract-tested. explode_all/unnest_all are schema-driven helpers (list/struct dtype groups). Typed-schema rules (homogeneous lists, nested models as structs) are the intentional boundary versus raw Polars. |
| Window/time | row_number/rank/dense_rank/window_sum/window_mean/window_min/window_max/lag/lead + WindowSpec, rolling_agg, group_by_dynamic(...).agg(...) | Implemented | Window.orderBy(..., nulls_last=...) selects NULLS FIRST/LAST; row_number, lag, and lead require order_by; generic Expr.over(partition_by=..., order_by=...) raises TypeError (use named window functions + WindowSpec). rowsBetween / rangeBetween framed windows use the Rust executor path; rangeBetween uses the first orderBy column as the range axis (see WINDOW_SQL_SEMANTICS.md). Unframed multi-key .over: Polars accepts a single SortOptions for all order columns, so mixing per-key ascending / nulls_last raises ValueError; use matching options on every key, or use a framed window. |
| Temporal typing | datetime, date, duration, time (+ nullable) | Implemented | End-to-end descriptor round-trip and execution materialization paths. |
| Globals in select | sum/mean/count/min/max over a column, global_row_count / count(*) | Implemented | DataFrame.select returns a single-row result; see INTERFACE_CONTRACT. |
| Expr helpers | strptime, unix_timestamp, from_unix_time, dt_dayofyear, cast(str→date/datetime), isin/is_in, matches, string empty/blank helpers, list/map predicate conveniences, map_len/map_get/map_contains_key, binary_len, dt_nanosecond | Implemented | Rust ExprNode plus composed Python Expr helpers; contract-tested. |
| Performance | Guardrails for major transforms | Implemented | Lightweight regression checks in the test suite. |
| Ecosystem | Optional pandas and pyspark interfaces | Implemented | Alternate import/naming surfaces; execution uses the same Rust core as the default API (not native pandas/Spark). 0.17.0: PySpark sql.functions adds string/list/bytes helpers (str_replace, `strip_*`, strptime, binary_len, `list_*`) as thin wrappers over core Expr. 0.20.0: PySpark UI DataFrame.show() / summary(); core and façades share columns / shape / info / describe (including date / datetime stats; see INTERFACE_CONTRACT Introspection). |
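The SQL-like null filter behavior listed for select, with_columns, filter follows three-valued logic: a comparison involving null is itself null, and filter keeps only rows whose predicate is true. The pure-Python model below illustrates that rule. gt, not_, and sql_filter are hypothetical helpers, not pydantable's Expr API, and None stands in for SQL NULL.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Predicate = Callable[[Row], Optional[bool]]  # None plays the role of SQL NULL


def gt(column: str, threshold: float) -> Predicate:
    """Model of `column > threshold`: a null input yields NULL, not False."""
    def predicate(row: Row) -> Optional[bool]:
        value = row[column]
        return None if value is None else value > threshold
    return predicate


def not_(predicate: Predicate) -> Predicate:
    """Model of SQL NOT: NOT NULL is still NULL."""
    def negated(row: Row) -> Optional[bool]:
        result = predicate(row)
        return None if result is None else not result
    return negated


def sql_filter(rows: list[Row], predicate: Predicate) -> list[Row]:
    # Keep a row only when the predicate is exactly True; False and NULL both drop it.
    return [row for row in rows if predicate(row) is True]


rows = [{"x": 0}, {"x": 2}, {"x": None}]
print(sql_filter(rows, gt("x", 1)))        # → [{'x': 2}]
print(sql_filter(rows, not_(gt("x", 1))))  # → [{'x': 0}]
```

The null row appears in neither result: negating a null predicate still yields null, so a filter and its logical negation together do not necessarily cover every row.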

Remaining parity gaps

  • Arbitrary Polars nested/list dtypes without a matching Pydantic list[T] / struct annotation are out of scope; the engine stays schema-first.
  • Window frame semantics match the documented PostgreSQL-style RANGE rules for multi-key orderBy, not every SQL dialect; see WINDOW_SQL_SEMANTICS.md.
  • Additional advanced analytical APIs outside the current roadmap scope.
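The window-frame bullet refers to RANGE frames whose bounds are offsets on the first orderBy column (the Window/time row states that rangeBetween uses the first orderBy column as the range axis). A pure-Python sketch of that rule follows, under stated assumptions: range_frame_sum and the frame_sum output key are hypothetical names; start=-1, end=0 is meant to mirror rangeBetween(-1, 0), assuming offsets are relative to the current row's key value; partitions, nulls, and descending order are ignored. WINDOW_SQL_SEMANTICS.md remains the authoritative description.

```python
from typing import Any

Row = dict[str, Any]


def range_frame_sum(
    rows: list[Row], order_by: list[str], value: str, start: float, end: float
) -> list[Row]:
    """Hypothetical model of sum(value) over a RANGE frame [key + start, key + end].

    Only the first order_by column is the range axis; later columns only break
    ties when ordering the output. No partitions, no nulls, ascending order only.
    """
    axis = order_by[0]
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in order_by))
    out = []
    for current in ordered:
        low, high = current[axis] + start, current[axis] + end
        # RANGE compares key values, so rows sharing the same axis value (peers)
        # always fall into each other's frames, unlike a ROWS frame.
        total = sum(r[value] for r in ordered if low <= r[axis] <= high)
        out.append({**current, "frame_sum": total})
    return out


rows = [
    {"day": 1, "seq": 2, "amount": 10},
    {"day": 1, "seq": 1, "amount": 5},
    {"day": 2, "seq": 1, "amount": 7},
    {"day": 4, "seq": 1, "amount": 1},
]
for row in range_frame_sum(rows, ["day", "seq"], "amount", start=-1, end=0):
    print(row["day"], row["seq"], row["frame_sum"])
# 1 1 15
# 1 2 15
# 2 1 22
# 4 1 1
```

Both day-1 rows are peers on the range axis, so they receive the same total; the second key (seq) only decides the order in which they are reported.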

0.18.0: No new table-method or PySpark-function rows. This release focused on internals (clearer group_by/Polars error context), documentation, and deferring non-string map keys; see ROADMAP.md Shipped in 0.18.0.

0.19.0: Scorecard matrix unchanged. This release covered pre-1.0 documentation consolidation, VERSIONING.md, and CI-stable grouped tests; see ROADMAP.md Shipped in 0.19.0.

0.20.0: One Ecosystem row updated (see table) for UX and discoverability: core plus PySpark show / summary, value_counts, pydantable.display, _repr_mimebundle_, and optional verbose plan errors; see ROADMAP.md Shipped in 0.20.0.