Skip to content

Changelog

All notable changes to this project are documented here. The format is inspired by Keep a Changelog.

[Unreleased]

[1.19.2] — 2026-06-09

Documentation

  • Adoption-focused docs overhaul: installation guide with verify snippet and platform matrix, migration guide, glossary, best practices, expanded API reference (I/O, engines, FastAPI, errors), CONTRIBUTING.md and SECURITY.md, restructured MkDocs navigation (Get started / Guides / Recipes / Reference / Internals), forward-looking roadmap split from history, unified quickstart on DataFrameModel, and core analytics example.

Maintenance

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.19.2.

[1.19.1] — 2026-06-06

Maintenance

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.19.1.
  • Strengthen the test suite: native rust_engine roundtrips, cross-engine behavioral parity checks, join validation/order assertions, and targeted coverage cleanup.

[1.19.0] — 2026-04-20

Added

  • Engine pushdown methods (typed): SQL, Mongo, and Spark DataFrame surfaces add typed “native/pushdown” methods (and explicit guardrails) so engine-specific operations are available when the frame is backed by that engine, with clear errors when not supported.
  • Engine parity matrix + tests: a dedicated engine parity table plus tests/guardrails to keep engine capabilities and error messages consistent.
  • UI wrappers (optional engines): pandas- and PySpark-shaped wrapper modules for the Spark and Mongo DataFrame surfaces.
  • pydantable-rag: FastAPI backend for a documentation-grounded assistant (RAG), including a /chat-app UI, extractive mode, optional OpenAI-backed generation, and a committed SQLite vector index for deployment.

Documentation

  • Engines: updated SQL/Mongo/Spark engine docs plus new parity/guardrails documentation.
  • Docs assistant: floating docs chat launcher (MkDocs overrides + CSS/JS) and README pointers for the RAG backend and deployment.

Performance

  • Native engine (Rust / Polars execution): reduced materialization overhead and added benchmarks for native materialization hot paths.

CI / deployment

  • Docs assistant workflows: FastAPI Cloud deploy workflow for pydantable-rag, plus a workflow to verify the committed vector index and optionally rebuild it in CI when OPENAI_API_KEY is configured.

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.19.0.

[1.18.1] — 2026-04-16

Documentation

  • Versioning: Consolidated 2.0.0 migration pointers ( as_polars=, legacy SQL I/O names, moltres_engine=, Entei*Mongo* ) in VERSIONING Planned removals.

Maintenance

  • pydantable-core (Rust / PyO3): Migrated off deprecated IntoPy / .into_py(py) to IntoPyObject / into_py_any, updated tuple/list/dict constructors for PyO3 0.24, replaced PyErr::value_bound with value, and dropped crate-level allow(deprecated). Introduced shared literal_value_to_pyobject for LiteralValue → Python and moved datetime construction helpers to py_datetime.rs (used from expression serialization and Polars execution).
  • Narrowed optional-import except clauses (ImportError / OSError) where appropriate; documented intentional broad catches in I/O materialization fallbacks, Expr repr, and io.mongo.

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.18.1.

[1.18.0] — 2026-04-15

Added

  • PySpark (optional pydantable[spark]): SparkDataFrame / SparkDataFrameModel façade on raikou-core, with lazy pydantable.pyspark.sparkdantic integration for SparkDantic-based JVM schemas. Documentation and tests: SPARK_ENGINE, PYSPARK_UI, PYSPARK_PARITY, and related coverage.

Documentation

  • MkDocs + Material for MkDocs replace Sphinx for the published manual; Read the Docs and CI use mkdocs build --strict (mkdocs.yml, docs/mkdocs_hooks.py).
  • Lazy-SQL user guide file renamed MOLTRES_SQL.mdSQL_ENGINE.md (links and nav updated; content unchanged).

CI

  • Test installs use pip install -e ".[dev]" plus raikou-core / pyspark where needed; Windows skips @pytest.mark.spark (JVM / winutils constraints). Docs jobs include Spark stack for mkdocstrings imports.

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.18.0.

[1.17.0] — 2026-04-14

Added

  • Mongo engine (entei-core): pydantable.mongo_dataframeMongoDataFrame / MongoDataFrameModel facades (same pattern as pydantable.sql_dataframe + moltres-core). entei-core supplies MongoRoot (and materialization helpers); MongoPydantableEngine is implemented in pydantable.mongo_dataframe_engine.
  • Mongo / Beanie: pydantable.mongo_beanie.sync_pymongo_collection (also a lazy import from pydantable), MongoDataFrame.from_beanie, and MongoDataFrameModel.from_beanie for Beanie Document models with a sync pymongo.database.Database (Beanie uses async PyMongo). See MONGO_ENGINE.
  • Mongo I/O: fetch_mongo, iter_mongo, write_mongo and async afetch_mongo, aiter_mongo, awrite_mongo — eager dict[str, list] reads/writes against a PyMongo Collection (same import pattern as SQL I/O). pydantable[mongo] includes pymongo.

Documentation

  • MONGO_ENGINE — user guide for the optional Mongo engine (MongoDataFrame / MongoDataFrameModel, pydantable[mongo]).
  • Mongo: position Beanie Document models as the primary way to use pydantable with MongoDB (pydantable[mongo], from_beanie, sync_pymongo_collection); Pydantic Schema + from_collection documented as a supported alternative. See MONGO_ENGINE, README, and IO_DECISION_TREE.

Changed

  • Public API naming (SQL / Mongo): MongoDataFrame, MongoDataFrameModel, MongoPydantableEngine replace Entei*; modules pydantable.mongo_dataframe, pydantable.sql_dataframe are canonical (legacy paths mongo_entei, sql_moltres re-export). sql_engine_from_config and sql_engine= replace moltres_engine_*. pydantable[sql] includes eager SQLModel I/O, moltres-core, and the lazy SqlDataFrame stack (the separate lazy-sql and moltres extras are removed). Entei* and moltres_engine remain available with DeprecationWarning. User docs describe SQLAlchemy, PyMongo, and Beanie without third-party stack marketing names.
  • Optional dependencies: pydantable[mongo] includes Beanie (beanie>=1.24,<3) and pins entei-core to >=0.2.0,<0.3 (PyPI entei-core is versioned independently of pydantable 1.x). The separate mongo-beanie extra is removed — use pip install "pydantable[mongo]".
  • Optional dependencies: rapsqlite is not bundled in pydantable[sql] — install a DB-API driver for your database URL yourself (pip install rapsqlite only if you want the sqlite+rapsqlite dialect).

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.17.0.

[1.16.1] — 2026-04-13

Changed

  • Python: Narrow ImportError handling for optional dtype registry and native capabilities; debug logging on intentional broad I/O / Expr repr fallbacks; optional scan-column recovery iteration bound in _materialize_scan_fallback.
  • Rust: Split row-wise calendar/string helpers into expr/rowwise_support.rs.

Tests

  • DataFrame: Shared retail-style columnar scenarios (tests/_support/scenarios.py), multi-step workflow coverage (tests/dataframe/test_dataframe_realistic_workflows.py), Hypothesis strategies with Python oracles for discounted line totals and global sum/mean, and a global aggregate check on the scenario payload (tests/dataframe/test_global_agg.py).

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.16.1.

[1.16.0] — 2026-04-06

Added

  • Typing (DataFrame[Schema]): as_schema / try_as_schema / assert_schema with as_model / try_as_model / assert_model aliases.
  • Typing escape hatches (DataFrame[Schema]): *_as_schema (and *_as_model aliases) for join, melt, unpivot, and rolling_agg.
  • Typing contract tests: new Pyright + Astral ty snippet tests for the generic DataFrame API.
  • Docs: DataFrameModel “Pyright/ty golden path” section documenting the explicit *_as_model helpers.

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.16.0.

[1.15.2] — 2026-04-05

Fixed

  • _dtype_repr: On Python 3.9-3.10, builtin generics such as list[int] were shown as list because they are isinstance(..., type); resolve via get_origin / get_args like 3.11+.
  • Tooling: pyrightconfig.json include path points at tests/typing/test_pyright_dataframe_model_return_types.py (Pyright contract check).

Changed

  • Docs: Typing guide (ty vs mypy/Pyright, Any policy), troubleshooting (engine="auto" Rust fallback, async cancellation), I/O overview callouts, FASTAPI/DATAFRAMEMODEL non-deprecated SQL examples; DEVELOPER Rust unwrap note.
  • Typing: Astral ty rules (invalid-return-type, unsupported-operator at error); grouped-frame protocol; SQL afetch_sqlmodel / aiter_sqlmodel Sequence signatures; optional pyrightconfig-strict.json.
  • Tests: Scan-column engine-error regex regression tests; remove flaky wall-clock skip stub in async materialization tests; extra coverage for _describe_dtype, grouped agg, selectors, PySpark shims, and rust_engine delegation; Ruff-formatted typing stubs.

Version bump

  • Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.15.2.

[1.15.1] — 2026-04-04

Fixed

  • Typing: sql_moltres and DataFrameModel stubs aligned so mypy/pyright contract tests pass; pyrightconfig.json sets extraPaths for snippet imports.

Changed

  • Rust: Set-ops column assembly returns PyRuntimeError instead of panicking on unexpected internal map keys (exceptAll / intersectAll materialization).
  • Docs: DataFrame.join / DataFrameModel.join document that allow_parallel / force_parallel are not implemented; TYPING notes as_polars on collect / acollect deprecation for 2.0.
  • Version bump: Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published __version__ values to 1.15.1.

Tests

  • aiter_sqlmodel: early break, ThreadPoolExecutor, and batch_size=1 coverage.

[1.15.0] — 2026-04-03

Added

  • Optional Moltres integration: extra pydantable[lazy-sql] (install pydantable[sql] today — optional extras were consolidated) pulls moltres-core. New SqlDataFrame and SqlDataFrameModel in pydantable.sql_dataframe (also available as pydantable.SqlDataFrame / SqlDataFrameModel via lazy import) bind moltres_core.MoltresPydantableEngine using sql_config= (moltres_core.EngineConfig) or moltres_engine=. Helper moltres_engine_from_sql_config. User guide: SQL_ENGINE; protocol story: CUSTOM_ENGINE_PACKAGE.

Changed

  • pydantable[lazy-sql] (today pydantable[sql]) and [dev] now include greenlet so SQLAlchemy create_async_engine (for example sqlite+rapsqlite smoke tests and async SQLAlchemy workflows) works without an extra manual install.

  • Version bump: Align Python package metadata (pydantable, pydantable-protocol, pydantable-native), Rust crate pydantable-core, and published __version__ values to 1.15.0.

[1.14.1] — 2026-04-03

Changed

  • pydantable now requires pydantable-native at the same version, so pip install pydantable installs the Rust engine. Removed the pydantable-meta package and its release job.
  • Version bump: Align Python package metadata (pydantable, pydantable-protocol, pydantable-native), Rust crate pydantable-core, and published __version__ values to 1.14.1.

[1.14.0] — 2026-04-03

Added

  • Documentation: CUSTOM_ENGINE_PACKAGE — guide for authors publishing a separate engine package (dependencies, protocol implementation, wiring, expressions, I/O boundaries, testing, PyPI).
  • pydantable-protocol: zero-dependency distribution defining ExecutionEngine / PlanExecutor / SinkWriter, EngineCapabilities, UnsupportedEngineOperationError, and MissingRustExtensionError. Third-party engines (for example SQL backends) can pip install pydantable-protocol for typing and shared errors without depending on pydantable. pydantable pins the same version and re-exports protocols from pydantable.engine.protocols (and MissingRustExtensionError from pydantable._extension).
  • pydantable-native now depends only on pydantable-protocol (not pydantable). native_engine_capabilities lives in pydantable_native.capabilities. Tracing in the native engine uses pydantable.observe.span when pydantable is installed, otherwise a local pydantable_native._trace implementation (PYDANTABLE_TRACE behaves the same).

Changed

  • Internal: Introduced pydantable.engine (NativePolarsEngine, get_default_engine, get_expression_runtime) so execution is routed through a single abstraction; rust_engine remains a thin delegating module. See ADR-engines and DEVELOPER.
  • Version bump: Align Python package metadata ( pydantable, pydantable-protocol, pydantable-native, pydantable-meta), Rust crate pydantable-core, and published __version__ values to 1.14.0.

[1.13.0] — 2026-04-02

Added

  • SQLModel read I/O (Phase 0–1): pydantable[sql] includes sqlmodel. New APIs fetch_sqlmodel, iter_sqlmodel, afetch_sqlmodel, and aiter_sqlmodel in pydantable.io (also re-exported from pydantable), sharing batching and StreamingColumns semantics with fetch_sql_raw / iter_sql_raw. MissingOptionalDependency when sqlmodel is required but not installed.
  • SQLModel write I/O (Phase 2): write_sqlmodel, write_sqlmodel_batches, awrite_sqlmodel, awrite_sqlmodel_batches — DDL from SQLModel.__table__, replace_ok guard for if_exists="replace", optional validate_rows, strict column alignment. python/pydantable/io/sqlmodel_write.py; tests tests/test_sqlmodel_io_phase02.py.
  • SQLModel + DataFrameModel (Phase 3): classmethods fetch_sqlmodel, afetch_sqlmodel, iter_sqlmodel, aiter_sqlmodel, write_sqlmodel_data / awrite_sqlmodel_data; instance write_sqlmodel / awrite_sqlmodel; MyModel.Async.write_sqlmodelawrite_sqlmodel_data. python/pydantable/dataframe_model.py; stubs in python/pydantable/dataframe_model.pyi / typings/; tests tests/test_sqlmodel_dataframe_model.py.
  • Explicit string SQL (Phase 4): fetch_sql_raw, iter_sql_raw, write_sql_raw, afetch_sql_raw, aiter_sql_raw, awrite_sql_raw in pydantable.io (fetch_sql_raw / afetch_sql_raw also re-exported from pydantable root).
  • Schema bridging (Phase 5): sqlmodel_columns, DataFrameModel.assert_sqlmodel_compatiblepython/pydantable/io/sqlmodel_schema.py; tests tests/test_sqlmodel_bridge_phase05.py; docs IO_SQL, DATAFRAMEMODEL, SQLMODEL_SQL_ROADMAP.
  • Documentation + examples + testing gate (Phase 6): SQLModel-first SQLite examples docs/examples/io/sql_sqlite_sqlmodel_roundtrip.py, docs/examples/io/sql_sqlite_sqlmodel_streaming.py; IO_SQL sections for raw vs SQLModel-first examples; tests/test_doc_io_examples.py runs sql_sqlite_streaming.py and the SQLModel scripts alongside existing sql_sqlite_* examples.

Deprecated

  • Legacy string-SQL names (Phase 4): fetch_sql, iter_sql, write_sql, afetch_sql, aiter_sql, awrite_sql, write_sql_batches, awrite_sql_batches — emit DeprecationWarning; migrate to *_raw or SQLModel helpers. DataFrameModel.write_sql / awrite_sql delegate to the same deprecated pydantable.io entrypoints. Removal no earlier than 2.0.0 (VERSIONING). Tests: tests/test_sql_string_deprecation.py; default test run filters these warnings in pyproject.toml for backward-compatible suites.

Docs

Changed

  • Version bump: Align Python package metadata, Rust crate, and published __version__ to 1.13.0. (This release includes all SQLModel-first SQL I/O work since v1.12.0 — Phases 0–6 of SQLMODEL_SQL_ROADMAP — in one minor version.)

[1.12.0] — 2026-04-02

Changed

  • Version bump: Align Python package metadata, Rust crate, and published __version__ to 1.12.0.

[1.11.0] — 2026-04-01

Added

  • Tests (Phase E): tests/test_parquet_allow_missing_columns_e.py — directory scan with mismatched Parquet columns and allow_missing_columns=True.
  • Example (Phase E): docs/examples/io/parquet_allow_missing_columns.py ( tests/test_doc_io_examples.py ).
  • Parquet lazy scan_kwargs (1.11.0 Phase B1): hive_partitioning, hive_start_idx, try_parse_hive_dates, include_file_paths, row_index_name, row_index_offset forwarded to Polars ScanArgsParquet in pydantable-core (scan_kw.rs). Tests: tests/test_parquet_scan_hive_b1.py.
  • CSV lazy scan_kwargs (1.11.0 Phase B2): include_file_paths, row_index_name, row_index_offset, raise_if_empty, truncate_ragged_lines, decimal_comma, try_parse_dates forwarded to Polars LazyCsvReader in pydantable-core (lazy_csv_with_kwargs); shared row_index_* parsing with Parquet. Tests: tests/test_csv_scan_directory_b2.py (directory / *.csv glob, hive path behavior, unknown kw).
  • NDJSON lazy scan_kwargs (1.11.0 Phase B3): glob ( glob=False raises ValueError; Polars 0.53 NDJSON scans always expand paths), include_file_paths, row_index_name, row_index_offset in lazy_ndjson_with_kwargs (scan_kw.rs). Tests: tests/test_ndjson_scan_directory_b3.py.
  • IPC lazy scan_kwargs (1.11.0 Phase B4): record_batch_statistics plus UnifiedScanArgs fields (glob, cache, rechunk, n_rows, hive_partitioning, hive_start_idx, try_parse_hive_dates, include_file_paths, row_index_name, row_index_offset) via ipc_scan_from_kwargs (scan_kw.rs). Tests: tests/test_ipc_scan_directory_b4.py.
  • read_json (1.11.0 Phase B5): Confirmed alias of read_ndjson (JSON Lines lazy scan only); documentation for paths, glob, and scan_kwargs vs materialize_json for JSON array files. Tests: tests/test_read_json_paths_b5.py.
  • iter_chain_batches (pydantable.io / pydantable.io.batches): chain per-file iter_* iterators over an explicit path list. Test: tests/test_batches_chain.py.
  • Partitioned Parquet writes (1.11.0 Phase D1): DataFrame.write_parquet / DataFrameModel.write_parquetpartition_by (column names), mkdir, hive-style col=value/.../00000000.parquet shards via pydantable-core (sink_parquet_polars, Polars partition_by_stable). Tests: tests/test_write_parquet_partition_d1.py.
  • write_*_batches (Phase D2): reject an existing directory path; CSV/NDJSON mode documented. Tests: tests/test_write_batches_phase_d2.py.
  • Tests (Phase F): test_write_parquet_unknown_write_kw_raises in tests/test_write_parquet_partition_d1.py — invalid write_kwargs keys raise ValueError with unknown write_kw key (parity with scan_kwargs allowlist tests).

Docs

Changed

[1.10.0] — 2026-04-01

Added

  • Struct expressions (Polars): Expr.struct_json_encode, struct_json_path_match, struct_rename_fields, struct_with_fields (Rust ExprNode + lowering); PySpark façade pydantable.pyspark.sql.functions.struct_json_encode / struct_json_path_match. Tests: tests/test_struct_expr_phase_b.py.
  • JSON decode (Polars): Expr.str_json_decode(dtype) → struct or dict[str, T] via StringJsonDecode and shared polars_dtype mapping. Tests: tests/test_str_json_decode_phase_c.py.
  • Tests: tests/test_json_io_phase_a.py — nested materialize_json (array vs NDJSON), export_json round-trip and default=str for datetime / Decimal / UUID, lazy read_ndjson / read_json alias with nested struct + list, eager path with dict[str, T] map column.

Docs

  • JSON modeling: JSON (RFC 8259) vs column types in SUPPORTED_TYPES (heterogeneous arrays, arbitrary JSON, link from IO_JSON).
  • I/O: Eager export_json serialization in IO_JSON; extended export_json docstring in pydantable.io (json.dump + default=str); struct → JSON text pointer (struct_json_encode).
  • Structs: SUPPORTED_TYPES and INTERFACE_CONTRACT — struct JSON / with_fields / rename_fields semantics and row-wise limits.
  • FastAPI: columnar map / nested field notes with links to SUPPORTED_TYPES and IO_JSON.
  • Roadmap: Phase A + B + C JSON/struct work summarized in this 1.10.0 section and ROADMAP (Shipped in 1.10.0); str_json_decode / error semantics in SUPPORTED_TYPES and INTERFACE_CONTRACT; IO_JSON cross-link.
  • Phase D (I/O): IO_JSONread_json vs read_ndjson vs materialize_json, large-file / streaming patterns, NDJSON scan_kwargs presets; example docs/examples/io/large_ndjson_patterns.py; cross-links from DATA_IO_SOURCES, EXECUTION, IO_NDJSON.
  • Phase E (UX) & 1.10.0 JSON/struct summary: SELECTORSs.structs(), unnest, struct_field pipeline; cookbook json_logs_unnest_export (NDJSON → unnest → export_json); DOCS_MAP link. Release narrative: JSON ↔ schema matrix and I/O tests; struct expressions (struct_json_encode, path/rename/with-fields); str_json_decode; Phase D large-file NDJSON docs; Phase E selectors + cookbook + this page.

Changed

  • Versioning: Python package metadata and Rust crate aligned at 1.10.0 for this release.

[1.9.0] — 2026-04-01

Added

  • PySpark UI parity: groupBy returning PySparkGroupedDataFrame / PySparkGroupedDataFrameModel (aggregations stay Spark-flavored), sort, crossJoin, frame action count() → int (via global_row_count()), unionByName (optional allowMissingColumns), intersect / subtract / exceptAll (join-layer semantics; exceptAll aliases subtract, not Spark multiset EXCEPT ALL), fillna / dropna / .na, printSchema, explain, toPandas, and the same methods on DataFrameModel. See PYSPARK_UI, PYSPARK_PARITY, and INTERFACE_CONTRACT.
  • Engine: cast_expr / Expr.cast now accepts Literal(None) (unknown-base SQL NULL) and casts it to a nullable scalar dtype, enabling typed null padding (e.g. unionByName(..., allowMissingColumns=True)).
  • Temporal: Expr.dt_dayofyear, Expr.from_unix_time, PySpark F.dayofyear / F.from_unixtime (numeric epoch → UTC-naive datetime; Spark’s optional from_unixtime format string is not modeled—use parsing helpers on strings). Rust: TemporalPart::DayOfYear, ExprNode::FromUnixTime.
  • Introspection: DataFrame.describe() (and PySpark summary()) now includes date and datetime columns: non-null count, min, max, and null count (one to_dict() materialization). Tests: tests/test_dataframe_discovery.py.

Docs / tooling

[1.8.0] — 2026-03-31

Added

  • Selectors: selector-driven column and rename helpers (see pydantable.selectors and SELECTORS).
  • Core DataFrame ergonomics: row_count, clip, and drop_nulls arguments and convenience behavior aligned with the 1.8 parity push (see POLARS_PARITY_1_8 and PARITY_SCORECARD).
  • Joins: additional join argument parity including join_nulls and maintain_order (typed contract preserved; see INTERFACE_CONTRACT).
  • Reshape: pivot_longer / pivot_wider and related reshape ergonomics (see POLARS_WORKFLOWS and INTERFACE_CONTRACT reshape notes).

Docs

  • Add explicit 1.8.0 release entry and align “current release” references with the changelog.

[1.7.0] — 2026-03-30

Added

  • Pandas UI (schema-first): duplicated / drop_duplicates(keep=False) backed by engine plan steps where Polars is enabled; typed get_dummies with cardinality guard; eager cut / qcut, factorize_column, narrow ewm(...).mean() (may require pandas at runtime); façade pivot delegating to core. See PANDAS_UI and PARITY_SCORECARD.
  • Tests: tests/test_pandas_ui_popular_features.py — extended coverage for duplicates, dummies, binning, factorize, ewm, and pivot.

Docs

  • Refreshed PANDAS_UI (correct pivot façade, get_dummies null/boolean behavior, naming map, test links), INTERFACE_CONTRACT (duplicate detection, value_counts dict note), DOCS_MAP, QUICKSTART, TROUBLESHOOTING (optional pandas for eager helpers), EXECUTION, DEVELOPER, DATAFRAMEMODEL, POLARS_TRANSFORMATIONS_ROADMAP, PARITY_SCORECARD, and root README cross-links.

Docs / tooling

  • Versioning: bump to 1.8.0 across Python package metadata, Rust crate, and shipped stubs; docs “current release” strings aligned.

[1.6.1] — 2026-03-30

Fixed

  • Async iterator bridge: hardened pydantable.io.aiter_sql() and internal _aiter_from_iter() against deadlocks when the async consumer stops early (e.g. client disconnect / early return). Producer threads now exit cleanly and do not block forever on a bounded queue.
  • Deferred materialization: ExecutionHandle.result() now shields the underlying concurrent.futures.Future so cancelling the awaiting task cancels the wait but does not cancel the background engine work.
  • Submit cancellation race: DataFrame.submit() now avoids InvalidStateError when a handle is cancelled before work starts (mirrors ThreadPoolExecutor semantics).
  • Lazy scan missing optional columns: recovery for missing optional scan columns is now tolerant to error-message variants (not coupled to one brittle regex).

Docs / tooling

  • Read the Docs build: install pydata-sphinx-theme during RTD builds to match the configured html_theme (docs/conf.py).
  • Versioning: bump to 1.6.1 across Python package metadata, Rust crate, and shipped stubs; docs “current release” strings aligned.

[1.6.0] — 2026-03-30

Summary: FastAPI helpers (columnar OpenAPI bodies, NDJSON, register_exception_handlers), pydantable.errors, submit / stream / astream, PlanMaterialization, awaitable lazy reads (AwaitableDataFrameModel), Rust async plan execution when available, and docs/cookbooks for services. Breaking: removed legacy DataFrameModel eager SQL / materialize_* shims—use from pydantable import … (eager SQL / materialize_*) and read_* / aread_* as described under Removed below.

Removed

  • DataFrameModel eager I/O shims: materialize_*, amaterialize_*, fetch_sql, afetch_sql, iter_sql, aiter_sql, from_sql, afrom_sql. Use from pydantable import … for eager dict[str, list] loads and SQL streaming, then MyModel(cols, ...); keep lazy scans on read_* / aread_*.

Added

  • pydantable.errors: PydantableUserError, ColumnLengthMismatchError (column length mismatch at schema ingest). register_exception_handlers maps ColumnLengthMismatchError400 with JSON detail.
  • pydantable.fastapi: columnar_body_model, columnar_body_model_from_dataframe_model, columnar_dependency, rows_dependency — OpenAPI-friendly columnar bodies and Depends factories for DataFrameModel; see FASTAPI.
  • pydantable.testing.fastapi: fastapi_app_with_executor, fastapi_test_client (lifespan-aware TestClient for executor_lifespan / get_executor).
  • pydantable.fastapi: ndjson_streaming_response / ndjson_chunk_bytes for NDJSON StreamingResponse from astream() without hand-rolling encoders.
  • pydantable.fastapi (optional pip install 'pydantable[fastapi]'): executor_lifespan, get_executor (Depends), register_exception_handlers for MissingRustExtensionError / pydantic.ValidationError. See GOLDEN_PATH_FASTAPI and FASTAPI.
  • pydantable.typing.SupportsLazyAsyncMaterialize: structural Protocol for objects with async terminal materialization via acollect (DataFrameModel and AwaitableDataFrameModel).
  • AwaitableDataFrameModel: aread_parquet, aread_ipc, aread_csv, aread_ndjson, and aread_json return a chainable awaitable (select / filter / … then await …acollect()) so async routes avoid nested await on the read. Lazy metadata: await …columns / shape / empty / dtypes; then for custom sync/async steps; concat to merge multiple pending chains or concrete models. Async-first names: unprefixed terminals on the chain — collect, to_dict, to_polars, to_arrow, rows, to_dicts, stream (aliases of the a* methods); DataFrameModel.Async.read_* / Async.write_sql / Async.export_* mirror aread_* / awrite_sql / aexport_* without the a prefix (read_parquet cannot replace aread_parquet on the class itself because read_parquet is the sync lazy reader). Pending chains show a descriptive repr (read path + chained transforms).
  • DataFrameModel.aexport_parquet, aexport_csv, aexport_ndjson, aexport_ipc, aexport_json: async eager exports via the same aexport_* implementation module as pydantable.io (prefer DataFrameModel classmethods in application code).
  • Rust async bridge: async_execute_plan and async_collect_plan_batches on pydantable_native._core (Tokio + pyo3-async-runtimes); acollect / ato_* prefer this awaitable when present.
  • DataFrame.submit / DataFrameModel.submit and ExecutionHandle (result, done, cancel) for background collect.
  • DataFrame.astream / DataFrameModel.astream: async iteration of column dict chunks after one engine collect (see EXECUTION).
  • DataFrame.stream / DataFrameModel.stream: synchronous dict[str, list] chunk iterator (same semantics as astream); PlanMaterialization and plan_materialization_summary() label the four terminal modes (blocking, async, deferred, chunked).

Docs

[1.5.0] — 2026-03-29

Added

  • Batched column-dict I/O: iter_* / aiter_* readers and write_*_batches writers for core formats (Parquet, IPC, CSV, NDJSON, JSON array/lines) plus selected extras (Excel/Delta/Avro/ORC/BigQuery/Snowflake/Kafka where supported).
  • Engine streaming default propagation: engine_streaming= alias and per-frame defaults set by lazy read_* / aread_*, applied to later collect() / to_* / lazy write_* unless overridden.

Fixed

  • IPC batch iteration: iter_ipc(..., as_stream=False) now works with PyArrow RecordBatchFileReader (file format readers are not iterable in some versions).

[1.4.0] — 2026-03-29

Added

  • SQL streaming (SQLAlchemy): iter_sql / aiter_sql for batch iteration of SELECT results; DataFrameModel.iter_sql / aiter_sql yield typed batch models.
  • SQL batch sinks: write_sql_batches / awrite_sql_batches for end-to-end streaming (consume batches without building one giant in-memory dict).

Changed

  • fetch_sql / afetch_sql: support batch_size= and automatic streaming behavior for large results (may return a streaming container with .to_dict()).
  • write_sql / awrite_sql: support chunk_size= and stream inserts in chunks to reduce peak memory use.

[1.3.0] — 2026-03-29

Added

  • Expr (type-specific): list_join, list_sort, and list_unique on homogeneous lists; dt_week (ISO week, date / datetime); str_reverse, str_pad_start / str_pad_end, str_zfill, str_extract_regex, and str_json_path_match (Polars engine; semantics in SUPPORTED_TYPES and INTERFACE_CONTRACT).
  • Docs / tests: expanded expression contracts in SUPPORTED_TYPES and TYPING; integration coverage in tests/test_type_specific_expr.py.

Removed

  • CI / Release: CycloneDX SBOM generation and upload jobs (too fragile for default automation); generate SBOMs locally if required (see DEVELOPER Optional CycloneDX SBOMs).

[1.2.0] — 2026-03-28

Added

  • Column types (see SUPPORTED_TYPES):
  • typing.Literal[...] — homogeneous str, int, or bool members only; dtype descriptors include an optional literals list; invalid filter(col == ...) constants are rejected when the expression is built.
  • ipaddress.IPv4Address / IPv6Address — Polars Utf8, canonical string form; string cells coerce on ingest.
  • pydantable.types.WKBbytes subclass for Well-Known Binary geometry; Polars Binary (same Expr surface as bytes where applicable).
  • Annotated[str, ...] — logical str in the Rust plan; Pydantic applies metadata on collect() / RowModel.
  • Tests: tests/test_extended_scalar_dtypes_v12.py, typing-engine parity for these scalars, mypy/pyright DataFrameModel chain snippets.
  • Docs: practical notes for Expr comparisons (IP/WKB operands), TYPING (1.2 scalars), DATAFRAMEMODEL field list.

Fixed

  • cargo check -p pydantable-core --no-default-features: exhaustive DTypeDesc::Scalar matches and row-wise CompareOp / cast_literal_value coverage for IPv4 / IPv6 / WKB when polars_engine is off.

Typing / lint

  • __all__ and Ruff-driven cleanups on schema, types, and tests.

[1.1.0] — 2026-03-27

Added

  • Typing: DataFrameModel transform methods return derived model types so mypy/pyright can verify the schema after select, drop, rename, with_columns, join, and group_by(...).agg(...) without materializing between steps (see GitHub issue #1).
  • Tests: expanded coverage (I/O fallbacks, PySpark/expr edges, schema helpers) and mypy regression updates in tests/test_mypy_dataframe_model_return_types.py.

[1.0.0] — 2026-03-26

Scope

  • Production-ready major release focused on API stability and semver contract clarity.
  • No large new execution-engine features are required for the tag.

Added

  • Ingest/docs consistency for missing optional behavior:
  • fill_missing_optional documented consistently across constructor and typed lazy-read materialization paths.
  • Explicit schema defaults on optional fields (for example note: str | None = "n/a" or = None) now take precedence when fill_missing_optional=False instead of raising.
  • 1.0.0 readiness documentation:
  • explicit 1.x semver policy in VERSIONING,
  • release gate checklist and security-advisory handling in DEVELOPER,
  • roadmap, README, and docs index updates for 1.0 communication and support matrix policy.
  • [docs] extra includes SQLAlchemy so Sphinx (-W) and sphinx-autodoc-typehints resolve DataFrameModel Engine / Connection annotations in CI (matches Read the Docs).

Changed

  • Documentation includes migration guidance from earlier missing_optional string-style wording ("fill_none" / "error") to boolean fill_missing_optional=True/False.

Stability commitments

Upgrade guidance

  • Canonical upgrade path from 0.20.x/0.23.x is documented in README.md and linked from the docs index; I/O renames from 0.22.x/0.23.x are summarized under 0.23.0 below.

[0.23.0] — 2026-03-25

Highlights

  • Out-of-core file workflows: read_parquet, read_csv, read_ndjson, read_ipc, read_json (and aread_*) return a ScanFileRoot so DataFrame / DataFrameModel can run transforms on a Polars LazyFrame without loading the full file into Python lists first.
  • DataFrame.write_parquet (and write_csv, write_ipc, write_ndjson): write the lazy pipeline from the Rust engine without building a giant dict[str, list] for the result.
  • Breaking — public I/O renames: sync/async eager file reads into columns are materialize_* / amaterialize_*. Lazy local files use read_* / aread_*. Eager dict[str, list] → file uses export_* / aexport_*. SQL read_sql / aread_sqlfetch_sql / afetch_sql. Eager HTTP(S) column readers (0.22 read_*_url) → fetch_parquet_url, fetch_csv_url, fetch_ndjson_url. Lazy HTTP Parquet (temp file on disk) stays read_parquet_url / aread_parquet_url — use read_parquet_url_ctx / aread_parquet_url_ctx to delete the temp file when done (IO_HTTP). Top-level pydantable exports and DataFrameModel classmethods follow the same vocabulary.
  • Pre-release / internal names: development builds that still exposed scan_* / ascan_* or sink_* for lazy I/O now align with the public read_* / write_* names; pydantable re-exports read_parquet, read_parquet_url, aread_parquet, aread_parquet_url, export_parquet (replacing scan_* / write_parquet on the package root).

Added

  • JSON (array of objects): read_json, materialize_json, export_json, aread_json, amaterialize_json, aexport_json — local lazy scan and eager column dicts (see IO_JSON).
  • read_parquet_url_ctx / aread_parquet_url_ctx: context managers that delete the temporary Parquet file when the block exits (see IO_HTTP).
  • DataFrameModel: classmethods export_*, write_sql / awrite_sql, from_sql / afrom_sql delegating to pydantable.io.
  • MissingRustExtensionError: subclass of NotImplementedError when the native extension is missing or incomplete on lazy scan/sink paths and execute_plan (still catchable as NotImplementedError).
  • HTTP / object store safety: max_bytes on fetch_bytes and read_from_object_store; chunked reads with ValueError when exceeded.
  • Docs: IO_DECISION_TREE, IO_JSON, IO_HTTP updates, engine matrix in IO_OVERVIEW, FASTAPI executor guidance; README and manual pages refreshed for 0.23.x I/O.

Details

  • Rust: ScanFileRoot, plan_to_lazyframe, internal sink exports for lazy writes; join/groupby/reshape entrypoints work with lazy file roots where implemented (see EXECUTION matrix).
  • Python: read_csv_stdin uses materialize_csv internally.
  • Docs: EXECUTION memory model and streaming/collect compatibility matrix (PYDANTABLE_ENGINE_STREAMING reserved); DATA_IO_SOURCES, FASTAPI, INTERFACE_CONTRACT, ROADMAP, README.

Migration (from 0.22.x)

Old (0.22.x) Use instead (0.23.0)
Eager file → dict[str, list] via read_parquet, aread_parquet, … materialize_parquet, amaterialize_parquet, …
read_sql, aread_sql fetch_sql, afetch_sql
Eager URL → columns via read_parquet_url / read_csv_url / read_ndjson_url fetch_parquet_url, fetch_csv_url, fetch_ndjson_url
Lazy HTTP Parquet (unchanged name, new cleanup helpers) Still read_parquet_url / aread_parquet_url; prefer read_parquet_url_ctx / aread_parquet_url_ctx for automatic temp-file removal
Large local file, filter → Parquet out read_parquet + transforms + DataFrame.write_parquet

[0.22.0] — 2026-03-25

Highlights

  • Comprehensive I/O: the pydantable.io package adds Rust-backed (Polars) read_* / write_* for Parquet, Arrow IPC, CSV, and NDJSON into dict[str, list], with Python::allow_threads on read hot paths; PyArrow remains the default for buffers, column projection, and streaming IPC. Async mirrors aread_* / awrite_* use asyncio.to_thread (optional executor=), matching acollect / ato_arrow.
  • SQLAlchemy bridge: read_sql / write_sql ( pip install 'pydantable[sql]' + your DB driver) for URL/engine SELECT → column dict and append/replace inserts across SQLAlchemy-supported databases.
  • Transports (experimental): HTTP(S) fetch_bytes, read_parquet_url, read_csv_url, read_ndjson_url, and fsspec-based read_from_object_store — opt in with experimental=True or PYDANTABLE_IO_EXPERIMENTAL=1.
  • Tier-2/3 extras (best-effort): [excel], [kafka], [bq], [snowflake], [cloud]; helpers such as read_excel, read_delta, read_kafka_json_batch, read_csv_stdin / write_csv_stdout (see docs/io/data-io-sources.md).
  • Optional [rap]: true-async CSV via aread_csv_rap when rapcsv + rapfiles are installed.
  • Engine override: set PYDANTABLE_IO_ENGINE=rust or pyarrow to force file readers/writers.
  • Release quality bar: the v0.22.0 tag is cut from a commit that passes make check-full, full pytest, and Rust checks including --no-default-features.
  • Supply chain: the release workflow publishes CycloneDX SBOMs (Python + Rust) alongside wheels/sdist.
  • Support matrix: Python 3.10–3.13.

Details

  • Rust: new pydantable_native._core exports io_read_*_path / io_write_*_path; column-dict writes round-trip through Python polars.DataFrame → IPC → Rust writers (install pydantable[polars] for writes).
  • Tests: tests/test_io_comprehensive.py (round-trips, SQLite SQL, local HTTP server for URL Parquet).
  • CI: Python test job installs sqlalchemy with other dev deps.

[0.21.0] — 2026-03-25

Highlights

  • Streamlit: DataFrame and DataFrameModel implement the Python DataFrame Interchange Protocol (__dataframe__) via PyArrow so st.dataframe(df) can render a typed pydantable frame directly when pyarrow is installed (pip install 'pydantable[arrow]'). For editing, use st.data_editor(df.to_arrow()) (or to_polars()). See STREAMLIT and EXECUTION (interchange).

[0.20.0] — 2026-03-25

Supersedes: 0.19.0.

Highlights

  • UX / discovery: Core DataFrame and DataFrameModel expose columns, shape, empty, dtypes, info(), and describe() for int, float, bool, and str columns (one to_dict() materialization). shape[0] follows root-buffer semantics—see INTERFACE_CONTRACT Introspection, EXECUTION.
  • Docs: QUICKSTART (five-minute tour), repository notebooks/five_minute_tour.ipynb, EXECUTION sections on materialization costs, import styles, copy-as / interchange; naming map in PANDAS_UI / PYSPARK_UI.
  • Display: pydantable.displayget_repr_html_limits, set_display_options, reset_display_options; env PYDANTABLE_REPR_HTML_* for Jupyter HTML preview bounds.
  • DataFrame.value_counts / DataFrameModel.value_counts (group-by path); _repr_mimebundle_ on DataFrame and DataFrameModel (text/plain + text/html).
  • Debugging: PYDANTABLE_VERBOSE_ERRORS=1 appends schema context to ValueError from execute_plan.
  • Expressions: Expr, ColumnRef, WhenChain, and pending window helpers implement readable __repr__. Tests: tests/test_expr_repr.py.
  • PySpark façade: DataFrame.show() and summary() (alias of describe()). See PYSPARK_UI, PYSPARK_PARITY.
  • Documentation: README, index, ROADMAP, PARITY_SCORECARD, PANDAS_UI, DEVELOPER.

Details

  • Repr / HTML: Multi-line DataFrame.__repr__ and _repr_html_ (card-style HTML; grouped/model banners). See EXECUTION, tests/test_dataframe_repr.py.
  • Tests: tests/test_display_options.py, tests/test_dataframe_discovery.py, tests/test_rust_engine_verbose_errors.py. See EXECUTION, INTERFACE_CONTRACT.
  • Release hygiene: make check-full, full pytest, cargo test --all-features per DEVELOPER.

[0.19.0] — 2026-03-24

Highlights

  • Pre-1.0 consolidation: VERSIONING documents 0.x patch vs minor expectations; INTERFACE_CONTRACT links there for semver scope while staying the behavioral source of truth.
  • Roadmap to 1.0: ROADMAP Shipped in 0.19.0 replaces the planned checklist; Planned v1.0.0 items that belong on the 1.0.0 tag (full 1.x semver policy, SBOM, comms) remain explicitly deferred there with rationale below.
  • Parity docs: POLARS_TRANSFORMATIONS_ROADMAP, PARITY_SCORECARD, PYSPARK_PARITY, README, and index updated for current release and 0.19 → 1.0 clarity—no new table methods or PySpark functions rows.
  • Performance: PERFORMANCE adds an 0.19.0 validation note (key scripts spot-checked; no headline number refresh vs 0.18.x paths).
  • CI / tests: Grouped output comparisons in tests/test_v018_features.py sort by group key where row order is not API-guaranteed (stable pytest-xdist on Linux).

Details

See ROADMAP Shipped in 0.19.0. Release hygiene: make check-full, cargo test --all-features, cargo check --no-default-features, full pytest before tag; GitHub Actions install deps aligned with DEVELOPER / pyproject.toml [dev].

Deferred to v1.0.0 tag (not blocking 0.19.0): formal 1.x semver publication, PyPI packaging dry-run narrative, SBOM/supply-chain notes, support matrix as a 1.0.x commitment, and README/index “1.0 leads” copy—see ROADMAP Planned v1.0.0.

[0.18.0] — 2026-03-22

Highlights

  • Grouped execution errors: Polars collect() failures during group_by().agg() may include (group_by().agg()) in the ValueError text (via polars_err_ctx) so they are identifiable as grouped aggregation runtime errors. See EXECUTION.
  • Maps: Non-string map keys (dict[int, T], non-UTF-8 Arrow map keys) remain unsupported and are explicitly deferred for this release (SUPPORTED_TYPES, ROADMAP Later).
  • Documentation: Post–P7 note in POLARS_TRANSFORMATIONS_ROADMAP (phases complete; further parity is additive). PARITY_SCORECARD, PYSPARK_PARITY, DEVELOPER, ROADMAP updated. No new PySpark sql.functions wrappers or table API changes.
  • Tests: Hypothesis + integration coverage for group_by / join (tests/test_hypothesis_properties.py, tests/test_v018_features.py); Rust polars_err_ctx message format (execute_polars/common.rs, polars_err_format_tests).

Details

See ROADMAP Shipped in 0.18.0. INTERFACE_CONTRACT aggregation rules are unchanged; the doc notes optional group_by().agg() error-message context.

[0.17.0] — 2026-03-18

Highlights

  • Maps (string keys): Documented and tested Expr behavior for map_get / map_contains_key on columns ingested from PyArrow map<utf8, …> (missing key → null). Non-string Python dict[int, T] map keys remain unsupported (deferred); see ROADMAP Later.
  • PySpark façade: PYSPARK_PARITY.md — new thin pydantable.pyspark.sql.functions wrappers: str_replace, regexp_replace (alias, literal replace), strip_prefix, strip_suffix, strip_chars, strptime, binary_len, list_len, list_get, list_contains, list_min, list_max, list_sum (core Expr / Rust lowering unchanged). Tests: tests/test_pyspark_sql.py.
  • Docs: Refreshed PARITY_SCORECARD.md, POLARS_TRANSFORMATIONS_ROADMAP.md, SUPPORTED_TYPES.md (map + Arrow ingest note).

Details

See ROADMAP Shipped in 0.17.0.

[0.16.1] — 2026-03-27

Fixed

  • Expression typing: Binary arithmetic on dict[str, T] map columns (for example df.m + 1 or df.m + df.m) now raises TypeError in Rust (infer_arith_dtype) instead of panicking on an internal unwrap. Regression test: tests/test_expr_070_surfaces.py.
  • Constructors: validate_columns_strict (and therefore DataFrame[Schema](pa.Table) / RecordBatch) imported Arrow conversion helpers from the wrong submodule (pydantable.schema.io, which does not exist). Imports now use pydantable.io, matching DataFrameModel. Regression: tests/test_arrow_interchange.py (test_dataframe_generic_accepts_pa_table).

[0.16.0] — 2026-03-26

Highlights

  • Arrow interchange: read_parquet and read_ipc (optional as_stream for streaming IPC) return dict[str, list] for DataFrame / DataFrameModel. to_arrow / ato_arrow materialize a PyArrow Table after the same engine path as to_dict (not zero-copy). Optional extra pydantable[arrow] (pyarrow>=14). Constructors accept pa.Table / RecordBatch when pyarrow is installed.
  • FastAPI: FASTAPI.md — multipart Parquet upload, Depends executor pattern, background-task notes, 422 vs application error guidance. python-multipart in [dev] and CI workflows. Tests: tests/test_fastapi_recipes.py (multipart + invalid body 422), tests/test_arrow_interchange.py; scripts/verify_doc_examples.py extended.
  • Docs: EXECUTION.md, SUPPORTED_TYPES.md, INTERFACE_CONTRACT.md, ROADMAP.md, README.md, index.md.

Details

See ROADMAP Shipped in 0.16.0. Sync read_parquet / read_ipc are blocking; use asyncio.to_thread or an executor from async def routes for large files if loop latency matters.

  • CI / release: actions/cache@v5 in ci.yml and release.yml (clears GitHub Actions Node 20 deprecation warnings for actions/cache@v4). release.yml uses maturin build + twine upload --skip-existing per platform instead of deprecated maturin publish (see PyO3/maturin#2334); TWINE_USERNAME=__token__ and PYPI_API_TOKEN unchanged.

[0.15.0] — 2026-03-25

Highlights

  • Async materialization: acollect, ato_dict, ato_polars on DataFrame; DataFrameModel adds the same plus arows and ato_dicts. Work runs in asyncio.to_thread or an optional executor=. See EXECUTION, FASTAPI.
  • FastAPI: async def route examples, lifespan + ThreadPoolExecutor, and StreamingResponse guidance (manual chunking; no built-in row iterator yet). tests/test_fastapi_recipes.py and scripts/verify_doc_examples.py extended.
  • Arrow-native maps: PyArrow map<utf8, …> arrays (and chunked) ingest for dict[str, T] columns; convert to Python dict cells. String keys only; strict checks scalar value types (nested map values: best-effort). Tests: tests/test_pyarrow_map_ingest.py. SUPPORTED_TYPES updated.
  • PySpark façade: trim, abs, round, floor, ceil in pydantable.pyspark.sql.functions (and package __all__). PYSPARK_PARITY updated.
  • Constructor cleanup: validate_data removed from DataFrame and DataFrameModel. Ingest depth uses trusted_mode only (off / shape_only / strict; omit for full per-element validation). Passing validate_data=... raises TypeError. Removed internal schema helpers _VALIDATE_DATA_KW_UNSET, _warn_validate_data_kw_deprecated, and _coerce_validate_data_kw. Direct callers of validate_columns_strict may still use validate_elements as a legacy bridge. Docs (DATAFRAMEMODEL, FASTAPI, SUPPORTED_TYPES, PERFORMANCE, etc.) describe trusted_mode only on constructors.
  • Dev: pytest-asyncio in [dev]; asyncio_mode = auto in pyproject.toml.

Tests

  • tests/test_async_materialization.py, tests/test_pyarrow_map_ingest.py, tests/test_v015_features.py, tests/test_v015_constructor_api.py; extended tests/test_fastapi_recipes.py. tests/test_v014_features.py, tests/test_dataframe_model.py, tests/test_dataframe_ops.py: TypeError when validate_data is passed; trusted paths use trusted_mode only.

Details

See ROADMAP Shipped in 0.15.0. Sync collect / to_dict / to_polars are unchanged aside from constructor kwargs (drop validate_data; use trusted_mode). You may replace manual asyncio.to_thread wrappers with acollect / ato_*.

rust_version() in the extension reports env!("CARGO_PKG_VERSION") so it matches pyproject.toml / Cargo.toml.

[0.14.0] — 2026-03-23

Highlights

  • Window orderBy null placement: nulls_last on Window.partitionBy(...).orderBy(...) (per-column list or bool); framed windows use all keys; unframed Polars .over uses the first key for SortOptions. Docs: WINDOW_SQL_SEMANTICS, INTERFACE_CONTRACT.
  • Trusted shape_only: pydantable.DtypeDriftWarning when data would fail strict; env PYDANTABLE_SUPPRESS_SHAPE_ONLY_DRIFT_WARNINGS=1 to silence. See SUPPORTED_TYPES.
  • validate_data deprecation: explicit validate_data= without trusted_mode raises DeprecationWarning (removal shipped in 0.15.0). See DATAFRAMEMODEL, FASTAPI.
  • PySpark façade: dayofmonth, lower, upper in pydantable.pyspark.sql.functions. PYSPARK_PARITY updated.
  • FastAPI DX: TestClient recipes and OpenAPI notes in FASTAPI; tests/test_fastapi_recipes.py; fastapi / httpx in [dev] and CI.
  • Hypothesis: extra property test for with_columns identity; DEVELOPER documents running property tests.
  • Tests: tests/test_v014_features.py covers DtypeDriftWarning (including multi-column drift), window lag / row_number with null sort order, FastAPI 422 / OpenAPI requestBody, and PySpark dayofmonth / lower / upper. (Constructor validate_data was deprecated here and removed in 0.15.0 — see changelog [0.15.0].)

Details

See ROADMAP Shipped in 0.14.0.

[0.13.0] — 2026-03-23

Highlights

  • Stabilization + combined scope: FASTAPItrusted_mode / validate_data, column-shaped dict[str, list] bodies, sync materialization and pointers forward to async work (shipped in 0.15.0); trust-boundary guidance for large / pre-validated tables and Polars / Arrow; install notes for PyPI wheels vs git builds.
  • Sync-only I/O (at time of 0.13.0): EXECUTION and PERFORMANCE described blocking materialization; async APIs arrived in 0.15.0 (EXECUTION, FASTAPI). Tuning text prefers trusted_mode alongside validate_data.
  • Window semantics (docs): null ordering and CURRENT ROW / peer framing in WINDOW_SQL_SEMANTICS and INTERFACE_CONTRACT; Window docstring in window_spec.py. (User-facing NULLS FIRST / LAST shipped in 0.14.0.)
  • Trusted strict + PyArrow: isinstance(..., pa.Array | pa.ChunkedArray) in trusted buffers (concrete array types such as Int64Array); stricter scalars for int, float, decimal, enum, uuid, and temporal Arrow types. Tests in tests/test_trusted_strict_pyarrow.py; pyarrow>=14 in [dev] and CI. (shape_only drift warnings shipped in 0.14.0.)
  • Performance: benchmarks/framed_window_bench.py, benchmarks/trusted_polars_ingest_bench.py; PERFORMANCE table and cross-links.
  • Discoverability: index, README roadmap table, and cross-links across INTERFACE_CONTRACT, WINDOW_SQL_SEMANTICS, POLARS_TRANSFORMATIONS_ROADMAP.
  • CI: RUSTSEC-2025-0141 audit-step comment; GitHub Actions versions reviewed (actions/cache@v4, etc.).
  • Examples: scripts/verify_doc_examples.py covers new FastAPI patterns (trusted ingest + columnar body).

Details

Release audit: make check-full and full pytest green with a release maturin build. No regressions to rangeBetween, trusted strict, or map_from_entries beyond documentation and PyArrow strict hardening above.

Roadmap (editorial): 0.13.0 ships the documentation-first stabilization track together with the scope formerly planned as Remaining in 0.13.x / 0.14.0. Async materialization shipped in 0.15.0 (see changelog [0.15.0]).

See FASTAPI, EXECUTION, PERFORMANCE, ROADMAP, and INTERFACE_CONTRACT.

[0.12.0] — 2026-03-22

Highlights

  • Multi-key rangeBetween: aggregate window frames may use multiple orderBy columns; sort is lexicographic and range bounds apply to the first sort key (PostgreSQL-style). Documented in WINDOW_SQL_SEMANTICS.
  • Trusted strict ingest: Polars columns are matched structurally to nested annotations (list / dict[str, T] / nested Schema structs); columnar Python paths get the same nested shape checks.
  • Contracts and parity docs: refresh INTERFACE_CONTRACT, PySpark UI/scorecard, roadmap; add duplicate-key policy for map_from_entries (SUPPORTED_TYPES).
  • Regression tests: broader coverage for multi-key rangeBetween (desc/mixed orderBy, partitions, date/datetime axis, window_mean/window_min), PySpark window mirrors, trusted strict nested paths (Python + Polars), map_from_entries duplicate keys, and DataFrame / DataFrameModel strict parity.

Details

See INTERFACE_CONTRACT, PYSPARK_PARITY, PARITY_SCORECARD, WINDOW_SQL_SEMANTICS, and ROADMAP.

[0.11.0] — 2026-03-23

Highlights

  • Window range semantics v2: rangeBetween supports numeric, date, datetime, and duration order keys (single orderBy key), with deterministic boundary-inclusive behavior.
  • Map ergonomics expanded: add map_from_entries() and PySpark-compatible element_at() alias; map entry roundtrip coverage expanded.
  • Trusted ingest modes: add explicit trusted modes (shape_only, strict) alongside compatibility with validate_data, including stricter nullability and dtype checks for trusted columnar paths.
  • Parity coverage expansion: add dedicated DataFrame/DataFrameModel parity tests and additional PySpark map parity contracts.
  • Release hardening: update docs/contracts and version metadata for the 0.11.0 line.

Details

See INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES, and ROADMAP.

[0.10.0] — 2026-03-23

Highlights

  • Framed windows expanded: framed execution now covers window_mean, window_min, window_max, lag, lead, rank, and dense_rank in addition to row_number / window_sum.
  • Map utilities: add map_keys() and map_values() to complement map_len, map_get, and map_contains_key.
  • Parity and interop hardening: PySpark parity tests for framed windows/map utilities plus trusted constructor coverage for Polars DataFrame input (validate_data=False).
  • Window range contracts tightened: rangeBetween now enforces exactly one orderBy key for supported aggregate frames, with explicit typed errors.
  • Map v2 parity expanded: add map_entries() and PySpark wrappers for map_len, map_get, and map_contains_key.
  • Trusted ingest hardening: Polars trusted constructor path rejects nulls in non-nullable schema fields when validate_data=False.

Details

See INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES, and ROADMAP.

[0.9.0] — 2026-03-23

Highlights

  • Bad-input ingest controls: ignore_errors=True with on_validation_errors=... (row_index, row, errors) across DataFrameModel and DataFrame constructor paths.
  • Framed windows: rowsBetween / rangeBetween frame metadata is wired through Python/PySpark/Rust; framed execution is supported for row_number / window_sum (rangeBetween on integer order keys, range offset computed from first orderBy key).
  • Map v2 values: dict[str, T] map columns now support nested JSON-like value dtypes (lists/maps/structs and nullable unions), with map_len, map_get, and map_contains_key behavior preserved.
  • Release hardening: expanded 0.9.0 edge-case tests and full quality-gate coverage (make check-full, docs example validation, Sphinx warnings-as-errors build).

Details

See DATAFRAMEMODEL, INTERFACE_CONTRACT, SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.

[0.8.0] — 2026-03-23

Highlights

  • Global row count: global_row_count() and PySpark functions.count() with no column (count(*)-style) for DataFrame.select.
  • Casts: strdate / datetime via Expr.cast(...) (Polars parsing); use strptime for fixed formats.
  • Maps: Expr.map_get(key) / map_contains_key(key) on dict[str, T] columns (list-of-struct encoding).
  • Windows: window_min / window_max; IR carries optional WindowFrame::Rows for future Spark-style rowsBetween (lowering not yet implemented).
  • Docs: INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES updates.

Details

See SUPPORTED_TYPES, PYSPARK_PARITY, ROADMAP, and INTERFACE_CONTRACT.

Testing

  • Broader integration tests for 0.7.0 / 0.8.0 surfaces (test_v070_features, test_v080_features), including PySpark F.count() with no column.

Documentation

[0.7.0] — 2026-03-23

Highlights

  • Global aggregates: global_count, global_min, global_max for DataFrame.select (non-null count; Polars min/max); PySpark functions.count / min / max on typed columns.
  • Windows: lag / lead (Polars shift + .over(...)); still no SQL-style rowsBetween / rangeBetween in the IR (see INTERFACE_CONTRACT.md).
  • Temporal: Expr.strptime / Expr.unix_timestamp, PySpark to_date(..., format=...) and unix_timestamp; dt_nanosecond for datetime and time.
  • Maps / binary: Expr.map_len(), Expr.binary_len() (byte length of bytes columns).

Details

See SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.

[0.6.0] — 2026-03-22

Highlights

  • Scalar types: datetime.time, bytes, and homogeneous dict[str, T] map columns (Polars-backed I/O; map execution surface is intentionally small).
  • Windows: row_number, rank, dense_rank, window_sum, window_mean with Window.partitionBy(...).orderBy(...) / .spec(); Rust ExprNode::Window + Polars .over(...).
  • Global aggregates (Phase D): DataFrame.select(...) with global_sum / global_mean or PySpark functions.sum / avg / mean on typed columns — single-row results.
  • PySpark façade: dropDuplicates(subset=...), date helpers (year, month, …, to_date), documentation parity fixes (union, types).

Development and testing

  • [dev] optional dependencies include numpy (for collect(as_numpy=...) tests), pytest-cov, coverage, pytest-xdist, and polars.
  • CI runs parallel pytest on Linux, optional coverage XML on Ubuntu + Python 3.11, installs Polars on that leg so to_polars() tests run, runs scripts/verify_doc_examples.py, and uses GITHUB_ACTIONS-scaled performance guardrails.

Details

See SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.

[0.5.0] — 2026

Highlights

  • PydanTable naming and docs alignment with the ROADMAP (0.5.x line).
  • Typed DataFrameModel and DataFrame[Schema] with a Rust execution core (Polars-backed in the native extension).
  • Materialization: collect() returns Pydantic row models; to_dict() / collect(as_lists=True) for columnar data; optional to_polars() with the [polars] extra.
  • Rich column types: nested Pydantic models (structs), homogeneous list[T], uuid.UUID, decimal.Decimal, enum.Enum, plus explode, unnest, and extended Expr helpers (see SUPPORTED_TYPES).

Details

For phase history and future direction, see ROADMAP and POLARS_TRANSFORMATIONS_ROADMAP.