Changelog
All notable changes to this project are documented here. The format is inspired by Keep a Changelog.
[Unreleased]
[1.19.2] — 2026-06-09
Documentation
- Adoption-focused docs overhaul: installation guide with a verification snippet and platform matrix, migration guide, glossary, best practices, expanded API reference (I/O, engines, FastAPI, errors), `CONTRIBUTING.md` and `SECURITY.md`, restructured MkDocs navigation (Get started / Guides / Recipes / Reference / Internals), a forward-looking roadmap split from the release history, a unified quickstart on `DataFrameModel`, and a core analytics example.
Maintenance
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.19.2.
[1.19.1] — 2026-06-06
Maintenance
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.19.1.
- Strengthen the test suite: native `rust_engine` round-trips, cross-engine behavioral parity checks, join validation/order assertions, and targeted coverage cleanup.
[1.19.0] — 2026-04-20
Added
- Engine pushdown methods (typed): SQL, Mongo, and Spark DataFrame surfaces add typed “native/pushdown” methods (and explicit guardrails) so engine-specific operations are available when the frame is backed by that engine, with clear errors when not supported.
- Engine parity matrix + tests: a dedicated engine parity table plus tests/guardrails to keep engine capabilities and error messages consistent.
- UI wrappers (optional engines): pandas- and PySpark-shaped wrapper modules for the Spark and Mongo DataFrame surfaces.
- pydantable-rag: FastAPI backend for a documentation-grounded assistant (RAG), including a `/chat-app` UI, extractive mode, optional OpenAI-backed generation, and a committed SQLite vector index for deployment.
Documentation
- Engines: updated SQL/Mongo/Spark engine docs plus new parity/guardrails documentation.
- Docs assistant: floating docs chat launcher (MkDocs overrides + CSS/JS) and README pointers for the RAG backend and deployment.
Performance
- Native engine (Rust / Polars execution): reduced materialization overhead and added benchmarks for native materialization hot paths.
CI / deployment
- Docs assistant workflows: FastAPI Cloud deploy workflow for `pydantable-rag`, plus a workflow that verifies the committed vector index and optionally rebuilds it in CI when `OPENAI_API_KEY` is configured.
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.19.0.
[1.18.1] — 2026-04-16
Documentation
- Versioning: Consolidated 2.0.0 migration pointers (`as_polars=`, legacy SQL I/O names, `moltres_engine=`, `Entei*` → `Mongo*`) in VERSIONING Planned removals.
Maintenance
- pydantable-core (Rust / PyO3): Migrated off deprecated `IntoPy` / `.into_py(py)` to `IntoPyObject` / `into_py_any`, updated tuple/list/dict constructors for PyO3 0.24, replaced `PyErr::value_bound` with `value`, and dropped crate-level `allow(deprecated)`. Introduced shared `literal_value_to_pyobject` for `LiteralValue` → Python and moved datetime construction helpers to `py_datetime.rs` (used from expression serialization and Polars execution).
- Narrowed optional-import `except` clauses (`ImportError` / `OSError`) where appropriate; documented intentional broad catches in I/O materialization fallbacks, `Expr` repr, and `io.mongo`.
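Narrow `except` clauses around optional imports usually follow one pattern: translate "not installed / cannot be loaded" into an actionable error and let every other exception propagate. A minimal sketch, assuming a hypothetical helper `require_optional` and error class `OptionalDependencyMissing` (neither is pydantable's API; the extra name in the message is only illustrative):

```python
import importlib
from types import ModuleType


class OptionalDependencyMissing(ImportError):
    """Hypothetical error type for this sketch; not pydantable's own class."""


def require_optional(module: str, extra: str) -> ModuleType:
    """Import an optional dependency or raise an actionable install hint."""
    try:
        return importlib.import_module(module)
    except (ImportError, OSError) as exc:
        # Only "missing / cannot be loaded" failures are translated. OSError
        # covers packages that load a shared library (e.g. via ctypes) at
        # import time. Any other exception, such as a real bug inside the
        # package, propagates unchanged instead of being masked.
        raise OptionalDependencyMissing(
            f"{module!r} is not available; "
            f'install it with: pip install "pydantable[{extra}]"'
        ) from exc
```

Subclassing `ImportError` keeps any existing `except ImportError` handlers in calling code working.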
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.18.1.
[1.18.0] — 2026-04-15
Added
- PySpark (optional `pydantable[spark]`): `SparkDataFrame` / `SparkDataFrameModel` façade on raikou-core, with lazy `pydantable.pyspark.sparkdantic` integration for SparkDantic-based JVM schemas. Documentation and tests: SPARK_ENGINE, PYSPARK_UI, PYSPARK_PARITY, and related coverage.
Documentation
- MkDocs + Material for MkDocs replace Sphinx for the published manual; Read the Docs and CI use `mkdocs build --strict` (`mkdocs.yml`, `docs/mkdocs_hooks.py`).
- Lazy-SQL user guide file renamed `MOLTRES_SQL.md` → `SQL_ENGINE.md` (links and nav updated; content unchanged).
CI
- Test installs use `pip install -e ".[dev]"` plus raikou-core / pyspark where needed; Windows skips `@pytest.mark.spark` (JVM / winutils constraints). Docs jobs include the Spark stack for mkdocstrings imports.
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.18.0.
[1.17.0] — 2026-04-14
Added
- Mongo engine (entei-core): `pydantable.mongo_dataframe` provides `MongoDataFrame` / `MongoDataFrameModel` facades (same pattern as `pydantable.sql_dataframe` + moltres-core). `entei-core` supplies `MongoRoot` (and materialization helpers); `MongoPydantableEngine` is implemented in `pydantable.mongo_dataframe_engine`.
- Mongo / Beanie: `pydantable.mongo_beanie.sync_pymongo_collection` (also a lazy import from `pydantable`), `MongoDataFrame.from_beanie`, and `MongoDataFrameModel.from_beanie` for Beanie `Document` models with a sync `pymongo.database.Database` (Beanie uses async PyMongo). See MONGO_ENGINE.
- Mongo I/O: `fetch_mongo`, `iter_mongo`, `write_mongo` and async `afetch_mongo`, `aiter_mongo`, `awrite_mongo`: eager `dict[str, list]` reads/writes against a PyMongo `Collection` (same import pattern as SQL I/O). `pydantable[mongo]` includes pymongo.
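Eager Mongo reads return a `dict[str, list]` rather than a list of documents. A minimal stdlib sketch of that row-to-column pivot, assuming documents are plain dicts (as PyMongo returns them) and that a missing field becomes `None`; the helper name `documents_to_columns` is hypothetical and this is not pydantable's implementation:

```python
from typing import Any, Iterable, Mapping, Sequence


def documents_to_columns(
    docs: Iterable[Mapping[str, Any]], fields: Sequence[str]
) -> dict[str, list[Any]]:
    """Pivot row-shaped documents into equal-length columns."""
    columns: dict[str, list[Any]] = {name: [] for name in fields}
    for doc in docs:
        for name in fields:
            # A field absent from one document is padded with None so that
            # every column keeps the same length.
            columns[name].append(doc.get(name))
    return columns
```

Passing `fields` explicitly (for example, taken from the schema) keeps the column set stable even when the first document lacks a field.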
Documentation
- MONGO_ENGINE: user guide for the optional Mongo engine (`MongoDataFrame` / `MongoDataFrameModel`, `pydantable[mongo]`).
- Mongo: position Beanie `Document` models as the primary way to use pydantable with MongoDB (`pydantable[mongo]`, `from_beanie`, `sync_pymongo_collection`); Pydantic `Schema` + `from_collection` documented as a supported alternative. See MONGO_ENGINE, README, and IO_DECISION_TREE.
Changed
- Public API naming (SQL / Mongo): `MongoDataFrame`, `MongoDataFrameModel`, `MongoPydantableEngine` replace `Entei*`; modules `pydantable.mongo_dataframe` and `pydantable.sql_dataframe` are canonical (legacy paths `mongo_entei`, `sql_moltres` re-export). `sql_engine_from_config` and `sql_engine=` replace `moltres_engine_*`. `pydantable[sql]` includes eager SQLModel I/O, moltres-core, and the lazy `SqlDataFrame` stack (the separate `lazy-sql` and `moltres` extras are removed). `Entei*` and `moltres_engine` remain available with `DeprecationWarning`. User docs describe SQLAlchemy, PyMongo, and Beanie without third-party stack marketing names.
- Optional dependencies: `pydantable[mongo]` includes Beanie (`beanie>=1.24,<3`) and pins `entei-core` to `>=0.2.0,<0.3` (PyPI `entei-core` is versioned independently of pydantable 1.x). The separate `mongo-beanie` extra is removed; use `pip install "pydantable[mongo]"`.
- Optional dependencies: `rapsqlite` is not bundled in `pydantable[sql]`; install a DB-API driver for your database URL yourself (`pip install rapsqlite` only if you want the `sqlite+rapsqlite` dialect).
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.17.0.
[1.16.1] — 2026-04-13
Changed
- Python: Narrow `ImportError` handling for the optional dtype registry and native capabilities; debug logging on intentional broad I/O / `Expr` repr fallbacks; a bound on optional scan-column recovery iterations in `_materialize_scan_fallback`.
- Rust: Split row-wise calendar/string helpers into `expr/rowwise_support.rs`.
Tests
- DataFrame: Shared retail-style columnar scenarios (`tests/_support/scenarios.py`), multi-step workflow coverage (`tests/dataframe/test_dataframe_realistic_workflows.py`), Hypothesis strategies with Python oracles for discounted line totals and global sum/mean, and a global aggregate check on the scenario payload (`tests/dataframe/test_global_agg.py`).
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.16.1.
[1.16.0] — 2026-04-06
Added
- Typing (`DataFrame[Schema]`): `as_schema` / `try_as_schema` / `assert_schema` with `as_model` / `try_as_model` / `assert_model` aliases.
- Typing escape hatches (`DataFrame[Schema]`): `*_as_schema` (and `*_as_model` aliases) for `join`, `melt`, `unpivot`, and `rolling_agg`.
- Typing contract tests: new Pyright + Astral `ty` snippet tests for the generic `DataFrame` API.
- Docs: `DataFrameModel` “Pyright/ty golden path” section documenting the explicit `*_as_model` helpers.
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.16.0.
[1.15.2] — 2026-04-05
Fixed
- `_dtype_repr`: On Python 3.9-3.10, builtin generics such as `list[int]` were shown as `list` because they pass `isinstance(..., type)`; they are now resolved via `get_origin` / `get_args`, as on 3.11+.
- Tooling: the `pyrightconfig.json` `include` path points at `tests/typing/test_pyright_dataframe_model_return_types.py` (Pyright contract check).
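The `_dtype_repr` fix comes down to checking `typing.get_origin` before `isinstance(tp, type)`: on Python 3.9-3.10, `isinstance(list[int], type)` is `True`, so a type-first check short-circuits to `list`. A minimal sketch of the origin-first ordering; `dtype_repr` is an illustrative name, not the private function itself:

```python
from typing import Any, get_args, get_origin


def dtype_repr(tp: Any) -> str:
    """Render a type annotation, including parameterized builtin generics."""
    origin = get_origin(tp)
    if origin is not None:
        # Check the origin first: on Python 3.9-3.10 list[int] also passes
        # isinstance(tp, type), which would otherwise yield just "list".
        args = ", ".join(dtype_repr(arg) for arg in get_args(tp))
        name = getattr(origin, "__name__", repr(origin))
        return f"{name}[{args}]"
    if isinstance(tp, type):
        return tp.__name__
    return repr(tp)
```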
Changed
- Docs: Typing guide (`ty` vs mypy/Pyright, `Any` policy), troubleshooting (`engine="auto"` Rust fallback, async cancellation), I/O overview callouts, FASTAPI/DATAFRAMEMODEL non-deprecated SQL examples; DEVELOPER Rust `unwrap` note.
- Typing: Astral `ty` rules (`invalid-return-type`, `unsupported-operator` at error level); grouped-frame protocol; SQL `afetch_sqlmodel` / `aiter_sqlmodel` `Sequence` signatures; optional `pyrightconfig-strict.json`.
- Tests: Scan-column engine-error regex regression tests; removed the flaky wall-clock skip stub in async materialization tests; extra coverage for `_describe_dtype`, grouped agg, selectors, PySpark shims, and `rust_engine` delegation; Ruff-formatted typing stubs.
Version bump
- Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.15.2.
[1.15.1] — 2026-04-04
Fixed
- Typing: `sql_moltres` and `DataFrameModel` stubs aligned so mypy/pyright contract tests pass; `pyrightconfig.json` sets `extraPaths` for snippet imports.
Changed
- Rust: Set-ops column assembly returns `PyRuntimeError` instead of panicking on unexpected internal map keys (`exceptAll` / `intersectAll` materialization).
- Docs: `DataFrame.join` / `DataFrameModel.join` document that `allow_parallel` / `force_parallel` are not implemented; TYPING notes that `as_polars` on `collect` / `acollect` is deprecated for 2.0.
- Version bump: Align pydantable, pydantable-protocol, pydantable-native, pydantable-core, and published `__version__` values to 1.15.1.
Tests
- `aiter_sqlmodel`: early break, `ThreadPoolExecutor`, and `batch_size=1` coverage.
[1.15.0] — 2026-04-03
Added
- Optional Moltres integration: extra `pydantable[lazy-sql]` (install `pydantable[sql]` today; optional extras were consolidated) pulls moltres-core. New `SqlDataFrame` and `SqlDataFrameModel` in `pydantable.sql_dataframe` (also available as `pydantable.SqlDataFrame` / `SqlDataFrameModel` via lazy import) bind `moltres_core.MoltresPydantableEngine` using `sql_config=` (`moltres_core.EngineConfig`) or `moltres_engine=`. Helper `moltres_engine_from_sql_config`. User guide: SQL_ENGINE; protocol story: CUSTOM_ENGINE_PACKAGE.
Changed
- `pydantable[lazy-sql]` (today `pydantable[sql]`) and `[dev]` now include `greenlet`, so SQLAlchemy `create_async_engine` (for example `sqlite+rapsqlite` smoke tests and async SQLAlchemy workflows) works without an extra manual install.
- Version bump: Align Python package metadata (`pydantable`, `pydantable-protocol`, `pydantable-native`), Rust crate `pydantable-core`, and published `__version__` values to 1.15.0.
[1.14.1] — 2026-04-03
Changed
- `pydantable` now requires `pydantable-native` at the same version, so `pip install pydantable` installs the Rust engine. Removed the `pydantable-meta` package and its release job.
- Version bump: Align Python package metadata (`pydantable`, `pydantable-protocol`, `pydantable-native`), Rust crate `pydantable-core`, and published `__version__` values to 1.14.1.
[1.14.0] — 2026-04-03
Added
- Documentation: CUSTOM_ENGINE_PACKAGE, a guide for authors publishing a separate engine package (dependencies, protocol implementation, wiring, expressions, I/O boundaries, testing, PyPI).
- `pydantable-protocol`: zero-dependency distribution defining `ExecutionEngine` / `PlanExecutor` / `SinkWriter`, `EngineCapabilities`, `UnsupportedEngineOperationError`, and `MissingRustExtensionError`. Third-party engines (for example SQL backends) can `pip install pydantable-protocol` for typing and shared errors without depending on `pydantable`. `pydantable` pins the same version and re-exports protocols from `pydantable.engine.protocols` (and `MissingRustExtensionError` from `pydantable._extension`).
- `pydantable-native` now depends only on `pydantable-protocol` (not `pydantable`). `native_engine_capabilities` lives in `pydantable_native.capabilities`. Tracing in the native engine uses `pydantable.observe.span` when `pydantable` is installed, otherwise a local `pydantable_native._trace` implementation (`PYDANTABLE_TRACE` behaves the same).
Changed
- Internal: Introduced `pydantable.engine` (`NativePolarsEngine`, `get_default_engine`, `get_expression_runtime`) so execution is routed through a single abstraction; `rust_engine` remains a thin delegating module. See ADR-engines and DEVELOPER.
- Version bump: Align Python package metadata (`pydantable`, `pydantable-protocol`, `pydantable-native`, `pydantable-meta`), Rust crate `pydantable-core`, and published `__version__` values to 1.14.0.
[1.13.0] — 2026-04-02
Added
- SQLModel read I/O (Phase 0–1): `pydantable[sql]` includes sqlmodel. New APIs `fetch_sqlmodel`, `iter_sqlmodel`, `afetch_sqlmodel`, and `aiter_sqlmodel` in `pydantable.io` (also re-exported from `pydantable`), sharing batching and `StreamingColumns` semantics with `fetch_sql_raw` / `iter_sql_raw`. `MissingOptionalDependency` is raised when sqlmodel is required but not installed.
- SQLModel write I/O (Phase 2): `write_sqlmodel`, `write_sqlmodel_batches`, `awrite_sqlmodel`, `awrite_sqlmodel_batches`: DDL from `SQLModel.__table__`, `replace_ok` guard for `if_exists="replace"`, optional `validate_rows`, strict column alignment. `python/pydantable/io/sqlmodel_write.py`; tests `tests/test_sqlmodel_io_phase02.py`.
- SQLModel + `DataFrameModel` (Phase 3): classmethods `fetch_sqlmodel`, `afetch_sqlmodel`, `iter_sqlmodel`, `aiter_sqlmodel`, `write_sqlmodel_data` / `awrite_sqlmodel_data`; instance `write_sqlmodel` / `awrite_sqlmodel`; `MyModel.Async.write_sqlmodel` → `awrite_sqlmodel_data`. `python/pydantable/dataframe_model.py`; stubs in `python/pydantable/dataframe_model.pyi` / `typings/`; tests `tests/test_sqlmodel_dataframe_model.py`.
- Explicit string SQL (Phase 4): `fetch_sql_raw`, `iter_sql_raw`, `write_sql_raw`, `afetch_sql_raw`, `aiter_sql_raw`, `awrite_sql_raw` in `pydantable.io` (`fetch_sql_raw` / `afetch_sql_raw` also re-exported from the `pydantable` root).
- Schema bridging (Phase 5): `sqlmodel_columns`, `DataFrameModel.assert_sqlmodel_compatible` (`python/pydantable/io/sqlmodel_schema.py`); tests `tests/test_sqlmodel_bridge_phase05.py`; docs IO_SQL, DATAFRAMEMODEL, SQLMODEL_SQL_ROADMAP.
- Documentation + examples + testing gate (Phase 6): SQLModel-first SQLite examples `docs/examples/io/sql_sqlite_sqlmodel_roundtrip.py`, `docs/examples/io/sql_sqlite_sqlmodel_streaming.py`; IO_SQL sections for raw vs SQLModel-first examples; `tests/test_doc_io_examples.py` runs `sql_sqlite_streaming.py` and the SQLModel scripts alongside the existing `sql_sqlite_*` examples.
Deprecated
- Legacy string-SQL names (Phase 4): `fetch_sql`, `iter_sql`, `write_sql`, `afetch_sql`, `aiter_sql`, `awrite_sql`, `write_sql_batches`, `awrite_sql_batches` emit `DeprecationWarning`; migrate to `*_raw` or SQLModel helpers. `DataFrameModel.write_sql` / `awrite_sql` delegate to the same deprecated `pydantable.io` entrypoints. Removal no earlier than `2.0.0` (VERSIONING). Tests: `tests/test_sql_string_deprecation.py`; the default test run filters these warnings in `pyproject.toml` for backward-compatible suites.
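Deprecated names like these are commonly kept as thin wrappers that warn and then delegate to the replacement. A minimal sketch of that pattern; `deprecated_alias`, `fetch_rows_raw`, and `fetch_rows` are hypothetical stand-ins, not pydantable's functions:

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_alias(new: F, old_name: str) -> F:
    """Return a wrapper around `new` that emits DeprecationWarning on each call."""

    @functools.wraps(new)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"{old_name} is deprecated; use {new.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this wrapper
        )
        return new(*args, **kwargs)

    wrapper.__name__ = wrapper.__qualname__ = old_name
    return wrapper  # type: ignore[return-value]


def fetch_rows_raw(sql: str) -> dict[str, list]:
    """Hypothetical replacement API used only for this illustration."""
    return {"sql": [sql]}


fetch_rows = deprecated_alias(fetch_rows_raw, "fetch_rows")
```

Because Python hides `DeprecationWarning` outside `__main__` by default, test runners and warning filters decide whether these warnings surface, fail a run, or are silenced.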
Docs
- README / site index / I/O guides: align the current release (1.13.0), SQL I/O naming (`fetch_sqlmodel`, `fetch_sql_raw`, deprecations), and pointers to IO_SQL / SQLMODEL_SQL_ROADMAP across README, index, IO_OVERVIEW, IO_DECISION_TREE, EXECUTION, DATA_IO_SOURCES, DOCS_MAP, POLARS_TRANSFORMATIONS_ROADMAP, ROADMAP, and the SQLModel roadmap introduction.
- SQL I/O: IO_SQL, SQLMODEL_SQL_ROADMAP, VERSIONING: SQLModel-first default, `*_raw` for explicit string SQL, deprecation policy. Runnable examples: raw `sql_sqlite_roundtrip.py` / `sql_sqlite_streaming.py` and SQLModel-first `sql_sqlite_sqlmodel_*.py` (see IO_SQL).
Changed
- Version bump: Align Python package metadata, Rust crate, and published `__version__` to 1.13.0. (This release includes all SQLModel-first SQL I/O work since v1.12.0, Phases 0–6 of SQLMODEL_SQL_ROADMAP, in one minor version.)
[1.12.0] — 2026-04-02
Changed
- Version bump: Align Python package metadata, Rust crate, and published `__version__` to 1.12.0.
[1.11.0] — 2026-04-01
Added
- Tests (Phase E): `tests/test_parquet_allow_missing_columns_e.py`: directory scan with mismatched Parquet columns and `allow_missing_columns=True`.
- Example (Phase E): `docs/examples/io/parquet_allow_missing_columns.py` (`tests/test_doc_io_examples.py`).
- Parquet lazy `scan_kwargs` (1.11.0 Phase B1): `hive_partitioning`, `hive_start_idx`, `try_parse_hive_dates`, `include_file_paths`, `row_index_name`, `row_index_offset` forwarded to Polars `ScanArgsParquet` in `pydantable-core` (`scan_kw.rs`). Tests: `tests/test_parquet_scan_hive_b1.py`.
- CSV lazy `scan_kwargs` (1.11.0 Phase B2): `include_file_paths`, `row_index_name`, `row_index_offset`, `raise_if_empty`, `truncate_ragged_lines`, `decimal_comma`, `try_parse_dates` forwarded to Polars `LazyCsvReader` in `pydantable-core` (`lazy_csv_with_kwargs`); `row_index_*` parsing is shared with Parquet. Tests: `tests/test_csv_scan_directory_b2.py` (directory / `*.csv` glob, hive path behavior, unknown kw).
- NDJSON lazy `scan_kwargs` (1.11.0 Phase B3): `glob` (`glob=False` raises `ValueError`; Polars 0.53 NDJSON scans always expand paths), `include_file_paths`, `row_index_name`, `row_index_offset` in `lazy_ndjson_with_kwargs` (`scan_kw.rs`). Tests: `tests/test_ndjson_scan_directory_b3.py`.
- IPC lazy `scan_kwargs` (1.11.0 Phase B4): `record_batch_statistics` plus `UnifiedScanArgs` fields (`glob`, `cache`, `rechunk`, `n_rows`, `hive_partitioning`, `hive_start_idx`, `try_parse_hive_dates`, `include_file_paths`, `row_index_name`, `row_index_offset`) via `ipc_scan_from_kwargs` (`scan_kw.rs`). Tests: `tests/test_ipc_scan_directory_b4.py`.
- `read_json` (1.11.0 Phase B5): Confirmed alias of `read_ndjson` (JSON Lines lazy scan only); documentation for paths, `glob`, and `scan_kwargs` vs `materialize_json` for JSON array files. Tests: `tests/test_read_json_paths_b5.py`.
- `iter_chain_batches` (`pydantable.io` / `pydantable.io.batches`): chain per-file `iter_*` iterators over an explicit path list. Test: `tests/test_batches_chain.py`.
- Partitioned Parquet writes (1.11.0 Phase D1): `DataFrame.write_parquet` / `DataFrameModel.write_parquet`: `partition_by` (column names), `mkdir`, hive-style `col=value/.../00000000.parquet` shards via `pydantable-core` (`sink_parquet_polars`, Polars `partition_by_stable`). Tests: `tests/test_write_parquet_partition_d1.py`.
- `write_*_batches` (Phase D2): reject an existing directory path; CSV/NDJSON `mode` documented. Tests: `tests/test_write_batches_phase_d2.py`.
- Tests (Phase F): `test_write_parquet_unknown_write_kw_raises` in `tests/test_write_parquet_partition_d1.py`: invalid `write_kwargs` keys raise `ValueError` with `unknown write_kw key` (parity with the `scan_kwargs` allowlist tests).
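The hive-style layout produced by `partition_by` encodes each partition value as a `col=value` directory level. A minimal stdlib sketch of how rows map to shard paths, assuming values are formatted with `str()` and not escaped, and a single `00000000.parquet` shard per partition; `partition_shards` is hypothetical, and the real writer delegates to Polars:

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Any, Sequence


def partition_shards(
    columns: dict[str, list[Any]], partition_by: Sequence[str], root: str = "out"
) -> dict[str, list[int]]:
    """Map each hive-style shard path to the row indices it would contain."""
    n_rows = len(next(iter(columns.values()), []))
    shards: dict[str, list[int]] = defaultdict(list)
    for i in range(n_rows):
        # One directory level per partition column, in partition_by order.
        parts = [f"{col}={columns[col][i]}" for col in partition_by]
        path = PurePosixPath(root, *parts, f"{0:08d}.parquet")
        shards[str(path)].append(i)
    return dict(shards)
```

Directory order follows `partition_by`, so `["year", "region"]` and `["region", "year"]` produce different trees over the same data.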
Docs
- Changelog page: the source file is `docs/project/changelog.md`; the Sphinx / Read the Docs page is CHANGELOG (`CHANGELOG.html`). Update any bookmarks from `changelog.html`.
- Local I/O (1.11.0) release narrative: directory/glob/hive lazy reads, `scan_kwargs` / `write_kwargs` allowlists, eager `iter_*` / `materialize_*` guidance, partitioned Parquet writes, multi-file Parquet `allow_missing_columns`, and observability; details are in this 1.11.0 section, with ongoing I/O work in ROADMAP.
- `pydantable.__version__` / `rust_version()` alignment per VERSIONING (`tests/test_version_alignment.py`).
- Local I/O Phase E (1.11.0): Multi-file Parquet (`allow_missing_columns`, Polars schema union, cast / optional-field patterns) in IO_PARQUET; pointers in DATA_IO_SOURCES, IO_DECISION_TREE, SUPPORTED_TYPES, INTERFACE_CONTRACT, PLAN_AND_PLUGINS. Contributor note: `pydantable-core/.../scan_kw.rs`, DEVELOPER.
- Writes Phase D: partitioned `write_parquet`, batch-writer file vs directory (IO_PARQUET, IO_OVERVIEW, IO_DECISION_TREE, DATA_IO_SOURCES, INTERFACE_CONTRACT); example `docs/examples/io/parquet_partitioned_write.py`.
- Eager / batched multi-file clarity (1.11.0 Phase C): `materialize_*` single-file contract; `iter_*` / `aiter_*` take one path per call, with Python-side glob/directory expansion; `iter_chain_batches`; bounded-memory notes vs `iter_concat_batches` and lazy `read_*` (IO_OVERVIEW, IO_DECISION_TREE, DATA_IO_SOURCES, INTERFACE_CONTRACT); example `docs/examples/io/iter_glob_parquet_batches.py`.
- Local I/O audit (1.11.0 Phase A): Polars 0.53.0 vs pydantable `scan_kwargs` matrix and directory/glob/hive notes (see Polars 0.53 vs pydantable scan audit); multi-file entrypoint table (IO_DECISION_TREE); Local lazy file scans (INTERFACE_CONTRACT); path/glob subsections on IO_PARQUET, IO_CSV, IO_NDJSON, IO_IPC, IO_JSON; link from IO_OVERVIEW.
- Parquet B1: DATA_IO_SOURCES audit + summary table; IO_PARQUET; INTERFACE_CONTRACT (Parquet lineage / hive kwargs).
- CSV B2: DATA_IO_SOURCES audit + summary table; IO_CSV; INTERFACE_CONTRACT (CSV `include_file_paths` / `row_index_*`).
- NDJSON B3: DATA_IO_SOURCES audit + summary table; IO_NDJSON; INTERFACE_CONTRACT (NDJSON `glob` / `include_file_paths` / `row_index_*`).
- IPC B4: DATA_IO_SOURCES audit + summary table; IO_IPC; INTERFACE_CONTRACT (IPC `scan_kwargs`).
- JSON `read_json` B5: IO_JSON; DATA_IO_SOURCES (lazy vs array); `pydantable.io.read_json` docstring.
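Because `iter_*` readers take one path per call, multi-file iteration expands the glob in Python and chains the per-file iterators lazily. A minimal sketch of that shape, assuming a caller-supplied per-file reader; `chain_file_batches` and `read_lines_in_batches` are hypothetical helpers, not `iter_chain_batches` itself:

```python
import itertools
from pathlib import Path
from typing import Callable, Iterator


def chain_file_batches(
    directory: Path,
    pattern: str,
    read_file: Callable[[Path], Iterator[dict[str, list]]],
) -> Iterator[dict[str, list]]:
    """Expand a glob in sorted order and yield batches file by file."""
    paths = sorted(directory.glob(pattern))  # deterministic file order
    # chain.from_iterable starts the next file's iterator only after the
    # previous one is exhausted, so one file is in flight at a time.
    return itertools.chain.from_iterable(read_file(p) for p in paths)


def read_lines_in_batches(path: Path, batch_size: int = 2) -> Iterator[dict[str, list]]:
    """Toy per-file reader: one `line` column, `batch_size` rows per batch."""
    with path.open() as fh:
        lines = [line.rstrip("\n") for line in fh]
    for start in range(0, len(lines), batch_size):
        yield {"line": lines[start : start + batch_size]}
```

Batches never span two files, so the last batch of each file may be short.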
Changed
- Versioning: Python package metadata and Rust crate aligned at 1.11.0 for this release; docs “current release” strings (index, CHANGELOG, ROADMAP, POLARS_TRANSFORMATIONS_ROADMAP) aligned.
[1.10.0] — 2026-04-01
Added
- Struct expressions (Polars): `Expr.struct_json_encode`, `struct_json_path_match`, `struct_rename_fields`, `struct_with_fields` (Rust `ExprNode` + lowering); PySpark façade `pydantable.pyspark.sql.functions.struct_json_encode` / `struct_json_path_match`. Tests: `tests/test_struct_expr_phase_b.py`.
- JSON decode (Polars): `Expr.str_json_decode(dtype)` → struct or `dict[str, T]` via `StringJsonDecode` and shared `polars_dtype` mapping. Tests: `tests/test_str_json_decode_phase_c.py`.
- Tests: `tests/test_json_io_phase_a.py`: nested `materialize_json` (array vs NDJSON), `export_json` round-trip and `default=str` for `datetime` / `Decimal` / `UUID`, lazy `read_ndjson` / `read_json` alias with nested struct + list, eager path with a `dict[str, T]` map column.
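Passing `default=str` to the `json` encoder is what makes `datetime`, `Decimal`, and `UUID` values serializable, at the cost of a lossy round-trip: they come back as strings. A minimal stdlib sketch of that behavior on a column dict; `columns_to_json` is hypothetical, and the array-of-row-objects layout is an assumption for illustration:

```python
import json
from datetime import datetime
from decimal import Decimal
from typing import Any
from uuid import UUID


def columns_to_json(columns: dict[str, list[Any]]) -> str:
    """Serialize a column dict as a JSON array of row objects."""
    names = list(columns)
    rows = [dict(zip(names, values)) for values in zip(*columns.values())]
    # default=str is called only for objects json cannot encode natively.
    return json.dumps(rows, default=str)
```

Native JSON types (such as the integer column) survive unchanged; restoring the other types means parsing the strings again.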
Docs
- JSON modeling: JSON (RFC 8259) vs column types in SUPPORTED_TYPES (heterogeneous arrays, arbitrary JSON, link from IO_JSON).
- I/O: Eager `export_json` serialization in IO_JSON; extended `export_json` docstring in `pydantable.io` (`json.dump` + `default=str`); struct → JSON text pointer (`struct_json_encode`).
- Structs: SUPPORTED_TYPES and INTERFACE_CONTRACT: struct JSON / `with_fields` / `rename_fields` semantics and row-wise limits.
- FastAPI: columnar map / nested field notes with links to SUPPORTED_TYPES and IO_JSON.
- Roadmap: Phase A + B + C JSON/struct work summarized in this 1.10.0 section and ROADMAP (Shipped in 1.10.0); `str_json_decode` / error semantics in SUPPORTED_TYPES and INTERFACE_CONTRACT; IO_JSON cross-link.
- Phase D (I/O): IO_JSON covers `read_json` vs `read_ndjson` vs `materialize_json`, large-file / streaming patterns, and NDJSON `scan_kwargs` presets; example `docs/examples/io/large_ndjson_patterns.py`; cross-links from DATA_IO_SOURCES, EXECUTION, IO_NDJSON.
- Phase E (UX) & 1.10.0 JSON/struct summary: SELECTORS (`s.structs()`, `unnest`, `struct_field` pipeline); cookbook json_logs_unnest_export (NDJSON → unnest → `export_json`); DOCS_MAP link. Release narrative: JSON ↔ schema matrix and I/O tests; struct expressions (`struct_json_encode`, path/rename/with-fields); `str_json_decode`; Phase D large-file NDJSON docs; Phase E selectors + cookbook + this page.
Changed
- Versioning: Python package metadata and Rust crate aligned at 1.10.0 for this release.
[1.9.0] — 2026-04-01
Added
- PySpark UI parity: `groupBy` returning `PySparkGroupedDataFrame` / `PySparkGroupedDataFrameModel` (aggregations stay Spark-flavored), `sort`, `crossJoin`, frame action `count()` → int (via `global_row_count()`), `unionByName` (optional `allowMissingColumns`), `intersect` / `subtract` / `exceptAll` (join-layer semantics; `exceptAll` aliases `subtract`, not Spark multiset `EXCEPT ALL`), `fillna` / `dropna` / `.na`, `printSchema`, `explain`, `toPandas`, and the same methods on `DataFrameModel`. See PYSPARK_UI, PYSPARK_PARITY, and INTERFACE_CONTRACT.
- Engine: `cast_expr` / `Expr.cast` now accepts `Literal(None)` (unknown-base SQL NULL) and casts it to a nullable scalar dtype, enabling typed null padding (e.g. `unionByName(..., allowMissingColumns=True)`).
- Temporal: `Expr.dt_dayofyear`, `Expr.from_unix_time`, PySpark `F.dayofyear` / `F.from_unixtime` (numeric epoch → UTC-naive `datetime`; Spark’s optional `from_unixtime` format string is not modeled, so use parsing helpers on strings). Rust: `TemporalPart::DayOfYear`, `ExprNode::FromUnixTime`.
- Introspection: `DataFrame.describe()` (and PySpark `summary()`) now includes `date` and `datetime` columns: non-null count, min, max, and null count (one `to_dict()` materialization). Tests: `tests/test_dataframe_discovery.py`.
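The documented semantics of `from_unix_time` (numeric epoch → UTC-naive `datetime`) and `dt_dayofyear` can be reproduced with the standard library, which is handy for computing expected values in tests. A minimal reference sketch, not pydantable's implementation, assuming the epoch is expressed in seconds:

```python
from datetime import date, datetime, timezone


def from_unix_time(seconds: float) -> datetime:
    """Epoch seconds -> naive datetime holding the UTC wall-clock time."""
    # Convert in UTC explicitly, then drop tzinfo. A bare
    # datetime.fromtimestamp(seconds) would use the machine's local timezone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


def day_of_year(d: date) -> int:
    """1-based ordinal day within the year (1 = January 1)."""
    return d.timetuple().tm_yday
```

The explicit UTC conversion is the important design choice: without it, the same epoch value maps to different naive datetimes on machines in different timezones.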
Docs / tooling
- Versioning: bump to 1.9.0 across Python package metadata, Rust crate, and shipped stubs; docs “current release” strings (index, ROADMAP, POLARS_TRANSFORMATIONS_ROADMAP) aligned.
- Docs: `describe()` / `summary()`, SUPPORTED_TYPES temporal helpers, INTERFACE_CONTRACT, EXECUTION, PYSPARK_PARITY, PARITY_SCORECARD, DEVELOPER, PANDAS_UI, and POLARS_TRANSFORMATIONS_ROADMAP updated for 1.9.0 behavior.
[1.8.0] — 2026-03-31
Added
- Selectors: selector-driven column and rename helpers (see `pydantable.selectors` and SELECTORS).
- Core DataFrame ergonomics: `row_count`, `clip`, and `drop_nulls` arguments and convenience behavior aligned with the 1.8 parity push (see POLARS_PARITY_1_8 and PARITY_SCORECARD).
- Joins: additional join argument parity, including `join_nulls` and `maintain_order` (typed contract preserved; see INTERFACE_CONTRACT).
- Reshape: `pivot_longer` / `pivot_wider` and related reshape ergonomics (see POLARS_WORKFLOWS and INTERFACE_CONTRACT reshape notes).
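`join_nulls` mirrors the Polars option of the same name: by default a null key never matches anything (SQL semantics), while `join_nulls=True` lets null keys match each other. A minimal pure-Python inner hash join showing the difference; `inner_join` is a hypothetical helper, not pydantable's join:

```python
from collections import defaultdict
from typing import Any


def inner_join(
    left: list[dict[str, Any]],
    right: list[dict[str, Any]],
    on: str,
    join_nulls: bool = False,
) -> list[dict[str, Any]]:
    """Inner hash join on one key column, preserving left row order."""
    index: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in right:
        index[row[on]].append(row)
    out = []
    for row in left:
        key = row[on]
        if key is None and not join_nulls:
            continue  # SQL semantics: NULL = NULL is not true, so no match
        for match in index.get(key, []):
            out.append({**row, **match})
    return out
```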
Docs
- Add explicit 1.8.0 release entry and align “current release” references with the changelog.
[1.7.0] — 2026-03-30
Added
- Pandas UI (schema-first): `duplicated` / `drop_duplicates(keep=False)` backed by engine plan steps where Polars is enabled; typed `get_dummies` with a cardinality guard; eager `cut` / `qcut`, `factorize_column`, narrow `ewm(...).mean()` (may require pandas at runtime); façade `pivot` delegating to core. See PANDAS_UI and PARITY_SCORECARD.
- Tests: `tests/test_pandas_ui_popular_features.py`: extended coverage for duplicates, dummies, binning, factorize, ewm, and pivot.
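`keep=False` follows pandas: every row that has a duplicate is marked, so `drop_duplicates(keep=False)` removes all copies instead of keeping the first or last one. A minimal pure-Python sketch over hashable row tuples; `duplicated_keep_false` and `drop_duplicates_keep_false` are hypothetical helpers:

```python
from collections import Counter
from typing import Hashable, Sequence


def duplicated_keep_false(rows: Sequence[Hashable]) -> list[bool]:
    """True for every row whose value occurs more than once."""
    counts = Counter(rows)
    return [counts[row] > 1 for row in rows]


def drop_duplicates_keep_false(rows: Sequence[Hashable]) -> list[Hashable]:
    """Keep only rows that occur exactly once, in their original order."""
    return [row for row, dup in zip(rows, duplicated_keep_false(rows)) if not dup]
```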
Docs
- Refreshed PANDAS_UI (correct `pivot` façade, `get_dummies` null/boolean behavior, naming map, test links), INTERFACE_CONTRACT (duplicate detection, `value_counts` dict note), DOCS_MAP, QUICKSTART, TROUBLESHOOTING (optional pandas for eager helpers), EXECUTION, DEVELOPER, DATAFRAMEMODEL, POLARS_TRANSFORMATIONS_ROADMAP, PARITY_SCORECARD, and root README cross-links.
Docs / tooling
- Versioning: bump to 1.8.0 across Python package metadata, Rust crate, and shipped stubs; docs “current release” strings aligned.
[1.6.1] — 2026-03-30
Fixed
- Async iterator bridge: hardened `pydantable.io.aiter_sql()` and internal `_aiter_from_iter()` against deadlocks when the async consumer stops early (e.g. client disconnect / early return). Producer threads now exit cleanly and do not block forever on a bounded queue.
- Deferred materialization: `ExecutionHandle.result()` now shields the underlying `concurrent.futures.Future`, so cancelling the awaiting task cancels the wait but does not cancel the background engine work.
- Submit cancellation race: `DataFrame.submit()` now avoids `InvalidStateError` when a handle is cancelled before work starts (mirrors `ThreadPoolExecutor` semantics).
- Lazy scan missing optional columns: recovery for missing optional scan columns now tolerates error-message variants (it is no longer coupled to one brittle regex).
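The deadlock fixed here is a classic thread-to-async bridge hazard: if the consumer stops early, a producer blocked on `put()` into a full bounded queue waits forever. A minimal stdlib sketch of one way to avoid it, assuming a stop event plus timed `put`/`get` polling; `aiter_from_iter` is illustrative and not the internal `_aiter_from_iter`, and in this sketch an exception from the source iterator simply ends the stream instead of being re-raised in the consumer:

```python
import asyncio
import queue
import threading
from typing import AsyncIterator, Iterable, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


async def aiter_from_iter(source: Iterable[T], maxsize: int = 4) -> AsyncIterator[T]:
    q: "queue.Queue[object]" = queue.Queue(maxsize=maxsize)
    stop = threading.Event()

    def put(item: object) -> bool:
        # Timed put: re-check `stop` regularly instead of blocking forever
        # on a full queue that nobody will drain.
        while not stop.is_set():
            try:
                q.put(item, timeout=0.05)
                return True
            except queue.Full:
                pass
        return False

    def produce() -> None:
        try:
            for item in source:
                if not put(item):
                    return  # consumer went away; stop pulling from the source
        finally:
            put(_DONE)  # returns immediately once `stop` is set

    def get() -> object:
        # Timed get, so a worker thread never outlives a cancelled consumer.
        while True:
            try:
                return q.get(timeout=0.05)
            except queue.Empty:
                if stop.is_set():
                    return _DONE

    threading.Thread(target=produce, daemon=True).start()
    try:
        while True:
            item = await asyncio.to_thread(get)
            if item is _DONE:
                return
            yield item
    finally:
        stop.set()  # normal end, early break + aclose(), or cancellation
```

The bounded queue caps how far the producer can run ahead: after an early stop, it has pulled at most the consumed items, a full queue, and one item in hand.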
Docs / tooling
- Read the Docs build: install `pydata-sphinx-theme` during RTD builds to match the configured `html_theme` (`docs/conf.py`).
- Versioning: bump to 1.6.1 across Python package metadata, Rust crate, and shipped stubs; docs “current release” strings aligned.
[1.6.0] — 2026-03-30
Summary: FastAPI helpers (columnar OpenAPI bodies, NDJSON, `register_exception_handlers`), `pydantable.errors`, `submit` / `stream` / `astream`, `PlanMaterialization`, awaitable lazy reads (`AwaitableDataFrameModel`), Rust async plan execution when available, and docs/cookbooks for services. Breaking: removed the legacy `DataFrameModel` eager SQL / `materialize_*` shims; use `from pydantable import …` (eager SQL / `materialize_*`) and `read_*` / `aread_*` as described under Removed below.
Removed
- `DataFrameModel` eager I/O shims: `materialize_*`, `amaterialize_*`, `fetch_sql`, `afetch_sql`, `iter_sql`, `aiter_sql`, `from_sql`, `afrom_sql`. Use `from pydantable import …` for eager `dict[str, list]` loads and SQL streaming, then `MyModel(cols, ...)`; keep lazy scans on `read_*` / `aread_*`.
Added
- `pydantable.errors`: `PydantableUserError`, `ColumnLengthMismatchError` (column length mismatch at schema ingest). `register_exception_handlers` maps `ColumnLengthMismatchError` → 400 with a JSON `detail`.
- `pydantable.fastapi`: `columnar_body_model`, `columnar_body_model_from_dataframe_model`, `columnar_dependency`, `rows_dependency`: OpenAPI-friendly columnar bodies and `Depends` factories for `DataFrameModel`; see FASTAPI.
- `pydantable.testing.fastapi`: `fastapi_app_with_executor`, `fastapi_test_client` (lifespan-aware `TestClient` for `executor_lifespan` / `get_executor`).
- `pydantable.fastapi`: `ndjson_streaming_response` / `ndjson_chunk_bytes` for an NDJSON `StreamingResponse` from `astream()` without hand-rolling encoders.
- `pydantable.fastapi` (optional `pip install 'pydantable[fastapi]'`): `executor_lifespan`, `get_executor` (`Depends`), `register_exception_handlers` for `MissingRustExtensionError` / `pydantic.ValidationError`. See GOLDEN_PATH_FASTAPI and FASTAPI.
- `pydantable.typing.SupportsLazyAsyncMaterialize`: structural `Protocol` for objects with async terminal materialization via `acollect` (`DataFrameModel` and `AwaitableDataFrameModel`).
- `AwaitableDataFrameModel`: `aread_parquet`, `aread_ipc`, `aread_csv`, `aread_ndjson`, and `aread_json` return a chainable awaitable (`select` / `filter` / … then `await …acollect()`) so async routes avoid a nested `await` on the read. Lazy metadata: `await …columns` / `shape` / `empty` / `dtypes`; `then` for custom sync/async steps; `concat` to merge multiple pending chains or concrete models. Async-first names: unprefixed terminals on the chain (`collect`, `to_dict`, `to_polars`, `to_arrow`, `rows`, `to_dicts`, `stream`) are aliases of the `a*` methods; `DataFrameModel.Async.read_*` / `Async.write_sql` / `Async.export_*` mirror `aread_*` / `awrite_sql` / `aexport_*` without the `a` prefix (`read_parquet` cannot replace `aread_parquet` on the class itself because `read_parquet` is the sync lazy reader). Pending chains show a descriptive `repr` (read path + chained transforms).
- `DataFrameModel.aexport_parquet`, `aexport_csv`, `aexport_ndjson`, `aexport_ipc`, `aexport_json`: async eager exports via the same `aexport_*` implementation module as `pydantable.io` (prefer the `DataFrameModel` classmethods in application code).
- Rust async bridge: `async_execute_plan` and `async_collect_plan_batches` on `pydantable_native._core` (Tokio + `pyo3-async-runtimes`); `acollect` / `ato_*` prefer this awaitable when present.
- `DataFrame.submit` / `DataFrameModel.submit` and `ExecutionHandle` (`result`, `done`, `cancel`) for background `collect`.
- `DataFrame.astream` / `DataFrameModel.astream`: async iteration of column `dict` chunks after one engine collect (see EXECUTION).
- `DataFrame.stream` / `DataFrameModel.stream`: synchronous `dict[str, list]` chunk iterator (same semantics as `astream`); `PlanMaterialization` and `plan_materialization_summary()` label the four terminal modes (blocking, async, deferred, chunked).
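Two standard-library mechanisms give a background-execution handle the cancellation behavior described for `ExecutionHandle` (and later hardened in 1.6.1): `asyncio.shield` around `asyncio.wrap_future`, so cancelling an awaiting task does not cancel the underlying future, and `Future.set_running_or_notify_cancel()`, so work cancelled while still queued is skipped instead of raising `InvalidStateError`. A minimal sketch; `run_work_item`, `await_shielded`, and `demo` are hypothetical names, not pydantable's API:

```python
import asyncio
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def run_work_item(fut: Future, fn: Callable[[], Any]) -> None:
    """Run `fn` for `fut` unless `fut` was cancelled while queued."""
    # Same check ThreadPoolExecutor workers perform. Calling set_result()
    # on a cancelled future would raise InvalidStateError instead.
    if not fut.set_running_or_notify_cancel():
        return
    try:
        fut.set_result(fn())
    except Exception as exc:
        fut.set_exception(exc)


async def await_shielded(fut: Future) -> Any:
    """Await a concurrent future; cancelling the caller does not cancel `fut`."""
    return await asyncio.shield(asyncio.wrap_future(fut))


async def demo() -> int:
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(time.sleep, 0.2)  # occupy the only worker
        fut = pool.submit(lambda: 42)  # still queued, so it is cancellable
        waiter = asyncio.create_task(await_shielded(fut))
        await asyncio.sleep(0)  # let the waiter start awaiting
        waiter.cancel()  # cancels only the wait...
        try:
            await waiter
        except asyncio.CancelledError:
            pass
        return await asyncio.wrap_future(fut)  # ...the work still completes
```

Without `shield`, cancelling `waiter` would propagate through `wrap_future` and cancel `fut` while it is still queued.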
Docs
- New FASTAPI_ENHANCEMENTS (roadmap + “when to use what” matrix); links from GOLDEN_PATH_FASTAPI, FASTAPI, DOCS_MAP.
- FASTAPI_ENHANCEMENTS: production lifespan snippet (`executor_lifespan`, `get_executor`, `register_exception_handlers`), NDJSON helper semantics, troubleshooting table (422 vs 503, empty streams, executor tuning); `tests/test_pydantable_fastapi_integration.py` covers empty NDJSON, Unicode/null, custom `media_type`, `astream` batching, and golden-path stream parsing.
- FASTAPI: Columnar OpenAPI and Depends; fastapi_columnar_bodies uses generated models; `tests/test_pydantable_fastapi_columnar.py` covers OpenAPI schema, aliases, `rows_dependency`, and `pydantable.testing.fastapi`.
- FASTAPI / FASTAPI_ENHANCEMENTS / cookbook: columnar 422 vs `ValueError` (500), nested `list[NestedModel]`, `TestClient(raise_server_exceptions=False)`; expanded `tests/test_pydantable_fastapi_columnar.py` (cache, nested routes, length mismatch, `register_handlers`).
- fastapi_observability, fastapi_background_tasks (end-to-end-style examples); example `docs/examples/fastapi/service_layout/` (`UserBatch`, health metadata, 400 on length mismatch); `tests/test_pydantable_errors.py`, `tests/test_pydantable_fastapi_service_layout.py`, broader columnar / handler tests; FASTAPI_ENHANCEMENTS Phases 4–6 and 8 marked shipped where applicable.
- TYPING: expanded `SupportsLazyAsyncMaterialize` (when to use it vs `DataFrameModelWithRow`, runtime `isinstance` caveats, examples); DATAFRAMEMODEL cross-link from async lazy I/O.
- New MATERIALIZATION page; EXECUTION, INTERFACE_CONTRACT, DATAFRAMEMODEL, DOCS_MAP cross-links.
- DATAFRAMEMODEL Three layers (ASCII diagram + rule of thumb + lazy-shape warning); async_lazy_pipeline; fastapi_async_materialization prefers `collect` / `to_dict`.
- ROADMAP, DATA_IO_SOURCES, and `docs/async_ideas/` aligned with the async/submit/stream work where applicable.
- README, index, DOCS_MAP, GOLDEN_PATH_FASTAPI, TROUBLESHOOTING: FastAPI helpers, `pydantable.errors`, cookbooks, `service_layout`, and testing helpers cross-linked; troubleshooting bullets repaired.
[1.5.0] — 2026-03-29
Added
- Batched column-dict I/O: iter_* / aiter_* readers and write_*_batches writers for core formats (Parquet, IPC, CSV, NDJSON, JSON array/lines) plus selected extras (Excel/Delta/Avro/ORC/BigQuery/Snowflake/Kafka where supported).
- Engine streaming default propagation: engine_streaming= alias and per-frame defaults set by lazy read_* / aread_*, applied to later collect() / to_* / lazy write_* unless overridden.
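Batched writers consume an iterator of column dicts instead of one large dict. As a rough illustration of that pattern (not the write_*_batches implementation), here is a hypothetical write_csv_batches_sketch built on the stdlib csv module; it assumes every batch has the same column order as the first one.

```python
import csv
import io
from typing import Iterable, TextIO


def write_csv_batches_sketch(batches: Iterable[dict[str, list]], out: TextIO) -> int:
    """Append each column-dict batch to `out` as CSV rows; return total rows written."""
    writer = csv.writer(out)
    header = None
    total = 0
    for batch in batches:
        names = list(batch)
        if header is None:
            # The first batch fixes the header and column order.
            header = names
            writer.writerow(header)
        elif names != header:
            raise ValueError(f"batch columns {names} do not match header {header}")
        # zip(*columns) turns this batch's column lists into row tuples.
        rows = list(zip(*(batch[name] for name in header)))
        writer.writerows(rows)
        total += len(rows)
    return total


buf = io.StringIO()
written = write_csv_batches_sketch([{"id": [1, 2], "v": ["x", "y"]}, {"id": [3], "v": ["z"]}], buf)
```

Only one batch's rows are held at a time, which is the point of the batched API; the output sink here is an in-memory buffer purely for the example.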
Fixed
- IPC batch iteration: iter_ipc(..., as_stream=False) now works with PyArrow RecordBatchFileReader (file-format readers are not iterable in some versions).
[1.4.0] — 2026-03-29
Added
- SQL streaming (SQLAlchemy): iter_sql / aiter_sql for batch iteration of SELECT results; DataFrameModel.iter_sql / aiter_sql yield typed batch models.
- SQL batch sinks: write_sql_batches / awrite_sql_batches for end-to-end streaming (consume batches without building one giant in-memory dict).
Changed
- fetch_sql / afetch_sql: support batch_size= and automatic streaming behavior for large results (may return a streaming container with .to_dict()).
- write_sql / awrite_sql: support chunk_size= and stream inserts in chunks to reduce peak memory use.
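The batch_size= / chunk_size= behavior can be pictured with plain DB-API cursors. This sketch uses the stdlib sqlite3 driver instead of SQLAlchemy, and both helpers (iter_select_batches, insert_in_chunks) are hypothetical; it shows the general fetchmany / executemany chunking technique, not how fetch_sql or write_sql are built.

```python
import itertools
import sqlite3
from typing import Iterator


def iter_select_batches(conn: sqlite3.Connection, sql: str, batch_size: int) -> Iterator[dict[str, list]]:
    """Yield SELECT results as column dicts of at most batch_size rows."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield {name: [row[i] for row in rows] for i, name in enumerate(names)}


def insert_in_chunks(conn: sqlite3.Connection, table: str, columns: dict[str, list], chunk_size: int) -> None:
    """Insert a column dict chunk_size rows at a time (table/column names are trusted input)."""
    names = list(columns)
    placeholders = ", ".join("?" for _ in names)
    sql = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})"
    # zip is lazy, so only one chunk of row tuples is materialized at a time.
    row_iter = zip(*(columns[n] for n in names))
    while True:
        chunk = list(itertools.islice(row_iter, chunk_size))
        if not chunk:
            break
        conn.executemany(sql, chunk)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
insert_in_chunks(conn, "t", {"id": [1, 2, 3], "name": ["a", "b", "c"]}, chunk_size=2)
batches = list(iter_select_batches(conn, "SELECT id, name FROM t ORDER BY id", batch_size=2))
```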
[1.3.0] — 2026-03-29
Added
- Expr (type-specific): list_join, list_sort, and list_unique on homogeneous lists; dt_week (ISO week, date/datetime); str_reverse, str_pad_start / str_pad_end, str_zfill, str_extract_regex, and str_json_path_match (Polars engine; semantics in SUPPORTED_TYPES and INTERFACE_CONTRACT).
- Docs / tests: expanded expression contracts in SUPPORTED_TYPES and TYPING; integration coverage in tests/test_type_specific_expr.py.
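dt_week is described as the ISO week. Assuming it follows ISO 8601 numbering, the stdlib date.isocalendar() yields the same week numbers, which is handy for computing expected values in tests; note that early-January dates can fall in the previous year's last week. iso_week is a hypothetical helper.

```python
from datetime import date, datetime


def iso_week(value: date) -> int:
    """ISO 8601 week number (1-53); datetime subclasses date, so both types work."""
    return value.isocalendar()[1]


# 2021-01-01 is a Friday, so it belongs to ISO week 53 of 2020.
boundary_week = iso_week(date(2021, 1, 1))
```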
Removed
- CI / Release: CycloneDX SBOM generation and upload jobs (too fragile for default automation); generate SBOMs locally if required (see DEVELOPER Optional CycloneDX SBOMs).
[1.2.0] — 2026-03-28
Added
- Column types (see SUPPORTED_TYPES):
  - typing.Literal[...]: homogeneous str, int, or bool members only; dtype descriptors include an optional literals list; invalid filter(col == ...) constants are rejected when the expression is built.
  - ipaddress.IPv4Address / IPv6Address: Polars Utf8, canonical string form; string cells coerce on ingest.
  - pydantable.types.WKB: bytes subclass for Well-Known Binary geometry; Polars Binary (same Expr surface as bytes where applicable).
  - Annotated[str, ...]: logical str in the Rust plan; Pydantic applies metadata on collect() / RowModel.
- Tests: tests/test_extended_scalar_dtypes_v12.py, typing-engine parity for these scalars, mypy/pyright DataFrameModel chain snippets.
- Docs: practical notes for Expr comparisons (IP/WKB operands), TYPING (1.2 scalars), DATAFRAMEMODEL field list.
Fixed
- cargo check -p pydantable-core --no-default-features: exhaustive DTypeDesc::Scalar matches and row-wise CompareOp / cast_literal_value coverage for IPv4 / IPv6 / WKB when polars_engine is off.
Typing / lint
- __all__ and Ruff-driven cleanups on schema, types, and tests.
[1.1.0] — 2026-03-27
Added
- Typing: DataFrameModel transform methods return derived model types so mypy/pyright can verify the schema after select, drop, rename, with_columns, join, and group_by(...).agg(...) without materializing between steps (see GitHub issue #1).
- Tests: expanded coverage (I/O fallbacks, PySpark/expr edges, schema helpers) and mypy regression updates in tests/test_mypy_dataframe_model_return_types.py.
[1.0.0] — 2026-03-26
Scope
- Production-ready major release focused on API stability and semver contract clarity.
- No large new execution-engine features are required for the tag.
Added
- Ingest/docs consistency for missing optional behavior: fill_missing_optional documented consistently across constructor and typed lazy-read materialization paths.
- Explicit schema defaults on optional fields (for example note: str | None = "n/a" or = None) now take precedence when fill_missing_optional=False instead of raising.
- 1.0.0 readiness documentation:
  - explicit 1.x semver policy in VERSIONING,
  - release gate checklist and security-advisory handling in DEVELOPER,
  - roadmap, README, and docs index updates for 1.0 communication and support matrix policy.
- [docs] extra includes SQLAlchemy so Sphinx (-W) and sphinx-autodoc-typehints resolve DataFrameModel Engine / Connection annotations in CI (matches Read the Docs).
Changed
- Documentation includes migration guidance from earlier missing_optional string-style wording ("fill_none" / "error") to boolean fill_missing_optional=True / False.
Stability commitments
- 1.x patch/minor/major policy is defined in VERSIONING.
- Behavioral semantics continue to be defined in INTERFACE_CONTRACT.
Upgrade guidance
- Canonical upgrade path from 0.20.x/0.23.x is documented in README.md and linked from the docs index; I/O renames from 0.22.x/0.23.x are summarized under 0.23.0 below.
[0.23.0] — 2026-03-25
Highlights
- Out-of-core file workflows: read_parquet, read_csv, read_ndjson, read_ipc, read_json (and aread_*) return a ScanFileRoot so DataFrame / DataFrameModel can run transforms on a Polars LazyFrame without loading the full file into Python lists first.
- DataFrame.write_parquet (and write_csv, write_ipc, write_ndjson): write the lazy pipeline from the Rust engine without building a giant dict[str, list] for the result.
- Breaking (public I/O renames): sync/async eager file reads into columns are materialize_* / amaterialize_*. Lazy local files use read_* / aread_*. Eager dict[str, list] → file uses export_* / aexport_*. SQL read_sql / aread_sql → fetch_sql / afetch_sql. Eager HTTP(S) column readers (0.22 read_*_url) → fetch_parquet_url, fetch_csv_url, fetch_ndjson_url. Lazy HTTP Parquet (temp file on disk) stays read_parquet_url / aread_parquet_url; use read_parquet_url_ctx / aread_parquet_url_ctx to delete the temp file when done (IO_HTTP). Top-level pydantable exports and DataFrameModel classmethods follow the same vocabulary.
- Pre-release / internal names: development builds that still exposed scan_* / ascan_* or sink_* for lazy I/O now align with the public read_* / write_* names; pydantable re-exports read_parquet, read_parquet_url, aread_parquet, aread_parquet_url, export_parquet (replacing scan_* / write_parquet on the package root).
Added
- JSON (array of objects): read_json, materialize_json, export_json, aread_json, amaterialize_json, aexport_json (local lazy scan and eager column dicts; see IO_JSON).
- read_parquet_url_ctx / aread_parquet_url_ctx: context managers that delete the temporary Parquet file when the block exits (see IO_HTTP).
- DataFrameModel: classmethods export_*, write_sql / awrite_sql, from_sql / afrom_sql delegating to pydantable.io.
- MissingRustExtensionError: subclass of NotImplementedError raised when the native extension is missing or incomplete on lazy scan/sink paths and execute_plan (still catchable as NotImplementedError).
- HTTP / object store safety: max_bytes on fetch_bytes and read_from_object_store; chunked reads raise ValueError when the limit is exceeded.
- Docs: IO_DECISION_TREE, IO_JSON, IO_HTTP updates, engine matrix in IO_OVERVIEW, FASTAPI executor guidance; README and manual pages refreshed for 0.23.x I/O.
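The max_bytes guard reads in chunks and fails with ValueError once the limit is exceeded. A minimal sketch of that technique over any binary file-like object follows; read_capped is hypothetical, and whether pydantable's limit is inclusive, as well as its exact error message, are assumptions here.

```python
import io
from typing import BinaryIO


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read `stream` in chunks, raising ValueError as soon as more than max_bytes arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        # Checking after every chunk bounds memory to roughly max_bytes + chunk_size.
        if len(buf) > max_bytes:
            raise ValueError(f"payload exceeded max_bytes={max_bytes}")
    return bytes(buf)


payload = read_capped(io.BytesIO(b"x" * 10), max_bytes=10, chunk_size=4)
```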
Details
- Rust: ScanFileRoot, plan_to_lazyframe, internal sink exports for lazy writes; join/groupby/reshape entrypoints work with lazy file roots where implemented (see EXECUTION matrix).
- Python: read_csv_stdin uses materialize_csv internally.
- Docs: EXECUTION memory model and streaming/collect compatibility matrix (PYDANTABLE_ENGINE_STREAMING reserved); DATA_IO_SOURCES, FASTAPI, INTERFACE_CONTRACT, ROADMAP, README.
Migration (from 0.22.x)
| Old (0.22.x) | Use instead (0.23.0) |
|---|---|
| Eager file → dict[str, list] via read_parquet, aread_parquet, … | materialize_parquet, amaterialize_parquet, … |
| read_sql, aread_sql | fetch_sql, afetch_sql |
| Eager URL → columns via read_parquet_url / read_csv_url / read_ndjson_url | fetch_parquet_url, fetch_csv_url, fetch_ndjson_url |
| Lazy HTTP Parquet (unchanged name, new cleanup helpers) | Still read_parquet_url / aread_parquet_url; prefer read_parquet_url_ctx / aread_parquet_url_ctx for automatic temp-file removal |
| Large local file, filter → Parquet out | read_parquet + transforms + DataFrame.write_parquet |
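When upgrading, it can help to scan for call sites that the migration table affects. The hypothetical helper below does a naive regex scan; it cannot tell whether a read_parquet call needs eager columns, so those names are only flagged as candidates. The mapping covers only names named in this release entry (materialize_csv comes from the 0.23.0 Details), and it is not a codemod.

```python
import re

# Renamed outright in 0.23.0: every call site should move.
RENAMED = {
    "read_sql": "fetch_sql",
    "aread_sql": "afetch_sql",
    "read_csv_url": "fetch_csv_url",
    "read_ndjson_url": "fetch_ndjson_url",
}
# Still valid names in 0.23.0 (now lazy); only eager call sites should move.
EAGER_ONLY = {
    "read_parquet": "materialize_parquet",
    "aread_parquet": "amaterialize_parquet",
    "read_csv": "materialize_csv",
    "read_parquet_url": "fetch_parquet_url",
}

# An identifier immediately followed (optionally after spaces) by "(".
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def migration_hints(source: str) -> list[str]:
    """Return one hint per call to an affected 0.22.x I/O function, in source order."""
    hints = []
    for match in _CALL.finditer(source):
        name = match.group(1)
        if name in RENAMED:
            hints.append(f"{name} -> {RENAMED[name]}")
        elif name in EAGER_ONLY:
            hints.append(f"{name} -> {EAGER_ONLY[name]} (if an eager dict[str, list] is needed)")
    return hints


example = "cols = pydantable.read_sql('SELECT 1', engine)\ndf = aread_parquet('x.parquet')\n"
hints = migration_hints(example)
```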
[0.22.0] — 2026-03-25
Highlights
- Comprehensive I/O: the pydantable.io package adds Rust-backed (Polars) read_* / write_* for Parquet, Arrow IPC, CSV, and NDJSON into dict[str, list], with Python::allow_threads on read hot paths; PyArrow remains the default for buffers, column projection, and streaming IPC. Async mirrors aread_* / awrite_* use asyncio.to_thread (optional executor=), matching acollect / ato_arrow.
- SQLAlchemy bridge: read_sql / write_sql (pip install 'pydantable[sql]' + your DB driver) for URL/engine SELECT → column dict and append/replace inserts across SQLAlchemy-supported databases.
- Transports (experimental): HTTP(S) fetch_bytes, read_parquet_url, read_csv_url, read_ndjson_url, and fsspec-based read_from_object_store; opt in with experimental=True or PYDANTABLE_IO_EXPERIMENTAL=1.
- Tier-2/3 extras (best-effort): [excel], [kafka], [bq], [snowflake], [cloud]; helpers such as read_excel, read_delta, read_kafka_json_batch, read_csv_stdin / write_csv_stdout (see docs/io/data-io-sources.md).
- Optional [rap]: true-async CSV via aread_csv_rap when rapcsv + rapfiles are installed.
- Engine override: set PYDANTABLE_IO_ENGINE=rust or pyarrow to force file readers/writers.
- Release quality bar: the v0.22.0 tag is cut from a commit that passes make check-full, full pytest, and Rust checks including --no-default-features.
- Supply chain: the release workflow publishes CycloneDX SBOMs (Python + Rust) alongside wheels/sdist.
- Support matrix: Python 3.10–3.13.
Details
- Rust: new pydantable_native._core exports io_read_*_path / io_write_*_path; column-dict writes round-trip through Python polars.DataFrame → IPC → Rust writers (install pydantable[polars] for writes).
- Tests: tests/test_io_comprehensive.py (round-trips, SQLite SQL, local HTTP server for URL Parquet).
- CI: Python test job installs sqlalchemy with other dev deps.
[0.21.0] — 2026-03-25
Highlights
- Streamlit: DataFrame and DataFrameModel implement the Python DataFrame Interchange Protocol (__dataframe__) via PyArrow, so st.dataframe(df) can render a typed pydantable frame directly when pyarrow is installed (pip install 'pydantable[arrow]'). For editing, use st.data_editor(df.to_arrow()) (or to_polars()). See STREAMLIT and EXECUTION (interchange).
[0.20.0] — 2026-03-25
Supersedes: 0.19.0.
Highlights
- UX / discovery: core DataFrame and DataFrameModel expose columns, shape, empty, dtypes, info(), and describe() for int, float, bool, and str columns (one to_dict() materialization). shape[0] follows root-buffer semantics; see INTERFACE_CONTRACT Introspection, EXECUTION.
- Docs: QUICKSTART (five-minute tour), repository notebooks/five_minute_tour.ipynb, EXECUTION sections on materialization costs, import styles, copy-as / interchange; naming map in PANDAS_UI / PYSPARK_UI.
- Display: pydantable.display (get_repr_html_limits, set_display_options, reset_display_options); env PYDANTABLE_REPR_HTML_* for Jupyter HTML preview bounds.
- DataFrame.value_counts / DataFrameModel.value_counts (group-by path); _repr_mimebundle_ on DataFrame and DataFrameModel (text/plain + text/html).
- Debugging: PYDANTABLE_VERBOSE_ERRORS=1 appends schema context to ValueError from execute_plan.
- Expressions: Expr, ColumnRef, WhenChain, and pending window helpers implement readable __repr__. Tests: tests/test_expr_repr.py.
- PySpark façade: DataFrame.show() and summary() (alias of describe()). See PYSPARK_UI, PYSPARK_PARITY.
- Documentation: README, index, ROADMAP, PARITY_SCORECARD, PANDAS_UI, DEVELOPER.
Details
- Repr / HTML: multi-line DataFrame.__repr__ and _repr_html_ (card-style HTML; grouped/model banners). See EXECUTION, tests/test_dataframe_repr.py.
- Tests: tests/test_display_options.py, tests/test_dataframe_discovery.py, tests/test_rust_engine_verbose_errors.py. See EXECUTION, INTERFACE_CONTRACT.
- Release hygiene: make check-full, full pytest, cargo test --all-features per DEVELOPER.
[0.19.0] — 2026-03-24
Highlights
- Pre-1.0 consolidation: VERSIONING documents 0.x patch vs minor expectations; INTERFACE_CONTRACT links there for semver scope while remaining the behavioral source of truth.
- Roadmap to 1.0: ROADMAP Shipped in 0.19.0 replaces the planned checklist; Planned v1.0.0 items that belong on the 1.0.0 tag (full 1.x semver policy, SBOM, comms) remain explicitly deferred there, with rationale below.
- Parity docs: POLARS_TRANSFORMATIONS_ROADMAP, PARITY_SCORECARD, PYSPARK_PARITY, README, and index updated for the current release and 0.19 → 1.0 clarity; no new table methods or PySpark functions rows.
- Performance: PERFORMANCE adds a 0.19.0 validation note (key scripts spot-checked; no headline number refresh vs 0.18.x paths).
- CI / tests: grouped output comparisons in tests/test_v018_features.py sort by group key where row order is not API-guaranteed (stable pytest-xdist on Linux).
Details
See ROADMAP Shipped in 0.19.0. Release hygiene: make check-full, cargo test --all-features, cargo check --no-default-features, full pytest before tag; GitHub Actions install deps aligned with DEVELOPER / pyproject.toml [dev].
Deferred to v1.0.0 tag (not blocking 0.19.0): formal 1.x semver publication, PyPI packaging dry-run narrative, SBOM/supply-chain notes, support matrix as a 1.0.x commitment, and README/index “1.0 leads” copy; see ROADMAP Planned v1.0.0.
[0.18.0] — 2026-03-22
Highlights
- Grouped execution errors: Polars collect() failures during group_by().agg() may include (group_by().agg()) in the ValueError text (via polars_err_ctx) so they are identifiable as grouped aggregation runtime errors. See EXECUTION.
- Maps: non-string map keys (dict[int, T], non-UTF-8 Arrow map keys) remain unsupported and are explicitly deferred for this release (SUPPORTED_TYPES, ROADMAP Later).
- Documentation: Post–P7 note in POLARS_TRANSFORMATIONS_ROADMAP (phases complete; further parity is additive). PARITY_SCORECARD, PYSPARK_PARITY, DEVELOPER, ROADMAP updated. No new PySpark sql.functions wrappers or table API changes.
- Tests: Hypothesis + integration coverage for group_by / join (tests/test_hypothesis_properties.py, tests/test_v018_features.py); Rust polars_err_ctx message format (execute_polars/common.rs, polars_err_format_tests).
Details
See ROADMAP Shipped in 0.18.0. INTERFACE_CONTRACT aggregation rules are unchanged; the doc notes optional group_by().agg() error-message context.
[0.17.0] — 2026-03-18
Highlights
- Maps (string keys): documented and tested Expr behavior for map_get / map_contains_key on columns ingested from PyArrow map<utf8, …> (missing key → null). Non-string Python dict[int, T] map keys remain unsupported (deferred); see ROADMAP Later.
- PySpark façade (see PYSPARK_PARITY.md): new thin wrappers in pydantable.pyspark.sql.functions for str_replace, regexp_replace (alias, literal replace), strip_prefix, strip_suffix, strip_chars, strptime, binary_len, list_len, list_get, list_contains, list_min, list_max, list_sum (core Expr / Rust lowering unchanged). Tests: tests/test_pyspark_sql.py.
- Docs: refreshed PARITY_SCORECARD.md, POLARS_TRANSFORMATIONS_ROADMAP.md, SUPPORTED_TYPES.md (map + Arrow ingest note).
Details
See ROADMAP Shipped in 0.17.0.
[0.16.1] — 2026-03-27
Fixed
- Expression typing: binary arithmetic on dict[str, T] map columns (for example df.m + 1 or df.m + df.m) now raises TypeError in Rust (infer_arith_dtype) instead of panicking on an internal unwrap. Regression test: tests/test_expr_070_surfaces.py.
- Constructors: validate_columns_strict (and therefore DataFrame[Schema](pa.Table) / RecordBatch) imported Arrow conversion helpers from the wrong submodule (pydantable.schema.io, which does not exist). Imports now use pydantable.io, matching DataFrameModel. Regression: tests/test_arrow_interchange.py (test_dataframe_generic_accepts_pa_table).
[0.16.0] — 2026-03-26
Highlights
- Arrow interchange: read_parquet and read_ipc (optional as_stream for streaming IPC) return dict[str, list] for DataFrame / DataFrameModel. to_arrow / ato_arrow materialize a PyArrow Table after the same engine path as to_dict (not zero-copy). Optional extra pydantable[arrow] (pyarrow>=14). Constructors accept pa.Table / RecordBatch when pyarrow is installed.
- FastAPI: FASTAPI.md covers multipart Parquet upload, the Depends executor pattern, background-task notes, and 422 vs application error guidance. python-multipart in [dev] and CI workflows. Tests: tests/test_fastapi_recipes.py (multipart + invalid body 422), tests/test_arrow_interchange.py; scripts/verify_doc_examples.py extended.
- Docs: EXECUTION.md, SUPPORTED_TYPES.md, INTERFACE_CONTRACT.md, ROADMAP.md, README.md, index.md.
Details
See ROADMAP Shipped in 0.16.0. Sync read_parquet / read_ipc are blocking; use asyncio.to_thread or an executor from async def routes for large files if loop latency matters.
- CI / release: actions/cache@v5 in ci.yml and release.yml (clears GitHub Actions Node 20 deprecation warnings for actions/cache@v4). release.yml uses maturin build + twine upload --skip-existing per platform instead of the deprecated maturin publish (see PyO3/maturin#2334); TWINE_USERNAME=__token__ and PYPI_API_TOKEN unchanged.
[0.15.0] — 2026-03-25
Highlights
- Async materialization: acollect, ato_dict, ato_polars on DataFrame; DataFrameModel adds the same plus arows and ato_dicts. Work runs in asyncio.to_thread or an optional executor=. See EXECUTION, FASTAPI.
- FastAPI: async def route examples, lifespan + ThreadPoolExecutor, and StreamingResponse guidance (manual chunking; no built-in row iterator yet). tests/test_fastapi_recipes.py and scripts/verify_doc_examples.py extended.
- Arrow-native maps: PyArrow map<utf8, …> arrays (and chunked arrays) ingest for dict[str, T] columns and are converted to Python dict cells. String keys only; strict checks scalar value types (nested map values: best-effort). Tests: tests/test_pyarrow_map_ingest.py. SUPPORTED_TYPES updated.
- PySpark façade: trim, abs, round, floor, ceil in pydantable.pyspark.sql.functions (and package __all__). PYSPARK_PARITY updated.
- Constructor cleanup: validate_data removed from DataFrame and DataFrameModel. Ingest depth uses trusted_mode only (off / shape_only / strict; omit for full per-element validation). Passing validate_data=... raises TypeError. Removed internal schema helpers _VALIDATE_DATA_KW_UNSET, _warn_validate_data_kw_deprecated, and _coerce_validate_data_kw. Direct callers of validate_columns_strict may still use validate_elements as a legacy bridge. Docs (DATAFRAMEMODEL, FASTAPI, SUPPORTED_TYPES, PERFORMANCE, etc.) describe trusted_mode only on constructors.
- Dev: pytest-asyncio in [dev]; asyncio_mode = auto in pyproject.toml.
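acollect / ato_* in this release run blocking materialization in asyncio.to_thread or a caller-supplied executor. A minimal sketch of that general offloading technique in plain asyncio follows; run_blocking and slow_collect are hypothetical stand-ins, not pydantable code.

```python
import asyncio
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_blocking(fn: Callable[..., T], *args: Any, executor: Optional[Executor] = None) -> T:
    """Run a blocking callable without blocking the event loop."""
    if executor is None:
        # Default path: the event loop's default thread pool.
        return await asyncio.to_thread(fn, *args)
    # Caller-supplied pool, e.g. one created in an application lifespan hook.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, fn, *args)


def slow_collect(n: int) -> dict[str, list]:
    """Stand-in for a blocking engine collect."""
    time.sleep(0.05)
    return {"x": list(range(n))}


async def main() -> tuple[dict[str, list], dict[str, list]]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        a, b = await asyncio.gather(
            run_blocking(slow_collect, 3),
            run_blocking(slow_collect, 2, executor=pool),
        )
    return a, b


first, second = asyncio.run(main())
```

A dedicated executor is useful when the default pool is shared with other blocking work and you want to cap concurrent materializations separately.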
Tests
- tests/test_async_materialization.py, tests/test_pyarrow_map_ingest.py, tests/test_v015_features.py, tests/test_v015_constructor_api.py; extended tests/test_fastapi_recipes.py.
- tests/test_v014_features.py, tests/test_dataframe_model.py, tests/test_dataframe_ops.py: TypeError when validate_data is passed; trusted paths use trusted_mode only.
Details
See ROADMAP Shipped in 0.15.0. Sync collect / to_dict / to_polars are unchanged aside from constructor kwargs (drop validate_data; use trusted_mode). You may replace manual asyncio.to_thread wrappers with acollect / ato_*.
rust_version() in the extension reports env!("CARGO_PKG_VERSION") so it matches pyproject.toml / Cargo.toml.
[0.14.0] — 2026-03-23
Highlights
- Window orderBy null placement: nulls_last on Window.partitionBy(...).orderBy(...) (per-column list or bool); framed windows use all keys; unframed Polars .over uses the first key for SortOptions. Docs: WINDOW_SQL_SEMANTICS, INTERFACE_CONTRACT.
- Trusted shape_only: pydantable.DtypeDriftWarning when data would fail strict; set env PYDANTABLE_SUPPRESS_SHAPE_ONLY_DRIFT_WARNINGS=1 to silence. See SUPPORTED_TYPES.
- validate_data deprecation: explicit validate_data= without trusted_mode emits a DeprecationWarning (removal shipped in 0.15.0). See DATAFRAMEMODEL, FASTAPI.
- PySpark façade: dayofmonth, lower, upper in pydantable.pyspark.sql.functions. PYSPARK_PARITY updated.
- FastAPI DX: TestClient recipes and OpenAPI notes in FASTAPI; tests/test_fastapi_recipes.py; fastapi / httpx in [dev] and CI.
- Hypothesis: extra property test for with_columns identity; DEVELOPER documents running property tests.
- Tests: tests/test_v014_features.py covers DtypeDriftWarning (including multi-column drift), window lag / row_number with null sort order, FastAPI 422 / OpenAPI requestBody, and PySpark dayofmonth / lower / upper. (Constructor validate_data was deprecated here and removed in 0.15.0; see changelog [0.15.0].)
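nulls_last controls where null order keys land. A small single-key sketch of SQL-style NULLS FIRST / NULLS LAST placement, independent of sort direction; sort_with_nulls is hypothetical and does not model the per-column list form or Polars SortOptions.

```python
from typing import Optional, Sequence


def sort_with_nulls(
    values: Sequence[Optional[float]], *, descending: bool = False, nulls_last: bool = True
) -> list[Optional[float]]:
    """Sort non-null values, then place every None first or last.

    Null placement does not flip with `descending`, as with SQL NULLS FIRST / NULLS LAST.
    """
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return non_null + nulls if nulls_last else nulls + non_null


ascending_nulls_last = sort_with_nulls([3, None, 1, 2])
```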
Details
See ROADMAP Shipped in 0.14.0.
[0.13.0] — 2026-03-23
Highlights
- Stabilization + combined scope: FASTAPI covers trusted_mode / validate_data, column-shaped dict[str, list] bodies, sync materialization and pointers forward to async work (shipped in 0.15.0); trust-boundary guidance for large / pre-validated tables and Polars / Arrow; install notes for PyPI wheels vs git builds.
- Sync-only I/O (at time of 0.13.0): EXECUTION and PERFORMANCE described blocking materialization; async APIs arrived in 0.15.0 (EXECUTION, FASTAPI). Tuning text prefers trusted_mode alongside validate_data.
- Window semantics (docs): null ordering and CURRENT ROW / peer framing in WINDOW_SQL_SEMANTICS and INTERFACE_CONTRACT; Window docstring in window_spec.py. (User-facing NULLS FIRST/LAST shipped in 0.14.0.)
- Trusted strict + PyArrow: isinstance(..., pa.Array | pa.ChunkedArray) in trusted buffers (concrete array types such as Int64Array); stricter scalars for int, float, decimal, enum, uuid, and temporal Arrow types. Tests in tests/test_trusted_strict_pyarrow.py; pyarrow>=14 in [dev] and CI. (shape_only drift warnings shipped in 0.14.0.)
- Performance: benchmarks/framed_window_bench.py, benchmarks/trusted_polars_ingest_bench.py; PERFORMANCE table and cross-links.
- Discoverability: index, README roadmap table, and cross-links across INTERFACE_CONTRACT, WINDOW_SQL_SEMANTICS, POLARS_TRANSFORMATIONS_ROADMAP.
- CI: RUSTSEC-2025-0141 audit-step comment; GitHub Actions versions reviewed (actions/cache@v4, etc.).
- Examples: scripts/verify_doc_examples.py covers new FastAPI patterns (trusted ingest + columnar body).
Details
Release audit: make check-full and full pytest green with a release maturin build. No regressions to rangeBetween, trusted strict, or map_from_entries beyond documentation and PyArrow strict hardening above.
Roadmap (editorial): 0.13.0 ships the documentation-first stabilization track together with the scope formerly planned as Remaining in 0.13.x / 0.14.0. Async materialization shipped in 0.15.0 (see changelog [0.15.0]).
See FASTAPI, EXECUTION, PERFORMANCE, ROADMAP, and INTERFACE_CONTRACT.
[0.12.0] — 2026-03-22
Highlights
- Multi-key rangeBetween: aggregate window frames may use multiple orderBy columns; sort is lexicographic and range bounds apply to the first sort key (PostgreSQL-style). Documented in WINDOW_SQL_SEMANTICS.
- Trusted strict ingest: Polars columns are matched structurally to nested annotations (list / dict[str, T] / nested Schema structs); columnar Python paths get the same nested shape checks.
- Contracts and parity docs: refresh INTERFACE_CONTRACT, PySpark UI/scorecard, roadmap; add duplicate-key policy for map_from_entries (SUPPORTED_TYPES).
- Regression tests: broader coverage for multi-key rangeBetween (desc/mixed orderBy, partitions, date/datetime axis, window_mean/window_min), PySpark window mirrors, trusted strict nested paths (Python + Polars), map_from_entries duplicate keys, and DataFrame / DataFrameModel strict parity.
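The multi-key rangeBetween rule (lexicographic sort, range bounds measured on the first key only) is easier to see with a reference implementation. This plain-Python sketch assumes a single partition, ascending keys, no nulls, and inclusive bounds as described under 0.11.0; range_window_sum is hypothetical and quadratic, meant for reasoning about expected results rather than for production use.

```python
from typing import Any, Sequence


def range_window_sum(
    rows: Sequence[dict[str, Any]], order_by: Sequence[str], value: str, lower: int, upper: int
) -> list[Any]:
    """Sum `value` over a RANGE frame measured on the first order key.

    Rows are sorted lexicographically by every key in `order_by` (ascending), but each
    row's frame holds all rows whose first key lies in [current + lower, current + upper],
    bounds inclusive. Output follows the sorted order.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in order_by))
    first = order_by[0]
    sums = []
    for current in ordered:
        lo, hi = current[first] + lower, current[first] + upper
        # Quadratic scan: fine for checking expected results on small inputs.
        sums.append(sum(r[value] for r in ordered if lo <= r[first] <= hi))
    return sums


rows = [
    {"day": 2, "seq": 1, "amt": 10},
    {"day": 1, "seq": 2, "amt": 1},
    {"day": 1, "seq": 1, "amt": 2},
    {"day": 3, "seq": 1, "amt": 100},
]
# Analogous to a rangeBetween(-1, 0) frame: one day back through the current day.
result = range_window_sum(rows, ["day", "seq"], "amt", lower=-1, upper=0)
```

Both day-1 rows get the same sum even though their second key differs, because only the first key defines the frame.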
Details
See INTERFACE_CONTRACT, PYSPARK_PARITY, PARITY_SCORECARD, WINDOW_SQL_SEMANTICS, and ROADMAP.
[0.11.0] — 2026-03-23
Highlights
- Window range semantics v2: rangeBetween supports numeric, date, datetime, and duration order keys (single orderBy key), with deterministic boundary-inclusive behavior.
- Map ergonomics expanded: add map_from_entries() and PySpark-compatible element_at() alias; map entry roundtrip coverage expanded.
- Trusted ingest modes: add explicit trusted modes (shape_only, strict) alongside compatibility with validate_data, including stricter nullability and dtype checks for trusted columnar paths.
- Parity coverage expansion: add dedicated DataFrame/DataFrameModel parity tests and additional PySpark map parity contracts.
- Release hardening: update docs/contracts and version metadata for the 0.11.0 line.
Details
See INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES, and ROADMAP.
[0.10.0] — 2026-03-23
Highlights
- Framed windows expanded: framed execution now covers window_mean, window_min, window_max, lag, lead, rank, and dense_rank in addition to row_number / window_sum.
- Map utilities: add map_keys() and map_values() to complement map_len, map_get, and map_contains_key.
- Parity and interop hardening: PySpark parity tests for framed windows/map utilities plus trusted constructor coverage for Polars DataFrame input (validate_data=False).
- Window range contracts tightened: rangeBetween now enforces exactly one orderBy key for supported aggregate frames, with explicit typed errors.
- Map v2 parity expanded: add map_entries() and PySpark wrappers for map_len, map_get, and map_contains_key.
- Trusted ingest hardening: Polars trusted constructor path rejects nulls in non-nullable schema fields when validate_data=False.
Details
See INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES, and ROADMAP.
[0.9.0] — 2026-03-23
Highlights
- Bad-input ingest controls: ignore_errors=True with on_validation_errors=... (row_index, row, errors) across DataFrameModel and DataFrame constructor paths.
- Framed windows: rowsBetween / rangeBetween frame metadata is wired through Python/PySpark/Rust; framed execution is supported for row_number / window_sum (rangeBetween on integer order keys, range offset computed from the first orderBy key).
- Map v2 values: dict[str, T] map columns now support nested JSON-like value dtypes (lists/maps/structs and nullable unions), with map_len, map_get, and map_contains_key behavior preserved.
- Release hardening: expanded 0.9.0 edge-case tests and full quality-gate coverage (make check-full, docs example validation, Sphinx warnings-as-errors build).
Details
See DATAFRAMEMODEL, INTERFACE_CONTRACT, SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.
[0.8.0] — 2026-03-23
Highlights
- Global row count: global_row_count() and PySpark functions.count() with no column (count(*)-style) for DataFrame.select.
- Casts: str → date/datetime via Expr.cast(...) (Polars parsing); use strptime for fixed formats.
- Maps: Expr.map_get(key) / map_contains_key(key) on dict[str, T] columns (list-of-struct encoding).
- Windows: window_min / window_max; IR carries optional WindowFrame::Rows for future Spark-style rowsBetween (lowering not yet implemented).
- Docs: INTERFACE_CONTRACT, PYSPARK_PARITY, SUPPORTED_TYPES updates.
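global_count (0.7.0) counts non-null values, while global_row_count() and F.count() with no column behave like count(*). The distinction in plain Python, using hypothetical helper names that are not pydantable APIs:

```python
from typing import Any, Sequence


def non_null_count(column: Sequence[Any]) -> int:
    """count(col): number of values that are not None."""
    return sum(1 for v in column if v is not None)


def row_count(columns: dict[str, Sequence[Any]]) -> int:
    """count(*): number of rows in the frame, nulls included."""
    return len(next(iter(columns.values()), []))


cols = {"a": [1, None, 3], "b": [None, None, None]}
```

An all-null column therefore reports a non-null count of 0 while the frame still has 3 rows.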
Details
See SUPPORTED_TYPES, PYSPARK_PARITY, ROADMAP, and INTERFACE_CONTRACT.
Testing
- Broader integration tests for 0.7.0 / 0.8.0 surfaces (test_v070_features, test_v080_features), including PySpark F.count() with no column.
Documentation
- README feature bullets; INTERFACE_CONTRACT (global select); POLARS_WORKFLOWS (single-row globals example); index, EXECUTION, PYSPARK_UI, PYSPARK_PARITY, PYSPARK_INTERFACE.
[0.7.0] — 2026-03-23
Highlights
- Global aggregates: global_count, global_min, global_max for DataFrame.select (non-null count; Polars min/max); PySpark functions.count / min / max on typed columns.
- Windows: lag / lead (Polars shift + .over(...)); still no SQL-style rowsBetween / rangeBetween in the IR (see INTERFACE_CONTRACT.md).
- Temporal: Expr.strptime / Expr.unix_timestamp, PySpark to_date(..., format=...) and unix_timestamp; dt_nanosecond for datetime and time.
- Maps / binary: Expr.map_len(), Expr.binary_len() (byte length of bytes columns).
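lag / lead are described as Polars shift + .over(...). Within one already-ordered partition, the semantics reduce to reading a neighboring row, with null at the edges. In this sketch the lag_values / lead_values helpers and their default parameter are hypothetical; the changelog does not say whether pydantable exposes a fill value.

```python
from typing import Any, Sequence


def lag_values(values: Sequence[Any], n: int = 1, default: Any = None) -> list[Any]:
    """Value from n rows earlier in the ordered partition, else `default`."""
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead_values(values: Sequence[Any], n: int = 1, default: Any = None) -> list[Any]:
    """Value from n rows later in the ordered partition, else `default`."""
    return [values[i + n] if i + n < len(values) else default for i in range(len(values))]


previous = lag_values([1, 2, 3])
following = lead_values([1, 2, 3])
```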
Details
See SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.
[0.6.0] — 2026-03-22
Highlights
- Scalar types: datetime.time, bytes, and homogeneous dict[str, T] map columns (Polars-backed I/O; map execution surface is intentionally small).
- Windows: row_number, rank, dense_rank, window_sum, window_mean with Window.partitionBy(...).orderBy(...) / .spec(); Rust ExprNode::Window + Polars .over(...).
- Global aggregates (Phase D): DataFrame.select(...) with global_sum / global_mean or PySpark functions.sum / avg / mean on typed columns; single-row results.
- PySpark façade: dropDuplicates(subset=...), date helpers (year, month, …, to_date), documentation parity fixes (union, types).
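row_number, rank, and dense_rank differ only in how ties are numbered. A plain-Python sketch of the standard SQL semantics over keys that are already sorted within one partition; ranking is a hypothetical helper, not the Polars lowering.

```python
from typing import Any, Sequence


def ranking(sorted_keys: Sequence[Any]) -> dict[str, list[int]]:
    """SQL-style row_number / rank / dense_rank over keys sorted within one partition."""
    row_number: list[int] = []
    rank: list[int] = []
    dense_rank: list[int] = []
    for i, key in enumerate(sorted_keys):
        row_number.append(i + 1)
        if i > 0 and key == sorted_keys[i - 1]:
            # Ties share the previous row's rank and dense_rank.
            rank.append(rank[-1])
            dense_rank.append(dense_rank[-1])
        else:
            # rank jumps to the 1-based position (gaps after ties); dense_rank just increments.
            rank.append(i + 1)
            dense_rank.append(dense_rank[-1] + 1 if dense_rank else 1)
    return {"row_number": row_number, "rank": rank, "dense_rank": dense_rank}


ranks = ranking([10, 20, 20, 30])
```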
Development and testing
- [dev] optional dependencies include numpy (for collect(as_numpy=...) tests), pytest-cov, coverage, pytest-xdist, and polars.
- CI runs parallel pytest on Linux, optional coverage XML on Ubuntu + Python 3.11, installs Polars on that leg so to_polars() tests run, runs scripts/verify_doc_examples.py, and uses GITHUB_ACTIONS-scaled performance guardrails.
Details
See SUPPORTED_TYPES, PYSPARK_PARITY, and ROADMAP.
[0.5.0] — 2026
Highlights
- PydanTable naming and docs alignment with the ROADMAP (0.5.x line).
- Typed DataFrameModel and DataFrame[Schema] with a Rust execution core (Polars-backed in the native extension).
- Materialization: collect() returns Pydantic row models; to_dict() / collect(as_lists=True) for columnar data; optional to_polars() with the [polars] extra.
- Rich column types: nested Pydantic models (structs), homogeneous list[T], uuid.UUID, decimal.Decimal, enum.Enum, plus explode, unnest, and extended Expr helpers (see SUPPORTED_TYPES).
Details
For phase history and future direction, see ROADMAP and POLARS_TRANSFORMATIONS_ROADMAP.