SQLModel-first SQL I/O roadmap¶
This document records the plan and phased roadmap for pydantable’s SQLModel-first SQL I/O. v1.13.0 is the release where the new APIs are the recommended default and Phases 0–6 are implemented (see status lines under each phase below).
Goals (what “SQLModel-first” means)¶
- SQLModel is required for SQL I/O: if you want to read/write from a database using
from pydantable import …(SQL helpers) orDataFrameModel, you installpydantable[sql], which includes SQLModel + SQLAlchemy (drivers remain user-installed). - User defines table schema explicitly via a
SQLModelclass withtable=True. pydantable does not infer SQL DDL from column dict samples. - Reads return column dict batches (
dict[str, list]) and can be wrapped into a typedDataFrameModel(MyDF(cols, ...)) just like today. - Writes are schema-driven:
if_exists="replace"is supported, but it requires a fully defined SQLModel table class for the new schema.
Non-goals (v1.13.0)¶
- A SQL execution engine for pydantable logical plans (lowering transforms to SQL) — that’s “future engine” scope.
- Full ORM relationship loading / joined eager loads — SQLModel support is for table schemas + queries, not ORM graphs.
- Automatic generation of SQLModel from
DataFrameModel(or vice versa) as the primary UX. We may add helpers later, but v1.13.0 prioritizes explicit user-owned models.
Design principles¶
- Schema authority: SQL table schema comes from SQLModel, dataframe schema comes from
DataFrameModelannotations + Pydantic. - Two-phase typing: operations that can’t be statically typed remain
DataFrameModel[Any], and users can “re-type” withas_model(...). - Safety by default: dangerous actions (dropping tables) require explicit user intent and trusted identifiers.
- Streaming-friendly: preserve the current
iter_*/fetch_*split and batch semantics.
Phase 0 — Dependency + policy decision (v1.13.0)¶
Deliverables
pydantable[sql]includessqlmodel(and stillsqlalchemy).- Update docs to state: SQL I/O requires SQLModel (DBAPI drivers still required).
- Define import / availability checks and error messages (e.g. a single
MissingOptionalDependencyError-style message forsqlmodel).
Acceptance criteria
- Installing
pydantable[sql]yieldsimport sqlmodelsuccess. - Importing
pydantable.iodoes not require SQLModel unless SQL functions are used (keep optional deps optional).
Status (v1.13.0 dev): Implemented — sqlmodel is listed under pydantable[sql] (and dev/docs extras where applicable); lazy import via pydantable.io.sqlmodel_read._require_sqlmodel() raises pydantable.errors.MissingOptionalDependency with an install hint.
Phase 1 — New SQLModel-first read APIs (v1.13.0)¶
Problem: current read path is raw sql: str + text(sql). It’s flexible, but it makes the schema contract implicit.
Deliverables
- Add
pydantable.io.fetch_sqlmodel(...)andpydantable.io.iter_sqlmodel(...). - Add async variants mirroring the existing pattern:
afetch_sqlmodel(..., executor=None)aiter_sqlmodel(..., executor=None)- Ensure parity with existing knobs where it matters:
batch_sizeauto_stream+ threshold behavior (return aStreamingColumns-like object or keep the same type)
Proposed API shape
from sqlmodel import SQLModel
from sqlalchemy.engine import Connection, Engine # bind types (SQLModel uses SQLAlchemy engines under the hood)
def fetch_sqlmodel(
model: type[SQLModel],
bind: str | Engine | Connection,
*,
where: object | None = None,
parameters: dict[str, object] | None = None,
columns: list[object] | None = None,
order_by: list[object] | None = None,
limit: int | None = None,
batch_size: int | None = None,
auto_stream: bool = True,
auto_stream_threshold_rows: int | None = None,
) -> dict[str, list[object]] | object: ...
def iter_sqlmodel(
model: type[SQLModel],
bind: str | Engine | Connection,
*,
where: object | None = None,
parameters: dict[str, object] | None = None,
columns: list[object] | None = None,
order_by: list[object] | None = None,
limit: int | None = None,
batch_size: int | None = None,
) -> "Iterator[dict[str, list[object]]]": ...
Notes
where/order_by/columnscan be SQLAlchemy expressions; keep typing permissive in v1.13.0 and tighten later.parametersexists for cases where thewhereclause includes bound params.
Acceptance criteria
- A round-trip example can do:
cols = fetch_sqlmodel(UserTable, engine)df = UsersDF(cols, trusted_mode=...)iter_sqlmodelyields rectangular batches with stable column keys.
Status (v1.13.0 dev): Implemented — synchronous fetch_sqlmodel / iter_sqlmodel in python/pydantable/io/sqlmodel_read.py, async afetch_sqlmodel / aiter_sqlmodel in python/pydantable/io/__init__.py, re-exported from pydantable. Batching and StreamingColumns / auto_stream_threshold_rows match fetch_sql. Tests: tests/test_sqlmodel_io_phase01.py.
Phase 2 — New SQLModel-first write APIs (v1.13.0)¶
Problem: current if_exists="replace" uses inferred SQL column types (bool|int|float|text), which is not robust.
Deliverables
- Add
pydantable.io.write_sqlmodel(data, model, bind, ..., if_exists="append|replace"). - Add batch helpers mirroring existing
write_sql_batches: write_sqlmodel_batches(batches, model, bind, ...)- async variants using
asyncio.to_thread. - Implement
if_exists="replace"as: - drop existing table (if present)
- create new table from
model.__table__ - insert rows
Proposed API shape
def write_sqlmodel(
data: dict[str, list[object]],
model: type[SQLModel],
bind: str | Engine | Connection,
*,
if_exists: str = "append",
chunk_size: int | None = None,
validate_rows: bool = False,
replace_ok: bool = False,
) -> None: ...
Safety
- Require
replace_ok=True(or similar) whenif_exists="replace"to make destructive intent explicit.
Acceptance criteria
replaceuses the SQLModel schema (no inference).- Writes are chunked, transactional, and error messages include useful context (table name, offending column, row index if validation is enabled).
Status (v1.13.0 dev): Implemented — write_sqlmodel in python/pydantable/io/sqlmodel_write.py (chunked inserts, schema= must match model.__table__.schema, strict column-key alignment with the table, replace_ok required for if_exists="replace", optional validate_rows with row index on ValidationError). write_sqlmodel_batches, awrite_sqlmodel, and awrite_sqlmodel_batches in python/pydantable/io/__init__.py; write_sqlmodel registered as a built-in writer. Package re-exports match reads. Tests: tests/test_sqlmodel_io_phase02.py.
Phase 3 — DataFrameModel conveniences (v1.13.0)¶
Deliverables
- Add
MyDF.write_sqlmodel(UserTable, bind=..., if_exists=..., ...)instance method. - Add
MyDF.iter_sqlmodel(UserTable, bind=..., ...)classmethod that yieldsIterator[MyDF]batches (mirroringiter_*file readers). - Optionally add
MyDF.fetch_sqlmodel(...) -> MyDFas a convenience wrapper over constructor +fetch_sqlmodel.
Acceptance criteria
- Services can be written in a “model-first” style on both ends:
- SQLModel defines the DB table
DataFrameModeldefines the dataframe schema and Pydantic behavior
Status (v1.13.0 dev): Implemented — DataFrameModel exposes SQLModel I/O alongside the pydantable.io functions: instance write_sqlmodel / awrite_sqlmodel (they call to_dict() then the dict writers); classmethods fetch_sqlmodel, afetch_sqlmodel (AwaitableDataFrameModel), iter_sqlmodel, aiter_sqlmodel, and dict-path write_sqlmodel_data / awrite_sqlmodel_data (the Async namespace write_sqlmodel delegates to awrite_sqlmodel_data). Implementation: python/pydantable/dataframe_model.py. Tests: tests/test_sqlmodel_dataframe_model.py.
Phase 4 — Deprecate/reshape the legacy SQLAlchemy-string APIs (v1.13.0)¶
This project already has users of:
fetch_sql(sql: str, ...)iter_sql(sql: str, ...)write_sql(data, table_name, ...)
Deliverables (documentation / policy)
- Update docs to recommend SQLModel-first and describe raw SQL as the advanced / migration path — see IO_SQL, DATAFRAMEMODEL, and this page.
- Keep legacy string-SQL APIs (
fetch_sql,iter_sql,write_sql, and async mirrors) for compatibility.pydantable[sql]remains the single extra for SQL I/O (it bundles SQLModel); there is no separate extra split for “raw SQL only.”
Deliverables (runtime migration path, v1.13.0)
- Deprecation warnings on legacy string-SQL APIs, with replacements
fetch_sql_raw,iter_sql_raw,write_sql_raw,afetch_sql_raw,aiter_sql_raw,awrite_sql_raw(and deprecation ofwrite_sql_batches/awrite_sql_batches). - Explicit
*_rawnaming for string-SQL I/O (see IO_SQL).
Status (v1.13.0): Implemented — docs and DataFrameModel pointers in place; warnings in python/pydantable/io/sql.py and python/pydantable/io/__init__.py; tests tests/test_sql_string_deprecation.py; suite-wide DeprecationWarning filter in pyproject.toml for existing tests; plugin registry marks legacy readers/writers stable=False and *_raw as stable=True.
Removal policy
- Legacy names remain until a major release (no earlier than
2.0.0); see VERSIONING.
Acceptance criteria
- Users can migrate without losing streaming/batching behavior.
- Docs clearly state the “two-tier” story: SQLModel-first APIs are the default; explicit
*_rawstring-SQL is advanced; deprecated unprefixed names warn.
Phase 5 — Typed schema bridging helpers (v1.13.0)¶
Goal: reduce friction between SQLModel and DataFrameModel while preserving explicit ownership.
Deliverables
pydantable.io.sqlmodel_columns(model)→ ordered list of SQLAlchemy column keys formodel.__table__(same keys as full-tablefetch_sqlmodel/write_sqlmodel).MyDF.assert_sqlmodel_compatible(UserTable, *, direction="read"|"write", column_map=None, read_keys=None)— dev-time check;direction="write"enforces exact key parity with the table (aftercolumn_map);direction="read"requires every mapped dataframe field to appear in the expected result keys (default: full table, or passread_keysforfetch_sqlmodel(..., columns=...)).- Optional
column_map: dict[str, str]mapping dataframe field name → SQL column key when names differ.
Status (v1.13.0): Implemented — python/pydantable/io/sqlmodel_schema.py, DataFrameModel.assert_sqlmodel_compatible in python/pydantable/dataframe_model.py; tests tests/test_sqlmodel_bridge_phase05.py; docs IO_SQL, DATAFRAMEMODEL.
Phase 6 — Documentation + examples + testing gates (v1.13.0)¶
Docs
- Update IO_SQL to reflect SQLModel-first APIs and the
replace_oksafety knob. - Add a runnable example:
- SQLite round-trip using SQLModel table + pydantable
DataFrameModel - streaming example mirroring the existing
iter_sqlexample
Tests
- Unit tests for:
- batch shapes for
iter_sqlmodel replaceDDL correctness (table columns and basic types)appendinto existing tablereplace_okguardrail- Integration tests using SQLite (like current SQL tests) that run under CI without external services.
Release gate
make check-fullgreen on the release commit- docs examples validated (if you keep example-verification tooling)
Status (v1.13.0): Implemented — IO_SQL documents raw string SQL vs SQLModel-first APIs; the runnable doc example is docs/examples/io/sql_sqlite_sqlmodel_roundtrip.py (tests/test_doc_io_examples.py). SQLModel I/O behavior is covered by tests/test_sqlmodel_io_phase01.py, tests/test_sqlmodel_io_phase02.py, tests/test_sqlmodel_dataframe_model.py.
Proposed milestone mapping¶
- M1 (core): Phase 0–2 (extras + SQLModel read/write APIs)
- M2 (DX): Phase 3 (DataFrameModel convenience wrappers)
- M3 (ship): Phase 6 (docs + examples + tests) — v1.13.0
- Also in v1.13.0: Phase 4 (legacy string-SQL deprecation +
*_rawAPIs); Phase 5 (schema bridging)