Optional formats and bridges (pydantable.io.extras)¶
Primary: pass the returned dict[str, list] to DataFrame[Schema](cols) or MyModel(cols) (see DATAFRAMEMODEL). Secondary: pydantable.io.extras helpers (also re-exported on pydantable.io where applicable). Many of these helpers are experimental; pass experimental=True or set PYDANTABLE_IO_EXPERIMENTAL=1.
There are no DataFrameModel.read_excel (or similar) classmethods — extras always return a column dict first.
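As a minimal sketch of the environment-variable route, the snippet below sets PYDANTABLE_IO_EXPERIMENTAL=1 for the duration of a block and then restores the previous value. The helper name experimental_io is hypothetical (it is not part of pydantable), and the sketch assumes the variable is read when a helper is called rather than at import time; check the module docstrings before relying on that.

```python
import os
from contextlib import contextmanager

ENV_FLAG = "PYDANTABLE_IO_EXPERIMENTAL"  # name taken from the text above


@contextmanager
def experimental_io():
    """Hypothetical helper: set the flag to "1" inside the block, then restore it."""
    previous = os.environ.get(ENV_FLAG)
    os.environ[ENV_FLAG] = "1"
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is None:
            os.environ.pop(ENV_FLAG, None)
        else:
            os.environ[ENV_FLAG] = previous


before = os.environ.get(ENV_FLAG)
with experimental_io():
    inside = os.environ[ENV_FLAG]
    # Calls to pydantable.io.extras helpers would go here (assumption:
    # the flag is consulted at call time).
after = os.environ.get(ENV_FLAG)
```

Passing experimental=True on the individual call is the more explicit alternative when only one helper needs it.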
DataFrame / DataFrameModel¶
After read_excel, read_delta, read_bigquery, …:
MyModel(cols) or DataFrame[Schema](cols)
Stdin: read_csv_stdin → MyModel(cols) the same way.
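The column-dict handoff described above can be illustrated without pydantable installed. The sketch below uses only the standard library: csv_stream_to_columns is a hypothetical stand-in for a reader such as read_csv_stdin (it is not pydantable's implementation), and io.StringIO stands in for sys.stdin.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def csv_stream_to_columns(stream: TextIO) -> dict[str, list[str]]:
    """Hypothetical stand-in: parse CSV text into a dict[str, list] column dict."""
    reader = csv.DictReader(stream)
    cols: dict[str, list[str]] = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in cols:
            cols[name].append(row[name])
    return cols


# io.StringIO stands in for sys.stdin so the sketch runs anywhere.
cols = csv_stream_to_columns(io.StringIO("order_id,carton_id\n44021,90001\n"))

# With pydantable installed, the next step would be MyModel(cols) or
# DataFrame[Schema](cols); validation and type coercion happen there.
```

Values arrive as strings in this sketch; in the documented flow, typing is the job of the model that wraps the dict.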
pydantable.io.extras¶
Spreadsheets¶
- read_excel(path, *, sheet_name=0, experimental=True): pydantable[excel] (openpyxl).
Lake / columnar files¶
- read_delta(path, *, experimental=True): Delta via PyArrow dataset (pydantable[arrow]).
- read_avro(path, *, experimental=True): PyArrow Avro (pydantable[arrow]).
- read_orc(path, *, experimental=True): PyArrow ORC (pydantable[arrow]).
Cloud warehouses (SDK bridges)¶
- read_bigquery(query, *, project=None, experimental=True, **kwargs): pydantable[bq]; kwargs → bigquery.Client.
- read_snowflake(sql, *, experimental=True, **connect_kwargs): pydantable[snowflake].
Streaming / messaging¶
- read_kafka_json_batch(topic, *, bootstrap_servers, max_messages=100, experimental=True, **consumer_config): pydantable[kafka].
- Batch iterators (1.5.0+): when a backend supports chunked reads, iter_excel, iter_delta, iter_avro, iter_orc, iter_bigquery, iter_snowflake, and iter_kafka_json yield the same dict[str, list] shape as iter_csv / iter_parquet (see IO_OVERVIEW). Some sources still buffer internally (e.g. full JSON-array loads); check docstrings and optional extras.
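To make the batch shape concrete, here is a stdlib-only sketch of a chunked reader that yields dict[str, list] batches, plus a helper that merges them back into one column dict. Both iter_column_batches and concat_batches are hypothetical names used for illustration; they are not pydantable functions, and the real iterators may chunk differently.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def iter_column_batches(
    rows: Iterable[dict[str, object]], batch_size: int
) -> Iterator[dict[str, list]]:
    """Hypothetical: group row dicts into column-dict batches of batch_size rows."""
    batch: dict[str, list] = {}
    count = 0
    for row in rows:
        for name, value in row.items():
            batch.setdefault(name, []).append(value)
        count += 1
        if count == batch_size:
            yield batch
            batch, count = {}, 0
    if count:
        # Final partial batch.
        yield batch


def concat_batches(batches: Iterable[dict[str, list]]) -> dict[str, list]:
    """Hypothetical: merge batches into a single column dict."""
    merged: dict[str, list] = {}
    for batch in batches:
        for name, values in batch.items():
            merged.setdefault(name, []).extend(values)
    return merged


rows = [{"order_id": i, "carton_id": 90000 + i} for i in range(5)]
batches = list(iter_column_batches(rows, batch_size=2))
merged = concat_batches(batches)
```

Each batch can be wrapped with MyModel(batch) on its own, which keeps memory bounded when the source really does stream; merging first only makes sense for small inputs.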
Stdin / stdout¶
- read_csv_stdin(stream=None, *, engine="auto")
- write_csv_stdout(data, stream=None, *, engine="auto"): uses export_csv internally; for DataFrame, prefer to_dict() + export_csv or write_csv from a lazy frame.
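The write direction can be sketched the same way. The function below is a hypothetical stdlib stand-in for what a helper like write_csv_stdout does with a column dict (it does not call export_csv or any pydantable code); io.StringIO stands in for sys.stdout so the result can be inspected.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def write_columns_as_csv(cols: dict[str, list], stream: TextIO) -> None:
    """Hypothetical stand-in: write a dict[str, list] column dict as CSV text."""
    names = list(cols)
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(names)
    # zip(*columns) turns column lists back into rows.
    writer.writerows(zip(*(cols[name] for name in names)))


buf = io.StringIO()  # stands in for sys.stdout
write_columns_as_csv({"order_id": ["44021"], "carton_id": ["90001"]}, buf)
csv_text = buf.getvalue()
```

Accepting the stream as a parameter, as the stream=None signatures above suggest, keeps such helpers testable without touching the real stdin or stdout.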
Async CSV (RAP)¶
pydantable.io.rap_support.aread_csv_rap — see IO_CSV.
Runnable examples¶
Stdin / stdout (no optional extras)¶
```python
"""CSV round-trip with lazy :meth:`~pydantable.dataframe_model.DataFrameModel.read_csv`
+ :meth:`~pydantable.dataframe.DataFrame.to_dict` and ``write_csv``.

For true stdin/stdout streaming helpers, see :doc:`IO_EXTRAS` (optional extras).

Run::

    python docs/examples/io/extras_stdin_stdout.py
"""

from __future__ import annotations

import os
import tempfile
from pathlib import Path

from pydantable import DataFrameModel


class Shipment(DataFrameModel):
    """Inbound CSV from a carrier (numeric codes in text-heavy exports)."""

    order_id: int
    carton_id: int


class ShipmentStr(DataFrameModel):
    """Same layout written back out as strings (labels / IDs)."""

    order_id: str
    carton_id: str


def main() -> None:
    # Write a one-row CSV to a temporary file, then read it back lazily.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", delete=False, encoding="utf-8"
    ) as f:
        f.write("order_id,carton_id\n44021,90001\n")
        path = f.name
    try:
        d = Shipment.read_csv(path).to_dict()
        assert [int(x) for x in d["order_id"]] == [44021]
        assert [int(x) for x in d["carton_id"]] == [90001]
    finally:
        os.unlink(path)

    # Write the same layout back out as strings and check the file body.
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as out:
        out_path = out.name
    try:
        ShipmentStr({"order_id": ["44021"], "carton_id": ["90001"]}).write_csv(out_path)
        body = Path(out_path).read_text(encoding="utf-8")
        assert "order_id" in body and "44021" in body
    finally:
        os.unlink(out_path)

    print("extras_stdin_stdout: ok")


if __name__ == "__main__":
    main()
```
Output¶
extras_stdin_stdout: ok
Optional: Excel (pydantable[excel])¶
Install pydantable[excel] (openpyxl). read_excel / iter_excel live in pydantable.io.extras; they return dict[str, list] batches (not a Polars lazy scan). Wrap with DataFrameModel(...) for typed rows. See the module docstrings for experimental=True and batch_size.
Other helpers follow the same pattern: install the matching extra, then call the function and wrap with DataFrame / DataFrameModel.
See also¶
IO_OVERVIEW · IO_CSV · DATA_IO_SOURCES (tiering) · pyproject.toml optional dependency groups