CSV I/O¶
Primary: DataFrame[Schema].read_csv, write_csv, and DataFrameModel classmethods / instance methods. Secondary: pydantable.io — ScanFileRoot, materialize_csv, export_csv, fetch_csv_url.
Read (sources)¶
DataFrame[Schema] and DataFrameModel¶
- DataFrame[Schema].read_csv(path, *, columns=None, **scan_kwargs)
- MyModel.read_csv(...), await MyModel.aread_csv(..., executor=None)
- materialize_csv / await amaterialize_csv from pydantable.io, then MyModel(cols) for eager typed frames
pydantable.io¶
- read_csv, aread_csv: lazy ScanFileRoot
- materialize_csv, amaterialize_csv: eager dict[str, list] (engine, use_rap on sync path)
- fetch_csv_url: HTTP(S) → temp file → read; temp removed after read
- iter_csv, aiter_csv, write_csv_batches: stdlib csv batching over paths or text streams; cell values are strings (or None for short rows). See IO_OVERVIEW (Batched column dict I/O).
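To make the batched shape concrete, here is a minimal sketch that uses only the stdlib csv module. It is not pydantable's implementation: the helper name iter_column_batches and its batch_size parameter are hypothetical. It only illustrates column dicts whose cells are strings, with None filling in for short rows.

```python
import csv
import io
from typing import Iterator, Optional, TextIO


def iter_column_batches(
    stream: TextIO, batch_size: int = 2
) -> Iterator[dict[str, list[Optional[str]]]]:
    """Yield column dicts of at most batch_size rows (hypothetical helper)."""
    # DictReader fills missing trailing cells with restval, which defaults to None.
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []
    batch: dict[str, list[Optional[str]]] = {name: [] for name in header}
    count = 0
    for row in reader:
        for name in header:
            batch[name].append(row[name])
        count += 1
        if count == batch_size:
            yield batch
            batch = {name: [] for name in header}
            count = 0
    if count:
        # Emit the final, possibly smaller, batch.
        yield batch


# The second data row is short: it has no qty_on_hand cell.
text = "sku,qty_on_hand\n1001,42\n1002\n1003,7\n"
batches = list(iter_column_batches(io.StringIO(text)))
print(batches)
```

Every cell stays a string here; any conversion to typed values would happen in a later step.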
scan_kwargs: for example separator, has_header, skip_rows, skip_lines, n_rows, infer_schema_length, ignore_errors, low_memory, rechunk, glob, cache, quote_char, eol_char, include_file_paths, row_index_name, row_index_offset, raise_if_empty, truncate_ragged_lines, decimal_comma, try_parse_dates. Unknown keys raise ValueError. See DATA_IO_SOURCES.
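The "unknown keys raise ValueError" rule can be pictured as a simple allow-list check. The sketch below is an assumption about the general pattern, not pydantable's code: check_scan_kwargs and ALLOWED_SCAN_KWARGS are hypothetical names, and the allow-list is only a small subset of the keys listed above.

```python
# Hypothetical allow-list: a small subset of the keys named above.
ALLOWED_SCAN_KWARGS = frozenset(
    {"separator", "has_header", "skip_rows", "n_rows", "glob"}
)


def check_scan_kwargs(**scan_kwargs: object) -> dict[str, object]:
    """Return scan_kwargs unchanged, or raise ValueError on unknown keys."""
    unknown = sorted(set(scan_kwargs) - ALLOWED_SCAN_KWARGS)
    if unknown:
        raise ValueError(f"unknown scan_kwargs: {unknown}")
    return scan_kwargs


accepted = check_scan_kwargs(separator=";", has_header=True)

try:
    check_scan_kwargs(seperator=";")  # misspelled on purpose
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Failing fast on a misspelled key is usually preferable to silently ignoring it, because a dropped separator option would otherwise surface later as a confusing parse result.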
Paths, directories, and glob¶
glob defaults to true in Polars LazyCsvReader; pass glob=False via scan_kwargs to scan a single path literally. A directory path or a pattern such as *.csv expands to multiple files; rows are concatenated in Polars scan order (see tests/test_csv_scan_directory_b2.py). In Polars 0.53, the lazy CSV scan wires HiveOptions::new_disabled() into the unified scan, so hive-style partition columns from directory paths are not applied for CSV; see Polars 0.53 vs pydantable scan audit.
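As a stdlib analogy for the difference between expansion and a literal path, consider the sketch below. The helper expand_source is hypothetical and does not call Polars. The sorted() calls exist only to make the output deterministic; they are an assumption of this sketch and say nothing about Polars scan order.

```python
import glob
import tempfile
from pathlib import Path


def expand_source(path: str, use_glob: bool = True) -> list[str]:
    """Hypothetical helper: list the files a path, directory, or pattern names."""
    if not use_glob:
        # Literal mode: the string is passed through with no expansion.
        return [path]
    candidate = Path(path)
    if candidate.is_dir():
        return sorted(str(f) for f in candidate.iterdir() if f.is_file())
    return sorted(glob.glob(path))


with tempfile.TemporaryDirectory() as data_dir:
    for name in ("a.csv", "b.csv"):
        (Path(data_dir) / name).write_text("sku;qty_on_hand\n1;2\n", encoding="utf-8")
    pattern = str(Path(data_dir) / "*.csv")
    expanded = [Path(f).name for f in expand_source(pattern)]
    from_dir = [Path(f).name for f in expand_source(data_dir)]
    literal = expand_source(pattern, use_glob=False)
    print(expanded, from_dir, literal)
```

With expansion on, both the pattern and the directory resolve to the two files; with it off, the pattern string is returned untouched, which matters when a real filename contains characters such as * or [.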
use_rap=True (materialize_csv only): uses aread_csv_rap when no event loop is running; in async code, call await aread_csv_rap(path) from pydantable.io.rap_support instead.
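The "sync wrapper when no loop is running, await otherwise" pattern can be sketched with asyncio alone. Both functions below are hypothetical stand-ins and do not import pydantable. What pydantable does when a loop is already running is not stated here; raising an error is simply this sketch's choice.

```python
import asyncio


async def read_rows_async(path: str) -> list[str]:
    """Hypothetical stand-in for an async reader such as aread_csv_rap."""
    await asyncio.sleep(0)  # placeholder for real async file I/O
    return [f"rows from {path}"]


def read_rows_sync(path: str) -> list[str]:
    """Drive the coroutine from sync code, but only when no loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread, so it is safe to start one.
        return asyncio.run(read_rows_async(path))
    raise RuntimeError("event loop already running; await read_rows_async(path)")


sync_result = read_rows_sync("stock_export.csv")


async def main() -> list[str]:
    # Inside a running loop, await the coroutine directly.
    return await read_rows_async("stock_export.csv")


async_result = asyncio.run(main())
print(sync_result, async_result)
```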
Write (targets)¶
DataFrame[Schema] and DataFrameModel¶
- df.write_csv(path, *, separator=..., compression=..., write_kwargs=..., streaming=...)
- model.write_csv(...): same.
write_kwargs: include_header, include_bom. See DATA_IO_SOURCES.
pydantable.io¶
- export_csv, aexport_csv: eager column dict → file.
- write_csv_batches: append many rectangular batches to one CSV (mode="w"/"a", write_header).
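Appending rectangular batches to one file can be approximated with the stdlib csv module. In the sketch below, append_batch is a hypothetical helper; its mode and write_header parameters mirror the names mentioned above, but the real signature of write_csv_batches may differ.

```python
import csv
import tempfile
from pathlib import Path


def append_batch(
    path: Path, batch: dict[str, list], *, mode: str = "a", write_header: bool = False
) -> None:
    """Hypothetical helper: write one rectangular column dict as CSV rows."""
    columns = list(batch)
    lengths = {len(batch[name]) for name in columns}
    if len(lengths) > 1:
        raise ValueError("batch is not rectangular")
    with path.open(mode, newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(columns)
        # zip(*columns) turns the column lists back into rows.
        writer.writerows(zip(*(batch[name] for name in columns)))


with tempfile.TemporaryDirectory() as data_dir:
    out = Path(data_dir) / "stock.csv"
    # First batch truncates the file and writes the header once.
    append_batch(out, {"sku": [1001, 1002], "qty_on_hand": [42, 0]}, mode="w", write_header=True)
    # Later batches append rows only.
    append_batch(out, {"sku": [1003], "qty_on_hand": [7]})
    written = out.read_text(encoding="utf-8").splitlines()
    print(written)
```

Writing the header only with the first batch keeps the file a single well-formed CSV no matter how many batches follow.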
Note
If you pass engine="rust" to export_csv, the Rust writer may require the optional polars package at runtime. Prefer engine="auto" unless you want to force the Rust path.
Runnable example¶
Run conventions: IO_OVERVIEW (Runnable example).

"""Lazy CSV: European-style ; separator (common ERP exports) and lazy write_csv.

Needs pydantable._core. Run::

    python docs/examples/io/csv_lazy_roundtrip.py
"""

from __future__ import annotations

import tempfile
from pathlib import Path

from pydantable import DataFrameModel


class InventorySnapshot(DataFrameModel):
    """SKU-level stock row from a vendor CSV (semicolon-delimited)."""

    sku: int
    qty_on_hand: int


def main() -> None:
    with tempfile.TemporaryDirectory() as data_dir:
        # Many EU exports use ';' because ',' is the decimal separator in those locales.
        erp_export = Path(data_dir) / "stock_export.csv"
        normalized = Path(data_dir) / "stock_utf8_comma.csv"
        erp_export.write_text("sku;qty_on_hand\n1001;42\n", encoding="utf-8")

        # Read the semicolon-delimited export, then write it back with the default separator.
        df = InventorySnapshot.read_csv(str(erp_export), separator=";")
        df.write_csv(str(normalized))

        # Re-read the normalized file and check that the values survived the round trip.
        d = InventorySnapshot.read_csv(str(normalized)).to_dict()
        assert [int(x) for x in d["sku"]] == [1001]
        assert [int(x) for x in d["qty_on_hand"]] == [42]
        print("csv_lazy_roundtrip: ok")


if __name__ == "__main__":
    main()