
CSV I/O

Primary: DataFrame[Schema].read_csv, write_csv, and the matching DataFrameModel classmethods / instance methods. Secondary: pydantable.io helpers (ScanFileRoot, materialize_csv, export_csv, fetch_csv_url).

Read (sources)

DataFrame[Schema] and DataFrameModel

  • DataFrame[Schema].read_csv(path, *, columns=None, **scan_kwargs)
  • MyModel.read_csv(...), await MyModel.aread_csv(..., executor=None)
  • materialize_csv / await amaterialize_csv from pydantable.io, then MyModel(cols) on the returned column dict for an eager typed frame

pydantable.io

  • read_csv, aread_csv — lazy ScanFileRoot
  • materialize_csv, amaterialize_csv — eager dict[str, list]; both accept engine, and the sync materialize_csv also accepts use_rap
  • fetch_csv_url — HTTP(S) → temp file → read; temp removed after read
  • iter_csv, aiter_csv, write_csv_batches — stdlib csv batching over paths or text streams; cell values are strings (or None for short rows). See IO_OVERVIEW (Batched column dict I/O).
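To make "batched column dict" output concrete, here is a minimal stdlib-only sketch. It is not pydantable's iter_csv: the function name iter_csv_batches_sketch and its batch_size parameter are hypothetical. It only mirrors the documented behavior that every cell is a string and that a short row yields None for its missing trailing cells.

```python
from __future__ import annotations

import csv
import io
from typing import Iterator, Optional, TextIO


def iter_csv_batches_sketch(
    stream: TextIO, batch_size: int = 2
) -> Iterator[dict[str, list[Optional[str]]]]:
    """Yield column dicts of at most batch_size rows (hypothetical helper)."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        return  # empty input: no header, no batches
    batch: dict[str, list[Optional[str]]] = {name: [] for name in header}
    count = 0
    for row in reader:
        for i, name in enumerate(header):
            # Cells stay strings; a short row gets None for its missing cells.
            batch[name].append(row[i] if i < len(row) else None)
        count += 1
        if count == batch_size:
            yield batch
            batch = {name: [] for name in header}
            count = 0
    if count:
        yield batch  # final partial batch


text = "sku,qty_on_hand\n1001,42\n1002\n1003,7\n"
batches = list(iter_csv_batches_sketch(io.StringIO(text), batch_size=2))
```

Here the first batch has "qty_on_hand": ["42", None] because the row 1002 is short, and no value is converted to int; typing happens later, when the column dict is validated against a schema.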

scan_kwargs: for example separator, has_header, skip_rows, skip_lines, n_rows, infer_schema_length, ignore_errors, low_memory, rechunk, glob, cache, quote_char, eol_char, include_file_paths, row_index_name, row_index_offset, raise_if_empty, truncate_ragged_lines, decimal_comma, try_parse_dates. Unknown keys raise ValueError. See DATA_IO_SOURCES.
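The "unknown keys raise ValueError" rule is an allow-list check. A minimal sketch, assuming hypothetical names ALLOWED_SCAN_KWARGS and check_scan_kwargs; the set below copies only the example keys listed above, so it may be incomplete (DATA_IO_SOURCES is the authoritative list).

```python
from __future__ import annotations

from typing import Any

# Hypothetical allow-list: only the example keys named above, possibly incomplete.
ALLOWED_SCAN_KWARGS = frozenset({
    "separator", "has_header", "skip_rows", "skip_lines", "n_rows",
    "infer_schema_length", "ignore_errors", "low_memory", "rechunk", "glob",
    "cache", "quote_char", "eol_char", "include_file_paths", "row_index_name",
    "row_index_offset", "raise_if_empty", "truncate_ragged_lines",
    "decimal_comma", "try_parse_dates",
})


def check_scan_kwargs(scan_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Fail fast on unknown keys instead of silently ignoring them."""
    unknown = sorted(set(scan_kwargs) - ALLOWED_SCAN_KWARGS)
    if unknown:
        raise ValueError(f"unknown scan_kwargs: {', '.join(unknown)}")
    return scan_kwargs


ok = check_scan_kwargs({"separator": ";", "has_header": True})
errors: list[str] = []
try:
    check_scan_kwargs({"seperator": ";"})  # note the misspelling
except ValueError as exc:
    errors.append(str(exc))
```

Failing fast matters here because a misspelled key such as seperator would otherwise fall back to the default separator and produce a single-column frame without any error.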

Paths, directories, and glob

glob defaults to True in Polars' LazyCsvReader; pass glob=False via scan_kwargs to scan a single path literally. A directory path or a pattern such as *.csv expands to multiple files, and their rows are concatenated in Polars scan order (see tests/test_csv_scan_directory_b2.py). In Polars 0.53, the lazy CSV scan wires HiveOptions::new_disabled() into the unified scan, so hive-style partition columns derived from directory paths are not applied for CSV; see Polars 0.53 vs pydantable scan audit.
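The expansion rules can be approximated with the standard library. This is not how Polars resolves paths: expand_csv_sources and read_rows_concatenated are hypothetical helpers, restricting directory members to *.csv is an assumption, and sorting file names stands in for "Polars scan order". It only shows the shape of the behavior: a directory or pattern becomes several files whose rows are concatenated, while glob=False keeps the path literal.

```python
from __future__ import annotations

import csv
import glob as globmod
import tempfile
from pathlib import Path


def expand_csv_sources(path: str, *, glob: bool = True) -> list[Path]:
    """Approximate path resolution (hypothetical; not the Polars algorithm)."""
    p = Path(path)
    if not glob:
        return [p]  # literal path: '*' is just a character in the file name
    if p.is_dir():
        return sorted(p.glob("*.csv"))  # assumption: only *.csv members
    return sorted(Path(m) for m in globmod.glob(path))


def read_rows_concatenated(paths: list[Path]) -> list[dict[str, str]]:
    """Read every file in order and append its rows to one list."""
    rows: list[dict[str, str]] = []
    for file in paths:
        with file.open(newline="", encoding="utf-8") as fh:
            rows.extend(csv.DictReader(fh))
    return rows


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "part-0.csv").write_text("sku,qty\n1,10\n", encoding="utf-8")
    (Path(d) / "part-1.csv").write_text("sku,qty\n2,20\n", encoding="utf-8")
    by_dir = read_rows_concatenated(expand_csv_sources(d))
    by_pattern = read_rows_concatenated(expand_csv_sources(str(Path(d) / "*.csv")))
    literal = expand_csv_sources(str(Path(d) / "*.csv"), glob=False)
```

Because CSV scans do not apply hive partitioning, a path segment such as year=2024 would not turn into a column in either the sketch or the real scan.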

use_rap=True (materialize_csv only): uses aread_csv_rap when no event loop is running; in async code, await aread_csv_rap(path) from pydantable.io.rap_support instead.
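The "no event loop" condition is a common sync/async bridging pattern. A minimal sketch, assuming a stand-in coroutine read_csv_async_stub in place of aread_csv_rap (whose real signature and return value are not shown here) and a hypothetical wrapper materialize_csv_sketch. What the real function does inside a running loop is not stated above; raising is this sketch's choice.

```python
from __future__ import annotations

import asyncio


async def read_csv_async_stub(path: str) -> dict[str, list[str]]:
    """Stand-in for an async CSV reader; returns a fixed column dict."""
    await asyncio.sleep(0)
    return {"path": [path]}


def materialize_csv_sketch(path: str) -> dict[str, list[str]]:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(read_csv_async_stub(path))
    # asyncio.run() cannot be nested inside a running loop; callers should await.
    raise RuntimeError("inside an event loop: await the async reader directly")


sync_result = materialize_csv_sketch("stock.csv")


async def async_caller() -> dict[str, list[str]]:
    # Async code awaits the coroutine instead of calling the sync wrapper.
    return await read_csv_async_stub("stock.csv")


async def misuse_from_async() -> str:
    try:
        materialize_csv_sketch("stock.csv")
    except RuntimeError:
        return "raised"
    return "no error"


async_result = asyncio.run(async_caller())
inside_loop = asyncio.run(misuse_from_async())
```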

Write (targets)

DataFrame[Schema] and DataFrameModel

  • df.write_csv(path, *, separator=..., compression=..., write_kwargs=..., streaming=...)
  • model.write_csv(...) — same.

write_kwargs: include_header, include_bom. See DATA_IO_SOURCES.
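To see what include_header and include_bom change in the output bytes, here is a stdlib-only sketch. write_columns_csv is a hypothetical helper, not pydantable's writer; the BOM comes from Python's utf-8-sig codec, and the "\n" line terminator is this sketch's choice.

```python
from __future__ import annotations

import codecs
import csv
import tempfile
from pathlib import Path


def write_columns_csv(
    path: Path,
    columns: dict[str, list[object]],
    *,
    include_header: bool = True,
    include_bom: bool = False,
) -> None:
    # utf-8-sig writes the 3-byte UTF-8 BOM (EF BB BF) before the first character.
    encoding = "utf-8-sig" if include_bom else "utf-8"
    names = list(columns)
    with path.open("w", newline="", encoding=encoding) as fh:
        writer = csv.writer(fh, lineterminator="\n")
        if include_header:
            writer.writerow(names)
        # Transpose the column dict into rows.
        writer.writerows(zip(*(columns[n] for n in names)))


with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "stock.csv"
    write_columns_csv(out, {"sku": [1001], "qty_on_hand": [42]}, include_bom=True)
    with_bom = out.read_bytes()
    write_columns_csv(out, {"sku": [1001], "qty_on_hand": [42]}, include_header=False)
    no_header = out.read_bytes()

starts_with_bom = with_bom.startswith(codecs.BOM_UTF8)
```

A BOM mainly helps spreadsheet tools detect UTF-8; most CSV readers, including Python's utf-8-sig decoding, strip it again on read.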

pydantable.io

  • export_csv, aexport_csv — eager column dict → file.
  • write_csv_batches — append many rectangular batches to one CSV (mode="w" / "a", write_header).
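The append semantics can be sketched with the csv module. append_batches is a hypothetical helper that mirrors the documented knobs (mode "w" / "a", write_header) and the rectangular-batch requirement; it is not the library function, and checking that every batch has the same column names is an assumption of this sketch.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def append_batches(
    path: Path,
    batches: Iterable[dict[str, list[object]]],
    *,
    mode: str = "w",
    write_header: bool = True,
) -> None:
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with path.open(mode, newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        expected: Optional[list[str]] = None
        for batch in batches:
            names = list(batch)
            if expected is None:
                expected = names
                if write_header:
                    writer.writerow(names)  # header only once, before batch 1
            elif names != expected:
                raise ValueError(f"column mismatch: {names} != {expected}")
            # Rectangular: every column in a batch must have the same length.
            lengths = {len(col) for col in batch.values()}
            if len(lengths) > 1:
                raise ValueError(f"ragged batch, column lengths {sorted(lengths)}")
            writer.writerows(zip(*batch.values()))


with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "stock.csv"
    append_batches(out, [{"sku": [1], "qty": [10]}, {"sku": [2, 3], "qty": [20, 30]}])
    # A later run appends more rows without repeating the header.
    append_batches(out, [{"sku": [4], "qty": [40]}], mode="a", write_header=False)
    text = out.read_text(encoding="utf-8")
```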

Note

If you pass engine="rust" to export_csv, the Rust writer may require the optional polars package at runtime. Prefer engine="auto" unless you want to force the Rust path.

Runnable example

Run conventions: IO_OVERVIEW (Runnable example).

python docs/examples/io/csv_lazy_roundtrip.py

"""Lazy CSV: European-style ; separator (common ERP exports) and lazy write_csv.

Needs pydantable._core. Run::

    python docs/examples/io/csv_lazy_roundtrip.py
"""

from __future__ import annotations

import tempfile
from pathlib import Path

from pydantable import DataFrameModel


class InventorySnapshot(DataFrameModel):
    """SKU-level stock row from a vendor CSV (semicolon-delimited)."""

    sku: int
    qty_on_hand: int


def main() -> None:
    with tempfile.TemporaryDirectory() as data_dir:
        # Many EU exports use ';' because ',' is the decimal separator in that locale.
        erp_export = Path(data_dir) / "stock_export.csv"
        normalized = Path(data_dir) / "stock_utf8_comma.csv"
        erp_export.write_text("sku;qty_on_hand\n1001;42\n", encoding="utf-8")

        # Lazy read with a ';' separator, then write back with write_csv's default separator.
        df = InventorySnapshot.read_csv(str(erp_export), separator=";")
        df.write_csv(str(normalized))

        # Re-read the normalized file and check that the values survived the round trip.
        d = InventorySnapshot.read_csv(str(normalized)).to_dict()
        assert [int(x) for x in d["sku"]] == [1001]
        assert [int(x) for x in d["qty_on_hand"]] == [42]

    print("csv_lazy_roundtrip: ok")


if __name__ == "__main__":
    main()

Output

csv_lazy_roundtrip: ok

See also

IO_OVERVIEW · IO_HTTP · EXECUTION