NDJSON I/O (newline-delimited JSON)

Primary API: DataFrame[Schema].read_ndjson, write_ndjson, and the matching DataFrameModel methods. Secondary API: the functions in pydantable.io.

Each line of the file is one JSON object; the scanner infers or aligns columns across lines.
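To make "aligns columns across lines" concrete, here is a minimal standard-library sketch of the idea. It is not pydantable's or Polars' scanner; `columns_from_ndjson` is a hypothetical helper, and filling missing keys with None is an assumption made for illustration.

```python
import json
from io import StringIO


def columns_from_ndjson(lines):
    """Turn an iterable of NDJSON lines into a dict of equal-length columns.

    Hypothetical helper for illustration only; not part of pydantable.
    """
    columns = {}
    n_rows = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        # A key first seen on a later line is back-filled with None.
        for key in record:
            if key not in columns:
                columns[key] = [None] * n_rows
        # Every known column gets one value per row; absent keys become None.
        for key, values in columns.items():
            values.append(record.get(key))
        n_rows += 1
    return columns


sample = StringIO(
    '{"status": 200, "path": "/v1/health"}\n'
    '{"status": 404, "path": "/v1/missing", "latency_ms": 12}\n'
)
print(columns_from_ndjson(sample))
# {'status': [200, 404], 'path': ['/v1/health', '/v1/missing'], 'latency_ms': [None, 12]}
```

The second line introduces `latency_ms`, so the first row is back-filled and all columns stay the same length.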

Read (sources)

`DataFrame[Schema]` and `DataFrameModel`

DataFrame[Schema].read_ndjson(path, *, columns=None, **scan_kwargs)
MyModel.read_ndjson(...), await MyModel.aread_ndjson(..., executor=None)
materialize_ndjson or await amaterialize_ndjson from pydantable.io, then pass the resulting columns to MyModel(cols)

`pydantable.io`

read_ndjson, aread_ndjson
materialize_ndjson, amaterialize_ndjson
fetch_ndjson_url — HTTP(S) → temp file → read
iter_ndjson, iter_json_lines (alias), aiter_ndjson, aiter_json_lines, write_ndjson_batches: JSON-object lines handled in batches, each batch shaped as dict[str, list] (see IO_OVERVIEW).
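The batch shape mentioned above, dict[str, list], can be sketched with the standard library alone. The generator below is a hypothetical stand-in, not the implementation of iter_ndjson; the `batch_size` parameter and the None fill for missing keys are assumptions.

```python
import json
from itertools import islice


def iter_column_batches(lines, batch_size=2):
    """Yield dict[str, list] batches of at most batch_size rows.

    Hypothetical helper for illustration only; not part of pydantable.
    """
    # Parse lazily so only one batch of records is held in memory at a time.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        chunk = list(islice(records, batch_size))
        if not chunk:
            return
        # Collect keys in first-seen order within this batch.
        keys = []
        for record in chunk:
            for key in record:
                if key not in keys:
                    keys.append(key)
        yield {key: [record.get(key) for record in chunk] for key in keys}


lines = ['{"status": 200}', '{"status": 404}', '{"status": 500}']
for batch in iter_column_batches(lines, batch_size=2):
    print(batch)
# {'status': [200, 404]}
# {'status': [500]}
```

Because each batch is built independently, memory use is bounded by the batch size rather than the file size.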

scan_kwargs: low_memory, rechunk, ignore_errors, n_rows, infer_schema_length, glob, include_file_paths, row_index_name, row_index_offset. Unknown keys raise ValueError. See DATA_IO_SOURCES.
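The "unknown keys raise ValueError" rule amounts to an allow-list check. The sketch below shows that pattern using the key names listed above; `check_scan_kwargs` and the error message wording are hypothetical and are not taken from pydantable's source.

```python
# Key names copied from the scan_kwargs list in this section.
ALLOWED_SCAN_KWARGS = frozenset(
    {
        "low_memory",
        "rechunk",
        "ignore_errors",
        "n_rows",
        "infer_schema_length",
        "glob",
        "include_file_paths",
        "row_index_name",
        "row_index_offset",
    }
)


def check_scan_kwargs(**scan_kwargs):
    """Return scan_kwargs unchanged, or raise ValueError on unknown keys.

    Hypothetical helper for illustration only; not part of pydantable.
    """
    unknown = sorted(set(scan_kwargs) - ALLOWED_SCAN_KWARGS)
    if unknown:
        raise ValueError(f"unknown scan_kwargs: {', '.join(unknown)}")
    return scan_kwargs


print(check_scan_kwargs(n_rows=100, ignore_errors=True))

try:
    check_scan_kwargs(n_row=100)  # typo: n_row instead of n_rows
except ValueError as exc:
    print(exc)
```

Rejecting unknown keys up front turns a silent typo such as `n_row` into an immediate error instead of an ignored option.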

Paths, directories, and `glob`

Use glob=True (or omit it) when reading a directory or a glob pattern so your call matches Parquet / CSV lazy reads. Polars 0.53 builds NDJSON lazy scans with UnifiedScanArgs { glob: true, … } internally; glob expansion cannot be disabled from the LazyJsonLineReader API. Passing glob=False raises ValueError from pydantable.

Hive-style partitions are disabled for NDJSON in Polars 0.53 (no partition columns from paths). A single glob such as *.jsonl only matches that extension; use another pattern or a second read for .ndjson files. Details: Polars 0.53 vs pydantable scan audit.
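The single-extension behaviour of a pattern such as *.jsonl can be checked with pathlib before any read is issued. This sketch only lists files and does not call pydantable; `ndjson_files` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def ndjson_files(directory, patterns=("*.jsonl", "*.ndjson")):
    """Return the sorted, de-duplicated files matching any of the patterns.

    Hypothetical helper for illustration only; not part of pydantable.
    """
    root = Path(directory)
    found = set()
    for pattern in patterns:
        found.update(root.glob(pattern))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jsonl", "b.ndjson", "c.csv"):
        (Path(tmp) / name).write_text("", encoding="utf-8")

    only_jsonl = [p.name for p in ndjson_files(tmp, patterns=("*.jsonl",))]
    both = [p.name for p in ndjson_files(tmp)]

print(only_jsonl)  # ['a.jsonl']
print(both)        # ['a.jsonl', 'b.ndjson']
```

Listing the matches per pattern shows which files a second read would need to cover when a directory mixes both extensions.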

Write (targets)

`DataFrame[Schema]` and `DataFrameModel`

df.write_ndjson(path, *, write_kwargs=..., streaming=...)
model.write_ndjson(...)

write_kwargs: json_format ("lines" / "json"). See DATA_IO_SOURCES.
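As a rough illustration of the two json_format values, the sketch below assumes that "lines" means one JSON object per line and "json" means a single JSON array document. That reading is an assumption, not something this page confirms; see DATA_IO_SOURCES for the authoritative behaviour. Only the standard library is used.

```python
import json

rows = [
    {"status": 200, "path": "/v1/health"},
    {"status": 404, "path": "/v1/missing"},
]

# Assumed meaning of "lines": one JSON object per line (NDJSON).
as_lines = "".join(json.dumps(row) + "\n" for row in rows)

# Assumed meaning of "json": the whole table as one JSON array.
as_document = json.dumps(rows)

print(as_lines, end="")
print(as_document)
```

Under this assumption, the "lines" form can be appended to and read one record at a time, while the "json" form has to be parsed as a whole.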

`pydantable.io`

export_ndjson, aexport_ndjson
write_ndjson_batches — stream many batches to one NDJSON file.
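Streaming many batches into one NDJSON file can be sketched as follows. This is a standard-library illustration of the pattern, not the implementation of write_ndjson_batches; `write_batches_to_ndjson` is a hypothetical name, and the dict[str, list] batch shape is taken from the read-side description above.

```python
import json
import os
import tempfile


def write_batches_to_ndjson(path, batches):
    """Write each dict[str, list] batch as JSON-object lines; return row count.

    Hypothetical helper for illustration only; not part of pydantable.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for batch in batches:
            keys = list(batch)
            # zip(*columns) walks the batch row by row.
            for values in zip(*(batch[key] for key in keys)):
                handle.write(json.dumps(dict(zip(keys, values))) + "\n")
                written += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "events.ndjson")
    count = write_batches_to_ndjson(
        target,
        [
            {"status": [200, 404], "path": ["/v1/health", "/v1/missing"]},
            {"status": [500], "path": ["/v1/checkout"]},
        ],
    )
    with open(target, encoding="utf-8") as handle:
        written_lines = handle.read().splitlines()

print(count)  # 3
print(written_lines[-1])  # {"status": 500, "path": "/v1/checkout"}
```

The file handle stays open across batches, so the batches can come from a generator and never need to be held in memory together.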

Runnable example

Run conventions: IO_OVERVIEW (Runnable example).

python docs/examples/io/ndjson_roundtrip.py

"""NDJSON: append-only API / audit log → lazy scan; round-trip via write_ndjson.

Each line is one JSON object (common for log shipping and CDC-style exports).

Needs pydantable._core. Run::

    python docs/examples/io/ndjson_roundtrip.py

"""

from __future__ import annotations

import tempfile
from pathlib import Path

from pydantable import DataFrameModel


class ApiAccessEvent(DataFrameModel):
    """One request line from an edge log (NDJSON)."""

    status: int
    path: str


def main() -> None:
    with tempfile.TemporaryDirectory() as logs:
        # Write a two-line NDJSON access log to read back.
        access_log = Path(logs) / "access-20250325.ndjson"
        access_log.write_text(
            '{"status": 200, "path": "/v1/health"}\n'
            '{"status": 404, "path": "/v1/missing"}\n',
            encoding="utf-8",
        )

        # Read the log through the model and check both columns.
        df = ApiAccessEvent.read_ndjson(str(access_log))
        rows = df.collect()
        assert [r.status for r in rows] == [200, 404]
        assert [r.path for r in rows] == ["/v1/health", "/v1/missing"]

        # Round-trip: write one row with write_ndjson, then read it again.
        replay = Path(logs) / "replay.ndjson"
        ApiAccessEvent({"status": [500], "path": ["/v1/checkout"]}).write_ndjson(
            str(replay)
        )
        got = ApiAccessEvent.read_ndjson(str(replay))
        assert got.to_dict() == {"status": [500], "path": ["/v1/checkout"]}

    print("ndjson_roundtrip: ok")


if __name__ == "__main__":
    main()

Output

ndjson_roundtrip: ok

Large-file patterns (lazy scan; optional iter_ndjson batches in IO_JSON): python docs/examples/io/large_ndjson_patterns.py.
