JSON Lines logs: read → unnest → write NDJSON

Append-only NDJSON (one JSON object per line) is a common shape for logs and change-data-capture (CDC) feeds. This recipe reads the file with a lazy scan, so the filter and other transforms run in Polars before anything is materialized. It then unnests a nested struct field into flat columns and writes the result with write_ndjson, which acts as the lazy pipeline's sink.
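For readers new to the format, the same data flow can be sketched with only the standard library. This is not pydantable's implementation. It is a generator-based approximation that assumes the `<struct>_<field>` column naming described in the Notes, and the helper names (read_ndjson_lines, unnest, write_ndjson_lines) are hypothetical. Because the steps are chained generators, no line is parsed until the writer pulls rows through, which loosely mirrors the lazy scan.

```python
from __future__ import annotations

import io
import json
from typing import Any, Iterable, Iterator


def read_ndjson_lines(stream: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield one parsed object per non-blank line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def unnest(rows: Iterable[dict[str, Any]], column: str) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: replace a nested object column with <column>_<field> keys.

    The prefix naming is an assumption taken from the recipe's notes.
    """
    for row in rows:
        nested = row.get(column) or {}
        flat = {k: v for k, v in row.items() if k != column}
        for field, value in nested.items():
            flat[f"{column}_{field}"] = value
        yield flat


def write_ndjson_lines(rows: Iterable[dict[str, Any]], out: io.TextIOBase) -> int:
    """Hypothetical helper: write one compact JSON object per line, return the row count."""
    count = 0
    for row in rows:
        out.write(json.dumps(row, separators=(",", ":")) + "\n")
        count += 1
    return count


source = io.StringIO(
    '{"event":"ping","meta":{"region":"us","code":1}}\n'
    '{"event":"pong","meta":{"region":"eu","code":2}}\n'
)

# Nothing is parsed yet: these are chained, unstarted generators.
rows = read_ndjson_lines(source)
us_only = (r for r in rows if r["meta"]["region"] == "us")  # filter on the nested field
flat = unnest(us_only, "meta")

sink = io.StringIO()
written = write_ndjson_lines(flat, sink)  # the "terminal" drives the whole pipeline
print(sink.getvalue(), end="")  # → {"event":"ping","meta_region":"us","meta_code":1}
```

The sketch keeps the recipe's ordering. It filters on the nested field before unnesting, and no work happens until the final write. It has none of the typing that the LogLine model provides in the real script.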

Recipe

The runnable script lives in the repository at docs/examples/cookbook/json_logs_unnest_export.py (same code as below).

"""Cookbook: NDJSON logs → filter → unnest → lazy write_ndjson (see RTD cookbook).

Run from repo root::

    PYTHONPATH=python python docs/examples/cookbook/json_logs_unnest_export.py

Needs pydantable._core.
"""

from __future__ import annotations

import tempfile
from pathlib import Path

from pydantable import DataFrameModel, Schema


class Meta(Schema):
    """Nested object carried on each log line."""

    region: str
    code: int


class LogLine(DataFrameModel):
    """One NDJSON object per line."""

    event: str
    meta: Meta


def main() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "events.ndjson"
        src.write_text(
            '{"event":"ping","meta":{"region":"us","code":1}}\n'
            '{"event":"pong","meta":{"region":"eu","code":2}}\n',
            encoding="utf-8",
        )

        df = LogLine.read_ndjson(str(src))
        us_only = df.filter(df.meta.struct_field("region") == "us")
        flat = us_only.unnest("meta")
        out_path = Path(tmp) / "flat.ndjson"
        flat.write_ndjson(str(out_path))

        text = out_path.read_text(encoding="utf-8")
        assert "us" in text and ("meta_region" in text or "region" in text)

    print("json_logs_unnest_export: ok")


if __name__ == "__main__":
    main()

Example output

From the repository root, with the extension built:

PYTHONPATH=python python docs/examples/cookbook/json_logs_unnest_export.py
json_logs_unnest_export: ok

Notes

  • Lazy read / write: read_ndjson keeps a lazy scan root. The scan does not execute until a terminal operation such as collect or write_ndjson runs (see EXECUTION and IO_JSON).
  • Unnest naming: columns become meta_region, meta_code, … per INTERFACE_CONTRACT.
  • Selectors: to pick all struct columns before unnesting, use s.structs() as in SELECTORS (Nested structs).
  • Egress: this recipe uses write_ndjson (IO_NDJSON). For a single JSON array file, use DataFrameModel.export_json (eager column dict → file; see IO_JSON).
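The last note contrasts two file shapes. The standard-library sketch below shows the practical difference between them. It does not call write_ndjson or export_json, and it does not claim to reproduce the exact layout export_json writes (the note describes that path as an eager column dict written to a file). It only compares one-object-per-line NDJSON with a single JSON array, including what an append costs in each.

```python
import json

# Two flattened records in the recipe's field layout (illustrative data only).
rows = [
    {"event": "ping", "meta_region": "us", "meta_code": 1},
    {"event": "pong", "meta_region": "eu", "meta_code": 2},
]

# NDJSON: one self-contained JSON object per line.
ndjson_text = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in rows)

# Single JSON document: the whole file is one array.
array_text = json.dumps(rows, separators=(",", ":"))

# Every NDJSON line parses on its own; the array parses only as a whole.
per_line = [json.loads(line) for line in ndjson_text.splitlines()]
whole = json.loads(array_text)

# Appending to NDJSON is a plain string (or file) append...
new = {"event": "ping", "meta_region": "us", "meta_code": 3}
appended_ndjson = ndjson_text + json.dumps(new, separators=(",", ":")) + "\n"
# ...while the array must be decoded, extended, and re-serialized in full.
appended_array = json.dumps(whole + [new], separators=(",", ":"))

print(appended_ndjson, end="")
print(appended_array)
```

This is why append-only logs and CDC feeds tend to use NDJSON. A writer can add lines without touching earlier content, and a reader can process the file line by line.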

See also

IO_JSON · IO_NDJSON · SELECTORS · CHANGELOG (1.10.0)