JSON Lines logs: read → unnest → write NDJSON
Append-only NDJSON (one JSON object per line) is a common shape for logs and change-data-capture (CDC) streams. This recipe reads the file as a lazy scan, so transforms run in Polars before anything is materialized. It then unnests a nested struct field into flat columns and writes the result with write_ndjson, which acts as the sink of the lazy pipeline.
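To make the NDJSON shape concrete, here is a minimal standard-library sketch of appending records and reading them back one line at a time. It does not use pydantable or Polars. The helper names append_record, iter_records, and demo are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Iterator


def append_record(path: Path, record: dict[str, Any]) -> None:
    """Append one JSON object as a single line; earlier lines are not rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")


def iter_records(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed object per non-empty line without loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def demo() -> list[dict[str, Any]]:
    """Write two log lines to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "events.ndjson"
        append_record(log, {"event": "ping", "meta": {"region": "us", "code": 1}})
        append_record(log, {"event": "pong", "meta": {"region": "eu", "code": 2}})
        return list(iter_records(log))
```

Because each record is a complete line, a writer only ever appends, and a reader can stop or filter early. That property is what makes the format a natural fit for a lazy scan.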
Recipe
The runnable script lives in the repository at docs/examples/cookbook/json_logs_unnest_export.py (same code as below).

"""Cookbook: NDJSON logs → filter → unnest → lazy write_ndjson (see RTD cookbook).

Run from repo root::

    PYTHONPATH=python python docs/examples/cookbook/json_logs_unnest_export.py

Needs pydantable._core.
"""

from __future__ import annotations

import tempfile
from pathlib import Path

from pydantable import DataFrameModel, Schema


class Meta(Schema):
    """Nested object carried on each log line."""

    region: str
    code: int


class LogLine(DataFrameModel):
    """One NDJSON object per line."""

    event: str
    meta: Meta


def main() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "events.ndjson"
        src.write_text(
            '{"event":"ping","meta":{"region":"us","code":1}}\n'
            '{"event":"pong","meta":{"region":"eu","code":2}}\n',
            encoding="utf-8",
        )
        df = LogLine.read_ndjson(str(src))
        us_only = df.filter(df.meta.struct_field("region") == "us")
        flat = us_only.unnest("meta")
        out_path = Path(tmp) / "flat.ndjson"
        flat.write_ndjson(str(out_path))
        text = out_path.read_text(encoding="utf-8")
        assert "us" in text and ("meta_region" in text or "region" in text)
        print("json_logs_unnest_export: ok")


if __name__ == "__main__":
    main()
Example output
From the repository root, with the extension built, running the script prints a single line on success:

    json_logs_unnest_export: ok
Notes
- Lazy read / write: read_ndjson keeps a scan root until collect / write_ndjson / other terminals (see EXECUTION and IO_JSON).
- Unnest naming: columns become meta_region, meta_code, … per INTERFACE_CONTRACT.
- Selectors: to pick all struct columns before unnesting, use s.structs() as in SELECTORS (Nested structs).
- Egress: this recipe uses write_ndjson (IO_NDJSON). For a single JSON array file, use DataFrameModel.export_json (eager column dict → file; see IO_JSON).
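The unnest naming and the two egress shapes mentioned in the notes can be illustrated with plain Python. This is a sketch over ordinary dicts, not pydantable's implementation. The helpers unnest_struct, to_ndjson, and to_json_array are hypothetical names. The column_field prefix follows the meta_region / meta_code naming stated above, and the JSON array is assumed here to hold row objects.

```python
from __future__ import annotations

import json
from typing import Any


def unnest_struct(row: dict[str, Any], column: str) -> dict[str, Any]:
    """Replace one nested dict column with flat '<column>_<field>' keys."""
    flat: dict[str, Any] = {}
    for key, value in row.items():
        if key == column and isinstance(value, dict):
            # Assumed naming rule: prefix each struct field with the column name.
            for field, inner in value.items():
                flat[f"{column}_{field}"] = inner
        else:
            flat[key] = value
    return flat


def to_ndjson(rows: list[dict[str, Any]]) -> str:
    """Serialize rows as NDJSON: one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_json_array(rows: list[dict[str, Any]]) -> str:
    """Serialize rows as one JSON array document."""
    return json.dumps(rows)


rows = [{"event": "ping", "meta": {"region": "us", "code": 1}}]
flat_rows = [unnest_struct(row, "meta") for row in rows]
```

NDJSON output can be appended to and consumed line by line, whereas a JSON array has to be parsed as one document. This is one reason a streaming sink suits log-shaped data.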