Selector DSL (schema-first)
PydanTable includes a schema-driven selector DSL for picking columns by name patterns and dtype groups. Unlike Polars selectors, these selectors resolve only against the current logical schema (df.schema_fields()), which keeps them deterministic and compatible with schema-first typing.
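To make "resolves only against the current logical schema" concrete, the sketch below models a selector as a pure function from a schema to a list of column names. It is an illustration only, not PydanTable's implementation: the `Schema` alias, `starts_with`, and `by_dtype` are hypothetical stand-ins, and the schema is assumed to be an ordered mapping of column name to Python type.

```python
from typing import Callable

# Hypothetical model: a schema is an ordered mapping of column name -> Python type.
Schema = dict[str, type]
Selector = Callable[[Schema], list[str]]


def starts_with(prefix: str) -> Selector:
    """Select columns whose name starts with `prefix`, in schema order."""
    return lambda schema: [name for name in schema if name.startswith(prefix)]


def by_dtype(*dtypes: type) -> Selector:
    """Select columns whose declared type is one of `dtypes`."""
    return lambda schema: [name for name, tp in schema.items() if tp in dtypes]


schema: Schema = {"id": int, "age": int, "age2": float, "name": str}

# Resolution never looks at the data, only at the schema.
age_cols = starts_with("age")(schema)
int_cols = by_dtype(int)(schema)
print(age_cols)  # ['age', 'age2']
print(int_cols)  # ['id', 'age']
```

Because the result depends only on the schema, the same selector applied to the same schema always yields the same columns.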
Import
from pydantable import selectors as s
Name-based selectors
df.select(s.by_name("id", "age"))
df.select(s.starts_with("age"))
df.select(s.ends_with("_id"))
df.select(s.contains("score"))
df.select(s.matches(r"^age\d+$"))
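Escaping matters in regex-based selection. The sketch below uses Python's standard `re` module on a hypothetical list of column names. It assumes `s.matches` follows ordinary Python regex semantics, which the source does not confirm. In a raw string, `\d` is the digit class, while `\\d` asks for a literal backslash followed by `d`.

```python
import re

# Hypothetical column names, for illustration only.
columns = ["id", "age", "age2", "age_group", "score"]

digits = re.compile(r"^age\d+$")    # "age" followed by one or more digits
literal = re.compile(r"^age\\d+$")  # "age", a literal backslash, then one or more "d"

matched_digits = [c for c in columns if digits.search(c)]
matched_literal = [c for c in columns if literal.search(c)]
print(matched_digits)   # ['age2']
print(matched_literal)  # []
```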
Dtype-group selectors
df.select(s.numeric())
df.select(s.integers())
df.select(s.floats())
df.select(s.decimals())
df.select(s.string())
df.select(s.temporal())
df.select(s.boolean())
df.select(s.binary())
df.select(s.lists())
df.select(s.maps())
df.select(s.structs())
df.select(s.enums())
df.select(s.ipv4s())
df.select(s.ipv6s())
df.select(s.uuids())
df.select(s.wkbs())
Nested structs: selector, unnest, and struct_field
Use s.structs() to project only struct-typed columns (nested Pydantic / Schema fields). Then unnest expands one struct column into top-level columns named {parent}_{field} (see INTERFACE_CONTRACT). To read a single subfield without unnesting, use Expr.struct_field(...) on the struct column.
from pydantable import DataFrame, Schema
from pydantable import selectors as s

class Payload(Schema):
    score: int
    label: str | None

class Row(Schema):
    id: int
    payload: Payload

df = DataFrame[Row]({
    "id": [1, 2],
    "payload": [
        {"score": 10, "label": "a"},
        {"score": 20, "label": None},
    ],
})

# Project struct columns only, then flatten `payload` into `payload_*` top-level columns.
flat = df.select(s.structs()).unnest("payload")

# Or project one field without unnesting the whole struct:
one = df.with_columns(lbl=df.payload.struct_field("label"))
Nested modeling and nullability are described in SUPPORTED_TYPES. Prefer struct_field when you need one inner column; prefer unnest when you want flat top-level columns for select / joins.
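The `{parent}_{field}` naming can be seen in a plain-Python model of the same data. This is a sketch, not PydanTable's implementation: the `unnest` and `struct_field` functions here are hypothetical, operate on a column-oriented dict, and assume every struct row is non-null and has the same fields.

```python
from typing import Any

Columns = dict[str, list[Any]]


def unnest(columns: Columns, parent: str) -> Columns:
    """Replace struct column `parent` with one `{parent}_{field}` column per subfield."""
    out: Columns = {}
    for name, values in columns.items():
        if name != parent:
            out[name] = values
            continue
        # Field names come from the first row; all rows are assumed to share them.
        for field in values[0]:
            out[f"{parent}_{field}"] = [row[field] for row in values]
    return out


def struct_field(columns: Columns, parent: str, field: str) -> list[Any]:
    """Read a single subfield without flattening the whole struct."""
    return [row[field] for row in columns[parent]]


data: Columns = {
    "id": [1, 2],
    "payload": [{"score": 10, "label": "a"}, {"score": 20, "label": None}],
}

flat = unnest(data, "payload")
labels = struct_field(data, "payload", "label")
print(list(flat))  # ['id', 'payload_score', 'payload_label']
print(labels)      # ['a', None]
```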
Composition
Selectors can be composed:
# union, intersection, difference
df.select(s.starts_with("age") | s.by_name("id"))
df.select(s.numeric() & ~s.by_name("id"))
df.drop(s.starts_with("tmp_") - s.by_name("tmp_keep"))
# exclude helper (same as s1 - s2)
df.select(s.everything().exclude(s.ends_with("_debug")))
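The operators behave like set algebra over the schema's columns. The sketch below models that with a hypothetical `Sel` class that wraps a predicate over column names. It is not PydanTable's selector type, it ignores dtypes, and it assumes matches are returned in schema order.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sel:
    """Hypothetical stand-in for a selector: a predicate over column names."""
    pred: Callable[[str], bool]

    def __or__(self, other: "Sel") -> "Sel":   # union
        return Sel(lambda c: self.pred(c) or other.pred(c))

    def __and__(self, other: "Sel") -> "Sel":  # intersection
        return Sel(lambda c: self.pred(c) and other.pred(c))

    def __sub__(self, other: "Sel") -> "Sel":  # difference
        return Sel(lambda c: self.pred(c) and not other.pred(c))

    def __invert__(self) -> "Sel":             # complement within the schema
        return Sel(lambda c: not self.pred(c))

    def resolve(self, columns: list[str]) -> list[str]:
        # Assumption: matches come back in schema order.
        return [c for c in columns if self.pred(c)]


def by_name(*names: str) -> Sel:
    return Sel(lambda c: c in names)


def starts_with(prefix: str) -> Sel:
    return Sel(lambda c: c.startswith(prefix))


columns = ["id", "age", "age2", "tmp_a", "tmp_keep"]

union = (starts_with("age") | by_name("id")).resolve(columns)
difference = (starts_with("tmp_") - by_name("tmp_keep")).resolve(columns)
age_not_id = (starts_with("age") & ~by_name("id")).resolve(columns)
print(union)       # ['id', 'age', 'age2']
print(difference)  # ['tmp_a']
print(age_not_id)  # ['age', 'age2']
```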
Excluding columns in select
Use exclude= to remove columns from a projection (names or selectors):
df.select("id", "age", "age2", exclude=s.starts_with("age"))
df.select(exclude=["debug_col"]) # everything except debug_col
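The two calls above can be traced with a small model. The `select` function below is a hypothetical sketch over a list of column names, not PydanTable's method. It assumes that omitting positional columns means "start from every column", as the second example suggests, and it represents a selector as a plain predicate.

```python
from typing import Callable, Iterable, Union


def select(
    schema_columns: list[str],
    *names: str,
    exclude: Union[Iterable[str], Callable[[str], bool]] = (),
) -> list[str]:
    """Project `names` (or everything when none are given), minus `exclude`."""
    requested = list(names) if names else list(schema_columns)
    if callable(exclude):
        # Selector-like exclude: resolve it against the schema first.
        dropped = {c for c in schema_columns if exclude(c)}
    else:
        dropped = set(exclude)
    return [c for c in requested if c not in dropped]


schema_columns = ["id", "age", "age2", "debug_col"]

only_id = select(schema_columns, "id", "age", "age2",
                 exclude=lambda c: c.startswith("age"))
no_debug = select(schema_columns, exclude=["debug_col"])
print(only_id)   # ['id']
print(no_debug)  # ['id', 'age', 'age2']
```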
Error behavior
select(Selector) raises ValueError when the selector matches no columns (the message includes the selector summary and the available schema columns).
drop(Selector) follows the existing drop(strict=...) rules:
- strict=True: missing columns raise an error at plan validation time
- strict=False: missing columns are ignored (no-op if all requested columns are missing)
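The rules above can be sketched as two small functions over a list of column names. Both are hypothetical models rather than PydanTable's code. The message format is invented for illustration, and the exception type raised by the strict drop is an assumption, since the rules only say that it errors.

```python
def select_prefix(schema_columns: list[str], prefix: str) -> list[str]:
    """Model of select(selector): an empty match is an error."""
    matched = [c for c in schema_columns if c.startswith(prefix)]
    if not matched:
        raise ValueError(
            f"starts_with({prefix!r}) matched no columns; available: {schema_columns}"
        )
    return matched


def drop(schema_columns: list[str], names: list[str], strict: bool = True) -> list[str]:
    """Model of drop(..., strict=...): missing columns error only when strict."""
    missing = [c for c in names if c not in schema_columns]
    if strict and missing:
        # Assumption: ValueError is used here only for illustration.
        raise ValueError(f"cannot drop missing columns: {missing}")
    return [c for c in schema_columns if c not in names]


schema_columns = ["id", "age"]

try:
    select_prefix(schema_columns, "tmp_")
except ValueError as exc:
    message = str(exc)
    print(message)

# strict=False: nothing to drop, so the columns are unchanged.
kept = drop(schema_columns, ["tmp_a"], strict=False)
print(kept)  # ['id', 'age']
```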
Rename helper
Use rename_with_selector to rename a subset of columns based on a selector:
You can also build a mapping using rename_map and pass it to rename(...):
m = s.rename_map(s.starts_with("tmp_"), lambda c: c.removeprefix("tmp_"))(df.schema_fields())
df2 = df.rename(m)
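The two-step call shape of `rename_map` (configure first, then apply to the schema's columns) can be modeled as a function that returns a function. The version below is a hypothetical sketch, not PydanTable's implementation. It takes a plain predicate in place of a selector and a list of names in place of `df.schema_fields()`.

```python
from typing import Callable


def rename_map(
    pred: Callable[[str], bool], fn: Callable[[str], str]
) -> Callable[[list[str]], dict[str, str]]:
    """Return a builder that maps each matching column name to its new name."""
    def build(schema_columns: list[str]) -> dict[str, str]:
        return {c: fn(c) for c in schema_columns if pred(c)}
    return build


schema_columns = ["id", "tmp_a", "tmp_b"]

m = rename_map(
    lambda c: c.startswith("tmp_"),
    lambda c: c.removeprefix("tmp_"),  # str.removeprefix needs Python 3.9+
)(schema_columns)
print(m)  # {'tmp_a': 'a', 'tmp_b': 'b'}

# Applying the mapping leaves unmatched columns untouched.
renamed = [m.get(c, c) for c in schema_columns]
print(renamed)  # ['id', 'a', 'b']
```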
Selector-driven column transforms
Some schema-first convenience helpers expand a selector into a concrete column list and then apply a typed-safe transform:
# cast a subset
df2 = df.with_columns_cast(s.numeric(), float)
# fill nulls for a subset
df3 = df.with_columns_fill_null(s.by_name("age"), value=0)
# explicit selector-first projection (alias of select(selector))
df4 = df.select_schema(s.starts_with("tmp_"))
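The "expand the selector, then transform" pattern can be sketched on a column-oriented dict. All names below are hypothetical and this is not PydanTable's implementation. In particular, "numeric" is assumed to mean `int` or `float` only, and nulls are assumed to pass through a cast unchanged.

```python
from typing import Any

Columns = dict[str, list[Any]]


def resolve_numeric(schema: dict[str, type]) -> list[str]:
    # Assumption: "numeric" means int or float here; the real group may be wider.
    return [name for name, tp in schema.items() if tp in (int, float)]


def cast_columns(data: Columns, names: list[str], target: type) -> Columns:
    """Cast only the selected columns, leaving nulls (None) untouched."""
    return {
        name: ([None if v is None else target(v) for v in values]
               if name in names else values)
        for name, values in data.items()
    }


def fill_null(data: Columns, names: list[str], value: Any) -> Columns:
    """Replace None with `value` in the selected columns only."""
    return {
        name: ([value if v is None else v for v in values]
               if name in names else values)
        for name, values in data.items()
    }


schema = {"id": int, "age": int, "name": str}
data: Columns = {"id": [1, 2], "age": [30, None], "name": ["a", "b"]}

# Step 1: expand the selector to concrete names. Step 2: apply the transform.
casted = cast_columns(data, resolve_numeric(schema), float)
filled = fill_null(data, ["age"], 0)
print(casted["age"])  # [30.0, None]
print(filled["age"])  # [30, 0]
```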