Beanie ODM integration (MongoDB)

This page documents the Beanie-first MongoDB integration in PydanTable.

Beanie is an async, Pydantic-based ODM for MongoDB. See the upstream Beanie documentation for details.

What pydantable supports (feature map)

PydanTable splits MongoDB support into three layers. Which one you should use depends on whether you want driver-level I/O, ODM-aware behavior (hooks), or lazy DataFrame transforms.

| Goal | Recommended API |
| --- | --- |
| Eager column-dict I/O with a sync PyMongo `Collection` | `fetch_mongo` / `iter_mongo` / `write_mongo` |
| Eager column-dict I/O with `pymongo.asynchronous.AsyncCollection` | `afetch_mongo` / `aiter_mongo` / `awrite_mongo` (native async; see the PyMongo surface area subsection in MONGO_ENGINE) |
| Eager column-dict I/O using Beanie async ODM features (full query DSL, hooks) | `afetch_beanie` / `aiter_beanie` / `awrite_beanie` |
| Lazy `DataFrame` API over a Mongo collection (typed transforms, then materialize) | `MongoDataFrame` / `MongoDataFrameModel` (`from_beanie`, `from_beanie_async`, …) |

To see which PyMongo operations pydantable wraps for sync and async collections (and which ones you should call on the driver directly), see the PyMongo surface area subsection in MONGO_ENGINE.

This page focuses on the Beanie-first pieces. For engine details and how MongoRoot plans work, see MONGO_ENGINE.

Install

pip install "pydantable[mongo]"

This installs:

Beanie (ODM)
pymongo (driver)
Mongo plan stack (lazy Mongo roots + columnar materialization used by MongoPydantableEngine)

Beanie settings that matter

Beanie config lives on the Document.Settings inner class (upstream: Defining a document).

Relevant settings for pydantable usage:

Collection name: Settings.name
Indexes: Settings.indexes (including Indexed(...) fields)
Encoders: Settings.bson_encoders (how Python values are represented in BSON)
Keep nulls: Settings.keep_nulls
Validation-on-save: Settings.validate_on_save
Link nesting depth limits: Settings.max_nesting_depth and Settings.max_nesting_depths_per_field

PydanTable treats Beanie as the source of truth for:

what collection to read/write (via Document.get_collection_name())
whether you want ODM-level hooks/validation (when you choose ODM-aware APIs)

Eager async Beanie I/O (full ODM features)

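The read helpers produce column dicts (dict[str, list]), like the other eager I/O helpers: afetch_beanie returns a single dict and aiter_beanie yields one per batch. awrite_beanie goes the other way and takes a column dict as input.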
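To make the column-dict shape concrete, here is a minimal, stdlib-only sketch of how row-shaped documents map to dict[str, list]. The helper rows_to_columns is hypothetical and not part of pydantable; it assumes that a key missing from some documents should become None, so every column ends up with the same length.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into a column dict (hypothetical helper, not pydantable's code)."""
    # Column names in first-seen order across all rows.
    names = list(dict.fromkeys(key for row in rows for key in row))
    # One entry per row in every column; absent keys become None.
    return {name: [row.get(name) for row in rows] for name in names}


rows = [{"id": "a1", "x": 1}, {"id": "b2", "x": 2, "y": "z"}]
print(rows_to_columns(rows))
```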

`afetch_beanie` (load all results)

Use this when you want a single in-memory column dict:

from pydantable import afetch_beanie

# Like `await MyDocument.find_all().to_list()`, but returned as columns
cols = await afetch_beanie(MyDocument)

You can also pass a Beanie query object directly, which gives you Beanie’s full query DSL (operators, chained .find(), .sort(), .project(), and so on):

query = MyDocument.find(MyDocument.some_field == 1, fetch_links=True)
cols = await afetch_beanie(query)

Projections¶

Beanie supports projections via .project(ProjectionModel) (upstream: Finding documents → Projections).

PydanTable supports two projection styles:

projection_model=MyProjectionModel: forwards to Beanie .project(...)
fields=[...]: convenience helper that builds a temporary projection model

cols = await afetch_beanie(MyDocument, fields=["id", "name"])
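The fields=[...] option can be pictured as deriving a smaller model from the document’s own field types. Beanie projection models are Pydantic models; the sketch below swaps in stdlib dataclasses.make_dataclass so it runs without extra dependencies. MyDocument and projection_model are hypothetical names, and this is only an illustration of the idea, not how pydantable builds its temporary model.

```python
from dataclasses import dataclass, fields, make_dataclass


@dataclass
class MyDocument:
    # Hypothetical stand-in for a Beanie Document (real ones are Pydantic models).
    id: str
    name: str
    price: float


def projection_model(model: type, wanted: list[str]) -> type:
    """Build a model containing only `wanted` fields, copying their types (hypothetical)."""
    types = {f.name: f.type for f in fields(model)}
    unknown = [name for name in wanted if name not in types]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    # The projection keeps the requested order and the original annotations.
    return make_dataclass(
        f"{model.__name__}Projection", [(name, types[name]) for name in wanted]
    )


Proj = projection_model(MyDocument, ["id", "name"])
print([f.name for f in fields(Proj)])  # ['id', 'name']
```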

Relations / links¶

Beanie can prefetch links with fetch_links=True (upstream: Relations).

cols = await afetch_beanie(
    MyDocument,
    fetch_links=True,
    nesting_depth=2,
)

By default, pydantable flattens nested objects to dot-path columns (e.g. door.height). You can turn flattening off with flatten=False (you’ll then get nested values inside the column dict).
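Dot-path flattening can be sketched with a small recursive helper. flatten_doc is hypothetical; it assumes only nested dicts are flattened and lists are kept as leaf values, which may differ from pydantable’s exact rules.

```python
from typing import Any


def flatten_doc(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dot-path keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dot path.
            flat.update(flatten_doc(value, path))
        else:
            # Scalars and lists are treated as leaves.
            flat[path] = value
    return flat


print(flatten_doc({"id": "w1", "door": {"height": 2.1, "lock": {"kind": "key"}}}))
```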

`_id` vs `id`¶

Beanie maps Mongo’s _id to Document.id. PydanTable lets you choose the name of the output id column with id_column:

id_column="id" (default): outputs an id column
id_column="_id": outputs an _id column
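The id_column choice amounts to naming one column in the output. A minimal sketch, assuming the raw columns arrive keyed by Mongo’s _id; rename_id_column is a hypothetical helper, not pydantable API.

```python
from typing import Any


def rename_id_column(
    cols: dict[str, list[Any]], id_column: str = "id"
) -> dict[str, list[Any]]:
    """Return a copy of `cols` with the `_id` column renamed to `id_column` (hypothetical)."""
    if id_column not in ("id", "_id"):
        raise ValueError("id_column must be 'id' or '_id'")
    # Rebuild the dict so the id column keeps its original position.
    return {
        (id_column if name == "_id" else name): values
        for name, values in cols.items()
    }


raw = {"_id": ["a", "b"], "x": [1, 2]}
print(rename_id_column(raw))
```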

`aiter_beanie` (stream batches)

Use this for bounded-memory ingestion:

from pydantable import aiter_beanie

async for batch in aiter_beanie(MyDocument, batch_size=10_000):
    # batch is dict[str, list]
    ...
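Batching is what keeps memory bounded: at most batch_size documents are buffered before a column dict is handed to the caller. The sketch below shows that pattern over any async iterator of row dicts, using only asyncio. abatch_columns, to_columns and fake_cursor are hypothetical; in real use a Beanie query cursor would take the place of fake_cursor.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


def to_columns(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    # Column names in first-seen order; absent keys become None.
    names = list(dict.fromkeys(key for row in batch for key in row))
    return {name: [row.get(name) for row in batch] for name in names}


async def abatch_columns(
    rows: AsyncIterator[dict[str, Any]], batch_size: int
) -> AsyncIterator[dict[str, list[Any]]]:
    """Group an async stream of row dicts into column-dict batches (hypothetical)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: list[dict[str, Any]] = []
    async for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            yield to_columns(buffer)
            buffer = []  # start a fresh buffer so memory stays bounded
    if buffer:
        # Flush the final, possibly shorter, batch.
        yield to_columns(buffer)


async def fake_cursor(n: int) -> AsyncIterator[dict[str, Any]]:
    # Stand-in for a database cursor that yields n tiny documents.
    for i in range(n):
        yield {"x": i}


async def main() -> list[dict[str, list[Any]]]:
    return [batch async for batch in abatch_columns(fake_cursor(5), batch_size=2)]


print(asyncio.run(main()))
```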

`awrite_beanie` (ODM-aware inserts)

Use this when you want Beanie features to run during inserts:

Settings.validate_on_save = True (upstream: On save validation)
event-based actions (@before_event, @after_event) (upstream: Event-based actions)

from pydantable import BeanieWriteOptions, awrite_beanie

opts = BeanieWriteOptions(skip_actions=None, link_rule=None)
inserted = await awrite_beanie(MyDocument, {"x": [1, 2], "y": ["a", "b"]}, options=opts)

Important

write_mongo / awrite_mongo are driver-level insert_many helpers on a PyMongo collection (sync or async). They do not run Beanie validation-on-save or event hooks. Use awrite_beanie when you need ODM semantics.
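Before any insert, a column dict has to be turned back into one document per row. A minimal stdlib sketch of that step, assuming all columns must have equal length; columns_to_rows is a hypothetical helper, not pydantable API. With awrite_beanie the rows are then handled at the ODM level (so validation-on-save and event hooks can run), while write_mongo / awrite_mongo hand plain dicts to insert_many.

```python
from typing import Any


def columns_to_rows(cols: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn a column dict into row dicts (hypothetical helper)."""
    lengths = {name: len(values) for name, values in cols.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    n_rows = next(iter(lengths.values()), 0)
    # Index each column by row position; key order follows column order.
    return [{name: values[i] for name, values in cols.items()} for i in range(n_rows)]


print(columns_to_rows({"x": [1, 2], "y": ["a", "b"]}))
```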

Lazy execution over Mongo without a sync client

If you want the typed lazy DataFrame API over a Beanie Document without creating a sync PyMongo client, use the async-root constructors:

MongoDataFrame[Schema].from_beanie_async(...)
MongoDataFrameModel.from_beanie_async(...)

from pydantable import MongoDataFrame, Schema

class Row(Schema):
    id: str
    x: int

df = MongoDataFrame[Row].from_beanie_async(MyDocument, criteria=MyDocument.x > 0)

# Or pass a pre-built Beanie query; chained .sort() / .project() follow the same rules as afetch_beanie:
# df = MongoDataFrame[Row].from_beanie_async(MyDocument.find(MyDocument.x > 0).sort("-name"))

# IMPORTANT: async-only materialization
rows = await df.acollect()
cols = await df.ato_dict()

Warning

from_beanie_async(...) is async-only. Calling sync terminals like collect(), to_dict(), or sync lazy sinks will raise. Use await acollect() / await ato_dict() instead.

Migrations and schema evolution

Beanie has first-class migration tooling (upstream: Migrations).

PydanTable does not run migrations for you, but the recommended workflow is:

keep Beanie Document classes as the schema source of truth
run Beanie migrations when document shape changes
keep your pydantable Schema / DataFrameModel types aligned with the post-migration shape
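One lightweight way to keep a schema aligned with a migrated document is a test that compares their type annotations. The sketch below uses stdlib typing.get_type_hints on plain classes; DocumentV2, RowSchema and schema_drift are hypothetical, and with real Beanie and pydantable classes you would compare their declared fields in the same spirit.

```python
from typing import Any, get_type_hints


class DocumentV2:
    # Hypothetical post-migration document shape.
    id: str
    name: str
    price: float


class RowSchema:
    # Hypothetical table schema that has not caught up with the migration.
    id: str
    name: str
    price: int
    legacy_code: str


def schema_drift(source: type, target: type) -> dict[str, Any]:
    """Report fields missing from, extra in, or retyped in `target` vs `source`."""
    src, dst = get_type_hints(source), get_type_hints(target)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "type_mismatch": sorted(
            name for name in src.keys() & dst.keys() if src[name] != dst[name]
        ),
    }


print(schema_drift(DocumentV2, RowSchema))
```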

For engine-backed lazy execution details, see MONGO_ENGINE.