Beanie ODM integration (MongoDB)

This page documents the Beanie-first MongoDB integration in PydanTable.

Beanie is an asynchronous, Pydantic-based ODM for MongoDB. See the upstream docs: Beanie documentation.

What pydantable supports (feature map)

PydanTable splits MongoDB support into three layers. Which one you should use depends on whether you want driver-level I/O, ODM-aware behavior (hooks), or lazy DataFrame transforms.

Goal | Recommended API
Eager column-dict I/O with a sync PyMongo Collection | fetch_mongo / iter_mongo / write_mongo
Eager column-dict I/O with pymongo.asynchronous.AsyncCollection | afetch_mongo / aiter_mongo / awrite_mongo (native async; see PyMongo surface area in MONGO_ENGINE)
Eager column-dict I/O using Beanie async ODM features (full query DSL / hooks) | afetch_beanie / aiter_beanie / awrite_beanie
Lazy DataFrame API over a Mongo collection (typed transforms, then materialize) | MongoDataFrame / MongoDataFrameModel (from_beanie, from_beanie_async, …)

For which PyMongo operations pydantable wraps on sync vs async collections (and what remains “use the driver directly”), see the PyMongo surface area subsection in MONGO_ENGINE.

This page focuses on the Beanie-first pieces. For the engine details and the MongoRoot plan story, see MONGO_ENGINE.

Install

pip install "pydantable[mongo]"

This installs:

  • Beanie (ODM)
  • pymongo (driver)
  • Mongo plan stack (lazy Mongo roots + columnar materialization used by MongoPydantableEngine)

Beanie settings that matter

Beanie config lives on the Document.Settings inner class (upstream: Defining a document).

Relevant settings for pydantable usage:

  • Collection name: Settings.name
  • Indexes: Settings.indexes (including Indexed(...) fields)
  • Encoders: Settings.bson_encoders (how Python values are represented in BSON)
  • Keep nulls: Settings.keep_nulls
  • Validation-on-save: Settings.validate_on_save
  • Link nesting depth limits: Settings.max_nesting_depth*

PydanTable treats Beanie as the source of truth for:

  • what collection to read/write (via Document.get_collection_name())
  • whether you want ODM-level hooks/validation (when you choose ODM-aware APIs)

Eager async Beanie I/O (full ODM leverage)

These functions return dict[str, list] like other eager I/O helpers.
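
To make the dict[str, list] shape concrete, here is a minimal pure-Python sketch of pivoting row documents into a column dict. The rows_to_columns helper is hypothetical and is not part of pydantable; padding missing keys with None is an assumption made for illustration, not a statement about how pydantable handles ragged documents.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list]:
    """Pivot row dicts into a column dict, padding missing keys with None."""
    columns: dict[str, list] = {}
    for index, row in enumerate(rows):
        for key, value in row.items():
            if key not in columns:
                # Back-fill earlier rows that did not carry this key.
                columns[key] = [None] * index
            columns[key].append(value)
        # Pad columns that this row did not mention.
        for values in columns.values():
            if len(values) == index:
                values.append(None)
    return columns


cols = rows_to_columns([
    {"id": "a1", "x": 1},
    {"id": "a2", "x": 2, "note": "late field"},
])
```

Every list in the result has one entry per input row, which is the invariant the eager helpers' return type implies.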

afetch_beanie (load all results)

Use this when you want a single in-memory column dict:

from pydantable import afetch_beanie

# Columnar equivalent of: docs = await MyDocument.find(...).to_list()
cols = await afetch_beanie(MyDocument)

You can also pass a Beanie query object directly, so you can use Beanie’s full query DSL (operators, chained .find(), .sort(), .project(), etc.) upstream:

query = MyDocument.find(MyDocument.some_field == 1, fetch_links=True)
cols = await afetch_beanie(query)

Projections

Beanie supports projections via .project(ProjectionModel) (upstream: Finding documents → Projections).

PydanTable supports two projection styles:

  • projection_model=MyProjectionModel: forwards to Beanie .project(...)
  • fields=[...]: convenience helper that builds a temporary projection model

cols = await afetch_beanie(MyDocument, fields=["id", "name"])

Beanie can prefetch links with fetch_links=True (upstream: Relations).

cols = await afetch_beanie(
    MyDocument,
    fetch_links=True,
    nesting_depth=2,
)

By default, pydantable flattens nested objects to dot-path columns (e.g. door.height). You can turn flattening off with flatten=False (you’ll then get nested values inside the column dict).
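
The dot-path naming can be illustrated with a small recursive helper. This is only a sketch of the idea: the flatten function below is hypothetical, it handles nested dicts only, and how pydantable itself treats lists or empty sub-objects is not covered here.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single level keyed by dot-paths."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dot-path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


house = {"name": "cottage", "door": {"height": 2.0, "width": 0.9}}
flat_row = flatten(house)
```

Here flat_row has the keys name, door.height, and door.width, matching the door.height example above. With flatten=False the door value would stay a nested object instead.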

_id vs id

Beanie maps Mongo _id to Document.id. PydanTable normalizes this with:

  • id_column="id" (default): outputs an id column
  • id_column="_id": outputs an _id column
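
As a rough illustration of what the id_column choice means for the output keys, the hypothetical helper below renames the raw Mongo _id key on a single document. It is a sketch of the naming rule only and does not reflect pydantable's internals.

```python
from typing import Any


def rename_id(doc: dict[str, Any], id_column: str = "id") -> dict[str, Any]:
    """Return a copy of a raw Mongo document with _id exposed as id_column."""
    rest = {key: value for key, value in doc.items() if key != "_id"}
    if "_id" not in doc:
        return rest
    # Put the identifier first, under the requested column name.
    return {id_column: doc["_id"], **rest}


raw = {"_id": "665f1c", "x": 1}
as_id = rename_id(raw)                    # default: "id"
as_underscore = rename_id(raw, "_id")     # keep Mongo's own name
```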

aiter_beanie (stream batches)

Use this for bounded-memory ingestion:

from pydantable import aiter_beanie

async for batch in aiter_beanie(MyDocument, batch_size=10_000):
    # batch is dict[str, list]
    ...
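
The bounded-memory idea behind batch streaming can be sketched with plain asyncio. Everything here is a stand-in: fake_cursor and iter_column_batches are hypothetical names, the documents are assumed to share the same keys, and no MongoDB connection is involved.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_cursor(n: int) -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in for an async document cursor."""
    for i in range(n):
        yield {"id": str(i), "x": i}


async def iter_column_batches(
    cursor: AsyncIterator[dict[str, Any]], batch_size: int
) -> AsyncIterator[dict[str, list]]:
    """Group documents into column-dict batches of at most batch_size rows."""
    batch: dict[str, list] = {}
    count = 0
    async for doc in cursor:
        for key, value in doc.items():
            batch.setdefault(key, []).append(value)
        count += 1
        if count == batch_size:
            yield batch
            batch, count = {}, 0
    if count:
        # Emit the final, possibly smaller, batch.
        yield batch


async def main() -> list[int]:
    sizes = []
    async for batch in iter_column_batches(fake_cursor(5), batch_size=2):
        sizes.append(len(batch["id"]))
    return sizes


batch_sizes = asyncio.run(main())
```

Only one batch is held in memory at a time, so peak memory scales with batch_size rather than with the size of the collection.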

awrite_beanie (ODM-aware inserts)

Use this when you want Beanie features to run during inserts:

from pydantable import BeanieWriteOptions, awrite_beanie

opts = BeanieWriteOptions(skip_actions=None, link_rule=None)
inserted = await awrite_beanie(MyDocument, {"x": [1, 2], "y": ["a", "b"]}, options=opts)
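
The column dict passed above reads as one document per row position. A minimal sketch of that pivot, using a hypothetical columns_to_rows helper that is not part of pydantable, and assuming that columns of unequal length should be rejected:

```python
def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Pivot a column dict into one dict per row, rejecting ragged input."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    keys = list(columns)
    # zip(*...) walks the columns in lockstep, one tuple per row.
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


rows = columns_to_rows({"x": [1, 2], "y": ["a", "b"]})
```

Each resulting dict corresponds to the field values of one document to be inserted.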

Important

write_mongo / awrite_mongo are driver-level insert_many helpers on a PyMongo collection (sync or async). They do not run Beanie validation-on-save or event hooks. Use awrite_beanie when you need ODM semantics.

Lazy execution over Mongo without a sync client

If you want the typed lazy DataFrame API over a Beanie Document without creating a sync PyMongo client, use the async-root constructors:

  • MongoDataFrame[Schema].from_beanie_async(...)
  • MongoDataFrameModel.from_beanie_async(...)

from pydantable import MongoDataFrame, Schema

class Row(Schema):
    id: str
    x: int

df = MongoDataFrame[Row].from_beanie_async(MyDocument, criteria=MyDocument.x > 0)

# Or pass a pre-built Beanie query (chains like .sort() / .project(); same rules as afetch_beanie):
# df = MongoDataFrame[Row].from_beanie_async(MyDocument.find(MyDocument.x > 0).sort("-name"))

# IMPORTANT: async-only materialization
rows = await df.acollect()
cols = await df.ato_dict()

Warning

from_beanie_async(...) is async-only. Calling sync terminals like collect(), to_dict(), or sync lazy sinks will raise. Use await acollect() / await ato_dict() instead.
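
When the entry point is synchronous (a script or CLI), one common pattern is to wrap the async terminal in a coroutine and drive it with asyncio.run. The sketch below uses a hypothetical StubAsyncFrame so it runs without MongoDB; the RuntimeError raised by its collect() is an assumption, since the actual exception type is not stated on this page.

```python
import asyncio


class StubAsyncFrame:
    """Hypothetical stand-in for a frame built with from_beanie_async(...)."""

    def collect(self):
        # Assumption: the real exception type may differ.
        raise RuntimeError("async-only frame: use 'await acollect()'")

    async def acollect(self):
        await asyncio.sleep(0)  # placeholder for real async I/O
        return [{"id": "a1", "x": 1}]


async def load_rows(frame: StubAsyncFrame) -> list[dict]:
    # Always go through the async terminal on an async-rooted frame.
    return await frame.acollect()


rows = asyncio.run(load_rows(StubAsyncFrame()))
```

Inside code that is already async (for example a web handler), await the terminal directly instead of calling asyncio.run.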

Migrations and schema evolution

Beanie has first-class migration tooling (upstream: Migrations).

PydanTable does not run migrations for you, but the recommended workflow is:

  • keep Beanie Document classes as the schema source of truth
  • run Beanie migrations when document shape changes
  • keep your pydantable Schema / DataFrameModel types aligned with the post-migration shape
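
One lightweight way to notice drift after a migration is to compare the fields a schema declares with the columns actually returned. The sketch below is an assumption-laden illustration: Row is a plain annotated class standing in for a pydantable Schema so the example runs without pydantable installed, and schema_drift is a hypothetical helper.

```python
from typing import get_type_hints


class Row:
    """Plain stand-in for a pydantable Schema; only the annotations matter."""

    id: str
    x: int


def schema_drift(schema: type, columns: dict[str, list]) -> tuple[set[str], set[str]]:
    """Return (fields missing from the data, columns unknown to the schema)."""
    expected = set(get_type_hints(schema))
    actual = set(columns)
    return expected - actual, actual - expected


missing, extra = schema_drift(
    Row, {"id": ["a1"], "x": [1], "legacy_flag": [True]}
)
```

A non-empty missing or extra set after a migration suggests that the Schema and the document shape no longer agree.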

For engine-backed lazy execution details, see MONGO_ENGINE.