-
Notifications
You must be signed in to change notification settings - Fork 8
feat(repository): relationships loading #304
Description
Hi @peterschutt!
Rationale
The current implementation of SQLAlchemyRepository lacks fine control over which relationships are loaded and how they are. A quick workaround is to extend the select statement by directly accessing it or to provide one on repository/service instantiation.
Consider the following models/repositories:
class Address(Base):
address_line: Mapped[str]
city: Mapped[str]
class Company(Base):
name: Mapped[str]
headquarters: Mapped[Address]
class Employee(Base):
name: Mapped[str]
salary: Mapped[float]
address_id: Mapped[UUID] = mapped_colmun(ForeignKey("address.id"))
comapny_id: Mapped[UUID] = mapped_colmun(ForeignKey("comapny.id"))
company: Mapped[Company] = relationship(Company)
address: Mapped[Address] = relationship(Address)
class EmployeeReadDTO(dto.FromMapped[Annotated[Employee, "read"]]):
pass
class EmployeeRepo(SQLAlchemyRepository[Employee]):
model_type = EmployeeIf we try to blindly serialize a freshly retrieved instance, sqlalchemy will fail to access relationships that have not been eagerly loaded:
repo = EmployeeRepo(session=session)
instance = await repo.get("some_id")
# Raise an sqlalchemy.exc.MissingGreenlet error
dct = EmployeeReadDTO.from_orm(instance)A quick fix, as mentioned before, is to extend the select statement to explicitly load relationships:
repo = EmployeeRepo(
session=session,
select_=select(Employee).options(
joinedload(Employee.address),
joinedload(Employee.company).options(Company.headquarters)
)
)
instance = await repo.get("some_id")
# All relationships have been loaded so the instance can safely be serialized
dct = EmployeeReadDTO.from_orm(instance)If we often access some relation, eager loading can also be specified directly in the relationship definition:
# address is always loaded implicitly unless explicitly told otherwise.
address: Mapped[Address] = relationship(Address, lazy="joined")But as you add more, deeper relationships, always having to modify the select statement or relationships can quickly become cumbersome. On top of that, sqlalchemy does not tell you which relationship can't be loaded, making debugging harder for deeply nested paths. Taking the previous example of failing access to, say, the address field, the error you see should looks like this:
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_() here.
Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/14/xd2s)Also, changing relationships on your models will force you to check if all queries are loading needed ones depending on what you do with the resulting sqlalchemy instances.
dto easily allow serializing sqlalchemy instances, to pass them to worker jobs for example. But as dto are themselves generated from sqlalchemy models, it can be difficult to know exactly what relationships are needed to be loaded for models to serialize properly.
Proposal
After some experimentations with sqlalchemy relationship loading techniques, I propose a way to expose them through the SQLAlchemyRepository implementation by adding the following methods to it:
def load(
self,
config: SQLAlchemyLoadConfig | None = None,
/,
**kwargs: bool | EllipsisType,
) -> SQLAlchemyRepository:
"""Add relationships to load
Args:
config: Load config. Defaults to None.
kwargs: relationship paths to (not) load
Returns:
The repository instance
"""
...
def load_from_dto(
self, dto: Any, config: SQLAlchemyLoadConfig | None = None
) -> SQLAlchemyRepository:
"""Load necessary relationships so the orm object
can be safely serialized by the dto.
Args:
config: Load config. Defaults to None.
dto: The DTO class to extract relationships from
Returns:
The repository instance
"""
...Although the examples below focus on sqlalchemy usage, keep in mind that this load API is actually agnostic from the repository implementation.
load
The kwargs parameter from the load method accepts a Django like keyword arguments representing relationship paths that should be loaded:
# Set Employee.address and Company.address to be eagerly loaded
await EmployeeRepo(session=session).load(address=True, company__headquarters=True).get("1")
# It's the same as doing this:
await EmployeeRepo(
session=session,
select_=select(Employee).options(
joinedload(Employee.address),
joinedload(Employee.company).options(Company.headquarters)
)
).get("1")keyword arguments represent relationship paths and indicate if the relation should be loaded or not. They are three possible values:
True: Relationship must be loaded (and do so all relationships before)False: Relationship must not be loaded (will useraiseloadin sqlalchemy)...: This relationship, and any under that one must loaded (wildcard loading)
await EmployeeRepo(session=session).load(address=False, company=...).get("1")
# Equivalent to the follwoing statement:
stmt = select(Employee).where(id="1").options(
raiseload(Employee.address),
joinedload(Employee.company).options(joinedload("*"))
)The loader type is determined depending by looking at the relationship: joinedload is used by default unless it's a list relationship, in this case selectinload is used.
load_from_dto
This pulls loading paths from the specified dto class, so the sqlalchemy instance can be serialized by using this same dto:
instance = await EmployeeRepo(session=session).load_from_dto(EmployeeReadDTO).get("1")
dct = EmployeeReadDTO.from_orm(instance)This method is particularly useful when you have to serialize an instance with a dto to pass it around (to a worker job for example). No matter how the model or the dto change, you are always guaranteed to load the minimal relationships needed for serialization to work flawlessly.
Load config
A config object can be specified on both methods to have control over loading strategies. It's possible to set a default loading strategy, that will load everything unless told otherwise for a specific relationship:
load_all_strat = SQLAlchemyLoadConfig(default_strategy=SQLALoadStrategy.SELECT_IN)
# Load all relationship, including nested ones
await EmployeeRepo(session=session).load(load_all_strat).get("1")
# Load all relationship, excluding Company_headquarters
await EmployeeRepo(session=session).load(load_all_strat, company_headquarters=False).get("1")Thoughts
-
Do we want to allow filtering no relations? Can we reuse the same filter classes that we use for
.list()? -
Does explicitly specifying the loading technique per relationship would be desirable? Something like this:
await EmployeeRepo(session=session).load(address=SQLALoadStrategy.SELECT_IN).get("1")
Any thoughts on this?