feat(repository): relationships loading

Hi @peterschutt!

## Rationale

The current implementation of `SQLAlchemyRepository` lacks fine control over which relationships are loaded and how they are. A quick workaround is to extend the select statement by directly accessing it or to provide one on repository/service instantiation.

Consider the following models/repositories:

```python
class Address(Base):
    address_line: Mapped[str]
    city: Mapped[str]

class Company(Base):
    name: Mapped[str]
    headquarters: Mapped[Address]

class Employee(Base):
    name: Mapped[str]
    salary: Mapped[float]
    address_id: Mapped[UUID] = mapped_colmun(ForeignKey("address.id"))
    comapny_id: Mapped[UUID] = mapped_colmun(ForeignKey("comapny.id"))
    company: Mapped[Company] = relationship(Company)
    address: Mapped[Address] = relationship(Address)

class EmployeeReadDTO(dto.FromMapped[Annotated[Employee, "read"]]):
    pass

class EmployeeRepo(SQLAlchemyRepository[Employee]):
    model_type = Employee
```

If we try to blindly serialize a freshly retrieved instance, sqlalchemy will fail to access relationships that have not been eagerly loaded:

```python
repo = EmployeeRepo(session=session)
instance = await repo.get("some_id")
# Raise an sqlalchemy.exc.MissingGreenlet error
dct = EmployeeReadDTO.from_orm(instance)
```

A quick fix, as mentioned before, is to extend the select statement to explicitly load relationships:

```python
repo = EmployeeRepo(
    session=session,
    select_=select(Employee).options(
        joinedload(Employee.address),
        joinedload(Employee.company).options(Company.headquarters)
    )
)
instance = await repo.get("some_id")
# All relationships have been loaded so the instance can safely be serialized
dct = EmployeeReadDTO.from_orm(instance)
```

If we often access some relation, eager loading can also be specified directly in the relationship definition:

```python
# address is always loaded implicitly unless explicitly told otherwise.
address: Mapped[Address] = relationship(Address, lazy="joined")
```

But as you add more, deeper relationships, always having to modify the select statement or relationships can quickly become cumbersome. On top of that, sqlalchemy does not tell you *which relationship* can't be loaded, making debugging harder for deeply nested paths. Taking the previous example of failing access to, say, the `address` field, the error you see should looks like this:

```console
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_() here.
Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/14/xd2s)
```

Also, changing relationships on your models will force you to check if all queries are loading needed ones depending on what you do with the resulting sqlalchemy instances.

dto easily allow serializing sqlalchemy instances, to pass them to worker jobs for example. But as dto are themselves generated from sqlalchemy models, it can be difficult to know exactly what relationships are needed to be loaded for models to serialize properly.

## Proposal

After some experimentations with [sqlalchemy relationship loading techniques](https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#), I propose a way to expose them through the `SQLAlchemyRepository` implementation by adding the following methods to it:

```python
def load(
    self,
    config: SQLAlchemyLoadConfig | None = None,
    /,
    **kwargs: bool | EllipsisType,
) -> SQLAlchemyRepository:
    """Add relationships to load

    Args:
        config: Load config. Defaults to None.
        kwargs: relationship paths to (not) load

    Returns:
        The repository instance
    """
    ...

def load_from_dto(
    self, dto: Any, config: SQLAlchemyLoadConfig | None = None
) -> SQLAlchemyRepository:
    """Load necessary relationships so the orm object
    can be safely serialized by the dto.

    Args:
    	config: Load config. Defaults to None.
        dto: The DTO class to extract relationships from

    Returns:
        The repository instance
    """
    ...
```

Although the examples below focus on sqlalchemy usage, keep in mind that this load API is actually agnostic from the repository implementation.

### `load`

The `kwargs` parameter from the `load` method accepts a Django like keyword arguments representing relationship paths that should be loaded:

```python
# Set Employee.address and Company.address to be eagerly loaded
await EmployeeRepo(session=session).load(address=True, company__headquarters=True).get("1")
# It's the same as doing this:
await EmployeeRepo(
    session=session,
    select_=select(Employee).options(
        joinedload(Employee.address),
        joinedload(Employee.company).options(Company.headquarters)
    )
).get("1")
```

keyword arguments represent relationship paths and indicate if the relation should be loaded or not. They are three possible values:

- `True`: Relationship must be loaded (and do so all relationships before)
- `False`: Relationship must *not* be loaded (will use `raiseload` in sqlalchemy)
- `...` : This relationship, and any under that one must loaded (wildcard loading)

```python
await EmployeeRepo(session=session).load(address=False, company=...).get("1")
# Equivalent to the follwoing statement:
stmt = select(Employee).where(id="1").options(
    raiseload(Employee.address),
    joinedload(Employee.company).options(joinedload("*"))
)
```

The loader type is determined depending by looking at the relationship: `joinedload` is used by default unless it's a list relationship, in this case `selectinload` is used.

### `load_from_dto`

This pulls loading paths from the specified dto class, so the sqlalchemy instance can be serialized by using this same dto:

```python
instance = await EmployeeRepo(session=session).load_from_dto(EmployeeReadDTO).get("1")
dct = EmployeeReadDTO.from_orm(instance)
```

This method is particularly useful when you have to serialize an instance with a dto to pass it around (to a worker job for example). No matter how the model or the dto change, you are always guaranteed to load the minimal relationships needed for serialization to work flawlessly.

### Load config

A `config` object can be specified on both methods to have control over loading strategies. It's possible to set a default loading strategy, that will load everything unless told otherwise for a specific relationship:

```python
load_all_strat = SQLAlchemyLoadConfig(default_strategy=SQLALoadStrategy.SELECT_IN)
# Load all relationship, including nested ones
await EmployeeRepo(session=session).load(load_all_strat).get("1")
# Load all relationship, excluding Company_headquarters
await EmployeeRepo(session=session).load(load_all_strat, company_headquarters=False).get("1")
```

## Thoughts

- Do we want to allow filtering no relations? Can we reuse the same filter classes that we use for `.list()`?

- Does explicitly specifying the loading technique per relationship would be desirable? Something like this:

  ```python
  await EmployeeRepo(session=session).load(address=SQLALoadStrategy.SELECT_IN).get("1")
  ```

Any thoughts on this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(repository): relationships loading #304

Rationale

Proposal

`load`

`load_from_dto`

Load config

Thoughts

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat(repository): relationships loading #304

Description

Rationale

Proposal

load

load_from_dto

Load config

Thoughts

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`load`

`load_from_dto`