
[FEAT] Recursive DLPack container conversion for auto torch.Tensor return#517

Merged
junrushao merged 3 commits into apache:main from Kathryn-cat:dlpack-container-convert
Apr 2, 2026

Conversation

@Kathryn-cat (Contributor) commented Mar 30, 2026

Problem

When a packed FFI function receives torch.Tensor inputs, the return value is automatically converted back to torch.Tensor via DLPack — but only for bare ffi.Tensor returns. When the return is a container (Array, List, Map, Dict) containing tensors, the tensors inside remain as ffi.Tensor, requiring manual per-element conversion.

Solution

Each element of the container is converted lazily from ffi.Tensor to torch.Tensor when it is retrieved. Lazy conversion preserves the semantics of the containers and reduces runtime overhead compared with eager conversion.
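As a rough sketch of the lazy scheme in plain Python (the real implementation is in Cython; `FFITensor`, `to_torch`, and `LazySeq` are illustrative stand-ins, not names from the PR), elements are converted on first access and cached:

```python
from collections.abc import Sequence


class FFITensor:
    """Stand-in for ffi.Tensor; just wraps raw data for the example."""

    def __init__(self, data):
        self.data = data


def to_torch(t):
    """Stand-in for the DLPack exchange (would call torch.from_dlpack)."""
    return ("torch.Tensor", t.data)


class LazySeq(Sequence):
    """Sequence proxy that converts FFITensor elements lazily, with a cache."""

    def __init__(self, backing):
        self._backing = backing
        self._cache = {}

    def __len__(self):
        return len(self._backing)

    def __getitem__(self, i):
        if i not in self._cache:
            item = self._backing[i]
            self._cache[i] = to_torch(item) if isinstance(item, FFITensor) else item
        return self._cache[i]
```

Only elements that are actually indexed pay the conversion cost, and repeated access returns the cached result.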

Stream Propagation

| Container | Element | Stream set? | How |
| --- | --- | --- | --- |
| list/tuple | torch.Tensor | Yes | ConstructorCall → each element hits DLPackExchangeAPI_ setter |
| list/tuple | ffi.Tensor | No | ConstructorCall → each element hits Tensor_ setter (no stream) |
| dict | torch.Tensor (values) | Yes | Same as above via ConstructorCall |
| dict | ffi.Tensor (values) | No | Same as above |
| ffi.Array/ffi.List (tagged) | ffi.Tensor | Yes | ContainerObject_ setter → _scan_seq_for_stream |
| ffi.Map/ffi.Dict (tagged) | ffi.Tensor | Yes | ContainerObject_ setter → _scan_map_for_stream |
| ffi.Array/ffi.List (untagged) | ffi.Tensor | No | Object_ setter (pass-through) |
| ffi.Map/ffi.Dict (untagged) | ffi.Tensor | No | Object_ setter (pass-through) |
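For the tagged-container rows, a `_scan_seq_for_stream`-style helper walks the container looking for a device tensor before the call. A minimal sketch of that idea (the two callbacks are hypothetical stand-ins for the real device/stream plumbing):

```python
def scan_seq_for_stream(seq, get_device, set_stream_for_device):
    """Walk seq; on the first non-CPU tensor, set the stream and stop."""
    for item in seq:
        dev = get_device(item)
        if dev is not None and dev != "cpu":
            set_stream_for_device(dev)
            return True  # early exit: the first device tensor decides the stream
    return False
```

The early return means the scan is O(n) only in the worst case (no device tensor present); mixed containers stop at the first hit.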

@gemini-code-assist (bot) left a comment:


Code Review

This pull request implements recursive conversion of FFI containers—specifically Array, List, Map, and Dict—into native Python lists and dictionaries when the DLPack exchange API is active. This allows for the seamless return of nested structures containing tensors (e.g., PyTorch tensors) through the FFI. The changes include new Cython conversion logic, updated return value handling, and comprehensive tests for various nested and mixed-type scenarios. Feedback was provided to include an early return for empty sequences in the list conversion function to improve consistency and performance.

@junrushao (Member) commented:

The PR looks good functionally, but I'm slightly worried about the semantics and the runtime overhead of eager recursive conversion. Is there a way to opt out of this behavior?

@Kathryn-cat (Contributor, Author) commented:

> The PR looks good functionally, but I'm slightly worried about the semantics and the runtime overhead of eager recursive conversion. Is there a way to opt out of this behavior?

@junrushao I'd like to make sure I address the right concern. Are you more worried about:

  • a) semantics: In the current implementation, the return type silently changes from ffi.Array to list when an ffi tensor is among the inputs.

  • b) runtime overhead: The eager version converts elements that may never be accessed. Would you prefer we change it to a lazy conversion, or add a flag / API that lets users disable the eager conversion manually?

@junrushao (Member) commented:

I do worry about both a) and b) separately, but b) over a).

> a) semantics: In the current implementation, the return type silently changes from ffi.Array to list when an ffi tensor is among the inputs.

I do worry about it in a usecase where we may want to return a mutable ffi.List for the user to update the list inplace. If we do eager conversion, it means the update will effectively become a no-op which changes the semantics.
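The concern can be shown with plain Python lists standing in for a mutable `ffi.List` (illustrative only, not the FFI code):

```python
# An eagerly converted copy detaches the caller from the original
# container, so in-place updates no longer reach it.
backing = [1, 2, 3]          # stands in for a mutable ffi.List

eager_view = list(backing)   # eager conversion hands back a fresh list
eager_view[0] = 99           # the write lands in the copy only
assert backing[0] == 1       # ...so it is a no-op for the backing store

lazy_view = backing          # a lazy proxy keeps referencing the backing
lazy_view[0] = 99
assert backing[0] == 99      # the write is visible: semantics preserved
```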

> b) runtime overhead: The eager version converts elements that may never be accessed. Would you prefer we change it to a lazy conversion, or add a flag / API that lets users disable the eager conversion manually?

I don't know how we could make it lazy, but a flag could definitely work. I'd love to hear from you and @tqchen what makes the most sense.

@tqchen (Member) commented Mar 30, 2026

Agreed that the List mutation case is something we want to think about; it wasn't on the radar before.

@Kathryn-cat force-pushed the dlpack-container-convert branch from 0e91025 to 90f066d on April 2, 2026 01:52
@Kathryn-cat force-pushed the dlpack-container-convert branch from 90f066d to 9a0f26e on April 2, 2026 01:52
@Kathryn-cat (Contributor, Author) commented:

/gemini review

@gemini-code-assist (bot) left a comment:


Code Review

This pull request implements lazy DLPack conversion for FFI containers (Array, List, Map, and Dict) by introducing a CContainerBase class. This change enables the propagation of the DLPack exchange API and includes logic to scan containers for non-CPU tensors to automatically configure the stream context. Review feedback highlights that the current scanning mechanism is shallow and does not account for nested containers, which could lead to synchronization issues. Additionally, there are concerns regarding the performance overhead of scanning large containers during FFI calls.

Review context:

```cython
getitem_args[0].type_index = type_index
getitem_args[0].v_obj = <TVMFFIObject*>chandle

for i in range(n):
```
@gemini-code-assist (bot) commented (severity: medium):

Scanning a large container (e.g., an Array with thousands of elements) for a tensor can introduce significant overhead during FFI calls. Since this scan happens on every FFI call where a container is passed as an argument and the stream context isn't already set, consider adding a limit to the number of elements scanned or optimizing this check if possible.
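One way to implement the suggested cap, as a hedged sketch (`SCAN_LIMIT` and the predicate are illustrative, not values from the PR):

```python
import itertools

SCAN_LIMIT = 64  # illustrative cap, not a value from the PR


def find_device_tensor(seq, is_device_tensor, limit=SCAN_LIMIT):
    """Inspect at most `limit` leading elements, bounding per-call cost."""
    for item in itertools.islice(seq, limit):
        if is_device_tensor(item):
            return item
    return None
```

The trade-off is completeness: a device tensor past the cap would be missed, which is exactly the correctness concern raised in the replies below.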

A project member replied:

is this somewhat unavoidable?

@Kathryn-cat (Contributor, Author) replied:

Yes, the O(n) scan is needed for correctness here.


```diff
 @register_object("ffi.Array")
-class Array(core.Object, Sequence[T]):
+class Array(core.CContainerBase, core.Object, Sequence[T]):
```
A project member commented:

Looks like we are inheriting from both core.CContainerBase and core.Object, which both inherit from core.CObject. Should be fine :)
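A toy model of that inheritance shape (class names mirror the PR; bodies are empty stand-ins) shows Python's C3 linearization resolving the diamond with a single `CObject` entry:

```python
class CObject:
    pass


class CContainerBase(CObject):
    pass


class Object(CObject):
    pass


class Array(CContainerBase, Object):
    pass


# C3 linearization puts the shared base once, after both intermediates:
mro_names = [c.__name__ for c in Array.__mro__]
# → ['Array', 'CContainerBase', 'Object', 'CObject', 'object']
```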

@Kathryn-cat (Contributor, Author) replied:

Yep:)

@junrushao (Member) left a comment:

Overall looks good! I left two minor comments

@junrushao junrushao merged commit 3539d70 into apache:main Apr 2, 2026
9 checks passed
