[FEAT] Recursive DLPack container conversion for auto torch.Tensor return #517
Conversation
Code Review
This pull request implements recursive conversion of FFI containers—specifically Array, List, Map, and Dict—into native Python lists and dictionaries when the DLPack exchange API is active. This allows for the seamless return of nested structures containing tensors (e.g., PyTorch tensors) through the FFI. The changes include new Cython conversion logic, updated return value handling, and comprehensive tests for various nested and mixed-type scenarios. Feedback was provided to include an early return for empty sequences in the list conversion function to improve consistency and performance.
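The eager recursive conversion the review summary describes can be sketched as follows. This is an illustrative stand-in, not the PR's Cython code: `FFITensor`, `FFIArray`, and `FFIMap` are hypothetical stand-ins for the real `ffi.Tensor`, `ffi.Array`, and `ffi.Map` classes, and `convert_tensor` stands in for the DLPack import step (e.g. into `torch.Tensor`).

```python
# Illustrative sketch of eager recursive container conversion.
# FFITensor/FFIArray/FFIMap are stand-ins for the real ffi classes.

class FFITensor:
    def __init__(self, data):
        self.data = data

class FFIArray(list):
    pass

class FFIMap(dict):
    pass

def convert_ffi_value(value, convert_tensor):
    """Recursively turn FFI containers into native Python lists/dicts,
    applying convert_tensor (e.g. DLPack -> torch.Tensor) to each leaf."""
    if isinstance(value, FFITensor):
        return convert_tensor(value)
    if isinstance(value, FFIArray):
        return [convert_ffi_value(v, convert_tensor) for v in value]
    if isinstance(value, FFIMap):
        return {k: convert_ffi_value(v, convert_tensor) for k, v in value.items()}
    return value
```

Note that the eager variant walks the entire structure up front, which is exactly the overhead the discussion below turns on.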
The PR looks good functionally, but I'm slightly worried about the semantics and runtime overhead due to eager recursive conversion. Is there a way to opt out of this behavior?
@junrushao I'd like to make sure I address the right concern. Are you more worried about:
I do worry about both a) and b) separately, but b) over a).
I do worry about it in a use case where we may want to return a mutable List.
I don't know how we could make it lazy, but a flag could definitely work. I'd love to hear from you and @tqchen what makes the most sense.
Agreed, the List mutation case is something we want to think about; it wasn't on the radar before.
Force-pushed 0e91025 to 90f066d
Force-pushed 90f066d to 9a0f26e
/gemini review
Code Review
This pull request implements lazy DLPack conversion for FFI containers (Array, List, Map, and Dict) by introducing a CContainerBase class. This change enables the propagation of the DLPack exchange API and includes logic to scan containers for non-CPU tensors to automatically configure the stream context. Review feedback highlights that the current scanning mechanism is shallow and does not account for nested containers, which could lead to synchronization issues. Additionally, there are concerns regarding the performance overhead of scanning large containers during FFI calls.
```
getitem_args[0].type_index = type_index
getitem_args[0].v_obj = <TVMFFIObject*>chandle
...
for i in range(n):
```
Scanning a large container (e.g., an Array with thousands of elements) for a tensor can introduce significant overhead during FFI calls. Since this scan happens on every FFI call where a container is passed as an argument and the stream context isn't already set, consider adding a limit to the number of elements scanned or optimizing this check if possible.
Yes, the O(n) scan is needed for correctness here.
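The scan under discussion can be sketched as below. This is an illustrative model, not the actual Cython implementation: `FakeTensor` and the `device` attribute are assumed names. It also makes the reviewer's point concrete: the scan is shallow, so tensors inside nested containers would be missed.

```python
# Sketch of a shallow O(n) scan for a non-CPU tensor, used to decide
# whether the stream context must be set before the FFI call.
# FakeTensor and its `device` attribute are illustrative stand-ins.

class FakeTensor:
    def __init__(self, device):
        self.device = device  # e.g. "cpu" or "cuda:0"

def find_first_non_cpu_tensor(container):
    """Shallow scan over the top-level elements only; nested containers
    are NOT descended into, which is the limitation noted in review."""
    for item in container:
        if isinstance(item, FakeTensor) and item.device != "cpu":
            return item
    return None
```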
```diff
 @register_object("ffi.Array")
-class Array(core.Object, Sequence[T]):
+class Array(core.CContainerBase, core.Object, Sequence[T]):
```
Looks like we are inheriting from both core.CContainerBase and core.Object, which both inherit from core.CObject. Should be fine :)
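A tiny illustration of why this diamond is fine in Python: the C3 linearization gives the derived class a single consistent MRO. The class names here mimic the `core.CObject`/`core.CContainerBase`/`core.Object` relationship but are illustrative, not the real classes.

```python
# Diamond inheritance like Array(CContainerBase, Object), where both
# bases share a common ancestor; Python's C3 MRO linearizes it cleanly.
# All class names below are illustrative stand-ins.

class CObjectLike: ...
class ContainerBase(CObjectLike): ...
class ObjectBase(CObjectLike): ...

class ArrayLike(ContainerBase, ObjectBase):
    pass
```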
junrushao left a comment
Overall looks good! I left two minor comments.
Problem
When a packed FFI function receives `torch.Tensor` inputs, the return value is automatically converted back to `torch.Tensor` via DLPack, but only for bare `ffi.Tensor` returns. When the return is a container (Array, List, Map, Dict) containing tensors, the tensors inside remain as `ffi.Tensor`, requiring manual per-element conversion.
Solution
We perform a lazy conversion of each element in the container from ffi::Tensor to torch::Tensor when it is retrieved. Lazy conversion preserves container semantics and reduces runtime overhead compared with eager conversion.
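The lazy strategy can be sketched as below. This is an assumed-name model, not the PR's Cython implementation: `LazyArray`, `LeafTensor`, and the `conversions` counter are illustrative; the real container converts on `__getitem__` into the DLPack consumer's tensor type.

```python
# Sketch of lazy per-element conversion: leaves stay as FFI tensors
# until retrieved, so unread elements pay no conversion cost.
# LazyArray/LeafTensor are illustrative stand-ins.

class LeafTensor:
    def __init__(self, data):
        self.data = data

class LazyArray:
    def __init__(self, items, convert_leaf):
        self._items = items
        self._convert = convert_leaf
        self.conversions = 0  # instrumentation for this sketch only

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        item = self._items[i]
        if isinstance(item, LeafTensor):
            self.conversions += 1  # converted only on access
            return self._convert(item)
        return item
```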
Stream Propagation