-
Notifications
You must be signed in to change notification settings - Fork 753
feat: add python bridge #4946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add python bridge #4946
Conversation
|
👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThis pull request introduces a global Python-Rust bridge singleton that manages async conversions and GIL access in a dedicated thread pool. The bridge replaces direct Python GIL handling throughout the Python bindings, with corresponding refactoring in engine, library, and protocol layers to use the new async conversion and GIL acquisition APIs. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (2)
lib/bindings/python/rust/lib.rs (1)
1042-1044: Consider implementing the timeout to prevent resource leaks.The TODO is valid - if the Python side fails to consume the stream, the sender will block indefinitely, potentially leaking the spawned task. Consider using
tokio::time::timeoutas noted.Example implementation:
use tokio::time::{timeout, Duration}; if let Err(_) = timeout(Duration::from_secs(300), tx.send(annotated)).await { tracing::error!("Timeout sending response - Python consumer may be stalled"); break; }Would you like me to open an issue to track this improvement?
lib/bindings/python/rust/bridge.rs (1)
26-41: Bridge initialization design looks reasonable.The single-worker design is appropriate given Python's GIL is itself a global singleton - multiple threads would just contend on the GIL. The bounded channel (1024) provides backpressure. Consider documenting these design decisions.
Consider adding a brief doc comment explaining the design rationale:
/// Global Python bridge for GIL-managed operations. /// /// Uses a single worker thread since Python's GIL serializes execution anyway. /// Jobs are queued via a bounded channel (capacity 1024) to provide backpressure. pub fn global() -> &'static Bridge {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
lib/bindings/python/rust/bridge.rs(1 hunks)lib/bindings/python/rust/engine.rs(4 hunks)lib/bindings/python/rust/lib.rs(4 hunks)lib/runtime/src/protocols/annotated.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-14T21:25:56.930Z
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Applied to files:
lib/bindings/python/rust/engine.rs
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.
Applied to files:
lib/bindings/python/rust/lib.rs
🧬 Code graph analysis (3)
lib/bindings/python/rust/bridge.rs (1)
lib/bindings/python/rust/engine.rs (4)
mpsc(165-165)new(80-87)new(110-126)e(317-317)
lib/bindings/python/rust/engine.rs (1)
lib/bindings/python/rust/bridge.rs (1)
global(22-42)
lib/bindings/python/rust/lib.rs (2)
lib/bindings/python/rust/bridge.rs (1)
global(22-42)lib/runtime/src/protocols/annotated.rs (2)
map_data_async(121-143)is_error(145-147)
🪛 GitHub Actions: Copyright Checks
lib/bindings/python/rust/bridge.rs
[error] 1-1: Copyright check failed. Missing or invalid copyright header in lib/bindings/python/rust/bridge.rs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build and Test - dynamo
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
🔇 Additional comments (5)
lib/runtime/src/protocols/annotated.rs (1)
126-143: Implementation logic is correct.The async transformation properly handles all cases: transforms data when present, propagates errors via
from_error, and preserves metadata (id, event, comment) in both success and None branches. This mirrors the synchronousmap_databehavior appropriately.lib/bindings/python/rust/lib.rs (1)
942-952: Consistent pattern for bridge-based request conversion.The request handling pattern using
Bridge::global().from_py()is consistent acrossround_robin,random, anddirectmethods. This properly moves Python object conversion off the async task thread.lib/bindings/python/rust/engine.rs (2)
182-203: Good migration to bridge-based GIL handling.The change from
spawn_blocking+ directPython::with_giltobridge::Bridge::global().with_gil()centralizes GIL management and aligns with the PR's goal of reducing Tokio runtime pressure from GIL contention.
307-333: Improved error classification with bridge-based GIL handling.The reworked
process_itemfunction properly:
- Extracts the error message before moving into the bridge closure
- Uses bridge for GIL-dependent
is_instance_ofcheck- Correctly maps errors to appropriate
ResponseProcessingErrorvariants- Uses
bridge.from_py::<Resp>for deserializationlib/bindings/python/rust/bridge.rs (1)
44-94: Serialization/deserialization helpers are well-structured.The
to_pyandfrom_pymethods properly:
- Use oneshot channels for async response handling
- Map channel errors to appropriate PyErr variants
- Ensure Python objects are dropped while GIL is held (line 82 comment)
Signed-off-by: michaelfeil <[email protected]>
4d6869e to
6151c2e
Compare
Signed-off-by: michaelfeil <[email protected]>
5471ae6 to
4a6ca0d
Compare
Signed-off-by: michaelfeil <[email protected]>
f540e00 to
d22156b
Compare
Overview:
Details:
This PR comes after my company had a memory leak in our dynamo version (circular reference). Profiling large json payloads (3MB or larger each - 500k tokens, lead to a lot of profiling efforts.
In the profiling efforts, it became clear that a lot of allocations are queued around
async_openai/chat.rs Structtypes and memory in pythonize, which seems to cause some fragmentation We are also seeing some chunky behaviour for large allocations around serde_json::Value. In the end - it seemed dynamo was fine, we upgraded to another master as base + rewrote some things to make them less likley.As side effect, we created the Python Bridge, to debug when things got moved into python and back. On the way we discovered that the team used a lot of
tokio::spawn_blockingand has had performance issues due to the sync lock nature of python gil.The python bridge is a global python singleton that makes it simpler to serialize the gil from rust, by submitting Rust. The goal is to be easier on the tokio runtime by awaiting the result of
Python::with_gilusing async without tokio.Enjoy reading though it. I am pretty happy with the design as a global. Once thing is that the bridge remains active until you close the python process (which is fine, you don't want to tie it to a single engine's runtime).
Please skip DCO, merged main to often - the commits are all under DCO.
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)