-
Notifications
You must be signed in to change notification settings - Fork 766
feat: video decoder in the frontend #4719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request adds comprehensive video decoding support to the media preprocessing system. It introduces a new VideoDecoder implementation with frame sampling, a new video-rs dependency with ffmpeg-next for video processing, updates the Python bindings API, and removes the placeholder "not supported yet" error in favor of actual video decoding. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
lib/llm/src/preprocessor/media/README.md (1)
49-49: Fix markdown heading formatting.There's an extra space after the hash in the heading, which violates markdown lint rules (MD019).
-### Sampling +### Samplinglib/llm/src/preprocessor/media/decoders/video.rs (1)
128-134: Consider logging decode errors for debugging.Silently continuing past decode errors (line 131-133) may hide video corruption or decoder issues that are difficult to diagnose. A trace-level log would help with troubleshooting without impacting normal operation.
Err(video_rs::Error::ReadExhausted | video_rs::Error::DecodeExhausted) => { break; } - Err(_) => { + Err(e) => { + tracing::trace!("Skipping frame {current_frame_idx}: {e}"); continue; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locklib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
lib/bindings/python/rust/llm/preprocessor.rs(1 hunks)lib/llm/Cargo.toml(2 hunks)lib/llm/src/preprocessor.rs(1 hunks)lib/llm/src/preprocessor/media/README.md(2 hunks)lib/llm/src/preprocessor/media/decoders.rs(2 hunks)lib/llm/src/preprocessor/media/decoders/video.rs(1 hunks)lib/llm/src/preprocessor/media/loader.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.
Applied to files:
lib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
lib/llm/src/preprocessor/media/loader.rs (1)
lib/llm/src/preprocessor/media/common.rs (1)
from_url(16-42)
lib/llm/src/preprocessor/media/decoders/video.rs (1)
lib/llm/src/preprocessor/media/decoders.rs (1)
decode(17-17)
🪛 GitHub Actions: Copyright Checks
lib/llm/src/preprocessor/media/decoders/video.rs
[error] 1-1: Copyright header check failed: invalid/missing header in lib/llm/src/preprocessor/media/decoders/video.rs
🪛 LanguageTool
lib/llm/src/preprocessor/media/README.md
[style] ~46-~46: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...eds this value, abort the decoding. - max_alloc (uint64, > 0): Maximum allowed ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.18.1)
lib/llm/src/preprocessor/media/README.md
49-49: Multiple spaces after hash on atx style heading
(MD019, no-multiple-space-atx)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (7)
lib/llm/src/preprocessor/media/README.md (1)
42-59: Documentation looks good.The video decoding options are well documented with clear explanations of the two sampling modes (fixed
num_framesvsfps-based), strict mode behavior, and themax_pixelssafety limit. This provides good guidance for users configuring video decoding.lib/llm/src/preprocessor.rs (1)
30-31: LGTM!The import simplification to only bring in
MediaLoaderaligns with how the preprocessor uses the media module—MediaLoaderis constructed frommdc.media_decoderandmdc.media_fetcherrather than directly using the types here.lib/bindings/python/rust/llm/preprocessor.rs (1)
102-109: LGTM!The
video_decodermethod implementation correctly mirrors theimage_decoderpattern, usingpythonize::depythonizefor Python dict to Rust struct conversion with appropriate error mapping. The API is consistent and straightforward.lib/llm/src/preprocessor/media/loader.rs (1)
124-129: Video decoding implementation is correct.The implementation correctly mirrors the image decoding pattern: validate URL, fetch encoded data, then delegate to the video decoder. Error propagation via
?is consistent with the rest of the codebase.Video processing is already tested in the
test_media_url_passthroughintegration test, which includes parametrized cases forsingle_videoand mixed video scenarios, exercising the video decoding path through the realVideoDecoderimplementation.lib/llm/src/preprocessor/media/decoders.rs (1)
10-13: LGTM!The video module integration follows the established pattern for
ImageDecoder. The#[serde(default)]attribute ensures backward-compatible deserialization, and theVideovariant cleanly extends the metadata enum.Also applies to: 32-34, 40-40
lib/llm/src/preprocessor/media/decoders/video.rs (2)
15-33: LGTM on struct design.The configuration struct is well-designed with sensible optional fields and
deny_unknown_fieldsfor strict config validation.
98-109: LGTM on frame sampling and tensor construction.The uniform sampling logic using linspace formula is mathematically sound, and the 4D tensor construction (N, H, W, C) follows standard conventions for video data.
Also applies to: 149-161
05b8f94 to
2cab45b
Compare
2cab45b to
6e47c66
Compare
6e47c66 to
738998c
Compare
738998c to
3f818ac
Compare
KrishnanPrash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a description to make it easier on the reviewers of this PR? Some things that aren't immediately obvious to me:
- To my understanding,
ffmpeg-nextis a rust-wrapper offfmpegand the binary still needs to installed separately. - Build-wise what additional dependencies would audio support require?
Added some details. For audio decoding this is not in scope of this PR but I'm assuming that if ffmpeg is installed on the system, we shouldn't need much else. |
69635b6 to
2c7cb81
Compare
2c7cb81 to
2ea53b8
Compare
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
2ea53b8 to
218ab0e
Compare
Signed-off-by: Alexandre Milesi <[email protected]>
95b96f3 to
387aa86
Compare
Signed-off-by: Alexandre Milesi <[email protected]>
Video decoding section of this diagram:
Gated under a feature flag, default off as discussed with @grahamking , since it relies on ffmpeg libraries being installed on the system. We enable the flag in framework containers and CI/CD where we have ffmpeg.
Rust dependency chain:
This PR adds video decoder configuration that can be passed via python bindings and the MDC the same way as the existing image decoder. Similar to image decoding we also have some output metadata sent to the backend. See the README changes for some more details on the configuration.
The demuxing and decoding itself happens on CPU via video-rs and ffmpeg. We first try a "seek-forward" approach where we place the decoder cursor close the the frames we want to decode, and try to reach that frame. If the seek fails for any reason, we fall back to a less efficient "full decode" of all the frames sequentially.
For some context, not all frames in a video can be decoded independently, there are only a few key frames scattered in the video, and other frames are compressed with a dependency on these key frames. So if a frame we wish to sample is not a key frame. We must start decoding from the closest key frame until we reach the frame of interest.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.