Skip to content

Conversation

@milesial
Copy link
Contributor

@milesial milesial commented Dec 3, 2025

Video decoding section of this diagram:

507753170-3ea77800-5a60-4d3a-8ec4-dc8603ff90fe

Gated under a feature flag, default off as discussed with @grahamking , since it relies on ffmpeg libraries being installed on the system. We enable the flag in framework containers and CI/CD where we have ffmpeg.

Rust dependency chain:

  • dynamo-llm → video-rs → ffmpeg-next → ffmpeg-sys-next
  • ffmpeg-sys-next builds a FFI with bindgen and needs ffmpeg headers and libs to be present

This PR adds video decoder configuration that can be passed via python bindings and the MDC the same way as the existing image decoder. Similar to image decoding we also have some output metadata sent to the backend. See the README changes for some more details on the configuration.

The demuxing and decoding itself happens on CPU via video-rs and ffmpeg. We first try a "seek-forward" approach where we place the decoder cursor close the the frames we want to decode, and try to reach that frame. If the seek fails for any reason, we fall back to a less efficient "full decode" of all the frames sequentially.

For some context, not all frames in a video can be decoded independently, there are only a few key frames scattered in the video, and other frames are compressed with a dependency on these key frames. So if a frame we wish to sample is not a key frame. We must start decoding from the closest key frame until we reach the frame of interest.

Summary by CodeRabbit

  • New Features
    • Introduced video decoding functionality with customizable sampling options including frame count, fps, maximum frames, and pixel constraints.
    • Video decoder supports frame selection and uniform sampling with metadata tracking (source fps and frame count).
    • Added strict mode validation for video decoding operations.

✏️ Tip: You can customize this high-level summary in your review settings.

@milesial milesial requested a review from a team as a code owner December 3, 2025 07:36
@github-actions github-actions bot added the feat label Dec 3, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

This pull request adds comprehensive video decoding support to the media preprocessing system. It introduces a new VideoDecoder implementation with frame sampling, a new video-rs dependency with ffmpeg-next for video processing, updates the Python bindings API, and removes the placeholder "not supported yet" error in favor of actual video decoding.

Changes

Cohort / File(s) Summary
Dependency Management
lib/llm/Cargo.toml
Adds video-rs (0.10.5 with ndarray feature) and ffmpeg-next (7.1) as public dependencies; introduces media-ffmpeg-build feature flag.
Video Decoder Implementation
lib/llm/src/preprocessor/media/decoders/video.rs
Introduces VideoDecoder and VideoMetadata types with Decoder trait implementation; handles frame sampling via fps or frame count, temporary file writing, pixel constraint enforcement, and uniform frame selection.
Core Decoder Extension
lib/llm/src/preprocessor/media/decoders.rs
Adds pub mod video with re-exports; extends MediaDecoder struct with video_decoder field; adds Video variant to DecodedMediaMetadata enum.
Media Loader Integration
lib/llm/src/preprocessor/media/loader.rs
Replaces VideoUrl error placeholder with actual video_decoder invocation for asynchronous decoding.
Python Bindings
lib/bindings/python/rust/llm/preprocessor.rs
Adds video_decoder(&mut self, video_decoder: &Bound<'_, PyDict>) -> PyResult<()> method to MediaDecoder, deserializing Python dict to Rust VideoDecoder.
Module & Documentation Updates
lib/llm/src/preprocessor.rs, lib/llm/src/preprocessor/media/README.md
Removes MediaDecoder/MediaFetcher re-exports; documents new video decoding API with configuration options (num_frames, fps, max_frames, strict, max_pixels).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Video decoding logic density: Frame sampling algorithm, temporary file handling, uniform frame index selection, and 4D tensor assembly in video.rs require careful validation
  • Edge case handling: Validate mutual exclusivity constraints (fps vs. num_frames, max_frames vs. num_frames), strict mode enforcement, and pixel limit boundaries
  • Error propagation: Verify FFmpeg error handling and graceful degradation for malformed video inputs
  • File system interactions: Confirm temporary file creation, cleanup, and potential race conditions in concurrent scenarios

Poem

🐰 Hoppy hops through frames so bright,
Video streams now decode right!
From pixel streams to tensor dreams,
A rabbit's joy in video schemes! 🎬✨

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive PR description lacks required template structure; missing explicit 'Details' section and 'Where should reviewer start?' guidance despite comprehensive technical context. Reorganize description to follow template sections (Overview, Details, Where should reviewer start, Related Issues) for consistency and clarity.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: video decoder in the frontend' clearly describes the main change—adding video decoder functionality to the frontend, which aligns with the substantial additions to video decoding logic across multiple files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
lib/llm/src/preprocessor/media/README.md (1)

49-49: Fix markdown heading formatting.

There's an extra space after the hash in the heading, which violates markdown lint rules (MD019).

-###  Sampling
+### Sampling
lib/llm/src/preprocessor/media/decoders/video.rs (1)

128-134: Consider logging decode errors for debugging.

Silently continuing past decode errors (line 131-133) may hide video corruption or decoder issues that are difficult to diagnose. A trace-level log would help with troubleshooting without impacting normal operation.

                 Err(video_rs::Error::ReadExhausted | video_rs::Error::DecodeExhausted) => {
                     break;
                 }
-                Err(_) => {
+                Err(e) => {
+                    tracing::trace!("Skipping frame {current_frame_idx}: {e}");
                     continue;
                 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2a13a and 05b8f94.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • lib/bindings/python/rust/llm/preprocessor.rs (1 hunks)
  • lib/llm/Cargo.toml (2 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/src/preprocessor/media/README.md (2 hunks)
  • lib/llm/src/preprocessor/media/decoders.rs (2 hunks)
  • lib/llm/src/preprocessor/media/decoders/video.rs (1 hunks)
  • lib/llm/src/preprocessor/media/loader.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.

Applied to files:

  • lib/llm/src/preprocessor.rs
🧬 Code graph analysis (2)
lib/llm/src/preprocessor/media/loader.rs (1)
lib/llm/src/preprocessor/media/common.rs (1)
  • from_url (16-42)
lib/llm/src/preprocessor/media/decoders/video.rs (1)
lib/llm/src/preprocessor/media/decoders.rs (1)
  • decode (17-17)
🪛 GitHub Actions: Copyright Checks
lib/llm/src/preprocessor/media/decoders/video.rs

[error] 1-1: Copyright header check failed: invalid/missing header in lib/llm/src/preprocessor/media/decoders/video.rs

🪛 LanguageTool
lib/llm/src/preprocessor/media/README.md

[style] ~46-~46: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...eds this value, abort the decoding. - max_alloc (uint64, > 0): Maximum allowed ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
lib/llm/src/preprocessor/media/README.md

49-49: Multiple spaces after hash on atx style heading

(MD019, no-multiple-space-atx)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (7)
lib/llm/src/preprocessor/media/README.md (1)

42-59: Documentation looks good.

The video decoding options are well documented with clear explanations of the two sampling modes (fixed num_frames vs fps-based), strict mode behavior, and the max_pixels safety limit. This provides good guidance for users configuring video decoding.

lib/llm/src/preprocessor.rs (1)

30-31: LGTM!

The import simplification to only bring in MediaLoader aligns with how the preprocessor uses the media module—MediaLoader is constructed from mdc.media_decoder and mdc.media_fetcher rather than directly using the types here.

lib/bindings/python/rust/llm/preprocessor.rs (1)

102-109: LGTM!

The video_decoder method implementation correctly mirrors the image_decoder pattern, using pythonize::depythonize for Python dict to Rust struct conversion with appropriate error mapping. The API is consistent and straightforward.

lib/llm/src/preprocessor/media/loader.rs (1)

124-129: Video decoding implementation is correct.

The implementation correctly mirrors the image decoding pattern: validate URL, fetch encoded data, then delegate to the video decoder. Error propagation via ? is consistent with the rest of the codebase.

Video processing is already tested in the test_media_url_passthrough integration test, which includes parametrized cases for single_video and mixed video scenarios, exercising the video decoding path through the real VideoDecoder implementation.

lib/llm/src/preprocessor/media/decoders.rs (1)

10-13: LGTM!

The video module integration follows the established pattern for ImageDecoder. The #[serde(default)] attribute ensures backward-compatible deserialization, and the Video variant cleanly extends the metadata enum.

Also applies to: 32-34, 40-40

lib/llm/src/preprocessor/media/decoders/video.rs (2)

15-33: LGTM on struct design.

The configuration struct is well-designed with sensible optional fields and deny_unknown_fields for strict config validation.


98-109: LGTM on frame sampling and tensor construction.

The uniform sampling logic using linspace formula is mathematically sound, and the 4D tensor construction (N, H, W, C) follows standard conventions for video data.

Also applies to: 149-161

Copy link
Contributor

@KrishnanPrash KrishnanPrash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a description to make it easier on the reviewers of this PR? Some things that aren't immediately obvious to me:

  • To my understanding,ffmpeg-next is a rust-wrapper of ffmpeg and the binary still needs to installed separately.
  • Build-wise what additional dependencies would audio support require?

@milesial
Copy link
Contributor Author

Could you add a description to make it easier on the reviewers of this PR? Some things that aren't immediately obvious to me:

* To my understanding,`ffmpeg-next` is a rust-wrapper of `ffmpeg` and the binary still needs to installed separately.

* Build-wise what additional dependencies would audio support require?

Added some details. For audio decoding this is not in scope of this PR but I'm assuming that if ffmpeg is installed on the system, we shouldn't need much else.

Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
Signed-off-by: Alexandre Milesi <[email protected]>
@milesial milesial merged commit 74fcd4a into main Dec 16, 2025
38 of 40 checks passed
@milesial milesial deleted the alexandrem/video-decoding branch December 16, 2025 22:28
smatta-star pushed a commit to smatta-star/dynamo that referenced this pull request Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants