
Conversation

@jthomson04 jthomson04 commented Dec 11, 2025

  • Fixes some issues with the --use-vllm-tokenizer flag.
  • Changes model card handling for Mistral-format models: files like config.json, generation_config.json, etc. are only looked for when the checkpoint is in HF format.

Summary by CodeRabbit

  • New Features

    • Added support for Mistral-format models with automatic format detection and appropriate resource loading
  • Improvements

    • Optimized input processing by transitioning to token-based operations for both text and token input flows
    • Updated error messages to provide better guidance for tokenizer configuration scenarios


Signed-off-by: jthomson04 <[email protected]>
@jthomson04 jthomson04 requested review from a team as code owners December 11, 2025 04:54
coderabbitai bot commented Dec 11, 2025

Walkthrough

Three files are updated to implement token-based processing for input handling and add Mistral-format model detection. Python components replace text inputs with tokenizer-encoded tokens in request handling and prompt construction. The Rust model card loader introduces conditional logic to skip artifact loading for Mistral-format models and updates error messaging accordingly.

Changes

  • Tokenizer-based Input Encoding — components/src/dynamo/common/utils/input_params.py, components/src/dynamo/vllm/handlers.py
    Replaces direct text input retrieval with tokenizer-based encoding and updates prompt construction to use TokensPrompt with prompt_token_ids instead of TextPrompt with raw text. The changes apply to the text-in-token OpenAI-like flow and the PrefillWorker paths (see the sketch after this list).
  • Mistral Model Handling — lib/llm/src/model_card.rs
    Introduces an is_exclusively_mistral_model() helper to detect Mistral-format models, leaves the model artifacts (model_info, tokenizer, gen_config, prompt_formatter) as None for such models, and updates the chat template and error messaging logic accordingly in both the from_repo_checkout and from_local_path paths.
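
To make the token-based flow concrete, below is a minimal sketch of the pattern the first cohort describes: encode the request input to token IDs, then hand vLLM a TokensPrompt instead of a TextPrompt. This is an illustration under stated assumptions, not the PR's actual code — it folds in the reviewer's later suggestion to return token IDs from every branch (tokenize=True for chat messages), uses "gpt2" as a stand-in tokenizer, and assumes the vllm.inputs import path.

from transformers import AutoTokenizer
from vllm.inputs import TokensPrompt  # import path may differ across vLLM versions


class InputParamManager:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def get_input_param(self, request: dict, use_tokenizer: bool):
        """Return token IDs for the request (or pre-tokenized IDs when use_tokenizer=False)."""
        if use_tokenizer:
            if self.tokenizer is None:
                raise ValueError("Tokenizer is not available")
            if "messages" in request:
                # tokenize=True keeps the return type consistent: a list of token IDs
                return self.tokenizer.apply_chat_template(
                    request["messages"], tokenize=True, add_generation_prompt=True
                )
            if "prompt" in request:
                return self.tokenizer.encode(request["prompt"])
            if "text" in request:
                return self.tokenizer.encode(request["text"])
            raise ValueError("No input parameter found in request")
        return request.get("token_ids")


tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder for whatever tokenizer the worker loads
manager = InputParamManager(tokenizer)
input_tokens = manager.get_input_param({"prompt": "Hello, world"}, use_tokenizer=True)

# Handler side: build a TokensPrompt from the encoded IDs instead of TextPrompt(prompt=text).
prompt = TokensPrompt(prompt_token_ids=input_tokens)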

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • lib/llm/src/model_card.rs — New helper function logic and conditional branching across multiple code paths warrant careful verification of model detection accuracy and impact on artifact loading flows
  • components/src/dynamo/vllm/handlers.py — Token-based prompt construction changes should be validated to ensure consistency with tokenizer output format
  • components/src/dynamo/common/utils/input_params.py — Verify tokenizer initialization state and encoding behavior across request types

Poem

🐰 Hop, hop—the tokens dance their way,
No more plain text clutters the fray,
Mistral hops through its own special door,
While prompts are encoded to the core!

Pre-merge checks

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: the description is incomplete; it lacks the required template sections (Overview, Details, and Where should the reviewer start). Resolution: expand the description to follow the template structure with proper Overview, Details, and reviewer guidance sections, including specific files to review.
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 42.86%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: the title accurately reflects the main focus of the PR, which adds support for Mistral models with changes to model card handling and tokenizer flag fixes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
components/src/dynamo/common/utils/input_params.py (2)

9-29: Fix inconsistent return types in get_input_param.

The function now returns different types depending on the input:

  • List[int] for "prompt" and "text" (lines 24, 26: tokenizer.encode)
  • str for "messages" (lines 20-21: apply_chat_template with tokenize=False)
  • Optional[List[int]] when use_tokenizer=False (line 29: token_ids)

This type inconsistency will cause runtime errors in callers that expect a uniform return type.

Apply this diff to make the return type consistent:

 def get_input_param(self, request: dict, use_tokenizer: bool):
     """
     Get the input parameter for the request.
+    Returns token IDs (List[int]) when use_tokenizer=True, or falls back to 
+    request["token_ids"] when use_tokenizer=False.
     """

     if use_tokenizer:
         print(f"Request: {request}")
         if self.tokenizer is None:
             raise ValueError("Tokenizer is not available")

         if "messages" in request:
             return self.tokenizer.apply_chat_template(
-                request["messages"], tokenize=False, add_generation_prompt=True
+                request["messages"], tokenize=True, add_generation_prompt=True
             )
         elif "prompt" in request:
             return self.tokenizer.encode(request["prompt"])
         elif "text" in request:
             return self.tokenizer.encode(request["text"])
         else:
             raise ValueError("No input parameter found in request")
     return request.get("token_ids")

15-15: Remove debug print statement.

The print statement on line 15 appears to be debugging code and should be removed or replaced with proper logging.

Apply this diff:

     if use_tokenizer:
-        print(f"Request: {request}")
+        # Consider using logging.debug() if needed for debugging
         if self.tokenizer is None:
components/src/dynamo/vllm/handlers.py (1)

1167-1176: Critical: PrefillWorkerHandler._generate_text_mode is broken.

Lines 1170-1175 were not updated but get_input_param now returns List[int] (tokens) instead of str (text). The code still assigns to input_text and passes it to TextPrompt(prompt=input_text), which expects a string. This will cause a runtime type error.

Apply this diff to fix the inconsistency:

 async def _generate_text_mode(self, request, context, request_id):
     """Generate prefill using OpenAI-compatible format (text-in-text-out)."""
     # Get text input using InputParamManager
-    input_text = self.input_param_manager.get_input_param(
+    input_tokens = self.input_param_manager.get_input_param(
         request, use_tokenizer=True
     )

     # Build prompt for vLLM
-    prompt = TextPrompt(prompt=input_text)
+    prompt = TokensPrompt(prompt_token_ids=input_tokens)

     # Build sampling params from OpenAI-style request
🧹 Nitpick comments (2)
lib/llm/src/model_card.rs (1)

507-518: Consider logging when mistral-format model is detected.

The conditional logic cleanly handles mistral vs HF models, but there's no indication to the user that mistral-format detection occurred. This could make debugging harder if the detection logic fails or behaves unexpectedly.

Add a log statement:

 let is_mistral_model = is_exclusively_mistral_model(local_path);
+
+if is_mistral_model {
+    tracing::debug!(
+        path = %local_path.display(),
+        "Detected Mistral-format model (no config.json, has params.json). Skipping HF artifact loading."
+    );
+}

 let (model_info, tokenizer, gen_config, prompt_formatter) = if !is_mistral_model {
components/src/dynamo/common/utils/input_params.py (1)

9-9: Add return type hint to get_input_param.

The function lacks a return type hint, which would make the type inconsistency more obvious and help catch bugs earlier.

Add type hints:

+from typing import List, Optional, Union
+
 class InputParamManager:
     def __init__(self, tokenizer):
         self.tokenizer = tokenizer

-    def get_input_param(self, request: dict, use_tokenizer: bool):
+    def get_input_param(self, request: dict, use_tokenizer: bool) -> Optional[List[int]]:
         """
         Get the input parameter for the request.
+        Returns token IDs as a list of integers.
         """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5250303 and 79ce9a3.

📒 Files selected for processing (3)
  • components/src/dynamo/common/utils/input_params.py (1 hunks)
  • components/src/dynamo/vllm/handlers.py (1 hunks)
  • lib/llm/src/model_card.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/model_card.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.

Applied to files:

  • lib/llm/src/model_card.rs
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
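
As a small illustration of the dictionary-style access described in the learnings above (assuming vLLM's TokensPrompt TypedDict; the exact import path may vary by version):

from vllm.inputs import TokensPrompt

prompt = TokensPrompt(prompt_token_ids=[101, 2023, 102])

# Correct: TokensPrompt is a TypedDict, i.e. a plain dict at runtime.
token_ids = prompt["prompt_token_ids"]

# Incorrect: attribute access raises AttributeError on a dict instance.
# token_ids = prompt.prompt_token_ids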
🧬 Code graph analysis (2)
lib/llm/src/model_card.rs (3)
lib/llm/src/tokenizers.rs (1)
  • tokenizer (340-342)
lib/llm/src/common/checked_file.rs (1)
  • from_disk (43-57)
lib/llm/src/local_model.rs (1)
  • custom_template_path (153-156)
components/src/dynamo/vllm/handlers.py (1)
components/src/dynamo/common/utils/input_params.py (1)
  • get_input_param (9-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/src/dynamo/vllm/handlers.py (1)

949-954: LGTM! Correctly updated to use token-based input.

The DecodeWorkerHandler's text-mode path now correctly uses input_tokens with TokensPrompt, aligning with the changes in InputParamManager that now returns encoded tokens.

Comment on lines +165 to +168
/// Check if our model only has config fields for a Mistral-format model.
fn is_exclusively_mistral_model(directory: &Path) -> bool {
    !directory.join("config.json").exists() && directory.join("params.json").exists()
}

⚠️ Potential issue | 🟡 Minor

Verify params.json existence before returning true.

The function checks that config.json doesn't exist but only verifies the presence of params.json without ensuring it's actually a valid file. Consider what happens if params.json exists but is unreadable or if the directory contains both files.

Apply this diff to make the check more robust:

 fn is_exclusively_mistral_model(directory: &Path) -> bool {
-    !directory.join("config.json").exists() && directory.join("params.json").exists()
+    let config_json = directory.join("config.json");
+    let params_json = directory.join("params.json");
+    !config_json.exists() && params_json.exists() && params_json.is_file()
 }
🤖 Prompt for AI Agents
In lib/llm/src/model_card.rs around lines 165 to 168, the current check returns
true if config.json is missing and params.json exists without verifying
params.json is a regular, readable file or guarding against both files present;
update the function to (1) confirm config.json does not exist, (2) check
params.json exists AND fs::metadata(params.json).is_file() (or attempt to open
it) to ensure it's a file and readable, and (3) return false if both config.json
and params.json are present; use proper error-safe APIs (metadata or File::open)
rather than just Path::exists to make the check robust.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, get the full context of the function and surrounding code
cat -n lib/llm/src/model_card.rs | sed -n '155,180p'

Repository: ai-dynamo/dynamo

Length of output: 1005


🏁 Script executed:

# Check the struct definition of ModelDeploymentCard
rg -n 'struct ModelDeploymentCard' -A 20 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 887


🏁 Script executed:

# Now run the verification script as suggested in the comment
# Search for direct field access that might panic on None
rg -nP --type=rust -C3 '\.model_info\s*\.\s*unwrap|\.tokenizer\s*\.\s*unwrap|\.gen_config\s*\.\s*unwrap|\.prompt_formatter\s*\.\s*unwrap' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 42


🏁 Script executed:

# Search for pattern matching that doesn't handle None
rg -nP --type=rust -C3 'model_info:\s*Some\(|tokenizer:\s*Some\(' -g '!**/test/**' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 42


🏁 Script executed:

# Check if there are other ModelDeploymentCard construction paths
ast-grep --pattern 'ModelDeploymentCard { $$$ }' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 369


🏁 Script executed:

# Also check the sections mentioned: 507-518, 521-544
cat -n lib/llm/src/model_card.rs | sed -n '500,550p'

Repository: ai-dynamo/dynamo

Length of output: 2313


🏁 Script executed:

# See the full construction and error handling context
cat -n lib/llm/src/model_card.rs | sed -n '545,570p'

Repository: ai-dynamo/dynamo

Length of output: 1082


🏁 Script executed:

# Check the function signature and return type to understand error handling
rg -n 'fn.*from_disk.*ModelDeploymentCard' -B5 -A2 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 42


🏁 Script executed:

# Check how ModelInfoType::from_disk handles errors
rg -n 'ModelInfoType::from_disk' -B2 -A10 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 659


🏁 Script executed:

# Look for any other calls to is_exclusively_mistral_model to understand its usage
rg -n 'is_exclusively_mistral_model' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 200


🏁 Script executed:

# Check if there are any tests or documentation about mistral model handling
rg -n 'mistral' -i lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo

Length of output: 617


The mistral-format detection logic is sound, but clarify edge case handling.

The detection approach is correct: it requires both the absence of config.json AND presence of params.json for exclusive mistral detection. However, address one edge case:

If a mistral model directory lacks params.json: The function returns false, causing the code to attempt regular model loading via ModelInfoType::from_disk(local_path)?. This will fail if the model truly doesn't have the expected fields. Either document this as intentional (params.json is required for mistral format) or add explicit validation.

The downstream concerns are not issues:

  • Downstream code at line 361 explicitly expects tokenizer to be None for mistral models
  • All optional fields (model_info, tokenizer, gen_config, prompt_formatter) are designed as Option<T> and properly handled
  • Only one ModelDeploymentCard construction path exists (line 249), using ..Default::default() which is consistent

Comment on lines +360 to +362
anyhow::bail!(
"Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
);

⚠️ Potential issue | 🟡 Minor

Clarify the placeholder pattern in the error message.

The error message uses --use-<framework>-tokenizer as a placeholder, but users might not understand they need to replace <framework> with their specific framework (e.g., vllm). Consider being more explicit or providing examples.

Apply this diff:

 anyhow::bail!(
-    "Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
+    "Blank ModelDeploymentCard does not have a tokenizer. Is this a Mistral-format model? If so, use the `--use-vllm-tokenizer` (or equivalent framework-specific) flag in the engine command."
 );

@jthomson04 changed the title from "Mistral-3-large support" to "feat: Mistral-3-large support" Dec 11, 2025
@github-actions github-actions bot added the feat label Dec 11, 2025
Signed-off-by: jthomson04 <[email protected]>