feat: Mistral-3-large support #4885
base: main
Conversation
Signed-off-by: jthomson04 <[email protected]>
Walkthrough

Three files are updated to implement token-based processing for input handling and add Mistral-format model detection. Python components replace text inputs with tokenizer-encoded tokens in request handling and prompt construction. The Rust model card loader introduces conditional logic to skip artifact loading for Mistral-format models and updates error messaging accordingly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/src/dynamo/common/utils/input_params.py (2)
9-29: Fix inconsistent return types in get_input_param.

The function now returns different types depending on the input:

- `List[int]` for "prompt" and "text" (lines 24, 26: `tokenizer.encode`)
- `str` for "messages" (lines 20-21: `apply_chat_template` with `tokenize=False`)
- `Optional[List[int]]` when `use_tokenizer=False` (line 29: `token_ids`)

This type inconsistency will cause runtime errors in callers that expect a uniform return type.
Apply this diff to make the return type consistent:
```diff
 def get_input_param(self, request: dict, use_tokenizer: bool):
     """
     Get the input parameter for the request.
+    Returns token IDs (List[int]) when use_tokenizer=True, or falls back to
+    request["token_ids"] when use_tokenizer=False.
     """
     if use_tokenizer:
         print(f"Request: {request}")
         if self.tokenizer is None:
             raise ValueError("Tokenizer is not available")
         if "messages" in request:
             return self.tokenizer.apply_chat_template(
-                request["messages"], tokenize=False, add_generation_prompt=True
+                request["messages"], tokenize=True, add_generation_prompt=True
             )
         elif "prompt" in request:
             return self.tokenizer.encode(request["prompt"])
         elif "text" in request:
             return self.tokenizer.encode(request["text"])
         else:
             raise ValueError("No input parameter found in request")
     return request.get("token_ids")
```
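To make the mismatch concrete, here is a self-contained toy sketch (editor illustration, not the project's code; `ToyTokenizer` is a stand-in): with `tokenize=False` the chat-template branch yields a `str`, while the encode branches yield `List[int]`, so a caller expecting token IDs breaks on the messages path.

```python
from typing import List


class ToyTokenizer:
    """Stand-in tokenizer used only to illustrate the return-type mismatch."""

    def encode(self, text: str) -> List[int]:
        # Toy byte-level "encoding"; real tokenizers return subword IDs.
        return [ord(ch) for ch in text]

    def apply_chat_template(self, messages, tokenize, add_generation_prompt):
        flat = " ".join(m["content"] for m in messages)
        return self.encode(flat) if tokenize else flat


tok = ToyTokenizer()
msgs = [{"role": "user", "content": "hi"}]
as_text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
as_tokens = tok.apply_chat_template(msgs, tokenize=True, add_generation_prompt=True)
print(type(as_text).__name__, type(as_tokens).__name__)  # str list
```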
15-15: Remove debug print statement.

The print statement on line 15 appears to be debugging code and should be removed or replaced with proper logging.
Apply this diff:
```diff
 if use_tokenizer:
-    print(f"Request: {request}")
+    # Consider using logging.debug() if needed for debugging
     if self.tokenizer is None:
```

components/src/dynamo/vllm/handlers.py (1)
1167-1176: Critical: PrefillWorkerHandler._generate_text_mode is broken.

Lines 1170-1175 were not updated, but `get_input_param` now returns `List[int]` (tokens) instead of `str` (text). The code still assigns to `input_text` and passes it to `TextPrompt(prompt=input_text)`, which expects a string. This will cause a runtime type error.

Apply this diff to fix the inconsistency:
```diff
 async def _generate_text_mode(self, request, context, request_id):
     """Generate prefill using OpenAI-compatible format (text-in-text-out)."""
     # Get text input using InputParamManager
-    input_text = self.input_param_manager.get_input_param(
+    input_tokens = self.input_param_manager.get_input_param(
         request, use_tokenizer=True
     )

     # Build prompt for vLLM
-    prompt = TextPrompt(prompt=input_text)
+    prompt = TokensPrompt(prompt_token_ids=input_tokens)

     # Build sampling params from OpenAI-style request
```
🧹 Nitpick comments (2)
lib/llm/src/model_card.rs (1)
507-518: Consider logging when mistral-format model is detected.

The conditional logic cleanly handles mistral vs HF models, but there's no indication to the user that mistral-format detection occurred. This could make debugging harder if the detection logic fails or behaves unexpectedly.
Add a log statement:
```diff
 let is_mistral_model = is_exclusively_mistral_model(local_path);
+
+if is_mistral_model {
+    tracing::debug!(
+        path = %local_path.display(),
+        "Detected Mistral-format model (no config.json, has params.json). Skipping HF artifact loading."
+    );
+}
 let (model_info, tokenizer, gen_config, prompt_formatter) = if !is_mistral_model {
```

components/src/dynamo/common/utils/input_params.py (1)
9-9: Add return type hint to get_input_param.

The function lacks a return type hint, which would make the type inconsistency more obvious and help catch bugs earlier.
Add type hints:
```diff
+from typing import List, Optional, Union
+
 class InputParamManager:
     def __init__(self, tokenizer):
         self.tokenizer = tokenizer

-    def get_input_param(self, request: dict, use_tokenizer: bool):
+    def get_input_param(self, request: dict, use_tokenizer: bool) -> Optional[List[int]]:
         """
         Get the input parameter for the request.
+        Returns token IDs as a list of integers.
         """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- components/src/dynamo/common/utils/input_params.py (1 hunks)
- components/src/dynamo/vllm/handlers.py (1 hunks)
- lib/llm/src/model_card.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/model_card.rs
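A hedged illustration of the trade-off this learning describes (editor sketch; the struct is simplified and only the field names follow the learning, not the actual model_entry.rs definition; assumes `serde` with the derive feature and `serde_json` are available): without `#[serde(default)]`, a missing field fails deserialization loudly instead of being silently filled in.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ModelEntry {
    name: String,
    model_type: String,  // no #[serde(default)]: the field must be present
    model_input: String, // same: a missing field is a hard error
}

fn main() {
    // Missing model_type/model_input -> Err, surfacing the user mistake
    // instead of quietly defaulting.
    let err = serde_json::from_str::<ModelEntry>(r#"{ "name": "m" }"#).unwrap_err();
    println!("{err}");
}
```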
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
lib/llm/src/model_card.rs
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.
Applied to files:
components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.
Applied to files:
components/src/dynamo/vllm/handlers.py
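A minimal sketch of the access pattern these learnings describe (assumes vLLM is installed and exposes `TokensPrompt` from `vllm.inputs`; the token IDs are placeholders):

```python
from vllm.inputs import TokensPrompt

# TokensPrompt is a TypedDict, so instances are plain dicts at runtime.
prompt = TokensPrompt(prompt_token_ids=[101, 7592, 102])

token_ids = prompt["prompt_token_ids"]  # dictionary-style access: works
print(token_ids)

# prompt.prompt_token_ids              # attribute access: AttributeError
```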
🧬 Code graph analysis (2)
lib/llm/src/model_card.rs (3)
lib/llm/src/tokenizers.rs (1)
- tokenizer (340-342)

lib/llm/src/common/checked_file.rs (1)
- from_disk (43-57)

lib/llm/src/local_model.rs (1)
- custom_template_path (153-156)
components/src/dynamo/vllm/handlers.py (1)
components/src/dynamo/common/utils/input_params.py (1)
- get_input_param (9-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: clippy (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/src/dynamo/vllm/handlers.py (1)
949-954: LGTM! Correctly updated to use token-based input.

The DecodeWorkerHandler's text-mode path now correctly uses `input_tokens` with `TokensPrompt`, aligning with the InputParamManager changes that now return encoded tokens.
```rust
/// Check if our model only has config fields for a Mistral-format model.
fn is_exclusively_mistral_model(directory: &Path) -> bool {
    !directory.join("config.json").exists() && directory.join("params.json").exists()
}
```
Verify params.json existence before returning true.
The function checks that config.json doesn't exist but only verifies the presence of params.json without ensuring it's actually a valid file. Consider what happens if params.json exists but is unreadable or if the directory contains both files.
Apply this diff to make the check more robust:
```diff
 fn is_exclusively_mistral_model(directory: &Path) -> bool {
-    !directory.join("config.json").exists() && directory.join("params.json").exists()
+    let config_json = directory.join("config.json");
+    let params_json = directory.join("params.json");
+    !config_json.exists() && params_json.exists() && params_json.is_file()
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Check if our model only has config fields for a Mistral-format model.
fn is_exclusively_mistral_model(directory: &Path) -> bool {
    let config_json = directory.join("config.json");
    let params_json = directory.join("params.json");
    !config_json.exists() && params_json.exists() && params_json.is_file()
}
```
🤖 Prompt for AI Agents
In lib/llm/src/model_card.rs around lines 165 to 168, the current check returns
true if config.json is missing and params.json exists without verifying
params.json is a regular, readable file or guarding against both files present;
update the function to (1) confirm config.json does not exist, (2) check
params.json exists AND fs::metadata(params.json).is_file() (or attempt to open
it) to ensure it's a file and readable, and (3) return false if both config.json
and params.json are present; use proper error-safe APIs (metadata or File::open)
rather than just Path::exists to make the check robust.
🧩 Analysis chain
🏁 Script executed:
# First, get the full context of the function and surrounding code
cat -n lib/llm/src/model_card.rs | sed -n '155,180p'

Repository: ai-dynamo/dynamo
Length of output: 1005
🏁 Script executed:
# Check the struct definition of ModelDeploymentCard
rg -n 'struct ModelDeploymentCard' -A 20 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 887
🏁 Script executed:
# Now run the verification script as suggested in the comment
# Search for direct field access that might panic on None
rg -nP --type=rust -C3 '\.model_info\s*\.\s*unwrap|\.tokenizer\s*\.\s*unwrap|\.gen_config\s*\.\s*unwrap|\.prompt_formatter\s*\.\s*unwrap' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Search for pattern matching that doesn't handle None
rg -nP --type=rust -C3 'model_info:\s*Some\(|tokenizer:\s*Some\(' -g '!**/test/**' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Check if there are other ModelDeploymentCard construction paths
ast-grep --pattern 'ModelDeploymentCard { $$$ }' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 369
🏁 Script executed:
# Also check the sections mentioned: 507-518, 521-544
cat -n lib/llm/src/model_card.rs | sed -n '500,550p'

Repository: ai-dynamo/dynamo
Length of output: 2313
🏁 Script executed:
# See the full construction and error handling context
cat -n lib/llm/src/model_card.rs | sed -n '545,570p'

Repository: ai-dynamo/dynamo
Length of output: 1082
🏁 Script executed:
# Check the function signature and return type to understand error handling
rg -n 'fn.*from_disk.*ModelDeploymentCard' -B5 -A2 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Check how ModelInfoType::from_disk handles errors
rg -n 'ModelInfoType::from_disk' -B2 -A10 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 659
🏁 Script executed:
# Look for any other calls to is_exclusively_mistral_model to understand its usage
rg -n 'is_exclusively_mistral_model' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 200
🏁 Script executed:
# Check if there are any tests or documentation about mistral model handling
rg -n 'mistral' -i lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 617
The mistral-format detection logic is sound, but clarify edge case handling.
The detection approach is correct: it requires both the absence of config.json AND presence of params.json for exclusive mistral detection. However, address one edge case:
If a mistral model directory lacks params.json: The function returns false, causing the code to attempt regular model loading via ModelInfoType::from_disk(local_path)?. This will fail if the model truly doesn't have the expected fields. Either document this as intentional (params.json is required for mistral format) or add explicit validation.
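One way to add that explicit validation, as a hedged editor sketch (the function name and error wording are illustrative, not the PR's code; assumes the crate already depends on `anyhow`):

```rust
use std::path::Path;

/// Illustrative replacement for the boolean check: distinguish HF layout,
/// Mistral-format layout, and "neither", failing early in the last case.
fn detect_mistral_format(directory: &Path) -> anyhow::Result<bool> {
    let has_config = directory.join("config.json").is_file();
    let has_params = directory.join("params.json").is_file();
    match (has_config, has_params) {
        (true, _) => Ok(false),          // HF checkpoint: config.json wins
        (false, true) => Ok(true),       // Mistral format: params.json only
        (false, false) => anyhow::bail!( // neither layout: fail with context
            "{} contains neither config.json (HF) nor params.json (Mistral format)",
            directory.display()
        ),
    }
}
```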
The downstream concerns are not issues:
- Downstream code at line 361 explicitly expects `tokenizer` to be `None` for mistral models
- All optional fields (`model_info`, `tokenizer`, `gen_config`, `prompt_formatter`) are designed as `Option<T>` and properly handled
- Only one `ModelDeploymentCard` construction path exists (line 249), using `..Default::default()`, which is consistent
```rust
anyhow::bail!(
    "Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
);
```
Clarify the placeholder pattern in the error message.
The error message uses --use-<framework>-tokenizer as a placeholder, but users might not understand they need to replace <framework> with their specific framework (e.g., vllm). Consider being more explicit or providing examples.
Apply this diff:
```diff
 anyhow::bail!(
-    "Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
+    "Blank ModelDeploymentCard does not have a tokenizer. Is this a Mistral-format model? If so, use the `--use-vllm-tokenizer` (or equivalent framework-specific) flag in the engine command."
 );
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
anyhow::bail!(
    "Blank ModelDeploymentCard does not have a tokenizer. Is this a Mistral-format model? If so, use the `--use-vllm-tokenizer` (or equivalent framework-specific) flag in the engine command."
);
```
Signed-off-by: jthomson04 <[email protected]>
`--use-vllm-tokenizer` flag.
`config.json`, `generation_config.json`, etc. if it's an HF checkpoint.

Summary by CodeRabbit
New Features
Improvements