feat: Mistral-3-large support #4885
base: main
Conversation
Signed-off-by: jthomson04 <[email protected]>
Walkthrough

Three files are updated to implement token-based processing for input handling and add Mistral-format model detection. Python components replace text inputs with tokenizer-encoded tokens in request handling and prompt construction. The Rust model card loader introduces conditional logic to skip artifact loading for Mistral-format models and updates error messaging accordingly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/src/dynamo/common/utils/input_params.py (2)
9-29: Fix inconsistent return types in get_input_param.

The function now returns different types depending on the input:

- `List[int]` for "prompt" and "text" (lines 24, 26: `tokenizer.encode`)
- `str` for "messages" (lines 20-21: `apply_chat_template` with `tokenize=False`)
- `Optional[List[int]]` when `use_tokenizer=False` (line 29: `token_ids`)

This type inconsistency will cause runtime errors in callers that expect a uniform return type.
Apply this diff to make the return type consistent:
```diff
 def get_input_param(self, request: dict, use_tokenizer: bool):
     """
     Get the input parameter for the request.
+    Returns token IDs (List[int]) when use_tokenizer=True, or falls back to
+    request["token_ids"] when use_tokenizer=False.
     """
     if use_tokenizer:
         print(f"Request: {request}")
         if self.tokenizer is None:
             raise ValueError("Tokenizer is not available")
         if "messages" in request:
             return self.tokenizer.apply_chat_template(
-                request["messages"], tokenize=False, add_generation_prompt=True
+                request["messages"], tokenize=True, add_generation_prompt=True
             )
         elif "prompt" in request:
             return self.tokenizer.encode(request["prompt"])
         elif "text" in request:
             return self.tokenizer.encode(request["text"])
         else:
             raise ValueError("No input parameter found in request")
     return request.get("token_ids")
```
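To make the mismatch concrete, here is a self-contained toy sketch (editor illustration, not the project's code; `ToyTokenizer` is a stand-in): with `tokenize=False` the chat-template branch yields a `str`, while the encode branches yield `List[int]`, so a caller expecting token IDs breaks on the messages path.

```python
from typing import List


class ToyTokenizer:
    """Stand-in tokenizer used only to illustrate the return-type mismatch."""

    def encode(self, text: str) -> List[int]:
        # Toy byte-level "encoding"; real tokenizers return subword IDs.
        return [ord(ch) for ch in text]

    def apply_chat_template(self, messages, tokenize, add_generation_prompt):
        flat = " ".join(m["content"] for m in messages)
        return self.encode(flat) if tokenize else flat


tok = ToyTokenizer()
msgs = [{"role": "user", "content": "hi"}]
as_text = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
as_tokens = tok.apply_chat_template(msgs, tokenize=True, add_generation_prompt=True)
print(type(as_text).__name__, type(as_tokens).__name__)  # str list
```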
15-15: Remove debug print statement.

The print statement on line 15 appears to be debugging code and should be removed or replaced with proper logging.
Apply this diff:
```diff
 if use_tokenizer:
-    print(f"Request: {request}")
+    # Consider using logging.debug() if needed for debugging
     if self.tokenizer is None:
```

components/src/dynamo/vllm/handlers.py (1)
1167-1176: Critical: PrefillWorkerHandler._generate_text_mode is broken.

Lines 1170-1175 were not updated, but `get_input_param` now returns `List[int]` (tokens) instead of `str` (text). The code still assigns to `input_text` and passes it to `TextPrompt(prompt=input_text)`, which expects a string. This will cause a runtime type error.

Apply this diff to fix the inconsistency:
```diff
 async def _generate_text_mode(self, request, context, request_id):
     """Generate prefill using OpenAI-compatible format (text-in-text-out)."""
     # Get text input using InputParamManager
-    input_text = self.input_param_manager.get_input_param(
+    input_tokens = self.input_param_manager.get_input_param(
         request, use_tokenizer=True
     )

     # Build prompt for vLLM
-    prompt = TextPrompt(prompt=input_text)
+    prompt = TokensPrompt(prompt_token_ids=input_tokens)

     # Build sampling params from OpenAI-style request
```
🧹 Nitpick comments (2)
lib/llm/src/model_card.rs (1)
507-518: Consider logging when mistral-format model is detected.

The conditional logic cleanly handles mistral vs HF models, but there's no indication to the user that mistral-format detection occurred. This could make debugging harder if the detection logic fails or behaves unexpectedly.
Add a log statement:
```diff
 let is_mistral_model = is_exclusively_mistral_model(local_path);
+
+if is_mistral_model {
+    tracing::debug!(
+        path = %local_path.display(),
+        "Detected Mistral-format model (no config.json, has params.json). Skipping HF artifact loading."
+    );
+}
 let (model_info, tokenizer, gen_config, prompt_formatter) = if !is_mistral_model {
```

components/src/dynamo/common/utils/input_params.py (1)
9-9: Add return type hint to get_input_param.

The function lacks a return type hint, which would make the type inconsistency more obvious and help catch bugs earlier.
Add type hints:
```diff
+from typing import List, Optional, Union
+
 class InputParamManager:
     def __init__(self, tokenizer):
         self.tokenizer = tokenizer

-    def get_input_param(self, request: dict, use_tokenizer: bool):
+    def get_input_param(self, request: dict, use_tokenizer: bool) -> Optional[List[int]]:
         """
         Get the input parameter for the request.
+        Returns token IDs as a list of integers.
         """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- components/src/dynamo/common/utils/input_params.py (1 hunks)
- components/src/dynamo/vllm/handlers.py (1 hunks)
- lib/llm/src/model_card.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/model_card.rs
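A hedged illustration of the trade-off this learning describes (editor sketch; the struct is simplified and only the field names follow the learning, not the actual model_entry.rs definition; assumes `serde` with the derive feature and `serde_json` are available): without `#[serde(default)]`, a missing field fails deserialization loudly instead of being silently filled in.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ModelEntry {
    name: String,
    model_type: String,  // no #[serde(default)]: the field must be present
    model_input: String, // same: a missing field is a hard error
}

fn main() {
    // Missing model_type/model_input -> Err, surfacing the user mistake
    // instead of quietly defaulting.
    let err = serde_json::from_str::<ModelEntry>(r#"{ "name": "m" }"#).unwrap_err();
    println!("{err}");
}
```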
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
lib/llm/src/model_card.rs
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.
Applied to files:
components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.
Applied to files:
components/src/dynamo/vllm/handlers.py
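A minimal sketch of the access pattern these learnings describe (assumes vLLM is installed and exposes `TokensPrompt` from `vllm.inputs`; the token IDs are placeholders):

```python
from vllm.inputs import TokensPrompt

# TokensPrompt is a TypedDict, so instances are plain dicts at runtime.
prompt = TokensPrompt(prompt_token_ids=[101, 7592, 102])

token_ids = prompt["prompt_token_ids"]  # dictionary-style access: works
print(token_ids)

# prompt.prompt_token_ids              # attribute access: AttributeError
```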
🧬 Code graph analysis (2)
lib/llm/src/model_card.rs (3)
lib/llm/src/tokenizers.rs (1)
- tokenizer (340-342)

lib/llm/src/common/checked_file.rs (1)
- from_disk (43-57)

lib/llm/src/local_model.rs (1)
- custom_template_path (153-156)
components/src/dynamo/vllm/handlers.py (1)
components/src/dynamo/common/utils/input_params.py (1)
- get_input_param (9-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: clippy (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/src/dynamo/vllm/handlers.py (1)
949-954: LGTM! Correctly updated to use token-based input.

The DecodeWorkerHandler's text-mode path now correctly uses `input_tokens` with `TokensPrompt`, aligning with the InputParamManager changes that now return encoded tokens.
```rust
/// Check if our model only has config fields for a Mistral-format model.
fn is_exclusively_mistral_model(directory: &Path) -> bool {
    !directory.join("config.json").exists() && directory.join("params.json").exists()
}
```
Verify params.json existence before returning true.
The function checks that config.json doesn't exist but only verifies the presence of params.json without ensuring it's actually a valid file. Consider what happens if params.json exists but is unreadable or if the directory contains both files.
Apply this diff to make the check more robust:
```diff
 fn is_exclusively_mistral_model(directory: &Path) -> bool {
-    !directory.join("config.json").exists() && directory.join("params.json").exists()
+    let config_json = directory.join("config.json");
+    let params_json = directory.join("params.json");
+    !config_json.exists() && params_json.exists() && params_json.is_file()
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Check if our model only has config fields for a Mistral-format model.
fn is_exclusively_mistral_model(directory: &Path) -> bool {
    let config_json = directory.join("config.json");
    let params_json = directory.join("params.json");
    !config_json.exists() && params_json.exists() && params_json.is_file()
}
```
🤖 Prompt for AI Agents
In lib/llm/src/model_card.rs around lines 165 to 168, the current check returns
true if config.json is missing and params.json exists without verifying
params.json is a regular, readable file or guarding against both files present;
update the function to (1) confirm config.json does not exist, (2) check
params.json exists AND fs::metadata(params.json).is_file() (or attempt to open
it) to ensure it's a file and readable, and (3) return false if both config.json
and params.json are present; use proper error-safe APIs (metadata or File::open)
rather than just Path::exists to make the check robust.
🧩 Analysis chain
🏁 Script executed:
# First, get the full context of the function and surrounding code
cat -n lib/llm/src/model_card.rs | sed -n '155,180p'

Repository: ai-dynamo/dynamo
Length of output: 1005
🏁 Script executed:
# Check the struct definition of ModelDeploymentCard
rg -n 'struct ModelDeploymentCard' -A 20 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 887
🏁 Script executed:
# Now run the verification script as suggested in the comment
# Search for direct field access that might panic on None
rg -nP --type=rust -C3 '\.model_info\s*\.\s*unwrap|\.tokenizer\s*\.\s*unwrap|\.gen_config\s*\.\s*unwrap|\.prompt_formatter\s*\.\s*unwrap' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Search for pattern matching that doesn't handle None
rg -nP --type=rust -C3 'model_info:\s*Some\(|tokenizer:\s*Some\(' -g '!**/test/**' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Check if there are other ModelDeploymentCard construction paths
ast-grep --pattern 'ModelDeploymentCard { $$$ }' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 369
🏁 Script executed:
# Also check the sections mentioned: 507-518, 521-544
cat -n lib/llm/src/model_card.rs | sed -n '500,550p'

Repository: ai-dynamo/dynamo
Length of output: 2313
🏁 Script executed:
# See the full construction and error handling context
cat -n lib/llm/src/model_card.rs | sed -n '545,570p'

Repository: ai-dynamo/dynamo
Length of output: 1082
🏁 Script executed:
# Check the function signature and return type to understand error handling
rg -n 'fn.*from_disk.*ModelDeploymentCard' -B5 -A2 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 42
🏁 Script executed:
# Check how ModelInfoType::from_disk handles errors
rg -n 'ModelInfoType::from_disk' -B2 -A10 lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 659
🏁 Script executed:
# Look for any other calls to is_exclusively_mistral_model to understand its usage
rg -n 'is_exclusively_mistral_model' lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 200
🏁 Script executed:
# Check if there are any tests or documentation about mistral model handling
rg -n 'mistral' -i lib/llm/src/model_card.rs

Repository: ai-dynamo/dynamo
Length of output: 617
The mistral-format detection logic is sound, but clarify edge case handling.
The detection approach is correct: it requires both the absence of config.json AND presence of params.json for exclusive mistral detection. However, address one edge case:
If a mistral model directory lacks params.json: The function returns false, causing the code to attempt regular model loading via ModelInfoType::from_disk(local_path)?. This will fail if the model truly doesn't have the expected fields. Either document this as intentional (params.json is required for mistral format) or add explicit validation.
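One way to add that explicit validation, as a hedged editor sketch (the function name and error wording are illustrative, not the PR's code; assumes the crate already depends on `anyhow`):

```rust
use std::path::Path;

/// Illustrative replacement for the boolean check: distinguish HF layout,
/// Mistral-format layout, and "neither", failing early in the last case.
fn detect_mistral_format(directory: &Path) -> anyhow::Result<bool> {
    let has_config = directory.join("config.json").is_file();
    let has_params = directory.join("params.json").is_file();
    match (has_config, has_params) {
        (true, _) => Ok(false),          // HF checkpoint: config.json wins
        (false, true) => Ok(true),       // Mistral format: params.json only
        (false, false) => anyhow::bail!( // neither layout: fail with context
            "{} contains neither config.json (HF) nor params.json (Mistral format)",
            directory.display()
        ),
    }
}
```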
The downstream concerns are not issues:
- Downstream code at line 361 explicitly expects `tokenizer` to be `None` for mistral models
- All optional fields (`model_info`, `tokenizer`, `gen_config`, `prompt_formatter`) are designed as `Option<T>` and properly handled
- Only one `ModelDeploymentCard` construction path exists (line 249), using `..Default::default()`, which is consistent
```rust
anyhow::bail!(
    "Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
);
```
Clarify the placeholder pattern in the error message.
The error message uses --use-<framework>-tokenizer as a placeholder, but users might not understand they need to replace <framework> with their specific framework (e.g., vllm). Consider being more explicit or providing examples.
Apply this diff:
```diff
 anyhow::bail!(
-    "Blank ModelDeploymentCard does not have a tokenizer. Is this a mistral model? If so, the `--use-<framework>-tokenizer` flag in the engine command is required."
+    "Blank ModelDeploymentCard does not have a tokenizer. Is this a Mistral-format model? If so, use the `--use-vllm-tokenizer` (or equivalent framework-specific) flag in the engine command."
 );
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
anyhow::bail!(
    "Blank ModelDeploymentCard does not have a tokenizer. Is this a Mistral-format model? If so, use the `--use-vllm-tokenizer` (or equivalent framework-specific) flag in the engine command."
);
```
Signed-off-by: jthomson04 <[email protected]>
`--use-vllm-tokenizer` flag.
`config.json`, `generation_config.json`, etc. if it's an HF checkpoint.

Summary by CodeRabbit
New Features
Improvements