[Feat] Helm: add support for per-model tolerations #897

Open

AlexanderSing wants to merge 2 commits into vllm-project:main from
AlexanderSing:feat/make-tolerations-configurable-per-modelspec

Conversation

@AlexanderSing

Summary

  • Adds support for per-model tolerations in modelSpec, allowing individual model deployments to have their own Kubernetes tolerations in addition to or instead of the global servingEngineSpec.tolerations
  • Introduces a tolerationsPolicy field per model spec with two modes: append (default) merges global + model tolerations, and override replaces global tolerations entirely for that model
  • Updates values.schema.json, values.yaml, and values-example.yaml with documentation and schema definitions for the new fields
  • Adds a comprehensive Helm unit test suite (tolerations_test.yaml) covering all policy combinations and edge cases
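A `values.yaml` fragment illustrating the new fields might look like the following sketch. The `tolerations` and `tolerationsPolicy` fields under `modelSpec` and the global `servingEngineSpec.tolerations` are taken from the PR summary; the model names and taint keys are purely illustrative:

```yaml
servingEngineSpec:
  tolerations:                        # global: applies to every model by default
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  modelSpec:
    - name: "llama3"
      tolerationsPolicy: "append"     # default: global + model tolerations merged
      tolerations:
        - key: "gpu-tier"
          operator: "Equal"
          value: "a100"
          effect: "NoSchedule"
    - name: "opt125m"
      tolerationsPolicy: "override"   # model tolerations replace the global list
      tolerations:
        - key: "gpu-tier"
          operator: "Equal"
          value: "t4"
          effect: "NoSchedule"
```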

Motivation

In multi-model deployments, different models may run on different node pools with distinct taints. Previously, tolerations could only be set globally for all models. This change enables operators to schedule models on specific node types (e.g., different GPU tiers) without requiring separate Helm releases.

Behavior

| Scenario | Result |
| --- | --- |
| No model tolerations set | Global tolerations applied |
| Model tolerations + `append` policy (default) | Global + model tolerations merged |
| Model tolerations + `override` policy | Model tolerations only (global ignored) |
| Model tolerations, no global tolerations | Model tolerations applied |
| No tolerations anywhere | No `tolerations` field rendered |
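The table above can be expressed as a small decision function. The sketch below mirrors the described rendering rules in plain Python for clarity; it is not the chart's actual template code, and the function name is made up:

```python
# Hypothetical sketch of the per-model toleration resolution described above.
def effective_tolerations(global_tols, model_tols, policy="append"):
    """Return the tolerations list rendered for a single model deployment."""
    if not model_tols:
        # No model-level tolerations: fall back to the global list.
        # An empty result means no tolerations field is rendered at all.
        return list(global_tols or [])
    if policy == "override":
        return list(model_tols)  # global tolerations are ignored
    # "append" (the default): global tolerations first, then model ones.
    return list(global_tols or []) + list(model_tols)
```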

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces per-model tolerations for the vLLM multi-deployment Helm chart, allowing users to either append model-specific tolerations to global ones or override them entirely using a new tolerationsPolicy field. The changes include updates to the deployment template, schema validation, and a comprehensive test suite. Feedback was provided to simplify the template logic for calculating effective tolerations to improve readability and maintainability.
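One way such a simplification could look in the deployment template is sketched below. This is not the PR's actual diff; the `$modelSpec` variable is assumed to come from a surrounding `range` over `servingEngineSpec.modelSpec`, and the indentation depth is illustrative:

```yaml
{{- $global := .Values.servingEngineSpec.tolerations | default list }}
{{- $model := $modelSpec.tolerations | default list }}
{{- $policy := $modelSpec.tolerationsPolicy | default "append" }}
{{- /* override keeps only model tolerations; append concatenates both */ -}}
{{- $effective := ternary $model (concat $global $model) (eq $policy "override") }}
{{- with $effective }}
tolerations:
  {{- toYaml . | nindent 8 }}
{{- end }}
```

Because `with` skips an empty list, the `tolerations:` key is omitted entirely when neither global nor model tolerations are set, matching the behavior table in the PR description.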


@ruizhang0101 left a comment


Thanks for the PR! The Ray deployment (ray-cluster.yaml) also uses tolerations. Would it be worth changing that as well?
Also, could you add this to the Helm README too?

```yaml
# - nodeName: (optional) Directly assigns a pod to a specific node (e.g., "192.168.56.5"). When both nodeName and nodeSelectorTerms are defined, the preference is given to nodeName.
# - shmSize: (optional, string) The size of the shared memory, e.g., "20Gi"
# - enableLoRA: (optional, bool) Whether to enable LoRA, e.g., true
# - nodeName: (optional) Directly assigns a pod to a specific node (e.g., "192.168.56.5"). When both nodeName and nodeSelectorTerms are defined, the preference is given to nodeName.
```
Collaborator


The indentation seems a little bit off. Could you align it with the others?

```json
},
"containerPort": {
"required": [
"enabled"
```
Collaborator


The indentation seems a bit off.
