[Feat] Helm: add support for per-model tolerations #897

Open

AlexanderSing wants to merge 2 commits into vllm-project:main from
AlexanderSing:feat/make-tolerations-configurable-per-modelspec

Conversation

@AlexanderSing

Summary

  • Adds support for per-model tolerations in modelSpec, allowing individual model deployments to have their own Kubernetes tolerations in addition to or instead of the global servingEngineSpec.tolerations
  • Introduces a tolerationsPolicy field per model spec with two modes: append (default) merges global + model tolerations, and override replaces global tolerations entirely for that model
  • Updates values.schema.json, values.yaml, and values-example.yaml with documentation and schema definitions for the new fields
  • Adds a comprehensive Helm unit test suite (tolerations_test.yaml) covering all policy combinations and edge cases
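A `values.yaml` fragment illustrating the new fields might look like the following sketch. The `tolerations` and `tolerationsPolicy` fields under `modelSpec` and the global `servingEngineSpec.tolerations` are taken from the PR summary; the model names and taint keys are purely illustrative:

```yaml
servingEngineSpec:
  tolerations:                        # global: applies to every model by default
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  modelSpec:
    - name: "llama3"
      tolerationsPolicy: "append"     # default: global + model tolerations merged
      tolerations:
        - key: "gpu-tier"
          operator: "Equal"
          value: "a100"
          effect: "NoSchedule"
    - name: "opt125m"
      tolerationsPolicy: "override"   # model tolerations replace the global list
      tolerations:
        - key: "gpu-tier"
          operator: "Equal"
          value: "t4"
          effect: "NoSchedule"
```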

Motivation

In multi-model deployments, different models may run on different node pools with distinct taints. Previously, tolerations could only be set globally for all models. This change enables operators to schedule models on specific node types (e.g., different GPU tiers) without requiring separate Helm releases.

Behavior

| Scenario | Result |
| --- | --- |
| No model tolerations set | Global tolerations applied |
| Model tolerations + `append` policy (default) | Global + model tolerations merged |
| Model tolerations + `override` policy | Model tolerations only (global ignored) |
| Model tolerations, no global tolerations | Model tolerations applied |
| No tolerations anywhere | No `tolerations` field rendered |
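The table above can be expressed as a small decision function. The sketch below mirrors the described rendering rules in plain Python for clarity; it is not the chart's actual template code, and the function name is made up:

```python
# Hypothetical sketch of the per-model toleration resolution described above.
def effective_tolerations(global_tols, model_tols, policy="append"):
    """Return the tolerations list rendered for a single model deployment."""
    if not model_tols:
        # No model-level tolerations: fall back to the global list.
        # An empty result means no tolerations field is rendered at all.
        return list(global_tols or [])
    if policy == "override":
        return list(model_tols)  # global tolerations are ignored
    # "append" (the default): global tolerations first, then model ones.
    return list(global_tols or []) + list(model_tols)
```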

Signed-off-by: Alexander Sing <AlexanderSing@live.de>

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces per-model tolerations for the vLLM multi-deployment Helm chart, allowing users to either append model-specific tolerations to global ones or override them entirely using a new tolerationsPolicy field. The changes include updates to the deployment template, schema validation, and a comprehensive test suite. Feedback was provided to simplify the template logic for calculating effective tolerations to improve readability and maintainability.
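One way such a simplification could look in the deployment template is sketched below. This is not the PR's actual diff; the `$modelSpec` variable is assumed to come from a surrounding `range` over `servingEngineSpec.modelSpec`, and the indentation depth is illustrative:

```yaml
{{- $global := .Values.servingEngineSpec.tolerations | default list }}
{{- $model := $modelSpec.tolerations | default list }}
{{- $policy := $modelSpec.tolerationsPolicy | default "append" }}
{{- /* override keeps only model tolerations; append concatenates both */ -}}
{{- $effective := ternary $model (concat $global $model) (eq $policy "override") }}
{{- with $effective }}
tolerations:
  {{- toYaml . | nindent 8 }}
{{- end }}
```

Because `with` skips an empty list, the `tolerations:` key is omitted entirely when neither global nor model tolerations are set, matching the behavior table in the PR description.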


@ruizhang0101 left a comment


Thanks for the PR! The Ray deployment (ray-cluster.yaml) also uses tolerations. Would it be worth changing that as well?
Also, could you add this to the Helm README too?

```yaml
# - nodeName: (optional) Directly assigns a pod to a specific node (e.g., "192.168.56.5"). When both nodeName and nodeSelectorTerms are defined, the preference is given to nodeName.
# - shmSize: (optional, string) The size of the shared memory, e.g., "20Gi"
# - enableLoRA: (optional, bool) Whether to enable LoRA, e.g., true
# - nodeName: (optional) Directly assigns a pod to a specific node (e.g., "192.168.56.5"). When both nodeName and nodeSelectorTerms are defined, the preference is given to nodeName.
```
Collaborator


The indentation seems a little bit off. Could you align it with the others?

```json
},
"containerPort": {
"required": [
"enabled"
```
Collaborator


The indentation seems a bit off.
