feat: add ServiceMonitor template for external Prometheus#866
feat: add ServiceMonitor template for external Prometheus#866NJX-njx wants to merge 1 commit intovllm-project:mainfrom
Conversation
…ct#606) Add a conditional ServiceMonitor Helm template that allows teams with existing Prometheus stacks to scrape vLLM metrics without deploying the bundled observability stack. New configuration in values.yaml: - externalPrometheus.enabled: toggle ServiceMonitor creation - externalPrometheus.namespace: target namespace for the ServiceMonitor - externalPrometheus.labels: labels for Prometheus serviceMonitorSelector - externalPrometheus.serviceSelector: custom service selector - externalPrometheus.portName: service port name to scrape - externalPrometheus.path: metrics endpoint path - externalPrometheus.interval: scrape interval - externalPrometheus.scrapeTimeout: scrape timeout Files changed: - helm/templates/servicemonitor.yaml (new) - helm/values.yaml (added externalPrometheus config) - helm/values.schema.json (added schema definition)
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the vLLM Helm chart by introducing robust support for external Prometheus monitoring. It provides a new Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Adds Helm support for creating a ServiceMonitor when teams use an existing Prometheus Operator stack, so vLLM metrics can be scraped without installing the bundled observability components.
Changes:
- Added new
externalPrometheusconfiguration block inhelm/values.yaml - Added JSON schema entries for
externalPrometheusinhelm/values.schema.json - Added a conditional
helm/templates/servicemonitor.yamlto render aServiceMonitorwhenexternalPrometheus.enabledis set
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| helm/values.yaml | Introduces externalPrometheus configuration knobs (namespace, labels, selector, endpoint settings). |
| helm/values.schema.json | Adds schema definition for externalPrometheus values validation/tooling. |
| helm/templates/servicemonitor.yaml | Implements the conditional ServiceMonitor manifest for external Prometheus Operator setups. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| {{- else }} | ||
| matchLabels: | ||
| app.kubernetes.io/instance: {{ .Release.Name }} | ||
| app.kubernetes.io/managed-by: helm |
There was a problem hiding this comment.
The default spec.selector.matchLabels is broad enough to match the router and cache-server Services (they share app.kubernetes.io/instance and app.kubernetes.io/managed-by). Since those Services don’t expose a service-port port, this ServiceMonitor will create invalid/noisy targets by default. Consider tightening the default selector to only match serving-engine Services (e.g., include app.kubernetes.io/component: serving-engine), and rely on externalPrometheus.serviceSelector for other cases (router/cache).
| app.kubernetes.io/managed-by: helm | |
| app.kubernetes.io/managed-by: helm | |
| app.kubernetes.io/component: serving-engine |
| {{- if .Values.externalPrometheus.namespace }} | ||
| namespaceSelector: | ||
| matchNames: | ||
| - {{ .Release.Namespace }} |
There was a problem hiding this comment.
This introduces new conditional rendering logic for an external ServiceMonitor, but there’s no corresponding Helm unittest coverage. Adding a small test case (enabled/disabled, namespaceSelector behavior, and default selector/portName rendering) would help prevent regressions.
| - {{ .Release.Namespace }} | |
| - {{ .Values.externalPrometheus.namespace }} |
There was a problem hiding this comment.
Code Review
This pull request introduces a ServiceMonitor template for integrating with an external Prometheus, which is a valuable addition. The implementation is well-structured, providing flexibility through values.yaml. I've made a few suggestions in helm/templates/servicemonitor.yaml to improve the template's conciseness and maintainability by using common Helm templating patterns. Overall, the changes are solid.
| {{- if .Values.externalPrometheus }} | ||
| {{- if .Values.externalPrometheus.enabled }} |
There was a problem hiding this comment.
For better readability and to make the template more concise, you can combine these two if statements into a single one using the and function. This is a common practice in Helm templates. You would also need to remove one of the {{- end }} statements at the end of the file (e.g., line 45).
{{- if and .Values.externalPrometheus .Values.externalPrometheus.enabled }}| {{- if .Values.externalPrometheus.namespace }} | ||
| namespace: {{ .Values.externalPrometheus.namespace }} | ||
| {{- else }} | ||
| namespace: {{ .Release.Namespace }} | ||
| {{- end }} |
| {{- if .Values.externalPrometheus.path }} | ||
| path: {{ .Values.externalPrometheus.path }} | ||
| {{- else }} | ||
| path: /metrics | ||
| {{- end }} |
There was a problem hiding this comment.
Summary
Closes #606
Adds a conditional
ServiceMonitorHelm template that allows teams with existing Prometheus stacks (e.g., kube-prometheus-stack, EKS Blueprint addons) to scrape vLLM metrics without deploying the bundled observability stack.Motivation
Currently, vLLM metrics are only scraped when the bundled observability stack is installed. Users with pre-existing Prometheus setups must manually create and apply ServiceMonitor resources.
Configuration
New
externalPrometheussection invalues.yaml:Changes
helm/templates/servicemonitor.yaml(new): Conditional ServiceMonitor template with namespaceSelector support for cross-namespace monitoringhelm/values.yaml: AddedexternalPrometheusconfiguration sectionhelm/values.schema.json: Added schema definition forexternalPrometheusBackward Compatibility
Fully backward compatible - disabled by default (
enabled: false). Existing deployments are unaffected.