Skip to content

feat: add ServiceMonitor template for external Prometheus#866

Closed
NJX-njx wants to merge 1 commit intovllm-project:mainfrom
NJX-njx:feat/606-servicemonitor-support
Closed

feat: add ServiceMonitor template for external Prometheus#866
NJX-njx wants to merge 1 commit intovllm-project:mainfrom
NJX-njx:feat/606-servicemonitor-support

Conversation

@NJX-njx
Copy link
Copy Markdown

@NJX-njx NJX-njx commented Mar 4, 2026

Summary

Closes #606

Adds a conditional ServiceMonitor Helm template that allows teams with existing Prometheus stacks (e.g., kube-prometheus-stack, EKS Blueprint addons) to scrape vLLM metrics without deploying the bundled observability stack.

Motivation

Currently, vLLM metrics are only scraped when the bundled observability stack is installed. Users with pre-existing Prometheus setups must manually create and apply ServiceMonitor resources.

Configuration

New externalPrometheus section in values.yaml:

externalPrometheus:
  enabled: false
  namespace: "monitoring"
  labels:
    release: "kube-prometheus-stack"
  serviceSelector: {}
  portName: "service-port"
  path: "/metrics"
  interval: "30s"
  scrapeTimeout: ""

Changes

  • helm/templates/servicemonitor.yaml (new): Conditional ServiceMonitor template with namespaceSelector support for cross-namespace monitoring
  • helm/values.yaml: Added externalPrometheus configuration section
  • helm/values.schema.json: Added schema definition for externalPrometheus

Backward Compatibility

Fully backward compatible - disabled by default (enabled: false). Existing deployments are unaffected.

…ct#606)

Add a conditional ServiceMonitor Helm template that allows teams with
existing Prometheus stacks to scrape vLLM metrics without deploying the
bundled observability stack.

New configuration in values.yaml:
- externalPrometheus.enabled: toggle ServiceMonitor creation
- externalPrometheus.namespace: target namespace for the ServiceMonitor
- externalPrometheus.labels: labels for Prometheus serviceMonitorSelector
- externalPrometheus.serviceSelector: custom service selector
- externalPrometheus.portName: service port name to scrape
- externalPrometheus.path: metrics endpoint path
- externalPrometheus.interval: scrape interval
- externalPrometheus.scrapeTimeout: scrape timeout

Files changed:
- helm/templates/servicemonitor.yaml (new)
- helm/values.yaml (added externalPrometheus config)
- helm/values.schema.json (added schema definition)
Copilot AI review requested due to automatic review settings March 4, 2026 10:37
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the vLLM Helm chart by introducing robust support for external Prometheus monitoring. It provides a new ServiceMonitor Helm template and corresponding configuration options, allowing users with existing Prometheus setups to seamlessly integrate vLLM metrics collection. This change significantly improves flexibility and reduces resource duplication for users who manage their own Prometheus instances, streamlining their monitoring infrastructure.

Highlights

  • External Prometheus Integration: Introduced a conditional Helm template for a Kubernetes ServiceMonitor resource, enabling external Prometheus instances to scrape vLLM metrics without deploying the bundled observability stack.
  • Configuration Options: Added a new externalPrometheus section in values.yaml and values.schema.json for detailed configuration of the ServiceMonitor, including namespace, labels, service selector, port, path, interval, and scrape timeout.
  • Backward Compatibility: The new external Prometheus integration feature is disabled by default, ensuring that existing deployments remain unaffected.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • helm/templates/servicemonitor.yaml
    • Added a new Helm template for a Kubernetes ServiceMonitor resource.
  • helm/values.schema.json
    • Added a new externalPrometheus object to the schema, defining its properties and types.
  • helm/values.yaml
    • Introduced the externalPrometheus configuration block with default values and comments.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Helm support for creating a ServiceMonitor when teams use an existing Prometheus Operator stack, so vLLM metrics can be scraped without installing the bundled observability components.

Changes:

  • Added new externalPrometheus configuration block in helm/values.yaml
  • Added JSON schema entries for externalPrometheus in helm/values.schema.json
  • Added a conditional helm/templates/servicemonitor.yaml to render a ServiceMonitor when externalPrometheus.enabled is set

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
helm/values.yaml Introduces externalPrometheus configuration knobs (namespace, labels, selector, endpoint settings).
helm/values.schema.json Adds schema definition for externalPrometheus values validation/tooling.
helm/templates/servicemonitor.yaml Implements the conditional ServiceMonitor manifest for external Prometheus Operator setups.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

{{- else }}
matchLabels:
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: helm
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default spec.selector.matchLabels is broad enough to match the router and cache-server Services (they share app.kubernetes.io/instance and app.kubernetes.io/managed-by). Since those Services don’t expose a service-port port, this ServiceMonitor will create invalid/noisy targets by default. Consider tightening the default selector to only match serving-engine Services (e.g., include app.kubernetes.io/component: serving-engine), and rely on externalPrometheus.serviceSelector for other cases (router/cache).

Suggested change
app.kubernetes.io/managed-by: helm
app.kubernetes.io/managed-by: helm
app.kubernetes.io/component: serving-engine

Copilot uses AI. Check for mistakes.
{{- if .Values.externalPrometheus.namespace }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces new conditional rendering logic for an external ServiceMonitor, but there’s no corresponding Helm unittest coverage. Adding a small test case (enabled/disabled, namespaceSelector behavior, and default selector/portName rendering) would help prevent regressions.

Suggested change
- {{ .Release.Namespace }}
- {{ .Values.externalPrometheus.namespace }}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a ServiceMonitor template for integrating with an external Prometheus, which is a valuable addition. The implementation is well-structured, providing flexibility through values.yaml. I've made a few suggestions in helm/templates/servicemonitor.yaml to improve the template's conciseness and maintainability by using common Helm templating patterns. Overall, the changes are solid.

Comment on lines +1 to +2
{{- if .Values.externalPrometheus }}
{{- if .Values.externalPrometheus.enabled }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability and to make the template more concise, you can combine these two if statements into a single one using the and function. This is a common practice in Helm templates. You would also need to remove one of the {{- end }} statements at the end of the file (e.g., line 45).

{{- if and .Values.externalPrometheus .Values.externalPrometheus.enabled }}

Comment on lines +7 to +11
{{- if .Values.externalPrometheus.namespace }}
namespace: {{ .Values.externalPrometheus.namespace }}
{{- else }}
namespace: {{ .Release.Namespace }}
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This if-else block for setting the namespace can be simplified using the default function. This makes the template more readable and concise.

  namespace: {{ .Values.externalPrometheus.namespace | default .Release.Namespace }}

Comment on lines +35 to +39
{{- if .Values.externalPrometheus.path }}
path: {{ .Values.externalPrometheus.path }}
{{- else }}
path: /metrics
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This if-else block can be simplified using the default function, which is consistent with how portName and interval are handled in this template. This improves readability and reduces verbosity.

      path: {{ .Values.externalPrometheus.path | default "/metrics" }}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature: Support ServiceMonitors for External Prometheus stack

3 participants