@amirejaz amirejaz commented Nov 6, 2025

Summary

This PR implements a unified workload management system that provides a consistent interface for managing MCP server workloads across both CLI (Docker/Podman) and Kubernetes environments. This follows the same architectural pattern established by groups.Manager and enables platform-agnostic workload operations throughout ToolHive.

Motivation

Following the successful unification of group management, we now extend this pattern to workload management. This enables:

  • Consistent API: Same interface for workload operations regardless of runtime
  • Unified Discovery: Enables vmcp aggregator to discover backends from both CLI and Kubernetes workloads
  • Code Reusability: Platform-agnostic code can work with workloads without runtime-specific logic
  • Future-Proof: Easier to add new runtime environments or extend functionality

Implementation

Unified Manager Interface

The workloads.Manager interface provides a comprehensive set of operations for managing workloads:

Lifecycle Operations:

  • RunWorkload / RunWorkloadDetached - Start workloads in foreground or background
  • StopWorkloads - Stop running workloads
  • DeleteWorkloads - Remove workloads
  • RestartWorkloads - Restart workloads
  • UpdateWorkload - Update workload configuration

Query Operations:

  • GetWorkload - Retrieve workload details and status
  • ListWorkloads - List all workloads with optional filtering
  • ListWorkloadsInGroup - List workloads in a specific group
  • DoesWorkloadExist - Check workload existence

Utility Operations:

  • GetLogs / GetProxyLogs - Retrieve workload logs
  • MoveToGroup - Move workloads between groups
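
As a rough illustration of how callers might program against such an interface, here is a minimal sketch. The method signatures, the memoryManager type, and the demo helper are assumptions made up for this example; they are not the actual workloads.Manager definitions from this PR.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// Manager is an illustrative subset of the operations listed above.
// These signatures are assumptions, not the actual workloads.Manager.
type Manager interface {
	DoesWorkloadExist(ctx context.Context, name string) (bool, error)
	ListWorkloadsInGroup(ctx context.Context, group string) ([]string, error)
	MoveToGroup(ctx context.Context, names []string, from, to string) error
}

// memoryManager is a hypothetical toy implementation that maps
// workload name -> group name.
type memoryManager struct {
	groupOf map[string]string
}

// Compile-time check that memoryManager satisfies the interface.
var _ Manager = (*memoryManager)(nil)

func (m *memoryManager) DoesWorkloadExist(_ context.Context, name string) (bool, error) {
	_, ok := m.groupOf[name]
	return ok, nil
}

func (m *memoryManager) ListWorkloadsInGroup(_ context.Context, group string) ([]string, error) {
	var names []string
	for name, g := range m.groupOf {
		if g == group {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names, nil
}

func (m *memoryManager) MoveToGroup(_ context.Context, names []string, from, to string) error {
	// Validate every name first so a failed move changes nothing.
	for _, name := range names {
		if g, ok := m.groupOf[name]; !ok || g != from {
			return fmt.Errorf("workload %q is not in group %q", name, from)
		}
	}
	for _, name := range names {
		m.groupOf[name] = to
	}
	return nil
}

// demo moves one workload into a group and lists that group,
// touching the implementation only through the interface.
func demo() ([]string, error) {
	ctx := context.Background()
	var mgr Manager = &memoryManager{groupOf: map[string]string{
		"fetch":  "default",
		"github": "engineering-team",
	}}
	if err := mgr.MoveToGroup(ctx, []string{"fetch"}, "default", "engineering-team"); err != nil {
		return nil, err
	}
	return mgr.ListWorkloadsInGroup(ctx, "engineering-team")
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names) // [fetch github]
}
```

Because demo only holds a Manager value, the same calling code would work unchanged with any other implementation of the interface, which is the property the PR is after.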

Platform-Specific Implementations

CLI Manager (cliManager)

  • Manages Docker/Podman containers
  • Uses filesystem-based storage (runconfig.json)
  • Supports full lifecycle operations
  • Handles container networking, secrets, and environment variables

Kubernetes Manager (k8sManager)

  • Manages MCPServer CRDs via Kubernetes API
  • Provides read operations and group management
  • Integrates with ToolHive operator for lifecycle management
  • Maps MCPServer CRDs to workload representation
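
To make the CRD-to-workload mapping more concrete, the following sketch converts a flattened stand-in for an MCPServer resource into a generic workload record. The struct fields, the phase names, and the status strings are all assumptions for illustration; the real CRD types and the mapping in k8s_manager.go may differ.

```go
package main

import "fmt"

// mcpServer is a hypothetical, flattened stand-in for the MCPServer
// custom resource. The real CRD type has a richer spec and status.
type mcpServer struct {
	Name      string
	Namespace string
	GroupRef  string
	Phase     string
}

// workload is a hypothetical platform-neutral representation.
type workload struct {
	Name      string
	Namespace string
	Group     string
	Status    string
}

// statusFromPhase translates an assumed CRD phase into an assumed
// workload status. Unrecognized phases map to "unknown".
func statusFromPhase(phase string) string {
	switch phase {
	case "Running":
		return "running"
	case "Pending":
		return "starting"
	case "Failed":
		return "error"
	case "Terminating":
		return "stopping"
	default:
		return "unknown"
	}
}

// toWorkload copies identifying fields and translates the phase.
func toWorkload(s mcpServer) workload {
	return workload{
		Name:      s.Name,
		Namespace: s.Namespace,
		Group:     s.GroupRef,
		Status:    statusFromPhase(s.Phase),
	}
}

func main() {
	w := toWorkload(mcpServer{
		Name:      "github",
		Namespace: "toolhive-system",
		GroupRef:  "engineering-team",
		Phase:     "Running",
	})
	fmt.Printf("%+v\n", w)
}
```

Keeping the translation in one small function means the rest of the code only ever sees the neutral workload type.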

Automatic Runtime Detection

The NewManager() factory function automatically detects the runtime environment:

  • Kubernetes mode: Returns k8sManager when TOOLHIVE_RUNTIME=kubernetes is set or the process is running inside a pod
  • CLI mode: Returns cliManager for Docker/Podman environments
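
The detection rule can be sketched roughly as follows. This is not the NewManager() code from the PR: the detectRuntime helper and the runtimeKind type are hypothetical, and using KUBERNETES_SERVICE_HOST (which Kubernetes injects into pod containers) as the in-pod check is an assumption about how "running in a pod" might be detected.

```go
package main

import (
	"fmt"
	"os"
)

// runtimeKind is a hypothetical label for the selected implementation.
type runtimeKind string

const (
	runtimeCLI        runtimeKind = "cli"
	runtimeKubernetes runtimeKind = "kubernetes"
)

// detectRuntime takes the environment lookup as a parameter so the
// rule can be exercised without mutating the process environment.
func detectRuntime(getenv func(string) string) runtimeKind {
	// An explicit override wins.
	if getenv("TOOLHIVE_RUNTIME") == "kubernetes" {
		return runtimeKubernetes
	}
	// Assumption: treat the presence of KUBERNETES_SERVICE_HOST as
	// "running in a pod".
	if getenv("KUBERNETES_SERVICE_HOST") != "" {
		return runtimeKubernetes
	}
	// Otherwise fall back to the Docker/Podman (CLI) mode.
	return runtimeCLI
}

func main() {
	fmt.Println(detectRuntime(os.Getenv))
}
```

Passing os.Getenv in production and a map-backed function in tests keeps the factory logic easy to verify.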

Key Features

Group Integration

  • Workloads can be assigned to groups at creation time
  • ListWorkloadsInGroup enables group-based discovery
  • MoveToGroup allows reorganizing workloads
  • Seamless integration with groups.Manager

Unified Backend Discovery

  • vmcp aggregator can now discover backends from both CLI and Kubernetes workloads
  • Single BackendDiscoverer implementation works across platforms
  • Automatic health status mapping from workload status
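
A minimal sketch of the status-to-health idea, assuming made-up status strings, health values, and struct shapes; the actual BackendDiscoverer types and mapping rules in pkg/vmcp/aggregator may look quite different.

```go
package main

import "fmt"

// health is a hypothetical backend health value.
type health string

const (
	healthHealthy   health = "healthy"
	healthUnhealthy health = "unhealthy"
	healthUnknown   health = "unknown"
)

// workload and backend are hypothetical, trimmed-down records.
type workload struct {
	Name   string
	Status string
	URL    string
}

type backend struct {
	Name   string
	URL    string
	Health health
}

// healthFor maps an assumed workload status to a backend health value.
func healthFor(status string) health {
	switch status {
	case "running":
		return healthHealthy
	case "error", "stopped", "unhealthy":
		return healthUnhealthy
	default:
		return healthUnknown
	}
}

// discover turns workloads into backends, skipping any workload
// without a URL because it cannot be reached by the aggregator.
func discover(ws []workload) []backend {
	backends := make([]backend, 0, len(ws))
	for _, w := range ws {
		if w.URL == "" {
			continue
		}
		backends = append(backends, backend{Name: w.Name, URL: w.URL, Health: healthFor(w.Status)})
	}
	return backends
}

func main() {
	backends := discover([]workload{
		{Name: "github", Status: "running", URL: "http://localhost:8080"},
		{Name: "fetch", Status: "starting", URL: "http://localhost:8081"},
		{Name: "orphan", Status: "running"},
	})
	for _, b := range backends {
		fmt.Printf("%s %s %s\n", b.Name, b.URL, b.Health)
	}
}
```

Since discover only consumes the neutral workload records, it does not need to know whether they came from containers or from MCPServer resources.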

Comprehensive Testing

  • Unit tests for both implementations
  • Table-driven tests for all operations
  • Mock-based testing for isolation
  • Edge case and error handling coverage

Files Added

  • pkg/workloads/cli_manager.go - CLI implementation (1205 lines)
  • pkg/workloads/cli_manager_test.go - CLI tests (1616 lines)
  • pkg/workloads/k8s_manager.go - Kubernetes implementation (351 lines)
  • pkg/workloads/k8s_manager_test.go - Kubernetes tests (777 lines)

Files Modified

  • pkg/workloads/manager.go - Simplified to factory functions and interface definition
  • pkg/workloads/manager_test.go - Reduced to factory function tests
  • pkg/vmcp/aggregator/discoverer.go - Updated to use unified manager
  • cmd/vmcp/app/commands.go - Updated to use unified discoverer

Benefits

  1. Consistency: Same API for workload operations across all environments
  2. Maintainability: Clear separation between platform-specific and shared logic
  3. Extensibility: Easy to add new runtime environments or operations
  4. Testability: Each implementation can be tested independently
  5. Integration: Enables unified features like vmcp backend discovery

Testing

  • All unit tests pass
  • Linting passes
  • Verified CLI workload operations (run, stop, delete, restart, logs)
  • Verified Kubernetes MCPServer operations (get, list, group operations)
  • Tested vmcp discovery with Kubernetes workloads
  • Verified group integration (ListWorkloadsInGroup, MoveToGroup)

Related

  • Follows the architectural pattern from groups.Manager unification
  • Enables unified backend discovery in vmcp aggregator (see pkg/vmcp/aggregator/discoverer.go)
  • Prepares foundation for future workload management features
  • Integrates with ToolHive operator for Kubernetes workload lifecycle

Example Usage

// Automatically selects the right implementation based on runtime
manager, err := workloads.NewManager(ctx)
if err != nil {
	return err
}

// Works the same way in CLI and Kubernetes.
// The result is not named "workloads" so it does not shadow the package.
groupWorkloads, err := manager.ListWorkloadsInGroup(ctx, "engineering-team")
if err != nil {
	return err
}

// Discover backends from workloads (used by vmcp)
discoverer := aggregator.NewBackendDiscoverer(manager, groupsManager, authConfig)
backends, err := discoverer.Discover(ctx, "engineering-team")
if err != nil {
	return err
}

@amirejaz amirejaz marked this pull request as draft November 6, 2025 16:45
codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 58.65385% with 344 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.25%. Comparing base (a5a0621) to head (480db30).
⚠️ Report is 1 commit behind head on main.

Files with missing lines             Patch %   Lines
pkg/workloads/cli_manager.go         47.85%    231 Missing and 61 partials ⚠️
pkg/workloads/manager.go             15.38%    20 Missing and 2 partials ⚠️
pkg/workloads/k8s_manager.go         90.75%    10 Missing and 6 partials ⚠️
pkg/vmcp/aggregator/discoverer.go    85.50%    8 Missing and 2 partials ⚠️
cmd/vmcp/app/commands.go             0.00%     4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2487      +/-   ##
==========================================
+ Coverage   55.01%   55.25%   +0.23%     
==========================================
  Files         292      294       +2     
  Lines       27904    28108     +204     
==========================================
+ Hits        15351    15530     +179     
- Misses      11144    11170      +26     
+ Partials     1409     1408       -1     

@dmjb dmjb left a comment

Marking this as "request changes" until we figure out how to split up this interface.
