
Model Manager

Centralized model broker and job runner for local and remote models. It exposes one service on port 5001 with two execution styles:

  • synchronous chat broker via /api/chat/completions
  • asynchronous queued jobs via /api/submit

What It Supports

  • Canonical routed model IDs such as ollama/qwen3:8b, openrouter/z-ai/glm-5.1, and openrouter/anthropic/claude-sonnet-4.6
  • Local Ollama text and vision models discovered from /api/tags
  • OpenRouter-backed remote models, including Anthropic and GLM routes
  • Z.AI-backed remote text/code/tool routes via the coding endpoint
  • Systems history telemetry for brokered chat requests and queued job execution

Quick Start

  1. Ensure Ollama is running on localhost:11434 for local models.
  2. Add any remote models you want to expose in config.local.yaml.
  3. Start the service:
systemctl --user start model-manager
  4. Call the broker:
curl -X POST http://localhost:5001/api/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai/glm-5",
    "messages": [{"role": "user", "content": "Reply with exactly OK"}]
  }'
  5. Or submit an async job:
curl -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/qwen3:8b", "prompt": "Hello world"}'
  6. Poll the job (see the scripted loop below):
curl http://localhost:5001/api/job/<job_id>
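
For scripting, steps 5 and 6 combine into a submit-and-poll loop. This is a minimal sketch: the job_id and status field names and the queued/running states are assumptions about the response shape, so adjust them to what the API actually returns.

# Submit, capture the job ID, then poll until the job settles.
job_id=$(curl -s -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/qwen3:8b", "prompt": "Hello world"}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["job_id"])')
# Assumed in-flight states: "queued" and "running".
while curl -s "http://localhost:5001/api/job/$job_id" \
    | grep -q '"status": *"\(queued\|running\)"'; do
  sleep 2
done
curl -s "http://localhost:5001/api/job/$job_id"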

Remote Model Configuration

Create config.local.yaml for site-specific settings. Example routed model setup:

REMOTE_MODELS:
  - name: "openrouter/anthropic/claude-sonnet-4.6"
    provider: "openrouter"
    upstream_model: "anthropic/claude-sonnet-4.6"
    capabilities: ["text"]
    family: "claude"
    parameter_size: "remote"
    metadata:
      label: "Claude Sonnet 4.6 via OpenRouter"
  - name: "zai/glm-5"
    provider: "zai"
    upstream_model: "glm-5"
    capabilities: ["text", "code", "tools"]
    family: "glm"
    parameter_size: "remote"
    metadata:
      label: "GLM-5 via Z.AI coding endpoint"
  - name: "openrouter/z-ai/glm-5"
    provider: "openrouter"
    upstream_model: "z-ai/glm-5"
    capabilities: ["text", "code", "tools"]
    family: "glm"
    parameter_size: "remote"
    metadata:
      label: "GLM-5 via OpenRouter"
  - name: "openrouter/z-ai/glm-5.1"
    provider: "openrouter"
    upstream_model: "z-ai/glm-5.1"
    capabilities: ["text", "code", "tools"]
    family: "glm"
    parameter_size: "remote"
    metadata:
      label: "GLM-5.1 via OpenRouter"

OPENROUTER_API_KEYS_JSON: "$SYSTEMS_ROOT/ai/modular-flow-engine/config/api_keys.json"
OPENROUTER_HTTP_REFERER: "https://localhost/model-manager"
OPENROUTER_X_TITLE: "model-manager"
OPENROUTER_TIMEOUT_SECONDS: 180

OpenRouter credentials can come from one of:

  • OPENROUTER_API_KEY in config.local.yaml
  • OPENROUTER_API_KEYS_JSON pointing at a JSON file with an openrouter key
  • OPENROUTER_API_KEY_FILE pointing at a plain-text key file
  • OPENROUTER_API_KEY in the environment

Z.AI credentials can come from one of:

  • ZAI_API_KEY in config.local.yaml
  • ZAI_API_KEYS_JSON pointing at a JSON file with a zai or glm key
  • ZAI_API_KEY_FILE pointing at a plain-text key file
  • ZAI_API_KEY in the environment
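
Both *_API_KEYS_JSON settings can point at the same file, as the example path above suggests. A minimal sketch of the assumed shape: the top-level names follow the bullets above, and the values are placeholders.

# Hypothetical shared keys file; real keys go where the placeholders are.
cat > api_keys.json <<'EOF'
{
  "openrouter": "YOUR_OPENROUTER_KEY",
  "zai": "YOUR_ZAI_KEY"
}
EOF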

The default bind host is 127.0.0.1. Override HTTP_HOST in config.local.yaml only if you intentionally want network exposure.
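
For example, to bind on all interfaces (a deliberate exposure decision, not a default):

# Assumes config.local.yaml sits in the service's working directory.
echo 'HTTP_HOST: "0.0.0.0"' >> config.local.yaml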

HTTP API

Endpoint               Method  Description
---------------------  ------  --------------------------------------------
/api/chat/completions  POST    Execute synchronous brokered chat completion
/api/submit            POST    Submit inference job
/api/job/<job_id>      GET     Get job status/result
/api/models            GET     List available models across providers
/api/models/refresh    POST    Refresh model list
/api/stats             GET     Queue and resource statistics
/api/health            GET     Health check
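
The read-only endpoints make a quick smoke test; these calls use only routes from the table above:

curl -s http://localhost:5001/api/health
curl -s http://localhost:5001/api/stats
curl -s http://localhost:5001/api/models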

Broker Chat

curl -X POST http://localhost:5001/api/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3:8b",
    "messages": [{"role": "user", "content": "Reply with exactly OK"}]
  }'

Submit Job

curl -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/anthropic/claude-sonnet-4.6",
    "prompt": "Answer in one line.",
    "system_prompt": "Be concise.",
    "priority": "high",
    "metadata": {"source": "manual-test"}
  }'

List Models

curl http://localhost:5001/api/models

Local models are exposed as canonical ollama/... routes. Configured remote models are exposed exactly as declared in REMOTE_MODELS; upstream_model defines the actual provider model ID behind each route.
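
The model-list response schema is not documented here, so as a rough check this greps the raw JSON for canonical local routes without assuming its exact layout:

# Hedged: pattern-match for ollama/ routes in whatever JSON comes back.
curl -s http://localhost:5001/api/models | grep -o '"ollama/[^"]*"' | sort -u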

CLI Tool

The ./cli tool provides commands for batch queries and vision analysis. It communicates with the running Model Manager service.

./cli batch-query items.txt "Is {item} a tool?"   # templated prompt per item (sample items.txt below)
./cli analyze photo.png                           # full vision analysis of an image
./cli quick photo.png "What color is the car?"    # one-off vision question
./cli count photo.png "people"                    # count matching objects in an image
./cli --version
./cli --print-defaults                            # show built-in defaults
./cli --print-resolved                            # show configuration after local overrides
./cli --print-config-schema                       # show the configuration schema
./cli --validate-config                           # validate the local configuration
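
The batch-query items file is presumably plain text with one item per line, each substituted into the {item} placeholder. A hypothetical items.txt:

# Hypothetical input file for the batch-query example above.
cat > items.txt <<'EOF'
hammer
banana
screwdriver
EOF
./cli batch-query items.txt "Is {item} a tool?"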

Architecture

  • HTTP API — synchronous broker plus async job queue on port 5001
  • Queue Manager — priority-based storage and batching
  • VRAM Scheduler — local-model load decisions based on GPU memory
  • Execution Engine — provider-aware inference execution
  • Model Registry — local discovery plus configured remote models
  • Resource Monitor — local GPU state via nvidia-smi and Ollama /api/ps

Remote models do not participate in local VRAM load/unload decisions. Synchronous broker requests to exact routed model IDs are executed immediately, while the async job API still goes through the queue and job-tracking path.
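
The Resource Monitor's two inputs can be inspected by hand with the same signals named above:

# GPU memory as nvidia-smi reports it.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
# Models currently loaded by Ollama.
curl -s http://localhost:11434/api/ps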

Development

cd "$SYSTEMS_ROOT/ai/model-manager"
PYTHONPATH=src python3 -m unittest discover -s tests
python3 -m compileall src tests

License

MIT License. See LICENSE.
