Centralized model broker and job runner for local and remote models. It exposes one service on port 5001 with two execution styles:
- Synchronous chat broker via `/api/chat/completions`
- Asynchronous queued jobs via `/api/submit`

It supports:

- Canonical routed model IDs such as `ollama/qwen3:8b`, `openrouter/z-ai/glm-5.1`, and `openrouter/anthropic/claude-sonnet-4.6`
- Local Ollama text and vision models discovered from `/api/tags`
- OpenRouter-backed remote models, including Anthropic and GLM routes
- Z.AI-backed remote text/code/tool routes via the coding endpoint
- Systems history telemetry for brokered chat requests and queued job execution
To get started:

- Ensure Ollama is running on `localhost:11434` for local models.
- Add any remote models you want to expose in `config.local.yaml`.
- Start the service: `systemctl --user start model-manager`
- Call the broker:

  ```bash
  curl -X POST http://localhost:5001/api/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai/glm-5",
      "messages": [{"role": "user", "content": "Reply with exactly OK"}]
    }'
  ```

- Or submit an async job:

  ```bash
  curl -X POST http://localhost:5001/api/submit \
    -H "Content-Type: application/json" \
    -d '{"model": "ollama/qwen3:8b", "prompt": "Hello world"}'
  ```

- Poll the job:

  ```bash
  curl http://localhost:5001/api/job/<job_id>
  ```

Create `config.local.yaml` for site-specific settings. Example routed model setup:

```yaml
REMOTE_MODELS:
- name: "openrouter/anthropic/claude-sonnet-4.6"
provider: "openrouter"
upstream_model: "anthropic/claude-sonnet-4.6"
capabilities: ["text"]
family: "claude"
parameter_size: "remote"
metadata:
label: "Claude Sonnet 4.6 via OpenRouter"
- name: "zai/glm-5"
provider: "zai"
upstream_model: "glm-5"
capabilities: ["text", "code", "tools"]
family: "glm"
parameter_size: "remote"
metadata:
label: "GLM-5 via Z.AI coding endpoint"
- name: "openrouter/z-ai/glm-5"
provider: "openrouter"
upstream_model: "z-ai/glm-5"
capabilities: ["text", "code", "tools"]
family: "glm"
parameter_size: "remote"
metadata:
label: "GLM-5 via OpenRouter"
- name: "openrouter/z-ai/glm-5.1"
provider: "openrouter"
upstream_model: "z-ai/glm-5.1"
capabilities: ["text", "code", "tools"]
family: "glm"
parameter_size: "remote"
metadata:
label: "GLM-5.1 via OpenRouter"
OPENROUTER_API_KEYS_JSON: "$SYSTEMS_ROOT/ai/modular-flow-engine/config/api_keys.json"
OPENROUTER_HTTP_REFERER: "https://localhost/model-manager"
OPENROUTER_X_TITLE: "model-manager"
OPENROUTER_TIMEOUT_SECONDS: 180
```

OpenRouter credentials can come from one of:
- `OPENROUTER_API_KEY` in `config.local.yaml`
- `OPENROUTER_API_KEYS_JSON` pointing at a JSON file with an `openrouter` key
- `OPENROUTER_API_KEY_FILE` pointing at a plain-text key file
- `OPENROUTER_API_KEY` in the environment
Z.AI credentials can come from one of:
- `ZAI_API_KEY` in `config.local.yaml`
- `ZAI_API_KEYS_JSON` pointing at a JSON file with a `zai` or `glm` key
- `ZAI_API_KEY_FILE` pointing at a plain-text key file
- `ZAI_API_KEY` in the environment
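Both providers follow the same lookup pattern. The sketch below illustrates one plausible resolution order (config key, then JSON file, then key file, then environment); the README only states that any one of these sources works, and the helper name `resolve_api_key` is an illustrative assumption rather than the service's actual loader.

```python
import json
import os
from pathlib import Path

def resolve_api_key(config: dict, provider: str, json_keys: tuple) -> str | None:
    """Illustrative credential lookup mirroring the sources listed above.

    provider is "OPENROUTER" or "ZAI"; json_keys are the accepted entries in the
    *_API_KEYS_JSON file, e.g. ("openrouter",) or ("zai", "glm").
    """
    # Direct key in config.local.yaml
    direct = config.get(f"{provider}_API_KEY")
    if direct:
        return direct

    # JSON file referenced by *_API_KEYS_JSON
    json_path = config.get(f"{provider}_API_KEYS_JSON")
    if json_path and Path(json_path).is_file():
        data = json.loads(Path(json_path).read_text())
        for key in json_keys:
            if data.get(key):
                return data[key]

    # Plain-text key file referenced by *_API_KEY_FILE
    key_file = config.get(f"{provider}_API_KEY_FILE")
    if key_file and Path(key_file).is_file():
        return Path(key_file).read_text().strip()

    # Environment variable fallback
    return os.environ.get(f"{provider}_API_KEY")
```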
The default bind host is `127.0.0.1`. Override `HTTP_HOST` in `config.local.yaml` only if you intentionally want network exposure.
| Endpoint | Method | Description |
|---|---|---|
| `/api/chat/completions` | POST | Execute synchronous brokered chat completion |
| `/api/submit` | POST | Submit inference job |
| `/api/job/<job_id>` | GET | Get job status/result |
| `/api/models` | GET | List available models across providers |
| `/api/models/refresh` | POST | Refresh model list |
| `/api/stats` | GET | Queue and resource statistics |
| `/api/health` | GET | Health check |
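The monitoring routes are not covered by the curl examples below; a minimal Python check against them could look like this. It assumes only the endpoint paths above and the default local bind address, and makes no assumptions about response fields:

```python
import requests

BASE_URL = "http://localhost:5001"  # default local bind from this README

def service_health() -> dict:
    """Query the health and stats endpoints and return their JSON payloads as-is."""
    health = requests.get(f"{BASE_URL}/api/health", timeout=10)
    health.raise_for_status()

    stats = requests.get(f"{BASE_URL}/api/stats", timeout=10)
    stats.raise_for_status()

    # The response schemas are not documented here, so just pass them through.
    return {"health": health.json(), "stats": stats.json()}

if __name__ == "__main__":
    print(service_health())
```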
Example requests:

```bash
curl -X POST http://localhost:5001/api/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/qwen3:8b",
    "messages": [{"role": "user", "content": "Reply with exactly OK"}]
  }'
```

```bash
curl -X POST http://localhost:5001/api/submit \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter/anthropic/claude-sonnet-4.6",
    "prompt": "Answer in one line.",
    "system_prompt": "Be concise.",
    "priority": "high",
    "metadata": {"source": "manual-test"}
  }'
```

```bash
curl http://localhost:5001/api/models
```

Local models are exposed as canonical `ollama/...` routes. Remote configured models are exposed exactly as declared in `REMOTE_MODELS`. Use `upstream_model` to define the actual provider model ID behind each route.
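For scripted use of the async path, a small submit-and-poll loop along the following lines may help. It is a sketch that relies only on the endpoints and request fields shown above; the `job_id` and `status` fields it reads from responses are assumptions, since this README does not document the response schemas.

```python
import time
import requests

BASE_URL = "http://localhost:5001"

def submit_and_poll(model: str, prompt: str, poll_seconds: float = 2.0) -> dict:
    """Submit an async job and poll /api/job/<job_id> until it looks finished."""
    submitted = requests.post(
        f"{BASE_URL}/api/submit",
        json={"model": model, "prompt": prompt},
        timeout=30,
    )
    submitted.raise_for_status()
    # Assumption: the submit response carries the job id under "job_id".
    job_id = submitted.json()["job_id"]

    while True:
        job = requests.get(f"{BASE_URL}/api/job/{job_id}", timeout=30).json()
        # Assumption: a "status" field eventually reaches a terminal value.
        if job.get("status") in {"completed", "failed", "error"}:
            return job
        time.sleep(poll_seconds)

if __name__ == "__main__":
    # List the canonical routes (ollama/... plus the configured remote names).
    print(requests.get(f"{BASE_URL}/api/models", timeout=30).json())
    print(submit_and_poll("ollama/qwen3:8b", "Hello world"))
```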
The `./cli` tool provides commands for batch queries and vision analysis. It communicates with the running Model Manager service.
```bash
./cli batch-query items.txt "Is {item} a tool?"
./cli analyze photo.png
./cli quick photo.png "What color is the car?"
./cli count photo.png "people"
./cli --version
./cli --print-defaults
./cli --print-resolved
./cli --print-config-schema
./cli --validate-config
```

Internally, the service is composed of:

- HTTP API — synchronous broker plus async job queue on port `5001`
- Queue Manager — priority-based storage and batching
- VRAM Scheduler — local-model load decisions based on GPU memory
- Execution Engine — provider-aware inference execution
- Model Registry — local discovery plus configured remote models
- Resource Monitor — local GPU state via `nvidia-smi` and Ollama `/api/ps`
Remote models do not participate in local VRAM load/unload decisions. Broker requests to an exact routed model are executed immediately, while the async job API still uses the queue and job-tracking path.
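As a rough illustration of that split (not the service's actual code), the routing decision can be pictured like this:

```python
from dataclasses import dataclass

@dataclass
class Request:
    model: str           # canonical routed ID, e.g. "ollama/qwen3:8b" or "zai/glm-5"
    via_job_queue: bool  # True for /api/submit, False for /api/chat/completions

def is_local(model: str) -> bool:
    # Local models are the canonical ollama/... routes; everything else is remote.
    return model.startswith("ollama/")

def dispatch(req: Request) -> str:
    """Illustrative routing only, mirroring the behaviour described above."""
    if req.via_job_queue:
        # Async jobs always take the queue and job-tracking path.
        return "enqueue via the queue manager"
    if is_local(req.model):
        # Local broker requests depend on the VRAM scheduler loading the model.
        return "execute through the VRAM-scheduled local model"
    # Remote routes skip VRAM scheduling and run immediately.
    return "execute immediately against the remote provider"

if __name__ == "__main__":
    print(dispatch(Request("openrouter/z-ai/glm-5.1", via_job_queue=False)))
    print(dispatch(Request("ollama/qwen3:8b", via_job_queue=True)))
```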
cd "$SYSTEMS_ROOT/ai/model-manager"
PYTHONPATH=src python3 -m unittest discover -s tests
python3 -m compileall src testsMIT License. See LICENSE.