
Commit 4c5dc46

Add LiteLLM proxy config for Venice API in Cursor

Cursor over-allocates max_tokens for models with 1M context windows (e.g. claude-opus-4-6), causing Venice to reject requests. This adds a LiteLLM proxy config that clamps output tokens to safe limits.

2 files changed

Lines changed: 257 additions & 0 deletions

scripts/venice-litellm/README.md

Lines changed: 88 additions & 0 deletions
# Venice API + Cursor IDE via LiteLLM Proxy

Venice models with 1M-token context windows (e.g. `claude-opus-4-6`, `claude-sonnet-4-6`) fail in Cursor because Cursor derives `max_tokens` from the context window and sends values that exceed Venice's output token limits.

Models with ≤256k context (e.g. `claude-opus-45`) work without a proxy.

LiteLLM sits between Cursor and Venice, clamping `max_tokens` to safe values.

## Prerequisites

- `VENICE_API_KEY` exported in your shell (e.g. in `~/.zshrc`)
- Python 3.10+
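A quick pre-flight check for both prerequisites (a sketch; the `:?` parameter expansion aborts with an error if the variable is unset):

```bash
# Fail fast if the key is not exported, then confirm the interpreter version
: "${VENICE_API_KEY:?VENICE_API_KEY is not set - export it in ~/.zshrc}"
python3 --version   # needs 3.10+
```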
## Setup

```bash
pip install 'litellm[proxy]'
pip install python-multipart
```
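To confirm both packages landed in the active environment (nothing here is LiteLLM-specific; `pip show` just reads installed metadata):

```bash
# Print name and version for each installed package
pip show litellm python-multipart | grep -E '^(Name|Version):'
```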
## Start the proxy

```bash
litellm --config ~/git/edge-conventions/scripts/venice-litellm/litellm-config.yaml --port 8765
```

The config reads your Venice API key from the `VENICE_API_KEY` environment variable via `os.environ/VENICE_API_KEY`, so LiteLLM handles all authentication to Venice directly.
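Once the proxy is running, you can smoke-test it before touching Cursor. A minimal sketch, assuming LiteLLM's standard OpenAI-compatible routes and no `master_key` in the config (so the bearer token is a placeholder):

```bash
# List the models the proxy is serving
curl -s http://localhost:8765/v1/models \
  -H "Authorization: Bearer sk-dummy" | python -m json.tool

# Send one request end-to-end through the proxy to Venice
curl -s http://localhost:8765/v1/chat/completions \
  -H "Authorization: Bearer sk-dummy" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "messages": [{"role": "user", "content": "ping"}]}'
```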
## Configure Cursor

1. Open **Settings > Models**
2. Add custom models by name: `claude-opus-4-6`, `openai-gpt-52`, etc.
3. Set **Override OpenAI Base URL** to `http://localhost:8765`
4. **OpenAI API Key** can be left disabled; LiteLLM already has the Venice key from your environment. If Cursor requires a value, enter any dummy string (e.g. `sk-dummy`).
## Models

All Venice text models are included. Only the 1M-context models need `model_info.max_tokens` to prevent Cursor from over-allocating. The rest pass through unmodified.
### Clamped (1M context, broken without proxy)

| Model | max_tokens | Context |
|-------|------------|---------|
| `claude-opus-4-6` | 8192 | 1M |
| `claude-sonnet-4-6` | 8192 | 1M |
| `gemini-3-1-pro-preview` | 8192 | 1M |
### Pass-through (≤256k context, work without proxy)

| Model | Context | Notes |
|-------|---------|-------|
| `claude-opus-45` | 198k | |
| `claude-sonnet-45` | 198k | |
| `openai-gpt-52` | 256k | |
| `openai-gpt-52-codex` | 256k | Optimized for code |
| `openai-gpt-oss-120b` | 128k | Open-weight MoE |
| `grok-41-fast` | 256k | |
| `grok-code-fast-1` | 256k | Optimized for code |
| `gemini-3-pro-preview` | 198k | |
| `gemini-3-flash-preview` | 256k | |
| `deepseek-v3.2` | 160k | |
| `kimi-k2-thinking` | 256k | |
| `kimi-k2-5` | 256k | |
| `minimax-m21` | 198k | Optimized for code |
| `minimax-m25` | 198k | Optimized for code |
| `zai-org-glm-5` | 198k | |
| `zai-org-glm-4.7` | 198k | |
| `qwen3-coder-480b-a35b-instruct` | 256k | Optimized for code |
| `qwen3-235b-a22b-thinking-2507` | 128k | |
| `qwen3-235b-a22b-instruct-2507` | 128k | |
| `qwen3-vl-235b-a22b` | 256k | Vision-language |
| `llama-3.3-70b` | 128k | |
| `hermes-3-llama-3.1-405b` | 128k | |
| `google-gemma-3-27b-it` | 198k | Vision |
## Adding models

Edit `litellm-config.yaml` following the existing pattern. Use Venice model IDs from their [models endpoint](https://docs.venice.ai/api-reference/endpoint/models/list). Only add `model_info.max_tokens` for models with context windows >256k.
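To look up current model IDs, you can hit the endpoint directly. A minimal sketch, assuming Venice's OpenAI-compatible `/models` route under the same `api_base` the config uses:

```bash
# List Venice model IDs to copy into litellm-config.yaml
curl -s https://api.venice.ai/api/v1/models \
  -H "Authorization: Bearer $VENICE_API_KEY" | python -m json.tool
```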
## Why not use Venice directly?

Venice advertises `availableContextTokens: 1000000` for newer Claude/Gemini models. Cursor uses this to budget `max_tokens`, often requesting 200k+ output tokens. Venice rejects these with:

```
max_tokens: 232001 > 128000, which is the maximum allowed number of output tokens for claude-opus-4-6
```

The proxy avoids this by setting `model_info.max_tokens` per model, which LiteLLM uses to constrain requests.
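A sketch that reproduces the rejection by calling Venice directly with the oversized value from the error above (assumes `VENICE_API_KEY` is exported):

```bash
# Hitting Venice directly with a Cursor-sized max_tokens triggers the error;
# the same request through the proxy on :8765 should succeed, per the clamping above.
curl -s https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "max_tokens": 232001, "messages": [{"role": "user", "content": "ping"}]}'
```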
scripts/venice-litellm/litellm-config.yaml

Lines changed: 169 additions & 0 deletions
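# LiteLLM proxy config: Cursor -> LiteLLM -> Venice.
# Every entry uses LiteLLM's OpenAI-compatible passthrough
# (model: openai/<venice-model-id>) against the Venice api_base;
# only the 1M-context models carry a model_info.max_tokens clamp.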
model_list:
  # --- 1M context models (NEED clamping) ---

  - model_name: claude-opus-4-6
    litellm_params:
      model: openai/claude-opus-4-6
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  - model_name: claude-sonnet-4-6
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  - model_name: gemini-3-1-pro-preview
    litellm_params:
      model: openai/gemini-3-1-pro-preview
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
    model_info:
      max_tokens: 8192

  # --- ≤256k context models (pass-through, no clamping needed) ---

  - model_name: claude-opus-45
    litellm_params:
      model: openai/claude-opus-45
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: claude-sonnet-45
    litellm_params:
      model: openai/claude-sonnet-45
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: openai-gpt-52
    litellm_params:
      model: openai/openai-gpt-52
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: openai-gpt-52-codex
    litellm_params:
      model: openai/openai-gpt-52-codex
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: openai-gpt-oss-120b
    litellm_params:
      model: openai/openai-gpt-oss-120b
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: grok-41-fast
    litellm_params:
      model: openai/grok-41-fast
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: grok-code-fast-1
    litellm_params:
      model: openai/grok-code-fast-1
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: gemini-3-pro-preview
    litellm_params:
      model: openai/gemini-3-pro-preview
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: gemini-3-flash-preview
    litellm_params:
      model: openai/gemini-3-flash-preview
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: deepseek-v3.2
    litellm_params:
      model: openai/deepseek-v3.2
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: kimi-k2-thinking
    litellm_params:
      model: openai/kimi-k2-thinking
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: kimi-k2-5
    litellm_params:
      model: openai/kimi-k2-5
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: minimax-m21
    litellm_params:
      model: openai/minimax-m21
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: minimax-m25
    litellm_params:
      model: openai/minimax-m25
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: zai-org-glm-5
    litellm_params:
      model: openai/zai-org-glm-5
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: zai-org-glm-4.7
    litellm_params:
      model: openai/zai-org-glm-4.7
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: qwen3-coder-480b-a35b-instruct
    litellm_params:
      model: openai/qwen3-coder-480b-a35b-instruct
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: qwen3-235b-a22b-thinking-2507
    litellm_params:
      model: openai/qwen3-235b-a22b-thinking-2507
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: qwen3-235b-a22b-instruct-2507
    litellm_params:
      model: openai/qwen3-235b-a22b-instruct-2507
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: qwen3-vl-235b-a22b
    litellm_params:
      model: openai/qwen3-vl-235b-a22b
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: llama-3.3-70b
    litellm_params:
      model: openai/llama-3.3-70b
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: hermes-3-llama-3.1-405b
    litellm_params:
      model: openai/hermes-3-llama-3.1-405b
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY

  - model_name: google-gemma-3-27b-it
    litellm_params:
      model: openai/google-gemma-3-27b-it
      api_base: https://api.venice.ai/api/v1
      api_key: os.environ/VENICE_API_KEY
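# Per the README, pre-call checks are what let LiteLLM apply the
# model_info limits above before forwarding a request to Venice.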
router_settings:
  enable_pre_call_checks: true
