Feature Request
Description
Add support for multimodal content (images/vision) when proxying requests between Claude API format and OpenAI-compatible APIs.
Current Behavior
Currently, the converter in `internal/converter/converter.go` handles only:
- Text content (string messages)
- Tool calls (tool_use, tool_result)
- Thinking blocks
Image content blocks are not converted, which means multimodal requests from Claude Code cannot be forwarded to backend providers that support vision capabilities.
Desired Behavior
Support converting Claude's image content format to OpenAI's image_url format:
Claude format (input):

```json
{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "<base64_data>"
  }
}
```

OpenAI format (output):

```json
{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,<base64_data>"
  }
}
```

Use Case
Many users want to use Claude Code with vision-capable models like:
- GPT-4o / GPT-4 Vision (via OpenAI Direct)
- Gemini Pro Vision (via OpenRouter)
- LLaVA, Qwen-VL (via Ollama)
Without multimodal support, users cannot leverage these vision capabilities through the proxy.
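The mapping between the two formats above could be sketched in Go roughly as follows. Note this is only an illustrative sketch: the struct and function names (`ClaudeContentBlock`, `convertImageBlock`, etc.) are hypothetical and do not reflect the actual types in `internal/converter/converter.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the two wire formats; names are
// illustrative, not the project's actual structs.
type ClaudeImageSource struct {
	Type      string `json:"type"`       // "base64"
	MediaType string `json:"media_type"` // e.g. "image/png"
	Data      string `json:"data"`       // raw base64 payload, no data: prefix
}

type ClaudeContentBlock struct {
	Type   string             `json:"type"` // "text", "image", ...
	Text   string             `json:"text,omitempty"`
	Source *ClaudeImageSource `json:"source,omitempty"`
}

type OpenAIImageURL struct {
	URL string `json:"url"`
}

type OpenAIContentPart struct {
	Type     string          `json:"type"` // "text" or "image_url"
	Text     string          `json:"text,omitempty"`
	ImageURL *OpenAIImageURL `json:"image_url,omitempty"`
}

// convertImageBlock maps a Claude base64 image block to an OpenAI
// image_url part by embedding the payload in a data: URL.
func convertImageBlock(b ClaudeContentBlock) (OpenAIContentPart, error) {
	if b.Type != "image" || b.Source == nil {
		return OpenAIContentPart{}, fmt.Errorf("not an image block")
	}
	if b.Source.Type != "base64" {
		return OpenAIContentPart{}, fmt.Errorf("unsupported source type %q", b.Source.Type)
	}
	return OpenAIContentPart{
		Type: "image_url",
		ImageURL: &OpenAIImageURL{
			URL: fmt.Sprintf("data:%s;base64,%s", b.Source.MediaType, b.Source.Data),
		},
	}, nil
}

func main() {
	in := ClaudeContentBlock{
		Type: "image",
		Source: &ClaudeImageSource{
			Type:      "base64",
			MediaType: "image/png",
			Data:      "iVBORw0KGgo=",
		},
	}
	out, err := convertImageBlock(in)
	if err != nil {
		panic(err)
	}
	enc, _ := json.Marshal(out)
	fmt.Println(string(enc))
}
```

Unsupported source types (e.g. a future `url` source) are rejected with an error rather than silently dropped, so the proxy can surface a clear failure instead of forwarding a malformed request.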
Reference
A similar project, takltc/claude-code-chutes-proxy, has already implemented this feature:
"Images and multimodal: request-side user/system image blocks are translated to OpenAI image_url content entries"
Additional Context
This would significantly expand the proxy's utility for workflows involving:
- Screenshot analysis
- Diagram/architecture review
- UI/UX feedback
- Document image processing
Thank you for this great project! 🙏