feat: Adaptive per-model capability detection #7
OpenWebUI/LiteLLM has a bug where it converts max_completion_tokens to max_tokens before forwarding to Azure, causing GPT-5 to fail.

Workaround: Don't send any max tokens parameter for Unknown providers (like OpenWebUI) when using reasoning models. The backend will use its default token limits instead.

Changes:
- Modified token parameter logic in converter.go to skip max tokens for Unknown providers with reasoning models
- Added comprehensive documentation in docs/OPENWEBUI-GPT5-FIX.md

Tested with:
- OpenWebUI GPT-5 (gpt-5-2025-08-07) ✅
- OpenWebUI GPT-4.1 (gpt-4.1-2025-04-14) ✅

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
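A minimal sketch of the converter.go branching this commit describes, with illustrative names throughout (`ProviderUnknown`, `applyTokenLimit`, and the request fields are assumptions, not the actual identifiers):

```go
package converter

import "strings"

type Provider int

const ProviderUnknown Provider = 0

type ChatRequest struct {
	MaxTokens           *int
	MaxCompletionTokens *int
}

// isReasoningModel stands in for the hardcoded check this commit relied on.
func isReasoningModel(model string) bool {
	return strings.HasPrefix(model, "gpt-5") || strings.HasPrefix(model, "o1")
}

// applyTokenLimit omits both token fields for Unknown providers with
// reasoning models, sidestepping the LiteLLM/OpenWebUI conversion bug;
// the backend then applies its own default limit.
func applyTokenLimit(req *ChatRequest, provider Provider, model string, limit int) {
	switch {
	case provider == ProviderUnknown && isReasoningModel(model):
		req.MaxTokens, req.MaxCompletionTokens = nil, nil
	case isReasoningModel(model):
		req.MaxCompletionTokens = &limit // reasoning models reject max_tokens
	default:
		req.MaxTokens = &limit
	}
}
```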
Replace hardcoded model pattern matching with fully adaptive per-model
detection that automatically learns which parameters each (provider, model)
combination supports. This eliminates special-casing and works with any
current or future OpenAI-compatible provider without code changes.
**Key Changes:**
- Removed ~100 lines of hardcoded model patterns (IsReasoningModel,
FetchReasoningModels, ReasoningModelCache)
- Implemented per-model capability caching with CacheKey{BaseURL, Model}
- Added broad keyword-based error detection (no status code restrictions)
- Thread-safe in-memory cache with sync.RWMutex
- Debug logging shows cache hits/misses with -d flag
**Architecture:**
1. First request: Try max_completion_tokens → Error → Retry → Cache result
2. Subsequent requests: Use cached value immediately (instant)
3. Works with OpenWebUI, OpenRouter, OpenAI Direct, Ollama, and any
OpenAI-compatible provider
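A hedged sketch of steps 1-2, reusing the `CacheKey{BaseURL, Model}` shape from the key changes above; `send`, `lookupCapabilities`, `cacheCapabilities`, and the request field are placeholder names, not the actual implementation:

```go
// First request probes optimistically; a parameter error triggers one retry
// without max_completion_tokens, and the learned capability is cached so
// every later request for this (BaseURL, Model) skips the probe.
func sendWithDetection(key CacheKey, req ChatRequest) (*Response, error) {
	if caps, ok := lookupCapabilities(key); ok {
		req.UseMaxCompletionTokens = caps.SupportsMaxCompletionTokens
		return send(req) // cache hit: instant, no extra round trip
	}
	req.UseMaxCompletionTokens = true // optimistic first attempt
	resp, err := send(req)
	if err != nil && isMaxTokensParameterError(err.Error()) {
		req.UseMaxCompletionTokens = false // provider rejected it: retry without
		resp, err = send(req)
		if err == nil {
			cacheCapabilities(key, &ModelCapabilities{SupportsMaxCompletionTokens: false})
		}
		return resp, err
	}
	if err == nil {
		cacheCapabilities(key, &ModelCapabilities{SupportsMaxCompletionTokens: true})
	}
	return resp, err
}
```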
**Benefits:**
- Zero user configuration required
- Future-proof - no hardcoded model names
- Per-model granularity (same model on different providers cached separately)
- Auto-adapts to provider quirks (OpenWebUI, misconfigured providers)
**Documentation:**
- Added "Adaptive Per-Model Detection" section to README.md
- Updated CLAUDE.md with comprehensive technical details
- CHANGELOG.md updated for v1.3.0 release
- Cleaned up docs/ folder (removed planning artifacts)
**Tested with:**
- OpenWebUI GPT-5 (gpt-5-2025-08-07) ✓
- OpenRouter GPT-5 (openai/gpt-5) ✓
- Multiple provider configurations ✓
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Review: Adaptive Per-Model Capability Detection (v1.3.0)

Overall Assessment

This is an excellent refactor that significantly improves the codebase architecture. The replacement of hardcoded model patterns with adaptive capability detection is a smart design choice that makes the proxy future-proof and more maintainable.

✅ Strengths

1. Architecture & Design ⭐⭐⭐⭐⭐

The adaptive detection system is well-architected:
2. Code Quality ⭐⭐⭐⭐½
3. Documentation ⭐⭐⭐⭐⭐

Outstanding documentation updates:
Issues & Concerns

1. CRITICAL: Test Coverage Gap 🔴

The tests are now outdated and will fail. The test files reference removed code:

```go
// internal/config/config_test.go:628
reasoningCache = &ReasoningModelCache{ // ❌ This type no longer exists
	models:    make(map[string]bool),
	populated: false,
}

// internal/config/config_test.go:887
err := cfg.FetchReasoningModels() // ❌ This method was removed
```

Impact: Lines 581-1037 in config_test.go reference the removed API and will not compile.

Recommendation: Update or remove the affected tests before merge.
2. Test File Mismatch 🟡
3. Potential Race Condition in Cache Initialization 🟡

```go
// internal/config/config.go:46
var (
	modelCapabilityCache = make(map[CacheKey]*ModelCapabilities)
	capabilityCacheMutex sync.RWMutex
)
```

This is fine for most cases, but in a concurrent environment with many requests on first startup, multiple goroutines might simultaneously:

- miss the cache for the same (BaseURL, Model) key
- each send a probing first request
- each write the same capability entry back to the cache
Impact: Minor - just duplicate work on first request, not a correctness issue.

Recommendation: Consider deduplicating concurrent first-request probes for the same key.
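One way to get that deduplication, sketched under the assumption of `golang.org/x/sync/singleflight` (the review's concrete suggestion is not preserved above, and `probe` stands in for whatever performs the first detection request):

```go
import "golang.org/x/sync/singleflight"

var probeGroup singleflight.Group // collapses concurrent probes per key

// capabilitiesFor returns the cached entry, or runs probe exactly once per
// (BaseURL, Model) even when many first requests arrive concurrently.
func capabilitiesFor(key CacheKey, probe func() (*ModelCapabilities, error)) (*ModelCapabilities, error) {
	capabilityCacheMutex.RLock()
	caps, ok := modelCapabilityCache[key]
	capabilityCacheMutex.RUnlock()
	if ok {
		return caps, nil
	}
	v, err, _ := probeGroup.Do(key.BaseURL+"|"+key.Model, func() (interface{}, error) {
		c, err := probe()
		if err != nil {
			return nil, err // don't cache failures; retry on the next request
		}
		capabilityCacheMutex.Lock()
		modelCapabilityCache[key] = c
		capabilityCacheMutex.Unlock()
		return c, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(*ModelCapabilities), nil
}
```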
4. Error Detection May Be Too Broad 🟡

```go
// internal/server/handlers.go:917-927
func isMaxTokensParameterError(errorMessage string) bool {
	errorLower := strings.ToLower(errorMessage)
	hasParamIndicator := strings.Contains(errorLower, "parameter") ||
		strings.Contains(errorLower, "unsupported") ||
		strings.Contains(errorLower, "invalid")
	hasOurParam := strings.Contains(errorLower, "max_tokens") ||
		strings.Contains(errorLower, "max_completion_tokens")
	return hasParamIndicator && hasOurParam
}
```

Potential Issue: This could match errors like "invalid max_tokens value: must be at least 1", where the parameter is supported but its value is out of range, so the capability would be cached incorrectly.
Recommendation: Narrow the match so that complaints about an invalid value (rather than an unsupported parameter) are not cached as capability failures.
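A possible narrowing, sketched as an assumption rather than the reviewer's exact proposal (OpenAI's rejection for reasoning models typically reads "Unsupported parameter: 'max_tokens' is not supported with this model"):

```go
// Stricter variant: only wording that marks the parameter itself as
// unsupported counts; value-range complaints are rejected so they never
// poison the capability cache.
func isMaxTokensParameterErrorStrict(errorMessage string) bool {
	errorLower := strings.ToLower(errorMessage)
	if !strings.Contains(errorLower, "max_tokens") &&
		!strings.Contains(errorLower, "max_completion_tokens") {
		return false
	}
	// "must be" / "at least" suggest a bad value, not a bad parameter.
	for _, valueHint := range []string{"must be", "at least"} {
		if strings.Contains(errorLower, valueHint) {
			return false
		}
	}
	return strings.Contains(errorLower, "unsupported parameter") ||
		strings.Contains(errorLower, "is not supported")
}
```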
5. Cache Persistence & Expiry 🟢 (Design Choice)

The cache is:

- in-memory only, with no persistence across restarts
- never expired for the lifetime of the process
Consideration: For long-running proxies, if a provider adds support for max_completion_tokens later, the stale cached entry keeps the proxy on the fallback behavior until restart.

Recommendation (Optional): Consider adding a TTL (e.g., 24 hours) where cached capabilities are re-validated. This is likely over-engineering for v1.3.0 though.
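If that TTL were ever added, a minimal sketch might look like this (assuming the cached entry also recorded a `LearnedAt time.Time`; all names here are hypothetical):

```go
const capabilityTTL = 24 * time.Hour

// lookupCapabilities treats entries older than the TTL as misses, so the
// next request re-runs detection and refreshes the cache.
func lookupCapabilities(key CacheKey) (*ModelCapabilities, bool) {
	capabilityCacheMutex.RLock()
	defer capabilityCacheMutex.RUnlock()
	caps, ok := modelCapabilityCache[key]
	if !ok || time.Since(caps.LearnedAt) > capabilityTTL {
		return nil, false // absent or expired: caller probes again
	}
	return caps, true
}
```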
Code Quality Observations

Positive

Minor Nits
```go
if cfg.Debug {
	fmt.Printf("[DEBUG] Detected max_completion_tokens parameter error...")
}
```

This pattern repeats 4 times. Consider a helper:
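For instance (a sketch; `debugf` is a hypothetical helper, not an existing function in the codebase):

```go
// debugf gates formatted diagnostics on the -d flag, shrinking each call
// site to a single line.
func (cfg *Config) debugf(format string, args ...interface{}) {
	if cfg.Debug {
		fmt.Printf("[DEBUG] "+format+"\n", args...)
	}
}

// usage: cfg.debugf("Detected max_completion_tokens parameter error for %s", model)
```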
```go
const defaultAssumeMaxCompletionTokens = true
```

Security Considerations

✅ No security concerns identified:
Performance

✅ Performance improvements:
Recommendations for Merge

Must-Fix Before Merge 🔴

- Update or remove the tests that still reference the deleted ReasoningModelCache type and FetchReasoningModels() method.
Should-Fix Before Merge 🟡
Optional (Can be Follow-up) 🟢
Documentation Completeness

✅ Excellent documentation:
Final Verdict

Code Quality: 8.5/10
Architecture: 9.5/10
Documentation: 10/10
Recommendation: APPROVE with required test fixes ✅

Action Items

Before Merge:
Post-Merge (Optional):
Great work on this refactor! The adaptive detection system is a significant improvement over hardcoded patterns. Once the tests are updated, this is ready to merge. 🚀
Remove test functions that relied on the old IsReasoningModel() method and ReasoningModelCache type, which were removed during the adaptive per-model capability detection refactoring. The new system learns capabilities through error-based retry instead of pre-detecting them.

Tests removed:
- TestIsReasoningModelWithHardcodedFallback
- TestIsReasoningModelWithCache
- TestIsReasoningModelProviderSpecific
- TestFetchReasoningModels
- TestIsReasoningModel (from reasoning_model_test.go)
- TestReasoningModelTokenParameter

Also removed unused imports from config_test.go that were only used by the deleted tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR replaces hardcoded model pattern matching with a fully adaptive per-model detection system that automatically learns which parameters each `(provider, model)` combination supports. This eliminates special-casing and works with any current or future OpenAI-compatible provider without code changes.

Key Changes
Code Refactoring (~100 lines removed):
- Removed the `IsReasoningModel()` function with hardcoded gpt-5/o1/o2/o3/o4 patterns
- Removed the `FetchReasoningModels()` function and OpenRouter API integration
- Removed the `ReasoningModelCache` struct and related code
- Removed now-unused imports (`encoding/json`, `net/http` from config.go)

New Implementation:

- Per-model capability caching keyed by a `CacheKey{BaseURL, Model}` struct
- Thread-safe in-memory cache guarded by `sync.RWMutex`
- Debug logging of cache hits and misses (with the `-d` flag)

Architecture
Adaptive Detection Flow:

1. First request: try `max_completion_tokens` → Error → Retry without it → Cache result
2. Subsequent requests: use the cached value immediately

Cache Structure:
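Sketched from the pieces quoted in the review above; the `SupportsMaxCompletionTokens` field is an assumption about what each entry stores, while the key and the mutex-guarded map follow the excerpts:

```go
// One cache entry per (provider base URL, model ID): the same model on two
// providers is learned and cached independently.
type CacheKey struct {
	BaseURL string // provider endpoint
	Model   string // model identifier, e.g. "gpt-5-2025-08-07"
}

type ModelCapabilities struct {
	SupportsMaxCompletionTokens bool // learned from the first request's outcome
}

var (
	modelCapabilityCache = make(map[CacheKey]*ModelCapabilities)
	capabilityCacheMutex sync.RWMutex
)
```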
Benefits

- Zero user configuration required
- Future-proof: no hardcoded model names
- Per-model granularity (the same model on different providers is cached separately)
- Auto-adapts to provider quirks (OpenWebUI, misconfigured providers)
Testing
Tested with:

- OpenWebUI GPT-5 (gpt-5-2025-08-07) ✓
- OpenRouter GPT-5 (openai/gpt-5) ✓
- Multiple provider configurations ✓
Debug logging example:
Documentation

- Added an "Adaptive Per-Model Detection" section to README.md
- Updated CLAUDE.md with technical details
- Updated CHANGELOG.md for the v1.3.0 release
- Cleaned up the docs/ folder (removed planning artifacts)
Philosophy
This release embodies the project philosophy:
The adaptive system eliminates special-casing and works with any current or future OpenAI-compatible provider.
Files Changed
Breaking Changes
None. This is a drop-in replacement that maintains full backward compatibility while improving functionality.
Ready for v1.3.0 Release
This PR is ready to be merged and released as v1.3.0.