RobinLLM is an intelligent Large Language Model (LLM) routing service that automatically selects the best-performing model for each request. It exposes an OpenAI-compatible API while routing requests across multiple free LLM providers.
- Intelligent Routing: Automatically selects models based on latency, success rate, and rate limit proximity
- Performance Monitoring: Tracks metrics for all models and routes to best performers
- Circuit Breaker: Automatically stops routing to failing models with automatic recovery
- Load Balancing: Distributes requests across top-performing models using round-robin
- Auto-Discovery: Scrapes available models from OpenRouter API
- OpenAI Compatible: Drop-in replacement for OpenAI API clients
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│    Router    │────▶│ Model Pool  │
└─────────────┘     └──────────────┘     └─────────────┘
                           │                    │
                           │                    ▼
                           │             ┌──────────────┐
                           │             │   Metrics    │
                           │             │  Collector   │
                           │             └──────────────┘
                           ▼
                   ┌───────────────┐
                   │ Load Balancer │
                   └───────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐
     │ Model A  │    │ Model B  │    │ Model C  │
     └──────────┘    └──────────┘    └──────────┘
```
- Request Ingestion: Client sends a request to `/v1/chat/completions`
- Model Selection: Router selects the best models based on current metrics
- Load Balancing: Load balancer picks model from top performers using round-robin
- Circuit Breaker Check: Verifies selected model is not in open circuit state
- Request Execution: Forwards request to selected model endpoint
- Response Handling: Returns OpenAI-compatible response to client
- Metrics Collection: Updates performance metrics for the used model
- Circuit Breaker Update: Adjusts circuit breaker state based on success/failure
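The flow above can be exercised from any OpenAI-style HTTP client. A minimal Java `HttpClient` sketch against a locally running instance (the default port 8080 from the configuration is assumed; `buildRequest` is an illustrative helper, not part of RobinLLM):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClientExample {

    // Builds an OpenAI-style chat completion request against a local RobinLLM.
    static HttpRequest buildRequest(String model, String userMessage) {
        String body = """
                {"model": "%s",
                 "messages": [{"role": "user", "content": "%s"}]}"""
                .formatted(model, userMessage);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires a running RobinLLM instance; prints the raw JSON response.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("auto", "Hello, how are you?"),
                      HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

With `"model": "auto"`, the router itself picks the backing model, so existing OpenAI client code only needs its base URL changed.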
Location: com.robinllm.api.OpenAICompatController
REST controller providing OpenAI-compatible API endpoints.
Endpoints:
- `POST /v1/chat/completions` - Main chat completion endpoint
- `GET /v1/models` - List all available models
- `GET /v1/models/{id}` - Get specific model details
- `GET /v1/models/{id}/metrics` - Get model performance metrics
- `GET /v1/stats` - Get system statistics
- `POST /v1/stats/reset` - Reset all statistics
- `GET /v1/health` - Health check
- `GET /v1/` - Service information
Responsibilities:
- Request validation
- Response formatting (OpenAI-compatible)
- Error handling and HTTP status codes
- Metrics and statistics aggregation
Location: com.robinllm.router.RequestRouter
Core routing logic for LLM requests.
Methods:
- `routeRequest(OpenAIChatRequest)` - Routes a request to the best available model
- `selectBestModel()` - Selects a model based on the scoring algorithm
- `createErrorResponse(String)` - Creates an error response in OpenAI format
- `getTotalRequests()` - Returns total requests served
- `getTotalFailures()` - Returns total failed requests
- `getSuccessRate()` - Returns the current success rate
- `resetStats()` - Resets all routing statistics
Routing Algorithm:

```
score = (1 - latency_norm) * weight_latency
      + success_rate * weight_success
      + (1 - rate_limit_proximity) * weight_rate_limit
```
Where:
- `latency_norm` = normalized average latency (0-1)
- `success_rate` = recent success rate (0-1)
- `rate_limit_proximity` = how close the model is to its rate limit (0-1)
- Default weights: latency=0.6, success=0.3, rate_limit=0.1
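The formula above translates directly into Java. The sketch below hard-codes the default weights; constant and method names are illustrative, not the actual `RequestRouter` internals:

```java
// Illustrative sketch of the weighted scoring formula with default weights.
public class RoutingScore {
    static final double WEIGHT_LATENCY = 0.6;
    static final double WEIGHT_SUCCESS = 0.3;
    static final double WEIGHT_RATE_LIMIT = 0.1;

    /** All inputs are normalized to the range [0, 1]; higher score is better. */
    public static double score(double latencyNorm, double successRate,
                               double rateLimitProximity) {
        return (1 - latencyNorm) * WEIGHT_LATENCY
             + successRate * WEIGHT_SUCCESS
             + (1 - rateLimitProximity) * WEIGHT_RATE_LIMIT;
    }

    public static void main(String[] args) {
        // A fast, reliable model far from its rate limit scores high.
        System.out.println(score(0.1, 0.95, 0.2));
    }
}
```

A perfect model (zero normalized latency, 100% success, nowhere near its limit) scores exactly 1.0, which is why the three default weights sum to 1.0.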
Location: com.robinllm.router.LoadBalancer
Implements load balancing with circuit breaker pattern.
Methods:
- `selectModel(List<LLMModel>)` - Selects a model using round-robin
- `recordSuccess(String)` - Records a successful request for a model
- `recordFailure(String)` - Records a failed request for a model
- `getModelHealthScore(String)` - Returns the current health score (0-1)
- `resetCircuitBreaker(String)` - Resets the circuit breaker for a specific model
- `resetAllCircuitBreakers()` - Resets all circuit breakers
Circuit Breaker States:
- CLOSED: Normal operation, requests flow through
- OPEN: Model has failed too many times, requests blocked
- HALF_OPEN: Testing if model has recovered
Thresholds:
- Failure threshold: 50% (configurable)
- Recovery cooldown: 5 minutes
- Minimum requests before opening: 5
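The state machine described above can be sketched as follows. This is a simplified illustration using the documented thresholds, not the actual `LoadBalancer` code:

```java
// Minimal CLOSED/OPEN/HALF_OPEN circuit breaker sketch.
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final int MIN_REQUESTS = 5;           // min requests before opening
    private static final double FAILURE_THRESHOLD = 0.5; // 50% failure rate
    private static final long COOLDOWN_MS = 5 * 60_000;  // 5-minute recovery cooldown

    private State state = State.CLOSED;
    private int requests = 0, failures = 0;
    private long openedAt = 0;

    public synchronized boolean allowRequest(long now) {
        if (state == State.OPEN && now - openedAt >= COOLDOWN_MS) {
            state = State.HALF_OPEN; // cooldown elapsed: let one probe through
        }
        return state != State.OPEN;
    }

    public synchronized void record(boolean success, long now) {
        if (state == State.HALF_OPEN) { // probe result decides recovery
            if (success) { state = State.CLOSED; requests = failures = 0; }
            else { state = State.OPEN; openedAt = now; }
            return;
        }
        requests++;
        if (!success) failures++;
        if (requests >= MIN_REQUESTS
                && (double) failures / requests >= FAILURE_THRESHOLD) {
            state = State.OPEN;
            openedAt = now;
        }
    }

    public synchronized State state() { return state; }
}
```

After five consecutive failures the breaker opens; five minutes later a single probe request is allowed, and its outcome either closes the breaker or re-opens it.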
Location: com.robinllm.model.ModelPool
Manages collection of available LLM models.
Methods:
- `addModel(LLMModel)` - Adds a model to the pool
- `removeModel(String)` - Removes a model from the pool
- `getModel(String)` - Retrieves a model by ID
- `getModelStatus(String)` - Gets a model's status
- `getAvailableModels()` - Lists all available models
- `getActiveModels()` - Lists active models
- `getFreeModels()` - Lists free models
- `size()` - Returns the total model count
Location: com.robinllm.router.ModelSelector
Ranks models based on performance metrics.
Methods:
- `selectBestModels(List<LLMModel>)` - Returns the top N models by score
- `calculateScore(LLMModel)` - Calculates a model's score using the configured weights
Location: com.robinllm.metrics.MetricsCollector
Collects and aggregates performance metrics.
Methods:
- `recordRequest(String, long, boolean)` - Records request latency and success
- `getLatestMetrics(String)` - Retrieves the latest metrics for a model
- `calculatePerformanceMetrics(String)` - Computes P95, P99, etc.
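The P95/P99 figures can be derived from recorded latencies with a simple nearest-rank percentile. A minimal sketch; the real `MetricsCollector` may aggregate differently:

```java
import java.util.Arrays;

// Illustrative nearest-rank percentile over recorded latency samples.
public class LatencyPercentiles {

    /** Smallest sample with at least p% of all samples at or below it. */
    public static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] samples = {120, 80, 450, 200, 95, 310, 150, 700, 100, 130};
        System.out.println("P95: " + percentile(samples, 95) + " ms");
        System.out.println("P99: " + percentile(samples, 99) + " ms");
    }
}
```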
Location: com.robinllm.client.OpenRouterClient
HTTP client for OpenRouter API.
Methods:
- `chatCompletions(OpenAIChatRequest)` - Sends a chat completion request
- `getModelInfo(String)` - Retrieves model information
- `resetRateLimiters()` - Resets rate limit tracking
Rate Limiting:
- Tracks requests per model
- Implements exponential backoff
- Prevents quota exhaustion
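The exponential backoff can be sketched as a capped doubling delay. The base delay and cap below are illustrative; the client's actual values may differ:

```java
// Illustrative capped exponential backoff delay calculation.
public class Backoff {
    static final long BASE_DELAY_MS = 1_000;  // first retry after 1s (assumed)
    static final long MAX_DELAY_MS = 60_000;  // never wait more than 60s (assumed)

    /** Delay doubles with each consecutive failure, capped at MAX_DELAY_MS. */
    public static long delayMs(int attempt) {
        long delay = BASE_DELAY_MS << Math.min(attempt, 30); // clamp shift, avoid overflow
        return Math.min(delay, MAX_DELAY_MS);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> " + delayMs(attempt) + " ms");
        }
    }
}
```

The cap keeps a persistently failing model from pushing retry delays toward minutes while the circuit breaker is deciding whether to open.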
Location: com.robinllm.scraper.OpenRouterScraper
Discovers and scrapes available models from OpenRouter.
Methods:
- `scrapeModels(String, String)` - Fetches models from the API
- `saveModels(List<LLMModel>)` - Persists models to the database
ModelRepository - com.robinllm.repository.ModelRepository
- `save(LLMModel)` - Saves/updates a model
- `findById(String)` - Retrieves a model by ID
- `findAll()` - Lists all models
- `findByStatus(String)` - Filters by status
- `delete(String)` - Removes a model
MetricsRepository - com.robinllm.repository.MetricsRepository
- `save(ModelMetrics)` - Saves a metrics snapshot
- `findById(long)` - Retrieves metrics by ID
- `findLatestByModelId(String)` - Gets the most recent metrics
- `findByModelId(String)` - Gets all metrics for a model
- `findSince(LocalDateTime)` - Gets metrics after a timestamp
- `findLatestForAllModels()` - Gets the latest metrics per model
- `deleteOlderThan(LocalDateTime)` - Cleans up old metrics
POST /v1/chat/completions

```json
{
  "model": "auto | openrouter/free | meta-llama/llama-3-8b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9
}
```

Success Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta-llama/llama-3-8b-instruct",
  "provider": "meta-llama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
```

Error Response
```json
{
  "id": "err-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "error",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Error message here"
      },
      "finish_reason": "error"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```

GET /v1/models
```json
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/llama-3-8b-instruct",
      "object": "model",
      "created": 1234567890,
      "owned_by": "meta-llama"
    }
  ]
}
```

GET /v1/models/{id}/metrics
```json
{
  "id": 1,
  "modelId": "meta-llama/llama-3-8b-instruct",
  "avgLatencyMs": 450.5,
  "successRate": 0.95,
  "errorRate": 0.05,
  "p95LatencyMs": 800.0,
  "p99LatencyMs": 1200.0,
  "requestsPerSecond": 2.5,
  "measuredAt": "2025-02-10T12:00:00"
}
```

GET /v1/stats
```json
{
  "total_models": 181,
  "active_models": 150,
  "free_models": 75,
  "total_requests": 10000,
  "total_failures": 250,
  "success_rate": "97.50%",
  "uptime": 1234567
}
```

File: `src/main/resources/application.properties`
```properties
# Application
quarkus.application.name=robinllm
quarkus.http.port=8080

# Scraper
scraper.enabled=true
scraper.interval=1h
scraper.openrouter.url=https://openrouter.ai/models
scraper.filter=free

# Metrics
metrics.enabled=true
metrics.interval=1h
metrics.test.prompts=What is 2+2?,Explain photosynthesis
metrics.top-models=3

# Router
router.weight.latency=0.6
router.weight.success=0.3
router.weight.rate-limit=0.1
router.circuit-breaker.threshold=0.5
router.retry.max=3
router.retry.backoff=1000

# API
api.compatibility=openai
api.max-tokens=4096
api.timeout=30000

# OpenRouter
openrouter.api-key=${OPENROUTER_API_KEY}
openrouter.base-url=https://openrouter.ai/api/v1
```

Environment variables:
- `OPENROUTER_API_KEY` - Your OpenRouter API key (required)
The router uses configurable weights to prioritize different factors:
| Factor | Default Weight | Description |
|---|---|---|
| Latency | 0.6 | Lower latency = higher score |
| Success Rate | 0.3 | Higher success = higher score |
| Rate Limit Proximity | 0.1 | Farther from limit = higher score |
Adjust weights based on your priorities:
- Prioritize speed: Increase `router.weight.latency`
- Prioritize reliability: Increase `router.weight.success`
- Prioritize quota: Increase `router.weight.rate-limit`
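For example, a reliability-first deployment might shift weight from latency to success rate. The values below are illustrative, keeping the three weights summing to 1.0 as the defaults do:

```properties
# Reliability-first routing weights (illustrative values)
router.weight.latency=0.3
router.weight.success=0.6
router.weight.rate-limit=0.1
```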
- Java 21 - Latest LTS with virtual threads support
- Quarkus 3.6.4 - Cloud-native framework
- Maven - Build and dependency management
- SQLite - Embedded database for metrics persistence
- JAX-RS - Jakarta REST API specification
- Jackson - JSON serialization/deserialization
```
robinllm/
├── src/main/java/com/robinllm/
│   ├── api/                       # REST controllers
│   │   └── OpenAICompatController.java
│   ├── client/                    # External API clients
│   │   ├── LLMClientFactory.java
│   │   └── OpenRouterClient.java
│   ├── config/                    # Configuration classes
│   │   └── AppConfig.java
│   ├── dto/                       # Data Transfer Objects
│   │   ├── OpenAIChatRequest.java
│   │   └── OpenAIChatResponse.java
│   ├── metrics/                   # Metrics collection
│   │   ├── MetricsCollector.java
│   │   └── MetricsScheduler.java
│   ├── model/                     # Domain models
│   │   ├── LLMModel.java
│   │   ├── ModelMetrics.java
│   │   ├── ModelPool.java
│   │   └── PerformanceMetrics.java
│   ├── repository/                # Data access
│   │   ├── BaseRepository.java
│   │   ├── MetricsRepository.java
│   │   └── ModelRepository.java
│   ├── router/                    # Routing logic
│   │   ├── CircuitBreakerState.java
│   │   ├── LoadBalancer.java
│   │   ├── ModelHealth.java
│   │   ├── ModelSelector.java
│   │   └── RequestRouter.java
│   ├── scraper/                   # Model discovery
│   │   ├── ModelDetailsExtractor.java
│   │   ├── OpenRouterScraper.java
│   │   └── ScraperScheduler.java
│   ├── startup/                   # Application startup
│   │   ├── DatabaseInitializer.java
│   │   └── StartupInitializer.java
│   └── RobinLLMMain.java          # Application entry point
├── src/main/resources/
│   ├── application.properties     # Configuration
│   └── db/migration/
│       └── V1__Initial_schema.sql # Database schema
└── src/test/java/com/robinllm/    # Test suite
```
```shell
# Compile
mvn clean compile

# Run tests
mvn test

# Build package
mvn clean package

# Skip tests during build
mvn clean package -DskipTests

# Run in dev mode
mvn quarkus:dev
```

Unit tests cover:
- Repository operations
- DTO validation
- Client communication
- Routing algorithms
- Load balancing logic
Run specific test classes:

```shell
mvn test -Dtest=OpenRouterClientTest
mvn test -Dtest=ModelPoolTest
mvn test -Dtest=MetricsRepositoryTest
```

To add support for a new LLM provider:
- Create client class implementing LLM client interface
- Add provider-specific metrics collection
- Update LoadBalancer to include new provider models
- Add configuration properties for provider
- Update scraper to discover provider's models
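As a sketch of step 1, a new provider client might implement a small interface like the following. The interface name and shape here are hypothetical; check `LLMClientFactory` for the actual contract:

```java
// Hypothetical client contract for pluggable providers.
interface LLMClient {
    /** Sends a chat completion request and returns the raw JSON response. */
    String chatCompletions(String requestJson);

    /** Provider identifier, e.g. "openrouter" or "myprovider". */
    String providerId();
}

// Minimal stub implementation for illustration only.
class MyProviderClient implements LLMClient {
    @Override
    public String chatCompletions(String requestJson) {
        // A real implementation would POST to the provider's API endpoint
        // and map the response into the OpenAI-compatible shape.
        return "{\"object\":\"chat.completion\"}";
    }

    @Override
    public String providerId() {
        return "myprovider";
    }
}
```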
- Java 21 or later
- Maven 3.8+
- 512MB RAM minimum (2GB recommended)
- 100MB disk space
```shell
# Build
mvn clean package

# Set API key
export OPENROUTER_API_KEY=your_api_key_here

# Run
java -jar target/quarkus-app/quarkus-run.jar
```

```dockerfile
FROM quay.io/quarkus/quarkus-micro-image:2.0
WORKDIR /work
# Copy the full fast-jar layout (lib/, app/, quarkus/);
# quarkus-run.jar alone is not runnable
COPY target/quarkus-app/ /work/
EXPOSE 8080
CMD ["java", "-jar", "quarkus-run.jar"]
```

```shell
# Build Docker image
docker build -t robinllm .

# Run container
docker run -p 8080:8080 -e OPENROUTER_API_KEY=your_key robinllm
```

- Database Persistence: SQLite is embedded - ensure a backup strategy
- Rate Limits: Monitor OpenRouter API quotas
- Circuit Breaker: Adjust thresholds based on production patterns
- Logging: Configure appropriate log levels
- Scaling: Can be scaled horizontally behind a load balancer
Metrics are collected:
- Per Request: Latency, success/failure, error type
- Per Model: Average latency, P95/P99, success rate, RPS
- System: Total requests, total failures, overall success rate, uptime
- Connection Pooling: Reuse HTTP connections
- Caching: Cache model list and metadata
- Async Operations: Use virtual threads for I/O
- Bulk Metrics: Batch metric updates to database
Expected performance on typical hardware:
- Request routing: < 1ms
- Model selection: < 10ms
- End-to-end latency: Model latency + ~50ms overhead
No models available
- Verify API key is set
- Check scraper logs for errors
- Confirm OpenRouter API is accessible
High error rates
- Review `/v1/stats` for model-specific metrics
- Check circuit breaker status
- Verify network connectivity
Slow responses
- Check model latency metrics via `/v1/models/{id}/metrics`
- Consider adjusting router weights
- Verify database performance
Configure logging in `application.properties`:

```properties
# Log level
quarkus.log.level=INFO
quarkus.log.category."com.robinllm".level=DEBUG

# Log format
quarkus.log.console.format=%d{HH:mm:ss} %-5p [%c{2.}] (%t) %s%e%n
```

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Write tests for changes
- Submit a pull request
MIT License - See LICENSE file for details.