A self-hosted LLM API gateway.
One unified endpoint between your application and every major AI provider.
Project Overview ✦ Key Features ✦ Architecture ✦ Diagrams ✦ Structure ✦ Installation ✦ API ✦ Tech Stack
| Service | URL |
|---|---|
| Dashboard | openrouter-clone-dashboard.vercel.app |
| Docs | openrouter-clone-docs.vercel.app |
| API Gateway | openrouter-clone-api-gateway.onrender.com |
| Primary Backend | orbyt-primary-backend.onrender.com |
A unified proxy layer that centralizes access to Large Language Models. Instead of managing complex integration with multiple provider SDKs, handling inconsistent streaming outputs, or writing brittle fallback logic to handle provider outages, you point your application to a single endpoint.
The gateway absorbs the complexity of network failures, latency spikes, and routing logic. If a primary model fails, the gateway reroutes the request to a secondary model, so the client does not see the error as long as one fallback route succeeds.
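The rerouting described above can be pictured as a loop over a priority list of models. The following is a minimal sketch under stated assumptions, not the gateway's actual code: `callWithFallback`, `ModelCall`, and `FallbackResult` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of model fallback; names are illustrative only.
type ModelCall<T> = (model: string) => Promise<T>;

interface FallbackResult<T> {
  model: string;    // the model that finally produced a response
  value: T;         // the response itself
  failed: string[]; // models that were tried and failed before it
}

async function callWithFallback<T>(
  models: string[],   // priority list: primary first, then fallbacks
  call: ModelCall<T>, // performs one request against one model
): Promise<FallbackResult<T>> {
  const failed: string[] = [];
  let lastError: unknown;
  for (const model of models) {
    try {
      const value = await call(model);
      return { model, value, failed };
    } catch (err) {
      // Remember the failure and move on to the next model in the list.
      failed.push(model);
      lastError = err;
    }
  }
  // Every model failed: surface the last error to the caller.
  throw new Error(
    `All ${failed.length} models failed; last error: ${String(lastError)}`,
  );
}
```

The caller only sees an error when the whole list is exhausted, which matches the behavior described above.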
| 01 Model Fallback | 02 Provider Selection |
|---|---|
| If a target model returns an error (rate limits, downtime, context violations), the gateway automatically tries the next model in a configured priority list. | Before sending a request, the system evaluates available providers. Route prompts dynamically based on strategy (e.g., cheapest or fastest). |
| 03 Retry Policy | 04 Streaming |
|---|---|
| Configurable retry behavior before escalating to full model fallback. Transient network errors are retried with an explicit attempt count and delay. | Real-time token streaming via Server-Sent Events. The gateway unifies provider-specific chunk formatting into a single, predictable interface. |
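To illustrate what unifying provider-specific chunk formatting can look like, here is a minimal sketch. It assumes two upstream shapes: OpenAI-style chunks (`choices[0].delta.content`) and Anthropic-style `content_block_delta` events (`delta.text`). The `UnifiedChunk` shape and the functions `normalizeChunk` and `toSseFrame` are hypothetical; the gateway's real wire format may differ.

```typescript
// Hypothetical sketch of chunk normalization; not the gateway's actual format.
interface UnifiedChunk {
  text: string;  // incremental text for this chunk ("" if none)
  done: boolean; // true when the provider signals the end of the stream
}

type Provider = "openai" | "anthropic";

function normalizeChunk(provider: Provider, raw: any): UnifiedChunk {
  if (provider === "openai") {
    // OpenAI-style: text lives in choices[0].delta.content,
    // and finish_reason stays null until the final chunk.
    const choice = raw?.choices?.[0];
    return {
      text: choice?.delta?.content ?? "",
      done: choice?.finish_reason != null,
    };
  }
  // Anthropic-style: text arrives in content_block_delta events,
  // and message_stop marks the end of the stream.
  if (raw?.type === "content_block_delta") {
    return { text: raw.delta?.text ?? "", done: false };
  }
  return { text: "", done: raw?.type === "message_stop" };
}

// Serialize a unified chunk as one Server-Sent Events frame.
function toSseFrame(chunk: UnifiedChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}
```

With a single shape on the wire, the client parses one format regardless of which provider served the request.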
DevTools Tracing Session

See your request in real time as it moves through the system, from pending to completion. A dedicated tracing session shows the full lifecycle of every execution.
- Live Status Updates: Real-time tracking of pending, success, and error states.
- Payload Visibility: Full request and response transmission details.
- Performance: Latency metrics and exact token usage insights.
- Routing: Visibility into retries and provider selection logic.
Everything is transparent, so you always know exactly what’s happening.
| Capability | Impact |
|---|---|
| Presets | Define model configs in the dashboard and apply them on the fly using @preset in your SDK calls. |
| Budget Limits | Establish spending maximums per request or per active user. |
| Multimodality | Direct proxy compatibility for image inputs, PDF document analysis, and video. |
| Zero Insurance | If all fallback routes and retries fail entirely, the execution is never billed. |
| BYOK | Bring your own key: keep provider billing off your account by letting end users supply their own API keys. |
| Domain | Traditional Setup | Our Gateway |
|---|---|---|
| Integration | Maintaining 5+ SDKs and unique payload shapes | A single OpenAI-compatible endpoint |
| Reliability | Application crashes during provider outages | Automated model and provider fallbacks |
| Error Handling | Bloated blocks of retry code in application logic | Centralized routing and exponential backoff |
| Visibility | Blind faith until the monthly invoice arrives | Millisecond tracing and exact token counting |
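The comparison above mentions exponential backoff. A common formulation doubles the delay after each failed attempt, up to a cap. The sketch below is illustrative: `backoffDelayMs` and `retryWithBackoff` are hypothetical names, and the base delay and cap are assumed values rather than gateway defaults.

```typescript
// Illustrative only: base delay and cap are assumed, not gateway defaults.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  // attempt is 1-based: 1 -> baseMs, 2 -> 2 * baseMs, 3 -> 4 * baseMs, ...
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts: number,
  // The sleep function is injectable so the logic can be tested without waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Once `maxAttempts` is exhausted the error propagates, which is the point at which a gateway would escalate to model fallback.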
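Requests pass through a fixed sequence of layers: rate limiting, provider selection, key leasing, execution, and error handling. Faults are classified by origin (client, provider, or gateway) and handled accordingly, while a Redis-backed key pool tracks per-key capacity so that downstream provider limits are respected.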
```mermaid
flowchart LR
Req([Client Request]) --> RL{Rate Limiter}
RL -->|Limit Exceeded| Drop[Reject 429]
RL -->|Approved| Strat[Provider Selector]
Strat --> Pool[(Redis Key Pool)]
Pool --> Exec[API Execution]
Exec -.->|Log Traces| PG[(PostgreSQL)]
Exec -.->|Transient 5xx| Retry((Retry Policy))
Retry -.->|Wait & Retry| Exec
Exec ==>|Success| Stream(((Normalize & Stream)))
Exec -.->|Hard Error| Engine{Decision Engine}
Engine -->|Provider Exhausted| Pool
Engine -->|Model Exhausted| Strat
Engine -->|Client Error 4xx| DropReq[Reject: Inform User]
Engine -->|Config Error 5xx| Alert[Alert Dev + Inform User]
```
When a request enters the gateway, it is first evaluated by a Global Rate Limiter. If the request is within the configured limits, the Provider Selector evaluates your fallback configuration and picks the route that best fits the chosen strategy (cheapest, fastest, etc.).
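Strategy-based selection can be sketched as picking the healthy provider with the lowest score for the chosen metric. Everything in this example is an assumption made for illustration: the `ProviderInfo` fields, the strategy labels, and the `selectProvider` function are hypothetical and not taken from the gateway's code.

```typescript
// Hypothetical sketch; field names and strategy labels are assumptions.
interface ProviderInfo {
  name: string;
  costPer1kTokensUsd: number; // assumed pricing metric
  avgLatencyMs: number;       // assumed latency metric
  healthy: boolean;
}

type Strategy = "cheapest" | "fastest";

function selectProvider(
  providers: ProviderInfo[],
  strategy: Strategy,
): ProviderInfo | undefined {
  const candidates = providers.filter((p) => p.healthy);
  // Each strategy maps to one metric; the lowest score wins.
  const score = (p: ProviderInfo): number =>
    strategy === "cheapest" ? p.costPer1kTokensUsd : p.avgLatencyMs;
  // Returns undefined when no healthy provider is available.
  return candidates.reduce<ProviderInfo | undefined>(
    (best, p) => (best === undefined || score(p) < score(best) ? p : best),
    undefined,
  );
}
```

Unhealthy providers are filtered out before scoring, so a cheap but failing provider is never chosen.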
The execution runtime then leases the healthiest available API key from the Redis Key Pool (ranked dynamically by remaining TPM/RPM capacity) to hit the LLM provider.
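Ranking keys by remaining capacity could work as in the sketch below. It uses an in-memory array as a stand-in for the Redis Key Pool, and the ranking formula (the tighter of the remaining TPM and RPM ratios) is an assumption. `PooledKey`, `headroom`, and `leaseKey` are hypothetical names.

```typescript
// In-memory stand-in for the Redis key pool; the ranking formula is an assumption.
interface PooledKey {
  id: string;
  tpmLimit: number; // tokens-per-minute quota
  tpmUsed: number;
  rpmLimit: number; // requests-per-minute quota
  rpmUsed: number;
  evictedUntil?: number; // epoch ms; the key is skipped until this time
}

// Headroom is the tighter of the two remaining-capacity ratios (0 to 1).
function headroom(k: PooledKey): number {
  const tpm = (k.tpmLimit - k.tpmUsed) / k.tpmLimit;
  const rpm = (k.rpmLimit - k.rpmUsed) / k.rpmLimit;
  return Math.min(tpm, rpm);
}

function leaseKey(pool: PooledKey[], now: number): PooledKey | undefined {
  // Skip evicted keys and keys with no capacity left.
  const usable = pool.filter(
    (k) => (k.evictedUntil ?? 0) <= now && headroom(k) > 0,
  );
  // Highest headroom first.
  usable.sort((a, b) => headroom(b) - headroom(a));
  return usable[0];
}
```

In a Redis-backed pool, one way to maintain such a ranking is a sorted set scored by headroom, though the gateway's actual data model is not documented here.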
If the API execution encounters an anomaly:
- Transient network errors trigger your designated retry policy with specific delays.
- Hard provider failures are intercepted by the Decision Engine. The engine temporarily evicts the bad key and cycles to the `Provider Exhausted` queue, or re-evaluates a new provider entirely (`Model Exhausted`).
- Bad parameters (4xx) are bounced directly back to the client.
- Internal gateway errors (5xx) notify engineering telemetry while returning a safe failure state to the client.
All telemetry records and trace logs are saved asynchronously to PostgreSQL.
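The error-handling paths above can be summarized as a single classification step. The sketch below is an interpretation of that description, not the gateway's implementation: the `Failure` shape, the action names, and the exact status-code rules (for example, treating 429 as a key-level problem) are assumptions.

```typescript
// Hypothetical decision function; status-code rules are assumptions.
type Action =
  | "retry_same"        // transient error: apply the retry policy
  | "next_key"          // evict the key and lease another from the pool
  | "reselect_provider" // no keys left: go back to the Provider Selector
  | "reject_client"     // 4xx: bounce back to the client
  | "alert_and_fail";   // gateway fault: alert and return a safe failure

interface Failure {
  origin: "provider" | "gateway";
  status: number;      // HTTP status associated with the failure
  retriesLeft: number; // attempts remaining under the retry policy
  keysLeft: number;    // other usable keys for the same provider
}

function decide(f: Failure): Action {
  if (f.origin === "gateway") return "alert_and_fail";
  // Client errors other than rate limiting are not retried.
  if (f.status >= 400 && f.status < 500 && f.status !== 429) {
    return "reject_client";
  }
  if (f.status >= 500 && f.retriesLeft > 0) return "retry_same";
  if (f.keysLeft > 0) return "next_key";
  return "reselect_provider";
}
```

Keeping the decision in one pure function makes each routing outcome easy to trace and to test in isolation.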
```text
/
├── apps
│   ├── api-gateway/        — Execution router and decision logic
│   ├── dashboard/          — Administrative interface for telemetry
│   ├── devtools/           — Traces UI and system debug views
│   └── primary-backend/    — Authentication and configuration state
├── packages
│   ├── config/             — Centralized system configurations
│   ├── db/                 — Prisma data layer and PostgreSQL schemas
│   ├── eslint-config/      — Monorepo linting synchronization
│   ├── types/              — Inter-service TypeScript definitions
│   ├── typescript-config/  — Shared TS compilation settings
│   ├── ui/                 — Shared React component library
│   └── utils/              — Standardized helper libraries
└── turbo.json              — Build orchestration
```
Prerequisites: Node.js v18+, PostgreSQL.
Initial Setup
1. Clone the repository

```bash
git clone https://github.com/Srijan76-code/openrouter-clone.git
cd openrouter-clone
npm install
```

2. Configure Environment

Duplicate `.env.example` to `.env`.

```bash
DATABASE_URL="postgresql://user:password@localhost:5432/gateway"
PORT="4000"
```

3. Initialize Database

```bash
npx turbo run db:generate
npx turbo run db:push
```

4. Start

```bash
npm run dev
```

5. Active Port Mapping

| Application | Local Port |
|---|---|
| Dashboard | 3000 |
| API Gateway | 3001 |
| Primary Backend | 4000 |
| DevTools | 4983 |
Tip: Traces have a dedicated UI. Open `localhost:4983` in your browser, run a request via Postman to the API Gateway, and watch the telemetry populate in real time.
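If you prefer a script over Postman, a request to the local gateway can be built as follows. This is a sketch that assumes the local API Gateway exposes the same OpenAI-compatible `/v1/chat/completions` path as the hosted one. `buildChatRequest` is a hypothetical helper, and the model name is borrowed from the API example in this README.

```typescript
// Sketch: builds the request without sending it, so the wire format is visible.
// Assumes the local gateway serves the same /v1 prefix as the hosted one.
function buildChatRequest(
  apiKey: string,
  prompt: string,
  baseUrl = "http://localhost:3001/v1",
) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "google/gemini-2.5-flash",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (Node 18+ ships a global fetch):
// const { url, init } = buildChatRequest("gateway-sk-12345", "ping");
// const res = await fetch(url, init);
```

After sending the request, the corresponding trace should appear in the DevTools UI on port 4983.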
Using the routing engine takes only a small amount of declarative configuration on top of a standard OpenAI-style request.
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter-clone-api-gateway.onrender.com/v1",
  apiKey: "gateway-sk-12345",
});

const response = await openai.chat.completions.create({
  model: "google/gemini-3.1-pro", // Primary model
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of Germany?" },
  ],
  temperature: 0.7,
  stream: false,
  // --- OPENROUTER CUSTOM EXTENSIONS ---
  // Gateway-specific field: not part of the OpenAI SDK's TypeScript types.
  extra: {
    // Tried in order if the primary model fails
    fallback_models: [
      "anthropic/claude-3-haiku",
      "google/gemini-2.5-flash",
    ],
    provider: "cheap", // Override standard routing mechanism
    retry: 3, // Set custom retry handler count
  },
});

console.log(response);
```

| Domain | Technology | Implementation Objective |
|---|---|---|
| API Gateway | Node.js & Express | Proxying high-throughput streams and evaluating error limits. |
| Type Safety | TypeScript | Structuring rigid data contracts across internal monorepo packages. |
| Monorepo | Turborepo | Facilitating isolated builds and rapid cache-hitting deployments. |
| Database | PostgreSQL & Prisma | Relational data persistence for telemetry and configuration states. |
| Dashboard | Next.js 14 | Delivering a lightweight React interface for configuration tracking. |



