Commit d51fba9

feat: news pages and doc optimisations
1 parent ef28223 commit d51fba9

42 files changed

Lines changed: 785 additions & 932 deletions

api-reference/messages.mdx

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ When `stream: true`, the response is sent as Server-Sent Events (SSE). Each even
173173
<ParamField header="X-Edgee-Enable-Compression" type="boolean">
174174
Enable token compression to reduce token usage. When enabled, the gateway automatically compresses your prompts to reduce costs by up to 50%.
175175

176-
See [Token Compression](/features/token-compression) for more details.
176+
See [Token Compression for Agentic Workloads](/features/agentic-compression) for more details.
177177
</ParamField>
178178

179179
<ParamField header="X-edgee-tags" type="string">

docs.json

Lines changed: 17 additions & 4 deletions
@@ -36,6 +36,18 @@
3636
{
3737
"source": "/introduction/privacy",
3838
"destination": "/introduction/why-edgee"
39+
},
40+
{
41+
"source": "/features/token-compression",
42+
"destination": "/features/agentic-compression"
43+
},
44+
{
45+
"source": "/quickstart/sdk",
46+
"destination": "/quickstart/integration"
47+
},
48+
{
49+
"source": "/features/overview",
50+
"destination": "/features/agentic-compression"
3951
}
4052
],
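The redirect entries in this hunk can be read as a source-to-destination lookup. The following is a minimal sketch, assuming exact-path matching; how the docs platform actually matches or chains redirects is not shown in this diff, and `resolveRedirect` is a hypothetical helper, not part of any Edgee or docs tooling.

```typescript
// Redirect entries copied from the docs.json hunk above.
type Redirect = { source: string; destination: string };

const redirects: Redirect[] = [
  { source: "/introduction/privacy", destination: "/introduction/why-edgee" },
  { source: "/features/token-compression", destination: "/features/agentic-compression" },
  { source: "/quickstart/sdk", destination: "/quickstart/integration" },
  { source: "/features/overview", destination: "/features/agentic-compression" },
];

// Hypothetical helper (assumption): exact-match lookup that returns the
// path unchanged when no redirect rule applies.
function resolveRedirect(path: string, rules: Redirect[] = redirects): string {
  const rule = rules.find((r) => r.source === path);
  return rule ? rule.destination : path;
}

console.log(resolveRedirect("/features/overview"));
console.log(resolveRedirect("/quickstart/sdk"));
```

Under this reading, both removed feature pages (`/features/overview` and `/features/token-compression`) land on the new `/features/agentic-compression` page, so links like the one updated in `api-reference/messages.mdx` keep working even where they were not rewritten.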
4153

@@ -73,16 +85,17 @@
7385
"quickstart/index",
7486
"quickstart/account-creation",
7587
"quickstart/api-key",
76-
"quickstart/sdk"
88+
"quickstart/integration"
7789
]
7890
},
7991
{
8092
"group": "Features",
8193
"pages": [
82-
"features/overview",
83-
"features/token-compression",
94+
"features/agentic-compression",
95+
"features/claude-compression",
8496
"features/observability",
85-
"features/edge-tools"
97+
"features/edge-tools",
98+
"features/byok"
8699
]
87100
},
88101
{
Lines changed: 44 additions & 106 deletions
@@ -1,30 +1,35 @@
11
---
2-
title: Token Compression
3-
description: Reduce LLM costs by up to 50% with edge-native prompt compression.
4-
icon: dollar-sign
2+
title: Agentic Token Compression
3+
sidebarTitle: Agentic Compression
4+
description: Reduce LLM costs with intelligent compression engines that combine lossy and lossless strategies.
5+
icon: /images/icons/agentic-comp.svg
56
---
67

7-
# Reduce LLM costs by up to 50%
8+
<img src="/images/banner-agentic-compression.png" alt="Agentic token compression" />
89

9-
Edgee's token compression runs at the edge before every request reaches LLM providers, automatically reducing prompt size by up to 50% while preserving semantic meaning and output quality.
10+
Edgee's agentic token compression runs at the edge before every request reaches LLM providers, automatically reducing prompt size by up to 50% while preserving semantic meaning and output quality.
1011

1112
This is particularly effective for:
1213
- RAG pipelines with large document contexts
1314
- Long conversation histories in multi-turn agents
1415
- Verbose system instructions and formatting
1516
- Document analysis and summarization tasks
1617

18+
<Note>
19+
Looking for lossless compression for Claude Code? See [Claude Token Compression (Beta)](/features/claude-compression).
20+
</Note>
21+
1722
## How It Works
1823

19-
Token compression happens automatically on every request through a four-step process:
24+
Agentic token compression uses multiple strategies that work together on every request. The core semantic compression strategy follows a four-step process; the other strategies (tool compression, smart crusher, cache aligner) run in parallel, and some of them are fully lossless.
2025

2126
<Steps>
2227
<Step title="Semantic Analysis">
2328
Analyze the prompt structure to identify redundant context, verbose formatting, and compressible sections without losing critical information.
2429
</Step>
2530

2631
<Step title="Context Optimization">
27-
Compress repeated context (common in RAG), condense verbose formatting, and remove unnecessary whitespace while maintaining semantic relationships.
32+
Compress repeated context (common in RAG), condense verbose formatting, and remove unnecessary elements while maintaining semantic relationships.
2833
</Step>
2934

3035
<Step title="Instruction Preservation">
@@ -50,7 +55,10 @@ The **compression ratio** (sometimes called *compression rate* in APIs) is **com
5055
In the console you choose **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)**. The compressor aims for that ratio; the actual ratio per request may vary. Strong (0.7) asks for more compression; Light (0.9) is more conservative and keeps more of the original text.
5156

5257
<Tip>
53-
**Ratio vs reduction:** Ratio = compressed/original (e.g. 0.75). Reduction = 1 − ratio (e.g. 25%). When we say "50% reduction," that corresponds to a ratio of 0.50.
58+
**Ratio vs reduction:**
59+
60+
Ratio = compressed/original (e.g. 0.75). <br />
61+
Reduction = 1 − ratio (e.g. 25%). When we say "50% reduction," that corresponds to a ratio of 0.50.
5462
</Tip>
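To make the ratio and reduction definitions concrete, here is a small sketch that computes both from token counts. The function names are illustrative assumptions and are not part of the Edgee SDK.

```typescript
// Ratio = compressed / original, as defined in the Tip above.
function compressionRatio(originalTokens: number, compressedTokens: number): number {
  if (originalTokens <= 0) {
    throw new RangeError("originalTokens must be positive");
  }
  return compressedTokens / originalTokens;
}

// Reduction = 1 - ratio, expressed here as a percentage.
function reductionPercent(ratio: number): number {
  return (1 - ratio) * 100;
}

// A 2,000-token prompt compressed to 1,500 tokens.
const ratio = compressionRatio(2000, 1500);
console.log(ratio, reductionPercent(ratio)); // 0.75 25
```

A lower ratio therefore means more compression: the **Strong (0.7)** setting targets a 30% reduction, while **Light (0.9)** targets 10%.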
5563

5664
## Semantic preservation and BERT score
@@ -66,13 +74,13 @@ This way you can allow aggressive compression (low ratio) while still guaranteei
6674
In the Activity table, when we fell back to the original prompt because the similarity was below the threshold, the input token count is shown in red with a tooltip: "Didn't match the semantic threshold – original prompt was used."
6775
</Tip>
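The fallback described in this Tip can be sketched as a simple comparison against the configured threshold. This is an illustration only: the similarity score is computed by the gateway, and `selectPrompt` and its return shape are hypothetical names, not Edgee APIs.

```typescript
// Threshold presets as named in the console; null stands for "Off".
const THRESHOLDS = { off: null, ultraSafe: 0.95, safe: 0.85, edgy: 0.75 } as const;

type Selection = { prompt: string; fellBack: boolean };

// Hypothetical sketch (assumption): use the compressed prompt unless its
// similarity to the original is below the threshold, in which case the
// original prompt is sent instead.
function selectPrompt(
  original: string,
  compressed: string,
  similarity: number,
  threshold: number | null,
): Selection {
  if (threshold !== null && similarity < threshold) {
    return { prompt: original, fellBack: true };
  }
  return { prompt: compressed, fellBack: false };
}

// With the Safe (0.85) preset, a similarity of 0.80 triggers the fallback.
const result = selectPrompt("long original prompt", "short prompt", 0.8, THRESHOLDS.safe);
console.log(result.fellBack); // true
```

With the threshold set to **Off**, the sketch always returns the compressed prompt, which matches the idea that no similarity check is applied.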
6876

69-
## Enabling Token Compression
77+
## Enabling Agentic Token Compression
7078

71-
Token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level:
79+
Agentic token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level.
7280

73-
### 1. Per Request (SDK)
81+
### 1. Per Request (SDK or Headers)
7482

75-
Enable compression for specific requests using the SDK:
83+
Enable compression for specific requests using the SDK or headers:
7684

7785
<Tabs>
7886
<Tab title="TypeScript">
@@ -126,47 +134,34 @@ Enable compression for specific requests using the SDK:
126134
let response = client.send("gpt-5.2", input).await?;
127135
```
128136
</Tab>
137+
138+
<Tab title="cURL">
139+
```bash
140+
curl -X POST "https://api.edgee.ai/v1/chat/completions" \
141+
-H "Authorization: Bearer $EDGEE_API_KEY" \
142+
-H "Content-Type: application/json" \
143+
-H "x-edgee-enable-compression: true" \
144+
-H "x-edgee-compression-rate: 0.8" \
145+
-d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "Your prompt here"}]}'
146+
```
147+
</Tab>
129148
</Tabs>
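For callers that do not use an SDK, the header-based variant from the cURL tab can also be expressed with the standard `fetch` API. This is a sketch that reuses only the URL, headers, and body shown in the cURL example; `buildCompressedRequest` is a hypothetical helper, and the response shape is not covered here.

```typescript
type RequestParts = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Hypothetical helper (assumption): builds the same request as the cURL tab.
// Nothing is sent until the result is passed to fetch().
function buildCompressedRequest(
  apiKey: string,
  prompt: string,
  compressionRate = 0.8,
): RequestParts {
  return {
    url: "https://api.edgee.ai/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        "x-edgee-enable-compression": "true",
        "x-edgee-compression-rate": String(compressionRate),
      },
      body: JSON.stringify({
        model: "gpt-5.2",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage, in a runtime with a global fetch (for example Node 18+):
//   const { url, init } = buildCompressedRequest(apiKey, "Your prompt here");
//   const response = await fetch(url, init);
```

Keeping request construction separate from the network call makes it easy to check which compression headers a given request will carry.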
130149

131150
### 2. Per API Key (Console)
132151

133152
Enable compression for specific API keys in your organization settings. This is useful when you want different compression settings for different applications or environments.
134153

135154
<Frame>
136-
<img src="/images/compression-enabled-by-tag-light.png" alt="Enable compression for specific API keys" className="dark:hidden" />
137-
<img src="/images/compression-enabled-by-tag-dark.png" alt="Enable compression for specific API keys" className="hidden dark:block" />
138-
</Frame>
139-
140-
In the **Edge Models** section of your console:
141-
1. Toggle **Enable token compression** on
142-
2. Set **Compression** to **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)** — see [Understanding compression ratio](#understanding-compression-ratio)
143-
3. Set **Semantic preservation threshold** to **Off**, **Ultra Safe (0.95)**, **Safe (0.85)**, or **Edgy (0.75)** — see [Semantic preservation and BERT score](#semantic-preservation-and-bert-score)
144-
4. Under **Scope**, select **Apply to specific API keys**
145-
5. Choose which API keys should use compression
146-
147-
### 3. Organization-Wide (Console)
148-
149-
Enable compression for all requests across your entire organization. This is the recommended setting for most users to maximize savings automatically.
150-
151-
<Frame>
152-
<img src="/images/compression-enabled-org-light.png" alt="Enable compression organization-wide" className="dark:hidden" />
153-
<img src="/images/compression-enabled-org-dark.png" alt="Enable compression organization-wide" className="hidden dark:block" />
155+
<img src="/images/compression-by-tag-light.png" alt="Enable compression for specific API keys" className="dark:hidden" />
156+
<img src="/images/compression-by-tag-dark.png" alt="Enable compression for specific API keys" className="hidden dark:block" />
154157
</Frame>
155158

156159
In the **Edge Models** section of your console:
157-
1. Toggle **Enable token compression** on
158-
2. Set **Compression** to **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)**
159-
3. Set **Semantic preservation threshold** to **Off**, **Ultra Safe (0.95)**, **Safe (0.85)**, or **Edgy (0.75)**
160-
4. Under **Scope**, select **Apply to all org requests**
161-
5. All API keys will now use compression by default
160+
1. Set **Compression** to **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)** — see [Understanding compression ratio](#understanding-compression-ratio)
161+
2. Set **Semantic preservation threshold** to **Off**, **Ultra Safe (0.95)**, **Safe (0.85)**, or **Edgy (0.75)** — see [Semantic preservation and BERT score](#semantic-preservation-and-bert-score)
162+
3. Under **Scope**, select **Apply to specific API keys**
163+
4. Choose which API keys should use compression
162164

163-
<Tip>
164-
**Compression** controls how aggressively Edgee compresses prompts: **Strong (0.7)** aims for more compression; **Light (0.9)** is more conservative. **Medium (0.8)** is the default. See [Understanding compression ratio](#understanding-compression-ratio).
165-
</Tip>
166-
167-
<Note>
168-
SDK-level configuration takes precedence over console settings. If you enable compression in your code with `enable_compression: true`, it will override the console configuration for that specific request.
169-
</Note>
170165

171166
## When It Works Best
172167

@@ -242,56 +237,19 @@ Compression time: 14ms
242237

243238
## Real-World Savings
244239

245-
Here's what token compression means for your monthly AI bill:
240+
Here's what token compression means for your monthly AI bill, assuming a 50% reduction in input tokens:
246241

247-
| Use Case | Monthly Requests | Without Edgee | With Edgee (50% compression) | **Monthly Savings** |
248-
|----------|-----------------|---------------|------------------------------|---------------------|
249-
| RAG Q&A (GPT-5.2) | 100,000 @ 2,000 tokens | $3,000 | $1,500 | **$1,500** |
250-
| Document Analysis (Claude 3.5) | 50,000 @ 4,000 tokens | $1,800 | $900 | **$900** |
251-
| Chatbot (GPT-4o-mini) | 500,000 @ 500 tokens | $375 | $188 | **$187** |
252-
| Multi-turn Agent (GPT-5.2) | 200,000 @ 1,000 tokens | $3,000 | $1,500 | **$1,500** |
242+
| Use Case | Monthly Requests | Without Edgee | With Edgee |
243+
|----------|-----------------|---------------|-------------|
244+
| RAG Q&A (GPT-5.2) | 1,000,000 @ 2,000 input tokens | $3,500 | **$1,750** |
245+
| Document Analysis (Sonnet 4.6) | 50,000 @ 20,000 input tokens | $3,000 | **$1,500** |
246+
| Chatbot (Haiku) | 5,000,000 @ 500 input tokens | $2,500 | **$1,250** |
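The table's arithmetic can be reproduced with a short sketch. It assumes that only input tokens are counted (output tokens are ignored) and that a 50% reduction halves the input cost. The per-million price is derived from the table's own figures rather than from a published price list, and `monthlySavings` is an illustrative name.

```typescript
type Row = { useCase: string; requests: number; tokensPerRequest: number; baselineUsd: number };

// Rows copied from the table: monthly requests, input tokens per request,
// and the "Without Edgee" monthly cost in USD.
const rows: Row[] = [
  { useCase: "RAG Q&A (GPT-5.2)", requests: 1_000_000, tokensPerRequest: 2_000, baselineUsd: 3_500 },
  { useCase: "Document Analysis (Sonnet 4.6)", requests: 50_000, tokensPerRequest: 20_000, baselineUsd: 3_000 },
  { useCase: "Chatbot (Haiku)", requests: 5_000_000, tokensPerRequest: 500, baselineUsd: 2_500 },
];

// Illustrative helper (assumption): derives the implied input price and the
// cost after compression for a given ratio (compressed / original).
function monthlySavings(row: Row, ratio: number) {
  const millionsOfTokens = (row.requests * row.tokensPerRequest) / 1_000_000;
  const impliedUsdPerMillion = row.baselineUsd / millionsOfTokens;
  const withEdgeeUsd = row.baselineUsd * ratio;
  return { impliedUsdPerMillion, withEdgeeUsd, savedUsd: row.baselineUsd - withEdgeeUsd };
}

for (const row of rows) {
  console.log(row.useCase, monthlySavings(row, 0.5));
}
```

For the first row this gives 2,000 million input tokens, an implied price of $1.75 per million, and a compressed cost of $1,750, which matches the "With Edgee" column.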
253247

254-
<Note>
255-
Savings calculations use list pricing for GPT-4o ($5/1M input tokens), Claude 3.5 Sonnet ($3/1M input tokens), and GPT-4o-mini ($0.15/1M input tokens). Actual compression ratios vary by use case.
256-
</Note>
257248

258-
## Best Practices
259-
260-
<AccordionGroup>
261-
<Accordion title="Optimize prompts for compression">
262-
- Structure RAG contexts with clear sections
263-
- Use consistent formatting in document chunks
264-
- Avoid excessive whitespace in system prompts
265-
- Group similar information together
266-
</Accordion>
267-
268-
<Accordion title="Track savings over time">
269-
- Monitor `compression.saved_tokens` and `compression.cost_savings` across requests
270-
- Use `compression.reduction` to gauge effectiveness per request
271-
- Calculate cumulative savings weekly or monthly
272-
- Use observability tools to identify high-compression opportunities
273-
- Compare costs across different use cases
274-
</Accordion>
275-
276-
<Accordion title="Configure compression per use case">
277-
- Enable compression by default for all requests
278-
- Compression happens automatically without configuration
279-
- Track `compression.reduction` to understand effectiveness (e.g. `48` = 48% fewer tokens)
280-
- Monitor `compression.time_ms` to ensure compression latency fits your SLA
281-
- Use response metrics to optimize prompt design
282-
</Accordion>
283-
284-
<Accordion title="Combine with cost-aware routing">
285-
- Use [automatic model selection](/features/automatic-model-selection) for additional savings
286-
- Route to cheaper models when appropriate
287-
- Compression + routing can reduce costs by 60-70% total
288-
- Monitor both compression and routing savings
289-
</Accordion>
290-
</AccordionGroup>
291249

292250
## Response Fields
293251

294-
Every Edgee response includes detailed compression metrics:
252+
Every Edgee response includes the standard usage information and, if compression was applied, detailed compression metrics:
295253

296254
```typescript
297255
// Usage information
@@ -311,23 +269,3 @@ Use these fields to:
311269
- Build cost dashboards and budgeting tools
312270
- Identify high-value compression opportunities
313271
- Optimize prompt design for maximum compression
314-
315-
## What's Next
316-
317-
<CardGroup cols={2}>
318-
<Card title="Observability" icon="chart-line" iconType="duotone" href="/features/observability">
319-
Monitor token savings, costs, and compression ratios across all requests.
320-
</Card>
321-
322-
<Card title="Intelligent Routing" icon="route" iconType="duotone" href="/features/automatic-model-selection">
323-
Combine compression with cost-aware model routing for even greater savings.
324-
</Card>
325-
326-
<Card title="Quick Start" icon="rocket" iconType="duotone" href="/quickstart">
327-
Get started in 5 minutes and start saving on your next request.
328-
</Card>
329-
330-
<Card title="SDK Documentation" icon="code" iconType="duotone" href="/sdk">
331-
Explore SDKs in TypeScript, Python, Go, and Rust with built-in compression support.
332-
</Card>
333-
</CardGroup>

features/automatic-model-selection.mdx

Lines changed: 6 additions & 2 deletions
@@ -189,8 +189,12 @@ await edgee.routing.addRule({
189189
## What's Next
190190

191191
<CardGroup cols={2}>
192-
<Card title="Token Compression" icon="dollar-sign" iconType="duotone" href="/features/token-compression">
193-
Learn how compression reduces costs by up to 50% before routing.
192+
<Card title="Agentic Token Compression" icon="dollar-sign" iconType="duotone" href="/features/agentic-compression">
193+
Learn how agentic compression reduces costs by up to 50% before routing.
194+
</Card>
195+
196+
<Card title="Token Compression for Claude Code" icon="dollar-sign" iconType="duotone" href="/features/claude-compression">
197+
Learn how tool result compression for Claude Code reduces costs by up to 50% before routing.
194198
</Card>
195199

196200
<Card title="Observability" icon="chart-line" iconType="duotone" href="/features/observability">

features/byok.mdx

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
---
2+
title: Bring Your Own Keys (BYOK)
3+
sidebarTitle: BYOK
4+
description: Use your own provider keys for billing and provider-side controls, while keeping Edgee's routing and observability.
5+
icon: key-round
6+
---
7+
8+
BYOK lets you register your own LLM provider API keys with Edgee. Requests are routed through your key, so they're billed to
9+
your provider account, subject to your own rate limits and provider controls, while Edgee's routing, compression, and observability continue to work normally.
10+
Keys are stored securely and masked immediately after creation.
11+
12+
## Supported Providers
13+
14+
| Provider | Credential Type |
15+
|----------|----------------|
16+
| Anthropic | API Key |
17+
| OpenAI | API Key |
18+
| Google Vertex AI | Service account JSON |
19+
| Mistral | API Key |
20+
| DeepSeek | API Key |
21+
| xAI | API Key |
22+
| zAI | API Key |
23+
| AWS Bedrock | IAM credentials per region |
24+
25+
## Adding a Provider Key
26+
27+
Navigate to **BYOK** in the console sidebar. Click **Create a new Provider Key**. Fill in the form:
28+
29+
**1. Provider**: select from the list above.
30+
31+
**2. Name**: a friendly label for this key (e.g. `Production Key`).
32+
33+
**3. Credentials**: varies by provider:
34+
35+
- **Most providers** (Anthropic, OpenAI, Mistral, DeepSeek, xAI, zAI): enter a single API key (e.g. `sk-...`).
36+
- **AWS Bedrock**: enter IAM credentials per region. For each region, provide:
37+
- **Region**: (e.g. `us-east-1`, or `global` for a single fallback credential set)
38+
- **Access Key ID**: (e.g. `AKIA...`)
39+
- **Secret Access Key**:
40+
41+
Click **Add region** to configure multiple regions in one entry.
42+
- **Google Vertex AI**: paste the full service account JSON key file downloaded from Google Cloud Console.
43+
44+
**4. Test credentials**: click **Test credentials** to validate your credentials against the provider before saving.
45+
46+
**5. Assignment**: choose how the key is used:
47+
- Check **Assign to entire organization** to use this key as the default for all requests, across all your Edgee API keys.
48+
- Or select one or more **specific API keys** from the multi-select to limit the key to those keys only.
49+
**6. Save**: click **Save** to create the provider key.
50+
51+
<Note>
52+
Keys are stored securely and never revealed after creation. When editing a key, leave the credentials field empty to keep the existing key unchanged.
53+
</Note>
54+
55+
<Frame>
56+
<img src="/images/byok-create-light.png" alt="Create a provider key" className="dark:hidden" />
57+
<img src="/images/byok-create-dark.png" alt="Create a provider key" className="hidden dark:block" />
58+
</Frame>
59+
60+
## Assignment: Organization vs. API Keys
61+
62+
Provider keys can be scoped in two ways:
63+
64+
- **Organization (all keys)**: the provider key is used as the default for every request sent through your organization, regardless of which Edgee API key is used. Shown as a purple **Organization (all keys)** badge in the table.
65+
- **Specific API keys**: the provider key only applies to the selected Edgee API keys. Shown as blue per-key badges.
66+
67+
This lets you use different provider accounts for different environments or projects, for example, one OpenAI key for staging and another for production.
68+
69+
## AWS Bedrock: Multi-Region Setup
70+
71+
AWS Bedrock requires IAM credentials per region. You can configure multiple regions in a single provider key entry:
72+
add one row per region, each with its own Access Key ID and Secret Access Key. Use `global` as the region name to define a single fallback credential set.
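One way to picture the `global` fallback is a per-region lookup that falls back to the `global` entry when no exact match exists. This is a sketch under that assumption; how Edgee resolves credentials internally is not documented in this diff, and `credentialsForRegion` and the placeholder values are hypothetical.

```typescript
type AwsCredentials = { accessKeyId: string; secretAccessKey: string };

// Hypothetical lookup (assumption): prefer the credentials configured for the
// exact region, otherwise use the `global` entry if one was configured.
function credentialsForRegion(
  byRegion: Map<string, AwsCredentials>,
  region: string,
): AwsCredentials | undefined {
  return byRegion.get(region) ?? byRegion.get("global");
}

// Placeholder values only; never hard-code real credentials.
const configured = new Map<string, AwsCredentials>([
  ["us-east-1", { accessKeyId: "EXAMPLE_EAST_ID", secretAccessKey: "EXAMPLE_EAST_SECRET" }],
  ["global", { accessKeyId: "EXAMPLE_GLOBAL_ID", secretAccessKey: "EXAMPLE_GLOBAL_SECRET" }],
]);

console.log(credentialsForRegion(configured, "us-east-1")?.accessKeyId); // EXAMPLE_EAST_ID
console.log(credentialsForRegion(configured, "eu-west-1")?.accessKeyId); // EXAMPLE_GLOBAL_ID
```

If neither the region nor a `global` entry is configured, the sketch returns `undefined`, leaving it to the caller to decide how to handle a missing credential set.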
73+
74+
75+
