Edgee's agentic token compression runs at the edge before every request reaches LLM providers, automatically reducing prompt size by up to 50% while preserving semantic meaning and output quality.

This is particularly effective for:

- RAG pipelines with large document contexts
- Long conversation histories in multi-turn agents
- Verbose system instructions and formatting
- Document analysis and summarization tasks

<Note>
Looking for lossless compression for Claude Code? See [Claude Token Compression (Beta)](/features/claude-compression).
</Note>

## How It Works

Agentic token compression uses multiple strategies that work together on every request. The core semantic compression strategy follows a four-step process; other strategies (tool compression, smart crusher, cache aligner) run in parallel, and some of them are fully lossless.
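One idea behind these strategies is that repeated context, which is common in RAG pipelines, can be dropped. A toy example makes this concrete. The TypeScript sketch removes exact-duplicate retrieved chunks after collapsing whitespace. The `dedupeChunks` helper is hypothetical and only illustrates the concept. It is not Edgee's compression algorithm.

```typescript
// Toy illustration only: drops exact-duplicate context chunks (after
// collapsing whitespace) and keeps the first occurrence of each.
// Hypothetical helper; NOT Edgee's compression algorithm.
function dedupeChunks(chunks: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const chunk of chunks) {
    // Normalize whitespace so formatting-only differences still match.
    const key = chunk.replace(/\s+/g, " ").trim();
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(chunk);
    }
  }
  return kept;
}

// Three retrieved chunks, two of which differ only in whitespace.
const retrieved: string[] = [
  "Refunds are accepted within 30 days.",
  "Shipping is free over $50.",
  "Refunds  are accepted\nwithin 30 days.",
];
console.log(dedupeChunks(retrieved).length); // 2
```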

<Steps>
<Step title="Semantic Analysis">
Analyze the prompt structure to identify redundant context, verbose formatting, and compressible sections without losing critical information.
</Step>

<Step title="Context Optimization">
Compress repeated context (common in RAG), condense verbose formatting, and remove unnecessary elements while maintaining semantic relationships.
</Step>

<Step title="Instruction Preservation">
@@ -50,7 +55,10 @@ The **compression ratio** (sometimes called *compression rate* in APIs) is **com
In the console you choose **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)**. The compressor aims for that ratio; the actual ratio per request may vary. Strong (0.7) asks for more compression; Light (0.9) is more conservative and keeps more of the original text.
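To see what each preset targets for a given prompt, multiply the original token count by the preset's ratio. The TypeScript sketch encodes the three console presets. The `targetTokenBudget` helper and the `CompressionLevel` names are hypothetical and do not belong to the Edgee SDK. The compressor only aims for the ratio, so real per-request counts can differ from these targets.

```typescript
// Hypothetical model of the console presets: each maps to a target ratio
// (compressed tokens / original tokens). The compressor only *aims* for it.
type CompressionLevel = "light" | "medium" | "strong";

const TARGET_RATIO: Record<CompressionLevel, number> = {
  light: 0.9,  // Light (0.9): most conservative, keeps more text
  medium: 0.8, // Medium (0.8)
  strong: 0.7, // Strong (0.7): asks for the most compression
};

// Target token count for a prompt of `originalTokens` at a given preset.
function targetTokenBudget(originalTokens: number, level: CompressionLevel): number {
  return Math.round(originalTokens * TARGET_RATIO[level]);
}

console.log(targetTokenBudget(10000, "light"));  // 9000
console.log(targetTokenBudget(10000, "strong")); // 7000
```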

<Tip>
**Ratio vs reduction:**

Ratio = compressed tokens / original tokens (e.g. 0.75). <br />
Reduction = 1 − ratio (e.g. 25%). When we say "50% reduction," that corresponds to a ratio of 0.50.
</Tip>
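The two definitions in the tip translate directly into code. This TypeScript sketch takes measured input token counts before and after compression and returns both numbers. The `compressionStats` helper is hypothetical and not part of the Edgee SDK.

```typescript
// ratio = compressed / original; reduction = 1 - ratio.
// Hypothetical helper for turning measured token counts into both metrics.
function compressionStats(originalTokens: number, compressedTokens: number) {
  if (originalTokens <= 0) {
    throw new RangeError("originalTokens must be positive");
  }
  const ratio = compressedTokens / originalTokens;
  return { ratio, reduction: 1 - ratio };
}

const { ratio, reduction } = compressionStats(2000, 1500);
console.log(ratio);                             // 0.75
console.log(`${Math.round(reduction * 100)}%`); // 25%
```

A 50% reduction is the case `compressionStats(1000, 500)`, which gives a ratio of 0.5.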

## Semantic preservation and BERT score
@@ -66,13 +74,13 @@ This way you can allow aggressive compression (low ratio) while still guaranteei
In the Activity table, when we fell back to the original prompt because the similarity was below the threshold, the input token count is shown in red with a tooltip: "Didn't match the semantic threshold – original prompt was used."
</Tip>
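The fallback rule can be sketched in a few lines of TypeScript. This is a hypothetical model, not Edgee's implementation. The similarity score (for example a BERT score) is passed in rather than computed, and the preset values mirror the console options: Ultra Safe 0.95, Safe 0.85, and Edgy 0.75. Treating **Off** as "always use the compressed prompt" is an assumption.

```typescript
// Hypothetical sketch of the semantic-preservation fallback: if similarity
// between original and compressed prompt is below the threshold, send the
// original prompt instead. The score itself is an input here.
const THRESHOLDS = { ultraSafe: 0.95, safe: 0.85, edgy: 0.75 } as const;

type Threshold = keyof typeof THRESHOLDS | "off";

function choosePrompt(
  original: string,
  compressed: string,
  similarity: number,
  threshold: Threshold,
): { prompt: string; usedOriginal: boolean } {
  // Assumption: "off" disables the check entirely.
  if (threshold === "off") {
    return { prompt: compressed, usedOriginal: false };
  }
  // Strictly below the threshold: fall back to the original prompt.
  if (similarity < THRESHOLDS[threshold]) {
    return { prompt: original, usedOriginal: true };
  }
  return { prompt: compressed, usedOriginal: false };
}

console.log(choosePrompt("original text", "short", 0.8, "safe").usedOriginal); // true
console.log(choosePrompt("original text", "short", 0.9, "safe").usedOriginal); // false
```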

## Enabling Agentic Token Compression

Agentic token compression can be enabled in three ways, giving you flexibility to control compression at the request, API key, or organization level.

### 1. Per Request (SDK or Headers)

Enable compression for specific requests using the SDK or headers:

<Tabs>
<Tab title="TypeScript">
@@ -126,47 +134,34 @@ Enable compression for specific requests using the SDK:
```rust
let response = client.send("gpt-5.2", input).await?;
```
</Tab>

<Tab title="cURL">
```bash
curl -X POST "https://api.edgee.ai/v1/chat/completions" \
```
</Tab>
</Tabs>

Enable compression for specific API keys in your organization settings. This is useful when you want different compression settings for different applications or environments.

<Frame>
<img src="/images/compression-by-tag-light.png" alt="Enable compression for specific API keys" className="dark:hidden" />
<img src="/images/compression-by-tag-dark.png" alt="Enable compression for specific API keys" className="hidden dark:block" />
</Frame>

In the **Edge Models** section of your console:

1. Set **Compression** to **Light (0.9)**, **Medium (0.8)**, or **Strong (0.7)** (see [Understanding compression ratio](#understanding-compression-ratio)).
2. Set **Semantic preservation threshold** to **Off**, **Ultra Safe (0.95)**, **Safe (0.85)**, or **Edgy (0.75)** (see [Semantic preservation and BERT score](#semantic-preservation-and-bert-score)).
3. Under **Scope**, select **Apply to specific API keys**.
4. Choose which API keys should use compression.

## When It Works Best
172
167
@@ -242,56 +237,19 @@ Compression time: 14ms
## Real-World Savings

Here's what token compression means for your monthly AI bill, assuming a 50% reduction in input tokens:

Savings calculations use list pricing for GPT-4o ($5/1M input tokens), Claude 3.5 Sonnet ($3/1M input tokens), and GPT-4o-mini ($0.15/1M input tokens). Actual compression ratios vary by use case.
</Note>
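The arithmetic behind these estimates is monthly input cost = requests × tokens per request × price per token. The sketch uses the GPT-4o list price quoted above ($5 per 1M input tokens). The request volume and prompt size are made-up inputs for illustration, and `monthlyInputCost` is a hypothetical helper.

```typescript
// Monthly input-token cost in USD for a given volume and list price.
// Hypothetical helper; request volume and prompt size below are made up.
function monthlyInputCost(requests: number, tokensPerRequest: number, usdPerMillion: number): number {
  return (requests * tokensPerRequest * usdPerMillion) / 1000000;
}

const requests = 100000;       // hypothetical monthly request volume
const tokensPerRequest = 4000; // hypothetical average input tokens per request
const before = monthlyInputCost(requests, tokensPerRequest, 5);
const after = monthlyInputCost(requests, tokensPerRequest * 0.5, 5); // 50% reduction
console.log(before, after, before - after); // 2000 1000 1000
```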
description: Use your own provider keys for billing and provider-side controls, while keeping Edgee's routing and observability.
icon: key-round
---

Bring Your Own Key (BYOK) lets you register your own LLM provider API keys with Edgee. Requests are routed through your key, so they're billed to your provider account and subject to your own rate limits and provider controls, while Edgee's routing, compression, and observability continue to work normally. Keys are stored securely and masked immediately after creation.

## Supported Providers

| Provider | Credential Type |
|----------|-----------------|
| Anthropic | API key |
| OpenAI | API key |
| Google Vertex AI | Service account JSON |
| Mistral | API key |
| DeepSeek | API key |
| xAI | API key |
| zAI | API key |
| AWS Bedrock | IAM credentials per region |

## Adding a Provider Key

Navigate to **BYOK** in the console sidebar, click **Create a new Provider Key**, and fill in the form:

**1. Provider**: select one of the providers listed above.

**2. Name**: a friendly label for this key (e.g. `Production Key`).

**3. Credentials**: the required credentials depend on the provider:

- **Most providers** (Anthropic, OpenAI, Mistral, DeepSeek, xAI, zAI): enter a single API key (e.g. `sk-...`).
- **AWS Bedrock**: enter IAM credentials per region. For each region, provide:
  - **Region**: e.g. `us-east-1`, or `global` for a single fallback credential set
  - **Access Key ID**: e.g. `AKIA...`
  - **Secret Access Key**

  Click **Add region** to configure multiple regions in one entry.
- **Google Vertex AI**: paste the full service account JSON key file downloaded from the Google Cloud Console.
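You can sanity-check a Vertex AI key file locally before pasting it. The TypeScript below is a hypothetical helper: `checkServiceAccountJson` is an illustrative name and is not part of Edgee. It checks that the JSON parses and contains fields that a Google Cloud service account key file normally includes (`type`, `project_id`, `private_key`, `client_email`). This is a sketch under that assumption. It does not replace **Test credentials**, which validates against the provider.

```typescript
// Hypothetical local pre-check for a Google Cloud service account key file.
// Returns a list of problems; an empty list means the basic shape looks right.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"] as const;

function checkServiceAccountJson(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["expected a JSON object"];
  }
  const obj = parsed as Record<string, unknown>;
  const problems: string[] = [];
  // Each required field must be a non-empty string.
  for (const field of REQUIRED_FIELDS) {
    if (typeof obj[field] !== "string" || obj[field] === "") {
      problems.push(`missing field: ${field}`);
    }
  }
  // Service account key files declare type "service_account".
  if (obj.type !== undefined && obj.type !== "service_account") {
    problems.push('type should be "service_account"');
  }
  return problems;
}
```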

**4. Test credentials**: click **Test credentials** to validate your credentials against the provider before saving.

**5. Assignment**: choose how the key is used:

- Check **Assign to entire organization** to use this key as the default for all requests, across all your Edgee API keys.
- Or select one or more **specific API keys** from the multi-select to limit the key to those keys only.

**6. Save**: click **Save** to create the provider key.

<Note>
Keys are stored securely and never revealed after creation. When editing a key, leave the credentials field empty to keep the existing key unchanged.
</Note>

<Frame>
<img src="/images/byok-create-light.png" alt="Create a provider key" className="dark:hidden" />
<img src="/images/byok-create-dark.png" alt="Create a provider key" className="hidden dark:block" />
</Frame>

## Assignment: Organization vs. API Keys

Provider keys can be scoped in two ways:

- **Organization (all keys)**: the provider key is used as the default for every request sent through your organization, regardless of which Edgee API key is used. It appears as a purple **Organization (all keys)** badge in the table.
- **Specific API keys**: the provider key only applies to the selected Edgee API keys. These appear as blue per-key badges.

This lets you use different provider accounts for different environments or projects; for example, one OpenAI key for staging and another for production.
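The staging/production example can be modeled as a lookup. The page does not spell out which key wins when both an organization-wide key and a key-specific assignment exist for the same provider. This sketch assumes the specific assignment is preferred over the organization default, so treat it as illustrative. The `ProviderKey` shape and `resolveProviderKey` are hypothetical names.

```typescript
// Hypothetical model of BYOK assignment. ASSUMPTION: a provider key assigned
// to a specific Edgee API key is preferred over the organization-wide default.
interface ProviderKey {
  name: string;
  provider: string;
  orgWide: boolean;    // "Assign to entire organization"
  apiKeyIds: string[]; // specific Edgee API keys (unused when orgWide)
}

// Returns the provider key to use, or undefined if no BYOK key applies.
function resolveProviderKey(keys: ProviderKey[], provider: string, apiKeyId: string): ProviderKey | undefined {
  const candidates = keys.filter((k) => k.provider === provider);
  const specific = candidates.find((k) => !k.orgWide && k.apiKeyIds.includes(apiKeyId));
  return specific ?? candidates.find((k) => k.orgWide);
}

const keys: ProviderKey[] = [
  { name: "Org OpenAI", provider: "openai", orgWide: true, apiKeyIds: [] },
  { name: "Staging OpenAI", provider: "openai", orgWide: false, apiKeyIds: ["staging"] },
];
console.log(resolveProviderKey(keys, "openai", "staging")?.name); // Staging OpenAI
console.log(resolveProviderKey(keys, "openai", "prod")?.name);    // Org OpenAI
```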

## AWS Bedrock: Multi-Region Setup

AWS Bedrock requires IAM credentials per region. You can configure multiple regions in a single provider key entry by adding one row per region, each with its own Access Key ID and Secret Access Key. Use `global` as the region name to define a single fallback credential set.
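Per-region lookup with a `global` fallback can be sketched as follows: an exact region row is used when present, and otherwise the `global` row applies. The `BedrockCredential` shape and `credentialsForRegion` helper are hypothetical illustrations of the form's rows. They do not reflect Edgee's storage format, and the key values are placeholders.

```typescript
// Hypothetical shape of one region row in the BYOK form.
interface BedrockCredential {
  accessKeyId: string;
  secretAccessKey: string;
}

function credentialsForRegion(
  rows: Map<string, BedrockCredential>,
  region: string,
): BedrockCredential | undefined {
  // An exact region row wins; otherwise fall back to the `global` row, if any.
  return rows.get(region) ?? rows.get("global");
}

const rows = new Map<string, BedrockCredential>([
  ["us-east-1", { accessKeyId: "AKIA_EXAMPLE_US", secretAccessKey: "example-secret-us" }],
  ["global", { accessKeyId: "AKIA_EXAMPLE_GLOBAL", secretAccessKey: "example-secret-global" }],
]);
console.log(credentialsForRegion(rows, "us-east-1")?.accessKeyId); // AKIA_EXAMPLE_US
console.log(credentialsForRegion(rows, "eu-west-3")?.accessKeyId); // AKIA_EXAMPLE_GLOBAL
```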