README.md: 2 additions & 2 deletions
@@ -106,7 +106,7 @@ Get a free API key from [Stream](https://getstream.io/). Developers receive **33
|**Plugin Name**|**Description**|**Docs Link**|
|-------------|-------------|-----------|
- | AWS Polly | TTS plugin using Amazon's cloud-based service with natural-sounding voices and neural engine support |[AWS Polly](https://visionagents.ai/integrations/aws-polly)|
+ | AWS | AWS (Bedrock) integration with support for standard LLM (Qwen, Claude with vision), realtime with Nova 2 Sonic, and TTS with AWS Polly |[AWS](https://visionagents.ai/integrations/aws)|
| Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications |[Cartesia](https://visionagents.ai/integrations/cartesia)|
| Decart | Real-time video restyling capabilities using generative AI models |[Decart](https://visionagents.ai/integrations/decart)|
| Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization |[Deepgram](https://visionagents.ai/integrations/deepgram)|
@@ -225,7 +225,7 @@ While building the integrations, here are the limitations we've noticed (Dec 202
* Longer videos can cause the AI to lose context. For instance if it's watching a soccer match it will get confused after 30 seconds
* Most applications require a combination of small specialized models like Yolo/Roboflow/Moondream, API calls to get more context and larger models like gemini/openAI
* Image size & FPS need to stay relatively low due to performance constraints
- * Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
+ * Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
plugins/aws/README.md: 79 additions & 16 deletions
@@ -1,43 +1,101 @@
# AWS Plugin for Vision Agents

- AWS (Bedrock) LLM integration for Vision Agents framework with support for both standard and realtime interactions.
+ AWS (Bedrock) integration for Vision Agents framework with support for standard LLM, realtime with Nova Sonic, and text-to-speech with automatic session resumption.

## Installation

```bash
- pip install vision-agents-plugins-aws
+ uv add vision-agents[aws]
```

## Usage

### Standard LLM Usage

- This example shows how to use qwen3 on bedrock for the LLM.
+ The AWS plugin supports various Bedrock models including Qwen, Claude, and others. Claude models also support vision/image inputs.

```python
+ from vision_agents.core import Agent, User
+ from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
```
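The diff view truncates the new Python example right after its imports. As rough orientation only, here is a minimal sketch of how these pieces are typically wired into an `Agent`; the keyword arguments, the `aws.LLM` constructor, the Bedrock model id, and the other plugin constructors below are assumptions, not anything this diff confirms.

```python
# Hypothetical continuation of the truncated example above. Only the two
# imports come from the diff; every name below them is an assumption.
from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),                      # audio/video transport via Stream
    agent_user=User(name="AWS AI Assistant"),   # identity the agent joins the call with
    instructions="Answer questions briefly and clearly.",
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),   # hypothetical Bedrock model id
    stt=deepgram.STT(),                         # speech-to-text
    tts=cartesia.TTS(),                         # text-to-speech
    turn_detection=smart_turn.TurnDetection(),  # end-of-utterance detection
)
```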
- The Realtime implementation **fully supports** function calling with AWS Nova Sonic. Register functions using the `@llm.register_function` decorator:
+ The Realtime implementation **fully supports** function calling with AWS Nova 2 Sonic. Register functions using the `@llm.register_function` decorator:
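The code that follows this sentence in the README is not part of the diff view. As an illustration, a hedged sketch of what registering a function might look like; only the `@llm.register_function` decorator name comes from the diff, while the `aws.Realtime()` constructor and the stubbed tool body are assumptions.

```python
# Hedged sketch of function calling with the Realtime implementation.
from vision_agents.plugins import aws

llm = aws.Realtime()  # assumed constructor name for the Nova 2 Sonic realtime LLM

@llm.register_function  # the decorator named in the diff
def get_weather(city: str) -> dict:
    """Return the current weather for a city (stubbed so the sketch is self-contained)."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}
```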