
Commit 7f758e3

Update plugin readme (#241)

* Update readme with latest sample
* Update root table

1 parent 765a57e commit 7f758e3

File tree: 2 files changed, +81 -18 lines


README.md

Lines changed: 2 additions & 2 deletions
@@ -106,7 +106,7 @@ Get a free API key from [Stream](https://getstream.io/). Developers receive **33
  
  | **Plugin Name** | **Description** | **Docs Link** |
  |-------------|-------------|-----------|
- | AWS Polly | TTS plugin using Amazon's cloud-based service with natural-sounding voices and neural engine support | [AWS Polly](https://visionagents.ai/integrations/aws-polly) |
+ | AWS | AWS (Bedrock) integration with support for standard LLM (Qwen, Claude with vision), realtime with Nova 2 Sonic, and TTS with AWS Polly | [AWS](https://visionagents.ai/integrations/aws) |
  | Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications | [Cartesia](https://visionagents.ai/integrations/cartesia) |
  | Decart | Real-time video restyling capabilities using generative AI models | [Decart](https://visionagents.ai/integrations/decart) |
  | Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization | [Deepgram](https://visionagents.ai/integrations/deepgram) |
@@ -225,7 +225,7 @@ While building the integrations, here are the limitations we've noticed (Dec 202
  * Longer videos can cause the AI to lose context. For instance if it's watching a soccer match it will get confused after 30 seconds
  * Most applications require a combination of small specialized models like Yolo/Roboflow/Moondream, API calls to get more context and larger models like gemini/openAI
  * Image size & FPS need to stay relatively low due to performance constraints
- * Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
+ * Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
  
  ## Star History

plugins/aws/README.md

Lines changed: 79 additions & 16 deletions
@@ -1,43 +1,101 @@
  # AWS Plugin for Vision Agents
  
- AWS (Bedrock) LLM integration for Vision Agents framework with support for both standard and realtime interactions.
+ AWS (Bedrock) integration for the Vision Agents framework with support for standard LLMs, realtime speech-to-speech with Nova 2 Sonic (including automatic session resumption), and text-to-speech with AWS Polly.
  
  ## Installation
  
  ```bash
- pip install vision-agents-plugins-aws
+ uv add vision-agents[aws]
  ```
  
  ## Usage
  
  ### Standard LLM Usage
  
- This example shows how to use qwen3 on bedrock for the LLM.
+ The AWS plugin supports various Bedrock models, including Qwen, Claude, and others. Claude models also support vision/image inputs.
  
  ```python
+ from vision_agents.core import Agent, User
+ from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
+ 
  agent = Agent(
      edge=getstream.Edge(),
      agent_user=User(name="Friendly AI"),
      instructions="Be nice to the user",
-     llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
+     llm=aws.LLM(
+         model="qwen.qwen3-32b-v1:0",
+         region_name="us-east-1"
+     ),
      tts=cartesia.TTS(),
      stt=deepgram.STT(),
      turn_detection=smart_turn.TurnDetection(buffer_duration=2.0, confidence_threshold=0.5),
  )
  ```
  
- The full example is available in example/aws_qwen_example.py
+ For vision-capable models like Claude:
+ 
+ ```python
+ llm = aws.LLM(
+     model="anthropic.claude-3-haiku-20240307-v1:0",
+     region_name="us-east-1"
+ )
+ 
+ # Send image with text
+ response = await llm.converse(
+     messages=[{
+         "role": "user",
+         "content": [
+             {"image": {"format": "png", "source": {"bytes": image_bytes}}},
+             {"text": "What do you see in this image?"}
+         ]
+     }]
+ )
+ ```
  
  ### Realtime Audio Usage
  
- Nova Sonic audio realtime STS is also supported:
+ AWS Nova 2 Sonic provides realtime speech-to-speech capabilities with automatic reconnection logic. The default model is `amazon.nova-2-sonic-v1:0`.
+ 
+ ```python
+ from vision_agents.core import Agent, User
+ from vision_agents.plugins import aws, getstream
  
- ```python
  agent = Agent(
      edge=getstream.Edge(),
      agent_user=User(name="Story Teller AI"),
      instructions="Tell a story suitable for a 7 year old about a dragon and a princess",
-     llm=aws.Realtime(),
+     llm=aws.Realtime(
+         model="amazon.nova-2-sonic-v1:0",
+         region_name="us-east-1",
+         voice_id="matthew"  # See available voices in AWS Nova documentation
+     ),
+ )
+ ```
+ 
+ The Realtime implementation includes automatic reconnection logic that reconnects after periods of silence or when approaching connection time limits.
+ 
+ See `example/aws_realtime_nova_example.py` for a complete example.
+ 
+ ### Text-to-Speech (TTS)
+ 
+ AWS Polly TTS is available for converting text to speech:
+ 
+ ```python
+ from vision_agents.plugins import aws
+ 
+ tts = aws.TTS(
+     region_name="us-east-1",
+     voice_id="Joanna",      # AWS Polly voice ID
+     engine="neural",        # 'standard' or 'neural'
+     text_type="text",       # 'text' or 'ssml'
+     language_code="en-US"
+ )
+ 
+ # Use in agent
+ agent = Agent(
+     llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
+     tts=tts,
+     # ... other components
  )
  ```
  
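The new TTS section exposes a `text_type` parameter accepting 'text' or 'ssml'. A minimal sketch of the SSML path, assuming the plugin forwards the markup to Polly unchanged (the forwarding behavior and the send path are assumptions; the `<speak>`, `<break>`, and `<prosody>` tags are standard Polly SSML):

```python
from vision_agents.plugins import aws

# Assumption: text_type="ssml" makes aws.TTS hand the markup to Polly as-is.
tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",
    engine="neural",
    text_type="ssml",   # switch input from plain text to SSML
    language_code="en-US",
)

# Standard Polly SSML: a half-second pause, then slightly slower speech.
# How the agent hands strings to the TTS is framework-specific and not
# shown in this README.
ssml = (
    "<speak>"
    "Welcome back. <break time='500ms'/>"
    "<prosody rate='90%'>Here is what I can see right now.</prosody>"
    "</speak>"
)
```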

@@ -70,14 +128,15 @@ def get_weather(city: str) -> dict:
  
  ### Realtime (aws.Realtime)
  
- The Realtime implementation **fully supports** function calling with AWS Nova Sonic. Register functions using the `@llm.register_function` decorator:
+ The Realtime implementation **fully supports** function calling with AWS Nova 2 Sonic. Register functions using the `@llm.register_function` decorator:
  
  ```python
  from vision_agents.plugins import aws
  
  llm = aws.Realtime(
-     model="amazon.nova-sonic-v1:0",
-     region_name="us-east-1"
+     model="amazon.nova-2-sonic-v1:0",
+     region_name="us-east-1",
+     voice_id="matthew"
  )
  
  @llm.register_function(
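The hunk ends mid-decorator. Purely for illustration, a hypothetical completion; the `description` argument and the function body are assumptions, and only the `def get_weather(city: str) -> dict:` signature is confirmed by the surrounding hunk headers:

```python
@llm.register_function(
    description="Get the current weather for a city"  # assumed parameter
)
def get_weather(city: str) -> dict:
    # Hypothetical stub; a real implementation would query a weather API.
    return {"city": city, "condition": "sunny", "temperature_c": 21}
```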
@@ -97,19 +156,23 @@ def get_weather(city: str) -> dict:
  
  See `example/aws_realtime_function_calling_example.py` for a complete example.
  
- ## Running the examples
+ ## Configuration
  
- Create a `.env` file, or cp .env.example to .env and fill in
+ ### Environment Variables
+ 
+ Create a `.env` file with the following variables:
  
  ```
  STREAM_API_KEY=your_stream_api_key_here
  STREAM_API_SECRET=your_stream_api_secret_here
  
- AWS_BEARER_TOKEN_BEDROCK=
+ AWS_BEDROCK_API_KEY=
  AWS_ACCESS_KEY_ID=
  AWS_SECRET_ACCESS_KEY=
+ AWS_REGION=us-east-1
  
- FAL_KEY=
  CARTESIA_API_KEY=
  DEEPGRAM_API_KEY=
- ```
+ ```
+ 
+ Make sure your `.env` file is configured before running the examples.
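As a usage note, the examples presumably load these variables at startup; a minimal sketch with python-dotenv (using python-dotenv is an assumption, not something this README specifies):

```python
import os

from dotenv import load_dotenv  # requires the python-dotenv package

load_dotenv()  # copies .env entries into os.environ

# boto3/Bedrock clients read the AWS_* variables automatically; the region
# can also be passed explicitly via region_name, as the snippets above do.
region = os.environ.get("AWS_REGION", "us-east-1")
```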
