
Commit f694d83

Merge branch 'main' of github.com:GetStream/Vision-Agents into rag_and_sip

2 parents: b2ebaee + 7f758e3

File tree: 34 files changed (+4647, -1317 lines)

README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -106,7 +106,7 @@ Get a free API key from [Stream](https://getstream.io/). Developers receive **33
 
 | **Plugin Name** | **Description** | **Docs Link** |
 |-------------|-------------|-----------|
-| AWS Polly | TTS plugin using Amazon's cloud-based service with natural-sounding voices and neural engine support | [AWS Polly](https://visionagents.ai/integrations/aws-polly) |
+| AWS | AWS (Bedrock) integration with support for standard LLM (Qwen, Claude with vision), realtime with Nova 2 Sonic, and TTS with AWS Polly | [AWS](https://visionagents.ai/integrations/aws) |
 | Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications | [Cartesia](https://visionagents.ai/integrations/cartesia) |
 | Decart | Real-time video restyling capabilities using generative AI models | [Decart](https://visionagents.ai/integrations/decart) |
 | Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization | [Deepgram](https://visionagents.ai/integrations/deepgram) |
@@ -225,7 +225,7 @@ While building the integrations, here are the limitations we've noticed (Dec 202
 * Longer videos can cause the AI to lose context. For instance if it's watching a soccer match it will get confused after 30 seconds
 * Most applications require a combination of small specialized models like Yolo/Roboflow/Moondream, API calls to get more context and larger models like gemini/openAI
 * Image size & FPS need to stay relatively low due to performance constraints
-* Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
+* Video doesn’t trigger responses in realtime models. You always need to send audio/text to trigger a response.
 
 ## Star History
```
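The last bullet in that diff matters for anyone wiring up a realtime provider: video frames alone never elicit a reply, so agent code has to pair the stream with an audio or text turn. A minimal illustrative sketch of the pattern (the `Session` protocol and `send_text` method are hypothetical names, not the Vision Agents API):

```python
import asyncio
from typing import Protocol


class Session(Protocol):
    """Hypothetical realtime session; video frames are assumed to stream elsewhere."""

    async def send_text(self, text: str) -> None: ...


async def request_description(session: Session) -> None:
    # Let roughly a second of video reach the model first...
    await asyncio.sleep(1.0)
    # ...then send a text turn, since video alone never triggers a response.
    await session.send_text("Describe what you can see right now.")
```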

agents-core/vision_agents/core/__init__.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -3,6 +3,6 @@
 from vision_agents.core.agents import Agent
 
 from vision_agents.core.cli.cli_runner import cli
+from vision_agents.core.agents.agent_launcher import AgentLauncher
 
-
-__all__ = ["Agent", "User", "cli"]
+__all__ = ["Agent", "User", "cli", "AgentLauncher"]
```
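The practical effect of this re-export is that callers no longer need a second import line for the launcher; everything comes from the package root (the example script later in this commit is updated accordingly):

```python
# Import surface after this commit: AgentLauncher lives next to the other
# core names instead of requiring a separate vision_agents.core.agents import.
from vision_agents.core import Agent, AgentLauncher, User, cli
```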

agents-core/vision_agents/core/agents/agents.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -99,6 +99,8 @@ class Agent:
     - Small methods so its easy to subclass/change behaviour
     """
 
+    options: AgentOptions
+
     def __init__(
         self,
         # edge network for video & audio
```
agents-core/vision_agents/core/vad/silero.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -38,6 +38,7 @@ def __init__(self, model_path: str, reset_interval_seconds: float = 5.0):
 
     def predict_speech(self, pcm: PcmData):
         # convert from pcm to the right format for silero
+
         chunks = pcm.resample(16000, 1).to_float32().chunks(SILERO_CHUNK, pad_last=True)
         scores = [self._predict_speech(c.samples) for c in chunks]
         return max(scores)
```
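For context on the surrounding code: Silero VAD consumes 16 kHz mono float32 audio in fixed windows, which is why `predict_speech` resamples, converts, and chunks before scoring. A rough numpy-only equivalent of the chunking step (assuming `SILERO_CHUNK` is the model's expected window, commonly 512 samples at 16 kHz):

```python
import numpy as np

SILERO_CHUNK = 512  # assumed window size; matches Silero's 16 kHz models


def chunk_for_silero(samples: np.ndarray) -> list[np.ndarray]:
    """Split 16 kHz mono float32 audio into fixed windows, zero-padding
    the last one (mirrors chunks(SILERO_CHUNK, pad_last=True) above)."""
    x = samples.astype(np.float32)
    total = -(-len(x) // SILERO_CHUNK) * SILERO_CHUNK  # round up to a full window
    x = np.pad(x, (0, total - len(x)))
    return [x[i : i + SILERO_CHUNK] for i in range(0, total, SILERO_CHUNK)]
```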

dev.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -77,7 +77,7 @@ def format():
 def lint():
     """Run ruff linting (check only)."""
     click.echo("Running ruff lint...")
-    run("uv run ruff check .")
+    run("uv run ruff format --check .")
 
 
 @cli.command()
@@ -103,7 +103,8 @@ def check():
 
     # Run ruff
     click.echo("\n=== 1. Ruff Linting ===")
-    run("uv run ruff check . --fix")
+    run("uv run ruff format")
+    run("uv run ruff format --check .")
 
     # Run mypy on main package
     click.echo("\n=== 2. MyPy Type Checking ===")
```

examples/01_simple_agent_example/README.md

Lines changed: 7 additions & 2 deletions
````diff
@@ -19,12 +19,17 @@ This example shows you how to build a basic video AI agent using [Vision Agents]
 
 ## Installation
 
-1. Install dependencies using uv:
+1. Go to the example's directory
+```bash
+cd examples/01_simple_agent_example
+```
+
+2. Install dependencies using uv:
 ```bash
 uv sync
 ```
 
-2. Create a `.env` file with your API keys:
+3. Create a `.env` file with your API keys:
 ```
 OPENAI_API_KEY=your_openai_key
 ELEVENLABS_API_KEY=your_11labs_key
````

examples/01_simple_agent_example/simple_agent_example.py

Lines changed: 1 addition & 2 deletions
```diff
@@ -3,8 +3,7 @@
 
 from dotenv import load_dotenv
 
-from vision_agents.core import User, Agent, cli
-from vision_agents.core.agents import AgentLauncher
+from vision_agents.core import User, Agent, cli, AgentLauncher
 from vision_agents.core.utils.examples import get_weather_by_location
 from vision_agents.plugins import deepgram, getstream, gemini, elevenlabs
 
```
examples/02_golf_coach_example/README.md

Lines changed: 7 additions & 2 deletions
````diff
@@ -21,12 +21,17 @@ This approach combines a fast object detection model (YOLO) with a full realtime
 
 ## Installation
 
-1. Install dependencies using uv:
+1. Go to the example's directory
+```bash
+cd examples/02_golf_coach_example
+```
+
+2. Install dependencies using uv:
 ```bash
 uv sync
 ```
 
-2. Create a `.env` file with your API keys:
+3. Create a `.env` file with your API keys:
 ```
 GEMINI_API_KEY=your_gemini_key
 STREAM_API_KEY=your_stream_key
````

examples/other_examples/09_github_mcp_demo/README.md

Lines changed: 24 additions & 3 deletions
````diff
@@ -38,19 +38,40 @@ export GITHUB_PAT=your_github_personal_access_token_here
 ```
 
 ## Running the Demo
+
+1. Go to the example's directory
+```bash
+cd examples/other_examples/09_github_mcp_demo
+```
+
+2. Install dependencies using uv:
+```bash
+uv sync
+```
+
+3. Run the agent
 ```bash
-cd examples/09_github_mcp_demo
 uv run python github_mcp_demo.py
 ```
 
-### Gemini Realtime Version (New)
+### Gemini Realtime Version
 ```bash
-cd examples/09_github_mcp_demo
+cd examples/other_examples/09_github_mcp_demo
 uv run python gemini_realtime_github_mcp_demo.py
 ```
 
 **Note**: The Gemini Realtime version requires `GOOGLE_API_KEY` in your `.env` file in addition to `GITHUB_PAT`.
 
+
+### OpenAI Realtime Version
+
+```bash
+cd examples/other_examples/09_github_mcp_demo
+uv run python openai_realtime_github_mcp_demo.py
+```
+
+**Note**: The OpenAI Realtime version requires `OPENAI_API_KEY` in your `.env` file in addition to `GITHUB_PAT`.
+
 ## What the Demo Does
 
 
````
examples/other_examples/openai_realtime_webrtc/README.md

Lines changed: 18 additions & 15 deletions
````diff
@@ -1,4 +1,4 @@
-# OpenAI Speech-to-Speech (STS) Example
+# OpenAI Realtime WebRTC Example
 
 This example demonstrates how to use OpenAI's Realtime API for speech-to-speech conversation through WebRTC.
 
@@ -18,24 +18,27 @@ The OpenAI Realtime API enables real-time, bidirectional audio conversations wit
 3. Python 3.12 or higher
 
 ## Setup
-
-1. Install dependencies:
-```bash
-cd examples/07_openai_sts_example
-uv sync
-```
-
-2. Create a `.env` file with your credentials:
-```env
-OPENAI_API_KEY=your_openai_api_key
-STREAM_API_KEY=your_stream_api_key
-STREAM_API_SECRET=your_stream_api_secret
-```
+1. Go to the example's directory
+```bash
+cd examples/other_examples/openai_realtime_webrtc
+```
+
+2. Install dependencies:
+```bash
+uv sync
+```
+
+3. Create a `.env` file with your credentials:
+```env
+OPENAI_API_KEY=your_openai_api_key
+STREAM_API_KEY=your_stream_api_key
+STREAM_API_SECRET=your_stream_api_secret
+```
 
 ## Running the Example
 
 ```bash
-uv run python openai_sts_example.py
+uv run python openai_realtime_example.py
 ```
 
 The script will:
````
