This project provides a reference implementation of a WebSocket server on Azure that integrates with the Genesys AudioHook protocol for real-time transcription. It implements the AudioHook Monitor pattern, in which audio is streamed from the client to the server and the server does not return results to the client.
This AudioHook server enables you to connect your own speech processing pipeline—such as Azure AI (Custom) Speech, GPT-4o Transcribe, Whisper, or other services—while maintaining data security and seamless integration with cloud-native applications.
Real-time transcription enables advanced call center analytics, such as live summarization, agent coaching, and instant question answering, to improve customer experience and operational efficiency.
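To make the one-way streaming concrete, here is a minimal, hypothetical sketch of such a WebSocket endpoint in Python. It is not the repository's implementation: the handler names are invented, the AudioHook message schema is simplified, and it assumes a recent release of the websockets package.

```python
# Minimal sketch of a one-way AudioHook-style WebSocket endpoint (illustrative only;
# handler names are invented and the real AudioHook message schema is richer than this).
import asyncio
import json

import websockets  # assumes a recent `websockets` release with single-argument handlers


async def forward_to_speech_pipeline(chunk: bytes) -> None:
    # Placeholder: push the audio chunk into your speech provider of choice
    # (Azure AI Speech, GPT-4o Transcribe, Whisper, ...).
    pass


async def handle_session(ws):
    async for frame in ws:
        if isinstance(frame, bytes):
            # Binary frames carry audio from Genesys; nothing is sent back to the client.
            await forward_to_speech_pipeline(frame)
        else:
            # Text frames carry JSON control messages (handshake, pings, close).
            message = json.loads(frame)
            print("control message:", message.get("type"))


async def main():
    async with websockets.serve(handle_session, "0.0.0.0", 8000):
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
```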
Note
This repository accelerates integration between Genesys Cloud and Azure for demonstration and development purposes. It is not production-ready; carefully review, test, and adapt it to meet your organization's security, compliance, and operational requirements before production deployment.
The AudioHook processing is separated from AI services, allowing flexible deployment, strong security, and straightforward integration with enterprise systems. Its modular architecture supports horizontal scaling and rapid customization of the AI pipeline to meet evolving contact center requirements.
- AudioHook WebSocket server (Python)
- Real-time AI processing service (Python, Semantic Kernel) (coming soon)
- Demo front-end (JavaScript, React) with back-end (Python) (coming soon)
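One way to picture the separation between the AudioHook server and the AI services is a small provider interface that the server depends on; the class and method names below are hypothetical illustrations, not the repository's actual API.

```python
# Hypothetical provider abstraction: the WebSocket server would depend only on this
# interface, so providers such as Azure AI Speech or GPT-4o Transcribe can be swapped
# without changing the AudioHook handling code. Names here are illustrative only.
from abc import ABC, abstractmethod
from typing import Optional


class SpeechProvider(ABC):
    @abstractmethod
    async def transcribe_chunk(self, audio: bytes) -> Optional[str]:
        """Feed an audio chunk; return a transcript segment when one is available."""


class FakeEchoProvider(SpeechProvider):
    """Stand-in provider for local testing without any Azure resources."""

    async def transcribe_chunk(self, audio: bytes) -> Optional[str]:
        return f"<received {len(audio)} bytes>"
```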
This accelerator offers a straightforward architecture to help you quickly integrate Genesys AudioHook with Azure for real-time transcription. The initial setup is designed for simplicity and ease of deployment, making it well-suited for demos and development.
Basic architecture for real-time transcription
For production use, consider extending the architecture to improve scalability, security, and reliability. Enhancements may include adding Azure Web Application Firewall (WAF), using event-driven processing with Azure Event Hubs or Service Bus, deploying components as containerized workloads, and enabling monitoring with Azure Monitor.
Production-ready architecture with enhanced security and scalability
This modular design lets you tailor the solution to your needs, supporting advanced scenarios and integration with enterprise systems.
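As one illustration of the event-driven option, a transcript segment could be published to Azure Event Hubs for downstream analytics. This is only a sketch under assumed settings (the environment variable names are placeholders, and the repository does not ship this integration):

```python
# Sketch: publish a transcript segment to Azure Event Hubs for downstream processing.
# EVENTHUB_CONNECTION_STRING and EVENTHUB_NAME are placeholder settings, not repo config.
import json
import os

from azure.eventhub import EventData
from azure.eventhub.aio import EventHubProducerClient


async def publish_transcript(conversation_id: str, text: str) -> None:
    producer = EventHubProducerClient.from_connection_string(
        os.environ["EVENTHUB_CONNECTION_STRING"],
        eventhub_name=os.environ["EVENTHUB_NAME"],
    )
    async with producer:
        batch = await producer.create_batch()
        batch.add(EventData(json.dumps({"conversationId": conversation_id, "text": text})))
        await producer.send_batch(batch)
```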
Deploy this accelerator using the provided infrastructure-as-code (Bicep) templates. The recommended method is the Azure Developer CLI (azd), which simplifies authentication, resource provisioning, and configuration.
1. Authenticate with Azure by running:

   azd auth login

   This opens a browser window for secure sign-in.
2. Create a new environment with:

   azd env new
3. (Optional) At this stage, you can customize your deployment by setting environment variables. You can configure the following settings:

   azd env set SPEECH_PROVIDER <option>
   azd env set AZURE_SPEECH_LANGUAGES <locale(s)>

   | Parameter | Default | Options / Description |
   |---|---|---|
   | SPEECH_PROVIDER | azure-ai-speech | Choose the speech-to-text provider: azure-ai-speech or azure-openai-gpt4o-transcribe. |
   | AZURE_SPEECH_LANGUAGES | en-US | Specify one or more supported locales (comma-separated, e.g. en-US,nl-NL). See the full list of supported languages. When multiple locales are set, automatic language identification is enabled. |
4. Deploy resources with:

   azd up
5. During deployment, you'll be prompted for:

   | Parameter | Description |
   |---|---|
   | Azure Subscription | The Azure subscription for resource deployment. |
   | Azure Location | The Azure region for resources. |
   | Environment Name | A unique environment name (used as a prefix for resource names). |

   For best compatibility, use swedencentral as your Azure region. Other regions may not be fully supported or tested.
6. After deployment, the CLI will display a link to your web service. Open it in your browser; you should see {"status": "healthy"} to confirm the service is running.
Important
The default infrastructure templates use public networking. For production, secure your deployment with Azure Front Door, Azure Web Application Firewall (WAF), or restrict access to Genesys Cloud IP ranges, as Genesys Cloud requires a publicly accessible endpoint.
Once your web service is running, set up the AudioHook Monitor in Genesys Cloud to stream audio to your Azure deployment. If you do not have access to a Genesys Cloud instance, you can skip this step and use the Genesys mock client for testing later.
1. Follow the Genesys configuration guide.

2. Use the Connection URI output after deployment, or take the web service URL from step 4, replace https with wss, and append /audiohook/ws as the path (see the sketch after this list).
3. Select the Credentials tab. Here, you must provide the API Key and Client Secret. In the Azure Portal, go to your deployed resource group and open the Key Vault. Under Objects > Secrets, locate the API Key and Client Secret.

   These secrets are generated automatically during deployment, but for security it is recommended to update them with your own values. Ensure the Client Secret is a BASE64-encoded string (also illustrated in the sketch after this list).
   Note

   If you cannot view the secrets, go to Access control (IAM) in the Key Vault and assign yourself the Key Vault Secrets Officer role.
4. Activate your AudioHook.
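The URI rewrite and the BASE64 requirement from steps 2 and 3 can be illustrated with a small sketch; the host name and secret value below are placeholders, not values from the deployment:

```python
# Sketch: derive the AudioHook Connection URI from the web service URL and
# BASE64-encode a client secret (all values below are placeholders).
import base64

web_service_url = "https://<your-web-service>"  # from the deployment output
connection_uri = web_service_url.replace("https", "wss", 1) + "/audiohook/ws"
print(connection_uri)  # wss://<your-web-service>/audiohook/ws

client_secret = base64.b64encode(b"<your-own-secret-value>").decode()
print(client_secret)  # BASE64 string to store in Key Vault and in Genesys Cloud
```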
1. Place a call to the queue where the AudioHook Monitor is enabled, or use the mock client provided by Genesys.
2. Open your deployed web service in a browser. The following endpoints are available to check conversation status (see the sketch after this list):

   /api/conversations?key={API_KEY}&active=false|true
   /api/conversation/{CONVERSATION_ID}?key={API_KEY}
3. Confirm that the call audio is being transcribed as expected.
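The same status endpoints can be queried from a script; this is an illustrative sketch in which the host name, API key, and conversation ID are placeholders for your own values:

```python
# Sketch: query the conversation-status endpoints of the deployed service.
# BASE_URL, API_KEY, and the conversation ID are placeholders for your own values.
import requests

BASE_URL = "https://<your-web-service>"
API_KEY = "<your-api-key>"

# List conversations; set active to "true" for in-progress calls or "false" for completed ones.
listing = requests.get(
    f"{BASE_URL}/api/conversations",
    params={"key": API_KEY, "active": "true"},
    timeout=10,
)
print(listing.json())

# Fetch a single conversation by ID (take the ID from the listing above).
detail = requests.get(
    f"{BASE_URL}/api/conversation/<CONVERSATION_ID>",
    params={"key": API_KEY},
    timeout=10,
)
print(detail.json())
```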
When you no longer need the resources created in this article, run the following command to take down the deployed resources:

azd down

If you want to redeploy to a different region, delete the .azure directory before running azd up again. In a more advanced scenario, you could selectively edit files within the .azure directory to change the region.