Description
Telnyx has a Speech-to-Text endpoint that starts listening on an active call (identified by its call control ID) and sends webhooks as it transcribes what it hears, each with a confidence level. It keeps transcribing until it is told to stop (via REST) or the call hangs up.
This seems pretty powerful, especially coupled with the Elsa Agents module. Telnyx themselves have some pretty powerful IVR Flow demos using this exact API.
Transcription Start documentation
Transcription Stop documentation
The only complexity, I think, is the "gathering" of the transcription. The webhook eventually sends an is_final flag once the transcription of a chunk of speech is considered finished, at which point a workflow author could use that flow outcome to end the transcription and use the output for other things. So I think that means the activity suspends until a webhook comes in with is_final set for the transcription? Open to ideas there, but that's at least how I would use it :)
If that's the case, I think all that needs to be done is:
- Extend the ICallsApi to include Refit config for these 2 endpoints
- Create a new stim for Call Transcription (i.e CallTranscriptionStimulus)
- Create a new INotificationHandler for call.transcription webhooks. It doesn't fire off IStimulusSender until the payload has is_final?
- Transcription Start activity
  - Call Control Id
  - Language
  - Transcription Model: [openai/whisper-tiny, openai/whisper-large-v3-turbo], default: openai/whisper-tiny
  - Output: Transcription string
- Transcription Stop activity
  - Call Control Id
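To make the handler idea above concrete, here is a rough, language-neutral sketch in Python (not Elsa code) of the gating logic: interim webhooks are ignored and only an is_final one triggers the stimulus. The payload field names (`event_type`, `transcription_data`, `transcript`, `confidence`) are my assumption of the webhook shape and should be checked against the Telnyx docs; `handle_webhook` and the `send_stimulus` callback are hypothetical stand-ins for the INotificationHandler and IStimulusSender.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallTranscriptionStimulus:
    """Hypothetical stimulus carrying the finished transcript for one call."""
    call_control_id: str
    transcript: str
    confidence: float


def handle_webhook(
    event: dict,
    send_stimulus: Callable[[CallTranscriptionStimulus], None],
) -> bool:
    """Forward a call.transcription webhook only once it is marked final.

    Returns True when a stimulus was sent, False when the event was ignored.
    Field names below are assumed, not confirmed against the Telnyx docs.
    """
    data = event.get("data", {})
    if data.get("event_type") != "call.transcription":
        return False  # not a transcription event

    payload = data.get("payload", {})
    chunk = payload.get("transcription_data", {})
    if not chunk.get("is_final", False):
        return False  # interim result: the workflow stays suspended

    send_stimulus(CallTranscriptionStimulus(
        call_control_id=payload.get("call_control_id", ""),
        transcript=chunk.get("transcript", ""),
        confidence=float(chunk.get("confidence", 0.0)),
    ))
    return True
```

With this shape, the Transcription Start activity would only resume on the final chunk, and the workflow author decides whether to call Transcription Stop afterwards.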
NOTE:
Technically, their REST API allows you to use Google transcription. Idk if that should be an entirely different activity, since the transcription properties vary widely between Telnyx and Google, or if it should all be combined into one activity, in which case the activity author would need to understand the differences. For now my vote would be a single transcription activity that defaults to Telnyx transcription, which is their recommendation anyway.
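As a rough illustration of the "single activity, Telnyx by default" option, here is a Python sketch (again not Elsa code) of how the Transcription Start inputs could be resolved. Everything here is hypothetical: the `build_start_inputs` helper, the `engine` values, and the `"en"` language default are my assumptions; only the two model names and the whisper-tiny default come from the list above.

```python
from typing import Optional

# Models and default taken from the proposal above.
TELNYX_MODELS = ("openai/whisper-tiny", "openai/whisper-large-v3-turbo")
DEFAULT_MODEL = "openai/whisper-tiny"
ENGINES = ("telnyx", "google")  # hypothetical engine names


def build_start_inputs(
    call_control_id: str,
    language: str = "en",        # assumed default, not from the proposal
    engine: str = "telnyx",      # single activity defaults to Telnyx
    model: Optional[str] = None,
) -> dict:
    """Resolve activity inputs, applying the Telnyx defaults."""
    if not call_control_id:
        raise ValueError("call_control_id is required")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")

    inputs = {
        "call_control_id": call_control_id,
        "language": language,
        "engine": engine,
    }
    if engine == "telnyx":
        chosen = model or DEFAULT_MODEL
        if chosen not in TELNYX_MODELS:
            raise ValueError(f"unsupported Telnyx model: {chosen}")
        inputs["model"] = chosen
    elif model is not None:
        # In this sketch the model option only applies to the Telnyx engine.
        raise ValueError("model applies only to the Telnyx engine")
    return inputs
```

The trade-off shows up in the `elif` branch: with one activity, engine-specific options have to be validated against each other, which is the part the activity author would need to understand.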
Happy to work on this if the approach is good!