@JJassonn69 JJassonn69 commented Jun 23, 2025

This PR adds a text-to-speech pipeline using the newer ResembleAI/Chatterbox model, which has a voice cloning feature.

To avoid requiring changes in go-livepeer, it uses the same application/json format to relay information between the orchestrator and the runner.

You can build the text-to-speech Docker image from Dockerfile.text-to-speech, the same way as any of the other images.

docker build -t livepeer/ai-runner:text-to-speech -f Dockerfile.text-to-speech .

For testing, you can access the uvicorn server after starting the container with the pipeline and model set:

docker run --name text-to-speech -e PIPELINE=text-to-speech -e MODEL_ID=chatterbox --gpus all -p 8000:8000 -v ./models:/models <docker-image-build-above>

Once the uvicorn server is running, you will see the parameters model_id, text and prompt_audio_base64:

model_id: ResembleAI/Chatterbox
text:
prompt_audio_base64:
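
As a rough illustration of how these three fields could be sent as application/json, here is a minimal Python sketch. The field names come from this PR and the port comes from the `-p 8000:8000` mapping above. The route `/text-to-speech` and the helper names `build_tts_payload` and `post_tts_request` are assumptions made for the example, not confirmed parts of the runner.

```python
import json
import urllib.request

# Assumption: host and port follow the `-p 8000:8000` mapping above; the
# route name is a guess and may differ in the actual runner.
RUNNER_URL = "http://localhost:8000/text-to-speech"


def build_tts_payload(text, prompt_audio_base64, model_id="ResembleAI/Chatterbox"):
    """Return a JSON-serializable body with the three fields named in this PR."""
    return {
        "model_id": model_id,
        "text": text,
        "prompt_audio_base64": prompt_audio_base64,
    }


def post_tts_request(payload, url=RUNNER_URL):
    """POST the payload as application/json and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the container running, `post_tts_request(build_tts_payload("Hello", encoded_audio))` would send the request. The shape of the response is not described here, so the sketch returns the raw bytes unchanged.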

The prompt audio converted to base64 looks like UklGRjiTIQBXQVZ.....AADUAADQAADAAACkAAA==

Here is an attached file with the base64-encoded audio of my voice.
audio_prompt_base64.txt
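
For anyone who wants to produce their own prompt, the following is a minimal sketch of turning an audio file into a base64 string with the Python standard library. The helper names `encode_audio_to_base64` and `save_base64_prompt` are hypothetical, and the sketch assumes the runner accepts the plain base64 of the file bytes, as the sample above suggests. A WAV file begins with the bytes `RIFF`, which is why its base64 form starts with `UklGR`.

```python
import base64
from pathlib import Path


def encode_audio_to_base64(audio_path):
    """Read an audio file and return its bytes as a base64 ASCII string."""
    raw_bytes = Path(audio_path).read_bytes()
    return base64.b64encode(raw_bytes).decode("ascii")


def save_base64_prompt(audio_path, output_path):
    """Write the base64 string to a text file and also return it."""
    encoded = encode_audio_to_base64(audio_path)
    Path(output_path).write_text(encoded, encoding="ascii")
    return encoded
```

For example, `save_base64_prompt("my_voice.wav", "audio_prompt_base64.txt")` writes a file whose contents can be pasted into the prompt_audio_base64 field.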
