feat: add newer model tts chatterbox to ai-runner text-to-speech #647
This PR adds text-to-speech support using the newer ResembleAI/Chatterbox model, which has a voice cloning feature. To avoid requiring any change in go-livepeer, it uses the same application/json format to relay information between the orchestrator and the runner.

You can build the text-to-speech Docker image from Dockerfile.text-to-speech, like any other image:

docker build -t livepeer/ai-runner:text-to-speech -f Dockerfile.text-to-speech .

For testing purposes, you can access the uvicorn server after starting the container with the pipeline and model:

docker run --name text-to-speech -e PIPELINE=text-to-speech -e MODEL_ID=chatterbox --gpus all -p 8000:8000 -v ./models:/models <docker-image-build-above>

In the uvicorn server, you will see params for model_id, text and prompt_audio_base64:
model_id: ResembleAI/Chatterbox
text:
prompt_audio_base64:
The prompt audio converted to base64 looks like
UklGRjiTIQBXQVZ.....AADUAADQAADAAACkAAA==
Here is an attached file with the base64-encoded audio of my voice.
audio_prompt_base64.txt
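
For anyone who wants to try this with their own voice sample, here is a minimal sketch of how the request body could be assembled in Python. It only uses the three params listed above (model_id, text, prompt_audio_base64). The route `/text-to-speech`, the `RUNNER_URL` constant, and the helper names `build_tts_payload` and `post_json` are assumptions for illustration and are not taken from this PR, so check the uvicorn server for the actual path. The response format is also not described here, so the sketch just returns the raw bytes.

```python
import base64
import json
import urllib.request

# Assumption: port 8000 is the one published by the docker run command above,
# and the route mirrors the pipeline name. Neither is confirmed by this PR.
RUNNER_URL = "http://localhost:8000/text-to-speech"


def build_tts_payload(text, wav_bytes, model_id="ResembleAI/Chatterbox"):
    """Build the JSON body from the three params shown in the PR description."""
    return {
        "model_id": model_id,
        "text": text,
        # Raw audio bytes are base64-encoded so they can travel inside JSON.
        "prompt_audio_base64": base64.b64encode(wav_bytes).decode("ascii"),
    }


def post_json(url, payload, timeout=300):
    """POST the payload as application/json and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the container running, usage would be along the lines of reading a short WAV recording into bytes, calling `build_tts_payload("Hello", wav_bytes)`, and passing the result to `post_json(RUNNER_URL, payload)`. A WAV file starts with the bytes `RIFF`, which is why the encoded string in the example above begins with `UklGR`.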