A modular command-line interface for text-to-speech synthesis, supporting multiple TTS engines. The CLI handles text chunking and parallel processing, and provides a unified interface across different TTS services.
- Supports multiple TTS engines (currently OpenAI and Kokoro)
- Automatic text chunking with configurable chunk sizes
- Parallel processing with multiple workers
- Cost estimation and confirmation for paid services
- Modular design for easy addition of new TTS engines
```bash
git clone https://github.com/CarsonDavis/antique-tts.git
cd antique-tts
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

For OpenAI TTS, you'll need to set your API key:
```
# Linux/macOS
export OPENAI_API_KEY='your-api-key-here'

# Windows (PowerShell)
$env:OPENAI_API_KEY='your-api-key-here'

# Windows (Command Prompt)
set OPENAI_API_KEY=your-api-key-here
```

Get your API key from: https://platform.openai.com/account/api-keys
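To confirm the variable is visible to Python in your current shell, you can run an optional sanity check like the one below (Linux/macOS shown; this command is just an illustration, not part of the project):

```bash
# Prints "set" if the API key is exported in this shell, "missing" otherwise
python -c "import os; print('set' if os.getenv('OPENAI_API_KEY') else 'missing')"
```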
Convert text to speech using default settings (Kokoro engine):
```bash
python cli.py input.txt --output-dir ./output_audio
```

Override the Kokoro voice and speed:

```bash
python cli.py input.txt --output-dir ./output_audio --engine kokoro --voice am_michael --speed 1.2
```

Or use the OpenAI engine:

```bash
python cli.py input.txt --output-dir ./output_audio --engine openai --voice alloy
```
For paid services (currently OpenAI), the CLI estimates the cost and asks for confirmation before synthesizing:

```
Estimated cost: $0.53
Do you want to proceed? (y/N): y
```

General parameters:

| Parameter | Description | Default |
|---|---|---|
| `--output-dir` | Output directory for audio files | `output` |
| `--chunk-size` | Maximum characters per chunk | `4000` |
| `--max-workers` | Number of parallel workers | `4` |
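For example, to split a long text into smaller chunks and synthesize them with more parallel workers (the values here are purely illustrative):

```bash
# Smaller chunks, more workers -- tune to your text length and machine
python cli.py input.txt --output-dir ./output_audio --chunk-size 2000 --max-workers 8
```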
Kokoro-specific parameters:

| Parameter | Description | Default |
|---|---|---|
| `--lang-code` | Language code for synthesis | `"a"` |
| `--speed` | Speech speed multiplier | `1.0` |
| `--voice` | Voice to use: af_bella, af_nicole, af_sarah, af_sky, bf_emma, bf_isabella, am_adam, am_michael, bm_george, bm_lewis | `"am_michael"` |
OpenAI-specific parameters:

| Parameter | Description | Default |
|---|---|---|
| `--model` | OpenAI TTS model | `"tts-1-hd"` |
| `--voice` | Voice to use | `"alloy"` |
| `--response-format` | Audio format for output | `"wav"` |

Available OpenAI voices: alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer
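As an example, here are the OpenAI-specific options spelled out explicitly (the model and format shown are the documented defaults; the voice is swapped only for illustration):

```bash
python cli.py input.txt --output-dir ./output_audio --engine openai --model tts-1-hd --voice nova --response-format wav
```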
```
usage: cli.py [-h] [--engine {kokoro,openai}] [--output-dir OUTPUT_DIR]
              [--chunk-size CHUNK_SIZE] [--max-workers MAX_WORKERS]
              [--lang-code LANG_CODE] [--speed SPEED] [--voice VOICE]
              [--model MODEL] [--response-format RESPONSE_FORMAT]
              input_file
```

See all available settings:

```bash
python cli.py --help
```

The project is designed to be easily extensible. To add a new TTS engine:
- Create a new engine configuration class in `tts_engine/config.py`
- Create a new engine implementation class in `tts_engine/`
- Register the engine/config mapping in `tts_engine/registry.py`
Key dependencies:

- NLTK for text chunking
- SoundFile for audio processing
- Pydantic for configuration management
- OpenAI and Kokoro SDKs for respective engines