This project converts text files into audiobooks using Text-to-Speech (TTS). It processes input files, tags dialogues with character names, generates character and metadata files for users to customize the output, and produces an `.m4b` audiobook file.
For an illustrative example of the process, see:

`inputs/example.txt` -> `outputs/example/example_tagged.txt` -> `outputs/example/example.m4b`
If you don't have pipenv or Python 3.9 installed, please install them first.

Then:

```bash
git clone https://github.com/ebonsignori/tts-book-to-audio.git
cd tts-book-to-audio
pipenv install
```

See `.env.example` and rename it to `.env` with the respective keys.
This project uses a GitHub PAT with any level of permissions, set via `GITHUB_TOKEN`, to access OpenAI's 4o model via GitHub Models.
- The `OPENAI_API_KEY` is only needed if you are using the `--tts-method openai` option. This is NOT a free API; however, the resulting audio quality may be higher if you choose to use this method.
- The `ELEVENLABS_API_KEY` is only needed if you are using the `--tts-method elevenlabs` option. This is NOT a free API; however, the resulting audio quality may be higher if you choose to use this method.
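For reference, a filled-in `.env` might look like the following (the key names come from this README; the values are placeholders to replace with your own):

```
GITHUB_TOKEN=ghp_your_personal_access_token
# Only needed for --tts-method openai
OPENAI_API_KEY=sk-your-openai-key
# Only needed for --tts-method elevenlabs
ELEVENLABS_API_KEY=your-elevenlabs-key
```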
```bash
pipenv run python src/main.py -i <input_book_name> [options]
```
Example:

```bash
pipenv run python src/main.py -i my_book.epub
```
- `-i`, `--input-file`: (Required) The name of the book file in the `inputs/` directory. Should include the file extension (e.g., `my_book.epub`). Supported formats: `.txt`, `.epub`, `.mobi`, `.pdf`.
- `--tts-method`, `-t`: (Optional) Text-to-Speech method to use. Choices are:
  - `local`: (default) Free and fast, but not as high quality as the paid APIs.
  - `openai`: Requires an `OPENAI_API_KEY` in `.env`. Costs money to use the OpenAI TTS API.
  - `elevenlabs`: Requires an `ELEVENLABS_API_KEY` in `.env`. Costs money to use the ElevenLabs TTS API.
- `--steps`, `-s`: (Optional) Comma-separated list of processing steps to execute. If not provided, all steps will run. See Processing Steps below.
- `--m4b-method`, `-m`: (Optional) Method used to combine the audio files into an `.m4b`. Supported methods: `av`, `ffmpeg`.
- `-p`, `--write-processed-blocks`: (Optional) Write the intermediate text-processing blocks returned from the GPT to `outputs/<input_book_name>/processed_blocks/processed_#.txt`. Useful for debugging.
Note: Ensure the input file is placed inside the `inputs/` directory.
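Several of these options can be combined in a single invocation, for example:

```bash
# Use the OpenAI TTS API, combine audio with ffmpeg, and keep the
# intermediate GPT-processed blocks for debugging
pipenv run python src/main.py -i my_book.epub -t openai -m ffmpeg -p
```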
The conversion process is divided into four main steps. You can execute all steps at once or specify individual steps for manual intervention or customization.
Step 1: Process Input File into Plaintext

- Converts the input book file into a plaintext file.
- Output: `outputs/<input_book_name>/<input_book_name>_plaintext.txt`
Step 2: Tag Dialogues and Generate JSON Files
- Transforms the plaintext by surrounding dialogues with `<character_name>` tags (illustrated below).
- Generates `characters.json` with character names and their corresponding voices.
- Creates `metadata.json` for audiobook metadata customization.
- Outputs:
  - `outputs/<input_book_name>/<input_book_name>_tagged.txt`
  - `outputs/<input_book_name>/characters.json`
  - `outputs/<input_book_name>/metadata.json`
  - `outputs/<input_book_name>/processed_blocks/processed_#.txt` (if the `-p` flag is passed)
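As a rough illustration of the tagging (the character name here is invented; see `outputs/example/example_tagged.txt` for the actual format produced from `inputs/example.txt`):

```
She stepped into the hall. <mary>"Is anyone there?"</mary>
```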
Step 3: Generate TTS Audio Files
- Converts the tagged text into audio files using the specified TTS method (a sketch of the `local` case follows this list).
- Output: `outputs/<input_book_name>/audio_files/<file_number>.mp3`
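The `local` model strings used in `src/config.py` (e.g. `tts_models/en/vctk/vits`) follow the Coqui TTS naming scheme, so a single synthesis call looks roughly like this (an illustrative sketch assuming the Coqui `TTS` package, not necessarily the project's actual code):

```python
from TTS.api import TTS

# Load the multi-speaker VCTK VITS model, then synthesize one tagged line
# using the speaker assigned to that character in characters.json
tts = TTS("tts_models/en/vctk/vits")
tts.tts_to_file(text="Is anyone there?", speaker="p237", file_path="1.wav")
```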
Step 4: Combine Audio Files into an .m4b Audiobook
- Merges all generated audio files into a single `.m4b` file using the chosen method (`av` or `ffmpeg`; the `ffmpeg` approach is sketched below).
- Output: `outputs/<input_book_name>/<input_book_name>.m4b`
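For context, the `ffmpeg` method is conceptually similar to the following (an illustrative sketch, not the project's exact invocation; the bitrate and paths are assumptions):

```bash
# Build a concat list of the generated MP3s, then mux into an .m4b (AAC audio).
# Note: shell globs sort lexically, so many files may need zero-padded names
# (or an explicitly ordered list) to concatenate in the right order.
printf "file '%s'\n" outputs/my_book/audio_files/*.mp3 > filelist.txt
ffmpeg -f concat -safe 0 -i filelist.txt -c:a aac -b:a 64k outputs/my_book/my_book.m4b
```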
To run specific steps, use the `-s` or `--steps` option followed by a comma-separated list of step numbers.

Example:

```bash
pipenv run python src/main.py -i my_book.epub -s 1,2
```

This command will execute Step 1 and Step 2 only.
Note: After running certain steps, you may manually edit the generated files (e.g., `characters.json`, `metadata.json`, or `_plaintext.txt`) before proceeding to the next steps.
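A typical manual-intervention workflow might look like this (the book name is just an example):

```bash
# Run steps 1 and 2, then review the generated files
pipenv run python src/main.py -i my_book.epub -s 1,2
# ... edit outputs/my_book/characters.json and outputs/my_book/metadata.json ...
# Resume with steps 3 and 4
pipenv run python src/main.py -i my_book.epub -s 3,4
```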
```
book-to-audio-converter/
├── inputs/
│   ├── my_book.epub
│   └── my_book.jpg
├── outputs/
│   └── my_book/
│       ├── my_book_plaintext.txt
│       ├── my_book_tagged.txt
│       ├── characters.json
│       ├── metadata.json
│       ├── audio_files/
│       │   ├── 1.mp3
│       │   ├── 2.mp3
│       │   └── ...
│       └── my_book.m4b
```
- Run `pipenv run python src/generate-voice-examples.py`
- Browse the resulting `local-voice-examples` directory and play audio files to hear each speaker's voice.
- Adjust `vits_voice_mapping` and the gendered voices in `CONFIG.voice_identifiers` in the `src/config.py` file.
For example, if you listened to `p237` in `local-voice-examples` and want to add it as another female voice option, append the following to `vits_voice_mapping`:
"female_3": {
"model": "tts_models/en/vctk/vits",
"speaker": "p237"
},
Then, in `CONFIG.voice_identifiers.female_voices`, add the new voice as an auto-map option so that it shows up in the auto-generated `characters.json`:
"female_voices": ["female_1", "female_2", "female_3"],
This project is licensed under the MIT License.