Run text-to-speech with the MLX implementation of Kokoro (for Apple Silicon Macs, M1-M4) to vastly improve processing speed. Use one voice or blend two voices by specifying a mixing ratio.
The app comes with a user-friendly Gradio web interface.
- Python >= 3.11
- HuggingFace Access Token
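If the Hugging Face download in the installation steps below prompts for authentication, you can log in once with your access token using the standard Hugging Face CLI (this is the generic huggingface_hub login command, not something specific to this project):

huggingface-cli login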
git clone https://github.com/tsmdt/kokoro-MLX-blender.git
cd kokoro-MLX-blender
python3 -m venv venv_kokoro
source venv_kokoro/bin/activate
pip install .

Run the following command from the main project folder (./kokoro-MLX-blender/):

huggingface-cli download --local-dir models/Kokoro-82M-bf16 mlx-community/Kokoro-82M-bf16

Ensure that the folder Kokoro-82M-bf16 (Hugging Face), with a voices subfolder containing various .pt files (e.g. af_heart, af_alloy, etc.), now exists within the models folder. Your directory should look like this:
kokoro-MLX-blender
├── kb_mlx/
├── models/
│   └── Kokoro-82M-bf16/
│       ├── samples/
│       ├── voices/
│       ├── .gitattributes
│       ├── config.json
│       ├── DONATE.md
│       ├── kokoro-v1_0.safetensors
│       ├── README.md
│       ├── SAMPLES.md
│       └── VOICES.md
├── .gitignore
├── LICENSE
├── README.md
...

Note
You can use different versions of the KokoroMLX model as well. Download your preferred one from HuggingFace (cf. Installation step 5) and make sure that the downloaded Kokoro model folder exists within the models folder of kokoro-MLX-blender.
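For example, assuming an alternative quantization such as mlx-community/Kokoro-82M-4bit is available on the mlx-community Hugging Face page (the repo name here is illustrative; check which variants actually exist), you could download it alongside the default model and point the CLI at it via --model-dir:

huggingface-cli download --local-dir models/Kokoro-82M-4bit mlx-community/Kokoro-82M-4bit
kbx run -t "Testing an alternative Kokoro model." -md models/Kokoro-82M-4bit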
Run the following command in the CLI to check that everything works:

kbx list

If you see a list of voice names, kokoro-MLX-blender is working. If not, please make sure that you downloaded the Kokoro model in the previous step and placed it correctly in your models folder.
$ kbx run
Usage: kbx run [OPTIONS]
Run TTS with KokoroMLX for M1-M4. Use one voice or blend two voices.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --text -t TEXT Input text(s) as string, single .txt or directory path [default: None] [required] │
│ --voice1 -v1 TEXT Name of the first voice (without .pt) [default: af_heart] │
│ --voice2 -v2 TEXT Name of second voice (without .pt); if omitted, use only voice1 [default: None] │
│ --mix-ratio -m FLOAT Blend weight for voice1 and voice2 (0.5 = 50% each) [default: 0.5] │
│ --speed -s FLOAT Speed multiplier (1.5 = 50% faster, 0.5 = 50% slower) [default: 1] │
│ --model-dir -md DIRECTORY Path to the local Kokoro model directory [default: ./models/Kokoro-82M-bf16] │
│ --output-dir -o TEXT Directory where output audio file will be saved [default: ./output] │
│ --verbose --no-verbose Enable verbose output [default: verbose] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Run TTS with two blended voices (60% voice1 and 40% voice2):

kbx run -t "This is a test of blending the male American voice of Eric with the female American voice of Heart." -v1 am_eric -v2 af_heart -m 0.6
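A single-voice run works the same way; the command below only uses options documented above, and the input text, speed value, and output directory are just examples:

kbx run -t "This is a single voice test at a slightly faster speaking speed." -v1 af_heart -s 1.2 -o ./output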
Launch the Gradio web app like this:

kbx app

OpenAI's o4-mini-high was used for creating the CLI app.
@software{kokoro_mlx_blender,
author = {Thomas Schmidt},
title = {Kokoro MLX Voice Blender},
year = {2025},
url = {https://github.com/tsmdt/kokoro-MLX-blender},
note = {Accessed: 2025-05-29}
}