Description
Hi team,
I'm facing a consistent issue when trying to deploy my fine-tuned Orpheus model as a streaming endpoint.
Despite multiple attempts, I haven't been able to convert my checkpoint into the required GGUF format.
🔧 Context
I trained a custom Arabic Orpheus model (≈300 hours of speech data) using the official fine-tuning configs and code provided by Orpheus.
After training, my latest checkpoint directory contains the following files:
chat_template.jinja
generation_config.json
config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
training_args.bin
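
One detail that turns out to matter for the conversion attempt below: the checkpoint contains tokenizer.json but no SentencePiece tokenizer.model. A minimal check of which tokenizer format the checkpoint ships (the path is a placeholder for the checkpoint directory above):

import json
from pathlib import Path

ckpt = Path("/path/to/checkpoint")  # placeholder for the checkpoint directory above

# No tokenizer.model means this is an HF fast/BPE tokenizer, not SentencePiece
print("tokenizer.model present:", (ckpt / "tokenizer.model").exists())
print("tokenizer.json present: ", (ckpt / "tokenizer.json").exists())

# tokenizer_config.json records the tokenizer class used during fine-tuning
cfg = json.loads((ckpt / "tokenizer_config.json").read_text())
print("tokenizer_class:", cfg.get("tokenizer_class"))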
⚙️ What I Tried
Since your platform accepts only GGUF models, I attempted to convert the model using llama.cpp as recommended:
python convert_hf_to_gguf.py /path/to/checkpoint \
  --outfile /data/models/orpheus-arabic-f16.gguf \
  --outtype f16

The tensor export and model parameters complete normally, and the run then fails while setting the tokenizer. Tail of the log:

INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.bfloat16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 3072
INFO:hf-to-gguf:gguf: feed forward length = 8192
INFO:hf-to-gguf:gguf: head count = 24
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2146, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1033, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1050, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /home/ubuntu/training/orpheus/Orpheus-TTS/finetune/checkpoints/orpheus-arabic-ft_3/checkpoint-14736/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2149, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1135, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/home/ubuntu/orpheus/llama.cpp/gguf-py/gguf/vocab.py", line 515, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9531, in <module>
    main()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9525, in main
    model_instance.write()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 430, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 597, in prepare_metadata
    self.set_vocab()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2152, in set_vocab
    self._set_vocab_gpt2()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 969, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 685, in get_vocab_base
    assert max(
The paste is truncated at that final assertion, but the conversion fails at this same point on every attempt.
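
If I'm reading get_vocab_base in convert_hf_to_gguf.py correctly, the assertion that fires (truncated above) compares the tokenizer's highest token id against the model's vocab_size, so a disagreement between tokenizer.json and config.json would explain the failure. A minimal sketch to check that relationship on the checkpoint, assuming the transformers package and the same placeholder path as before:

import json
from transformers import AutoTokenizer

ckpt = "/path/to/checkpoint"  # placeholder for the checkpoint directory above

tok = AutoTokenizer.from_pretrained(ckpt)
with open(f"{ckpt}/config.json") as f:
    vocab_size = json.load(f)["vocab_size"]

max_id = max(tok.get_vocab().values())
print("config.json vocab_size:", vocab_size)
print("highest token id:      ", max_id)
# My understanding: the converter expects every token id to be strictly
# below vocab_size, so max_id >= vocab_size would match the failure above.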
❓ Request for Help
Could you please guide me on the correct process or tools to:
- Convert this fine-tuned Orpheus checkpoint into GGUF format to use it with this repo, or
- Obtain a working streaming endpoint for this model.
Any official documentation, command examples, or conversion compatibility notes would be highly appreciated.
Environment
- Model: Orpheus fine-tuned (Arabic, 300 hrs)
- Framework: official Orpheus fine-tuning pipeline
- Conversion tool: llama.cpp (latest)
- OS: Ubuntu 22.04 (CUDA 12.4)
- Python: 3.10
Thanks in advance for your help and support!