
Failed to Convert Orpheus Fine-Tuned Checkpoint to GGUF for Streaming Endpoint #86

@motichic

Description

Hi team,

I'm running into a consistent issue when deploying my fine-tuned Orpheus model as a streaming endpoint: despite multiple attempts, I haven't been able to convert my checkpoint into the required GGUF format.


🔧 Context

I trained a custom Arabic Orpheus model (≈300 hours of speech data) using the official fine-tuning configs and code provided by Orpheus.
After training, my latest checkpoint directory contains the following files (note there is no tokenizer.model file; Orpheus is Llama-3-based, so the tokenizer is the BPE tokenizer.json):

chat_template.jinja
generation_config.json
config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
training_args.bin

⚙️ What I Tried

Since your platform accepts only GGUF models, I attempted to convert the model using llama.cpp as recommended:

python convert_hf_to_gguf.py /path/to/checkpoint \
  --outfile /data/models/orpheus-arabic-f16.gguf \
  --outtype f16
INFO:hf-to-gguf:blk.27.attn_output.weight,   torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_q.weight,        torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_v.weight,        torch.bfloat16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:output_norm.weight,          torch.bfloat16 --> F32, shape = {3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 3072
INFO:hf-to-gguf:gguf: feed forward length = 8192
INFO:hf-to-gguf:gguf: head count = 24
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2146, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1033, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1050, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /home/ubuntu/training/orpheus/Orpheus-TTS/finetune/checkpoints/orpheus-arabic-ft_3/checkpoint-14736/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2149, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1135, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/home/ubuntu/orpheus/llama.cpp/gguf-py/gguf/vocab.py", line 515, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9531, in <module>
    main()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9525, in main
    model_instance.write()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 430, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 597, in prepare_metadata
    self.set_vocab()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2152, in set_vocab
    self._set_vocab_gpt2()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 969, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 685, in get_vocab_base
    assert max(

However, the conversion fails the same way every time: the converter first looks for a SentencePiece tokenizer.model (which this checkpoint does not have), the Llama HF vocab fallback rejects it because Llama 3 must use a BPE vocab, and the final BPE path then trips the assertion in get_vocab_base() shown at the end of the traceback.
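
In case it helps with debugging, here is a quick check I put together; it compares the vocab_size declared in config.json against the highest token id in the saved tokenizer, which appears to be what the failing assertion in get_vocab_base() tests. The checkpoint path is a placeholder for mine; everything else is standard transformers API:

# Diagnostic sketch: the truncated assert in get_vocab_base() appears to
# require every tokenizer token id to be below the model's declared
# vocab_size, so this surfaces that comparison directly.
import json
from transformers import AutoTokenizer

ckpt = "/path/to/checkpoint"  # placeholder for my checkpoint directory

with open(f"{ckpt}/config.json") as f:
    declared = json.load(f)["vocab_size"]

tok = AutoTokenizer.from_pretrained(ckpt)
max_id = max(tok.get_vocab().values())  # get_vocab() includes added tokens

print(f"config.json vocab_size: {declared}")
print(f"max tokenizer token id: {max_id}")
print("OK" if max_id < declared else "MISMATCH: token ids exceed vocab_size")

If this prints MISMATCH, the failure would point to a config/tokenizer inconsistency in the saved checkpoint rather than a llama.cpp bug.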


❓ Request for Help

Could you please guide me on the correct process or tools to:

  1. Convert this fine-tuned Orpheus checkpoint into GGUF format to use it with this repo, or
  2. Obtain a working streaming endpoint for this model.

Any official documentation, command examples, or conversion compatibility notes would be highly appreciated.
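
For what it's worth, one workaround I am considering (I'm not sure it is the intended fix) is restoring the base model's tokenizer files into the checkpoint before converting, on the assumption that fine-tuning did not change the tokenizer itself. The base repo id below is my guess at the model I started from; I would substitute whichever base checkpoint was actually used:

# Hypothetical workaround sketch: overwrite the fine-tuned checkpoint's
# tokenizer files with the base model's originals, then rerun the
# convert_hf_to_gguf.py command from above.
import shutil
from huggingface_hub import hf_hub_download

ckpt = "/path/to/checkpoint"           # placeholder for my checkpoint
base = "canopylabs/orpheus-3b-0.1-ft"  # assumed base repo id; replace as needed

for name in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
    cached = hf_hub_download(repo_id=base, filename=name)
    shutil.copy(cached, f"{ckpt}/{name}")

If that is a sensible direction, please confirm; otherwise I'd appreciate a pointer to the correct procedure.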


Environment

  • Model: Orpheus fine-tuned (Arabic, 300 hrs)
  • Framework: official Orpheus fine-tuning pipeline
  • Conversion tool: llama.cpp (latest)
  • OS: Ubuntu 22.04 (CUDA 12.4)
  • Python: 3.10

Thanks in advance for your help and support!
