
Failed to Convert Orpheus Fine-Tuned Checkpoint to GGUF for Streaming Endpoint #86

@motichic

Description

Hi team,

I'm running into a consistent issue when deploying my fine-tuned Orpheus model as a streaming endpoint: despite multiple attempts, I haven't been able to convert my checkpoint into the required GGUF format.


🔧 Context

I trained a custom Arabic Orpheus model (≈300 hours of speech data) using the official fine-tuning configs and code provided by Orpheus.
After training, my latest checkpoint directory contains the following files (note there is no tokenizer.model file; Orpheus is Llama-3-based, so the tokenizer is the BPE tokenizer.json):

chat_template.jinja
generation_config.json
config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
training_args.bin

⚙️ What I Tried

Since your platform accepts only GGUF models, I attempted to convert the model using llama.cpp as recommended:

python convert_hf_to_gguf.py /path/to/checkpoint \
  --outfile /data/models/orpheus-arabic-f16.gguf \
  --outtype f16
INFO:hf-to-gguf:blk.27.attn_output.weight,   torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_q.weight,        torch.bfloat16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_v.weight,        torch.bfloat16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:output_norm.weight,          torch.bfloat16 --> F32, shape = {3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 3072
INFO:hf-to-gguf:gguf: feed forward length = 8192
INFO:hf-to-gguf:gguf: head count = 24
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2146, in set_vocab
    self._set_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1033, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1050, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /home/ubuntu/training/orpheus/Orpheus-TTS/finetune/checkpoints/orpheus-arabic-ft_3/checkpoint-14736/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2149, in set_vocab
    self._set_vocab_llama_hf()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 1135, in _set_vocab_llama_hf
    vocab = gguf.LlamaHfVocab(self.dir_model)
  File "/home/ubuntu/orpheus/llama.cpp/gguf-py/gguf/vocab.py", line 515, in __init__
    raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9531, in <module>
    main()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 9525, in main
    model_instance.write()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 430, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 597, in prepare_metadata
    self.set_vocab()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 2152, in set_vocab
    self._set_vocab_gpt2()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 969, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/ubuntu/orpheus/llama.cpp/convert_hf_to_gguf.py", line 685, in get_vocab_base
    assert max(

However, the conversion fails the same way every time: the converter first looks for a SentencePiece tokenizer.model (which this checkpoint does not have), the Llama HF vocab fallback rejects it because Llama 3 must use a BPE vocab, and the final BPE path then trips the assertion in get_vocab_base() shown at the end of the traceback.
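
In case it helps with debugging, here is a quick check I put together; it compares the vocab_size declared in config.json against the highest token id in the saved tokenizer, which appears to be what the failing assertion in get_vocab_base() tests. The checkpoint path is a placeholder for mine; everything else is standard transformers API:

# Diagnostic sketch: the truncated assert in get_vocab_base() appears to
# require every tokenizer token id to be below the model's declared
# vocab_size, so this surfaces that comparison directly.
import json
from transformers import AutoTokenizer

ckpt = "/path/to/checkpoint"  # placeholder for my checkpoint directory

with open(f"{ckpt}/config.json") as f:
    declared = json.load(f)["vocab_size"]

tok = AutoTokenizer.from_pretrained(ckpt)
max_id = max(tok.get_vocab().values())  # get_vocab() includes added tokens

print(f"config.json vocab_size: {declared}")
print(f"max tokenizer token id: {max_id}")
print("OK" if max_id < declared else "MISMATCH: token ids exceed vocab_size")

If this prints MISMATCH, the failure would point to a config/tokenizer inconsistency in the saved checkpoint rather than a llama.cpp bug.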


❓ Request for Help

Could you please guide me on the correct process or tools to:

  1. Convert this fine-tuned Orpheus checkpoint into GGUF format to use it with this repo, or
  2. Obtain a working streaming endpoint for this model.

Any official documentation, command examples, or conversion compatibility notes would be highly appreciated.
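
For what it's worth, one workaround I am considering (I'm not sure it is the intended fix) is restoring the base model's tokenizer files into the checkpoint before converting, on the assumption that fine-tuning did not change the tokenizer itself. The base repo id below is my guess at the model I started from; I would substitute whichever base checkpoint was actually used:

# Hypothetical workaround sketch: overwrite the fine-tuned checkpoint's
# tokenizer files with the base model's originals, then rerun the
# convert_hf_to_gguf.py command from above.
import shutil
from huggingface_hub import hf_hub_download

ckpt = "/path/to/checkpoint"           # placeholder for my checkpoint
base = "canopylabs/orpheus-3b-0.1-ft"  # assumed base repo id; replace as needed

for name in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
    cached = hf_hub_download(repo_id=base, filename=name)
    shutil.copy(cached, f"{ckpt}/{name}")

If that is a sensible direction, please confirm; otherwise I'd appreciate a pointer to the correct procedure.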


Environment

  • Model: Orpheus fine-tuned (Arabic, 300 hrs)
  • Framework: official Orpheus fine-tuning pipeline
  • Conversion tool: llama.cpp (latest)
  • OS: Ubuntu 22.04 (CUDA 12.4)
  • Python: 3.10

Thanks in advance for your help and support!
