Description
My setup
I am a new coder, using a local Qwen3-4B-Instruct-2507. I don't know whether this bug exists in other versions.
Description
This is your official example code, and it runs fine:
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-4B-Instruct-2507"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
Steps to Reproduce
Use an invalid role value in the messages list, e.g., "default user" instead of "user":
messages = [{"role": "default user", "content": "Give me a short introduction to large language model."}]
Apply the chat template and run generation as usual.
Expected Behavior
The apply_chat_template or generate method should validate the role field and raise an informative error when an unsupported role (like "default user") is provided, since the Qwen chat template only supports ["user", "assistant", "system"].
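For illustration, here is a minimal sketch of the kind of check I would expect; validate_messages is a hypothetical helper, not an existing transformers/modelscope API:
VALID_ROLES = {"system", "user", "assistant"}
def validate_messages(messages):
    # hypothetical helper: reject roles that the Qwen chat template does not know
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(
                f"messages[{i}] has invalid role {role!r}; "
                f"expected one of {sorted(VALID_ROLES)}"
            )
validate_messages([{"role": "default user", "content": "hi"}])
# raises ValueError: messages[0] has invalid role 'default user'; expected one of ['assistant', 'system', 'user']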
Observed Behavior
Instead of raising a clear error (e.g., ValueError: invalid role), the model keeps consuming GPU memory indefinitely (from 8 GB to 16 GB within three minutes) without producing output or terminating. No exception is raised.
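As a stopgap while debugging, generation can at least be bounded in wall-clock time via max_time, a standard transformers generation argument; this fragment assumes the model and model_inputs variables from the example above:
# stopgap: cap generation by time as well as by token count,
# so a runaway generation cannot grow unbounded
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    max_time=60.0,  # stop after roughly 60 seconds even if no end-of-turn token appears
)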
Additional Context
Valid roles appear to be strictly "user", "assistant", and optionally "system".
Alternatively, could you please update the official documentation and put that information into the example code, so that everyone knows about it?
The current behavior (silent failure + unbounded memory growth) is dangerous and hard to debug for users unfamiliar with the expected role values.
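Concretely, even a one-line comment in the official example would help (just a suggestion):
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}  # role must be "system", "user", or "assistant"
]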
Reproduction
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-4B-Instruct-2507"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default user", "content": prompt}  # this line is changed
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
Other information
I tried changing the value of max_new_tokens in the code:
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=184
)
and the code runs very quickly. If I change it to 1840, it takes about 2 minutes to finish (5060 Ti, 16 GB). So it looks like generation always runs up to the max_new_tokens limit and never stops at an end-of-turn token.
Finally, I printed the templated text:
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
exit(0)
It outputs:
<|im_start|>assistant
So the message with the invalid role is silently dropped and the prompt ends with an empty assistant turn; it seems this is a problem with the template's role handling.
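For comparison, a small check that only needs the tokenizer (no model load) shows the difference between a valid and an invalid role:
from modelscope import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
valid = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
invalid = tokenizer.apply_chat_template(
    [{"role": "default user", "content": "hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(repr(valid))    # contains the full <|im_start|>user ... <|im_end|> turn
print(repr(invalid))  # as shown above, only the trailing '<|im_start|>assistant\n' is left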
Environment Information
Windows 11
Python 3.10
PyTorch 2.8, CUDA 12.9
Qwen3-4B-Instruct-2507
transformers 4.57.1
vllm 0.11.0
modelscope 1.32.0
Known Issue
- The issue hasn't already been addressed in the Documentation, Issues, or Discussions.