
When using an undefined role, Qwen keeps consuming more and more GPU memory and never finishes computing. #1748

@ZhuShaoQiang

Description

My Setup

I am a new coder using a local Qwen3-4B-Instruct-2507; I don't know whether this bug exists in other versions.

Description

This is your official example code, and it runs fine:

from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Instruct-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

Steps to Reproduce

Use an invalid role value in the messages list, e.g., "default user" instead of "user":

messages = [{"role": "default user", "content": "Give me a short introduction to large language model."}]

Apply the chat template and run generation as usual (the same calls as in the official example above):
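
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=16384)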

Expected Behavior

The apply_chat_template or generate method should validate the role field and raise an informative error when an unsupported role (like "default user") is provided, since the Qwen chat template only supports ["user", "assistant", "system"].
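
For example, a pre-validation wrapper along these lines would catch the mistake early (a minimal sketch; VALID_ROLES and safe_apply_chat_template are hypothetical names, not part of the transformers API):

VALID_ROLES = {"user", "assistant", "system"}

def safe_apply_chat_template(tokenizer, messages, **kwargs):
    # Reject unknown roles up front instead of letting the template
    # silently mishandle them (hypothetical helper, for illustration)
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in VALID_ROLES:
            raise ValueError(
                f"messages[{i}] has invalid role {role!r}; "
                f"expected one of {sorted(VALID_ROLES)}"
            )
    return tokenizer.apply_chat_template(messages, **kwargs)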

Observed Behavior

Instead of raising a clear error (e.g., ValueError: invalid role), the model keeps consuming GPU memory indefinitely (from 8 GB to 16 GB in about three minutes) without producing output or terminating, and no exception is raised. Presumably the KV cache keeps growing as the model generates toward the 16384-token limit without ever stopping.
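
The growth is easy to measure with PyTorch's CUDA memory statistics (a small diagnostic sketch using the standard torch.cuda API):

import torch

torch.cuda.reset_peak_memory_stats()
generated_ids = model.generate(**model_inputs, max_new_tokens=16384)
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")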

Additional Context

Valid roles appear to be strictly "user", "assistant", and optionally "system".

Alternatively, could you please update the official documentation and add that information to the example code, so that everyone knows about it?

The current behavior (silent failure + unbounded memory growth) is dangerous and hard to debug for users unfamiliar with the expected role values.

Reproduction

from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Instruct-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default role", "content": prompt}  # this line is changed.
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)

Other Information

I tried changing the value of max_new_tokens in the code:

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=184
)

and the code finishes very quickly. If I change it to 1840, it takes about 2 minutes to finish (RTX 5060 Ti, 16 GB).
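
This is consistent with the model never emitting an end-of-sequence token, so generation always runs to the full max_new_tokens budget. A quick check (a sketch, assuming tokenizer.eos_token_id is set, which it normally is for Qwen models):

generated_ids = model.generate(**model_inputs, max_new_tokens=184)
new_tokens = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# If generation stopped on its own, the last token should be the EOS token
stopped_on_eos = len(new_tokens) > 0 and new_tokens[-1] == tokenizer.eos_token_id
print("stopped on EOS:", stopped_on_eos)  # expected: False with the invalid role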

Finally, I printed the templated text:

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
exit(0)

It outputs:

<|im_start|>assistant

It seems the invalid role causes the message to be dropped from the rendered prompt entirely, so the model receives only the bare generation prompt (<|im_start|>assistant) with no user content and generates until max_new_tokens is exhausted. This looks like a problem with the template and role handling.
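
One can inspect the template's role handling directly; chat_template is a standard attribute on Hugging Face tokenizers:

# Print the raw Jinja chat template to see how roles are matched
print(tokenizer.chat_template)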

Environment Information

Windows 11
Python 3.10
PyTorch 2.8, CUDA 12.9
Qwen3-4B-Instruct-2507

transformers 4.57.1
vllm 0.11.0
modelscope 1.32.0

Known Issue

  • The issue hasn't already been addressed in the Documentation, Issues, or Discussions.
