Description
My setup
I am a new coder, using a local Qwen3-4B-Instruct-2507. I don't know whether this bug exists in other versions.
Description
This is your official example code, and it runs fine:
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-4B-Instruct-2507"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
Steps to Reproduce
Use an invalid role value in the messages list, e.g., "default user" instead of "user":
messages = [{"role": "default user", "content": "Give me a short introduction to large language model."}]
Apply the chat template and run generation as usual.
Expected Behavior
The apply_chat_template or generate method should validate the role field and raise an informative error when an unsupported role (like "default user") is provided, since the Qwen chat template only supports ["user", "assistant", "system"].
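For illustration, here is a minimal sketch of the kind of check I would expect; validate_messages is a hypothetical helper, not an existing transformers/modelscope API:
VALID_ROLES = {"system", "user", "assistant"}
def validate_messages(messages):
    # hypothetical helper: reject roles that the Qwen chat template does not know
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(
                f"messages[{i}] has invalid role {role!r}; "
                f"expected one of {sorted(VALID_ROLES)}"
            )
validate_messages([{"role": "default user", "content": "hi"}])
# raises ValueError: messages[0] has invalid role 'default user'; expected one of ['assistant', 'system', 'user']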
Observed Behavior
Instead of raising a clear error (e.g., ValueError: invalid role), the model keeps consuming GPU memory indefinitely (from 8 GB to 16 GB within three minutes) without producing output or terminating. No exception is raised.
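As a stopgap while debugging, generation can at least be bounded in wall-clock time via max_time, a standard transformers generation argument; this fragment assumes the model and model_inputs variables from the example above:
# stopgap: cap generation by time as well as by token count,
# so a runaway generation cannot grow unbounded
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    max_time=60.0,  # stop after roughly 60 seconds even if no end-of-turn token appears
)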
Additional Context
Valid roles appear to be strictly "user", "assistant", and optionally "system".
Alternatively, could you please update the official documentation and put that information into the example code, so that everyone knows about it?
The current behavior (silent failure + unbounded memory growth) is dangerous and hard to debug for users unfamiliar with the expected role values.
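Concretely, even a one-line comment in the official example would help (just a suggestion):
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}  # role must be "system", "user", or "assistant"
]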
Reproduction
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-4B-Instruct-2507"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default user", "content": prompt}  # this line is changed
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
Other information
I tried changing the value of max_new_tokens in the code:
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=184
)
and the code runs very quickly. If I change it to 1840, it takes about 2 minutes to finish (5060 Ti, 16 GB). So it looks like generation always runs up to the max_new_tokens limit and never stops at an end-of-turn token.
Finally, I printed the templated text:
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "default user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
exit(0)
It outputs:
<|im_start|>assistant
So the message with the invalid role is silently dropped and the prompt ends with an empty assistant turn; it seems this is a problem with the template's role handling.
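For comparison, a small check that only needs the tokenizer (no model load) shows the difference between a valid and an invalid role:
from modelscope import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
valid = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
invalid = tokenizer.apply_chat_template(
    [{"role": "default user", "content": "hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(repr(valid))    # contains the full <|im_start|>user ... <|im_end|> turn
print(repr(invalid))  # as shown above, only the trailing '<|im_start|>assistant\n' is left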
Environment Information
Windows 11
Python 3.10
PyTorch 2.8, CUDA 12.9
Qwen3-4B-Instruct-2507
transformers 4.57.1
vllm 0.11.0
modelscope 1.32.0
Known Issue
- The issue hasn't already been addressed in the Documentation, Issues, or Discussions.