
[BUG]: frontend should propagate '"stop"' field to PreprocessedRequest #4755

@grahamking

Describe the Bug

The OpenAI request can contain a "stop" field: a string (or a list of strings) that should make the engine stop generating as soon as it is encountered.
https://github.com/ai-dynamo/dynamo/blob/main/lib/async-openai/src/types/chat.rs#L896
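
To make the expected behaviour concrete, here is a minimal sketch of what "stop when encountered" means for the generated text. The helper name `truncate_at_stop` is hypothetical and is not part of Dynamo or any engine. It assumes the stop sequence itself is excluded from the output, which is the usual default.

```rust
/// Hypothetical helper (not a Dynamo API): returns `text` cut off just
/// before the earliest occurrence of any stop sequence.
fn truncate_at_stop<'a>(text: &'a str, stops: &[&str]) -> &'a str {
    let cut = stops
        .iter()
        .filter(|s| !s.is_empty())     // skip empty stop strings, which would match at offset 0
        .filter_map(|s| text.find(*s)) // byte offset of the first match, if any
        .min();                        // earliest match across all stop sequences
    match cut {
        Some(idx) => &text[..idx], // drop the stop sequence and everything after it
        None => text,              // no stop sequence found: keep the full text
    }
}
```

With the request used in this issue ("stop": "I"), a generation such as "This is a test. I am done." would be expected to come back as "This is a test. ".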

The "stop" value needs to be copied into PreprocessedRequest by the frontend:
https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/protocols/common/preprocessor.rs#L48
https://github.com/ai-dynamo/dynamo/blob/main/lib/llm/src/protocols/common.rs#L234

It appears that isn't happening currently.

Steps to Reproduce

Compiling

# Step 1: Uninstall existing packages to ensure clean installation
uv pip uninstall ai-dynamo ai-dynamo-runtime
 
# Step 2: Compiles Rust code to native binaries with debug optimization (using subshell for env vars)
# Output: $CARGO_TARGET_DIR/debug/ directory with .so files
 
(cd $DYNAMO_HOME && CARGO_PROFILE_DEV_OPT_LEVEL=0 CARGO_BUILD_JOBS=32 cargo build --locked --features dynamo-llm/block-manager --workspace)
 
# Step 3: Creates editable Python installation (no wheel file)
# Output: .pth files in site-packages, links to source code
(cd $DYNAMO_HOME/lib/bindings/kvbm && maturin develop --uv --features block-manager)

# Qi --
# Step 4? -- not sure if we need step 3 at all
(cd $DYNAMO_HOME/lib/bindings/python && maturin develop --uv)

(cd $DYNAMO_HOME && pip install -e .)

Running

# DYN_LOG=trace|debug|info ... (the TRACE line shown below only appears at trace level)
# Start the frontend
DYN_LOG=info python -m dynamo.frontend --http-port 8080 &

# Start the vLLM worker
CUDA_VISIBLE_DEVICES=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 \
python -m dynamo.vllm \
--model Qwen/Qwen3-0.6B \
--enforce-eager --no-enable-prefix-caching
# For some reason, Qi cannot run this model for now:
# --model meta/llama-3.1-8b-instruct \

# Send a completion request that sets "stop"
curl localhost:8080/v1/completions -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen3-0.6B",
    "prompt": "Say this is a test",
    "max_tokens": 50,
    "stop": "I",
    "min_tokens": 10,
    "ignore_eos": false
}' | jq

Dynamo then prints:

TRACE handle_payload: dynamo_runtime::pipeline::network::ingress::push_handler: received request: Object {"model": String("meta/llama-3.1-8b-instruct"), "token_ids": Array [Number(46864), Number(420), Number(374), Number(264), Number(1296)], "stop_conditions": Object {"max_tokens": Number(50), "stop": Null, "stop_token_ids_hidden": Array [Number(128001), Number(128008), Number(128009)], "min_tokens": Number(10), "ignore_eos": Bool(false), "max_thinking_tokens": Null}, "sampling_options": Object {"n": Null, "best_of": Null, "presence_penalty": Null, "frequency_penalty": Null, "repetition_penalty": Null, "temperature": Null, "top_p": Null, "top_k": Null, "min_p": Null, "use_beam_search": Null, "length_penalty": Null, "seed": Null, "include_stop_str_in_output": Null, "guided_decoding": Null}, "output_options": Object {"logprobs": Null, "prompt_logprobs": Null, "skip_special_tokens": Null, "formatted_prompt": Null}, "eos_token_ids": Array [Number(128001), Number(128008), Number(128009)], "mdc_sum": String("f42b51cbe0f409ded764f102796b2a031788145b0f485ca45b09a5ec5aee0118"), "annotations": Array [], "estimated_prefix_hit_num_blocks": Null, "backend_instance_id": Null, "router_config_override": Null}

Under stop_conditions this shows "stop": Null, which is wrong: the request set "stop", so that value should have been propagated.

Investigation / Code Pointer

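impl OpenAIStopConditionsProvider for NvCreateCompletionRequest {
    fn get_stop(&self) -> Option<Vec<String>> {
        // Always returns None, so the request's "stop" value is dropped here.
        None
    }
    // ... remaining methods elided
}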
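
One possible shape for the fix, as a sketch only. The types below (`Stop`, `CompletionRequest`, `StopProvider`) are stand-ins defined here so that the example compiles on its own. They are assumptions, not Dynamo's actual definitions, and the real field layout of NvCreateCompletionRequest may differ. The idea is to normalize the "stop" value, whether it is a single string or a list, into the Option<Vec<String>> that get_stop returns, instead of returning None.

```rust
/// Stand-in for the OpenAI "stop" field: a single string or a list of strings.
#[derive(Debug, Clone, PartialEq)]
enum Stop {
    String(String),
    StringArray(Vec<String>),
}

/// Stand-in for the completion request; only the field relevant here is modeled.
struct CompletionRequest {
    stop: Option<Stop>,
}

/// Stand-in for the provider trait shown in the snippet above.
trait StopProvider {
    fn get_stop(&self) -> Option<Vec<String>>;
}

impl StopProvider for CompletionRequest {
    fn get_stop(&self) -> Option<Vec<String>> {
        // Normalize both shapes to a list; an absent field stays None.
        self.stop.as_ref().map(|stop| match stop {
            Stop::String(s) => vec![s.clone()],
            Stop::StringArray(v) => v.clone(),
        })
    }
}
```

Under these assumptions, the request from the reproduction steps ("stop": "I") would yield Some(["I"]) from get_stop, and the trace line would be expected to show the stop string rather than Null.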

Metadata

Labels

Dynamo NIM, bug (Something isn't working), frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`)
