@billishyahao
Contributor

Motivation

Remove the hardcoded warmup timeout by introducing a new command-line argument:

parser.add_argument(
    "--warmup-timeout",
    type=float,
    default=ServerArgs.warmup_timeout,
    help="Set warmup timeout in seconds. If a warmup forward batch takes longer than this, the server will crash to prevent hanging. Recommend to increase warmup timeout to 1800 to accommodate some kernel JIT precache e.g. deep gemm",
)
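
For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of how such a flag could be tied to a dataclass default. The `ServerArgs` class below is a stand-in written for this example, not SGLang's actual class, and the 300-second default and the `parse_server_args` helper are assumptions made purely for illustration.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Stand-in for the real ServerArgs. The default value is an assumed
    # placeholder, not the value used by SGLang.
    warmup_timeout: float = 300.0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--warmup-timeout",
        type=float,
        # A dataclass field with a simple default is also readable as a
        # class attribute, which is what makes this expression work.
        default=ServerArgs.warmup_timeout,
        help="Warmup timeout in seconds.",
    )
    return parser


def parse_server_args(argv: list[str]) -> ServerArgs:
    # Hypothetical helper: parse argv and copy the value into ServerArgs.
    namespace = build_parser().parse_args(argv)
    return ServerArgs(warmup_timeout=namespace.warmup_timeout)
```

With this sketch, `parse_server_args(["--warmup-timeout", "1800"])` yields a `ServerArgs` whose `warmup_timeout` is 1800.0, while an empty argument list falls back to the dataclass default.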

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@billishyahao billishyahao force-pushed the billhe/addwarmuptimeout branch 2 times, most recently from 39c11d5 to 5a8512f Compare November 13, 2025 06:07
Collaborator

@ShangmingCai ShangmingCai left a comment

The changes are clean. But I think we can already bypass the timeout caused by the deepgemm compile through --skip-server-warmup?

@billishyahao
Contributor Author

Yes, @ShangmingCai. Without this change, the issue can also be worked around in two ways:

  • Specify --skip-server-warmup to bypass the whole server warmup process.
  • AOT/prebuild the computation kernels, e.g. for deep gemm: python3 -m sglang.compile_deep_gemm --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code

This patch aims to remove the hardcoded value from the default code path, so that users can keep the warmup and simply extend the timeout limit.
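
The general pattern being discussed (keep the warmup, but fail fast if it runs past a configurable limit) can be sketched as follows. This is not SGLang's implementation; `run_warmup_with_timeout` and `fake_warmup` are hypothetical names used only to show the idea with the standard library.

```python
import concurrent.futures
import time


def fake_warmup(duration_s: float) -> str:
    # Hypothetical stand-in for a warmup forward batch.
    time.sleep(duration_s)
    return "warm"


def run_warmup_with_timeout(warmup_fn, timeout_s: float):
    """Run warmup_fn in a worker thread and raise TimeoutError if it is
    still running after timeout_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(warmup_fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Normalize to the builtin exception across Python versions.
        raise TimeoutError(f"warmup exceeded {timeout_s} s") from None
    finally:
        # Do not block the caller on a warmup that is still running.
        executor.shutdown(wait=False)
```

Note that a thread cannot be force-killed, so this sketch only reports the timeout; a real server would typically terminate the process at that point, which matches the "crash to prevent hanging" wording in the help text above.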

@billishyahao billishyahao force-pushed the billhe/addwarmuptimeout branch from d89df01 to c57b467 Compare November 13, 2025 09:20
@hnyls2002
Collaborator

This configuration is rarely used. Do not add it to ServerArgs, as there are already too many args. Put it into sglang.srt.environ and use an environment variable to configure the server warmup timeout.

@billishyahao
Contributor Author

> This configuration is rarely used. Do not add it to ServerArgs, as there are already too many args. Put it into sglang.srt.environ and use an environment variable to configure the server warmup timeout.

Thanks for the advice. Let me propose a new PR that introduces a new environment variable in sglang.srt.environ rather than in ServerArgs. Also, we have observed random warmup timeouts on the server side on first launch due to gemm library AOT.
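
A rough sketch of what an environment-variable-based timeout could look like, using only the standard library. The variable name `SGLANG_WARMUP_TIMEOUT`, the 300-second default, and the `get_warmup_timeout` helper are all assumptions for illustration; the follow-up PR may choose different names and may use the helpers in sglang.srt.environ instead of reading `os.environ` directly.

```python
import os

# Hypothetical names for illustration only; not SGLang's actual variable
# name or default value.
WARMUP_TIMEOUT_ENV = "SGLANG_WARMUP_TIMEOUT"
DEFAULT_WARMUP_TIMEOUT_S = 300.0


def get_warmup_timeout(environ=os.environ) -> float:
    """Return the warmup timeout in seconds, falling back to the default
    when the variable is unset or empty."""
    raw = environ.get(WARMUP_TIMEOUT_ENV)
    if raw is None or raw.strip() == "":
        return DEFAULT_WARMUP_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(
            f"{WARMUP_TIMEOUT_ENV} must be a number of seconds, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError(f"{WARMUP_TIMEOUT_ENV} must be positive, got {value}")
    return value
```

Under these assumptions, launching the server with `SGLANG_WARMUP_TIMEOUT=1800` in the environment would extend the limit without adding another field to ServerArgs.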

4 participants