[Bug] Qwen3-235B-A22B-Instruct-W8A8 runs out of NPU memory (OOM) very easily #12016

@hyena126

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

With no other special configuration, the service almost never starts when mem-fraction-static is below 0.9, even with a very small context_len.
For example:
On a single 800I A2 node with context_len=10K, mem-fraction-static must be > 0.92, which does not seem correct.
On 800I A2 with 1P (8 NPUs) + 1D (8 NPUs) disaggregation and context_len=4K, mem-fraction-static must be > 0.95, sometimes even 0.98.
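
To make the thresholds reported above easier to reason about, here is a rough back-of-the-envelope sketch of how a KV-cache budget could depend on mem-fraction-static. It assumes the static pool (weights + KV cache) is capped at that fraction of total device memory, so the KV cache gets whatever is still free after the weights load, minus the (1 - fraction) share kept for activations. That accounting, the helper names, and the 6 GiB figure are assumptions for illustration only; none of them were measured or confirmed in this issue.

```python
# Back-of-the-envelope KV-cache budget per NPU.
# Everything here is an illustrative assumption, not SGLang's actual code
# and not a measurement from this issue.


def kv_budget_gib(total_gib, free_after_weights_gib, mem_fraction_static):
    """Memory left for the KV cache pool, in GiB (may be negative).

    Assumed accounting: (1 - mem_fraction_static) of total device memory is
    reserved for activations and runtime buffers; the KV cache gets what is
    still free after the weights are loaded, minus that reserve.
    """
    reserved = total_gib * (1.0 - mem_fraction_static)
    return free_after_weights_gib - reserved


def min_fraction_to_start(total_gib, free_after_weights_gib, kv_needed_gib):
    """Smallest mem_fraction_static that leaves kv_needed_gib for the KV cache."""
    return 1.0 - (free_after_weights_gib - kv_needed_gib) / total_gib


if __name__ == "__main__":
    total = 64.0  # "64G" per NPU, taken from the Environment section
    # Hypothetical value, chosen only to show where the budget changes sign.
    free_after_weights = 6.0
    for frac in (0.85, 0.90, 0.92, 0.95):
        budget = kv_budget_gib(total, free_after_weights, frac)
        status = "ok" if budget > 0 else "cannot start"
        print(f"mem_fraction_static={frac:.2f}  kv_budget={budget:+.2f} GiB  {status}")
```

Under this assumed accounting, a small amount of free memory after weight loading forces the fraction very close to 1.0 before any KV cache fits. Reporting the actual free memory per NPU right after the weights load would show whether this is what happens here.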

Reproduction

Environment

800I A2 64 GB, 2 nodes, standalone machines (the NPUs are connected by 8 fiber links)
