flash attention FA4 blackwell on sm120? #10564
voipmonitor started this conversation in General
Replies: 1 comment
only sm100
Hello,
is the new FA4 compatible with sm120 (RTX 6000 PRO, 5090 etc.) ?
This is the PR: #9928
```shell
python3 -m sglang.launch_server \
    --model-path nvidia/DeepSeek-V3-0324-FP4 \
    --tp 4 \
    --attention-backend trtllm_mla \
    --moe-runner-backend flashinfer_trtllm \
    --quantization modelopt_fp4 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --prefill-attention-backend fa4 \
    --speculative-attention-mode decode
```
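For context on the "only sm100" answer: both sm100 (B200/GB200) and sm120 (RTX PRO 6000, RTX 5090) are Blackwell-generation architectures, but they report different CUDA compute capabilities, (10, 0) vs. (12, 0), and kernels built for one do not automatically run on the other. A minimal sketch of checking your GPU against a backend's supported list (the `sm_arch`/`is_supported` helpers are hypothetical, not part of SGLang; with PyTorch installed you would feed in `torch.cuda.get_device_capability()`):

```python
# Hypothetical helpers (not SGLang API): map a CUDA compute capability
# tuple to its "sm" architecture name and compare it against the
# architectures a kernel backend supports.

def sm_arch(capability: tuple[int, int]) -> str:
    """(10, 0) -> 'sm100' (B200/GB200); (12, 0) -> 'sm120' (RTX 5090 etc.)."""
    major, minor = capability
    return f"sm{major}{minor}"

def is_supported(capability: tuple[int, int],
                 supported: tuple[str, ...] = ("sm100",)) -> bool:
    """True if this GPU's architecture is in the backend's supported set."""
    return sm_arch(capability) in supported

# With torch: is_supported(torch.cuda.get_device_capability())
print(sm_arch((10, 0)))       # -> sm100
print(sm_arch((12, 0)))       # -> sm120
print(is_supported((12, 0)))  # -> False, matching the "only sm100" answer
```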