fix: cuda graph issue while running longcat_flash #13899
Motivation
Currently, running longcat_flash errors out with an assertion failure. This is because, after #10568, set_attn_inputs() is skipped when qkv_latent_func is None, and fetch_qkv_latent() later asserts on this and fails.
Modifications
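As a rough illustration of the failure and the fix, here is a toy sketch. It is not sglang's actual code: the class ToyAttnInputs, the helper run_forward, and the string return value are hypothetical stand-ins, and only the method names set_attn_inputs and fetch_qkv_latent and the argument name qkv_latent_func are taken from the description above.

```python
from typing import Callable, Optional


class ToyAttnInputs:
    """Hypothetical stand-in for the object holding per-forward attention inputs."""

    def __init__(self) -> None:
        self._qkv_latent_func: Optional[Callable[[], str]] = None
        self._inputs_set = False

    def set_attn_inputs(self, qkv_latent_func: Optional[Callable[[], str]]) -> None:
        # Mirrors the behavior described above: setup is skipped
        # entirely when no qkv_latent_func is supplied.
        if qkv_latent_func is None:
            return
        self._qkv_latent_func = qkv_latent_func
        self._inputs_set = True

    def fetch_qkv_latent(self) -> str:
        # Fails if set_attn_inputs() was skipped.
        assert self._inputs_set, "attention inputs were never set"
        return self._qkv_latent_func()


def run_forward(qkv_latent_func: Optional[Callable[[], str]]) -> str:
    # Hypothetical forward step: set the inputs, then fetch the latent.
    ctx = ToyAttnInputs()
    ctx.set_attn_inputs(qkv_latent_func)
    return ctx.fetch_qkv_latent()


# Broken path: the model passes no function, so the later fetch asserts.
try:
    run_forward(None)
    failed = False
except AssertionError:
    failed = True

# Fixed path: the model always supplies a function.
result = run_forward(lambda: "qkv_latent")
```

In this toy model, the fix corresponds to the caller always passing a callable, which is the spirit of the change summarized next.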
Make sure qkv_latent_func is set.
Accuracy Tests
After the change, the following launch command runs and outputs normal tokens:
    python3 -m sglang.launch_server \
        --trust-remote-code \
        --model $MODEL_PATH \
        --tp 8 \
        --ep-size 8 \
        --skip-server-warmup \
        --cuda-graph-bs 1 2 3 4 5 6 7 8 \
        --host 0.0.0.0 \
        --port 8080
gsm8k:
Benchmarking and Profiling
gsm8k:
Same as when I revert to before the breakage:
Checklist