HQQ worked well with Qwen3 until I got to the 32B model. The table below shows the results. Did you happen to run it before?
| Model | FP16 PPL | HQQ PPL | Increase | Pattern |
|---|---|---|---|---|
| 0.6B | 26.16 | 32.22 | +6.06 | |
| 8B | 12.19 | 12.51 | +0.32 | |
| 14B | 10.78 | 11.09 | +0.31 | |
| 32B | 9.31 | 11.32 | +2.01 | |
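The absolute PPL increases in the table understate how anomalous the 32B result is; a quick computation of the relative degradation (using only the values from the table above) shows the 32B model regresses by roughly the same proportion as the tiny 0.6B model, while 8B and 14B stay under 3%:

```python
# Relative HQQ-vs-FP16 PPL degradation, computed from the table above.
results = {
    "0.6B": (26.16, 32.22),
    "8B": (12.19, 12.51),
    "14B": (10.78, 11.09),
    "32B": (9.31, 11.32),
}

for model, (fp16_ppl, hqq_ppl) in results.items():
    rel_increase = (hqq_ppl / fp16_ppl - 1) * 100
    print(f"{model}: +{rel_increase:.1f}%")
```

This prints roughly +23.2%, +2.6%, +2.9%, and +21.6% respectively, which suggests the 32B checkpoint is an outlier rather than a smooth size-dependent trend.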
Here is the CLI command used to measure PPL:
```shell
CUDA_VISIBLE_DEVICES=2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn python -m lm_eval --num_fewshot 0 --seed 42 --model vllm --model_args pretrained=/models/Qwen3-32B-hqq/,dtype=half,tensor_parallel_size=2,enforce_eager=True,gpu_memory_utilization=0.8 --tasks wikitext --batch_size 2
```