I'm quantizing a model to NVFP4. The model went through SFT and then RL fine-tuning, and I plan to run QAT to recover accuracy after PTQ.
Looking at examples/llm_qat/, the same Daring-Anteater dataset is used across all three steps (SFT → PTQ
calibration → QAT), just different splits. A few questions:
- Calibration vs QAT data: Is using the exact same samples for both SFT and QAT fine-tuning
  intentional/recommended, or just a simplification for the example?
- RL-tuned models: My model's final training stage was RL, not SFT. Should QAT fine-tuning data
  match the RL distribution (tool calls, reasoning traces, environment feedback), or the original SFT data, or
  does it not matter much?
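
To make the first question concrete, here is a minimal sketch of the data layout I'm asking about, assuming the three stages draw disjoint splits from a single dataset. The helper name `split_indices`, the split sizes, and the seed are hypothetical and are not taken from examples/llm_qat/.

```python
import random


def split_indices(n_samples, sizes, seed=0):
    """Shuffle sample indices once and cut one disjoint split per stage.

    `sizes` maps a stage name to the number of samples it should receive.
    """
    if sum(sizes.values()) > n_samples:
        raise ValueError("requested splits exceed dataset size")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic for a fixed seed

    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# Hypothetical sizes, for illustration only.
splits = split_indices(10_000, {"sft": 8_000, "ptq_calib": 512, "qat": 1_488})
```

With this layout no sample is shared between SFT, PTQ calibration, and QAT. The alternative I'm asking about would reuse the `sft` indices for the QAT stage.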
Who can help?
@kevalmorabia97 @sugunav14