[Question] Is it recommended to use the same examples for SFT and QAT? #1095

@DhruvBhatia0

Description

I am quantizing a model to NVFP4. The model was SFT'd, then RL fine-tuned. I plan to do QAT to recover accuracy after PTQ.

Looking at examples/llm_qat/, the same Daring-Anteater dataset is used across all three steps (SFT → PTQ calibration → QAT), just with different splits. A few questions:

  1. Calibration vs QAT data — Is using the exact same samples for both SFT and QAT fine-tuning
    intentional/recommended, or just a simplification for the example?

  2. RL-tuned models — My model's final training stage was RL, not SFT. Should QAT fine-tuning data
    match the RL distribution (tool calls, reasoning traces, environment feedback), or the original SFT data, or
    does it not matter much?
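
For concreteness, the "same dataset, different splits" setup mentioned above can be sketched as follows. This is a minimal illustration only, not the code from examples/llm_qat/: the helper `split_disjoint`, the split names, and the sizes are all hypothetical, and the dataset is represented by integer sample indices.

```python
import random


def split_disjoint(num_samples, sizes, seed=0):
    """Partition sample indices into named, non-overlapping splits.

    `sizes` maps a split name to the number of samples it should get.
    Hypothetical helper for illustration; not part of any library.
    """
    if sum(sizes.values()) > num_samples:
        raise ValueError("requested splits exceed dataset size")
    indices = list(range(num_samples))
    # Shuffle once with a fixed seed so the partition is reproducible.
    random.Random(seed).shuffle(indices)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# Assumed sizes, chosen only to make the example concrete.
splits = split_disjoint(
    num_samples=10_000,
    sizes={"sft": 8_000, "ptq_calib": 512, "qat": 1_488},
)
```

Each stage then draws from the same distribution without reusing individual samples, which is the scenario question 1 asks about.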

Who can help?

@kevalmorabia97 @sugunav14
