[Question] Is it recommended to use the same examples for SFT and QAT? #1095

I'm quantizing a model to NVFP4. The model went through SFT and then RL fine-tuning, and I'm planning to run QAT to recover accuracy after PTQ.

Looking at `examples/llm_qat/`, the same Daring-Anteater dataset is used across all three steps (SFT → PTQ calibration → QAT), just with different splits. A few questions:
                                                                                                                  
1. **SFT vs QAT data**: Is using the same samples for both SFT and QAT fine-tuning intentional/recommended, or just a simplification for the example?

2. **RL-tuned models**: My model's final training stage was RL, not SFT. Should the QAT fine-tuning data match the RL distribution (tool calls, reasoning traces, environment feedback) or the original SFT data, or does it not matter much?
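
For concreteness, the "same dataset, different splits" arrangement I'm asking about can be sketched as below. This is a minimal plain-Python illustration, not the code in `examples/llm_qat/`: the helper `make_disjoint_splits`, the stage names, and the sizes are all hypothetical.

```python
import random


def make_disjoint_splits(num_samples, sizes, seed=0):
    """Partition sample indices into named, non-overlapping splits.

    `sizes` maps a (hypothetical) stage name to the number of samples it gets.
    """
    if sum(sizes.values()) > num_samples:
        raise ValueError("requested split sizes exceed the dataset size")

    # Shuffle once with a fixed seed so the partition is reproducible.
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)

    # Hand out consecutive slices of the shuffled indices, one per stage.
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


# Illustrative sizes only; these are assumptions, not the example's values.
splits = make_disjoint_splits(
    num_samples=10_000,
    sizes={"sft": 8_000, "ptq_calib": 512, "qat": 1_488},
)
```

The open question is whether the `qat` slice should come from this same source at all, or from data closer to the RL stage.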
                                                                                                                  
  ### Who can help?                                                                                             

  @kevalmorabia97 @sugunav14  
