Add experimental support for transformers>=5.0 #975

Open
kevalmorabia97 wants to merge 18 commits into main from kmorabi/bump-transformers-5.0
Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Mar 4, 2026

What does this PR do?

  • Add experimental support for transformers >=5.0 and remove deprecated usages: https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md
  • ⚠️ Unified Hugging Face checkpoint export for quantized checkpoints may not work yet for some models with transformers>=5.0, as it requires many fixes (e.g., a change in how MoE experts are organized).
    • Add a workaround for TRT-LLM's imports of deprecated transformers functions so that TRT-LLM-based GPU unit tests pass. Model deployment still needs proper fixes directly in TRT-LLM, so the llm/vlm PTQ example tests still run with transformers 4.57.
    • Everything except PTQ and Export should work fine with transformers>=5.0.
  • Remove hard-coded trust_remote_code=True.
  • NOTE: The upcoming Nemo:26.04 container will pin transformers>=5.0.
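Most of the deprecated-usage cleanup is the `torch_dtype` → `dtype` rename at `from_pretrained` call sites. A small version shim like the following (the helper name is ours, not part of this PR) sketches how code supporting both sides of the bump can dispatch on the installed transformers version:

```python
def dtype_kwarg_name(transformers_version: str) -> str:
    """Return the from_pretrained dtype keyword for a transformers version.

    transformers >= 5.0 renamed ``torch_dtype`` to ``dtype``; code that must
    run against both major versions can dispatch on the version string.
    """
    major = int(transformers_version.split(".")[0])
    return "dtype" if major >= 5 else "torch_dtype"

# e.g. AutoModelForCausalLM.from_pretrained(model_id, **{dtype_kwarg_name(v): torch.float16})
print(dtype_kwarg_name("4.57.0"))  # torch_dtype
print(dtype_kwarg_name("5.0.0"))   # dtype
```

In practice a library would pin one side and use the new keyword directly; the shim only matters during a transition window like the one this PR creates.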

Testing

CI/CD tests passing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅
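The trust_remote_code hardening called out in the checklist boils down to an opt-in CLI flag with a secure `False` default. A minimal sketch (the parser wiring is illustrative, not copied from the repo):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Opt-in: model code downloaded from the Hub is only executed
    # when the user explicitly passes the flag.
    parser.add_argument(
        "--trust_remote_code",
        action="store_true",
        help="Allow executing model code downloaded from the Hub (default: off).",
    )
    return parser

args = build_parser().parse_args([])
print(args.trust_remote_code)  # False unless --trust_remote_code is passed
```

The parsed value is then threaded into `AutoTokenizer.from_pretrained(..., trust_remote_code=args.trust_remote_code)` and the model loaders, replacing the previous hard-coded `True`.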

Summary by CodeRabbit

  • New Features

    • Make remote-code usage opt-in via a configurable --trust_remote_code flag across examples and tools.
  • Bug Fixes

    • Improve checkpoint/resume detection and related training guidance to avoid spurious errors.
  • Refactor

    • Consolidate dtype/config naming, switch warmup settings from ratio → steps, and unify tokenizer invocation patterns.
  • Documentation

    • Simplify changelog title and add misc notes for release 0.44.
  • Chores

    • Remove scheduled PR-branch cleanup workflow and relax/remove several transformers version pins.
  • Tests

    • Adjust test gates, skips, and structures to align with updated deps and behaviors.
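The warmup ratio → steps switch is a mechanical conversion; when porting an old config by hand, the equivalent step count can be computed as below (the helper is ours, for illustration):

```python
def warmup_ratio_to_steps(warmup_ratio: float, total_training_steps: int) -> int:
    """Convert a legacy warmup_ratio into an explicit warmup_steps value.

    Rounded to the nearest whole step; e.g. a 3% warmup over 1000 total
    training steps becomes 30 warmup steps.
    """
    return round(warmup_ratio * total_training_steps)

print(warmup_ratio_to_steps(0.03, 1000))  # 30
```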

@copy-pr-bot

copy-pr-bot bot commented Mar 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaced many Hugging Face torch_dtype kwargs with dtype, made trust_remote_code opt-in and propagated it across examples and internals, adjusted quantization/MoE/trace internals for Transformers v5 semantics, removed a scheduled GitHub Actions workflow, and pruned pinned example requirements and some test decorators.

Changes

  • Removed CI workflow — .github/workflows/delete_outdated_pr_branches.yml: Deleted the scheduled/manual GitHub Actions workflow that deleted outdated PR branches.
  • Transformers bounds & runtime warning — pyproject.toml, modelopt/torch/__init__.py, CHANGELOG.rst, tox.ini: Raised minimum transformers to >=5.0, removed prior <5.0 upper bounds, and updated the runtime compatibility warning/patching logic for pre-5.0 behavior.
  • Global dtype API (torch_dtype → dtype) — examples/..., modelopt/..., examples/windows/..., modelopt/onnx/...: Replaced torch_dtype with dtype in from_pretrained/model-construction call sites to align with the Transformers v5 API.
  • Make trust_remote_code configurable — examples/gpt-oss/..., examples/llm_autodeploy/..., examples/llm_ptq/..., examples/speculative_decoding/..., examples/llm_eval/..., examples/.../vlm_utils.py, modelopt/torch/speculative/utils.py, examples/windows/...: Added trust_remote_code: bool = False CLI flags/dataclass fields and threaded the value into AutoTokenizer.from_pretrained / AutoModel.from_pretrained and related helpers instead of hardcoding True.
  • QTensor wrapper restore & RealQuantLinear behavior — modelopt/torch/opt/plugins/transformers.py, modelopt/torch/quantization/nn/modules/quant_linear.py, modelopt/torch/quantization/plugins/accelerate.py: Added _restore_qtensor_wrappers to re-wrap saved QTensor metadata after the patched from_pretrained; updated RealQuantLinear._setup to preserve/restore existing QTensorWrapper metadata and adjusted dtype fallback/forwarding logic.
  • Quantization / HuggingFace plugin (DBRX/MoE) updates — modelopt/torch/quantization/plugins/huggingface.py, modelopt/torch/quantization/utils/core_utils.py: Changed DBRX expert APIs/weight shapes and _QuantDbrxExperts.forward parameter ordering; made sync_moe_expert_amax a no-op for non-iterable/batched expert containers.
  • Trace & speculative runtime patches — modelopt/torch/trace/plugins/transformers.py, modelopt/torch/speculative/plugins/transformers.py, modelopt/torch/speculative/utils.py: Added an FX-trace-friendly BertLayer.forward patch for Transformers >=5.0, inlined DynamicCache creation where used, and made load_vlm_or_llm_with_kwargs accept trust_remote_code.
  • Wrapper model / past_key_values handling — modelopt/onnx/llm_export_utils/export_utils.py: Switched model loading to dtype and reconstructed past_key_values tuples from cache fields instead of using .to_legacy_cache().
  • Tokenizer / dataset API updates — examples/llm_sparsity/..., examples/windows/onnx_ptq/..., modelopt/torch/utils/speech_dataset_utils.py, examples/llm_eval/lm_eval_hf.py: Replaced tokenizer.batch_encode_plus(...) with direct tokenizer(...) calls; removed or consolidated trust_remote_code=True on load_dataset calls; moved import datasets out of conditionals.
  • Warmup/config key changes — examples/gpt-oss/configs/sft_*.yaml, examples/llm_qat/launch.sh, examples/llm_qat/notebooks/*, examples/llm_sparsity/weight_sparsity/launch_finetune.sh: Switched warmup keys/CLI args from warmup_ratio to warmup_steps.
  • Device placement & adapter timing — examples/llm_sparsity/attention_sparsity/hf_sa.py, modelopt/torch/quantization/plugins/transformers_trainer.py: Use .to(model.device) instead of .cuda() and moved LoRA adapter insertion earlier (before super init) using the provided training args.
  • Finetune resume/overwrite logic — examples/llm_sparsity/weight_sparsity/finetune.py: Simplified checkpoint gating to rely on resume_from_checkpoint and removed the previous overwrite-output-dir error path.
  • Tests: decorators, fixtures, refactors — tests/... (multiple files): Removed many @pytest.mark.manual markers, introduced skip/gates for transformers<5.0, refactored class-based tests to module-level functions, adjusted tiny model fixtures to use torch.float32, and skipped Whisper tests that require system deps.
  • Requirements & docs pruning — examples/.../requirements*.txt, examples/llm_ptq/README.md, examples/speculative_decoding/README.md: Removed or loosened several pinned dependencies (transformers pins, librosa, soundfile, deepspeed, etc.) and updated README notes/examples.
  • Misc examples & scripts — examples/speculative_decoding/scripts/*, examples/gpt-oss/sft.py, examples/llm_qad/*, examples/speculative_decoding/*, examples/...: Minor refactors: added/propagated trust_remote_code flags, removed redundant redeclarations, and reformatted calls to align with v5 API changes.
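The past_key_values change above can be illustrated with a small stand-in cache (FakeCache and to_legacy_tuples are ours, for illustration): v5 cache objects may not expose .to_legacy_cache(), so the legacy tuple layout is rebuilt from the cache fields as a fallback.

```python
from dataclasses import dataclass, field

@dataclass
class FakeCache:
    """Stand-in for a transformers v5 cache object (illustrative only)."""
    key_cache: list = field(default_factory=list)
    value_cache: list = field(default_factory=list)

def to_legacy_tuples(cache):
    """Rebuild the legacy past_key_values tuple-of-(key, value) layout.

    Prefers the public to_legacy_cache() when available; otherwise
    reconstructs the tuples from the key_cache/value_cache fields.
    """
    if hasattr(cache, "to_legacy_cache"):
        return cache.to_legacy_cache()
    return tuple(zip(cache.key_cache, cache.value_cache))

cache = FakeCache(key_cache=["k0", "k1"], value_cache=["v0", "v1"])
print(to_legacy_tuples(cache))  # (('k0', 'v0'), ('k1', 'v1'))
```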

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant HF as "HuggingFace.from_pretrained"
    participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
    participant FS as "modelopt_state.pth (FS)"

    User->>Script: run (optional --trust_remote_code)
    Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
    HF-->>Script: returns model instance
    Script->>ModelOpt: patched hook invoked after instantiation
    ModelOpt->>FS: check for modelopt_state.pth
    FS-->>ModelOpt: q_tensor_state (if present)
    ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
    ModelOpt-->>Script: model with restored wrappers
    Script-->>User: continue (quantize/export/generate)
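The restore hook in the sequence diagram can be sketched as follows. The file name matches the diagram, but the function body is an assumption about the flow described there, not the actual ModelOpt implementation:

```python
import os

def restore_qtensor_wrappers(model, model_path: str) -> bool:
    """Sketch of the post-from_pretrained hook from the sequence diagram.

    After Hugging Face instantiates the model, look for a saved
    modelopt_state.pth next to the weights; if present, quantized weights
    would be re-wrapped with their QTensorWrapper metadata.
    """
    state_path = os.path.join(model_path, "modelopt_state.pth")
    if not os.path.isfile(state_path):
        return False  # nothing to restore, e.g. an unquantized checkpoint
    # Real code would load the state (torch.load(..., weights_only=True))
    # and re-wrap each quantized parameter here.
    return True

print(restore_qtensor_wrappers(object(), "/nonexistent/path"))  # False
```

Note this sketch only works for local checkpoint directories; as a later review comment points out, Hub IDs must first be resolved to a local snapshot path.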

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • Security Anti-Patterns — ❌ Error: A docstring example contains hardcoded trust_remote_code=True without justification, violating SECURITY.md requirements and contradicting the PR objectives. Resolution: remove the trust_remote_code=True parameter, add an inline comment justifying its necessity, or make it a configurable parameter with a secure False default.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 34.74%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title clearly and concisely describes the main objective: adding experimental support for transformers>=5.0, which aligns with the comprehensive changes throughout the changeset.

Comment @coderabbitai help to get the list of available commands and usage tips.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 3e28ada to 2b24815 on March 24, 2026 21:34
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-975/

Built to branch gh-pages at 2026-03-26 09:47 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@kevalmorabia97
Collaborator Author

/ok to test 2b24815

@kevalmorabia97
Collaborator Author

/ok to test 1f0726e

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 1f0726e to 48b426f on March 25, 2026 08:24
@kevalmorabia97
Copy link
Collaborator Author

/ok to test 48b426f

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 48b426f to 0781ac7 on March 25, 2026 08:42
@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 59.32203% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.60%. Comparing base (291498b) to head (af9825f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines:
  • modelopt/torch/__init__.py — patch 70.73%, 12 lines missing ⚠️
  • ...lopt/torch/quantization/nn/modules/quant_linear.py — patch 23.07%, 10 lines missing ⚠️
  • modelopt/torch/quantization/backends/nvfp4_gemm.py — patch 0.00%, 1 line missing ⚠️
  • modelopt/torch/speculative/utils.py — patch 50.00%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   70.21%   70.60%   +0.39%     
==========================================
  Files         228      229       +1     
  Lines       25952    26055     +103     
==========================================
+ Hits        18221    18396     +175     
+ Misses       7731     7659      -72     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kevalmorabia97 kevalmorabia97 marked this pull request as ready for review March 25, 2026 09:41
@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 25, 2026 09:41
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
modelopt/torch/opt/plugins/transformers.py (1)

74-76: ⚠️ Potential issue | 🟠 Major

Resolve Hub IDs to a local snapshot path before restoring modelopt_state.

Line 107 passes raw pretrained_model_name_or_path into _restore_qtensor_wrappers; for Hub IDs this is not a local dir, so Line 75 returns early and wrapper metadata restoration is skipped.

Proposed direction
-def _restore_qtensor_wrappers(model, model_path):
+def _restore_qtensor_wrappers(model, model_path, **hf_kwargs):
+    model_path = _resolve_local_model_path(model_path, **hf_kwargs)
     modelopt_state_path = _get_modelopt_state_path(model_path)
     if not os.path.isfile(modelopt_state_path):
         return
@@
-def _new_from_pretrained(cls, /, pretrained_model_name_or_path, *args, **kwargs):
+def _new_from_pretrained(cls, /, pretrained_model_name_or_path, *args, **kwargs):
@@
-    _restore_qtensor_wrappers(model, pretrained_model_name_or_path)
+    _restore_qtensor_wrappers(model, pretrained_model_name_or_path, **kwargs)

Also applies to: 107-107

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 74 - 76, The code
calls _get_modelopt_state_path(model_path) and returns early when
modelopt_state_path is not a local file, which skips _restore_qtensor_wrappers
when pretrained_model_name_or_path is a Hub ID; fix by resolving
pretrained_model_name_or_path to a local snapshot path before computing
modelopt_state_path (i.e., when calling _get_modelopt_state_path and before
invoking _restore_qtensor_wrappers), using the Hugging Face snapshot download
utility to convert Hub IDs into a local directory and then pass that resolved
local path into _get_modelopt_state_path and _restore_qtensor_wrappers so
wrapper metadata restoration runs for Hub-model inputs.
🧹 Nitpick comments (2)
modelopt/torch/opt/plugins/transformers.py (1)

133-139: Prefer keyword forwarding for load_config into the private Transformers helper.

Using positional arg here tightly couples to private signature ordering; forwarding as keyword is safer for v5+ drift.

Suggested tweak
 return tf_modeling_utils._modelopt_cache["_load_state_dict_into_zero3_model"](
-    model_to_load, state_dict, load_config
+    model_to_load, state_dict, load_config=load_config
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 133 - 139, The call
to the private helper
tf_modeling_utils._modelopt_cache["_load_state_dict_into_zero3_model"] uses
positional args which can break if its signature changes; update the call in
_load_params_and_buffers_into_zero3_model to forward load_config as a keyword
(e.g., load_config=load_config) and keep model_to_load and state_dict passed by
name to avoid positional coupling with the private helper.
modelopt/onnx/llm_export_utils/export_utils.py (1)

89-90: Use to_legacy_cache() API instead of direct key_cache/value_cache access for better maintainability.

Direct access to key_cache and value_cache creates tight coupling to internal cache structure. The Transformers v5.x public API provides to_legacy_cache() for converting cache objects to legacy tuple format, which is the recommended approach.

Suggested refactor
-        cache = outputs.past_key_values
-        past_key_values = tuple(zip(cache.key_cache, cache.value_cache))
+        cache = outputs.past_key_values
+        if hasattr(cache, "to_legacy_cache"):
+            past_key_values = cache.to_legacy_cache()
+        else:
+            past_key_values = tuple(zip(cache.key_cache, cache.value_cache))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 89 - 90, The
code is directly reading cache.key_cache/value_cache from
outputs.past_key_values; replace that with the public API by calling
to_legacy_cache() on the cache object (e.g., use
outputs.past_key_values.to_legacy_cache() or cache.to_legacy_cache()) and assign
its return to past_key_values so you get the legacy tuple format without tightly
coupling to internal attributes; update the variable use from
cache/past_key_values accordingly and remove direct key_cache/value_cache
access.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d833d30a-3607-43be-9bc5-df9cabc71f86

📥 Commits

Reviewing files that changed from the base of the PR and between 0af910b and 048f6f7.

📒 Files selected for processing (6)
  • examples/speculative_decoding/scripts/ar_validate.py
  • examples/speculative_decoding/scripts/export_hf_checkpoint.py
  • modelopt/onnx/llm_export_utils/export_utils.py
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • tox.ini
✅ Files skipped from review due to trivial changes (1)
  • examples/speculative_decoding/scripts/export_hf_checkpoint.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tox.ini
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • examples/speculative_decoding/scripts/ar_validate.py

@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner March 25, 2026 12:20
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/export/model_config_export.py`:
- Around line 154-155: The code currently falls back to type(model).__name__
when model.config.architectures is missing, which can produce wrapper class
names and block the decoder-type fallback; change the assignment so that
architecture is architectures[0] if model.config.architectures is truthy,
otherwise set architecture to an empty string (i.e., use "" instead of
type(model).__name__). Update the code that defines architectures and
architecture in model_config_export.py (the architectures variable and the
architecture assignment) so downstream lookup against MODEL_MAP /
MODEL_NAME_TO_HF_ARCH_MAP can correctly fall back by decoder_type and so the
existing try-except KeyError path still behaves as intended.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca7fbde2-9a29-4930-914f-7004f05a8399

📥 Commits

Reviewing files that changed from the base of the PR and between 048f6f7 and a93ef69.

📒 Files selected for processing (3)
  • modelopt/torch/__init__.py
  • modelopt/torch/export/model_config_export.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/__init__.py

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch 4 times, most recently from 9d6ded6 to 8f66a3b on March 25, 2026 15:57
@kevalmorabia97 kevalmorabia97 changed the title Bump transformers to 5.0 Add experimental support for transformers >=5.0 Mar 25, 2026
@kevalmorabia97 kevalmorabia97 changed the title Add experimental support for transformers >=5.0 Add experimental support for transformers>=5.0 Mar 25, 2026
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch 2 times, most recently from 37f9810 to 0b5f34c on March 25, 2026 17:58
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabi/bump-transformers-5.0 branch from 0b5f34c to 8f3726d on March 25, 2026 18:29