Add experimental support for transformers>=5.0 #975

kevalmorabia97 wants to merge 18 commits into main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings. Use the following commands to manage reviews, or use the checkboxes below for quick actions:
📝 Walkthrough

Replaced many Hugging Face …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Script
    participant HF as "HuggingFace.from_pretrained"
    participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
    participant FS as "modelopt_state.pth (FS)"
    User->>Script: run (optional --trust_remote_code)
    Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
    HF-->>Script: returns model instance
    Script->>ModelOpt: patched hook invoked after instantiation
    ModelOpt->>FS: check for modelopt_state.pth
    FS-->>ModelOpt: q_tensor_state (if present)
    ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
    ModelOpt-->>Script: model with restored wrappers
    Script-->>User: continue (quantize/export/generate)
```
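The patched-hook step in the diagram can be sketched roughly as follows. This is a hedged illustration, not the plugin's actual code: `make_patched_from_pretrained` and `restore_fn` are hypothetical names, and the real plugin restores `QTensorWrapper` metadata rather than calling a generic callback.

```python
import os

def make_patched_from_pretrained(original_from_pretrained, restore_fn):
    """Wrap a from_pretrained entry point so extra state is restored post-instantiation."""
    def patched(pretrained_model_name_or_path, *args, **kwargs):
        # 1. Build the model exactly as the original entry point would.
        model = original_from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
        # 2. Check for a modelopt state file next to the (local) checkpoint.
        state_path = os.path.join(str(pretrained_model_name_or_path), "modelopt_state.pth")
        if os.path.isfile(state_path):
            # 3. Re-wrap weights / restore metadata before handing the model back.
            restore_fn(model, state_path)
        return model
    return patched
```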
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
3e28ada to 2b24815
/ok to test 2b24815
/ok to test 1f0726e |
1f0726e to 48b426f
/ok to test 48b426f |
48b426f to 0781ac7
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   70.21%   70.60%   +0.39%
==========================================
  Files         228      229       +1
  Lines       25952    26055     +103
==========================================
+ Hits        18221    18396     +175
+ Misses       7731     7659      -72
```

☔ View full report in Codecov by Sentry.
♻️ Duplicate comments (1)
modelopt/torch/opt/plugins/transformers.py (1)
74-76: ⚠️ Potential issue | 🟠 Major

Resolve Hub IDs to a local snapshot path before restoring `modelopt_state`. Line 107 passes raw `pretrained_model_name_or_path` into `_restore_qtensor_wrappers`; for Hub IDs this is not a local dir, so Line 75 returns early and wrapper metadata restoration is skipped.

Proposed direction

```diff
-def _restore_qtensor_wrappers(model, model_path):
+def _restore_qtensor_wrappers(model, model_path, **hf_kwargs):
+    model_path = _resolve_local_model_path(model_path, **hf_kwargs)
     modelopt_state_path = _get_modelopt_state_path(model_path)
     if not os.path.isfile(modelopt_state_path):
         return
@@
-def _new_from_pretrained(cls, /, pretrained_model_name_or_path, *args, **kwargs):
+def _new_from_pretrained(cls, /, pretrained_model_name_or_path, *args, **kwargs):
@@
-    _restore_qtensor_wrappers(model, pretrained_model_name_or_path)
+    _restore_qtensor_wrappers(model, pretrained_model_name_or_path, **kwargs)
```

Also applies to: 107-107
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 74 - 76, The code calls _get_modelopt_state_path(model_path) and returns early when modelopt_state_path is not a local file, which skips _restore_qtensor_wrappers when pretrained_model_name_or_path is a Hub ID; fix by resolving pretrained_model_name_or_path to a local snapshot path before computing modelopt_state_path (i.e., when calling _get_modelopt_state_path and before invoking _restore_qtensor_wrappers), using the Hugging Face snapshot download utility to convert Hub IDs into a local directory and then pass that resolved local path into _get_modelopt_state_path and _restore_qtensor_wrappers so wrapper metadata restoration runs for Hub-model inputs.
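A minimal sketch of that resolution step, assuming `huggingface_hub.snapshot_download` is the download utility meant above; `resolve_local_model_path` is a hypothetical helper name, not the plugin's actual API:

```python
import os

def resolve_local_model_path(model_path: str) -> str:
    """Return a local directory for model_path, fetching a Hub snapshot if needed."""
    if os.path.isdir(model_path):
        return model_path  # already a local checkout; nothing to download
    # Hub ID: materialize the repo locally so modelopt_state.pth can be located.
    from huggingface_hub import snapshot_download  # lazy import; only hit for Hub IDs
    return snapshot_download(repo_id=model_path)
```

With a step like this in front of `_get_modelopt_state_path`, the restore path sees a real directory whether the caller passed a filesystem path or a Hub ID.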
🧹 Nitpick comments (2)
modelopt/torch/opt/plugins/transformers.py (1)
133-139: Prefer keyword forwarding for `load_config` into the private Transformers helper. Using a positional arg here tightly couples to private signature ordering; forwarding as a keyword is safer for v5+ drift.

Suggested tweak

```diff
 return tf_modeling_utils._modelopt_cache["_load_state_dict_into_zero3_model"](
-    model_to_load, state_dict, load_config
+    model_to_load, state_dict, load_config=load_config
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/opt/plugins/transformers.py` around lines 133 - 139, The call to the private helper tf_modeling_utils._modelopt_cache["_load_state_dict_into_zero3_model"] uses positional args which can break if its signature changes; update the call in _load_params_and_buffers_into_zero3_model to forward load_config as a keyword (e.g., load_config=load_config) and keep model_to_load and state_dict passed by name to avoid positional coupling with the private helper.

modelopt/onnx/llm_export_utils/export_utils.py (1)
89-90: Use the `to_legacy_cache()` API instead of direct `key_cache`/`value_cache` access for better maintainability. Direct access to `key_cache` and `value_cache` creates tight coupling to internal cache structure. The Transformers v5.x public API provides `to_legacy_cache()` for converting cache objects to legacy tuple format, which is the recommended approach.

Suggested refactor

```diff
-    cache = outputs.past_key_values
-    past_key_values = tuple(zip(cache.key_cache, cache.value_cache))
+    cache = outputs.past_key_values
+    if hasattr(cache, "to_legacy_cache"):
+        past_key_values = cache.to_legacy_cache()
+    else:
+        past_key_values = tuple(zip(cache.key_cache, cache.value_cache))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 89 - 90, The code is directly reading cache.key_cache/value_cache from outputs.past_key_values; replace that with the public API by calling to_legacy_cache() on the cache object (e.g., use outputs.past_key_values.to_legacy_cache() or cache.to_legacy_cache()) and assign its return to past_key_values so you get the legacy tuple format without tightly coupling to internal attributes; update the variable use from cache/past_key_values accordingly and remove direct key_cache/value_cache access.
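The suggested `hasattr` fallback can be exercised with stand-in cache classes; the real objects are transformers cache types (e.g. `DynamicCache`), and these two minimal classes only mimic the relevant attributes:

```python
class NewStyleCache:
    """Mimics a cache that exposes the public to_legacy_cache() API."""
    def __init__(self, keys, values):
        self.key_cache, self.value_cache = keys, values
    def to_legacy_cache(self):
        return tuple(zip(self.key_cache, self.value_cache))

class OldStyleCache:
    """Mimics a cache exposing only the internal key_cache/value_cache lists."""
    def __init__(self, keys, values):
        self.key_cache, self.value_cache = keys, values

def to_legacy(cache):
    # Prefer the public conversion API; fall back to internal attributes.
    if hasattr(cache, "to_legacy_cache"):
        return cache.to_legacy_cache()
    return tuple(zip(cache.key_cache, cache.value_cache))
```

Both styles yield the same legacy tuple-of-pairs layout, so downstream export code is unaffected by which cache type the model returns.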
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 74-76: The code calls _get_modelopt_state_path(model_path) and
returns early when modelopt_state_path is not a local file, which skips
_restore_qtensor_wrappers when pretrained_model_name_or_path is a Hub ID; fix by
resolving pretrained_model_name_or_path to a local snapshot path before
computing modelopt_state_path (i.e., when calling _get_modelopt_state_path and
before invoking _restore_qtensor_wrappers), using the Hugging Face snapshot
download utility to convert Hub IDs into a local directory and then pass that
resolved local path into _get_modelopt_state_path and _restore_qtensor_wrappers
so wrapper metadata restoration runs for Hub-model inputs.
---
Nitpick comments:
In `@modelopt/onnx/llm_export_utils/export_utils.py`:
- Around line 89-90: The code is directly reading cache.key_cache/value_cache
from outputs.past_key_values; replace that with the public API by calling
to_legacy_cache() on the cache object (e.g., use
outputs.past_key_values.to_legacy_cache() or cache.to_legacy_cache()) and assign
its return to past_key_values so you get the legacy tuple format without tightly
coupling to internal attributes; update the variable use from
cache/past_key_values accordingly and remove direct key_cache/value_cache
access.
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 133-139: The call to the private helper
tf_modeling_utils._modelopt_cache["_load_state_dict_into_zero3_model"] uses
positional args which can break if its signature changes; update the call in
_load_params_and_buffers_into_zero3_model to forward load_config as a keyword
(e.g., load_config=load_config) and keep model_to_load and state_dict passed by
name to avoid positional coupling with the private helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d833d30a-3607-43be-9bc5-df9cabc71f86
📒 Files selected for processing (6)
- examples/speculative_decoding/scripts/ar_validate.py
- examples/speculative_decoding/scripts/export_hf_checkpoint.py
- modelopt/onnx/llm_export_utils/export_utils.py
- modelopt/torch/opt/plugins/transformers.py
- modelopt/torch/quantization/plugins/transformers_trainer.py
- tox.ini
✅ Files skipped from review due to trivial changes (1)
- examples/speculative_decoding/scripts/export_hf_checkpoint.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tox.ini
- modelopt/torch/quantization/plugins/transformers_trainer.py
- examples/speculative_decoding/scripts/ar_validate.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/export/model_config_export.py`:
- Around line 154-155: The code currently falls back to type(model).__name__
when model.config.architectures is missing, which can produce wrapper class
names and block the decoder-type fallback; change the assignment so that
architecture is architectures[0] if model.config.architectures is truthy,
otherwise set architecture to an empty string (i.e., use "" instead of
type(model).__name__). Update the code that defines architectures and
architecture in model_config_export.py (the architectures variable and the
architecture assignment) so downstream lookup against MODEL_MAP /
MODEL_NAME_TO_HF_ARCH_MAP can correctly fall back by decoder_type and so the
existing try-except KeyError path still behaves as intended.
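A minimal sketch of the suggested assignment; `pick_architecture` and the bare `config` parameter are illustrative names, not the file's actual code:

```python
def pick_architecture(config) -> str:
    # Prefer config.architectures[0]; fall back to "" (not type(model).__name__),
    # so the downstream MODEL_MAP lookup raises KeyError for unknown names and
    # the existing decoder-type fallback path still triggers.
    architectures = getattr(config, "architectures", None)
    return architectures[0] if architectures else ""
```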
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ca7fbde2-9a29-4930-914f-7004f05a8399
📒 Files selected for processing (3)
- modelopt/torch/__init__.py
- modelopt/torch/export/model_config_export.py
- modelopt/torch/quantization/plugins/transformers_trainer.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/torch/__init__.py
9d6ded6 to 8f66a3b
37f9810 to 0b5f34c
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
0b5f34c to 8f3726d
What does this PR do?

- … `trust_remote_code=True` …

Testing

- CI/CD tests passing

Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, using `torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).

Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Documentation
Chores
Tests