Skip to content

Latest release fails because of transformer key overlap? #7

@vrdn-23

Description

@vrdn-23

Hey @Ingvarstep,
Thanks for cutting the new release! I think the latest release for flashDeberta is failing due to an issue with how the _attn_implementation key is defined? I think huggingface does an internal check to see if the flash-attn library is installed in order to load the model. I think just renaming the key used by flashDeberta should be enough to resolve the issue, but I'm not a 100% sure

The example in the README fails at this step:

>>> model = FlashDebertaV2Model.from_pretrained("microsoft/deberta-v3-base").to('cuda')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/flashdeberta/model.py", line 572, in __init__
    super().__init__(config)
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2076, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2686, in _check_and_adjust_attn_implementation
    applicable_attn_implementation = self.get_correct_attn_implementation(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2714, in get_correct_attn_implementation
    self._flash_attn_2_can_dispatch(is_init_check)
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2422, in _flash_attn_2_can_dispatch
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions