Latest release fails because of transformer key overlap?

Hey @Ingvarstep,
Thanks for cutting the new release! I think the latest flashDeberta release is failing because of how the `_attn_implementation` key is defined. As far as I can tell, Hugging Face `transformers` runs an internal check that the `flash-attn` library is installed before it loads the model. I think just renaming the key used by flashDeberta should be enough to resolve the issue, but I'm not 100% sure.

The example in the README fails at this step:
```
>>> model = FlashDebertaV2Model.from_pretrained("microsoft/deberta-v3-base").to('cuda')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/flashdeberta/model.py", line 572, in __init__
    super().__init__(config)
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2076, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2686, in _check_and_adjust_attn_implementation
    applicable_attn_implementation = self.get_correct_attn_implementation(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2714, in get_correct_attn_implementation
    self._flash_attn_2_can_dispatch(is_init_check)
  File "/app/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 2422, in _flash_attn_2_can_dispatch
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
```
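In case it helps with reproducing, here is a minimal sketch of the check I believe is being tripped. It is not the actual `transformers` code: the helper names `package_installed` and `would_trip_flash_attn_check` are hypothetical, and it assumes the key ends up holding the value `"flash_attention_2"`, which is what the "FlashAttention2 has been toggled on" message suggests.

```python
import importlib.util


def package_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(name) is not None


def would_trip_flash_attn_check(attn_implementation: str,
                                package: str = "flash_attn") -> bool:
    """Hypothetical simplification of the check seen in the traceback.

    Assumption: the ImportError is raised only when the attention
    implementation is set to "flash_attention_2" and the flash_attn
    package is not importable.
    """
    return attn_implementation == "flash_attention_2" and not package_installed(package)


if __name__ == "__main__":
    # Reports whether this environment would hit the ImportError under the
    # assumption above.
    print("flash_attn installed:", package_installed("flash_attn"))
    print("would fail:", would_trip_flash_attn_check("flash_attention_2"))
```

If `flash_attn installed` prints `False` in the failing environment, that would be consistent with the key overlap being the cause rather than something specific to the checkpoint.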
