Describe the Bug
The current code relies on `import fla` having the side effect of registering FLA model types such as `transformer` with Transformers.
However, with flash-linear-attention==0.5.0, the top-level `import fla` no longer imports `fla.models`, so `AutoConfig.register(...)` is never called.
As a result, config files with:
"model_type": "transformer"
fail to load with:
KeyError: 'transformer'
ValueError: The checkpoint you are trying to load has model type `transformer`
but Transformers does not recognize this architecture.
This breaks both training and checkpoint conversion.
Suggested local fix: import fla.models explicitly, instead of relying on the top-level import fla.
Affected places include:
flame/train.py
flame/utils/convert_dcp_to_hf.py
flame/utils/convert_hf_to_dcp.py
train.sh config parsing one-liner
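As a rough sketch of how the suggested fix might look in each of these entry points, the snippet below imports `fla.models` explicitly rather than relying on `import fla`. The helper name `register_fla_models` and its `module_name` parameter are hypothetical and are not part of flame or fla. It also assumes that importing `fla.models` is what triggers the `AutoConfig.register(...)` calls, as described above.

```python
import importlib


def register_fla_models(module_name: str = "fla.models") -> bool:
    """Import the module whose import side effect registers FLA model types.

    Hypothetical helper: returns True if the import succeeded and False if
    the module is not installed, so callers can fail with a clear message
    instead of a later KeyError on the model type.
    """
    try:
        # Explicit submodule import; the top-level `import fla` is assumed
        # not to pull in fla.models on flash-linear-attention==0.5.0.
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

A script such as flame/train.py could call `register_fla_models()` before loading any config and raise an error if it returns False. A plain `import fla.models` at the top of each file would have the same effect with less code.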
Steps to Reproduce the Bug
Install flash-linear-attention==0.5.0, then run:
python - <<'PY'
import fla
from transformers.models.auto.configuration_auto import CONFIG_MAPPING
print("transformer" in CONFIG_MAPPING)
PY
Outputs:
Expected Behavior
Install flash-linear-attention==0.5.0, then run:
python - <<'PY'
import fla.models
from transformers.models.auto.configuration_auto import CONFIG_MAPPING
print("transformer" in CONFIG_MAPPING)
PY
Outputs:
Environment Information
- Torch: 2.12.0+cu130
- Triton: 3.7.0