WoRA integration into PEFT #2872
…pha/beta parameters
- Add WoRA to test variant map in test_lora_variants.py
- Add test case for WoRA variant application to all layer types
- Add test for WoRA alpha/beta parameter gradients
- Fix WoRA parameter initialization in Embedding.update_layer
- Fix WoRA parameter initialization in _ConvNd.update_layer
- Fix WoraEmbeddingLayer to include alpha in computation
- Fix WoraConvNdLayer gradient flow for alpha/beta parameters
- Transpose embedding matrices in WoraEmbeddingVariant.forward
- Add embed_scale support in WoraEmbeddingVariant

All WoRA tests now pass successfully.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
no not stale . and since when did PR's start going stale ? |
dear github , can you please stop staling my prs ??? |
WoRA (Weighted-Direction Low-Rank Adaptation) Implementation for PEFT
Summary
This pull request adds support for WoRA (Weighted-Direction Low-Rank Adaptation), a novel extension of DoRA that introduces learnable scalar parameters (alpha and beta) to create a weighted combination of the base weights and LoRA adapters. WoRA provides more fine-grained control over the adaptation process compared to standard LoRA and DoRA.
Fixes #2861
Analysis and Understanding
WoRA Formula
WoRA extends DoRA by introducing two learnable scalar parameters:

W' = m · (β·W₀ + α·scaling·BA) / ‖β·W₀ + α·scaling·BA‖

Where:

- `m` is the learned magnitude vector (from DoRA)
- `W₀` is the base weight matrix
- `BA` is the LoRA decomposition (B × A)
- `α` (alpha) controls the LoRA contribution
- `β` (beta) controls the base weight contribution
- `scaling` is the LoRA scaling factor

Key Insights
1. **LoraVariant Pattern**: The existing DoRA implementation uses a clean separation between:
   - Layer classes (`wora.py`) that handle forward computation
   - Variant classes (`variants.py`) that handle initialization and variant-specific logic
2. **Parameter Naming Convention**: PEFT automatically marks parameters as trainable if their names contain "lora_". This is why we use `lora_wora_alpha` and `lora_wora_beta`.
3. **ParameterDict Storage**: Using `nn.ParameterDict` ensures the parameters are registered with the module, so they move with `.to(device)` and are included in the state dict.
4. **Layer-Specific Challenges**: Embedding and convolutional layers need special handling (detailed in the Challenges section below).
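The weighted combination described above can be sketched numerically. This is a minimal illustration with made-up shapes, not the PR's code; `m` is initialized to the weight norm as in DoRA, so at initialization the adapted weight equals the plain combination:

```python
import torch

torch.manual_seed(0)
out_f, in_f, r = 4, 3, 2
W0 = torch.randn(out_f, in_f)                   # frozen base weight
B = torch.randn(out_f, r)                       # LoRA B
A = torch.randn(r, in_f)                        # LoRA A
alpha = torch.nn.Parameter(torch.tensor(1.0))   # weights the LoRA path
beta = torch.nn.Parameter(torch.tensor(1.0))    # weights the base weight
scaling = 2.0                                   # lora_alpha / r

# Weighted combination of base weight and low-rank update
combined = beta * W0 + alpha * scaling * (B @ A)
# Per-output-row norm, as in DoRA's column-wise normalization
norm = torch.linalg.norm(combined, dim=1, keepdim=True)
# Magnitude vector, initialized to the norm (DoRA-style)
m = norm.detach().clone()
W_wora = m * combined / norm
```

With `alpha = beta = 1` and `m` equal to the norm, `W_wora` reduces to `combined`, which is the intended no-op starting point for training.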
Implementation Approach
1. Core Architecture (wora.py)
Created the following layer classes:

- `WoraLinearLayer`: Base implementation for linear transformations
- `WoraEmbeddingLayer`: Handles token embeddings with proper matrix transposition
- `_WoraConvNdLayer`: Base class for convolutional layers
- `WoraConv1dLayer`, `WoraConv2dLayer`, `WoraConv3dLayer`: Specialized conv layers

Key Design Decisions:

- Detached scalar values (via `.item()`) are used inside the weight-norm computation to avoid affecting the norm computation

2. Variant Classes (variants.py)
Implemented five variant classes following PEFT's LoraVariant pattern:
- `WoraLinearVariant`
- `WoraEmbeddingVariant`
- `WoraConv1dVariant`, `WoraConv2dVariant`, `WoraConv3dVariant`

Each variant handles:

- `init()`: Creating and initializing WoRA-specific parameters
- `forward()`: Calling the appropriate layer forward method
- `merge_safe()`/`merge_unsafe()`: Merging adapters with base weights
- `unmerge()`: Restoring original weights

3. Parameter Initialization (layer.py)
Modified three key methods to initialize WoRA parameters:
- `LoraLayer.update_layer()`: Base implementation for Linear layers
- `Embedding.update_layer()`: Special handling for embedding layers
- `_ConvNd.update_layer()`: Handling for convolutional layers

Initialization Pattern:
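The initialization pattern can be sketched as follows. This is a hypothetical helper mirroring the approach, not the PR's exact code: both scalars are stored per adapter in an `nn.ParameterDict` (under names containing "lora_" so PEFT's name-based unfreezing marks them trainable) and start at 1.0 so the adapted weight begins as an unweighted DoRA-style combination:

```python
import torch
import torch.nn as nn

def init_wora_params(module: nn.Module, adapter_name: str) -> None:
    """Hypothetical helper mirroring the update_layer() initialization pattern."""
    if not hasattr(module, "lora_wora_alpha"):
        # ParameterDicts keyed by adapter name; attribute names contain "lora_"
        # so PEFT's trainability logic picks them up.
        module.lora_wora_alpha = nn.ParameterDict()
        module.lora_wora_beta = nn.ParameterDict()
    # Initialize both scalars to 1.0 (neutral weighting at the start of training).
    module.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0))
    module.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0))
    # Explicitly mark trainable (relevant when overrides skip the base logic).
    module.lora_wora_alpha[adapter_name].requires_grad_(True)
    module.lora_wora_beta[adapter_name].requires_grad_(True)

layer = nn.Linear(4, 4)
init_wora_params(layer, "default")
names = [n for n, _ in layer.named_parameters() if "lora_" in n]
```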
4. Configuration (config.py)
Added a `use_wora` boolean flag to `LoraConfig` with proper validation:

- Defaults to `False` for backward compatibility
- Additional validation runs when set to `True`

5. Testing (test_lora_variants.py)
Added comprehensive tests:

- `test_variant_is_applied_to_layers`: Verifies WoRA variants are correctly applied to all layer types
- `test_wora_params_have_gradients`: Ensures alpha and beta parameters receive gradients during backpropagation

Key Technical Challenges and Solutions
Challenge 1: Gradient Flow for Alpha and Beta
Problem: The initial implementation used `.item()` to convert Parameters to scalars throughout the computation, breaking gradient flow.

Solution: Keep `alpha` and `beta` as tensors in the forward computation so gradients can propagate, and use detached scalar values only where they must not enter the autograd graph (e.g. the norm computation).
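The difference between the two patterns can be demonstrated in isolation (an illustrative snippet, not code from the PR):

```python
import torch
import torch.nn as nn

alpha = nn.Parameter(torch.tensor(2.0))
x = torch.ones(3)

# Broken pattern: .item() yields a plain Python float, severing the graph,
# so no gradient can ever reach alpha through this expression.
out_broken = alpha.item() * x
# out_broken.requires_grad is False

# Fixed pattern: multiply by the Parameter tensor itself.
out_ok = (alpha * x).sum()
out_ok.backward()
# alpha.grad is now x.sum() == 3.0
```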
Challenge 2: Embedding Layer Matrix Dimensions
Problem: Embedding layers store `lora_embedding_A` and `lora_embedding_B` with shapes that need transposition before use.
Solution: Use `lora_embedding_A.T` and `lora_embedding_B.T` when computing the delta weight.

Challenge 3: Parameter Initialization in Override Methods
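The shape mismatch can be sketched as follows (the storage shapes shown are an assumption about PEFT's embedding adapters, worth double-checking against the codebase):

```python
import torch

num_emb, emb_dim, r = 10, 4, 2
# Assumed storage shapes for embedding LoRA matrices:
lora_embedding_A = torch.randn(r, num_emb)
lora_embedding_B = torch.randn(emb_dim, r)

# nn.Embedding.weight has shape (num_embeddings, embedding_dim), so the
# low-rank delta must be transposed before it is combined with it:
delta = lora_embedding_A.T @ lora_embedding_B.T   # (num_emb, emb_dim)
```

Note that `A.T @ B.T` equals `(B @ A).T`, so this is the same delta as in the linear case, just oriented to match the embedding weight.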
Problem: The `Embedding` and `_ConvNd` classes override `update_layer()` without calling `super()`, so they missed WoRA parameter initialization.

Solution: Initialize the WoRA parameters directly in those overrides and call `requires_grad_(True)` to ensure trainability.

Challenge 4: Conv Layer Forward Pass
Problem: Convolutional layers have more complex forward logic with bias handling and reshaping requirements.
Solution: Mirror DoRA's convolutional handling, applying the bias separately and reshaping the magnitude and norm tensors so they broadcast over the convolution weight dimensions.
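The reshaping involved can be sketched for the 2D case (an illustration with made-up shapes and flattened LoRA matrices, not the PR's code):

```python
import torch

torch.manual_seed(0)
out_c, in_c, k, r = 6, 3, 3, 2
W0 = torch.randn(out_c, in_c, k, k)           # Conv2d base weight
B = torch.randn(out_c, r)                     # LoRA B (flattened view)
A = torch.randn(r, in_c * k * k)              # LoRA A (flattened view)
alpha = torch.tensor(1.0, requires_grad=True)
beta = torch.tensor(1.0, requires_grad=True)
scaling = 1.0

# Low-rank delta reshaped back to the conv weight layout
delta = (B @ A).view(out_c, in_c, k, k)
combined = beta * W0 + alpha * scaling * delta
# Per-output-channel norm, reshaped to broadcast over (out_c, in_c, k, k)
norm = combined.flatten(1).norm(dim=1).view(out_c, 1, 1, 1)
m = norm.detach().clone()                     # magnitude, DoRA-style init
W_wora = m * combined / norm
```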
Verification and Testing
Test Coverage
The implementation includes two parametrized tests that cover:
Variant Application Test: Verifies that the WoRA variant is attached to every supported layer type (Linear, Embedding, Conv1d/2d/3d).

Gradient Flow Test: Verifies that `lora_wora_alpha` and `lora_wora_beta` receive non-`None` gradients after a backward pass.
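The gradient-flow check follows a standard pattern, sketched here on a self-contained toy module (the real test builds a PEFT model with `use_wora=True`; the class and forward below are simplified stand-ins that omit the norm/magnitude step):

```python
import torch
import torch.nn as nn

class TinyWora(nn.Module):
    """Toy stand-in: weighted sum of base path and LoRA path."""
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)
        self.lora_A = nn.Linear(4, 2, bias=False)
        self.lora_B = nn.Linear(2, 4, bias=False)
        self.lora_wora_alpha = nn.Parameter(torch.tensor(1.0))
        self.lora_wora_beta = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        return (self.lora_wora_beta * self.base(x)
                + self.lora_wora_alpha * self.lora_B(self.lora_A(x)))

model = TinyWora()
loss = model(torch.randn(2, 4)).sum()
loss.backward()
# The test asserts both scalars received gradients:
grads_present = (model.lora_wora_alpha.grad is not None
                 and model.lora_wora_beta.grad is not None)
```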
Test Results
All tests pass successfully.
Files Modified
- `src/peft/tuners/lora/config.py`: Added `use_wora` configuration parameter
- `src/peft/tuners/lora/layer.py`: Added WoRA parameter initialization in update_layer methods
- `src/peft/tuners/lora/wora.py`: Implemented WoRA layer classes
- `src/peft/tuners/lora/variants.py`: Implemented WoRA variant classes
- `tests/test_lora_variants.py`: Added comprehensive WoRA tests

Backward Compatibility
This implementation maintains full backward compatibility: `use_wora` defaults to `False`, so existing LoRA and DoRA configurations are unaffected.
cc: @BenjaminBossan