Dear authors, thanks for your great work. I tried the method on Qwen2.5-7B-Instruct, but the model's performance drops drastically. I tuned the L1 regularization weight over the values [0.05, 0.1, 0.15, 0.2, 0.5, 1], but none of them yields a satisfactory pattern. Have you tried any model architectures beyond those mentioned in the paper? Do you have any suggestions on how to generalize this method to other models?