@Ja1Zhou Ja1Zhou commented Feb 2, 2025

Description

Adds support for OLMo2, following the earlier commits regarding OLMoE.

Question

I wonder if I missed anything: the logits from my implementation do not match those from the Hugging Face (hf) implementation. A review and any help would be much appreciated.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jonasrohw
Owner

@Ja1Zhou Cool. I will take a look in the next 1-2 days and get back to you!

@jonasrohw
Owner

@Ja1Zhou Sorry for taking so long to respond; I was quite busy. I fixed the issue in b1fd04b. It might need more refactoring, but the root cause is that OLMo2 applies normalization after the Attention/MLP sublayers, and the library currently has no cfg option for this.
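
For readers following the thread, here is a minimal sketch of why the norm placement matters for matching logits. It is not the code from b1fd04b or from the library; `rms_norm`, `pre_norm_block`, `post_sublayer_norm_block`, and `toy_sublayer` are hypothetical names, the norm has no learned weight, and the sublayer is a stand-in for attention or the MLP. It only shows that normalizing the sublayer input and normalizing the sublayer output give different residual streams.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    # Scale the vector so its root mean square is ~1 (no learned weight here).
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Common layout: normalize the input, run the sublayer, add to the residual.
    out = sublayer(rms_norm(x))
    return [a + b for a, b in zip(x, out)]


def post_sublayer_norm_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Layout described in the comment above: run the sublayer on the raw
    # residual, normalize its output, then add to the residual.
    out = rms_norm(sublayer(x))
    return [a + b for a, b in zip(x, out)]


def toy_sublayer(x: Vector) -> Vector:
    # Hypothetical stand-in for attention or the MLP: a fixed affine map.
    return [3.0 * v + 1.0 for v in x]


x = [1.0, -2.0, 0.5]
pre = pre_norm_block(x, toy_sublayer)
post = post_sublayer_norm_block(x, toy_sublayer)

# Largest element-wise gap between the two layouts for the same input.
max_abs_diff = max(abs(a - b) for a, b in zip(pre, post))
print(f"max abs difference between layouts: {max_abs_diff:.4f}")
```

With identical weights, a model run through the wrong layout will diverge from the reference at every layer, which is consistent with the logit mismatch reported in the description.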

@Ja1Zhou
Author

Ja1Zhou commented Feb 17, 2025

> @Ja1Zhou Sorry for taking so long to respond. Was quite busy. I fixed our issue in: b1fd04b, it might need more refactoring, but Olmo2 applies normalization after Attention/MLP. Currently library has no cfg for this.

No worries. I'll try it out. Thanks for looking into this!

@dabin-k

dabin-k commented Mar 24, 2025

Hi! Could we add support for the SFT* and DPO** models as well when we get round to it?
*allenai/OLMo-2-1124-7B-SFT
**allenai/OLMo-2-1124-7B-DPO

@jleechung jleechung mentioned this pull request Jul 22, 2025
7 tasks