Description
I'm trying to reproduce the Gemma fine-tuning shown in your blog post. I ran the grpo_demo.ipynb notebook on a TPU v5e-4 machine on GCP.
Expected Behavior
I expected to see an accuracy increase similar to the one reported in your blog post (from 52.67% to 64.06%).
Actual Behavior
After about an hour, training finished and accuracy had dropped from 53% to 49%:
Before training: corr=53, total=100, accuracy=53.0%, partial_accuracy=61.0%, format_accuracy=65.0%
After training: corr=49, total=100, accuracy=49.0%, partial_accuracy=56.00000000000001%, format_accuracy=91.0%
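
For context, my rough mental model of how these metrics are computed is sketched below. This is code I wrote myself, not the notebook's actual implementation: the <reasoning>/<answer> tag format and the helper names are my assumptions. It also shows where cosmetic float noise like 56.00000000000001 comes from (plain corr / total * 100 arithmetic), which is not part of the problem itself.

```python
import re

def extract_answer(completion: str) -> str:
    # Hypothetical helper: pull the text inside <answer>...</answer> tags.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return m.group(1).strip() if m else ""

def has_expected_format(completion: str) -> bool:
    # Hypothetical helper: require both reasoning and answer tags, in order.
    return bool(re.search(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>",
                          completion, re.DOTALL))

def evaluate(completions, targets):
    corr = partial = fmt = 0
    total = len(targets)
    for completion, target in zip(completions, targets):
        corr += extract_answer(completion) == target  # exact answer match
        partial += target in completion               # lenient: target appears anywhere
        fmt += has_expected_format(completion)        # format check only
    # partial / total * 100 is where 56.00000000000001-style float noise comes from.
    return {
        "accuracy": corr / total * 100,
        "partial_accuracy": partial / total * 100,
        "format_accuracy": fmt / total * 100,
    }
```

Assuming the real scoring works roughly like this, the numbers suggest the model learned the output format (format_accuracy went from 65% to 91%) without getting better at the task itself (accuracy fell from 53% to 49%).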
I uploaded the notebook to Colab so you can inspect the output: https://colab.research.google.com/drive/1HfFkurkr-FSuDF_YZbFcPHFGNS5BI0i5. Note that I ran it on GCP rather than in Colab, since as far as I know Colab does not offer v5e-4 TPUs.
Steps to Reproduce the Problem
- Run the notebook on a freshly provisioned v5e-4 TPU VM (see the provisioning sketch below).
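
For completeness, this is roughly how I created the machine. ZONE and RUNTIME_VERSION are placeholders for whatever your project uses, and v5litepod-4 is, to my knowledge, the accelerator type GCP uses for a 4-chip v5e slice:

```
gcloud compute tpus tpu-vm create grpo-repro \
  --zone=ZONE \
  --accelerator-type=v5litepod-4 \
  --version=RUNTIME_VERSION
```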
Environment
- OS: Ubuntu 22.04.3 LTS
- Project Version: git+https://github.com/google/tunix@f0d5d1e63e24f42647fd4e6122641d689f8bfd0e
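
For reproducibility, I installed exactly this commit using pip's git URL support:

```
pip install "git+https://github.com/google/tunix@f0d5d1e63e24f42647fd4e6122641d689f8bfd0e"
```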
Checklist
- I have searched the existing issues for a similar bug report.
- I have provided all the required information in the "Environment" section.
- I have provided a minimal, reproducible example.
Would you like to help us fix it?
Yes.