grpo_demo.ipynb: Training reduces accuracy from 53.0% to 49.0% #688

@cool-RR

Description

I'm trying to reproduce the Gemma fine-tuning shown in your blog post. I ran the grpo_demo.ipynb notebook on a v5e-4 machine on GCP.

Expected Behavior

I expected to see an accuracy increase similar to the one shown in your blog post, from 52.67% to 64.06%.

Actual Behavior

After about an hour the training finished, and I saw a reduction in accuracy from 53% to 49%:

Before training: corr=53, total=100, accuracy=53.0%, partial_accuracy=61.0%, format_accuracy=65.0%
After training: corr=49, total=100, accuracy=49.0%, partial_accuracy=56.00000000000001%, format_accuracy=91.0%
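
For convenience, here is a minimal sketch that computes the per-metric change between the two log lines above. It assumes the notebook prints the lines in exactly this `key=value` format; `parse_metrics` is a hypothetical helper written for this report, not part of tunix.

```python
import re

# The two evaluation lines copied verbatim from the notebook output.
LOG = """\
Before training: corr=53, total=100, accuracy=53.0%, partial_accuracy=61.0%, format_accuracy=65.0%
After training: corr=49, total=100, accuracy=49.0%, partial_accuracy=56.00000000000001%, format_accuracy=91.0%
"""


def parse_metrics(line):
    """Hypothetical helper: return the key=value pairs on one log line as floats."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}


before, after = (parse_metrics(line) for line in LOG.splitlines())

# Change in percentage points for each accuracy metric, rounded to hide
# floating-point noise such as 56.00000000000001.
deltas = {
    key: round(after[key] - before[key], 2)
    for key in before
    if key.endswith("accuracy")
}

for key, delta in deltas.items():
    print(f"{key}: {before[key]:.1f}% -> {after[key]:.1f}% ({delta:+.1f} points)")
```

On these numbers, accuracy and partial accuracy each drop by a few points while format accuracy rises, so the model seems to have learned the output format but not to answer more questions correctly.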

I uploaded the notebook to Colab so you can inspect the output: https://colab.research.google.com/drive/1HfFkurkr-FSuDF_YZbFcPHFGNS5BI0i5

Note that I ran it on GCP rather than on Colab because, as far as I know, Colab doesn't offer v5e-4.

Steps to Reproduce the Problem

  1. Run the notebook.

Environment

  • OS: Ubuntu 22.04.3 LTS
  • Project Version: git+https://github.com/google/tunix@f0d5d1e63e24f42647fd4e6122641d689f8bfd0e

Checklist

  • I have searched the existing issues for a similar bug report.
  • I have provided all the required information in the "Environment" section.
  • I have provided a minimal, reproducible example.

Would you like to help us fix it?

Yes.

Labels: type:support (Further information is requested)
