Description
Currently, it is possible to get the same score multiple times by restarting CodeAssist and immediately exiting with Ctrl+C.
Each time the process exits and triggers training, the same completed episodes under persistent-data/state-service/episodes are used again, so previously earned rewards are effectively re-counted.
Steps to Reproduce
Start CodeAssist (for example: uv run run.py).
Solve at least one problem and wait until the score is saved.
Exit the process with Ctrl+C and let the training step finish.
Start CodeAssist again.
Without doing any new work, press Ctrl+C again and let training run.
Observe that the score increases as if the previous episodes were new.
Expected Behavior
Episodes that have already been used for training should not contribute to the score again on subsequent runs.
Restarting CodeAssist and exiting with Ctrl+C without new work should not increase the score.
Actual Behavior
Every time run.py triggers training on exit, training_loop.py appears to pass all existing episodes from persistent-data/state-service/episodes to train_from_episodes again. Completed episodes are therefore reused and their rewards are counted multiple times.
Suspected Cause (Files)
In run.py, the shutdown flow calls run_training(config) on Ctrl+C but only cleans up incomplete episodes; completed episodes stay in place, with no “consumed” flag and no step that moves them out of the episodes directory.
In training_loop.py, run_training passes the entire episodes_dir_initial (defaulting to persistent-data/state-service/episodes) into policy_models.cli.run_tasks train_from_episodes on each invocation, without excluding episodes that were already trained on.
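The suspected effect can be illustrated with a toy model. This is not CodeAssist code: the total_reward helper, the "reward" field, and the one-file-per-episode layout are assumptions made only for the illustration. It shows how re-reading the same directory on every exit inflates a running score.

```python
import json
import tempfile
from pathlib import Path


def total_reward(episodes_dir: Path) -> float:
    """Sum the reward of every episode file (hypothetical schema)."""
    return sum(
        json.loads(path.read_text())["reward"]
        for path in sorted(episodes_dir.glob("*.json"))
    )


with tempfile.TemporaryDirectory() as tmp:
    episodes_dir = Path(tmp)
    # One completed episode, solved once.
    (episodes_dir / "ep1.json").write_text(json.dumps({"reward": 1.0}))

    score = 0.0
    # Three start / Ctrl+C cycles with no new work in between.
    for _ in range(3):
        score += total_reward(episodes_dir)

print(score)
```

With a single episode worth 1.0, the score grows on every cycle even though nothing new was solved.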
Possible Fix (Suggestion)
After a successful training run, move the episodes that were used into a separate directory (for example persistent-data/state-service/consumed-episodes) or mark them in their JSON metadata as already trained on.
Change the episode discovery logic to only include episodes that have not yet been consumed.
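A minimal sketch of the "move after training" variant, assuming episodes are stored as individual JSON files directly inside the episodes directory. The names discover_unconsumed, mark_consumed, and train_once are hypothetical and do not exist in the repository; train_fn stands in for whatever run_training ends up calling.

```python
import json
import shutil
import tempfile
from collections.abc import Callable
from pathlib import Path


def discover_unconsumed(episodes_dir: Path) -> list[Path]:
    """Return episode files still waiting to be trained on (hypothetical helper)."""
    return sorted(p for p in episodes_dir.glob("*.json") if p.is_file())


def mark_consumed(episodes: list[Path], consumed_dir: Path) -> None:
    """Move trained episodes out of the discovery directory."""
    consumed_dir.mkdir(parents=True, exist_ok=True)
    for path in episodes:
        shutil.move(str(path), str(consumed_dir / path.name))


def train_once(
    episodes_dir: Path,
    consumed_dir: Path,
    train_fn: Callable[[list[Path]], None],
) -> int:
    """Train on unconsumed episodes only; return how many were used."""
    episodes = discover_unconsumed(episodes_dir)
    if not episodes:
        return 0
    # If train_fn raises, nothing is moved and the episodes stay available.
    train_fn(episodes)
    mark_consumed(episodes, consumed_dir)
    return len(episodes)


with tempfile.TemporaryDirectory() as tmp:
    episodes_dir = Path(tmp) / "episodes"
    consumed_dir = Path(tmp) / "consumed-episodes"
    episodes_dir.mkdir()
    (episodes_dir / "ep1.json").write_text(json.dumps({"reward": 1.0}))

    trained: list[Path] = []
    first_run = train_once(episodes_dir, consumed_dir, trained.extend)
    second_run = train_once(episodes_dir, consumed_dir, trained.extend)

print(first_run, second_run)
```

Moving the files only after train_fn returns means a failed training run leaves the episodes in place for the next attempt. Marking episodes in their JSON metadata instead would keep them in one directory, at the cost of opening every file during discovery.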