ARC-AGI-2 example + Event-based early stopping #375
Merged
codelion merged 1 commit into algorithmicsuperintelligence:main from Jan 28, 2026
…t-based" early stopping functionality
codelion added a commit that referenced this pull request, Jan 28, 2026
Add comprehensive tests for recently merged PRs:

- test_llm_config_optional_params.py: Tests for optional temperature/top_p parameters (PR #385 - Anthropic model compatibility)
- test_snapshot_artifacts_limit.py: Tests for configurable max_snapshot_artifacts (PR #386)
- test_visualization_sanitization.py: Tests for -inf/+inf/NaN sanitization in visualization (PR #384)
- test_early_stopping_config.py: Tests for event-based early stopping configuration (PR #375)
- test_changes_description.py: Tests for large codebase support via changes description (PR #376)

Total tests increased from 264 to 326.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
codelion added a commit that referenced this pull request, Jan 28, 2026
* Make max snapshot artifacts limit configurable

  Add `database.max_snapshot_artifacts` config option to control how many program artifacts are included in worker process snapshots. Default remains 100 for backward compatibility.

  - Set to a higher number to include more artifacts in prompts
  - Set to `null` (None) for unlimited artifacts (use with caution for large populations as this can significantly increase memory usage)

  Note: This limit only affects artifacts passed to worker processes, not the total artifacts stored. All program code is always available regardless of this setting.

  Closes #383

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add tests for recent features

  Add comprehensive tests for recently merged PRs:

  - test_llm_config_optional_params.py: Tests for optional temperature/top_p parameters (PR #385 - Anthropic model compatibility)
  - test_snapshot_artifacts_limit.py: Tests for configurable max_snapshot_artifacts (PR #386)
  - test_visualization_sanitization.py: Tests for -inf/+inf/NaN sanitization in visualization (PR #384)
  - test_early_stopping_config.py: Tests for event-based early stopping configuration (PR #375)
  - test_changes_description.py: Tests for large codebase support via changes description (PR #376)

  Total tests increased from 264 to 326.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add integration tests for example validation

  Add comprehensive integration tests that verify:

  - Example config files load correctly
  - Initial programs have EVOLVE-BLOCK markers
  - Evaluators exist and have required functions
  - Evaluators can run on initial programs
  - Cascade evaluation functions are detected
  - Database stores and retrieves programs correctly
  - Program evolution tracking works

  Tests cover function_minimization, circle_packing, and signal_processing examples, plus general structure validation for all examples.

  Total tests: 346 (was 326)

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
I added an example setup to run ARC-AGI-2 tasks individually using OpenEvolve. The relevant code files, including `initial_program.py`, `evaluator.py`, etc., can be found in `examples/arc_benchmark`, along with the relevant details in the `README.md` file.

To stop the evolution process early, I needed to add an "event-based" early stopping feature, i.e., stop the evolution as soon as the task is solved (when `combined_score` reaches `1.0`). I kept the early stopping parameters in the config file the same. Setting `early_stopping_patience < 0` triggers event-based early stopping: in this mode, early stopping fires when `current_score == convergence_threshold`. I have kept most of the code for the original early stopping method untouched to ensure backward compatibility.
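For reviewers, here is a minimal sketch of the decision logic described above. It is not the code in this PR. The function name `should_stop_early`, the `EarlyStopState` helper, and the exact form of the patience-based branch (counting iterations without an improvement larger than `convergence_threshold`) are assumptions for illustration. Only the two rules stated in the description are taken as given: `early_stopping_patience < 0` selects event-based mode, and that mode stops when `current_score == convergence_threshold`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopState:
    """Hypothetical bookkeeping used only by the patience-based branch."""
    best_score: float = float("-inf")
    stale_iterations: int = 0


def should_stop_early(
    current_score: float,
    early_stopping_patience: Optional[int],
    convergence_threshold: float,
    state: EarlyStopState,
) -> bool:
    """Return True if the evolution loop should stop after this iteration."""
    if early_stopping_patience is None:
        # Assumption: no patience configured means early stopping is disabled.
        return False

    if early_stopping_patience < 0:
        # Event-based mode (from the PR description): stop as soon as the
        # tracked score equals the threshold, e.g. combined_score == 1.0.
        return current_score == convergence_threshold

    # Patience-based mode (simplified assumption): an iteration counts as
    # progress only if it beats the best score by more than the threshold.
    if current_score - state.best_score > convergence_threshold:
        state.best_score = current_score
        state.stale_iterations = 0
    else:
        state.stale_iterations += 1
    return state.stale_iterations >= early_stopping_patience
```

With `early_stopping_patience = -1` and `convergence_threshold = 1.0`, a score of `0.5` keeps the run going and a score of `1.0` stops it. Note that the exact equality check relies on the evaluator returning exactly `1.0` for a solved task; a score that only approaches the threshold through floating-point arithmetic would not trigger the stop.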