Refactor Parallel Lists Into Structured Data Models #185
MarwanMashra
started this conversation in
Ideas
Replies: 1 comment
-
|
Totally agree that this is desired, and that the current structure is getting a bit unwieldy (it was more sensible when we had fewer features, but alas). Currently there's a fair amount of co-development between verifiers and prime-rl, and I'm also looking at adding support for other trainer libraries. A refactor for encapsulating rollout state will eventually be done with all of this in mind, but will need to be strategic and have a migration roadmap that doesn't box out important planned features. Will take at your suggestions, thanks!! |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Note
The following suggestion is intended to support the project's long-term maintainability and ease future evolution. However, its relevance depends on your current perspective: if you consider this project mostly complete and your primary focus is adding new environments, the benefits may seem marginal. On the other hand, if you envision substantial ongoing development, restructuring data patterns now could significantly simplify future improvements.
Additionally, I saw that verifiers was used in prime-rl, so I guess it also depends on how critical this project is within Prime Intellect's internal systems, or other open-source projects you're maintaining.
Suggestion
I noticed a common pattern in the code that consists of passing around separate lists and zipping them to reconstruct an object instead of grouping related elements in a single object. For example, you have
instead of
This pattern tend to create a lot of problems down the line, for example:
zipdoesn't allow for any list to be optional, since you iterate over all lists until hitting the shortest. So you can't havetasksandinfosbe optional, although bothtaskandinfoare optional for a single rollout.task='default'oranswer=''becomes very hard, since they're hard coded in all methods signatures.Additionally, grouping related attributes (e.g. the inputs of a rollout, the outputs of a rollout...etc) makes it easier to understand and follow when reading the code.
I also noticed some data structures that represents a collection of elements like
GenerateOutputsorRolloutScores. I see how they could be useful sometimes, but often defining a single element (in a pattern likelist[Element]instead ofElements) can have a lot of advantages. It could look something like thisI pushed a new branch on my fork refactor/rollouts_structure that partially implements a new structure for rollouts, just to give a sense of what it could look like. I haven't fully implemented it yet, since I wanted to get your perspective first before making broader changes.
I'd be happy to contribute more to the project , I think it has potential to benefit the community, and I like RL. Let me know what you think!
Beta Was this translation helpful? Give feedback.
All reactions