Refactor Parallel Lists Into Structured Data Models #185

MarwanMashra · 2025-08-03T13:19:51Z

MarwanMashra
Aug 3, 2025

Note

The following suggestion is intended to support the project's long-term maintainability and ease future evolution. However, its relevance depends on your current perspective: if you consider this project mostly complete and your primary focus is adding new environments, the benefits may seem marginal. On the other hand, if you envision substantial ongoing development, restructuring data patterns now could significantly simplify future improvements.

Additionally, I saw that verifiers was used in prime-rl, so I guess it also depends on how critical this project is within Prime Intellect's internal systems, or other open-source projects you're maintaining.

Suggestion

I noticed a common pattern in the code that consists of passing around separate lists and zipping them to reconstruct an object instead of grouping related elements in a single object. For example, you have

async def run_rollouts(
        self,
        client: AsyncOpenAI,
        model: str,
        prompts: list[Messages],
        answers: list[str],
        tasks: list[str],
        infos: list[Info],
        sampling_args: SamplingArgs | None = None,
        max_concurrent: int = -1,
        **kwargs,
    ) -> list[tuple[Messages, State]]:

instead of

class RolloutRequest(BaseModel):
    """Pydantic model for rollout requests."""

    prompt: Messages
    answer: str = ""
    task: str = "default"
    info: Info = Field(default_factory=dict)

async def run_rollouts(
        self,
        client: AsyncOpenAI,
        model: str,
        requests: list[RolloutRequest],
        sampling_args: SamplingArgs | None = None,
        max_concurrent: int = -1,
        **kwargs,
    ) -> list[tuple[Messages, State]]:

This pattern tend to create a lot of problems down the line, for example:

using zip doesn't allow for any list to be optional, since you iterate over all lists until hitting the shortest. So you can't have tasks and infos be optional, although both task and info are optional for a single rollout.
managing the default values such as task='default' or answer='' becomes very hard, since they're hard coded in all methods signatures.
not having a single source of truth that defines your data structure (e.g. what data constitutes the input for a rollout) makes the code harder maintain. Adding/removing an attribute, or updating its type hint would require changes in too many places and can becomes intractable.

Additionally, grouping related attributes (e.g. the inputs of a rollout, the outputs of a rollout...etc) makes it easier to understand and follow when reading the code.

I also noticed some data structures that represents a collection of elements like GenerateOutputs or RolloutScores. I see how they could be useful sometimes, but often defining a single element (in a pattern like list[Element] instead of Elements) can have a lot of advantages. It could look something like this

    async def a_generate(
        self,
        inputs: list[RolloutRequest] | Dataset,
        client: AsyncOpenAI | None = None,
        model: str | None = None,
        sampling_args: SamplingArgs | None = None,
        score_rollouts: bool = True,
        max_concurrent: int = -1,
        **kwargs,
    ) -> list[GenerationOutput]:
        """
        Generate completions and rewards for a given set of inputs.
        """
        # use class-level client and model if not provided
        # code here

        if isinstance(inputs, Dataset):
            inputs = RolloutRequest.from_dataset(inputs)

        rollout_results: list[RolloutResult] = await self.run_rollouts(
            requests=inputs,
            client=client,
            model=model,
            sampling_args=gen_sampling_args,
            max_concurrent=max_concurrent,
            **kwargs,
        )

        if score_rollouts:
            rollout_scores: list[RolloutScore] = await self.rubric.score_rollouts(
                requests=inputs,
                results=rollout_results,
                apply_weights=True,
            ) 
        
        # code here

I pushed a new branch on my fork refactor/rollouts_structure that partially implements a new structure for rollouts, just to give a sense of what it could look like. I haven't fully implemented it yet, since I wanted to get your perspective first before making broader changes.

I'd be happy to contribute more to the project , I think it has potential to benefit the community, and I like RL. Let me know what you think!

willccbb · 2025-08-03T20:51:57Z

willccbb
Aug 3, 2025
Maintainer

Totally agree that this is desired, and that the current structure is getting a bit unwieldy (it was more sensible when we had fewer features, but alas). Currently there's a fair amount of co-development between verifiers and prime-rl, and I'm also looking at adding support for other trainer libraries. A refactor for encapsulating rollout state will eventually be done with all of this in mind, but will need to be strategic and have a migration roadmap that doesn't box out important planned features.

Will take at your suggestions, thanks!!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor Parallel Lists Into Structured Data Models #185

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Refactor Parallel Lists Into Structured Data Models #185

Uh oh!

MarwanMashra Aug 3, 2025

Note

Suggestion

Replies: 1 comment

Uh oh!

willccbb Aug 3, 2025 Maintainer

MarwanMashra
Aug 3, 2025

willccbb
Aug 3, 2025
Maintainer