
GRPO training fails in multi-process mode #698

@yhtang

Description


Expected Behavior
GRPO training with Tunix in multi-process SPMD (e.g., 2 GPU nodes) should generate rollouts and compute advantages without failures.

Actual Behavior
Crash during rollout sampling when iterating a distributed jax.Array:

  • AssertionError at jax/_src/array.py:380: assert self.is_fully_replicated or self.is_fully_addressable
  • Triggered in tunix/generate/sampler.py while doing zip(out_tokens, lengths)
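The failing pattern and one possible workaround can be sketched as follows. This is a minimal single-process illustration, not Tunix code: out_tokens and lengths here are hypothetical stand-ins for the sampler's arrays, and the workaround (pulling data to host memory before Python-level iteration) is one common approach, not necessarily the fix the maintainers will choose.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Hypothetical stand-ins for the sampler's outputs.
out_tokens = jnp.arange(8).reshape(2, 4)   # [batch, seq_len]
lengths = jnp.array([4, 3])                # per-sample lengths

# Iterating a jax.Array asserts that it is fully replicated or fully
# addressable (jax/_src/array.py). Under multi-process SPMD, a sharded
# array whose shards live partly on other hosts fails that assertion,
# so zip(out_tokens, lengths) crashes in the sampler.
#
# Workaround sketch: fetch the data to host memory first, then iterate
# ordinary NumPy arrays.
out_tokens_host = np.asarray(jax.device_get(out_tokens))
lengths_host = np.asarray(jax.device_get(lengths))

pairs = list(zip(out_tokens_host, lengths_host))
```

In a real multi-process run, each process would only see its addressable shards this way; gathering a globally consistent copy would need something like jax.experimental.multihost_utils.process_allgather instead.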

Steps to Reproduce the Problem

  1. Add jax.distributed.initialize() at the beginning of the GRPO example script.
  2. Run the GRPO demo under SLURM with one task per node:
srun -u --label --ntasks=2 --ntasks-per-node=1 -c${SLURM_CPUS_ON_NODE} python <grpo-script>
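Step 1 can be sketched as a guarded initialization that keeps the script runnable both single-process and under srun. The SLURM_NTASKS check is an assumption for illustration, not part of the original script; when launched with multiple tasks, jax.distributed.initialize() can auto-detect the coordinator address, process count, and process id from the SLURM environment.

```python
import os
import jax


def maybe_init_distributed() -> None:
    # Sketch: only start JAX's distributed runtime when launched with
    # multiple SLURM tasks; a plain single-process run skips it.
    if int(os.environ.get("SLURM_NTASKS", "1")) > 1:
        jax.distributed.initialize()


maybe_init_distributed()
print(jax.process_count())  # 1 in a single-process run
```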

Checklist

  • I have searched the existing issues for a similar bug report.
  • I have provided all the required information in the "Environment" section.
  • I have provided a minimal, reproducible example.

Would you like to help us fix it?

Metadata

Labels: type:bug (Something isn't working)
