Conversation
Pull request overview
Adds a `--critic-train-only` mode intended to support running critic training without full actor/critic coupling, and adjusts a few initialization/sync paths to account for this mode.
Changes:
- Add a `--critic-train-only` CLI flag.
- Change the Megatron `args.world_size` computation when `critic_train_only` is enabled.
- Skip the actor↔critic connection and adjust rollout manager wiring / data sync behavior in critic-only mode.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| slime/utils/arguments.py | Introduces `--critic-train-only` and changes how `world_size` is set for Megatron. |
| slime/ray/placement_group.py | Adjusts training model initialization to avoid actor↔critic coupling when critic-only. |
| slime/backends/megatron_utils/actor.py | Avoids actor↔critic data sync during critic training when critic-only. |
```python
if args.critic_train_only:
    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
else:
    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
```
`args.world_size` is used to validate/derive Megatron parallelism defaults, but in `critic_train_only` mode it's set from critic node/GPU counts even though the codebase still initializes an actor `RayTrainGroup` (with potentially different actor node/GPU counts). This can lead to incompatible TP/PP/DP settings and runtime failures when the actor group initializes. Consider keeping `args.world_size` based on the actor group (status quo) unless you also skip actor initialization in `critic_train_only`, or add validation that actor/critic world sizes must match when this flag is set (or introduce separate actor/critic world-size configs).
Suggested change:
```diff
-if args.critic_train_only:
-    args.world_size = args.critic_num_nodes * args.critic_num_gpus_per_node
-else:
-    args.world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+actor_world_size = args.actor_num_nodes * args.actor_num_gpus_per_node
+# NOTE:
+# Megatron parallelism defaults and validations are derived from args.world_size,
+# and the training actor RayTrainGroup is also defined based on the actor world size.
+# To avoid mismatches, always keep args.world_size aligned with the actor group.
+if getattr(args, "critic_train_only", False):
+    critic_num_nodes = getattr(args, "critic_num_nodes", None)
+    critic_num_gpus_per_node = getattr(args, "critic_num_gpus_per_node", None)
+    if critic_num_nodes is not None and critic_num_gpus_per_node is not None:
+        critic_world_size = critic_num_nodes * critic_num_gpus_per_node
+        if critic_world_size != actor_world_size:
+            raise ValueError(
+                f"In critic_train_only mode, actor and critic world sizes must match, "
+                f"but got actor_world_size={actor_world_size} "
+                f"and critic_world_size={critic_world_size}."
+            )
+args.world_size = actor_world_size
```
```python
actor_model.set_rollout_manager(rollout_manager)
if args.critic_train_only:
    critic_model.set_rollout_manager(rollout_manager)
```
When `--critic-train-only` is set but `--use-critic` is not, `critic_model` is `None` and this will raise an `AttributeError`. Either validate that `critic_train_only` implies `use_critic`, or guard this block with `if args.use_critic` (and provide a clear error message if not).
We can always run `critic_model.set_rollout_manager(rollout_manager)`. Maybe change `if args.critic_train_only` to `if use_critic`, and remove the `critic.set_rollout_manager` call inside `actor_model.set_rollout_manager`.
slime/ray/placement_group.py (Outdated)
```python
    )
)
if not args.critic_train_only:
    assert len(set(start_rollout_ids)) == 1
```
The critic training should also load the `start_rollout_ids`.
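One way to read that request, as a sketch — the function and argument names are assumptions; the real code collects `start_rollout_ids` from the checkpoint loads:

```python
def resolve_start_rollout_id(actor_start_ids, critic_start_ids):
    # Combine the resume points reported by the actor and the critic
    # checkpoints instead of skipping the consistency check in
    # critic-only mode, so a mismatched resume is still caught.
    all_ids = set(actor_start_ids) | set(critic_start_ids)
    assert len(all_ids) == 1, f"inconsistent start_rollout_ids: {all_ids}"
    return all_ids.pop()
```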