
Commit 2f8274e

docs: Refactor the documentation structure
1 parent 0ce972c commit 2f8274e

117 files changed (+409 -223 lines changed)

README.md

Lines changed: 35 additions & 35 deletions
@@ -46,14 +46,14 @@ Leveraging a multi-role distributed architecture with Ray for flexible resource
 | **[11/08/2025]** 🎉 Our [ROCK: Reinforcement Open Construction Kit](https://github.com/alibaba/ROCK) released, Explore the new capabilities!. |
 | **[10/23/2025]** 🎉 Our Papers released, see [Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning](https://arxiv.org/abs/2510.01656) and [Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization](https://arxiv.org/abs/2510.13554). |
 | **[10/14/2025]** 🎉 Our Paper released, see [Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony](https://arxiv.org/abs/2510.11345). |
-| **[09/28/2025]** 🎉 Ascend NPU support — see [usage guide](https://alibaba.github.io/ROLL/docs/English/UserGuide/ascend/ascend_usage). |
+| **[09/28/2025]** 🎉 Ascend NPU support — see [usage guide](https://alibaba.github.io/ROLL/docs/User%20Guides/Hardware%20Support/ascend_usage). |
 | **[09/25/2025]** 🎉 Our Paper released, see [RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training](https://arxiv.org/abs/2509.21009) |
 | **[09/24/2025]** 🎉 Support [Wan2_2 Reward FL pipeline](examples/wan2.2-14B-reward_fl_ds/reward_fl_config.yaml). Explore the new capabilities! |
 | **[09/23/2025]** 🎉 ROLL aligns with GEM environment definition, providing agentic Tool Use training capabilities, [ToolUse docs](docs_roll/docs/English/UserGuide/agentic/Tool_Use.md). |
 | **[09/16/2025]** 🎉 Qwen3-Next model training is supported, refer to [configuration](examples/qwen3-next-80BA3B-rlvr_megatron/rlvr_config.yaml). |
 | **[09/04/2025]** 🎉 ROLL supports vLLM dynamic FP8 rollout and remove_padding for acceleration. |
 | **[08/28/2025]** 🎉 ROLL supports SFT pipeline, refer to [configuration](examples/qwen2.5-7B-sft_megatron/sft_config.yaml). |
-| **[08/13/2025]** 🎉 ROLL supports AMD GPUs with out-of-box image docker and Dockerfile and specific yamls under `examples/` directory. Please refer to [Installation](https://alibaba.github.io/ROLL/docs/English/QuickStart/installation). |
+| **[08/13/2025]** 🎉 ROLL supports AMD GPUs with out-of-box image docker and Dockerfile and specific yamls under `examples/` directory. Please refer to [Installation](https://alibaba.github.io/ROLL/docs/Getting%20Started/Installation/). |
 | **[08/11/2025]** 🎉 Our Paper released, see [Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning](https://arxiv.org/abs/2508.08221). |
 | **[08/10/2025]** 🎉 Agentic RL supports [stepwise learning](examples/qwen2.5-0.5B-agentic/agent_val_frozen_lake_gigpo.yaml), like [GiGPO](https://arxiv.org/abs/2505.10978); Distill supports [VLM](examples/qwen2.5-vl-7B-distill/distill_vl_megatron.yaml). Explore the new capabilities! |
 | **[08/06/2025]** 🎉 ROLL PPT is now available, [Slides](assets/ROLL%20高效且用户友好的大模型RL训练框架.pdf). |
@@ -72,52 +72,52 @@ Leveraging a multi-role distributed architecture with Ray for flexible resource
 [Documents](https://alibaba.github.io/ROLL/)

 ### Quick Start
-[Installation](https://alibaba.github.io/ROLL/docs/English/QuickStart/installation)
-[Config System Explanation](https://alibaba.github.io/ROLL/docs/English/QuickStart/config_system)
-[Debugging Guide](https://alibaba.github.io/ROLL/docs/English/QuickStart/debugging_guide_en)
-[Trackers and Metrics](https://alibaba.github.io/ROLL/docs/English/UserGuide/trackers_and_metrics)
-[Checkpoint Saving and Resuming Guide](https://alibaba.github.io/ROLL/docs/English/UserGuide/checkpoint_and_resume)
-[Converting MCoreAdapter Models to Hugging Face Format](https://alibaba.github.io/ROLL/docs/English/UserGuide/megatron_convert_2_hf)
-[Quick Start: Single-Node Deployment Guide](https://alibaba.github.io/ROLL/docs/English/QuickStart/single_node_quick_start)
-[Quick Start: Multi-Node Deployment Guide](https://alibaba.github.io/ROLL/docs/English/QuickStart/multi_node_quick_start)
-[Frequently Asked Questions](https://alibaba.github.io/ROLL/docs/English/QuickStart/qa_issues)
+[Installation](https://alibaba.github.io/ROLL/docs/Getting%20Started/Installation/)
+[Config System Explanation](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/config_system)
+[Debugging Guide](https://alibaba.github.io/ROLL/docs/Getting%20Started/Debugging%20Guide/debug_guide)
+[Trackers and Metrics](https://alibaba.github.io/ROLL/docs/User%20Guides/Tracker%20and%20Metrics/trackers_and_metrics)
+[Checkpoint Saving and Resuming Guide](https://alibaba.github.io/ROLL/docs/User%20Guides/Advanced%20Features/checkpoint_and_resume)
+[Converting MCoreAdapter Models to Hugging Face Format](https://alibaba.github.io/ROLL/docs/User%20Guides/Advanced%20Features/megatron_convert_2_hf)
+[Quick Start: Single-Node Deployment Guide](https://alibaba.github.io/ROLL/docs/Getting%20Started/Quick%20Start/single_node_quick_start)
+[Quick Start: Multi-Node Deployment Guide](https://alibaba.github.io/ROLL/docs/Getting%20Started/Quick%20Start/multi_nodes_quick_start)
+[Frequently Asked Questions](https://alibaba.github.io/ROLL/docs/Getting%20Started/FAQ/qa_issues)

 ### UserGuide

 #### Pipeline Step by Step
-[RLVR Pipeline](https://alibaba.github.io/ROLL/docs/English/UserGuide/pipeline/rlvr_pipeline_start)
-[Agentic Pipeline](https://alibaba.github.io/ROLL/docs/English/UserGuide/pipeline/agentic_pipeline_start)
-[Agentic Comprehensive Guide](https://alibaba.github.io/ROLL/docs/English/UserGuide/pipeline/agent_pipeline_start)
-[Distill Pipeline](https://alibaba.github.io/ROLL/docs/English/UserGuide/pipeline/distill_pipeline_start)
+[RLVR Pipeline](https://alibaba.github.io/ROLL/docs/User%20Guides/Pipeline/rlvr_pipeline_start)
+[Agentic Pipeline](https://alibaba.github.io/ROLL/docs/User%20Guides/Pipeline/agentic_pipeline_start)
+[Agentic Comprehensive Guide](https://alibaba.github.io/ROLL/docs/User%20Guides/Pipeline/agent_pipeline_start)
+[Distill Pipeline](https://alibaba.github.io/ROLL/docs/User%20Guides/Pipeline/distill_pipeline_start)

 #### Algorithms
-[Reinforce++](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/Reinforce_Plus_Plus)
-[TOPR](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/TOPR)
-[GiGPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/agentic_GiGPO)
-[PPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/PPO)
-[Lite PPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/LitePPO)
-[GRPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/GRPO)
-[GSPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/GSPO)
-[RAFT++](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/RAFT_Plus_Plus)
-[StarPO](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/agentic_StarPO)
-[RewardFL](https://alibaba.github.io/ROLL/docs/English/UserGuide/algorithms/Reward_FL)
+[Reinforce++](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/Reinforce_Plus_Plus)
+[TOPR](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/TOPR)
+[GiGPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Agentic/agentic_GiGPO)
+[PPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/PPO)
+[Lite PPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/LitePPO)
+[GRPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/GRPO)
+[GSPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/GSPO)
+[RAFT++](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/RAFT_Plus_Plus)
+[StarPO](https://alibaba.github.io/ROLL/docs/User%20Guides/Agentic/agentic_StarPO)
+[RewardFL](https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/Reward_FL)

 #### Backend
-[DeepSeed](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/deepspeed)
-[Megatron](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/megatron)
-[vLLM](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/vllm)
-[SGLang](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/sglang)
+[DeepSeed](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/deepspeed)
+[Megatron](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/megatron)
+[vLLM](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/vllm)
+[SGLang](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/sglang)

 #### Advanced Features
-[Asynchronous Parallel Rollout](https://alibaba.github.io/ROLL/docs/English/UserGuide/async_parallel_rollout)
-[Asynchronous Training Feature](https://alibaba.github.io/ROLL/docs/English/UserGuide/async_training)
+[Asynchronous Parallel Rollout](https://alibaba.github.io/ROLL/docs/User%20Guides/Advanced%20Features/async_parallel_rollout)
+[Asynchronous Training Feature](https://alibaba.github.io/ROLL/docs/User%20Guides/Advanced%20Features/async_training)

 #### Performance Optimization & Resource Management
-[Resource Config](https://alibaba.github.io/ROLL/docs/English/UserGuide/device_mapping)
-[GPU Time-Division Multiplexing Control](https://alibaba.github.io/ROLL/docs/English/UserGuide/offload_reload_control)
+[Resource Config](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/device_mapping)
+[GPU Time-Division Multiplexing Control](https://alibaba.github.io/ROLL/docs/User%20Guides/Advanced%20Features/offload_reload_control)

 #### ROLL x Ascend
-[Ascend Usage Guide](https://alibaba.github.io/ROLL/docs/English/UserGuide/ascend/ascend_usage)
+[Ascend Usage Guide](https://alibaba.github.io/ROLL/docs/User%20Guides/Hardware%20Support/ascend_usage)

 ---

@@ -138,7 +138,7 @@ Leveraging a multi-role distributed architecture with Ray for flexible resource
 * Inference/Generation supports vLLM, SGLang.
 * Training supports DeepSpeed (ZeRO), Megatron-LM 5D parallelism (mcore-adapter, dp/tp/pp/cp/ep), FSDP under implementation.
 * Extreme offload/reload capabilities.
-* Supports [LoRA](https://alibaba.github.io/ROLL/docs/English/UserGuide/backend/lora) training.
+* Supports [LoRA](https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/lora) training.
 * Supports FP8 rollout (FP8 inference for LLM as judge, FP8 rollout with BF16 training under development).
 * **AutoDeviceMapping:** Supports custom device mapping for different roles, flexibly managing colocated and disaggregated deployments.
 * **Observability:** Integrated with SwanLab / WandB / TensorBoard, tracking of performance for each domain and reward type.

docs/qa.md

Lines changed: 2 additions & 2 deletions
@@ -74,10 +74,10 @@ profiler_output_dir: /data/oss_bucket_0/yali/llm/profile/${exp_name}

 When ROLL allocates data, it distributes the rollout_batch_size samples across the actor_train workers according to the DP size, and then computes the number of samples per gradient update from gradient_accumulation_steps. With this configuration, the division comes out to 0.

-For the detailed configuration logic, see the manual: https://alibaba.github.io/ROLL/docs/English/QuickStart/config_guide#training-arguments-training_args
+For the detailed configuration logic, see the manual: https://alibaba.github.io/ROLL/docs/User%20Guides/Configuration/config_guide#training-arguments-training_args

-0. **If you see this error: AssertionError: batch_size 32 < chunks 64**
+1. **If you see this error: AssertionError: batch_size 32 < chunks 64**

 batch_size is smaller than the DP size of reference/actor_train, so there is not enough data to split during dispatch; adjusting rollout_batch_size resolves this.

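
The two qa.md entries in this hunk both come down to integer arithmetic on the batch configuration. The following is a minimal sketch of that arithmetic as the text describes it, not ROLL's actual code: the helper names `samples_per_update` and `can_dispatch` are hypothetical, and the numbers are illustrative rather than taken from a real config.

```python
def samples_per_update(rollout_batch_size: int, dp_size: int,
                       gradient_accumulation_steps: int) -> int:
    """Hypothetical helper mirroring the qa.md description.

    The rollout batch is first split across the DP ranks, and each
    rank's share is then divided by gradient_accumulation_steps.
    """
    per_worker = rollout_batch_size // dp_size
    return per_worker // gradient_accumulation_steps


def can_dispatch(batch_size: int, chunks: int) -> bool:
    """Hypothetical check: splitting a batch into `chunks` non-empty
    parts needs at least `chunks` samples."""
    return batch_size >= chunks


# Illustrative values only: 64 samples over 8 DP ranks leaves 8 per rank,
# and 8 // 16 accumulation steps is 0.
print(samples_per_update(rollout_batch_size=64, dp_size=8,
                         gradient_accumulation_steps=16))  # 0
print(samples_per_update(rollout_batch_size=64, dp_size=8,
                         gradient_accumulation_steps=4))   # 2

# The case from the assertion message: 32 samples cannot fill 64 chunks.
print(can_dispatch(batch_size=32, chunks=64))  # False
```

Under these assumptions, raising rollout_batch_size (or lowering the DP size or gradient_accumulation_steps) until both checks pass matches the remedy the FAQ suggests.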

docs_roll/docs/DesignImplementation/AgenticPipeline.md renamed to docs_roll/docs/Development/Architecture/AgenticPipeline.md

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs_roll/docs/DevelopmentGuide/support_new_models.md renamed to docs_roll/docs/Development/Developer Guide/support_new_models.md

File renamed without changes.

docs_roll/docs/QuickStart/debug_guide.md renamed to docs_roll/docs/Getting Started/Debugging Guide/debug_guide.md

File renamed without changes.
File renamed without changes.
File renamed without changes.
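
The renamed directories contain spaces (for example `Getting Started/Debugging Guide`), which is why the updated README links use `%20`. The sketch below shows one way to derive a site link from a renamed file path. It assumes the site URL is the repository path with the `docs_roll/docs/` prefix and the `.md` suffix removed, a rule inferred only from the path/link pair for `debug_guide` visible in this commit; `doc_url` is a hypothetical helper, not part of the repository.

```python
from urllib.parse import quote

SITE_ROOT = "https://alibaba.github.io/ROLL/docs/"
REPO_PREFIX = "docs_roll/docs/"  # assumed docs root inside the repository


def doc_url(repo_path: str) -> str:
    """Hypothetical helper: map a docs file path to its published URL."""
    rel = repo_path
    if rel.startswith(REPO_PREFIX):
        rel = rel[len(REPO_PREFIX):]
    if rel.endswith(".md"):
        rel = rel[:-len(".md")]
    # quote() keeps "/" by default and percent-encodes spaces as %20.
    return SITE_ROOT + quote(rel)


print(doc_url("docs_roll/docs/Getting Started/Debugging Guide/debug_guide.md"))
# https://alibaba.github.io/ROLL/docs/Getting%20Started/Debugging%20Guide/debug_guide
```

A helper like this could be used to spot-check README links against the renamed files, though the real site generator may apply additional rules.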
