Commit 7984a23
[megatron] feat: add arg to offload bridged weights to CPU
`offload_bridge` is now a supported option: when enabled, the HF-format weights that Megatron exports for vLLM weight updates are stored in CPU main memory, reducing GPU memory usage. Default is False. Signed-off-by: Hollow Man <[email protected]>
1 parent 57cd99f commit 7984a23

File tree

14 files changed: +29 −1 lines changed

docs/source/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions
@@ -710,6 +710,7 @@ App arguments inherit from [Deployment arguments](#部署参数), [Web-UI arguments](#Web-UI参数)
 - mcore_model: Path to the mcore-format model. Defaults to None.
 - mcore_adapters: List of adapter paths for the mcore-format model; defaults to an empty list.
 - thread_count: Number of model shards when `--to_mcore true` is set. Defaults to None, in which case it is set automatically based on the model size so that the largest shard is under 10GB.
+- 🔥offload_bridge: Store the Megatron-exported HF-format weights used for vLLM weight updates in CPU main memory to reduce GPU memory usage. Defaults to False.
 - 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Defaults to False.
 - test_convert_dtype: The dtype used for conversion precision testing; defaults to 'float32'.
 - 🔥push_to_hub: Whether to push to the hub; defaults to False. See the example [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh)

docs/source/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 6 additions & 0 deletions
@@ -159,6 +159,12 @@ The GRPO training framework supports integrating a high-performance inference engine (such as vLLM) to accelerate sampling
 --move_model_batches [number of batches]
 ```

+6. Store the Megatron-exported HF-format weights used for vLLM weight updates in CPU main memory to reduce GPU memory usage:
+
+```bash
+--offload_bridge true
+```
+
 ### 2. Async(External) Mode

 Training and inference resources are separated, and a standalone inference server is launched

docs/source/Megatron-SWIFT/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions
@@ -258,6 +258,7 @@ LoRA training:
 - hub_token: Hub token. The ModelScope hub token can be found [here](https://modelscope.cn/my/myaccesstoken). Defaults to None.
 - merge_lora: Whether to store merged weights. Defaults to None; if `save_safetensors` is set to True, this parameter defaults to `True`, otherwise False. That is, by default LoRA is merged when storing in safetensors format and not merged when storing in torch_dist format.
 - max_shard_size: Maximum file size for safetensors-format storage; defaults to '5GB'.
+- 🔥offload_bridge: Store the Megatron-exported HF-format weights used for vLLM weight updates in CPU main memory to reduce GPU memory usage. Defaults to False.

 ## Training Parameters
263264

docs/source/Megatron-SWIFT/Mcore-Bridge.md

Lines changed: 2 additions & 0 deletions
@@ -190,6 +190,8 @@ swift infer \
 --stream true
 ```

+Tip: If you run into GPU OOM issues during vLLM weight updates, you can set `--offload_bridge true` to offload the tensors to the CPU and reduce GPU memory usage.
+
 ## Export and Conversion Precision Testing

 In addition to supporting safetensors conversion and saving during training, Mcore-Bridge also supports the `megatron export` command for standalone weight export. `megatron export` can test conversion precision during weight conversion, which is very helpful for verifying correctness when integrating new models. Typically, models already integrated into Megatron-SWIFT do not have precision misalignment issues, so you can safely set `--test_convert_precision false`

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions
@@ -728,6 +728,7 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
 - mcore_model: Path to the mcore format model. Default is None.
 - mcore_adapters: List of paths to mcore format model adapters, default is an empty list.
 - thread_count: The number of model slices when `--to_mcore true` is set. Defaults to None, and is automatically configured based on the model size, ensuring that the largest slice is less than 10GB.
+- 🔥offload_bridge: Store Megatron exported HF format weights for vLLM updates in CPU main memory to reduce GPU memory usage. Default is False.
 - 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False.
 - test_convert_dtype: The dtype used for conversion precision testing, defaults to 'float32'.
 - 🔥push_to_hub: Whether to push to the hub, with the default being False. Examples can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh).

docs/source_en/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 6 additions & 0 deletions
@@ -159,6 +159,12 @@ When running in Colocate mode, out-of-memory (OOM) issues may frequently occur.
 --move_model_batches [number of batches]
 ```

+6. Store Megatron exported HF format weights for vLLM updates in CPU main memory to reduce GPU memory usage:
+
+```bash
+--offload_bridge true
+```
+
 ### 2. Async(External) Mode

 Training and inference resources are separated, with a dedicated inference server deployed.
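The colocate-mode mitigations above can be stacked in a single launch. A minimal sketch, using only flags that appear in this commit's example scripts and docs; the model, dataset, parallelism settings, and the batch count are placeholders you would fill in for a real run:

```shell
# Hypothetical colocate GRPO launch combining the memory-saving options.
# Only the flags shown are taken from this commit; everything else
# (model, dataset, parallelism) must be supplied for a real run.
megatron rlhf \
    --loss_type grpo \
    --sleep_level 2 \
    --offload_model true \
    --offload_optimizer true \
    --offload_bridge true \
    --move_model_batches 4
```

Each flag trades step latency for peak GPU memory, so it may be worth enabling them one at a time until the OOM disappears.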

docs/source_en/Megatron-SWIFT/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions
@@ -275,6 +275,7 @@ LoRA Training:
 - hub_token: Hub token. ModelScope hub token can be found [here](https://modelscope.cn/my/myaccesstoken). Default is None.
 - merge_lora: Whether to store merged weights. Defaults to None. If `save_safetensors` is set to True, this parameter defaults to `True`; otherwise, it defaults to False. That is, by default, LoRA will be merged when storing in safetensors format; LoRA will not be merged when storing in torch_dist format.
 - max_shard_size: Maximum file size for safetensors format storage, defaults to '5GB'.
+- 🔥offload_bridge: Store Megatron exported HF format weights for vLLM updates in CPU main memory to reduce GPU memory usage. Default is False.

 ## Training Parameters

docs/source_en/Megatron-SWIFT/Mcore-Bridge.md

Lines changed: 2 additions & 0 deletions
@@ -200,6 +200,8 @@ swift infer \
 --stream true
 ```

+Tip: If you encounter GPU OOM issues during weight synchronization with vLLM, you can set `--offload_bridge true` to offload intermediate tensors to the CPU and reduce GPU memory usage.
+
 ## Export and Conversion Precision Testing

 In addition to supporting safetensors conversion and saving during training, Mcore-Bridge also supports the `megatron export` command for standalone weight export. `megatron export` supports conversion precision testing during weight conversion, which is very helpful for verifying accuracy when integrating new models. Typically, models already integrated into Megatron-SWIFT will not have precision misalignment issues, so you can confidently set `--test_convert_precision false`.
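Since `offload_bridge` is listed among the export arguments in this commit, it can presumably also be passed to a standalone `megatron export` run. A hedged sketch, assuming that combination is supported; the checkpoint path is a placeholder:

```shell
# Hypothetical standalone export; the --mcore_model path is a placeholder.
# --test_convert_precision true is useful when integrating a new model.
megatron export \
    --mcore_model ./mcore-checkpoint \
    --offload_bridge true \
    --test_convert_precision true
```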

examples/megatron/grpo/dense_colocate.sh

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ megatron rlhf \
 --loss_type grpo \
 --sleep_level 2 \
 --offload_model true \
+--offload_bridge false \
 --offload_optimizer true \
 --log_interval 1 \
 --recompute_granularity selective \

examples/megatron/grpo/dense_server.sh

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ megatron rlhf \
 --loss_type grpo \
 --sleep_level 2 \
 --offload_model true \
+--offload_bridge false \
 --offload_optimizer true \
 --log_interval 1 \
 --recompute_granularity selective \
