Add HuggingFace PTQ pipeline to launcher #1100
Changes from all commits: `677da32`, `bdf06f8`, `cfcfbeb`, `a81451e`, `0ee2df3`, `3bcf6a7`, `6a88cb1`
New file: `common/hf/ptq.sh` (62 lines added)

```bash
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# HuggingFace PTQ wrapper: downloads the model if needed, then runs huggingface_example.sh.
#
# Usage:
#   ptq.sh --repo <org/model> --local-dir <path> -- [huggingface_example.sh args...]
#
# Everything before "--" is handled by this wrapper (download logic).
# Everything after "--" is passed directly to huggingface_example.sh.
# The --model arg is automatically set to <local-dir> for huggingface_example.sh.

set -e

REPO=""
LOCAL_DIR=""
PTQ_ARGS=()

# Parse wrapper args up to "--", collect the rest for huggingface_example.sh
while [[ $# -gt 0 ]]; do
  case "$1" in
    --repo) REPO="$2"; shift 2 ;;
    --local-dir) LOCAL_DIR="$2"; shift 2 ;;
    --) shift; PTQ_ARGS=("$@"); break ;;
    *) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
  esac
done

if [ -z "$REPO" ] || [ -z "$LOCAL_DIR" ]; then
  echo "Usage: ptq.sh --repo <org/model> --local-dir <path> -- [huggingface_example.sh args...]" >&2
  exit 1
fi

# --- Step 1: Download model if not already present ---
if [ -f "$LOCAL_DIR/config.json" ]; then
  echo "Model already exists at $LOCAL_DIR, skipping download."
else
  echo "Downloading $REPO to $LOCAL_DIR ..."
  pip install -q huggingface_hub 2>/dev/null || true
  huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
  echo "Download complete: $LOCAL_DIR"
fi

# --- Step 2: Run huggingface_example.sh ---
script_dir="$(dirname "$(readlink -f "$0")")"
HF_EXAMPLE="${script_dir}/../../modules/Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh"

echo "Running huggingface_example.sh --model $LOCAL_DIR --trust_remote_code ${PTQ_ARGS[*]}"
exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
```
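To make the wrapper's `--` contract concrete, here is an illustrative sketch of the same parsing logic (Python is used for illustration only; the real implementation is the bash `while`/`case` loop in the script, and `split_wrapper_args` is a hypothetical name):

```python
def split_wrapper_args(argv: list[str]) -> tuple[dict, list[str]]:
    """Mimic ptq.sh parsing: wrapper flags before "--", passthrough after."""
    opts = {"repo": None, "local_dir": None}
    rest: list[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--repo":
            opts["repo"] = argv[i + 1]
            i += 2
        elif arg == "--local-dir":
            opts["local_dir"] = argv[i + 1]
            i += 2
        elif arg == "--":
            # everything after "--" goes untouched to huggingface_example.sh
            rest = argv[i + 1:]
            break
        else:
            raise ValueError(f"Unknown argument: {arg} (use -- to separate PTQ args)")
    return opts, rest

opts, rest = split_wrapper_args(
    ["--repo", "Qwen/Qwen3-8B", "--local-dir", "/hf-local/Qwen/Qwen3-8B",
     "--", "--quant", "nvfp4", "--tasks", "quant"]
)
print(opts)  # {'repo': 'Qwen/Qwen3-8B', 'local_dir': '/hf-local/Qwen/Qwen3-8B'}
print(rest)  # ['--quant', 'nvfp4', '--tasks', 'quant']
```

The key property is that the wrapper never interprets anything after `--`, so new `huggingface_example.sh` flags need no wrapper changes.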
This file was deleted.
cjluo-nv marked this conversation as resolved.
New file: `examples/llm_ptq/hf_ptq.yaml` (51 lines added)

```yaml
# HuggingFace PTQ via huggingface_example.sh
#
# Quantizes a HuggingFace model using examples/llm_ptq/scripts/huggingface_example.sh.
# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
```
**Review comment (Contributor):** Documentation mismatch with actual configuration. Line 4 states "Default: Qwen/Qwen3.5-9B" but the actual configuration at line 34 uses `Qwen/Qwen3-8B` with `nvfp4` on 1 GPU. Suggested fix:

```diff
-# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
+# Default: Qwen/Qwen3-8B with nvfp4 on 1 GPU.
```
```yaml
#
# Usage (Slurm):
#   export SLURM_HOST=<slurm-host>
#   export SLURM_ACCOUNT=<your-team>
#   export SLURM_PARTITION=<your-partition>   # default: batch
#   export SLURM_JOB_DIR=/home/scratch.<user>/experiments
#   export SLURM_HF_LOCAL=/home/scratch.<user>/hf-local
#   export HF_TOKEN=<your-hf-token>           # for gated models; auto-injected into all tasks
#   cd tools/launcher
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
#
# Usage (local Docker):
#   cd tools/launcher
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
#
# Override model/quant via CLI:
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
#     pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
#     pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
#     --yes
```
**Review comment (Contributor, on lines 14 to 24):** Usage examples are inconsistent with this file and the wrapper argument contract. Suggested doc fix:

```diff
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml hf_local=/mnt/hf-local --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
-#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
-#   pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml \
+#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
 #   --yes
```
```yaml
job_name: hf_ptq_nvfp4
pipeline:
  skip: false
  allow_to_fail: false
  note: "HF PTQ with nvfp4"

  global_vars:
    hf_local: /hf-local/
    hf_model: Qwen/Qwen3-8B

  # Downloads model if needed, then runs huggingface_example.sh
  task_0:
    script: common/hf/ptq.sh
    args:
      - --repo <<global_vars.hf_model>>
      - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
      - --
      - --quant nvfp4
      - --tasks quant
    slurm_config:
      _factory_: "slurm_factory"
      nodes: 1
      ntasks_per_node: 1
      gpus_per_node: 1
      time: "04:00:00"
      container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc7
```
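The launcher's actual templating engine is not part of this diff; as an illustration only, a naive substitution of the `<<global_vars.*>>` placeholders in `task_0` with the defaults above would resolve the args like this (the regex-based stand-in is an assumption, not the launcher's real mechanism):

```python
import re

global_vars = {"hf_local": "/hf-local/", "hf_model": "Qwen/Qwen3-8B"}

args = [
    "--repo <<global_vars.hf_model>>",
    "--local-dir <<global_vars.hf_local>><<global_vars.hf_model>>",
    "--",
    "--quant nvfp4",
    "--tasks quant",
]

# naive stand-in for the launcher's substitution step (illustration only)
resolved = [
    re.sub(r"<<global_vars\.(\w+)>>", lambda m: global_vars[m.group(1)], a)
    for a in args
]
print(resolved)
# ['--repo Qwen/Qwen3-8B', '--local-dir /hf-local/Qwen/Qwen3-8B', '--', '--quant nvfp4', '--tasks quant']
```

So with the defaults, `ptq.sh` is effectively invoked with `--repo Qwen/Qwen3-8B --local-dir /hf-local/Qwen/Qwen3-8B -- --quant nvfp4 --tasks quant`, matching the wrapper's `--` contract.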
**Review comment (Contributor, on lines 32 to 51):** Add required model metadata environment variables for this new model config. This new model YAML does not define the model metadata environment variables that the coding guidelines require model configs to set.
This file was deleted.
Changed file (Python launcher entry point):

```
@@ -30,6 +30,7 @@
import getpass
import os
import subprocess  # nosec B404
```
**Review comment (Contributor):** Remove the `# nosec` suppressions. These suppressions bypass enforced security checks without documented justification/approval in this PR. Suggested change:

```diff
-import subprocess  # nosec B404
+import subprocess
 ...
-    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607
+    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
```

As per coding guidelines, Bandit security checks are enforced via pre-commit hooks. Also applies to: 95-95.
```
import warnings

import nemo_run as run

@@ -61,10 +62,11 @@
        "modules/Megatron-LM/examples/*",
        "modules/Megatron-LM/*.py",
        "modules/Model-Optimizer/modelopt/*",
        "modules/Model-Optimizer/modelopt_recipes/*",
        "modules/Model-Optimizer/examples/*",
        "common/*",
    ],
    relative_path=[LAUNCHER_DIR] * 6,
    relative_path=[LAUNCHER_DIR] * 7,
)

MODELOPT_SRC_PATH = os.path.join(LAUNCHER_DIR, "modules/Model-Optimizer/modelopt")

@@ -84,8 +86,14 @@ def launch(
    user: str = getpass.getuser(),
    identity: str = None,  # noqa: RUF013
    detach: bool = False,
    clean: bool = False,
) -> None:
    """Launch ModelOpt jobs on Slurm or locally with Docker."""
    if clean:
        examples_dir = os.path.join(_mo_symlink, "examples")
        print(f"Cleaning {examples_dir} with git clean -xdf ...")
        subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607

    if "NEMORUN_HOME" not in os.environ:
        warnings.warn("NEMORUN_HOME is not set. Defaulting to current working directory.")
    run.config.set_nemorun_home(os.environ.get("NEMORUN_HOME", os.getcwd()))
```
Changed file: `tools/launcher/slurm_config.py`

```
@@ -41,6 +41,7 @@ class SlurmConfig:
    nodes: int = 1
    ntasks_per_node: int = 1
    gpus_per_node: int = 1
    time: str = "04:00:00"
    local: bool = False

@@ -49,7 +50,7 @@ class SlurmConfig:
def slurm_factory(
    host: str = os.environ.get("SLURM_HOST", ""),
    account: str = os.environ.get("SLURM_ACCOUNT", ""),
    partition: str = "batch",
    partition: str = os.environ.get("SLURM_PARTITION", "batch"),
```
**Review comment (Contributor):** Add explicit partition isolation to tests. Tests at lines 71–85 of `tools/launcher/tests/test_slurm_config.py` call `slurm_factory()` without explicitly setting `partition`, and they do not patch `SLURM_PARTITION`; with this change their results now depend on whatever `SLURM_PARTITION` happens to be set in the test environment. The tests should clear or pin the variable (e.g. with `monkeypatch`).
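The reviewer's concern can be demonstrated with a small sketch (`resolve_partition` is a hypothetical stand-in for the parameter's default expression, shown as a function so the env lookup is repeatable):

```python
import os

def resolve_partition() -> str:
    # mirrors the changed default: SLURM_PARTITION wins, else "batch"
    return os.environ.get("SLURM_PARTITION", "batch")

# without isolation, the result depends on the ambient environment:
os.environ.pop("SLURM_PARTITION", None)
print(resolve_partition())  # batch

os.environ["SLURM_PARTITION"] = "interactive"
print(resolve_partition())  # interactive

os.environ.pop("SLURM_PARTITION", None)  # clean up, as isolated tests should
```

Note that in the real signature the expression is a parameter default, so it is evaluated once when the module is imported; tests therefore need to pin or clear `SLURM_PARTITION` before import as well as before each call.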
```
    nodes: int = 1,
    ntasks_per_node: int = 1,
    gpus_per_node: int = 1,

@@ -60,6 +61,7 @@ def slurm_factory(
    ],
    srun_args: list[str] = ["--no-container-mount-home"],
    array: str = None,  # noqa: RUF013
    time: str = "04:00:00",
) -> SlurmConfig:
    """Generic Slurm factory — configure via environment variables or CLI overrides."""
    return SlurmConfig(

@@ -74,4 +76,5 @@ def slurm_factory(
        container_mounts=container_mounts,
        srun_args=srun_args,
        array=array,
        time=time,
    )
```
**Review comment (Contributor, on `ptq.sh`):** Critical: hardcoded `--trust_remote_code` enables arbitrary code execution. The `--trust_remote_code` flag is hardcoded, which allows execution of arbitrary Python code bundled with HuggingFace checkpoints. If `REPO` points to a malicious or compromised model, this creates a remote code execution (RCE) vector. Per coding guidelines: "Do not hardcode `trust_remote_code=True`... Let the caller decide via a parameter; default to `False`." Suggested fix: make `trust_remote_code` configurable in the wrapper, pass it through at the exec line only when requested, and have the YAML config explicitly opt in.
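The direction of the reviewer's fix can be sketched as follows (Python for illustration only; the actual wrapper is bash, and `build_hf_example_cmd` is a hypothetical helper, not code from this PR):

```python
def build_hf_example_cmd(
    hf_example: str,
    local_dir: str,
    ptq_args: list[str],
    trust_remote_code: bool = False,  # opt-in, per the guideline
) -> list[str]:
    """Assemble the huggingface_example.sh invocation without hardcoding
    --trust_remote_code; the caller (e.g. the YAML config) must opt in."""
    cmd = ["bash", hf_example, "--model", local_dir]
    if trust_remote_code:
        cmd.append("--trust_remote_code")
    return cmd + list(ptq_args)

# default: flag absent, so untrusted repos cannot run bundled code
print(build_hf_example_cmd("huggingface_example.sh", "/hf-local/m", ["--quant", "nvfp4"]))
# explicit opt-in: flag present
print(build_hf_example_cmd("huggingface_example.sh", "/hf-local/m",
                           ["--quant", "nvfp4"], trust_remote_code=True))
```

Defaulting to `False` shifts the trust decision to the config that names the repo, which is the only place that knows whether the checkpoint's bundled code is acceptable.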