Merged
62 changes: 62 additions & 0 deletions tools/launcher/common/hf/ptq.sh
@@ -0,0 +1,62 @@
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# HuggingFace PTQ wrapper: downloads the model if needed, then runs huggingface_example.sh.
#
# Usage:
# ptq.sh --repo <org/model> --local-dir <path> -- [huggingface_example.sh args...]
#
# Everything before "--" is handled by this wrapper (download logic).
# Everything after "--" is passed directly to huggingface_example.sh.
# The --model arg is automatically set to <local-dir> for huggingface_example.sh.

set -e

REPO=""
LOCAL_DIR=""
PTQ_ARGS=()

# Parse wrapper args up to "--", collect the rest for huggingface_example.sh
while [[ $# -gt 0 ]]; do
case "$1" in
--repo) REPO="$2"; shift 2 ;;
--local-dir) LOCAL_DIR="$2"; shift 2 ;;
--) shift; PTQ_ARGS=("$@"); break ;;
*) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
esac
done

if [ -z "$REPO" ] || [ -z "$LOCAL_DIR" ]; then
echo "Usage: ptq.sh --repo <org/model> --local-dir <path> -- [huggingface_example.sh args...]" >&2
exit 1
fi

# --- Step 1: Download model if not already present ---
if [ -f "$LOCAL_DIR/config.json" ]; then
echo "Model already exists at $LOCAL_DIR, skipping download."
else
echo "Downloading $REPO to $LOCAL_DIR ..."
pip install -q huggingface_hub 2>/dev/null || true
huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
echo "Download complete: $LOCAL_DIR"
fi

# --- Step 2: Run huggingface_example.sh ---
script_dir="$(dirname "$(readlink -f "$0")")"
HF_EXAMPLE="${script_dir}/../../modules/Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh"

echo "Running huggingface_example.sh --model $LOCAL_DIR --trust_remote_code ${PTQ_ARGS[*]}"
exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
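The wrapper's argument contract can be exercised standalone. The sketch below (model name and local path are illustrative) reproduces the split at `--` that the parse loop above implements:

```shell
#!/usr/bin/env bash
# Illustrative invocation of the wrapper:
#   ptq.sh --repo Qwen/Qwen3-8B --local-dir /tmp/model -- --quant nvfp4 --tasks quant
set -- --repo Qwen/Qwen3-8B --local-dir /tmp/model -- --quant nvfp4 --tasks quant

WRAPPER_ARGS=()
PTQ_ARGS=()
while [[ $# -gt 0 ]]; do
  case "$1" in
    --) shift; PTQ_ARGS=("$@"); break ;;   # everything after -- goes downstream
    *)  WRAPPER_ARGS+=("$1"); shift ;;     # everything before -- stays with the wrapper
  esac
done

echo "wrapper: ${WRAPPER_ARGS[*]}"   # wrapper: --repo Qwen/Qwen3-8B --local-dir /tmp/model
echo "ptq: ${PTQ_ARGS[*]}"           # ptq: --quant nvfp4 --tasks quant
```

Everything before `--` is consumed here; everything after is forwarded verbatim to huggingface_example.sh.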
⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded --trust_remote_code enables arbitrary code execution.

The --trust_remote_code flag is hardcoded, which allows execution of arbitrary Python code bundled with HuggingFace checkpoints. If REPO points to a malicious or compromised model, this creates a remote code execution (RCE) vector.

Per coding guidelines: "Do not hardcode trust_remote_code=True... Let the caller decide via a parameter; default to False."

🔒 Suggested fix: Make trust_remote_code configurable
 REPO=""
 LOCAL_DIR=""
 PTQ_ARGS=()
+TRUST_REMOTE_CODE=""
 
 # Parse wrapper args up to "--", collect the rest for huggingface_example.sh
 while [[ $# -gt 0 ]]; do
     case "$1" in
         --repo) REPO="$2"; shift 2 ;;
         --local-dir) LOCAL_DIR="$2"; shift 2 ;;
+        --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
         --) shift; PTQ_ARGS=("$@"); break ;;
         *) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
     esac
 done

Then at the exec line:

-exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
+exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" $TRUST_REMOTE_CODE "${PTQ_ARGS[@]}"

The YAML config would then need to explicitly opt-in:

args:
  - --repo <<global_vars.hf_model>>
  - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
  - --trust-remote-code
  - --
  - --quant nvfp4
  - --tasks quant
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/common/hf/ptq.sh` at line 62, The exec line currently always
passes the hardcoded --trust_remote_code flag (exec bash "$HF_EXAMPLE" --model
"$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"), which creates an RCE risk;
change this so trust_remote_code is configurable (e.g., read a parameter/env var
like TRUST_REMOTE_CODE or an extra CLI arg) and only append the
--trust-remote-code token to the exec arguments when that value is explicitly
true/opted-in (default false). Update the invocation building logic that
constructs the final exec command for HF_EXAMPLE/LOCAL_DIR/PTQ_ARGS to
conditionally include the --trust-remote-code flag instead of always passing it,
and ensure downstream YAML/launcher docs show how to opt in (the PR already
suggests adding --trust-remote-code to args when desired).

39 changes: 0 additions & 39 deletions tools/launcher/common/hf_ptq/hf_ptq.sh

This file was deleted.

2 changes: 1 addition & 1 deletion tools/launcher/core.py
@@ -268,7 +268,7 @@ def build_slurm_executor(
container_image=slurm_config.container,
container_mounts=container_mounts,
array=slurm_config.array,
time="04:00:00",
time=slurm_config.time,
mem="0",
retries=0,
packager=packager,
51 changes: 51 additions & 0 deletions tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
@@ -0,0 +1,51 @@
# HuggingFace PTQ via huggingface_example.sh
#
# Quantizes a HuggingFace model using examples/llm_ptq/scripts/huggingface_example.sh.
# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
⚠️ Potential issue | 🟡 Minor

Documentation mismatch with actual configuration.

Line 4 states "Default: Qwen/Qwen3.5-9B" but the actual configuration at line 34 uses Qwen/Qwen3-8B. Update the comment to match the implementation.

📝 Suggested fix
-# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
+# Default: Qwen/Qwen3-8B with nvfp4 on 1 GPU.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/llm_ptq/hf_ptq.yaml` at line 4, Update the header
comment that documents the default model to match the actual configured model:
change the text "Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200." to
reflect "Qwen/Qwen3-8B" (and keep the rest of the note about nvfp4_mlp_only and
8xH200 intact) so the comment matches the configuration used later in
hf_ptq.yaml.

#
# Usage (Slurm):
# export SLURM_HOST=<slurm-host>
# export SLURM_ACCOUNT=<your-team>
# export SLURM_PARTITION=<your-partition> # default: batch
# export SLURM_JOB_DIR=/home/scratch.<user>/experiments
# export SLURM_HF_LOCAL=/home/scratch.<user>/hf-local
# export HF_TOKEN=<your-hf-token> # for gated models; auto-injected into all tasks
# cd tools/launcher
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
#
# Usage (local Docker):
# cd tools/launcher
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
#
# Override model/quant via CLI:
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
# pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
# pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
# --yes
Comment on lines +14 to +24
⚠️ Potential issue | 🟡 Minor

Usage examples are inconsistent with this file and wrapper argument contract.

These commands reference examples/llm_ptq/hf_ptq.yaml even though this file lives at examples/Qwen/Qwen3-8B/hf_ptq.yaml, and the override example replaces the wrapper-required --repo/--local-dir arguments for common/hf/ptq.sh with --model, so users who follow it will hit "Unknown argument" errors.

Suggested doc fix
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml --yes
...
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml hf_local=/mnt/hf-local --yes
...
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
-#       pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
-#       pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml \
+#       pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
 #       --yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml` around lines 14 - 24,
Update the usage examples to point to the actual YAML in this folder
(tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml) and include the
wrapper-required repository/local-dir flags for common/hf/ptq.sh (e.g., add
--repo=... or --local-dir=<<global_vars.hf_local>> before the model/quant
override), and make the override CLI use the correct global var names
(pipeline.global_vars.hf_model and pipeline.global_vars.hf_local) so the example
lines that currently reference examples/llm_ptq/hf_ptq.yaml and omit
--repo/--local-dir are replaced with consistent commands that pass the wrapper
args and the correct YAML path.


job_name: hf_ptq_nvfp4
pipeline:
skip: false
allow_to_fail: false
note: "HF PTQ with nvfp4"

global_vars:
hf_local: /hf-local/
hf_model: Qwen/Qwen3-8B

# Downloads model if needed, then runs huggingface_example.sh
task_0:
script: common/hf/ptq.sh
args:
- --repo <<global_vars.hf_model>>
- --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
- --
- --quant nvfp4
- --tasks quant
slurm_config:
_factory_: "slurm_factory"
nodes: 1
ntasks_per_node: 1
gpus_per_node: 1
time: "04:00:00"
container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc7
Comment on lines +32 to +51
⚠️ Potential issue | 🟠 Major

Add required model metadata environment variables for this new model config.

This new model YAML does not define MLM_MODEL_CFG and QUANT_CFG, which are required by launcher conventions for new model configs.

As per coding guidelines, “Set MLM_MODEL_CFG environment variable… [and] set QUANT_CFG environment variable… when adding a new model config.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml` around lines 32 - 51, The
YAML is missing the required environment variables MLM_MODEL_CFG and QUANT_CFG
for this new model config; add them (e.g., under global_vars or inside task_0's
environment block) so the launcher can pick up model and quantization settings.
Specifically, set MLM_MODEL_CFG to the model config identifier for Qwen3-8B
(matching <<global_vars.hf_model>>/config name) and set QUANT_CFG to the chosen
quantization profile (e.g., "nvfp4" or a config name used by ptq.sh); update
either global_vars (preferred) or add an env: map under task_0 so MLM_MODEL_CFG
and QUANT_CFG are exported when script common/hf/ptq.sh runs.
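A minimal sketch of the env-map route described above, assuming the variable names from the cited guideline (the values are illustrative only):

```shell
#!/usr/bin/env bash
# Export the model metadata before the wrapper runs, so downstream scripts
# can read it; both names come from the launcher guideline, values are examples.
export MLM_MODEL_CFG="Qwen/Qwen3-8B"
export QUANT_CFG="nvfp4"

echo "model=$MLM_MODEL_CFG quant=$QUANT_CFG"   # model=Qwen/Qwen3-8B quant=nvfp4
```

In the YAML this would correspond to an `env:` map under `task_0` (or entries in `global_vars`) rather than literal exports.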

27 changes: 0 additions & 27 deletions tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml

This file was deleted.

10 changes: 9 additions & 1 deletion tools/launcher/launch.py
Expand Up @@ -30,6 +30,7 @@

import getpass
import os
import subprocess # nosec B404
⚠️ Potential issue | 🔴 Critical

Remove # nosec Bandit bypasses or add a formal security exception workflow.

These suppressions bypass enforced security checks without documented justification/approval in this PR.

Suggested change
-import subprocess  # nosec B404
+import subprocess
...
-        subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607
+        subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)

As per coding guidelines, “Bandit security checks are enforced via pre-commit hooks. # nosec comments are not allowed as bypasses…”.

Also applies to: 95-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/launch.py` at line 33, The import line "import subprocess"
currently has a "# nosec B404" bypass; remove the "# nosec" comment and either
(a) eliminate the need for suppressing by using a safer API/pattern or
validating inputs before calling subprocess, or (b) add a formal, documented
security exception entry (approved in the security exceptions registry) and
reference that approval in a short inline comment; apply the same change to the
other occurrence of the "# nosec" on the "import subprocess" usage so no Bandit
bypasses remain in launch.py.

import warnings

import nemo_run as run
@@ -61,10 +62,11 @@
"modules/Megatron-LM/examples/*",
"modules/Megatron-LM/*.py",
"modules/Model-Optimizer/modelopt/*",
"modules/Model-Optimizer/modelopt_recipes/*",
"modules/Model-Optimizer/examples/*",
"common/*",
],
relative_path=[LAUNCHER_DIR] * 6,
relative_path=[LAUNCHER_DIR] * 7,
)

MODELOPT_SRC_PATH = os.path.join(LAUNCHER_DIR, "modules/Model-Optimizer/modelopt")
@@ -84,8 +86,14 @@ def launch(
user: str = getpass.getuser(),
identity: str = None, # noqa: RUF013
detach: bool = False,
clean: bool = False,
) -> None:
"""Launch ModelOpt jobs on Slurm or locally with Docker."""
if clean:
examples_dir = os.path.join(_mo_symlink, "examples")
print(f"Cleaning {examples_dir} with git clean -xdf ...")
subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True) # nosec B603 B607

if "NEMORUN_HOME" not in os.environ:
warnings.warn("NEMORUN_HOME is not set. Defaulting to current working directory.")
run.config.set_nemorun_home(os.environ.get("NEMORUN_HOME", os.getcwd()))
5 changes: 4 additions & 1 deletion tools/launcher/slurm_config.py
@@ -41,6 +41,7 @@ class SlurmConfig:
nodes: int = 1
ntasks_per_node: int = 1
gpus_per_node: int = 1
time: str = "04:00:00"
local: bool = False


@@ -49,7 +50,7 @@
def slurm_factory(
host: str = os.environ.get("SLURM_HOST", ""),
account: str = os.environ.get("SLURM_ACCOUNT", ""),
partition: str = "batch",
partition: str = os.environ.get("SLURM_PARTITION", "batch"),
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if existing tests explicitly set partition or rely on default
rg -n "slurm_factory\(" tools/launcher/tests/ -A 5 | head -60

Repository: NVIDIA/Model-Optimizer

Length of output: 3174


🏁 Script executed:

# Check the actual test implementations and their assertions
sed -n '71,120p' tools/launcher/tests/test_slurm_config.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1679


🏁 Script executed:

# Check if there are any monkeypatch calls for SLURM_PARTITION
rg "SLURM_PARTITION" tools/launcher/tests/

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Check the slurm_factory function signature and implementation
sed -n '1,100p' tools/launcher/slurm_config.py

Repository: NVIDIA/Model-Optimizer

Length of output: 2722


Add explicit partition isolation to tests.

Tests at lines 71–85 call slurm_factory() without mocking SLURM_PARTITION. If this variable is set in the CI environment, the tests will not evaluate the default "batch" value as intended. Use monkeypatch.setenv("SLURM_PARTITION", "") (or delete it) before these test calls to ensure consistent behavior, or explicitly pass partition="batch" to isolate tests from environment state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/slurm_config.py` at line 53, Tests calling slurm_factory()
rely on the default partition but can be affected by the CI environment's
SLURM_PARTITION; ensure isolation by either clearing or setting the env var
before those test calls (use monkeypatch.setenv("SLURM_PARTITION", "") or
monkeypatch.delenv("SLURM_PARTITION", raising=False) in the tests that exercise
slurm_factory) or explicitly pass partition="batch" when invoking slurm_factory
so the code path using partition: str = os.environ.get("SLURM_PARTITION",
"batch") is exercised deterministically.
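The same isolation can be demonstrated from the shell with `env -u`, which strips the variable from the child's environment so the `"batch"` default fires:

```shell
#!/usr/bin/env bash
export SLURM_PARTITION="interactive"   # simulate a polluted CI environment

# With the variable inherited, the default never fires:
WITH_ENV=$(bash -c 'echo "${SLURM_PARTITION:-batch}"')

# env -u removes it for the child process, exercising the default path:
WITHOUT_ENV=$(env -u SLURM_PARTITION bash -c 'echo "${SLURM_PARTITION:-batch}"')

echo "$WITH_ENV $WITHOUT_ENV"   # interactive batch
```

In pytest the equivalent is `monkeypatch.delenv("SLURM_PARTITION", raising=False)` before calling `slurm_factory()`.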

nodes: int = 1,
ntasks_per_node: int = 1,
gpus_per_node: int = 1,
@@ -60,6 +61,7 @@ def slurm_factory(
],
srun_args: list[str] = ["--no-container-mount-home"],
array: str = None, # noqa: RUF013
time: str = "04:00:00",
) -> SlurmConfig:
"""Generic Slurm factory — configure via environment variables or CLI overrides."""
return SlurmConfig(
@@ -74,4 +76,5 @@ def slurm_factory(
container_mounts=container_mounts,
srun_args=srun_args,
array=array,
time=time,
)
1 change: 1 addition & 0 deletions tools/launcher/tests/test_slurm_executor.py
@@ -168,6 +168,7 @@ def test_executor_params(self, mock_tunnel, mock_executor):
ntasks_per_node=8,
gpus_per_node=8,
array="0-3",
time="04:00:00",
)

packager = MagicMock()