Add HuggingFace PTQ pipeline to launcher #1100
Conversation
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough

Introduces a new HF PTQ wrapper script that manages model acquisition and invokes a downstream example; removes the old hf_ptq launcher; adds Slurm time configuration; and expands launcher packaging.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Wrapper as tools/launcher/common/hf/ptq.sh
    participant HFHub as HuggingFace Hub (huggingface-cli)
    participant FS as Local FS
    participant Example as downstream huggingface_example.sh
    User->>Wrapper: run ptq.sh --repo <org/model> --local-dir <path> -- --<ptq-args>
    Wrapper->>FS: check for <local_dir>/config.json
    alt config exists
        Wrapper-->>FS: skip download
    else missing
        Wrapper->>Wrapper: ensure huggingface_hub installed (pip, best-effort)
        Wrapper->>HFHub: huggingface-cli download "<repo>" --local-dir "<local_dir>"
        HFHub-->>FS: write model files to <local_dir>
    end
    Wrapper->>Example: exec bash huggingface_example.sh --model "<local_dir>" --trust_remote_code <ptq-args>
    Example-->>FS: read model, run PTQ, write outputs
    Example-->>User: exit status / logs
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)
✅ Passed checks (3 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1100       +/-   ##
===========================================
- Coverage   70.18%   54.53%   -15.65%
===========================================
  Files         230      348      +118
  Lines       26080    39778    +13698
===========================================
+ Hits        18304    21694     +3390
- Misses       7776    18084    +10308
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tools/launcher/common/hf/ptq.sh (2)
52-53: Suppressed pip errors may hide installation failures.

The `|| true` suppresses all pip errors, which could mask genuine issues (network failures, incompatible Python versions, etc.). If `huggingface_hub` fails to install, the subsequent `huggingface-cli` command will fail with a confusing error. Consider checking whether `huggingface-cli` is available before attempting the download, or letting pip errors propagate.

♻️ Suggested improvement

```diff
-  pip install -q huggingface_hub 2>/dev/null || true
-  huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
+  if ! command -v huggingface-cli &>/dev/null; then
+    echo "Installing huggingface_hub..."
+    pip install -q huggingface_hub || { echo "Failed to install huggingface_hub" >&2; exit 1; }
+  fi
+  huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
```
48-49: Existence check based solely on `config.json` may be insufficient.

A partially downloaded model could have `config.json` but be missing critical files (weights, tokenizer). Consider a more robust completeness check, or document that users should manually clean partial downloads.
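A minimal runnable sketch of the download-complete-marker idea suggested above; the `.download_complete` name and the `fetch_if_needed` helper are illustrative stand-ins, not part of the actual script:

```shell
# Hypothetical sketch: skip the download only once a completion marker exists,
# rather than trusting config.json alone. Marker name and helper are assumptions.
LOCAL_DIR="$(mktemp -d)"
MARKER="$LOCAL_DIR/.download_complete"

fetch_if_needed() {
  if [[ -f "$MARKER" ]]; then
    echo "skip"
    return 0
  fi
  # Placeholder for: huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
  echo '{}' > "$LOCAL_DIR/config.json"
  touch "$MARKER"   # written only after the download fully succeeds
  echo "downloaded"
}

fetch_if_needed   # first call: downloads
fetch_if_needed   # second call: skips
```

A partially written directory would lack the marker, so a rerun retries the download instead of silently reusing an incomplete checkpoint.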
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e39d9517-5fe7-4c98-baa2-0f0e464950ac
📒 Files selected for processing (5)
- tools/launcher/common/hf/ptq.sh
- tools/launcher/core.py
- tools/launcher/examples/llm_ptq/hf_ptq.yaml
- tools/launcher/launch.py
- tools/launcher/slurm_config.py
In tools/launcher/common/hf/ptq.sh:

```sh
HF_EXAMPLE="${script_dir}/../../modules/Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh"

echo "Running huggingface_example.sh --model $LOCAL_DIR --trust_remote_code ${PTQ_ARGS[*]}"
exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
```
Critical: Hardcoded --trust_remote_code enables arbitrary code execution.
The --trust_remote_code flag is hardcoded, which allows execution of arbitrary Python code bundled with HuggingFace checkpoints. If REPO points to a malicious or compromised model, this creates a remote code execution (RCE) vector.
Per coding guidelines: "Do not hardcode trust_remote_code=True... Let the caller decide via a parameter; default to False."
🔒 Suggested fix: Make trust_remote_code configurable

```diff
 REPO=""
 LOCAL_DIR=""
 PTQ_ARGS=()
+TRUST_REMOTE_CODE=""

 # Parse wrapper args up to "--", collect the rest for huggingface_example.sh
 while [[ $# -gt 0 ]]; do
   case "$1" in
     --repo) REPO="$2"; shift 2 ;;
     --local-dir) LOCAL_DIR="$2"; shift 2 ;;
+    --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
     --) shift; PTQ_ARGS=("$@"); break ;;
     *) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
   esac
 done
```

Then at the exec line:

```diff
-exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
+exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" $TRUST_REMOTE_CODE "${PTQ_ARGS[@]}"
```

The YAML config would then need to explicitly opt in:

```yaml
args:
  - --repo <<global_vars.hf_model>>
  - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
  - --trust-remote-code
  - --
  - --quant nvfp4
  - --tasks quant
```
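To sanity-check the parsing loop in the suggested fix, here is a self-contained run of the same logic with sample arguments injected via `set --` (the argument values are made up for illustration):

```shell
# Same parsing loop as the suggested fix, exercised with sample arguments.
REPO=""; LOCAL_DIR=""; PTQ_ARGS=(); TRUST_REMOTE_CODE=""
set -- --repo Qwen/Qwen3-8B --local-dir /tmp/m --trust-remote-code -- --quant nvfp4

while [[ $# -gt 0 ]]; do
  case "$1" in
    --repo) REPO="$2"; shift 2 ;;
    --local-dir) LOCAL_DIR="$2"; shift 2 ;;
    --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
    --) shift; PTQ_ARGS=("$@"); break ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done

echo "repo=$REPO trust=$TRUST_REMOTE_CODE ptq_args=${PTQ_ARGS[*]}"
```

Because `$TRUST_REMOTE_CODE` is expanded unquoted at the exec line, it disappears entirely when empty, so the downstream script only sees the flag on explicit opt-in.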
In tools/launcher/examples/llm_ptq/hf_ptq.yaml:

```yaml
# HuggingFace PTQ via huggingface_example.sh
#
# Quantizes a HuggingFace model using examples/llm_ptq/scripts/huggingface_example.sh.
# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
```
Documentation mismatch with actual configuration.
Line 4 states "Default: Qwen/Qwen3.5-9B" but the actual configuration at line 34 uses Qwen/Qwen3-8B. Update the comment to match the implementation.
📝 Suggested fix

```diff
-# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
+# Default: Qwen/Qwen3-8B with nvfp4 on 1 GPU.
```
tools/launcher/launch.py (Outdated)
```python
if clean:
    examples_dir = os.path.join(LAUNCHER_DIR, "examples")
    print(f"Cleaning {examples_dir} with git clean -xdf ...")
    subprocess.run(["git", "clean", "-xdf", examples_dir], check=True)  # nosec B603 B607
```
Consider the destructive scope of `git clean -xdf`.

The `-xdf` flags will remove all untracked files, including ignored files and directories, without confirmation. If a user has uncommitted work or local config files in the examples directory, they will be permanently deleted.

Consider either:
- Using `-df` (without `-x`) to preserve `.gitignore`d files
- Adding a warning/confirmation before cleaning
- Documenting this behavior prominently in the `clean` parameter docstring
💡 Suggested documentation improvement

```diff
 def launch(
     job_name: str = "01_job",
     job_dir: str = os.environ.get("SLURM_JOB_DIR", os.path.expanduser("~/experiments")),
     pipeline: SandboxPipeline = None,
     hf_local: str = None,  # noqa: RUF013
     user: str = getpass.getuser(),
     identity: str = None,  # noqa: RUF013
     detach: bool = False,
-    clean: bool = False,
+    clean: bool = False,  # WARNING: Runs 'git clean -xdf' on examples/, removing ALL untracked files
 ) -> None:
```
) -> None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/launcher/launch.py` around lines 92 - 95, The current use of
subprocess.run(["git", "clean", "-xdf", examples_dir], check=True) is
dangerously destructive; update the behavior in the launch.py block that checks
the clean variable (where examples_dir is derived from LAUNCHER_DIR) by either:
1) changing the git flags to ["git","clean","-df", examples_dir] to preserve
.gitignore’d files, or 2) prompting the user for confirmation before running the
clean command (e.g., use input() to require an explicit "yes"), and/or 3)
documenting the destructive nature in the function/class docstring or the clean
parameter docs so callers are aware; choose one or more of these fixes and apply
them around the subprocess.run invocation and the clean parameter handling.
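One possible shape for the confirmation option suggested above, sketched under assumptions: `confirm`, the injected `reply` argument, and the dry-run-first behavior are illustrative, not the launcher's actual API.

```python
import subprocess


def confirm(prompt: str, reply: str) -> bool:
    """In the real flow `reply` would come from input(); injected here for testability."""
    return reply.strip().lower() == "yes"


def clean_examples(examples_dir: str, reply: str, dry_run_first: bool = True) -> bool:
    """Run the destructive clean only after a dry run and an explicit 'yes'."""
    if dry_run_first:
        # `git clean -xdn` lists what would be removed without deleting anything.
        subprocess.run(["git", "clean", "-xdn", "."], cwd=examples_dir, check=True)
    if not confirm(f"Permanently delete untracked files in {examples_dir}? [yes/N] ", reply):
        print("Skipping clean.")
        return False
    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
    return True
```

Defaulting to a dry run plus an explicit "yes" keeps the convenience of `--clean` while making the irreversible step opt-in.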
In tools/launcher/slurm_config.py:

```diff
     host: str = os.environ.get("SLURM_HOST", ""),
     account: str = os.environ.get("SLURM_ACCOUNT", ""),
-    partition: str = "batch",
+    partition: str = os.environ.get("SLURM_PARTITION", "batch"),
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if existing tests explicitly set partition or rely on default
rg -n "slurm_factory\(" tools/launcher/tests/ -A 5 | head -60
```

Repository: NVIDIA/Model-Optimizer
Length of output: 3174

🏁 Script executed:

```bash
# Check the actual test implementations and their assertions
sed -n '71,120p' tools/launcher/tests/test_slurm_config.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1679

🏁 Script executed:

```bash
# Check if there are any monkeypatch calls for SLURM_PARTITION
rg "SLURM_PARTITION" tools/launcher/tests/
```

Repository: NVIDIA/Model-Optimizer
Length of output: 48

🏁 Script executed:

```bash
# Check the slurm_factory function signature and implementation
sed -n '1,100p' tools/launcher/slurm_config.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 2722
Add explicit partition isolation to tests.
Tests at lines 71–85 call slurm_factory() without mocking SLURM_PARTITION. If this variable is set in the CI environment, the tests will not evaluate the default "batch" value as intended. Use monkeypatch.setenv("SLURM_PARTITION", "") (or delete it) before these test calls to ensure consistent behavior, or explicitly pass partition="batch" to isolate tests from environment state.
Actionable comments posted: 3
♻️ Duplicate comments (2)
tools/launcher/launch.py (1)
92-95: ⚠️ Potential issue | 🟡 Minor

`clean=True` still performs destructive cleanup without explicit safeguard. This still runs `git clean -xdf`, which removes ignored/untracked files irreversibly in the examples tree.
tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml (1)
4: ⚠️ Potential issue | 🟡 Minor

Header default is stale and conflicts with actual config. The comment says `Qwen/Qwen3.5-9B ... 8xH200`, but this file runs `Qwen/Qwen3-8B` on 1 GPU.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cb5d3c68-0cc0-476f-8fa5-468667bdf516
📒 Files selected for processing (4)
- tools/launcher/common/hf_ptq/hf_ptq.sh
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
- tools/launcher/launch.py
💤 Files with no reviewable changes (2)
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
- tools/launcher/common/hf_ptq/hf_ptq.sh
In tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml:

```yaml
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
#
# Usage (local Docker):
#   cd tools/launcher
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
#
# Override model/quant via CLI:
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
#     pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
#     pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
#     --yes
```
Usage examples are inconsistent with this file and wrapper argument contract.
These commands reference `examples/llm_ptq/hf_ptq.yaml`, and the override example drops the wrapper-required `--repo`/`--local-dir` flow for `common/hf/ptq.sh`, so users can hit "Unknown argument" errors.
Suggested doc fix

```diff
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml hf_local=/mnt/hf-local --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
-#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
-#   pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml \
+#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
 #   --yes
```
```yaml
global_vars:
  hf_local: /hf-local/
  hf_model: Qwen/Qwen3-8B

# Downloads model if needed, then runs huggingface_example.sh
task_0:
  script: common/hf/ptq.sh
  args:
    - --repo <<global_vars.hf_model>>
    - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
    - --
    - --quant nvfp4
    - --tasks quant
  slurm_config:
    _factory_: "slurm_factory"
    nodes: 1
    ntasks_per_node: 1
    gpus_per_node: 1
    time: "04:00:00"
    container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc7
```
Add required model metadata environment variables for this new model config.
This new model YAML does not define MLM_MODEL_CFG and QUANT_CFG, which are required by launcher conventions for new model configs.
As per coding guidelines, “Set MLM_MODEL_CFG environment variable… [and] set QUANT_CFG environment variable… when adding a new model config.”
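A hedged sketch of what such an env block could look like; the variable values and the `env:` key placement are assumptions — check the launcher's schema before adopting:

```yaml
task_0:
  script: common/hf/ptq.sh
  env:
    MLM_MODEL_CFG: Qwen/Qwen3-8B   # assumed model config identifier
    QUANT_CFG: nvfp4               # assumed quantization profile name
```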
In tools/launcher/launch.py:

```python
import getpass
import os
import subprocess  # nosec B404
```
Remove # nosec Bandit bypasses or add a formal security exception workflow.
These suppressions bypass enforced security checks without documented justification/approval in this PR.
Suggested change

```diff
-import subprocess  # nosec B404
+import subprocess
 ...
-    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607
+    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
```

As per coding guidelines, "Bandit security checks are enforced via pre-commit hooks. `# nosec` comments are not allowed as bypasses…".

Also applies to: line 95.
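If the team prefers removing the bypass over registering a formal exception, one pattern that addresses Bandit's B607 partial-path concern is resolving the executable explicitly before calling it. This is an illustrative sketch, not the repository's policy; `git_clean_dry_run` is a hypothetical helper:

```python
import shutil
import subprocess


def git_clean_dry_run(repo_dir: str) -> str:
    """List what `git clean` would remove, using an absolute path to git."""
    git = shutil.which("git")
    if git is None:
        raise FileNotFoundError("git executable not found on PATH")
    result = subprocess.run(
        [git, "clean", "-xdn", "."],  # -n: dry run, deletes nothing
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing a resolved absolute path and a fixed argument list keeps the call auditable without inline suppressions.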
What does this PR do?
Type of change: New feature
Adds a HuggingFace PTQ pipeline to the launcher, replacing the old `hf_ptq.sh`/`hf_ptq_local.yaml` approach with a cleaner wrapper around `huggingface_example.sh`.

Key changes:
- `common/hf/ptq.sh` — wrapper script that downloads the model via `huggingface-cli` if needed, then delegates to `examples/llm_ptq/scripts/huggingface_example.sh`
- `examples/Qwen/Qwen3-8B/hf_ptq.yaml` — example config for Qwen3-8B nvfp4 quantization, supports both Slurm and local Docker
- Removed `common/hf_ptq/hf_ptq.sh` and `examples/Qwen/Qwen3-8B/hf_ptq_local.yaml` — replaced by the new unified pipeline
- `SlurmConfig.time` field replaces the hardcoded `"04:00:00"` in `build_slurm_executor`
- `slurm_factory` now reads the `SLURM_PARTITION` env var (default: `batch`)
- `--clean` flag — new `launch.py` option to `git clean -xdf` the examples directory before job submission
- `modelopt_recipes/` — added to the nemo_run packager include list

Testing
Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Summary by CodeRabbit
New Features
- `clean` option to purge example directories before execution.

Improvements

Deprecated