Add HuggingFace PTQ pipeline to launcher #1100
Conversation
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough

Introduces a new HF PTQ wrapper script that manages model acquisition and invokes a downstream example; removes the old hf_ptq launcher; adds Slurm time configuration; and expands launcher packaging.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant Wrapper as tools/launcher/common/hf/ptq.sh
    participant HFHub as HuggingFace Hub (huggingface-cli)
    participant FS as Local FS
    participant Example as downstream huggingface_example.sh
    User->>Wrapper: run ptq.sh --repo <org/model> --local-dir <path> -- --<ptq-args>
    Wrapper->>FS: check for <local_dir>/config.json
    alt config exists
        Wrapper-->>FS: skip download
    else missing
        Wrapper->>Wrapper: ensure huggingface_hub installed (pip, best-effort)
        Wrapper->>HFHub: huggingface-cli download "<repo>" --local-dir "<local_dir>"
        HFHub-->>FS: write model files to <local_dir>
    end
    Wrapper->>Example: exec bash huggingface_example.sh --model "<local_dir>" --trust_remote_code <ptq-args>
    Example-->>FS: read model, run PTQ, write outputs
    Example-->>User: exit status / logs
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)
✅ Passed checks (3 passed)
Codecov Report

✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1100       +/-   ##
===========================================
- Coverage   70.18%   54.53%   -15.65%
===========================================
  Files         230      348      +118
  Lines       26080    39778    +13698
===========================================
+ Hits        18304    21694     +3390
- Misses       7776    18084    +10308
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tools/launcher/common/hf/ptq.sh (2)
52-53: Suppressed pip errors may hide installation failures.

The `|| true` suppresses all pip errors, which could mask genuine issues (network failures, incompatible Python versions, etc.). If `huggingface_hub` fails to install, the subsequent `huggingface-cli` command will fail with a confusing error. Consider checking whether `huggingface-cli` is available before attempting the download, or letting pip errors propagate.

♻️ Suggested improvement

```diff
-  pip install -q huggingface_hub 2>/dev/null || true
-  huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
+  if ! command -v huggingface-cli &>/dev/null; then
+    echo "Installing huggingface_hub..."
+    pip install -q huggingface_hub || { echo "Failed to install huggingface_hub" >&2; exit 1; }
+  fi
+  huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
```
48-49: Existence check based solely on `config.json` may be insufficient.

A partially downloaded model could have `config.json` but be missing critical files (weights, tokenizer). Consider a more robust completeness check, or document that users should manually clean partial downloads.
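A minimal runnable sketch of the download-complete-marker idea suggested above; the `.download_complete` name and the `fetch_if_needed` helper are illustrative stand-ins, not part of the actual script:

```shell
# Hypothetical sketch: skip the download only once a completion marker exists,
# rather than trusting config.json alone. Marker name and helper are assumptions.
LOCAL_DIR="$(mktemp -d)"
MARKER="$LOCAL_DIR/.download_complete"

fetch_if_needed() {
  if [[ -f "$MARKER" ]]; then
    echo "skip"
    return 0
  fi
  # Placeholder for: huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
  echo '{}' > "$LOCAL_DIR/config.json"
  touch "$MARKER"   # written only after the download fully succeeds
  echo "downloaded"
}

fetch_if_needed   # first call: downloads
fetch_if_needed   # second call: skips
```

A partially written directory would lack the marker, so a rerun retries the download instead of silently reusing an incomplete checkpoint.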
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e39d9517-5fe7-4c98-baa2-0f0e464950ac
📒 Files selected for processing (5)
- tools/launcher/common/hf/ptq.sh
- tools/launcher/core.py
- tools/launcher/examples/llm_ptq/hf_ptq.yaml
- tools/launcher/launch.py
- tools/launcher/slurm_config.py
In tools/launcher/common/hf/ptq.sh:

```sh
HF_EXAMPLE="${script_dir}/../../modules/Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh"

echo "Running huggingface_example.sh --model $LOCAL_DIR --trust_remote_code ${PTQ_ARGS[*]}"
exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
```
Critical: Hardcoded --trust_remote_code enables arbitrary code execution.
The --trust_remote_code flag is hardcoded, which allows execution of arbitrary Python code bundled with HuggingFace checkpoints. If REPO points to a malicious or compromised model, this creates a remote code execution (RCE) vector.
Per coding guidelines: "Do not hardcode trust_remote_code=True... Let the caller decide via a parameter; default to False."
🔒 Suggested fix: Make trust_remote_code configurable

```diff
 REPO=""
 LOCAL_DIR=""
 PTQ_ARGS=()
+TRUST_REMOTE_CODE=""

 # Parse wrapper args up to "--", collect the rest for huggingface_example.sh
 while [[ $# -gt 0 ]]; do
   case "$1" in
     --repo) REPO="$2"; shift 2 ;;
     --local-dir) LOCAL_DIR="$2"; shift 2 ;;
+    --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
     --) shift; PTQ_ARGS=("$@"); break ;;
     *) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
   esac
 done
```

Then at the exec line:

```diff
-exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
+exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" $TRUST_REMOTE_CODE "${PTQ_ARGS[@]}"
```

The YAML config would then need to explicitly opt in:

```yaml
args:
  - --repo <<global_vars.hf_model>>
  - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
  - --trust-remote-code
  - --
  - --quant nvfp4
  - --tasks quant
```
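To sanity-check the parsing loop in the suggested fix, here is a self-contained run of the same logic with sample arguments injected via `set --` (the argument values are made up for illustration):

```shell
# Same parsing loop as the suggested fix, exercised with sample arguments.
REPO=""; LOCAL_DIR=""; PTQ_ARGS=(); TRUST_REMOTE_CODE=""
set -- --repo Qwen/Qwen3-8B --local-dir /tmp/m --trust-remote-code -- --quant nvfp4

while [[ $# -gt 0 ]]; do
  case "$1" in
    --repo) REPO="$2"; shift 2 ;;
    --local-dir) LOCAL_DIR="$2"; shift 2 ;;
    --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
    --) shift; PTQ_ARGS=("$@"); break ;;
    *) echo "Unknown argument: $1" >&2; exit 1 ;;
  esac
done

echo "repo=$REPO trust=$TRUST_REMOTE_CODE ptq_args=${PTQ_ARGS[*]}"
```

Because `$TRUST_REMOTE_CODE` is expanded unquoted at the exec line, it disappears entirely when empty, so the downstream script only sees the flag on explicit opt-in.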
In tools/launcher/examples/llm_ptq/hf_ptq.yaml:

```yaml
# HuggingFace PTQ via huggingface_example.sh
#
# Quantizes a HuggingFace model using examples/llm_ptq/scripts/huggingface_example.sh.
# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
```
Documentation mismatch with actual configuration.
Line 4 states "Default: Qwen/Qwen3.5-9B" but the actual configuration at line 34 uses Qwen/Qwen3-8B. Update the comment to match the implementation.
📝 Suggested fix

```diff
-# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
+# Default: Qwen/Qwen3-8B with nvfp4 on 1 GPU.
```
tools/launcher/launch.py (Outdated)
```python
if clean:
    examples_dir = os.path.join(LAUNCHER_DIR, "examples")
    print(f"Cleaning {examples_dir} with git clean -xdf ...")
    subprocess.run(["git", "clean", "-xdf", examples_dir], check=True)  # nosec B603 B607
```
Consider the destructive scope of `git clean -xdf`.

The `-xdf` flags will remove all untracked files, including ignored files and directories, without confirmation. If a user has uncommitted work or local config files in the examples directory, they will be permanently deleted.

Consider either:
- Using `-df` (without `-x`) to preserve `.gitignore`d files
- Adding a warning/confirmation before cleaning
- Documenting this behavior prominently in the `clean` parameter docstring
💡 Suggested documentation improvement

```diff
 def launch(
     job_name: str = "01_job",
     job_dir: str = os.environ.get("SLURM_JOB_DIR", os.path.expanduser("~/experiments")),
     pipeline: SandboxPipeline = None,
     hf_local: str = None,  # noqa: RUF013
     user: str = getpass.getuser(),
     identity: str = None,  # noqa: RUF013
     detach: bool = False,
-    clean: bool = False,
+    clean: bool = False,  # WARNING: Runs 'git clean -xdf' on examples/, removing ALL untracked files
 ) -> None:
```
) -> None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/launcher/launch.py` around lines 92 - 95, The current use of
subprocess.run(["git", "clean", "-xdf", examples_dir], check=True) is
dangerously destructive; update the behavior in the launch.py block that checks
the clean variable (where examples_dir is derived from LAUNCHER_DIR) by either:
1) changing the git flags to ["git","clean","-df", examples_dir] to preserve
.gitignore’d files, or 2) prompting the user for confirmation before running the
clean command (e.g., use input() to require an explicit "yes"), and/or 3)
documenting the destructive nature in the function/class docstring or the clean
parameter docs so callers are aware; choose one or more of these fixes and apply
them around the subprocess.run invocation and the clean parameter handling.
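One possible shape for the confirmation option suggested above, sketched under assumptions: `confirm`, the injected `reply` argument, and the dry-run-first behavior are illustrative, not the launcher's actual API.

```python
import subprocess


def confirm(prompt: str, reply: str) -> bool:
    """In the real flow `reply` would come from input(); injected here for testability."""
    return reply.strip().lower() == "yes"


def clean_examples(examples_dir: str, reply: str, dry_run_first: bool = True) -> bool:
    """Run the destructive clean only after a dry run and an explicit 'yes'."""
    if dry_run_first:
        # `git clean -xdn` lists what would be removed without deleting anything.
        subprocess.run(["git", "clean", "-xdn", "."], cwd=examples_dir, check=True)
    if not confirm(f"Permanently delete untracked files in {examples_dir}? [yes/N] ", reply):
        print("Skipping clean.")
        return False
    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
    return True
```

Defaulting to a dry run plus an explicit "yes" keeps the convenience of `--clean` while making the irreversible step opt-in.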
In tools/launcher/slurm_config.py:

```diff
     host: str = os.environ.get("SLURM_HOST", ""),
     account: str = os.environ.get("SLURM_ACCOUNT", ""),
-    partition: str = "batch",
+    partition: str = os.environ.get("SLURM_PARTITION", "batch"),
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if existing tests explicitly set partition or rely on default
rg -n "slurm_factory\(" tools/launcher/tests/ -A 5 | head -60
```

Repository: NVIDIA/Model-Optimizer
Length of output: 3174

🏁 Script executed:

```bash
# Check the actual test implementations and their assertions
sed -n '71,120p' tools/launcher/tests/test_slurm_config.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1679

🏁 Script executed:

```bash
# Check if there are any monkeypatch calls for SLURM_PARTITION
rg "SLURM_PARTITION" tools/launcher/tests/
```

Repository: NVIDIA/Model-Optimizer
Length of output: 48

🏁 Script executed:

```bash
# Check the slurm_factory function signature and implementation
sed -n '1,100p' tools/launcher/slurm_config.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 2722
Add explicit partition isolation to tests.
Tests at lines 71–85 call slurm_factory() without mocking SLURM_PARTITION. If this variable is set in the CI environment, the tests will not evaluate the default "batch" value as intended. Use monkeypatch.setenv("SLURM_PARTITION", "") (or delete it) before these test calls to ensure consistent behavior, or explicitly pass partition="batch" to isolate tests from environment state.
Actionable comments posted: 3
♻️ Duplicate comments (2)
tools/launcher/launch.py (1)
92-95: ⚠️ Potential issue | 🟡 Minor

`clean=True` still performs destructive cleanup without explicit safeguard. This still runs `git clean -xdf`, which removes ignored/untracked files irreversibly in the examples tree.
tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml (1)
4: ⚠️ Potential issue | 🟡 Minor

Header default is stale and conflicts with actual config. The comment says `Qwen/Qwen3.5-9B ... 8xH200`, but this file runs `Qwen/Qwen3-8B` on 1 GPU.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cb5d3c68-0cc0-476f-8fa5-468667bdf516
📒 Files selected for processing (4)
- tools/launcher/common/hf_ptq/hf_ptq.sh
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
- tools/launcher/launch.py
💤 Files with no reviewable changes (2)
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
- tools/launcher/common/hf_ptq/hf_ptq.sh
In tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml:

```yaml
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
#
# Usage (local Docker):
#   cd tools/launcher
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
#
# Override model/quant via CLI:
#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
#     pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
#     pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
#     --yes
```
Usage examples are inconsistent with this file and wrapper argument contract.
These commands reference `examples/llm_ptq/hf_ptq.yaml`, and the override example drops the wrapper-required `--repo`/`--local-dir` flow for `common/hf/ptq.sh`, so users can hit "Unknown argument" errors.
Suggested doc fix

```diff
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml hf_local=/mnt/hf-local --yes
 ...
-# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
-#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
-#   pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml \
+#   pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
 #   --yes
```
```yaml
global_vars:
  hf_local: /hf-local/
  hf_model: Qwen/Qwen3-8B

# Downloads model if needed, then runs huggingface_example.sh
task_0:
  script: common/hf/ptq.sh
  args:
    - --repo <<global_vars.hf_model>>
    - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
    - --
    - --quant nvfp4
    - --tasks quant
  slurm_config:
    _factory_: "slurm_factory"
    nodes: 1
    ntasks_per_node: 1
    gpus_per_node: 1
    time: "04:00:00"
    container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc7
```
Add required model metadata environment variables for this new model config.
This new model YAML does not define MLM_MODEL_CFG and QUANT_CFG, which are required by launcher conventions for new model configs.
As per coding guidelines, “Set MLM_MODEL_CFG environment variable… [and] set QUANT_CFG environment variable… when adding a new model config.”
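A hedged sketch of what such an env block could look like; the variable values and the `env:` key placement are assumptions — check the launcher's schema before adopting:

```yaml
task_0:
  script: common/hf/ptq.sh
  env:
    MLM_MODEL_CFG: Qwen/Qwen3-8B   # assumed model config identifier
    QUANT_CFG: nvfp4               # assumed quantization profile name
```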
In tools/launcher/launch.py:

```python
import getpass
import os
import subprocess  # nosec B404
```
Remove # nosec Bandit bypasses or add a formal security exception workflow.
These suppressions bypass enforced security checks without documented justification/approval in this PR.
Suggested change

```diff
-import subprocess  # nosec B404
+import subprocess
 ...
-    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607
+    subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
```

As per coding guidelines, "Bandit security checks are enforced via pre-commit hooks. `# nosec` comments are not allowed as bypasses…".

Also applies to: line 95.
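If the team prefers removing the bypass over registering a formal exception, one pattern that addresses Bandit's B607 partial-path concern is resolving the executable explicitly before calling it. This is an illustrative sketch, not the repository's policy; `git_clean_dry_run` is a hypothetical helper:

```python
import shutil
import subprocess


def git_clean_dry_run(repo_dir: str) -> str:
    """List what `git clean` would remove, using an absolute path to git."""
    git = shutil.which("git")
    if git is None:
        raise FileNotFoundError("git executable not found on PATH")
    result = subprocess.run(
        [git, "clean", "-xdn", "."],  # -n: dry run, deletes nothing
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing a resolved absolute path and a fixed argument list keeps the call auditable without inline suppressions.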
What does this PR do?
Type of change: New feature
Adds a HuggingFace PTQ pipeline to the launcher, replacing the old `hf_ptq.sh`/`hf_ptq_local.yaml` approach with a cleaner wrapper around `huggingface_example.sh`.

Key changes:
- `common/hf/ptq.sh` — wrapper script that downloads the model via `huggingface-cli` if needed, then delegates to `examples/llm_ptq/scripts/huggingface_example.sh`
- `examples/Qwen/Qwen3-8B/hf_ptq.yaml` — example config for Qwen3-8B nvfp4 quantization, supports both Slurm and local Docker
- Removed `common/hf_ptq/hf_ptq.sh` and `examples/Qwen/Qwen3-8B/hf_ptq_local.yaml` — replaced by the new unified pipeline
- `SlurmConfig.time` field replaces the hardcoded `"04:00:00"` in `build_slurm_executor`
- `slurm_factory` now reads the `SLURM_PARTITION` env var (default: `batch`)
- `--clean` flag — new `launch.py` option to `git clean -xdf` the examples directory before job submission
- `modelopt_recipes/` — added to the nemo_run packager include list

Testing
Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Summary by CodeRabbit
New Features
- `clean` option to purge example directories before execution.

Improvements

Deprecated