
Add HuggingFace PTQ pipeline to launcher#1100

Merged

cjluo-nv merged 7 commits into main from chenjiel/hf_ptq_computelab on Apr 2, 2026

Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Mar 23, 2026

What does this PR do?

Type of change: New feature

Adds a HuggingFace PTQ pipeline to the launcher, replacing the old hf_ptq.sh/hf_ptq_local.yaml approach with a cleaner wrapper around huggingface_example.sh.

Key changes:

  • New common/hf/ptq.sh — wrapper script that downloads the model via huggingface-cli if needed, then delegates to examples/llm_ptq/scripts/huggingface_example.sh
  • New examples/Qwen/Qwen3-8B/hf_ptq.yaml — example config for Qwen3-8B nvfp4 quantization, supports both Slurm and local Docker
  • Removed common/hf_ptq/hf_ptq.sh and examples/Qwen/Qwen3-8B/hf_ptq_local.yaml — replaced by the new unified pipeline
  • Configurable Slurm time limit — SlurmConfig.time field replaces the hardcoded "04:00:00" in build_slurm_executor
  • Configurable Slurm partition — slurm_factory now reads the SLURM_PARTITION env var (default: batch)
  • --clean flag — new launch.py option to git clean -xdf the examples directory before job submission
  • Package modelopt_recipes/ — added to the nemo_run packager include list
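The two Slurm-related changes can be sketched as a minimal Python fragment (names follow the files listed above; the exact launcher code may differ):

```python
import os
from dataclasses import dataclass


@dataclass
class SlurmConfig:
    # New field replacing the hardcoded "04:00:00" in build_slurm_executor
    time: str = "04:00:00"
    partition: str = "batch"


def slurm_factory(
    partition: str = os.environ.get("SLURM_PARTITION", "batch"),
    time: str = "04:00:00",
) -> SlurmConfig:
    # The partition default is read from the environment once, at import time
    return SlurmConfig(time=time, partition=partition)
```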

Testing

  • Tested HF PTQ pipeline on Slurm with Qwen3-8B

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ❌
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • New Features

    • Added a Hugging Face PTQ wrapper and a pipeline configuration to run PTQ workflows for the Qwen model.
    • Launcher gained a clean option to purge example directories before execution.
  • Improvements

    • Slurm job time is now configurable per job.
    • Slurm partition defaults now respect environment settings.
  • Deprecated

    • Removed legacy local PTQ launcher and its local pipeline configuration.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7351f98-ce61-44ac-8103-819065fab21a

📥 Commits

Reviewing files that changed from the base of the PR and between 3bcf6a7 and 6a88cb1.

📒 Files selected for processing (1)
  • tools/launcher/tests/test_slurm_executor.py

📝 Walkthrough

Walkthrough

Introduces a new HF PTQ wrapper script that manages model acquisition and invokes a downstream example; removes the old hf_ptq launcher; adds Slurm time configuration; expands launcher packaging and a clean option; and adds a new HF PTQ example YAML while deleting the prior local YAML.

Changes

  • New PTQ Wrapper — tools/launcher/common/hf/ptq.sh
    Added Bash wrapper that parses wrapper args (requires --repo and --local-dir), conditionally downloads the model via huggingface-cli, and execs the downstream Hugging Face example script, forwarding PTQ args.
  • Removed Old PTQ Script — tools/launcher/common/hf_ptq/hf_ptq.sh
    Deleted the previous HF PTQ launcher script that sourced utilities and invoked hf_ptq.py with fixed defaults.
  • Example configs — tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml, tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
    Added new pipeline YAML (hf_ptq.yaml) that uses the new wrapper with repo/local-dir arguments; removed the prior local YAML (hf_ptq_local.yaml).
  • Launcher entry & packaging — tools/launcher/launch.py
    Added clean: bool = False parameter to launch() and, when true, runs git clean -xdf in Model-Optimizer examples. Expanded PatternPackager include patterns to cover modules/Model-Optimizer/modelopt_recipes/* and adjusted relative path depth.
  • Slurm config — tools/launcher/slurm_config.py
    Added time: str = "04:00:00" to SlurmConfig; slurm_factory now accepts time and uses the SLURM_PARTITION env var as the partition default.
  • Slurm executor usage — tools/launcher/core.py, tools/launcher/tests/test_slurm_executor.py
    build_slurm_executor now passes slurm_config.time to run.SlurmExecutor. Test updated to assert time="04:00:00" is forwarded.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Wrapper as tools/launcher/common/hf/ptq.sh
    participant HFHub as HuggingFace Hub (huggingface-cli)
    participant FS as Local FS
    participant Example as downstream huggingface_example.sh

    User->>Wrapper: run ptq.sh --repo <org/model> --local-dir <path> -- --<ptq-args>
    Wrapper->>FS: check for `<local_dir>/config.json`
    alt config exists
        Wrapper-->>FS: skip download
    else missing
        Wrapper->>Wrapper: ensure huggingface_hub installed (pip, best-effort)
        Wrapper->>HFHub: huggingface-cli download "<repo>" --local-dir "<local_dir>"
        HFHub-->>FS: write model files to <local_dir>
    end
    Wrapper->>Example: exec bash huggingface_example.sh --model "<local_dir>" --trust_remote_code <ptq-args>
    Example-->>FS: read model, run PTQ, write outputs
    Example-->>User: exit status / logs
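The download-skip decision in the diagram boils down to a single existence check. A Python sketch of the same logic (ensure_model is a hypothetical helper; the real wrapper is Bash and shells out to huggingface-cli):

```python
import subprocess
from pathlib import Path


def ensure_model(repo: str, local_dir: str) -> bool:
    """Download repo into local_dir unless it already looks present.

    Returns True if a download was triggered, False if the cache was reused.
    """
    if (Path(local_dir) / "config.json").exists():
        return False  # treat an existing config.json as "already downloaded"
    subprocess.run(
        ["huggingface-cli", "download", repo, "--local-dir", local_dir],
        check=True,
    )
    return True
```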

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

  • Security Anti-Patterns — ❌ Error
    Explanation: Pull request introduces multiple disallowed # nosec comments in launcher files without required inline justification or codeowners approval, violating SECURITY.md guidelines.
    Resolution: Remove all # nosec comments and replace with detailed inline comments explaining why each security-sensitive pattern is necessary and safe, then request explicit codeowners review.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title 'Add HuggingFace PTQ pipeline to launcher' accurately describes the main objective of the PR, which is to add a new HuggingFace post-training quantization pipeline to the launcher infrastructure.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.53%. Comparing base (ada1e26) to head (6a88cb1).
⚠️ Report is 13 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (ada1e26) and HEAD (6a88cb1). Click for more details.

HEAD has 2 fewer uploads than BASE: BASE (ada1e26) had 2 uploads; HEAD (6a88cb1) had 0.
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1100       +/-   ##
===========================================
- Coverage   70.18%   54.53%   -15.65%     
===========================================
  Files         230      348      +118     
  Lines       26080    39778    +13698     
===========================================
+ Hits        18304    21694     +3390     
- Misses       7776    18084    +10308     
Flag: unit — Coverage: 54.53% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@github-actions
Contributor

github-actions bot commented Mar 31, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-02 18:57 UTC

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@ChenhanYu ChenhanYu marked this pull request as ready for review March 31, 2026 23:43
@ChenhanYu ChenhanYu self-requested a review March 31, 2026 23:43
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
tools/launcher/common/hf/ptq.sh (2)

52-53: Suppressed pip errors may hide installation failures.

The || true suppresses all pip errors, which could mask genuine issues (network failures, incompatible Python versions, etc.). If huggingface_hub fails to install, the subsequent huggingface-cli command will fail with a confusing error.

Consider checking if huggingface-cli is available before attempting the download, or letting pip errors propagate.

♻️ Suggested improvement
-    pip install -q huggingface_hub 2>/dev/null || true
-    huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
+    if ! command -v huggingface-cli &>/dev/null; then
+        echo "Installing huggingface_hub..."
+        pip install -q huggingface_hub || { echo "Failed to install huggingface_hub" >&2; exit 1; }
+    fi
+    huggingface-cli download "$REPO" --local-dir "$LOCAL_DIR"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/common/hf/ptq.sh` around lines 52 - 53, The pip install line
currently silences all errors via "|| true", which can hide installation
failures and lead to confusing downstream errors when running "huggingface-cli
download \"$REPO\" --local-dir \"$LOCAL_DIR\""; remove the "|| true" and instead
verify the installer succeeded (or explicitly check availability of the CLI)
before calling "huggingface-cli" — e.g., run the pip install for huggingface_hub
without suppression, then check for the presence of the CLI (using command -v
huggingface-cli or attempting a small import/entrypoint) and exit with a clear
error if it's not available, referencing the pip install step and the
"huggingface-cli download" invocation and the variables REPO and LOCAL_DIR.

48-49: Existence check based solely on config.json may be insufficient.

A partially downloaded model could have config.json but be missing critical files (weights, tokenizer). Consider a more robust completeness check, or document that users should manually clean partial downloads.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/common/hf/ptq.sh` around lines 48 - 49, The script ptq.sh
currently treats presence of "$LOCAL_DIR/config.json" as sufficient to skip
download, which can leave partial/invalid models; update the existence check to
verify required model artifacts (e.g., check for weight files like
"pytorch_model.bin" or "model.safetensors" and tokenizer files such as
"tokenizer.json", "vocab.json" or "merges.txt") or implement a download-complete
marker (e.g., write a ".complete" file after a successful download) and only
skip when that marker or all required files exist; use the LOCAL_DIR variable
and verify the set of expected files or marker before deciding to skip.
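A more robust completeness check along the lines the reviewer suggests might look like this sketch (the required and weight file lists are illustrative; real HF repos vary by format and sharding):

```python
from pathlib import Path

# Hypothetical file sets; adjust for the model family in use
REQUIRED = ("config.json",)
WEIGHT_GLOBS = ("*.safetensors", "pytorch_model*.bin")


def model_looks_complete(local_dir: str) -> bool:
    root = Path(local_dir)
    if not all((root / name).is_file() for name in REQUIRED):
        return False
    # Require at least one weight shard in addition to the config
    return any(any(root.glob(pattern)) for pattern in WEIGHT_GLOBS)
```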
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e39d9517-5fe7-4c98-baa2-0f0e464950ac

📥 Commits

Reviewing files that changed from the base of the PR and between ada1e26 and 0ee2df3.

📒 Files selected for processing (5)
  • tools/launcher/common/hf/ptq.sh
  • tools/launcher/core.py
  • tools/launcher/examples/llm_ptq/hf_ptq.yaml
  • tools/launcher/launch.py
  • tools/launcher/slurm_config.py

HF_EXAMPLE="${script_dir}/../../modules/Model-Optimizer/examples/llm_ptq/scripts/huggingface_example.sh"

echo "Running huggingface_example.sh --model $LOCAL_DIR --trust_remote_code ${PTQ_ARGS[*]}"
exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded --trust_remote_code enables arbitrary code execution.

The --trust_remote_code flag is hardcoded, which allows execution of arbitrary Python code bundled with HuggingFace checkpoints. If REPO points to a malicious or compromised model, this creates a remote code execution (RCE) vector.

Per coding guidelines: "Do not hardcode trust_remote_code=True... Let the caller decide via a parameter; default to False."

🔒 Suggested fix: Make trust_remote_code configurable
 REPO=""
 LOCAL_DIR=""
 PTQ_ARGS=()
+TRUST_REMOTE_CODE=""
 
 # Parse wrapper args up to "--", collect the rest for huggingface_example.sh
 while [[ $# -gt 0 ]]; do
     case "$1" in
         --repo) REPO="$2"; shift 2 ;;
         --local-dir) LOCAL_DIR="$2"; shift 2 ;;
+        --trust-remote-code) TRUST_REMOTE_CODE="--trust_remote_code"; shift ;;
         --) shift; PTQ_ARGS=("$@"); break ;;
         *) echo "Unknown argument: $1 (use -- to separate PTQ args)" >&2; exit 1 ;;
     esac
 done

Then at the exec line:

-exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"
+exec bash "$HF_EXAMPLE" --model "$LOCAL_DIR" $TRUST_REMOTE_CODE "${PTQ_ARGS[@]}"

The YAML config would then need to explicitly opt-in:

args:
  - --repo <<global_vars.hf_model>>
  - --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
  - --trust-remote-code
  - --
  - --quant nvfp4
  - --tasks quant
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/common/hf/ptq.sh` at line 62, The exec line currently always
passes the hardcoded --trust_remote_code flag (exec bash "$HF_EXAMPLE" --model
"$LOCAL_DIR" --trust_remote_code "${PTQ_ARGS[@]}"), which creates an RCE risk;
change this so trust_remote_code is configurable (e.g., read a parameter/env var
like TRUST_REMOTE_CODE or an extra CLI arg) and only append the
--trust-remote-code token to the exec arguments when that value is explicitly
true/opted-in (default false). Update the invocation building logic that
constructs the final exec command for HF_EXAMPLE/LOCAL_DIR/PTQ_ARGS to
conditionally include the --trust-remote-code flag instead of always passing it,
and ensure downstream YAML/launcher docs show how to opt in (the PR already
suggests adding --trust-remote-code to args when desired).
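The opt-in pattern the reviewer asks for, shown as a Python sketch (the real wrapper is Bash; the function name here is illustrative):

```python
def build_example_cmd(model_dir, ptq_args, trust_remote_code=False):
    """Build the downstream command; append --trust_remote_code only on explicit opt-in."""
    cmd = ["bash", "huggingface_example.sh", "--model", model_dir]
    if trust_remote_code:  # caller must opt in; default stays False
        cmd.append("--trust_remote_code")
    return cmd + list(ptq_args)
```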

# HuggingFace PTQ via huggingface_example.sh
#
# Quantizes a HuggingFace model using examples/llm_ptq/scripts/huggingface_example.sh.
# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
Contributor

⚠️ Potential issue | 🟡 Minor

Documentation mismatch with actual configuration.

Line 4 states "Default: Qwen/Qwen3.5-9B" but the actual configuration at line 34 uses Qwen/Qwen3-8B. Update the comment to match the implementation.

📝 Suggested fix
-# Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200.
+# Default: Qwen/Qwen3-8B with nvfp4 on 1 GPU.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/llm_ptq/hf_ptq.yaml` at line 4, Update the header
comment that documents the default model to match the actual configured model:
change the text "Default: Qwen/Qwen3.5-9B with nvfp4_mlp_only on 8xH200." to
reflect "Qwen/Qwen3-8B" (and keep the rest of the note about nvfp4_mlp_only and
8xH200 intact) so the comment matches the configuration used later in
hf_ptq.yaml.

Comment on lines +92 to +95
if clean:
    examples_dir = os.path.join(LAUNCHER_DIR, "examples")
    print(f"Cleaning {examples_dir} with git clean -xdf ...")
    subprocess.run(["git", "clean", "-xdf", examples_dir], check=True)  # nosec B603 B607
Contributor

⚠️ Potential issue | 🟡 Minor

Consider the destructive scope of git clean -xdf.

The -xdf flags will remove all untracked files, including ignored files and directories, without confirmation. If a user has uncommitted work or local config files in the examples directory, they will be permanently deleted.

Consider either:

  1. Using -df (without -x) to preserve .gitignored files
  2. Adding a warning/confirmation before cleaning
  3. Documenting this behavior prominently in the clean parameter docstring
💡 Suggested documentation improvement
 def launch(
     job_name: str = "01_job",
     job_dir: str = os.environ.get("SLURM_JOB_DIR", os.path.expanduser("~/experiments")),
     pipeline: SandboxPipeline = None,
     hf_local: str = None,  # noqa: RUF013
     user: str = getpass.getuser(),
     identity: str = None,  # noqa: RUF013
     detach: bool = False,
-    clean: bool = False,
+    clean: bool = False,  # WARNING: Runs 'git clean -xdf' on examples/, removing ALL untracked files
 ) -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/launch.py` around lines 92 - 95, The current use of
subprocess.run(["git", "clean", "-xdf", examples_dir], check=True) is
dangerously destructive; update the behavior in the launch.py block that checks
the clean variable (where examples_dir is derived from LAUNCHER_DIR) by either:
1) changing the git flags to ["git","clean","-df", examples_dir] to preserve
.gitignore’d files, or 2) prompting the user for confirmation before running the
clean command (e.g., use input() to require an explicit "yes"), and/or 3)
documenting the destructive nature in the function/class docstring or the clean
parameter docs so callers are aware; choose one or more of these fixes and apply
them around the subprocess.run invocation and the clean parameter handling.
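One possible shape for the confirmation safeguard, as a sketch (clean_examples and confirm_fn are hypothetical; injecting the prompt function keeps the guard testable):

```python
import subprocess


def clean_examples(examples_dir: str, *, force: bool = False, confirm_fn=input) -> bool:
    """Run git clean on examples_dir only after explicit confirmation.

    Returns True if the clean ran, False if it was skipped.
    """
    if not force:
        answer = confirm_fn(
            f"Permanently delete ALL untracked files in {examples_dir}? [y/N] "
        )
        if answer.strip().lower() != "y":
            print("Skipping clean.")
            return False
    subprocess.run(["git", "clean", "-xdf", examples_dir], check=True)
    return True
```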

host: str = os.environ.get("SLURM_HOST", ""),
account: str = os.environ.get("SLURM_ACCOUNT", ""),
-    partition: str = "batch",
+    partition: str = os.environ.get("SLURM_PARTITION", "batch"),
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if existing tests explicitly set partition or rely on default
rg -n "slurm_factory\(" tools/launcher/tests/ -A 5 | head -60

Repository: NVIDIA/Model-Optimizer

Length of output: 3174


🏁 Script executed:

# Check the actual test implementations and their assertions
sed -n '71,120p' tools/launcher/tests/test_slurm_config.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1679


🏁 Script executed:

# Check if there are any monkeypatch calls for SLURM_PARTITION
rg "SLURM_PARTITION" tools/launcher/tests/

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Check the slurm_factory function signature and implementation
sed -n '1,100p' tools/launcher/slurm_config.py

Repository: NVIDIA/Model-Optimizer

Length of output: 2722


Add explicit partition isolation to tests.

Tests at lines 71–85 call slurm_factory() without mocking SLURM_PARTITION. If this variable is set in the CI environment, the tests will not evaluate the default "batch" value as intended. Use monkeypatch.setenv("SLURM_PARTITION", "") (or delete it) before these test calls to ensure consistent behavior, or explicitly pass partition="batch" to isolate tests from environment state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/slurm_config.py` at line 53, Tests calling slurm_factory()
rely on the default partition but can be affected by the CI environment's
SLURM_PARTITION; ensure isolation by either clearing or setting the env var
before those test calls (use monkeypatch.setenv("SLURM_PARTITION", "") or
monkeypatch.delenv("SLURM_PARTITION", raising=False) in the tests that exercise
slurm_factory) or explicitly pass partition="batch" when invoking slurm_factory
so the code path using partition: str = os.environ.get("SLURM_PARTITION",
\"batch\") is exercised deterministically.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv changed the title Run hf ptq on computelab Add HuggingFace PTQ pipeline to launcher Apr 1, 2026
@cjluo-nv cjluo-nv enabled auto-merge (squash) April 1, 2026 06:13
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
tools/launcher/launch.py (1)

92-95: ⚠️ Potential issue | 🟡 Minor

clean=True still performs destructive cleanup without explicit safeguard.

This still runs git clean -xdf, which removes ignored/untracked files irreversibly in the examples tree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/launch.py` around lines 92 - 95, The current branch still runs
subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)
whenever clean is truthy, which is dangerous; change the logic around the clean
flag (the block that computes examples_dir using _mo_symlink) to require an
explicit safeguard before invoking subprocess.run: either prompt the user for
confirmation (y/N) or require an extra explicit flag/ENV (e.g., force_clean or
CONFIRM_CLEAN) to be set true, and if not confirmed, skip running git clean (or
run git clean -n first and show the result); ensure the subprocess.run call is
only executed when the new confirmation/force check passes.
tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Header default is stale and conflicts with actual config.

The comment says Qwen/Qwen3.5-9B ... 8xH200, but this file runs Qwen/Qwen3-8B on 1 GPU.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml` at line 4, Update the
stale header comment in hf_ptq.yaml so it matches the actual config: replace the
incorrect model string "Qwen/Qwen3.5-9B" and hardware note "8xH200" with the
correct values used in this file ("Qwen/Qwen3-8B" and single-GPU / 1 GPU) so the
top-of-file comment accurately reflects the current config (search for the
header comment containing "Qwen/Qwen3.5-9B" and "8xH200" and edit it to mention
"Qwen/Qwen3-8B" and "1 GPU").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml`:
- Around line 32-51: The YAML is missing the required environment variables
MLM_MODEL_CFG and QUANT_CFG for this new model config; add them (e.g., under
global_vars or inside task_0's environment block) so the launcher can pick up
model and quantization settings. Specifically, set MLM_MODEL_CFG to the model
config identifier for Qwen3-8B (matching <<global_vars.hf_model>>/config name)
and set QUANT_CFG to the chosen quantization profile (e.g., "nvfp4" or a config
name used by ptq.sh); update either global_vars (preferred) or add an env: map
under task_0 so MLM_MODEL_CFG and QUANT_CFG are exported when script
common/hf/ptq.sh runs.
- Around line 14-24: Update the usage examples to point to the actual YAML in
this folder (tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml) and include the
wrapper-required repository/local-dir flags for common/hf/ptq.sh (e.g., add
--repo=... or --local-dir=<<global_vars.hf_local>> before the model/quant
override), and make the override CLI use the correct global var names
(pipeline.global_vars.hf_model and pipeline.global_vars.hf_local) so the example
lines that currently reference examples/llm_ptq/hf_ptq.yaml and omit
--repo/--local-dir are replaced with consistent commands that pass the wrapper
args and the correct YAML path.

In `@tools/launcher/launch.py`:
- Line 33: The import line "import subprocess" currently has a "# nosec B404"
bypass; remove the "# nosec" comment and either (a) eliminate the need for
suppressing by using a safer API/pattern or validating inputs before calling
subprocess, or (b) add a formal, documented security exception entry (approved
in the security exceptions registry) and reference that approval in a short
inline comment; apply the same change to the other occurrence of the "# nosec"
on the "import subprocess" usage so no Bandit bypasses remain in launch.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb5d3c68-0cc0-476f-8fa5-468667bdf516

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee2df3 and 3bcf6a7.

📒 Files selected for processing (4)
  • tools/launcher/common/hf_ptq/hf_ptq.sh
  • tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
  • tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
  • tools/launcher/launch.py
💤 Files with no reviewable changes (2)
  • tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq_local.yaml
  • tools/launcher/common/hf_ptq/hf_ptq.sh

Comment on lines +14 to +24
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
#
# Usage (local Docker):
# cd tools/launcher
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
#
# Override model/quant via CLI:
# uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
# pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
# pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
# --yes
Contributor
⚠️ Potential issue | 🟡 Minor

Usage examples are inconsistent with this file's path and the wrapper's argument contract.

These commands reference examples/llm_ptq/hf_ptq.yaml rather than this file's actual location, and the override example omits the --repo/--local-dir flags that common/hf/ptq.sh requires, so users can hit "Unknown argument" errors.

Suggested doc fix
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml --yes
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml --yes
...
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml hf_local=/mnt/hf-local --yes
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml hf_local=/mnt/hf-local --yes
...
-#   uv run launch.py --yaml examples/llm_ptq/hf_ptq.yaml \
-#       pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
-#       pipeline.task_0.args='[--model,<<global_vars.hf_local>>Qwen/Qwen3-8B,--quant,nvfp4]' \
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_ptq.yaml \
+#       pipeline.global_vars.hf_model=Qwen/Qwen3-8B \
 #       --yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml` around lines 14 - 24,
Update the usage examples to point to the actual YAML in this folder
(tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml) and include the
wrapper-required repository/local-dir flags for common/hf/ptq.sh (e.g., add
--repo=... or --local-dir=<<global_vars.hf_local>> before the model/quant
override), and make the override CLI use the correct global var names
(pipeline.global_vars.hf_model and pipeline.global_vars.hf_local) so the example
lines that currently reference examples/llm_ptq/hf_ptq.yaml and omit
--repo/--local-dir are replaced with consistent commands that pass the wrapper
args and the correct YAML path.

Comment on lines +32 to +51
global_vars:
hf_local: /hf-local/
hf_model: Qwen/Qwen3-8B

# Downloads model if needed, then runs huggingface_example.sh
task_0:
script: common/hf/ptq.sh
args:
- --repo <<global_vars.hf_model>>
- --local-dir <<global_vars.hf_local>><<global_vars.hf_model>>
- --
- --quant nvfp4
- --tasks quant
slurm_config:
_factory_: "slurm_factory"
nodes: 1
ntasks_per_node: 1
gpus_per_node: 1
time: "04:00:00"
container: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc7
Contributor
⚠️ Potential issue | 🟠 Major

Add required model metadata environment variables for this new model config.

This new model YAML does not define MLM_MODEL_CFG and QUANT_CFG, which are required by launcher conventions for new model configs.

As per coding guidelines, “Set MLM_MODEL_CFG environment variable… [and] set QUANT_CFG environment variable… when adding a new model config.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml` around lines 32 - 51, The
YAML is missing the required environment variables MLM_MODEL_CFG and QUANT_CFG
for this new model config; add them (e.g., under global_vars or inside task_0's
environment block) so the launcher can pick up model and quantization settings.
Specifically, set MLM_MODEL_CFG to the model config identifier for Qwen3-8B
(matching <<global_vars.hf_model>>/config name) and set QUANT_CFG to the chosen
quantization profile (e.g., "nvfp4" or a config name used by ptq.sh); update
either global_vars (preferred) or add an env: map under task_0 so MLM_MODEL_CFG
and QUANT_CFG are exported when script common/hf/ptq.sh runs.


import getpass
import os
import subprocess # nosec B404
Contributor
⚠️ Potential issue | 🔴 Critical

Remove # nosec Bandit bypasses or add a formal security exception workflow.

These suppressions bypass enforced security checks without documented justification/approval in this PR.

Suggested change
-import subprocess  # nosec B404
+import subprocess
...
-        subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)  # nosec B603 B607
+        subprocess.run(["git", "clean", "-xdf", "."], cwd=examples_dir, check=True)

As per coding guidelines, “Bandit security checks are enforced via pre-commit hooks. # nosec comments are not allowed as bypasses…”.

Also applies to: 95-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/launcher/launch.py` at line 33, The import line "import subprocess"
currently has a "# nosec B404" bypass; remove the "# nosec" comment and either
(a) eliminate the need for suppressing by using a safer API/pattern or
validating inputs before calling subprocess, or (b) add a formal, documented
security exception entry (approved in the security exceptions registry) and
reference that approval in a short inline comment; apply the same change to the
other occurrence of the "# nosec" on the "import subprocess" usage so no Bandit
bypasses remain in launch.py.
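One way to pursue option (a), tightening the call so the Bandit findings no longer need suppression, is sketched below; `run_git_clean` is an illustrative helper, not part of launch.py:

```python
import shutil
import subprocess
from pathlib import Path

def run_git_clean(examples_dir: str, repo_root: str) -> None:
    """Run `git clean -xdf` only inside a validated checkout directory."""
    target = Path(examples_dir).resolve()
    root = Path(repo_root).resolve()
    # Refuse to clean anything outside the repository checkout.
    if target != root and root not in target.parents:
        raise ValueError(f"{target} is not inside {root}")
    git = shutil.which("git")  # resolved absolute path; no shell is involved
    if git is None:
        raise RuntimeError("git executable not found on PATH")
    subprocess.run([git, "clean", "-xdf", "."], cwd=target, check=True)
```

Resolving the executable with `shutil.which` and constraining `cwd` addresses the concerns behind B607 (partial executable path) and B603 (untrusted input to subprocess) directly instead of suppressing them.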

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv merged commit 87ea8ba into main Apr 2, 2026
37 of 38 checks passed
@cjluo-nv cjluo-nv deleted the chenjiel/hf_ptq_computelab branch April 2, 2026 18:55