feat(benchmark): AIPerf run script #1501
Conversation
Greptile Overview

Greptile Summary

This PR adds AIPerf benchmarking support to NeMo Guardrails with a well-structured command-line tool. The implementation includes YAML-based configuration, parameter sweep capabilities, and comprehensive test coverage.

Confidence Score: 3/5
Sequence Diagram

sequenceDiagram
participant User
participant CLI
participant AIPerfRunner
participant ConfigValidator
participant ServiceChecker
participant AIPerf
User->>CLI: nemoguardrails aiperf run --config-file config.yaml
CLI->>AIPerfRunner: Initialize with config path
AIPerfRunner->>ConfigValidator: Load and validate YAML
ConfigValidator->>ConfigValidator: Validate with Pydantic models
ConfigValidator-->>AIPerfRunner: Return AIPerfConfig
AIPerfRunner->>ServiceChecker: _check_service()
ServiceChecker->>ServiceChecker: GET /v1/models with API key
ServiceChecker-->>AIPerfRunner: Service available
alt Single Benchmark
AIPerfRunner->>AIPerfRunner: _build_command()
AIPerfRunner->>AIPerfRunner: _create_output_dir()
AIPerfRunner->>AIPerfRunner: _save_run_metadata()
AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
AIPerf-->>AIPerfRunner: Benchmark results
AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
else Batch Benchmarks with Sweeps
AIPerfRunner->>AIPerfRunner: _get_sweep_combinations()
loop For each sweep combination
AIPerfRunner->>AIPerfRunner: _build_command(sweep_params)
AIPerfRunner->>AIPerfRunner: _create_output_dir(sweep_params)
AIPerfRunner->>AIPerfRunner: _save_run_metadata()
AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
AIPerf-->>AIPerfRunner: Benchmark results
AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
end
end
AIPerfRunner-->>CLI: Return exit code
CLI-->>User: Display summary and exit
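
The flow in the diagram is mostly thin orchestration around `subprocess`. Below is a minimal, hypothetical sketch of that flow; the helper names follow the diagram, but the actual signatures and the exact `aiperf` flags used in this PR may differ.

```python
# Illustrative sketch only: mirrors the sequence diagram above, not the exact
# implementation in this PR. The aiperf flags are placeholders.
import json
import subprocess
from pathlib import Path

import requests


def check_service(base_url: str, api_key: str | None = None) -> None:
    """Quick GET to /v1/models (with the API key) to fail fast if the endpoint is unreachable."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    response = requests.get(f"{base_url}/v1/models", headers=headers, timeout=10)
    response.raise_for_status()


def run_benchmark(base_url: str, model: str, output_dir: Path, api_key: str | None = None) -> int:
    check_service(base_url, api_key)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Build the aiperf command (exact subcommand and flags depend on the AIPerf CLI version).
    cmd = ["aiperf", "profile", "--model", model, "--url", base_url]

    result = subprocess.run(cmd, capture_output=True, text=True)

    # Save the subprocess result alongside the benchmark artifacts.
    (output_dir / "subprocess_result.json").write_text(
        json.dumps(
            {"returncode": result.returncode, "stdout": result.stdout, "stderr": result.stderr},
            indent=2,
        )
    )
    return result.returncode
```

Checking /v1/models up front means a bad URL or API key fails immediately instead of after a full benchmark run.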
10 files reviewed, 3 comments
nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml (review comment outdated, resolved)
9 files reviewed, no comments
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Documentation preview
10 files reviewed, no comments
10 files reviewed, no comments
10 files reviewed, 5 comments
Note: I added the API key towards the end of development to make testing against NVCF functions more convenient. I need to wrap it in a Pydantic SecretStr or something similar to prevent it from being logged.
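
For reference, a minimal sketch of the Pydantic `SecretStr` approach mentioned above (the model and field names here are hypothetical, not the ones in this PR):

```python
from pydantic import BaseModel, SecretStr


class EndpointConfig(BaseModel):
    base_url: str
    api_key: SecretStr | None = None  # masked in str()/repr(), so it won't leak into logs


cfg = EndpointConfig(base_url="https://example.invalid/v1", api_key="nvapi-dummy")
print(cfg)  # api_key shows as SecretStr('**********')
real_key = cfg.api_key.get_secret_value()  # an explicit call is required to read the actual value
```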
12 files reviewed, no comments
12 files reviewed, no comments
@tgasser-nv I noticed that the scope of this change is quite broad. It also introduces OpenAI-compatible endpoints on the server (at least for /chat/completions and /models), which is a major change. Given that, I think it might be better to wait until #1340 is finalized and merged. What do you think?
11 files reviewed, 1 comment
I reverted the OpenAI-compatible endpoints change; I added it by mistake. This isn't blocked by #1340.
11 files reviewed, no comments
* WIP: First round of performance regressions are working
* Remove the rampup_seconds calculation, use warmup request count instead
* Add benchmark directory to pyright type-checking
* Add aiperf typer to the top-level nemoguardrails app
* Add a quick GET to the /v1/models endpoint before running any benchmarks
* Add tests for aiperf Pydantic models
* Rename benchmark_seconds to benchmark_duration, add tokenizer optional field
* Add single-concurrency config, rename both
* Change configs to use NVCF hosted Llama 3.3 70B model
* Refactor single and sweep benchmark runs, add API key env var logic to get environment variable
* Add tests for run_aiperf.py
* Revert changes to llm/providers/huggingface/streamers.py
* Address greptile feedback
* Add README for AIPerf scripts
* Fix hard-coded forward-slash in path name
* Fix hard-coded forward-slash in path name
* Fix type: ignore line in huggingface streamers
* Remove content_safety_colang2 Guardrail config from benchmark
* Fix TextStreamer import Pyright waiver
* Revert changes to server to give OpenAI-compliant responses
* Add API key to /v1/models check, adjust description of AIPerf in CLI
* Revert server changes
* Address PR feedback
* Move aiperf code to top-level
* Update tests for new aiperf location
* Rename configs directory
* Create self-contained typer app, update README with new commands to run it
* Rebase onto develop and re-run ruff formatter
* Move aiperf under benchmark dir
* Move aiperf under benchmark dir
Description
AIPerf (GitHub, Docs) is NVIDIA's latest benchmarking tool for LLMs. It supports any OpenAI-compatible inference service and generates synthetic request loads, runs the benchmarks, and produces all the metrics needed for comparison.
This PR adds support for running AIPerf benchmarks from configs that control the model under test, the benchmark duration, and sweep parameters that expand into a batch of regression runs; a rough sketch of the config and sweep expansion is shown below.
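
The sketch below illustrates what such a config and its sweep expansion could look like. The YAML keys, field names, and values are illustrative assumptions rather than the exact schema in this PR; `benchmark_duration` and the optional `tokenizer` field come from the commit list above.

```python
# Hypothetical config schema and sweep expansion; the real Pydantic models and
# YAML keys in this PR may differ.
import itertools

import yaml
from pydantic import BaseModel


class AIPerfConfig(BaseModel):
    model: str
    url: str
    benchmark_duration: int                 # seconds
    tokenizer: str | None = None
    sweep: dict[str, list] = {}             # e.g. {"concurrency": [1, 4, 16]}


EXAMPLE_YAML = """
model: llama-3.3-70b-instruct
url: https://example.invalid/v1
benchmark_duration: 120
sweep:
  concurrency: [1, 4, 16]
"""


def get_sweep_combinations(sweep: dict[str, list]) -> list[dict]:
    """Expand the sweep block into one parameter dict per benchmark run."""
    if not sweep:
        return [{}]
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in itertools.product(*sweep.values())]


config = AIPerfConfig(**yaml.safe_load(EXAMPLE_YAML))
for params in get_sweep_combinations(config.sweep):
    print(params)  # {'concurrency': 1}, then {'concurrency': 4}, then {'concurrency': 16}
```

Each expanded parameter dict then gets its own output directory and aiperf invocation, which is how a single config produces a batch of regression runs.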
Test Plan
Prerequisites
See README.md for instructions on creating accounts and keys, installing dependencies, and running benchmarks.
Running a single test
Pre-commit tests
Unit-tests
Chat server
Related Issue(s)
Checklist