UPSTREAM PR #1156: fix: sanitize LoRA paths and enable dynamic loading#43

Open
loci-dev wants to merge 6 commits into `main` from `loci/pr-1156-master`

Conversation


@loci-dev loci-dev commented Feb 2, 2026

Note

Source pull request: leejet/stable-diffusion.cpp#1156


- Implement `sanitize_lora_path` in `SDGenerationParams` to prevent directory traversal attacks via LoRA tags in prompts.
- Restrict LoRA paths to be relative and strictly within the configured LoRA directory. No subdirectories are allowed; the drawback is that users cannot organize their LoRAs into subfolders.
- Update server example to pass `lora_model_dir` to `process_and_check`, enabling LoRA extraction from prompts.
- Force `LORA_APPLY_AT_RUNTIME` in the server to allow applying LoRAs dynamically per request without reloading the model and without accumulating weights across requests.
- Remove the restriction that LoRA models must be in the root of the LoRA directory, allowing them to be organized in subfolders.
- Refactor the directory containment check to use `std::mismatch` instead of `lexically_relative` to verify the path is inside the allowed root.
- Remove redundant `lexically_normal()` call when resolving file extensions.
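The containment check described above can be sketched with `std::filesystem` and `std::mismatch`: normalize the joined path, then verify that every component of the root appears, in order, as a prefix of it. This is a minimal illustration under assumed names and signatures, not the PR's exact code:

```cpp
#include <algorithm>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Illustrative sketch of a path sanitizer: returns the normalized in-tree
// path when `candidate` stays inside `lora_dir`, or an empty string when it
// is absolute or escapes the root (e.g. via "../"). Subdirectories inside
// the root are accepted, matching the later commit in this PR.
static std::string sanitize_lora_path(const std::string& lora_dir,
                                      const std::string& candidate) {
    fs::path name(candidate);
    if (name.is_absolute()) {
        return "";  // absolute paths are never accepted
    }
    fs::path root = fs::path(lora_dir).lexically_normal();
    fs::path full = (root / name).lexically_normal();
    // Containment check via std::mismatch: walk both component sequences in
    // lockstep; if any root component is unmatched, the path walked out.
    auto [root_it, full_it] = std::mismatch(root.begin(), root.end(),
                                            full.begin(), full.end());
    if (root_it != root.end()) {
        return "";  // e.g. "../secret.bin" normalized outside the root
    }
    return full.string();
}
```

Normalizing before comparing is what defeats traversal: `models/lora/../secret.bin` collapses to `models/secret.bin`, whose components no longer start with `models/lora`.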
@loci-dev force-pushed the `main` branch 22 times, most recently from `f99a420` to `a234621` on February 4, 2026 04:39

loci-review bot commented Feb 6, 2026

Overview

Analysis of 48,102 functions (100 modified, 10 new, 4 removed) across two binaries reveals minimal performance impact from security enhancements. Power consumption: build.bin.sd-server decreased 0.06% (512,975.76 nJ → 512,668.64 nJ), build.bin.sd-cli increased 0.1% (479,167.23 nJ → 479,645.75 nJ).

Function Analysis

extract_and_remove_lora (both binaries): Response time increased 21.8% (+49.5μs) due to the new sanitize_lora_path() security validation preventing path traversal attacks. Throughput time remained effectively constant (+0.6%), confirming the overhead sits in filesystem validation calls, not core logic. This is a justified security-correctness tradeoff for server deployments, with negligible impact since the function is called once per generation request during initialization.
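For context on what this per-request hook does, LoRA extraction scans the prompt for inline tags and strips them before the text encoder runs. The sketch below assumes a `<lora:name:weight>` tag syntax and an illustrative helper name; the real parser in stable-diffusion.cpp may differ:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: pull <lora:name:weight> tags out of a prompt,
// returning (name, weight) pairs and removing the tags from the prompt
// in place. Each returned name would then be passed through the path
// sanitizer before any file is loaded. Weight defaults to 1.0 when the
// tag omits it, e.g. <lora:style>.
static std::vector<std::pair<std::string, float>>
extract_lora_tags(std::string& prompt) {
    std::vector<std::pair<std::string, float>> found;
    std::regex tag(R"(<lora:([^:>]+)(?::([0-9.]+))?>)");
    for (auto it = std::sregex_iterator(prompt.begin(), prompt.end(), tag);
         it != std::sregex_iterator(); ++it) {
        float weight = (*it)[2].matched ? std::stof((*it)[2].str()) : 1.0f;
        found.emplace_back((*it)[1].str(), weight);
    }
    // Strip the tags so the text encoder never sees them.
    prompt = std::regex_replace(prompt, tag, "");
    return found;
}
```

Because the scan runs once per request over a short string, its cost is dwarfed by the filesystem validation measured above, which matches the report's finding that the overhead is in validation calls rather than parsing.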

Standard library regressions (compiler/toolchain differences, no source changes): std::deque::back +243% throughput (+194ns), std::to_string(int) +121% throughput (+197ns), mz_zip_get_file_modified_time +130% throughput (+99ns), _S_max_size +206% throughput (+212ns). All occur in non-critical initialization or utility paths.

Standard library improvements: std::vector<ggml_tensor*> copy constructor -34% throughput (-74ns), ggml_e8m0_to_fp32_half -24% throughput (-35ns), std::to_string(long) -44% throughput (-133ns), benefiting tensor operations and memory management.

Other analyzed functions showed negligible changes in non-critical paths.

Additional Findings

Core ML inference pipeline (diffusion sampling, attention mechanisms, VAE operations) remains unaffected. The 5 commits focused on "sanitize LoRA paths and enable dynamic loading" successfully implement security hardening with <0.01% impact on end-to-end image generation time (5-30 seconds). Compiler optimizations offset security overhead, resulting in near-zero net power consumption change.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

@loci-dev temporarily deployed to stable-diffusion-cpp-prod on February 22, 2026 04:18 via GitHub Actions

loci-review bot commented Feb 22, 2026

Overview

Analysis of 48,312 functions across two stable diffusion inference binaries reveals minimal overall performance impact despite 99 modified functions. Power consumption shows negligible changes: build.bin.sd-server decreased 0.062% (518,798.18 nJ → 518,475.41 nJ) and build.bin.sd-cli decreased 0.029% (483,665.30 nJ → 483,523.85 nJ). Ten new functions were added and four removed, with 48,199 functions unchanged.

Function Analysis

All significant performance changes occur in C++ standard library functions rather than application code, indicating compiler optimization differences between builds:

Regressions:

  • std::vector<ggml_backend_feature>::begin() (build.bin.sd-server): +180.81ns response time (+217%), +180.81ns throughput (+289%)
  • std::vector<ggml_backend_device*>::_S_max_size() (build.bin.sd-server): +212.41ns response time (+152%), +212.41ns throughput (+206%)
  • std::swap<nlohmann::json> (build.bin.sd-cli): +76.13ns response time (+76%), +76.13ns throughput (+106%)
  • std::less<void>::operator() variants (build.bin.sd-server): +44ns response time (+30%), +44ns throughput (+69%) each

Improvements:

  • std::vector<gguf_kv>::end() (build.bin.sd-server): -183.29ns response time (-69%), -183.29ns throughput (-75%)
  • std::vector<std::thread>::end() (build.bin.sd-cli): -183.29ns response time (-69%), -183.29ns throughput (-75%)
  • std::shared_ptr<T5CLIPEmbedder>::_M_destroy() (build.bin.sd-cli): -189.00ns response time (-38%), -188.71ns throughput (-64%)

No application source code changes were detected for any analyzed functions. Performance variations stem from GCC 13 standard library implementation differences or compiler optimization flag changes. The balanced improvements and regressions result in negligible net impact, confirmed by sub-0.1% power consumption changes.

Additional Findings

All analyzed functions affect initialization, memory management, and utility operations rather than compute-intensive inference paths. String comparison regressions impact model loading (tensor name lookups during GGUF parsing), but cumulative overhead remains under 2ms for typical models. Core ML operations (GGML tensor kernels, attention mechanisms, VAE processing) execute in backend-specific implementations not included in this analysis, explaining why standard library changes have minimal impact on overall performance.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.
