UPSTREAM PR #1270: feat: accept legacy image parameter on v1/images/edits#64

Open
loci-dev wants to merge 1 commit into main from loci/pr-1270-sd_openai_legacy_image

Conversation

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1270

The OpenAI API originally supported only a single image, passed through 'image' rather than 'image[]'. Older OpenAI client libraries (as recent as 1.69, shipped in Debian stable), and probably other clients, still send edit requests in this format.

@loci-review

loci-review bot commented Feb 20, 2026

Overview

Analysis of stable-diffusion.cpp following commit 36b3937 ("feat: accept legacy image parameter on v1/images/edits") reveals minimal performance impact. Of 48,313 total functions, 55 (0.11%) were modified; none were added or removed.

Binaries analyzed:

  • build.bin.sd-server: 515,491.29 nJ → 519,006.76 nJ (+0.682%)
  • build.bin.sd-cli: 480,109.60 nJ → 483,483.03 nJ (+0.703%)

The single-file change adds backward compatibility for legacy API parameters in HTTP request parsing, isolated from inference paths.

Function Analysis

Standard Library Regressions:

  • std::swap<_Prime_rehash_policy>: +81ns throughput (+111%), +81ns response (+81%) — hash table rehashing utility
  • std::shared_ptr::operator= (GGMLBlock, VAE variants): +80ns throughput (+103%), +80ns response (+8%) — smart pointer assignments during model initialization
  • _M_allocate_buckets: +68ns throughput (+62%), +68ns response (+26%) — hash table bucket allocation
  • nlohmann::json::create: +133ns throughput (+54%), +133ns response (+5%) — JSON object allocation

Standard Library Improvements:

  • std::make_move_iterator: -169ns throughput (-68%), -169ns response (-59%)
  • __normal_iterator::operator+: -69ns throughput (-48%), -69ns response (-42%)

Quantization Function:

  • make_block_q4_Kx8: +642ns throughput (+8%), +642ns response (+8%) — quantized block repacking during model loading

All regressions occur in standard library functions with no source code changes, suggesting compiler optimization differences. Absolute timing changes remain under 1 microsecond. HTTP processing functions show throughput increases but negligible response time impact, confirming isolation from critical paths.

Additional Findings

The code change affects only HTTP request parsing (examples/server/main.cpp), adding one boolean check and conditional file retrieval. No changes to inference pipeline, GPU operations, quantization algorithms, or ML workloads. The make_block_q4_Kx8 regression affects model loading (one-time cost) rather than inference. For a model with 10,000 quantized blocks, cumulative impact is 6.4ms, negligible compared to typical loading times (5-30 seconds). All performance-critical paths (attention mechanisms, convolutions, VAE operations) remain unchanged.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

