UPSTREAM PR #1274: feat: option to enable vae tiling automatically for large images by loci-dev · Pull Request #60 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-02-18T04:21:23Z

Note

Source pull request: leejet/stable-diffusion.cpp#1274

This allows controlling the use of VAE tiling according to the requested image size: tiling will be enabled only for images larger than a given threshold.

Koboldcpp has this same feature. It's mainly useful when the size isn't known at launch time, like in the sd-server, and when the size comes from an input image.

The threshold is given as the side of a square image: --vae-tiling 768 enables tiling for images larger than 768x768. This was chosen mainly because it's easy to test with an NxN image, and it's more meaningful than something like 'megapixels'. Another possibility would be specifying the size with a string like "768x768". A memory threshold would arguably be better, but it is harder to implement, since the exact usage depends heavily on the specific backend and flags.
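As a rough illustration of the threshold check described above, here is a minimal sketch. The function name should_tile_vae is hypothetical and not taken from the PR, and comparing by pixel area is an assumption: the description only says "larger than 768x768", so the actual implementation may compare differently (for example, per side).

```cpp
#include <cstdint>

// Hypothetical helper, not the PR's code: decide whether VAE tiling should be
// used for a requested image size, given the square-side threshold.
// Assumption: "larger" is judged by pixel area, so a 1024x512 image
// (524,288 px) does not exceed a 768x768 threshold (589,824 px).
bool should_tile_vae(int width, int height, int threshold_side) {
    if (threshold_side <= 0) {
        return false;  // a threshold of 0 means tiling is never enabled
    }
    // 64-bit arithmetic avoids overflow for very large dimensions.
    const std::int64_t area  = static_cast<std::int64_t>(width) * height;
    const std::int64_t limit = static_cast<std::int64_t>(threshold_side) * threshold_side;
    return area > limit;  // strictly larger, matching "larger than NxN"
}
```

Under this reading, a threshold of 1 enables tiling for every realistic image size, which is consistent with a bare --vae-tiling keeping its current always-on meaning.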

To stay compatible with current usage without adding an extra flag, the value is optional: --vae-tiling alone is the same as --vae-tiling 1, and --vae-tiling 0 means no tiling, the same as omitting the flag.
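One way to handle a flag with an optional numeric value is sketched below. This is not the PR's parser: the VaeTilingArg struct and parse_vae_tiling function are hypothetical names, and the rule "consume the next token only if it is a whole non-negative integer" is an assumption about how the ambiguity could be resolved.

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct VaeTilingArg {
    int  threshold;       // 0 = never tile, 1 = tile for any realistic size
    bool consumed_value;  // true if the token after the flag was used as the value
};

// Hypothetical parser: args[flag_index] is assumed to be "--vae-tiling".
VaeTilingArg parse_vae_tiling(const std::vector<std::string>& args,
                              std::size_t flag_index) {
    VaeTilingArg result{1, false};  // bare flag behaves like "--vae-tiling 1"
    if (flag_index + 1 < args.size()) {
        const std::string& next = args[flag_index + 1];
        char* end = nullptr;
        const long value = std::strtol(next.c_str(), &end, 10);
        // Accept the token only if all of it parsed as a number in range.
        const bool whole_token = end != next.c_str() && *end == '\0';
        if (whole_token && value >= 0 && value <= INT_MAX) {
            result.threshold      = static_cast<int>(value);
            result.consumed_value = true;
        }
    }
    return result;
}
```

A caller would skip one extra argument when consumed_value is true. The trade-off of an optional value is that a numeric token following the flag is always read as the threshold, so the flag cannot be placed directly before an unrelated numeric positional argument.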

loci-review · 2026-02-18T05:18:26Z

Overview

Analysis of 48,340 functions across two Stable Diffusion C++ binaries reveals modest performance changes between versions. Modified functions: 102 (0.21%), new functions: 43, removed: 0, unchanged: 48,195 (99.7%).

Power Consumption:

build.bin.sd-server: 515,491 nJ → 518,799 nJ (+0.64%)
build.bin.sd-cli: 480,110 nJ → 483,356 nJ (+0.68%)

Function Analysis

The single intentional code change, the VAE tiling threshold feature, shows a significant improvement:

SDContextParams::operator() (both binaries): Response time improved 62% (6,757ns → 2,551ns), enabling adaptive VAE tiling based on image dimensions rather than fixed boolean flags

Most performance changes occur in C++ standard library functions with no source modifications:

Regressions:

std::vector::begin (sd-cli): +289% throughput time (+181ns), likely from compiler inlining changes
_M_const_cast (sd-server): +284% throughput time (+182ns), Red-Black tree iterator operations
make_move_iterator (sd-cli): +216% throughput time (+169ns), move semantics utility
__negate (sd-server): +190% throughput time (+176ns), HTTP validation predicate
_M_destroy (sd-cli): +180% throughput time (+189ns), RMSNorm shared_ptr cleanup
ggml_time_us (sd-cli): +113% throughput time (+141ns), timing utility overhead more than doubled

Improvements:

std::vector::end (sd-cli): -75% throughput time (-183ns), better iterator optimization
make_move_iterator (sd-server): -68% throughput time (-169ns), improved code generation

Other analyzed functions showed minor changes in error handling and JSON deserialization paths.

Additional Findings

Performance changes stem primarily from compiler/toolchain differences rather than application code. The VAE tiling enhancement provides intelligent memory management for large image generation while avoiding overhead on small images, justifying the minimal 0.64-0.68% power increase. RMSNorm destruction regression may accumulate during model cleanup but occurs outside core inference loops. Most regressions affect supporting infrastructure (STL containers, timing utilities) with sub-microsecond absolute impacts.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

feat: option to enable vae tiling automatically for large images

e4a2256

loci-dev temporarily deployed to stable-diffusion-cpp-prod with GitHub Actions on February 18, 2026 04:21 (inactive)

loci-dev force-pushed the main branch from 74d69ae to 10ea7dd on February 20, 2026 04:16

loci-dev wants to merge 1 commit into main from loci/pr-1274-sd_vae_tiling_threshold
