
Conversation

@IPostYellow

Motivation

This PR introduces support for Stable Diffusion 3 Medium (stabilityai/stable-diffusion-3-medium-diffusers) text-to-image (t2i) generation in SGLang.

Run with the CLI:

sglang generate --model-path=/your/path/stabilityai/stable-diffusion-3-medium-diffusers --prompt='A dreamy twilight scene of a small village floating among soft clouds, its rooftops adorned with glowing iridescent tiles that shimmer in hues of pearl and lavender. The winding streets are paved with translucent crystal, reflecting the warm glow of lanterns shaped like hot air balloons drifting gently into the sky. In the distance, layered floating mountains rise into the atmosphere, crowned with an ancient library made of marble and stained glass, where fluttering pages transform into flocks of luminous birds. In the foreground, a massive cherry blossom tree stretches across the frame, its petals falling like stardust, trailing soft light as they drift downward. The art style blends the hand-painted charm of Studio Ghibli with the refined lighting and depth of digital painting—vibrant yet ethereal colors, delicate linework, and a sense of quiet wonder. No people present, evoking serenity, magic, and infinite imagination.' --width=720 --height=720 --save-output --dit-cpu-offload false --text-encoder-cpu-offload false --image-encoder-cpu-offload false --vae-cpu-offload false --pin-cpu-memory false

Output:
A_dreamy_twilight_scene_of_a_small_village_floating_among_soft_clouds_its_rooftops_adorned_with_glo_20251117-165103_5b53fefb

Alternatively, start a model inference server and generate an image via an API call.
Start the server:

sglang serve --model-path /your/path/stabilityai/stable-diffusion-3-medium-diffusers --num-gpus 2 --tp-size 2 --save-output --dit-cpu-offload false --text-encoder-cpu-offload false --image-encoder-cpu-offload false --vae-cpu-offload false --pin-cpu-memory false

Send a generation request:

import requests

url = "http://localhost:3000/v1/images/generations"
data = {
    "prompt": "A curious raccoon",
    "size": "720x720",
    "output_format": "png",
    "response_format": "b64_json",
}
headers = {"Content-Type": "application/json"}

response = requests.post(url, headers=headers, json=data)
print("Status Code:", response.status_code)
print("Response Body:", response.text)

Output:
a57cb381-33f6-4dc0-9d97-525dec08d5d0
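
Since the request sets response_format to b64_json, the image should come back base64-encoded. Here is a minimal sketch for decoding and saving it, assuming an OpenAI-style images payload with a data[0].b64_json field (the exact schema returned by this server is not shown above):

import base64

payload = response.json()
# Assumed OpenAI-style schema: {"data": [{"b64_json": "<base64-encoded PNG>"}]}
image_bytes = base64.b64decode(payload["data"][0]["b64_json"])
with open("output.png", "wb") as f:
    f.write(image_bytes)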

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @IPostYellow, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Stable Diffusion 3 Medium into SGLang, expanding its multimodal generation capabilities to include state-of-the-art text-to-image synthesis. The changes involve adding new model configurations, implementing a dedicated pipeline, and adapting core runtime components to support SD3's complex multi-text-encoder architecture and VAE processing, ensuring seamless operation and high-quality image generation.

Highlights

  • Stable Diffusion 3 Medium Support: This pull request introduces comprehensive support for Stable Diffusion 3 Medium (SD3) text-to-image generation within the SGLang framework, enabling users to leverage this advanced model for image synthesis.
  • New Configuration Files: Dedicated configuration files have been added for the SD3 Transformer (DiT), VAE, and the overall pipeline, defining their specific architectures, parameters, and operational settings.
  • Multi-Text-Encoder Integration: The system now supports SD3's unique architecture, which utilizes three distinct text encoders (two CLIP and one T5) for processing prompts, with corresponding adjustments in the text encoding and conditioning stages (see the sketch after this list).
  • Runtime Stage Enhancements: Modifications were made to the conditioning, latent_preparation, and text_encoding pipeline stages to correctly handle SD3's multi-encoder outputs, VAE scaling factors, and specific tokenizer settings.
  • Dynamic VAE Loading: The VAE component loader has been updated to intelligently prioritize fp16 or full precision safetensors files when loading the SD3 VAE, optimizing for performance and compatibility.
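
For reviewers unfamiliar with SD3's triple-encoder prompt handling, here is a minimal sketch of how the three encoders' outputs are typically combined, following the upstream diffusers SD3 pipeline; the tensor names and shapes are illustrative and not taken from this PR:

import torch
import torch.nn.functional as F

# Illustrative hidden states from the three encoders (batch size 1):
clip_l_embeds = torch.randn(1, 77, 768)    # CLIP-L sequence embeddings
clip_g_embeds = torch.randn(1, 77, 1280)   # CLIP-G sequence embeddings
t5_embeds = torch.randn(1, 256, 4096)      # T5-XXL sequence embeddings

# Concatenate the two CLIP streams on the feature dim, pad to T5 width,
# then concatenate with T5 along the sequence dim.
clip_embeds = torch.cat([clip_l_embeds, clip_g_embeds], dim=-1)  # (1, 77, 2048)
clip_embeds = F.pad(clip_embeds, (0, t5_embeds.shape[-1] - clip_embeds.shape[-1]))
prompt_embeds = torch.cat([clip_embeds, t5_embeds], dim=-2)      # (1, 333, 4096)

# Pooled conditioning comes from the two CLIP projection outputs only.
pooled_l = torch.randn(1, 768)
pooled_g = torch.randn(1, 1280)
pooled_prompt_embeds = torch.cat([pooled_l, pooled_g], dim=-1)   # (1, 2048)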

@IPostYellow IPostYellow changed the title [MultiModal]Support stable-diffusion-3-medium-diffusers for t2i [MultiModal]Support stable-diffusion-3-medium-diffusers Nov 17, 2025
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request adds support for Stable Diffusion 3 Medium for text-to-image generation. The changes are comprehensive, touching configuration, model implementation, and pipeline stages. The implementation correctly handles the three text encoders required by SD3. I've identified a few issues, including a bug in the SD3 transformer's forward pass return value and some brittle file loading logic. Additionally, I've provided suggestions to improve code clarity and maintainability by removing dead code and simplifying some expressions. Overall, this is a great contribution.

Comment on lines +476 to +489
if isinstance(server_args.pipeline_config, StableDiffusion3PipelineConfig):
    precision = server_args.pipeline_config.vae_precision
    base_name = "diffusion_pytorch_model"

    # Priority: fp16 > full precision > any matching file
    if precision == "fp16":
        fp16_path = os.path.join(
            str(model_path), f"{base_name}.fp16.safetensors"
        )
        target_files = [fp16_path] if os.path.exists(fp16_path) else []
    else:
        full_path = os.path.join(str(model_path), f"{base_name}.safetensors")
        target_files = [full_path] if os.path.exists(full_path) else []
    safetensors_list = target_files

high

The current logic for finding the VAE's safetensors file is brittle. If the specific precision file (.fp16.safetensors or .safetensors) is not found, it results in an empty list, which will cause the assertion on line 491 to fail with a generic message. The comment on line 480 suggests a priority-based fallback, which is not fully implemented. I suggest a more robust implementation that correctly applies the priority and provides a better fallback.

        if isinstance(server_args.pipeline_config, StableDiffusion3PipelineConfig):
            precision = server_args.pipeline_config.vae_precision
            base_name = "diffusion_pytorch_model"

            # Priority: fp16 > full precision > any matching file
            fp16_path = os.path.join(str(model_path), f"{base_name}.fp16.safetensors")
            full_path = os.path.join(str(model_path), f"{base_name}.safetensors")

            if precision == "fp16" and os.path.exists(fp16_path):
                safetensors_list = [fp16_path]
            elif os.path.exists(full_path):
                safetensors_list = [full_path]
            elif os.path.exists(fp16_path):
                safetensors_list = [fp16_path]
            else:
                # Fallback to any safetensors file if specific ones are not found
                safetensors_list = glob.glob(os.path.join(str(model_path), f"{base_name}*.safetensors"))
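
Note: this suggestion calls glob.glob, so it assumes the module already imports glob; if it does not, an import glob needs to accompany the change.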

Comment on lines 230 to 233
if not return_dict:
    return (output,)

return output

high

When return_dict is True, the function should return a Transformer2DModelOutput object, but it currently returns a raw tensor. This can lead to AttributeError if the caller expects an object with a .sample attribute. Please wrap the output tensor in Transformer2DModelOutput.

Suggested change

-if not return_dict:
-    return (output,)
-return output
+if not return_dict:
+    return (output,)
+return Transformer2DModelOutput(sample=output)
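
If this module does not already import it, Transformer2DModelOutput is typically available from diffusers (for example, from diffusers.models.modeling_outputs import Transformer2DModelOutput); worth confirming which definition this file actually uses.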

_IMAGE_ENCODER_MODELS: dict[str, tuple] = {
    # "HunyuanVideoTransformer3DModel": ("image_encoder", "hunyuanvideo", "HunyuanVideoImageEncoder"),
    "CLIPVisionModelWithProjection": ("encoders", "clip", "CLIPVisionModel"),
    "CLIPTextModelWithProjection": ("encoders", "clip", "CLIPTextModel"),

medium

The variable _IMAGE_ENCODER_MODELS is misleading as it now contains a text model (CLIPTextModelWithProjection). To improve code clarity and maintainability, consider renaming it to something more generic, such as _ENCODER_MODELS.

Comment on lines 103 to 108
# if batch.do_classifier_free_guidance:
#     prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
#     pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)
#     batch.prompt_embeds = [prompt_embeds]
#     batch.pooled_embeds = [pooled_prompt_embeds]


medium

This block of commented-out code appears to be dead code. Please remove it to improve code clarity.

Comment on lines +71 to +75
vae_scale_factor = (
    server_args.pipeline_config.vae_config.get_vae_scale_factor()
    if server_args.pipeline_config.vae_config.get_vae_scale_factor()
    else 8
)

medium

This expression is a bit verbose and calls get_vae_scale_factor() twice. It can be simplified for better readability and to avoid the redundant call.

            scale_factor = server_args.pipeline_config.vae_config.get_vae_scale_factor()
            vae_scale_factor = scale_factor or 8
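
Both forms treat a falsy return value (None or 0) as missing and fall back to 8, so the simplification preserves behavior while calling get_vae_scale_factor() only once.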

@mickqian
Collaborator

Awesome job, thanks! We'll get back to this PR once the necessary CI tests and refactors are added.

@mickqian mickqian changed the title [MultiModal]Support stable-diffusion-3-medium-diffusers diffusion model: support stable-diffusion-3-medium-diffusers Nov 18, 2025
IPostYellow and others added 6 commits November 18, 2025 20:55
…on3medium_fn2

# Conflicts:
#	python/sglang/multimodal_gen/configs/pipeline_configs/__init__.py
#	python/sglang/multimodal_gen/configs/pipeline_configs/stablediffusion3.py
#	python/sglang/multimodal_gen/registry.py
#	python/sglang/multimodal_gen/runtime/pipelines_core/stages/conditioning.py
#	python/sglang/multimodal_gen/runtime/pipelines_core/stages/text_encoding.py
@github-actions github-actions bot added the diffusion SGLang Diffusion label Nov 21, 2025
@IPostYellow
Author

> Awesome job, thanks! We'll get back to this PR once the necessary CI tests and refactors are added.

Hi, thanks for the feedback! Just wanted to let you know that:

  • All required CI tests are now passing
  • I've merged the latest architectural changes from main into this PR
  • The branch is ready for review whenever you have time

Let me know if there's anything specific you'd like me to address! Thanks for your time.

logger = init_logger(__name__)


class TestStableDiffusionT2Image(TestGenerateBase):
Collaborator

Sorry, the CLI test is deprecated. Could we add it to test_server_a.py? Thanks.

@IPostYellow IPostYellow force-pushed the support_stablediffusion3medium branch from ca55334 to 8a52f14 on November 27, 2025 at 06:02

Labels

diffusion (SGLang Diffusion), run-ci
