
Commit 6c21efc

feat: support cache for ovis-image (#530)
* support cache for ovis-image
* support cache for ovis-image
* support cache for ovis-image
* support cache for ovis-image
1 parent 814d792 commit 6c21efc
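
This commit registers an `OvisImage` block adapter and ships an example script, so caching can be enabled for `OvisImagePipeline` the same way as for the other pipelines in the support table. A minimal usage sketch (not part of this diff), assuming `cache_dit.enable_cache(pipe)` resolves the new adapter automatically and that `AIDC-AI/Ovis-Image-7B` is the default checkpoint, as in the example script further down:

```python
# Minimal sketch, not the shipped example script: turn on cache-dit's caching
# for an Ovis-Image pipeline. Assumes cache_dit.enable_cache() accepts the
# pipeline directly, as it does for other registered models.
import torch
import cache_dit
from diffusers import OvisImagePipeline

pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B", torch_dtype=torch.bfloat16
).to("cuda")

cache_dit.enable_cache(pipe)  # resolved through the new "OvisImage" block adapter

image = pipe(
    "a watercolor fox in a misty forest",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=5.0,  # Ovis-Image runs a separate CFG pass (see the example script)
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

cache_dit.summary(pipe)  # cache statistics summary, as the example script prints
```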

File tree

5 files changed: +142, -4 lines


README.md

Lines changed: 2 additions & 2 deletions

@@ -114,9 +114,9 @@ The comparison between **cache-dit** and other algorithms shows that within a sp
 
 | 📚Model | Cache | CP | TP | 📚Model | Cache | CP | TP |
 |:---|:---|:---|:---|:---|:---|:---|:---|
+| **🔥[Z-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🔥[Ovis-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✖️ | ✖️ |
 | **🔥[FLUX.2: 56B](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🔥[HuyuanVideo 1.5](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✖️ | ✖️ |
-| **🔥[Z-Image-Turbo](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🎉[FLUX.1 `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
-| **🎉[FLUX.1](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[FLUX.1-Fill `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
+| **🎉[FLUX.1](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[FLUX.1 `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[FLUX.1-Fill](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen-Image `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen...Edit `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[Qwen...Edit](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen...E...Plus `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |

docs/User_Guide.md

Lines changed: 2 additions & 2 deletions

@@ -81,9 +81,9 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
 
 | 📚Model | Cache | CP | TP | 📚Model | Cache | CP | TP |
 |:---|:---|:---|:---|:---|:---|:---|:---|
+| **🔥[Z-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🔥[Ovis-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✖️ | ✖️ |
 | **🔥[FLUX.2: 56B](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🔥[HuyuanVideo 1.5](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✖️ | ✖️ |
-| **🔥[Z-Image-Turbo](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️🔥 | ✔️🔥 | ✔️🔥 | **🎉[FLUX.1 `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
-| **🎉[FLUX.1](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[FLUX.1-Fill `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
+| **🎉[FLUX.1](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[FLUX.1 `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[FLUX.1-Fill](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen-Image `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen...Edit `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |
 | **🎉[Qwen...Edit](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✔️ | **🎉[Qwen...E...Plus `Q`](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline)** | ✔️ | ✔️ | ✖️ |

Lines changed: 109 additions & 0 deletions

import os
import sys

sys.path.append("..")

import time
import torch
from diffusers import OvisImagePipeline, OvisImageTransformer2DModel
from utils import get_args, strify, cachify, MemoryTracker, create_profiler_from_args
import cache_dit

args = get_args()
print(args)

pipe = OvisImagePipeline.from_pretrained(
    (
        args.model_path
        if args.model_path is not None
        else os.environ.get(
            "OVIS_IMAGE_DIR",
            "AIDC-AI/Ovis-Image-7B",
        )
    ),
    torch_dtype=torch.bfloat16,
)

if args.cache:
    cachify(args, pipe)

assert isinstance(pipe.transformer, OvisImageTransformer2DModel)
if args.quantize:
    pipe.transformer = cache_dit.quantize(
        pipe.transformer,
        quant_type=args.quantize_type,
        exclude_layers=[
            "embedder",
            "embed",
        ],
    )
    pipe.text_encoder = cache_dit.quantize(
        pipe.text_encoder,
        quant_type=args.quantize_type,
    )

pipe.to("cuda")

if args.attn is not None:
    if hasattr(pipe.transformer, "set_attention_backend"):
        pipe.transformer.set_attention_backend(args.attn)
        print(f"Set attention backend to {args.attn}")


if args.compile:
    cache_dit.set_compile_configs()
    pipe.transformer = torch.compile(pipe.transformer)
    pipe.text_encoder = torch.compile(pipe.text_encoder)
    pipe.vae = torch.compile(pipe.vae)


prompt = 'A creative 3D artistic render where the text "OVIS-IMAGE" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist\'s canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail.'
if args.prompt is not None:
    prompt = args.prompt


def run_pipe():
    steps = args.steps if args.steps is not None else 28
    if args.profile and args.steps is None:
        steps = 3
    image = pipe(
        prompt,
        negative_prompt="",
        height=1024 if args.height is None else args.height,
        width=1024 if args.width is None else args.width,
        num_inference_steps=steps,
        guidance_scale=5.0,  # has separate cfg for ovis image
        generator=torch.Generator("cpu").manual_seed(0),
    ).images[0]
    return image


# warmup
_ = run_pipe()

memory_tracker = MemoryTracker() if args.track_memory else None

if memory_tracker:
    memory_tracker.__enter__()

start = time.time()
if args.profile:
    profiler = create_profiler_from_args(args, profile_name="ovis_image_inference")
    with profiler:
        image = run_pipe()
    print(f"Profiler traces saved to: {profiler.output_dir}/{profiler.trace_path.name}")
else:
    image = run_pipe()
end = time.time()

if memory_tracker:
    memory_tracker.__exit__(None, None, None)
    memory_tracker.report()

cache_dit.summary(pipe)

time_cost = end - start
save_path = f"ovis_image.{strify(args, pipe)}.png"
print(f"Time cost: {time_cost:.2f}s")
print(f"Saving image to {save_path}")
image.save(save_path)
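
The script leans on helpers from the examples' local `utils` module (`get_args`, `cachify`, `strify`, `MemoryTracker`, `create_profiler_from_args`), which this diff does not touch. For readers without that module, here is a rough sketch of the contract `cachify(args, pipe)` is assumed to fulfil; the body below is hypothetical, not the real helper:

```python
# Hypothetical stand-in for the examples' utils.cachify, shown only to make the
# script above self-explanatory. The real helper presumably forwards more CLI
# options (thresholds, warmup settings, etc.) than this sketch does.
import cache_dit


def cachify(args, pipe):
    if getattr(args, "cache", False):
        # Basic call with default cache settings; per-run tuning would come
        # from the remaining command-line arguments.
        cache_dit.enable_cache(pipe)
    return pipe
```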

src/cache_dit/caching/block_adapters/__init__.py

Lines changed: 28 additions & 0 deletions
@@ -833,3 +833,31 @@ def zimage_adapter(pipe, **kwargs) -> BlockAdapter:
             "ZImageTransformer2DModel is not available in the current diffusers version. "
             "Please upgrade diffusers>=0.36.dev0 to use this adapter."
         )
+
+
+@BlockAdapterRegister.register("OvisImage")
+def ovis_image_adapter(pipe, **kwargs) -> BlockAdapter:
+    try:
+        from diffusers import OvisImageTransformer2DModel
+
+        _relaxed_assert_transformer(pipe.transformer, OvisImageTransformer2DModel)
+        return BlockAdapter(
+            pipe=pipe,
+            transformer=pipe.transformer,
+            blocks=[
+                pipe.transformer.transformer_blocks,
+                pipe.transformer.single_transformer_blocks,
+            ],
+            forward_pattern=[
+                ForwardPattern.Pattern_1,
+                ForwardPattern.Pattern_1,
+            ],
+            check_forward_pattern=True,
+            has_separate_cfg=True,
+            **kwargs,
+        )
+    except ImportError:
+        raise ImportError(
+            "OvisImageTransformer2DModel is not available in the current diffusers version. "
+            "Please upgrade diffusers>=0.36.dev0 to use this adapter."
+        )
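
The adapter wires both of the transformer's block lists (`transformer_blocks` and `single_transformer_blocks`) into the cache with `ForwardPattern.Pattern_1` and flags the separate-CFG behaviour noted in the example script. For a model that is not yet in the registry, the same structure can be built by hand; a hedged sketch, assuming `BlockAdapter` and `ForwardPattern` are importable from the top-level `cache_dit` package and that `enable_cache` accepts a `BlockAdapter` directly:

```python
# Sketch only: manual adapter for a pipeline whose transformer exposes the same
# dual block lists as OvisImageTransformer2DModel. The import paths and the
# enable_cache(BlockAdapter) entry point are assumptions, not taken from this diff.
import torch
import cache_dit
from cache_dit import BlockAdapter, ForwardPattern
from diffusers import OvisImagePipeline

pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B", torch_dtype=torch.bfloat16
).to("cuda")

adapter = BlockAdapter(
    pipe=pipe,
    transformer=pipe.transformer,
    blocks=[
        pipe.transformer.transformer_blocks,
        pipe.transformer.single_transformer_blocks,
    ],
    forward_pattern=[
        ForwardPattern.Pattern_1,
        ForwardPattern.Pattern_1,
    ],
    has_separate_cfg=True,  # conditional/unconditional passes run separately
)
cache_dit.enable_cache(adapter)
```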

src/cache_dit/caching/block_adapters/block_registers.py

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ class BlockAdapterRegister:
         "Kandinsky5",
         "ChronoEdit",
         "HunyuanVideo15",
+        "OvisImage",
     ]
 
     @classmethod
