Conversation

@eustlb
Contributor

@eustlb eustlb commented Jan 6, 2026

What does this PR do?

Two things:

  1. Factorize the cache init so that it can be overridden.
  2. Handle the edge case of a short hidden state: if its length is smaller than the padding, the cache init has to be taken into account in the next padding states. This edge case is never hit with Mimi, but it does occur with other models that would use this cache approach (Implement VibeVoice #40546). See the sketch after this list.
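
A minimal sketch of the two points, assuming hypothetical names (only the `_cache_init` hook is from this PR; the class, padding modes, and `update` method are illustrative, not the actual Mimi code):

```python
import torch


class ConvPaddingCache:
    """Hypothetical streaming-conv padding cache with an overridable init hook."""

    def __init__(self, in_channels: int, padding: int, padding_mode: str = "constant"):
        self.in_channels = in_channels
        self.padding = padding
        self.padding_mode = padding_mode
        self.cache = None  # lazily initialized from the first chunk

    def _cache_init(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Point 1: factorized init that subclasses can override.
        batch_size = hidden_states.shape[0]
        if self.padding_mode == "constant":
            return hidden_states.new_zeros((batch_size, self.in_channels, self.padding))
        if self.padding_mode == "replicate":
            # Repeat the first time step across the padding window.
            return hidden_states[..., :1].expand(batch_size, self.in_channels, self.padding).contiguous()
        raise NotImplementedError(f"Padding mode {self.padding_mode} not supported")

    def update(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.cache is None:
            self.cache = self._cache_init(hidden_states)
        padded = torch.cat([self.cache, hidden_states], dim=-1)
        # Point 2: slice the next cache from the padded tensor, so that when the
        # chunk is shorter than the padding, part of the initialized padding
        # states is carried over instead of only the new frames.
        if self.padding > 0:
            self.cache = padded[..., -self.padding:]
        return padded


# Example: a 3-frame chunk with padding 6 keeps 3 initialized frames in the next cache.
cache = ConvPaddingCache(in_channels=4, padding=6, padding_mode="replicate")
out = cache.update(torch.randn(1, 4, 3))  # out.shape == (1, 4, 9)
```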

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

self.per_layer_padding = per_layer_padding
self.per_layer_padding_mode = per_layer_padding_mode
self.per_layer_in_channels = per_layer_in_channels
self.per_layer_is_init = [True] * num_layers
Contributor Author

this is not used anywhere, let's remove

@eustlb eustlb mentioned this pull request Jan 6, 2026
torch.ones(batch_size, in_channels, padding, device=device, dtype=dtype) * hidden_states[..., :1]
)
else:
    raise NotImplementedError(f"Padding mode {padding_mode} not supported")
Contributor
@ebezzam ebezzam Jan 9, 2026

Good idea to add `_cache_init` so that different models can initialize differently if needed.

How about removing the check here inside init, since you have a similar one in this new method?

Contributor Author

Agree we could remove it, but it can also be convenient to raise an error at init, to avoid having to run the full forward.

Contributor Author

Hum, actually you're right, otherwise it will be inconvenient and we'll also have to override the init when working with other modes! Thanks, removing.
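
To illustrate the point (continuing the hypothetical sketch from the PR description above, not the actual transformers code): with the padding-mode check living only in `_cache_init`, a model that adds a new mode overrides just that one method and never has to touch the init.

```python
class MeanInitPaddingCache(ConvPaddingCache):
    # Hypothetical subclass: adds a "mean" padding mode without overriding __init__.
    def _cache_init(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.padding_mode == "mean":
            # Initialize the padding states with the per-channel mean of the first chunk.
            mean = hidden_states.mean(dim=-1, keepdim=True)
            return mean.expand(hidden_states.shape[0], self.in_channels, self.padding).contiguous()
        return super()._cache_init(hidden_states)
```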

@github-actions
Contributor

github-actions bot commented Jan 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: kyutai_speech_to_text, mimi
