Conversation

@eustlb
Contributor

@eustlb eustlb commented Jan 6, 2026

What does this PR do?

Two things:

  1. Factorize the cache init so that it can be overridden.
  2. Handle the edge case of a short hidden state: if its length is smaller than the padding, the cache init has to be taken into account in the next padding states. This edge case is never hit with Mimi, but it does occur with other models that would use this cache approach (Implement VibeVoice #40546). See the sketch after this list.
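
A minimal sketch of the two points, assuming hypothetical names (only the `_cache_init` hook is from this PR; the class, padding modes, and `update` method are illustrative, not the actual Mimi code):

```python
import torch


class ConvPaddingCache:
    """Hypothetical streaming-conv padding cache with an overridable init hook."""

    def __init__(self, in_channels: int, padding: int, padding_mode: str = "constant"):
        self.in_channels = in_channels
        self.padding = padding
        self.padding_mode = padding_mode
        self.cache = None  # lazily initialized from the first chunk

    def _cache_init(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Point 1: factorized init that subclasses can override.
        batch_size = hidden_states.shape[0]
        if self.padding_mode == "constant":
            return hidden_states.new_zeros((batch_size, self.in_channels, self.padding))
        if self.padding_mode == "replicate":
            # Repeat the first time step across the padding window.
            return hidden_states[..., :1].expand(batch_size, self.in_channels, self.padding).contiguous()
        raise NotImplementedError(f"Padding mode {self.padding_mode} not supported")

    def update(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.cache is None:
            self.cache = self._cache_init(hidden_states)
        padded = torch.cat([self.cache, hidden_states], dim=-1)
        # Point 2: slice the next cache from the padded tensor, so that when the
        # chunk is shorter than the padding, part of the initialized padding
        # states is carried over instead of only the new frames.
        if self.padding > 0:
            self.cache = padded[..., -self.padding:]
        return padded


# Example: a 3-frame chunk with padding 6 keeps 3 initialized frames in the next cache.
cache = ConvPaddingCache(in_channels=4, padding=6, padding_mode="replicate")
out = cache.update(torch.randn(1, 4, 3))  # out.shape == (1, 4, 9)
```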

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

self.per_layer_padding = per_layer_padding
self.per_layer_padding_mode = per_layer_padding_mode
self.per_layer_in_channels = per_layer_in_channels
self.per_layer_is_init = [True] * num_layers
Contributor Author

this is not used anywhere, let's remove

@eustlb eustlb mentioned this pull request Jan 6, 2026
torch.ones(batch_size, in_channels, padding, device=device, dtype=dtype) * hidden_states[..., :1]
)
else:
    raise NotImplementedError(f"Padding mode {padding_mode} not supported")
Contributor
@ebezzam ebezzam Jan 9, 2026

Good idea to add `_cache_init` so that different models can initialize differently if needed.

How about removing the check here inside init, since you have a similar one in this new method?

Contributor Author

Agree we could remove it, but it can also be convenient to raise an error at init, to avoid having to run the full forward.

Contributor Author

Hum, actually you're right, otherwise it will be inconvenient and we'll also have to override the init when working with other modes! Thanks, removing.
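
To illustrate the point (continuing the hypothetical sketch from the PR description above, not the actual transformers code): with the padding-mode check living only in `_cache_init`, a model that adds a new mode overrides just that one method and never has to touch the init.

```python
class MeanInitPaddingCache(ConvPaddingCache):
    # Hypothetical subclass: adds a "mean" padding mode without overriding __init__.
    def _cache_init(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.padding_mode == "mean":
            # Initialize the padding states with the per-channel mean of the first chunk.
            mean = hidden_states.mean(dim=-1, keepdim=True)
            return mean.expand(hidden_states.shape[0], self.in_channels, self.padding).contiguous()
        return super()._cache_init(hidden_states)
```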

@github-actions
Contributor

github-actions bot commented Jan 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: kyutai_speech_to_text, mimi
