Make unigram cache optional by wangrunji0408 · Pull Request #1763 · huggingface/tokenizers

wangrunji0408 · 2025-04-18T09:30:20Z

We observed that the Unigram model's cache introduces significant contention overhead in multi-threaded scenarios. To mitigate this, we made the Unigram cache optional, following the BPE model's approach. Users can now disable it via resize_cache(0). Additionally, we added TokenizerImpl::get_model_mut to obtain a mutable reference to the model.
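To make the idea concrete, here is a minimal, self-contained sketch of an optional cache. It is not the crate's actual implementation: `Cache`, `ToyModel`, `ToyTokenizer`, and `cache_enabled` are hypothetical names invented for illustration, and the whitespace split stands in for real Unigram tokenization. Only `resize_cache` and `get_model_mut` mirror names from this PR. The sketch assumes that the contention comes from a lock around a shared map, so storing the cache as an `Option` lets a capacity of 0 skip the lock entirely.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a model's result cache (not the crate's type).
struct Cache {
    map: RwLock<HashMap<String, Vec<String>>>,
    capacity: usize,
}

impl Cache {
    fn new(capacity: usize) -> Self {
        Cache { map: RwLock::new(HashMap::new()), capacity }
    }

    /// Look up a cached result; takes the read lock.
    fn get(&self, key: &str) -> Option<Vec<String>> {
        let map = self.map.read().ok()?;
        map.get(key).cloned()
    }

    /// Store a result while there is room; takes the write lock.
    fn set(&self, key: String, value: Vec<String>) {
        if let Ok(mut map) = self.map.write() {
            if map.len() < self.capacity {
                map.insert(key, value);
            }
        }
    }
}

/// Hypothetical model: `cache` is `None` when caching is disabled.
struct ToyModel {
    cache: Option<Cache>,
}

impl ToyModel {
    fn new(capacity: usize) -> Self {
        let mut model = ToyModel { cache: None };
        model.resize_cache(capacity);
        model
    }

    /// A capacity of 0 drops the cache, so `tokenize` never touches a lock.
    /// In this sketch, resizing also discards any existing entries.
    fn resize_cache(&mut self, capacity: usize) {
        self.cache = if capacity == 0 { None } else { Some(Cache::new(capacity)) };
    }

    fn cache_enabled(&self) -> bool {
        self.cache.is_some()
    }

    /// Placeholder tokenization: splits on whitespace.
    fn tokenize(&self, text: &str) -> Vec<String> {
        if let Some(cache) = &self.cache {
            if let Some(hit) = cache.get(text) {
                return hit;
            }
        }
        let tokens: Vec<String> = text.split_whitespace().map(String::from).collect();
        if let Some(cache) = &self.cache {
            cache.set(text.to_owned(), tokens.clone());
        }
        tokens
    }
}

/// Hypothetical wrapper exposing the same kind of accessor as this PR.
struct ToyTokenizer<M> {
    model: M,
}

impl<M> ToyTokenizer<M> {
    /// Get a mutable reference to the model.
    fn get_model_mut(&mut self) -> &mut M {
        &mut self.model
    }
}
```

With this shape, a caller who owns the tokenizer would write something like `tokenizer.get_model_mut().resize_cache(0)` before sharing it across threads, which is the use case the mutable accessor is meant to serve.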

ArthurZucker

Sorry for my late review! Happy to merge. Can you also update the Python bindings to expose this in Python as well? It can come in handy!

ArthurZucker · 2025-05-27T06:30:51Z

+    /// Get a mutable reference to the model
+    pub fn get_model_mut(&mut self) -> &mut M {
+        &mut self.model
+    }
+
    /// Set the added vocabulary.


As this is unused here, I would not put it in this PR!


ArthurZucker

Ah, actually I am not sure I understand the point here: you can already just resize to 0, so it should not make a difference.

wangrunji0408 added 3 commits April 18, 2025 15:01

make cache optional in unigram model

dbf25f8

add TokenizerImpl::get_model_mut

a35e6e1

disable cache if capacity is 0 for BPE

beba382

ArthurZucker reviewed May 27, 2025


Merge branch 'main' of github.com:huggingface/tokenizers into unigram…

1f43684

…-cache-optional

ArthurZucker reviewed Aug 29, 2025


wangrunji0408 wants to merge 4 commits into huggingface:main from wangrunji0408:unigram-cache-optional
