Add KVCompose #137
|
/ok to test 84f63f2 |
|
Hi @Akulyat, thanks for opening this PR! 🙂 I had a first glance, we will provide a more detailed review in the coming days. Some initial comments:
|
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
|
Hi @alessiodevoto, thank you for the quick feedback! Addressing your comments:
I have a question regarding the results: should I wait for this PR to be merged, or should I already open a PR to the Leaderboard? |
|
/ok to test b0b3715 |
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
|
Following yesterday’s testing:
|
Thanks a lot for your code, I added some additional comments w.r.t. the review above.
The most important part is to avoid potentially copying the cache, as this directly conflicts with memory reduction.
|
@Akulyat thank you for citing our work on kvpress in https://arxiv.org/abs/2509.05165. We recently published a reference paper for kvpress and updated the citation section of our README accordingly. We’d appreciate it if you could update your paper |
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
|
Thank you very much for all the suggestions. |
Sorry for the delayed response and thanks a lot for your changes!
We've done a second round of review and have the following comments:
Please reset the press to its default state after compression.
There are some merge conflicts.
Your method will create a KV cache 2x the size of the context length. As this creates a significant memory overhead, we need this to be apparent both in the press and in the README. I'd thus kindly ask you to:
- Add a logger.warning in the __call__ method, informing the user about it.
- Add it as a note to the main README.
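For illustration, the reset and warning requests above could look roughly like the sketch below. This is a minimal, self-contained stand-in and not the kvpress API: `SketchPress`, its `_scores` field, and `reset()` are hypothetical names, and the warning text is only a suggestion.

```python
import logging
from contextlib import contextmanager
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class SketchPress:
    """Hypothetical stand-in for a press; names here are assumptions."""

    compression_ratio: float = 0.5
    # State accumulated during one compression pass (hypothetical).
    _scores: list = field(default_factory=list)

    def reset(self) -> None:
        # Return the press to its default state so it can be reused safely.
        self._scores = []

    @contextmanager
    def __call__(self, model):
        # Make the memory overhead visible to the user up front.
        logger.warning(
            "%s builds a KV cache about 2x the context length before "
            "compressing it; expect a significant memory overhead.",
            type(self).__name__,
        )
        try:
            yield self
        finally:
            # Reset even if generation inside the `with` block raises.
            self.reset()
```

Putting the reset in a `finally` clause means the press returns to its defaults whether or not the code inside the `with press(model):` block fails.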
As a remark: I obtained a mean score of 57.2% at a 50% compression ratio (CR) on RULER, otherwise using the default settings in the repo. I don't know if this is expected with your method.
Details:
2025-12-02 01:39:17,814 - INFO - Metrics saved to results/ruler__4096__meta-llama--Meta-Llama-3.1-8B-Instruct__kvcompose_unstructured__0.50/2/metrics.json
2025-12-02 01:39:17,814 - INFO - Average compression ratio: 0.50
2025-12-02 01:39:17,814 - INFO - Metrics:
{
"cwe": {
"string_match": 17.3
},
"fwe": {
"string_match": 52.0
},
"niah_multikey_1": {
"string_match": 93.0
},
"niah_multikey_2": {
"string_match": 78.6
},
"niah_multikey_3": {
"string_match": 3.0
},
"niah_multiquery": {
"string_match": 55.1
},
"niah_multivalue": {
"string_match": 86.25
},
"niah_single_1": {
"string_match": 98.2
},
"niah_single_2": {
"string_match": 95.4
},
"niah_single_3": {
"string_match": 62.6
},
"qa_1": {
"string_match": 56.0
},
"qa_2": {
"string_match": 25.0
},
"vt": {
"string_match": 21.92
}
}
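As a quick cross-check, the mean quoted above can be recomputed from the logged metrics. The sketch below assumes the reported figure is the unweighted average of the 13 per-task string_match scores; the variable names are illustrative.

```python
# Per-task string_match scores copied from the log above.
metrics = {
    "cwe": 17.3,
    "fwe": 52.0,
    "niah_multikey_1": 93.0,
    "niah_multikey_2": 78.6,
    "niah_multikey_3": 3.0,
    "niah_multiquery": 55.1,
    "niah_multivalue": 86.25,
    "niah_single_1": 98.2,
    "niah_single_2": 95.4,
    "niah_single_3": 62.6,
    "qa_1": 56.0,
    "qa_2": 25.0,
    "vt": 21.92,
}

# Unweighted mean over all tasks (assumption: every task counts equally).
mean_score = sum(metrics.values()) / len(metrics)
print(f"{len(metrics)} tasks, mean string_match = {mean_score:.2f}")
# prints "13 tasks, mean string_match = 57.26"
```

The result agrees with the roughly 57.2% reported, so the figure is consistent with a plain average over tasks.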
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
|
Thank you for the review and the detailed comments. I have addressed the requested changes and left a separate reply regarding the last point on eager attention. Additionally, the metrics you provided do not match what we got and plan to submit to the leaderboard. Could you please share the full config file you used to run this experiment? |
|
Thanks @Akulyat a lot for the updates and the fixes! Before merging, I kindly ask you to:
I will rerun the experiment and double check there wasn't an issue on my side. I'll update you in this thread. |
PR description
KVComposePress (source, paper) is a structured KV cache compression method that uses attention-guided composite tokens. Our method aggregates per-head importance scores, aligns them into composite tokens to preserve cache structure, and allocates retention budgets across layers.
For the RULER benchmark, KVCompose achieves the best performance among structured methods and competitive results vs KVZip for unstructured settings.
Checklist
- Tests are working (make test)
- Code style check passes (make style, on errors try fix with make format)
- Commits are signed off with git commit -s
- mypress_press.py is in the presses directory
- MyPress is in __init__.py
- README.md is updated with a 1 liner about the new press in the Available presses section
- New press is in the default_presses list in tests/default_presses.py