Skip to content

Conversation

@Akulyat
Copy link

@Akulyat Akulyat commented Sep 19, 2025

PR description

KVComposePress (source, paper) is a structured KV cache compression method that uses attention-guided composite tokens. Our method aggregates per-head importance scores, aligns them into composite tokens to preserve cache structure, and allocates retention budgets across layers.
For the RULER benchmark, KVCompose achieves the best performance among structured methods and competitive results vs KVZip for unstructured settings.

Checklist

  • Tests are working (make test)
    • (couldn't make it work locally, I have certain limitations, hope they run and work here in the PR)
  • Code is formatted correctly (make style, on errors try fix with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s
  • (new press) mypress_press.py is in the presses directory
  • (new press) MyPress is in __init__.py
  • (new press) README.md is updated with a 1 liner about the new press in the Available presses section
  • (new press) New press is in the default_presses list in tests/default_presses.py
  • (new press) A docstring is provided that follows the same structure as the existing ones

@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@alessiodevoto
Copy link
Contributor

/ok to test 84f63f2

@alessiodevoto
Copy link
Contributor

Hi @Akulyat, thanks for opening this PR! 🙂 I had a first glance, we will provide a more detailed review in the coming days. Some initial comments:

  • Please make sure the commits are signed off (git commit -s) and the style checks work (you can refer to the PR template). You can fix this by running git rebase --exec 'git commit --amend --no-edit -s' HEAD~2 or by starting an interactive rebase
  • Thanks for sharing the results as well! However, they should not go in KVPress but rather in our HuggingFace Leaderboard. Please remove them from this PR and feel free to submit them by opening a PR on the Hugging Face Sapce, once the code here is stable!
  • Could you add the new press is in the default_presses list in tests/default_presses.py ? This way the press will tested against the library and make sure it is compatible
  • In general, some comments about the method might help readability.

Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
@Akulyat
Copy link
Author

Akulyat commented Sep 23, 2025

Hi @alessiodevoto, thank you for the quick feedback!

Addressing your comments:

  • Commits are signed off, style checks work.
  • The results are removed from this PR.
  • Added the press to tests/default_presses.py.
  • I couldn’t run the tests locally due to some limitations, so I’m relying on them being validated here.
  • Main functions in the press are commented. I can update or expand the comments after the detailed review.

I have a question regarding the results: should I wait for this PR to be merged, or should I already open a PR to the Leaderboard?

@Akulyat Akulyat changed the title Add KVCompose results Add KVCompose Sep 23, 2025
@alessiodevoto
Copy link
Contributor

/ok to test b0b3715

Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
@Akulyat
Copy link
Author

Akulyat commented Sep 30, 2025

Following yesterday’s testing:

  • Fixed style errors.
  • Concerning two failed tests:
    • One was a real bug, fixed.
    • The other was the ComposedPress compatibility test. Our implementation requires attentions and is not currently compatible with ComposedPress. I haven't properly thought about whether it can be fixed, this limitation is open for discussion.
      Yet, as for KVzip, I added an exception for KVComposePress.

@maxjeblick maxjeblick requested a review from Jack-Yu-815 October 1, 2025 07:12
Copy link
Collaborator

@maxjeblick maxjeblick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your code, I added some additional comments w.r.t. the review above.

The most important part is to avoid the potential copying of the cache, as this is in direct conflict with memoery reduction.

@SimJeg
Copy link
Collaborator

SimJeg commented Nov 4, 2025

@Akulyat thank you for citing our work on kvpress in https://arxiv.org/abs/2509.05165. We recently published a reference paper for kvpress and updated the citation section of our README accordingly. We’d appreciate it if you could update your paper

Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
@Akulyat
Copy link
Author

Akulyat commented Nov 15, 2025

Thank you very much for all the suggestions.
I’ve fixed the issues, and I left a few replies where I need clarification(statelessness, hack with eager attention).

Copy link
Collaborator

@maxjeblick maxjeblick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response and thanks a lot for your changes!
We've done a second round of review and have the following comments:

Please reset the press to it's default state after compression.
There are some merge conflicts.

Your method will create a kv-cache of size 2x of the context length. As this creates a significant memory overhead, we need this to be apparent both in the press and in the README. I'd thus kindly ask you to

  • Add a logger.warning in the __call__ method, informing the user about it.
  • Adding it as a note to the main README.

As a remark: I obtained mean score of 57.2% on 50% CR on ruler, using the default settings in the repo, otherwise. IDK if this is expected with your method.

Details:

2025-12-02 01:39:17,814 - INFO - Metrics saved to results/ruler__4096__meta-llama--Meta-Llama-3.1-8B-Instruct__kvcompose_unstructured__0.50/2/metrics.json
2025-12-02 01:39:17,814 - INFO - Average compression ratio: 0.50
2025-12-02 01:39:17,814 - INFO - Metrics:
{
"cwe": {
"string_match": 17.3
},
"fwe": {
"string_match": 52.0
},
"niah_multikey_1": {
"string_match": 93.0
},
"niah_multikey_2": {
"string_match": 78.6
},
"niah_multikey_3": {
"string_match": 3.0
},
"niah_multiquery": {
"string_match": 55.1
},
"niah_multivalue": {
"string_match": 86.25
},
"niah_single_1": {
"string_match": 98.2
},
"niah_single_2": {
"string_match": 95.4
},
"niah_single_3": {
"string_match": 62.6
},
"qa_1": {
"string_match": 56.0
},
"qa_2": {
"string_match": 25.0
},
"vt": {
"string_match": 21.92
}
}

Signed-off-by: Dmitry Akulov <akulyats@gmail.com>
@Akulyat
Copy link
Author

Akulyat commented Jan 16, 2026

Thank you for the review and the detailed comments.

I have addressed the requested changes and left a separate reply regarding the last point on eager attention.

Additionally, the metrics you provided do not match what we got and plan to submit to the leaderboard. Could you please share the full config file you used to run this experiment?

@maxjeblick
Copy link
Collaborator

Thanks @Akulyat a lot for the updates and the fixes!

Before merging, I kindly ask you to:

  • Reset state after compression - Add a _reset_state() method (similar to KVzipPress._reset_internal_parameters()) and call it in the finally block
  • Resolve merge conflicts
  • There's probably a type in logger.warning( "KVComposePress temporarily creates a KV cache of ~2x the context length during prefill; " )

I will rerun the experiment and double check there wasn't an issue on my side. I'll update you in this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants