Conversation

@noooop (Contributor) commented Dec 24, 2025

MTEB testing helps align vLLM with the sentence-transformers implementation, and identifying potential numerical-precision issues is becoming increasingly important.

It's great to have MTEB support vLLM as a backend.

Fix #3747
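
As a rough illustration of the kind of alignment check this enables, a minimal sketch comparing embeddings from the two backends might look like the following. The model name is illustrative, and the vLLM API shown (`task="embed"`, `LLM.embed`) reflects recent releases and may differ by version:

```python
# Sketch: spot-check numerical agreement between sentence-transformers and vLLM.
# Assumptions: a recent vLLM with the "embed" task; the model name is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

sentences = ["hello world", "a second example sentence"]

st_emb = SentenceTransformer("intfloat/e5-small-v2").encode(sentences)

llm = LLM(model="intfloat/e5-small-v2", task="embed")
vllm_emb = np.array([o.outputs.embedding for o in llm.embed(sentences)])

# Expect small differences from dtype (fp16 vs fp32) and kernel choices.
print("max abs diff:", np.abs(st_emb - vllm_emb).max())
```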

noooop and others added 2 commits December 23, 2025 18:29
Signed-off-by: wang.yuqi <[email protected]>
@noooop (Contributor, Author) commented Dec 24, 2025

@Samoed

I'm not very familiar with the MTEB framework, so I'm unsure where the relevant code should be placed and how the interface should be designed. For now, I've implemented a very, very dirty version in order to get feedback as early as possible.

vLLM has many, many parameters. I have highlighted the most commonly used ones for MTEB users and added a lot of comments, which may need further polishing.
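
For reference, the parameters most likely to matter for MTEB users live on vLLM's `LLM` constructor. A hedged sketch (argument names are vLLM's own, but availability and defaults vary by version, and the values shown are purely illustrative):

```python
from vllm import LLM

# Commonly tuned vLLM arguments for embedding workloads (sketch only):
llm = LLM(
    model="intfloat/e5-small-v2",    # any vLLM-supported embedding model
    task="embed",                    # run the model in pooling/embedding mode
    dtype="float16",                 # "float32" for closer ST parity, but slower
    max_model_len=512,               # cap sequence length to save memory
    gpu_memory_utilization=0.9,      # fraction of GPU memory vLLM may claim
)
```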

Signed-off-by: wang.yuqi <[email protected]>
@Samoed (Member) left a comment


Overall, this looks good! I think we can extend and change things further if some models need updates. Thank you for adding this!

Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
noooop and others added 3 commits December 24, 2025 18:37
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
@noooop (Contributor, Author) commented Dec 25, 2025

How should documentation, examples, and tests be organized?

@Samoed (Member) commented Dec 25, 2025

I don't think you need to add tests, because this is a wrapper and, for now, no models use it.

As for docs, you can keep them as docstrings without additional documentation.

@Samoed (Member) left a comment


The wrapper looks good. You can add vllm as an optional dependency to pyproject and add a check to VllmWrapperBase, like here:

```python
requires_package(
    self,
    "faiss",
    "FAISS-based search",
    install_instruction="pip install mteb[faiss-cpu]",
)
```
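
Adapted for this wrapper, the check might look like the following. The `mteb[vllm]` extra name is an assumption and would need to match whatever is actually added to pyproject:

```python
requires_package(
    self,
    "vllm",
    "vLLM-based inference",
    install_instruction="pip install mteb[vllm]",  # extra name is an assumption
)
```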

Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
@KennethEnevoldsen (Contributor) commented Dec 25, 2025

> I don't think you need to add tests, because this is a wrapper and, for now, no models use it.

I think we should at least write one test wrapping a smallish model (even if it is not the official reference implementation). This test can also act as the example used in the documentation for the vLLM wrapper.
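
A minimal sketch of what such a test could look like. The import path, wrapper name, and model are assumptions for illustration, not the PR's actual interface:

```python
import pytest

pytest.importorskip("vllm")  # skip cleanly when vLLM is not installed


def test_vllm_wrapper_encodes_smallish_model():
    # Hypothetical import path and class name, for illustration only.
    from mteb.models.vllm_wrapper import VllmWrapper

    model = VllmWrapper("intfloat/e5-small-v2")
    embeddings = model.encode(["hello world", "another sentence"])

    assert embeddings.shape[0] == 2  # one vector per input sentence
```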

@Samoed (Member) commented Dec 25, 2025

Yes, but I'm not sure it would install and run correctly in CI.

@KennethEnevoldsen (Contributor) left a comment


Great work on this! I would love to add the tests + documentation.

It would even be great to add a blog post comparing the speed of the sentence-transformers implementation and the vLLM implementation, though that is definitely optional.

noooop and others added 3 commits December 26, 2025 09:51
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Comment on lines 1 to 10
## vLLM Wrapper

!!! note
    vLLM currently supports only a limited number of models, and many model implementations have subtle differences compared to the default implementations in mteb. We are working on it.

## Installation

Reference: https://docs.vllm.ai/en/latest/getting_started/installation/

=== "uv"
@noooop (Contributor, Author) commented Dec 26, 2025


I think we should add these two examples to the documentation. My suggestion would be to put them here:

https://embeddings-benchmark.github.io/mteb/usage/defining_the_model/#using-a-sentence-transformer-model

under the title ## Using vLLM.

How about docs/advanced_usage/vllm_wrapper.md?

I'm not very good at naming things or writing documentation (I'm a non-native English speaker). It would be great if someone could help rename things and polish the documentation so that the content of this PR fits naturally with the other parts.

@KennethEnevoldsen (Contributor) left a comment

I can go over the docs once we have settled on the content.

@noooop changed the title from "Add vLLM wrapper support" to "[1/n] Add vLLM wrapper support" on Dec 26, 2025.
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Comment on lines 52 to 57
```python
print(
    f"Batchsize {batchsize}, Input_len {input_len} Throughput: "
    f"{len(prompts) / elapsed_time:.4f} requests/s, "
    f"{len(prompts) * input_len / elapsed_time:.4f} tokens/s, "
    f"Latency {delay * 1000:0.2f} ms, n_step {n_step}"
)
```
@noooop (Contributor, Author) commented Dec 26, 2025

If the comparison is fair and the methodology is reasonable, the gap between the ST implementation and the vLLM implementation is not significant.

[Figure: throughput vs. latency for the ST and vLLM implementations. X-axis: throughput (requests/s). Y-axis: latency, the time needed for one step (ms), on a logarithmic scale. Lower right on the curve is better ↘]

• Throughput with float16 is approximately four times that with float32; vLLM has almost no optimization for float32 inference.
• vLLM does not use padding, while ST does. For a batch of requests with a significant disparity in sequence lengths, ST's performance will undoubtedly be very poor.
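
For reproducibility, the measurement loop behind numbers like these can be as simple as the sketch below. vLLM's `LLM.embed` API is assumed, the model and sizes are illustrative, and per-step latency would need engine-level instrumentation, so only throughput is shown:

```python
# Sketch of the throughput side of the comparison (not the PR's actual script).
import time

from vllm import LLM

batchsize, input_len = 32, 256
prompts = ["hi " * input_len] * batchsize  # crude fixed-length inputs

llm = LLM(model="intfloat/e5-small-v2", task="embed", dtype="float16")
llm.embed(prompts)  # warm-up run, excluded from timing

start = time.perf_counter()
llm.embed(prompts)
elapsed_time = time.perf_counter() - start

print(
    f"Batchsize {batchsize}, Input_len {input_len} Throughput: "
    f"{len(prompts) / elapsed_time:.4f} requests/s, "
    f"{len(prompts) * input_len / elapsed_time:.4f} tokens/s"
)
```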

@Samoed (Member) left a comment

Great illustration! I think you can add labels to the axes and add it to the docs.

@KennethEnevoldsen (Contributor) left a comment

Do we want to add this as a blog post (I think I would prefer that)? It would then be more of a "how to compare" rather than "this is how it compares", which is harder to maintain.

@Samoed (Member) left a comment

> blog post

Do you mean as an HF post?

@KennethEnevoldsen (Contributor) left a comment

I was thinking of doing a blog post like this:

https://squidfunk.github.io/mkdocs-material/blog/2022/09/12/blog-support-just-landed/

(easy to copy to HF if we want to)

@Samoed (Member) left a comment

Oh, nice! We can add this.

@KennethEnevoldsen (Contributor) left a comment

Yeah, it is a pretty good way to, e.g., announce stuff or showcase an example, without forcing us to keep updating it.

noooop and others added 2 commits December 26, 2025 16:15
Signed-off-by: wang.yuqi <[email protected]>
@KennethEnevoldsen (Contributor) left a comment

Note to myself for when I rewrite this:

1. add a link to the "speeding up mteb" section
2. split into API docs and usage documentation
3. remove function scope
4. rewrite the install section into an admonition
5. refactor headings

@noooop (Contributor, Author) left a comment

I've submitted everything I found useful. It's best to just update this PR directly to get it ready.

Signed-off-by: wang.yuqi <[email protected]>
@noooop marked this pull request as ready for review on December 29, 2025, at 12:02.
@noooop (Contributor, Author) commented Dec 29, 2025

I think the content of this PR is pretty solid. Let’s polish it up so it’s ready to merge.

@Samoed (Member) commented Dec 29, 2025

There are too many conflicts in the dependencies. We can probably move the model-specific tests out into separate actions to simplify everything.

@noooop (Contributor, Author) commented Dec 29, 2025

> There are too many conflicts in the dependencies. We can probably move the model-specific tests out into separate actions to simplify everything.

vLLM has very complex dependencies; that’s why I haven’t added tests to CI.
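
One lightweight way to keep vLLM tests out of the main CI path until the dependency story settles is a collection guard in conftest.py. This is a sketch using pytest's standard hooks; the glob pattern is an assumption about how the test files would be named:

```python
# conftest.py — skip collecting vLLM tests when vLLM is not installed,
# so the main CI job never needs vLLM's heavy dependency tree.
import importlib.util

collect_ignore_glob = []
if importlib.util.find_spec("vllm") is None:
    collect_ignore_glob.append("*vllm*")
```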
