[1/n] Add vLLM wrapper support #3794
Conversation
Signed-off-by: wang.yuqi <[email protected]>
I’m not very familiar with the MTEB framework, so I’m unsure where the relevant code should be placed and how the interface should be designed. For now, I’ve implemented a very, very dirty version in order to get feedback as early as possible. vLLM has many, many parameters; I have highlighted the most commonly used ones for MTEB users and added a lot of comments, which may need further polishing.
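To make the discussion concrete, here is a rough, hedged sketch of the kind of wrapper being described: a thin class around vLLM's `LLM` that exposes a handful of the most commonly tuned parameters. The class name and `encode` signature are illustrative assumptions, not the code in this PR; the vLLM arguments shown are standard `LLM(...)` constructor parameters, though the pooling/`task` handling may vary across vLLM versions.

```python
# Illustrative sketch only -- not the implementation in this PR.
from typing import Optional

import numpy as np
from vllm import LLM


class VllmEncoderSketch:
    """Minimal vLLM-backed encoder exposing a few commonly tuned parameters."""

    def __init__(
        self,
        model_name: str,
        dtype: str = "float16",               # float32 is much slower in vLLM
        max_model_len: Optional[int] = None,  # cap the context length if set
        tensor_parallel_size: int = 1,        # number of GPUs to shard across
        gpu_memory_utilization: float = 0.9,  # fraction of VRAM vLLM may reserve
    ) -> None:
        self.llm = LLM(
            model=model_name,
            task="embed",  # pooling/embedding mode; exact argument may differ by vLLM version
            dtype=dtype,
            max_model_len=max_model_len,
            tensor_parallel_size=tensor_parallel_size,
            gpu_memory_utilization=gpu_memory_utilization,
        )

    def encode(self, sentences: list[str], **kwargs) -> np.ndarray:
        # Recent vLLM versions expose llm.embed(); older ones use llm.encode().
        outputs = self.llm.embed(sentences)
        return np.asarray([o.outputs.embedding for o in outputs])
```

Such an object could then be passed to `mteb.MTEB(...).run(...)` like any other encoder, with the remaining vLLM options forwarded as keyword arguments.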
Signed-off-by: wang.yuqi <[email protected]>
Samoed
left a comment
Overall, this looks good! I think we can extend and change further if some models would need to update something. Thank you for adding!
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
How should documents, examples, and tests be organized?

I don't think you need to add tests, because this is a wrapper and for now no models use it. As for docs, you can keep them as docstrings without additional documentation.
Samoed
left a comment
Wrapper looks good. You can add vllm as an optional dependency to pyproject and add a check to VllmWrapperBase like here:
mteb/mteb/models/search_encoder_index/search_indexes/faiss_search_index.py
Lines 28 to 33 in d3bc4cc
requires_package(
    self,
    "faiss",
    "FAISS-based search",
    install_instruction="pip install mteb[faiss-cpu]",
)
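Following that suggestion, the analogous check for vLLM could look roughly like the sketch below. The class name, import path, and the `mteb[vllm]` extra are assumptions for illustration; the `requires_package` call itself mirrors the FAISS example above.

```python
# Sketch under assumptions: the import path and the extra name are illustrative.
from mteb.requires_package import requires_package  # assumed import path


class VllmWrapperBase:
    def __init__(self, model_name: str, **vllm_kwargs) -> None:
        # Fail early with an actionable message if vLLM is not installed.
        requires_package(
            self,
            "vllm",
            "vLLM-based encoding",
            install_instruction="pip install mteb[vllm]",
        )
        ...
```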
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
I think we should at least write one test wrapping a smallish model (even if it is not the official reference implementation).

Yes, but I'm not sure if it would install and run correctly in CI.
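One low-risk option, sketched below, is a test that wraps a smallish embedding model and is skipped automatically when vLLM is not installed (e.g. on CI runners that cannot build it). The module path, class name, and model are placeholders, not the final layout.

```python
# Hypothetical test sketch; names and paths are placeholders, not the PR's layout.
import numpy as np
import pytest

pytest.importorskip("vllm")  # skip the whole test when vLLM is unavailable


def test_vllm_wrapper_encodes_with_small_model():
    from mteb.models.vllm_wrapper import VllmWrapper  # assumed module path

    model = VllmWrapper("intfloat/e5-small-v2")  # smallish, not the official reference impl
    embeddings = model.encode(["hello world", "a second sentence"])

    assert isinstance(embeddings, np.ndarray)
    assert embeddings.shape[0] == 2
```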
KennethEnevoldsen
left a comment
Great work on this! I would love to add the tests + documentation.
It would even be great to add a blog post entry comparing the speed of the sentence-transformers implementation and the vLLM implementation, but that is definitely optional.
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
## vLLM Wrapper

!!! note
    vLLM currently only supports a limited number of models, with many model implementations having subtle differences compared to the default implementations in mteb. We are working on it.

## Installation

Reference: https://docs.vllm.ai/en/latest/getting_started/installation/

=== "uv"
I think we should add these two examples to the documentation. My suggestion would be here,
under the title
## using vLLM

How about docs/advanced_usage/vllm_wrapper.md?
I'm not very good at naming things or writing documents (as I'm a non-native English speaker). It would be great if someone could help rename things and polish the documentation so that the content of this PR fits naturally with the other parts.
I can go over the docs once we have settled on the content.
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
demo/benchmark_vllm2.py
Outdated
print(
    f"Batchsize {batchsize}, Input_len {input_len} Throughput: "
    f"{len(prompts) / elapsed_time:.4f} requests/s, "
    f"{len(prompts * input_len) / elapsed_time:.4f} tokens/s, "
    f"Latency {delay * 1000:0.2f} ms, n_step {n_step}"
)
If the comparison is fair and the methodology is reasonable, the gap between the ST implementation and the vLLM implementation is not significant.

X-axis: throughput (requests/s)
Y-axis: latency, i.e. time needed for one step (ms), on a logarithmic scale
Curves toward the lower right are better ↘

- Throughput with float16 is approximately four times that of float32; vLLM has almost no optimization for float32 inference.
- vLLM does not use padding, while ST does. For a batch of requests with a large disparity in sequence lengths, ST's performance will therefore be very poor.
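For reference, the quantities printed by the benchmark snippet above can be derived from a timing loop along these lines. This is only a sketch of the methodology, not the script in the PR; `encode_fn` stands in for either backend's encode call.

```python
# Methodology sketch: time one batch and derive the metrics printed above.
import time


def measure(encode_fn, prompts: list[str], input_len: int) -> tuple[float, float, float]:
    start = time.perf_counter()
    encode_fn(prompts)
    elapsed_time = time.perf_counter() - start

    requests_per_s = len(prompts) / elapsed_time          # throughput (x-axis)
    tokens_per_s = len(prompts) * input_len / elapsed_time
    latency_ms = elapsed_time * 1000                      # per-step latency (y-axis)
    return requests_per_s, tokens_per_s, latency_ms
```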
Great illustration! I think you can add labels to the axes and add it to the docs.
Do we want to add this as a blog post (I think I would prefer that) - so it is more of a "how to compare" rather than "this is how it compares", which is harder to maintain.
blog post
Do you mean as an HF post?
I was thinking of doing a blog post like this:
https://squidfunk.github.io/mkdocs-material/blog/2022/09/12/blog-support-just-landed/
(easy to copy to HF if we want to)
Oh, nice! We can add this.
Yeah - it is a pretty good way to e.g. announce things or showcase an example, without requiring us to keep updating it.
Co-authored-by: Roman Solomatin <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Note to myself for when I rewrite this:
- add link to speeding up mteb section
- split into API docs and usage documentation
- remove function scope
- rewrite install section into admonition
- refactor heading
I’ve submitted everything I found useful. It’s best to just update this PR directly to get it ready.
Signed-off-by: wang.yuqi <[email protected]>
I think the content of this PR is pretty solid. Let’s polish it up so it’s ready to merge.

Too many conflicts in the dependencies. We could probably move model-specific tests out into separate actions to simplify everything.

vLLM has very complex dependencies; that’s why I haven’t added tests to CI.
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
MTEB testing helps align vLLM with the sentence-transformers implementation, and identifying potential numerical-precision issues is becoming increasingly important (a rough sketch of such a check follows below).
It's great to have MTEB support vLLM as a backend.
Fix #3747
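As a rough illustration of the alignment check described above (the model name and thresholds are only examples, and the vLLM pooling arguments may differ between versions):

```python
# Hedged sketch: embed the same texts with sentence-transformers and vLLM, then compare.
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

texts = ["MTEB is an embedding benchmark.", "vLLM is an inference engine."]

st_emb = SentenceTransformer("intfloat/e5-small-v2").encode(texts)

llm = LLM(model="intfloat/e5-small-v2", task="embed", dtype="float16")
vllm_emb = np.asarray([o.outputs.embedding for o in llm.embed(texts)])

# Per-text cosine similarity between the two backends; values noticeably below
# ~0.99 would hint at an implementation or numerical-precision mismatch.
cos = np.sum(st_emb * vllm_emb, axis=1) / (
    np.linalg.norm(st_emb, axis=1) * np.linalg.norm(vllm_emb, axis=1)
)
print(cos)
```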