[MAEB] Wav2Clip Text Encoder by AdnanElAssadi56 · Pull Request #3781 · embeddings-benchmark/mteb

AdnanElAssadi56 · 2025-12-22T07:04:33Z

Related to #3545

@Samoed I've double-checked the paper (arXiv:2110.11499v2), and it seems to confirm that we can use the standard CLIP text encoder.

The CLIP model is not tuned. The paper explicitly states in Section 2: "Throughout distillation, the original CLIP model weights are kept frozen".
The authors note that "we get image and text modality for free".
In their own evaluation (Section 3.2), they describe the process as extracting "CLIP text and Wav2CLIP audio embeddings".

Since the text encoder is identical to the standard CLIP encoder, I think we can safely get the text embeddings from the original CLIP model, and they will be mathematically aligned with the audio embeddings from wav2clip.
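To make the alignment claim concrete, here is a minimal sketch of how text embeddings from the stock CLIP checkpoint could be scored against Wav2CLIP audio embeddings. The helper names (`l2_normalize`, `text_audio_similarity`, `embed_text`) are hypothetical and not part of mteb or wav2clip. The sketch assumes the audio embeddings are already available as an array with the same dimensionality as the CLIP text features, and that `get_text_features` returns a tensor.

```python
import numpy as np


def l2_normalize(x, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 1:
        x = x[None, :]  # treat a single vector as a batch of one
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def text_audio_similarity(text_emb, audio_emb) -> np.ndarray:
    """Return a (n_texts, n_audio) cosine-similarity matrix."""
    t = l2_normalize(text_emb)
    a = l2_normalize(audio_emb)
    if t.shape[1] != a.shape[1]:
        raise ValueError(
            f"dimension mismatch: text {t.shape[1]} vs audio {a.shape[1]}"
        )
    return t @ a.T


def embed_text(texts, model_name: str = "openai/clip-vit-base-patch32") -> np.ndarray:
    """Embed texts with the frozen CLIP text tower (downloads weights on first use)."""
    # Imported lazily so the similarity helpers work without torch/transformers.
    import torch
    from transformers import CLIPModel, CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained(model_name)
    model = CLIPModel.from_pretrained(model_name).eval()
    inputs = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Assumption: this call returns a tensor of projected text features.
        feats = model.get_text_features(**inputs)
    return feats.cpu().numpy()
```

With audio embeddings in hand (for example from the wav2clip package), `text_audio_similarity(embed_text(["a dog barking"]), audio_emb)` would give one score per audio clip. The explicit dimension check is there because a shape mismatch between the two encoders would otherwise only show up as a matrix-multiplication error.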

Samoed · 2025-12-22T08:17:00Z

-        # text side (CLIP)
-        self.clip = CLIPModel.from_pretrained(model_name, revision=revision).to(device)
+        # text side (CLIP) - we use the standard OpenAI CLIP model as mentioned in paper
+        clip_model_name = "openai/clip-vit-base-patch32"


Can we process this without loading another model?

AdnanElAssadi56 · 2025-12-22

If you mean loading it this way:

self.clip = CLIPModel.from_pretrained(model_name, revision=revision).to(device)

Then no, because there is no Hugging Face revision for the model, and it was returning a 404 error during evaluation.

Samoed · 2025-12-22 (edited)

I mean retrieving it from the existing model somehow. However, they use this method for encoding, and it is hard to tell how to do that with this model: https://github.com/descriptinc/lyrebird-wav2clip/blob/1864b3924be5a785e2d49d975b8a26ff93f62951/wav2clip/__init__.py#L34

https://github.com/descriptinc/lyrebird-wav2clip/blob/1864b3924be5a785e2d49d975b8a26ff93f62951/wav2clip/model/encoder.py#L117-L122

github-actions · 2026-01-06T02:17:13Z

This pull request has been automatically marked as stale due to inactivity.

Use standard CLIP model for text encoder

6c5f6ee

AdnanElAssadi56 requested a review from Samoed December 22, 2025 07:04

fix numpy.ndarray mismatch

85ac699

Samoed reviewed Dec 22, 2025


Comment thread mteb/models/model_implementations/wav2clip_model.py

Samoed added the audio (Audio extension) label Dec 23, 2025

github-actions Bot added the stale label Jan 6, 2026

Samoed mentioned this pull request Jan 6, 2026

[MAEB] Add proper wav2clip text encoder #3849

Merged

AdnanElAssadi56 closed this Jan 6, 2026

AdnanElAssadi56 wants to merge 2 commits into embeddings-benchmark:maeb from AdnanElAssadi56:maeb-model-wav2clip_fix

2 participants
