
Add SAID-LAM-v1 results #387

Closed
SAIDHOME wants to merge 26 commits into embeddings-benchmark:main from SAIDHOME:main

Conversation

@SAIDHOME

@SAIDHOME SAIDHOME commented Jan 3, 2026

Add results for SAID-LAM-v1, featuring Linear Attention Memory (LAM) with
SAID Crystalline Attention (SCA) - BETA.

Results Summary

  • Main MTEB (eng, v2): 41 tasks, 48.99% average
  • LongEmbed: 6 tasks, 75.36% average
  • Perfect recall: 100% on LEMBNeedleRetrieval & LEMBPasskeyRetrieval

Model Details

  • Architecture: LAM with SAID Crystalline Attention (SCA)
  • Context: 32K tokens (licensed), 12K free tier
  • Embedding dimensions: 384
  • Organization: SAID Research (SaidHome.ai)

Checklist

  • My model has a model sheet, report, or similar
  • No, but there is an existing PR #3836
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

Commits

- Main MTEB (eng, v2): 41 tasks, 48.99% average
- LongEmbed: 6 tasks, 75.36% average
- Perfect recall: 100% on LEMBNeedleRetrieval & LEMBPasskeyRetrieval
- SAID Crystalline Attention (SCA) - BETA
- Organization: SAID Research / SaidHome.ai
- Added 'mteb_model_name': 'Said-Research/SAID-LAM-v1' to all 47 task files
- Required field for MTEB v2 submission validation
- Fixed files from SAID-LAM-v1 folder
- Set model_revision to 'main' to match metadata
- Ensures mteb_model_name: 'Said-Research/SAID-LAM-v1'
- Perfect match between metadata and results files
- Required for MTEB bot validation
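The commits above repeatedly describe stamping `mteb_model_name` and `model_revision` into every task-result JSON file so the MTEB v2 submission validator accepts them. A minimal sketch of such a fix-up script (the helper name and the flat-JSON file layout are assumptions, not the actual script used in this PR):

```python
import json
from pathlib import Path


def stamp_model_name(results_dir,
                     model_name="Said-Research/SAID-LAM-v1",
                     revision="main"):
    """Add mteb_model_name and model_revision to every task-result JSON
    file under results_dir (hypothetical helper; the exact required
    fields are defined by the MTEB submission validator)."""
    for path in Path(results_dir).rglob("*.json"):
        data = json.loads(path.read_text())
        data["mteb_model_name"] = model_name
        data["model_revision"] = revision
        path.write_text(json.dumps(data, indent=2))
```

Running it once over the results folder keeps the metadata and the per-task files in perfect agreement, which is what the bot validation checks.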
@github-actions

github-actions Bot commented Jan 4, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: Said-Research/SAID-LAM-v1
Tasks: AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, AskUbuntuDupQuestions, BIOSSES, Banking77Classification, BiorxivClusteringP2P.v2, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, ClimateFEVERHardNegatives, FEVERHardNegatives, FiQA2018, HotpotQAHardNegatives, ImdbClassification, MTOPDomainClassification, MassiveIntentClassification, MassiveScenarioClassification, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MindSmallReranking, SCIDOCS, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSBenchmark, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, SummEvalSummarization.v2, TRECCOVID, Touche2020Retrieval.v3, ToxicConversationsClassification, TweetSentimentExtractionClassification, TwentyNewsgroupsClustering.v2, TwitterSemEval2015, TwitterURLCorpus

Results for Said-Research/SAID-LAM-v1

| task_name | Said-Research/SAID-LAM-v1 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| AmazonCounterfactualClassification | 0.6302 | 0.8820 | 0.6965 | 0.9696 | GeoGPT-Research-Project/GeoEmbedding | False |
| ArXivHierarchicalClusteringP2P | 0.5192 | 0.6492 | 0.5569 | 0.6869 | NovaSearch/jasper_en_vision_language_v1 | False |
| ArXivHierarchicalClusteringS2S | 0.5034 | 0.6384 | 0.5367 | 0.6548 | Qwen/Qwen3-Embedding-8B | False |
| ArguAna | 0.2968 | 0.8644 | 0.5436 | 0.8979 | voyageai/voyage-3-m-exp | False |
| AskUbuntuDupQuestions | 0.5403 | 0.6424 | 0.5924 | 0.7528 | IEITYuan/Yuan-embedding-2.0-en | False |
| BIOSSES | 0.7450 | 0.8897 | 0.8457 | 0.9692 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| Banking77Classification | 0.7187 | 0.9427 | 0.7492 | 0.9427 | google/gemini-embedding-001 | False |
| BiorxivClusteringP2P.v2 | 0.3381 | 0.5386 | 0.3720 | 0.8417 | codefuse-ai/F2LLM-4B | False |
| CQADupstackGamingRetrieval | 0.3234 | 0.7068 | 0.5870 | 0.8161 | IEITYuan/Yuan-embedding-2.0-en | False |
| CQADupstackUnixRetrieval | 0.1765 | 0.5369 | 0.3988 | 0.7198 | voyageai/voyage-3-m-exp | False |
| ClimateFEVERHardNegatives | 0.1108 | 0.3106 | 0.2600 | 0.5905 | IEITYuan/Yuan-embedding-2.0-en | False |
| FEVERHardNegatives | 0.1538 | 0.8898 | 0.8379 | 0.9453 | ByteDance-Seed/Seed1.5-Embedding | False |
| FiQA2018 | 0.1533 | 0.6178 | 0.4381 | 0.8206 | ai-sage/Giga-Embeddings-instruct | False |
| HotpotQAHardNegatives | 0.2673 | 0.8701 | 0.7055 | 0.8701 | google/gemini-embedding-001 | False |
| ImdbClassification | 0.6430 | 0.9498 | 0.8867 | 0.9737 | Qwen/Qwen3-Embedding-8B | False |
| MTOPDomainClassification | 0.5355 | 0.9796 | 0.9024 | 0.9995 | voyageai/voyage-3-m-exp | False |
| MassiveIntentClassification | 0.2445 | 0.8192 | 0.6025 | 0.9194 | voyageai/voyage-3-m-exp | False |
| MassiveScenarioClassification | 0.2923 | 0.8730 | 0.6509 | 0.9930 | voyageai/voyage-3-m-exp | False |
| MedrxivClusteringP2P.v2 | 0.3178 | 0.4716 | 0.3431 | 0.7199 | codefuse-ai/F2LLM-4B | False |
| MedrxivClusteringS2S.v2 | 0.2982 | 0.4501 | 0.3152 | 0.7023 | codefuse-ai/F2LLM-4B | False |
| MindSmallReranking | 0.2941 | 0.3295 | 0.3024 | 0.3437 | Kingsoft-LLM/QZhou-Embedding | False |
| SCIDOCS | 0.1150 | 0.2515 | 0.1745 | 0.5986 | IEITYuan/Yuan-embedding-2.0-en | False |
| SICK-R | 0.7352 | 0.8275 | 0.8023 | 0.9465 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS12 | 0.7502 | 0.8155 | 0.8002 | 0.9546 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS13 | 0.8566 | 0.8989 | 0.8155 | 0.9776 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS14 | 0.8278 | 0.8541 | 0.7772 | 0.9753 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS15 | 0.8779 | 0.9044 | 0.8931 | 0.9811 | Gameselo/STS-multilingual-mpnet-base-v2 | False |
| STS17 | 0.3171 | 0.8858 | 0.8214 | 0.9342 | infgrad/Jasper-Token-Compression-600M | False |
| STS22.v2 | 0.3338 | 0.7169 | 0.6430 | 0.7718 | Kingsoft-LLM/QZhou-Embedding | False |
| STSBenchmark | 0.8190 | 0.8908 | 0.8729 | 0.9504 | Kingsoft-LLM/QZhou-Embedding | False |
| SprintDuplicateQuestions | 0.8960 | 0.9690 | 0.9318 | 0.9838 | Kingsoft-LLM/QZhou-Embedding | False |
| StackExchangeClustering.v2 | 0.4918 | 0.9207 | 0.4643 | 0.9207 | google/gemini-embedding-001 | False |
| StackExchangeClusteringP2P.v2 | 0.3560 | 0.5091 | 0.3854 | 0.5510 | Kingsoft-LLM/QZhou-Embedding | False |
| SummEvalSummarization.v2 | 0.2474 | 0.3828 | 0.3141 | 0.3893 | annamodels/LGAI-Embedding-Preview | False |
| TRECCOVID | 0.2941 | 0.8631 | 0.7115 | 0.9833 | IEITYuan/Yuan-embedding-2.0-en | False |
| Touche2020Retrieval.v3 | 0.2951 | 0.5239 | 0.4959 | 0.7465 | Qwen/Qwen3-Embedding-4B | False |
| ToxicConversationsClassification | 0.6514 | 0.8875 | 0.6601 | 0.9759 | voyageai/voyage-3-m-exp | False |
| TweetSentimentExtractionClassification | 0.5712 | 0.6988 | 0.6280 | 0.8823 | voyageai/voyage-3-m-exp | False |
| TwentyNewsgroupsClustering.v2 | 0.3013 | 0.5737 | 0.3921 | 0.8758 | GeoGPT-Research-Project/GeoEmbedding | False |
| TwitterSemEval2015 | 0.5843 | 0.7917 | 0.7528 | 0.8946 | voyageai/voyage-large-2-instruct | False |
| TwitterURLCorpus | 0.8399 | 0.8705 | 0.8583 | 0.9571 | TencentBAC/Conan-embedding-v2 | False |
| Average | 0.4698 | 0.7290 | 0.6175 | 0.8385 | nan | - |
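The Average row is the unweighted mean of the per-task main scores. A minimal sketch of that computation (the three scores below are illustrative entries taken from the SAID-LAM-v1 column, not the full 41-task set):

```python
# Unweighted mean over per-task main scores, as in the "Average" row.
# Three illustrative entries from the SAID-LAM-v1 column above.
scores = {
    "STS12": 0.7502,
    "STS13": 0.8566,
    "STSBenchmark": 0.8190,
}
average = sum(scores.values()) / len(scores)
print(round(average, 4))  # 0.8086
```

Extending `scores` to all 41 tasks reproduces the 0.4698 figure reported for the model.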

@KennethEnevoldsen
Contributor

related to: embeddings-benchmark/mteb#3836

@KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results) on Jan 4, 2026.
@github-actions

This pull request has been automatically marked as stale due to inactivity.


Labels

- stale
- waiting for review of implementation (This PR is waiting for an implementation review before merging the results.)


2 participants