Open
Labels
bug: Something isn't working
Description
Is this a new bug?
- I believe this is a new bug
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When I encode the same piece of text with encode_documents and encode_queries, the resulting sparse vectors have different values.
Piece of text: "the lazy dog"
encode_documents values: 0.58, 0.58 (rounded)
encode_queries values: 0.5, 0.5
Expected Behavior
I expected encode_documents and encode_queries to return the same values for the same piece of text, i.e. 0.5 for both terms. Instead, the encode_documents values are about 0.09 higher (0.588 vs. 0.5). Am I missing something?
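One possible explanation, sketched in plain Python without pinecone_text: for the two-sentence corpus used in the reproduction steps, the reported figures are consistent with encode_documents returning a BM25 term-frequency weight, tf / (k1 * (1 - b + b * dl / avgdl) + tf), and encode_queries returning IDF weights normalized to sum to 1. Everything in this sketch is an assumption rather than confirmed library behavior: the values k1 = 1.2 and b = 0.75, the stop-word filtering that leaves 6 and 3 tokens in the two corpus documents, and the helper names document_weights and query_weights, which are hypothetical.

```python
import math

# Assumed BM25 parameters (not confirmed against the library source).
K1, B = 1.2, 0.75

# Assumed token lists after stop-word removal ("the", "over", "is" dropped).
# Stemming is ignored here because it does not change the counts.
corpus_tokens = [
    ["quick", "brown", "fox", "jumps", "lazy", "dog"],
    ["lazy", "dog", "brown"],
]
text_tokens = ["lazy", "dog"]  # "the lazy dog" without the stop word

n_docs = len(corpus_tokens)
avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs  # (6 + 3) / 2 = 4.5


def document_weights(tokens):
    """Hypothetical document-side weight: saturated, length-normalized TF."""
    norm = K1 * (1.0 - B + B * len(tokens) / avgdl)
    unique = list(dict.fromkeys(tokens))  # unique tokens, order preserved
    return [tokens.count(t) / (norm + tokens.count(t)) for t in unique]


def query_weights(tokens):
    """Hypothetical query-side weight: IDF normalized to sum to 1."""
    unique = list(dict.fromkeys(tokens))
    df = [sum(t in doc for doc in corpus_tokens) for t in unique]
    idf = [math.log((n_docs + 1) / (d + 0.5)) for d in df]
    total = sum(idf)
    return [w / total for w in idf]


doc_w = document_weights(text_tokens)
qry_w = query_weights(text_tokens)
print("document-side:", doc_w)
print("query-side:   ", qry_w)
```

Under these assumptions the document-side weights come out at roughly 0.588 and the query-side weights at 0.5, matching the reported output, so the two methods may simply be computing different quantities by design rather than disagreeing on the same one.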
Steps To Reproduce
from pinecone_text.sparse import BM25Encoder
corpus = ["The quick brown fox jumps over the lazy dog", "The lazy dog is brown"]
bm25 = BM25Encoder()
bm25.fit(corpus)
print(bm25.encode_documents("the lazy dog"))
### Output: {'indices': [226376294, 2982218203], 'values': [0.5882352941176472, 0.5882352941176472]}
print(bm25.encode_queries("the lazy dog"))
### Output: {'indices': [226376294, 2982218203], 'values': [0.5, 0.5]}
Relevant log output
No response
Environment
OS: Ubuntu 20.04
Python 3.9.12
pinecone-text==0.9.0
Additional Context
No response