A comprehensive .NET solution for text embeddings and semantic similarity search using the E5 Multilingual model with ONNX Runtime.
This project provides a complete vector search implementation that includes:
- Text embedding generation using E5 Multilingual model
- Semantic similarity search capabilities
- SQLite database for storing embeddings
- REST API for easy integration
- Batch processing for dataset preparation
DotNetVectorSearch/
├── DotNetVectorSearch.Core/ # Core library with embedding services
│ ├── Embeddings/ # Embedding service implementations
│ ├── RuntimeProvider/ # ONNX runtime providers
│ └── Onnx/ # Model files (see setup requirements)
├── DotNetVectorSearch.Prepare/ # Dataset preparation tool
├── DotNetVectorSearch.WebAPI/ # REST API service
└── DotNetVectorSearch.sln # Solution file
- .NET 9.0 SDK
- Model files (see setup requirements below)
The following model files must be placed in the DotNetVectorSearch.Core/Onnx/ directory:
- model_O4.onnx: The E5 Multilingual ONNX model file
- sentencepiece.bpe.model: The SentencePiece tokenizer model
Download the model files from: https://huggingface.co/intfloat/multilingual-e5-small/tree/main/onnx
These files are required for the embedding service to function properly. The application will throw an exception if these files are missing.
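
Because missing files only surface as an exception at startup, it can help to check for them beforehand. The following is a minimal sketch in Python, not part of the project itself; the helper name `missing_model_files` is hypothetical, and only the two file names and the directory come from this README.

```python
from pathlib import Path

# File names taken from the setup requirements in this README.
REQUIRED_MODEL_FILES = ("model_O4.onnx", "sentencepiece.bpe.model")


def missing_model_files(onnx_dir):
    """Return the required file names that are absent from onnx_dir.

    Hypothetical helper for illustration; the project performs its own
    check when the embedding service starts.
    """
    directory = Path(onnx_dir)
    return [name for name in REQUIRED_MODEL_FILES
            if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("DotNetVectorSearch.Core/Onnx")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files are present.")
```

Run it from the repository root so the relative path resolves to the Core project's Onnx directory.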
The WebAPI project requires an SQLite database file named embeddings.db in the DotNetVectorSearch.WebAPI/ directory. This file is generated by running the Prepare project first.
git clone <repository-url>
cd DotNetVectorSearch
Place the ONNX model files in the correct location:
DotNetVectorSearch.Core/Onnx/
├── model_O4.onnx
└── sentencepiece.bpe.model
dotnet build
If you have a dataset to process, place your dataset.csv file in the DotNetVectorSearch.Prepare/ directory and run:
cd DotNetVectorSearch.Prepare
dotnet run
This will:
- Read the CSV dataset
- Generate embeddings for each text entry
- Create/update the embeddings.db SQLite database
- Copy the database to the WebAPI project directory
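
The preparation steps listed here can be pictured with a small, language-neutral sketch, written in Python for brevity. It is not the Prepare project's code. The table schema, the `text` column name, and the `fake_embed` placeholder are all assumptions made for illustration; the real tool runs the E5 model through ONNX Runtime and may lay out the database differently.

```python
import csv
import io
import sqlite3
from array import array


def fake_embed(text, dim=4):
    """Placeholder for the real model: NOT an E5 embedding, only a stand-in."""
    vec = [0.0] * dim
    for i, byte in enumerate(text.encode("utf-8")):
        vec[i % dim] += byte / 255.0
    return vec


def prepare(csv_file, conn, embed=fake_embed, text_column="text"):
    """Read rows from csv_file, embed one column, and store the vectors.

    The schema below is hypothetical and chosen only for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, vector BLOB NOT NULL)"
    )
    count = 0
    for row in csv.DictReader(csv_file):
        text = row[text_column]
        # Pack the vector as single-precision floats in a BLOB.
        blob = array("f", embed(text)).tobytes()
        conn.execute(
            "INSERT INTO embeddings (text, vector) VALUES (?, ?)", (text, blob)
        )
        count += 1
    conn.commit()
    return count


def load_vector(blob):
    """Decode a BLOB written by prepare() back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


if __name__ == "__main__":
    sample = io.StringIO("text\nhello world\nbonjour le monde\n")
    db = sqlite3.connect(":memory:")
    print(prepare(sample, db), "rows stored")
```

Storing each vector as a packed BLOB keeps the database compact, at the cost of having to decode it before computing similarities.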
cd DotNetVectorSearch.WebAPI
dotnet run
The API will be available at https://localhost:7000 (or the port specified in your launch settings).
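
Once the API is running, any HTTP client can call it. As a minimal sketch, the Python snippet below builds a POST request for the /api/search endpoint documented in this README. The field names `queryText`, `topK`, and `threshold` and the base URL come from this document; the helper name `build_search_request` is hypothetical, and the shape of the response is not specified here, so the sketch does not try to parse it.

```python
import json
import urllib.request

# Base URL from this README; adjust to the port in your launch settings.
BASE_URL = "https://localhost:7000"


def build_search_request(query_text, top_k=10, threshold=0.7, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/search."""
    payload = {"queryText": query_text, "topK": top_k, "threshold": threshold}
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send the request against a running instance:
#
#     with urllib.request.urlopen(build_search_request("Your search query")) as resp:
#         print(resp.read().decode("utf-8"))
```

When calling the HTTPS endpoint locally, the client has to trust the ASP.NET Core development certificate (for example via `dotnet dev-certs https --trust`), otherwise the TLS handshake is likely to fail.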
POST /api/embeddings
Content-Type: application/json
{
"text": "Your text here"
}

POST /api/embeddings/batch
Content-Type: application/json
{
"texts": ["Text 1", "Text 2", "Text 3"]
}

POST /api/similarity
Content-Type: application/json
{
"text1": "First text",
"text2": "Second text"
}

POST /api/search
Content-Type: application/json
{
"queryText": "Your search query",
"topK": 10,
"threshold": 0.7
}

GET /health

- E5 Multilingual Support: Uses the E5 multilingual embedding model (multilingual-e5-small)
- ONNX Runtime: Optimized inference using ONNX Runtime
- SQLite Storage: Efficient storage and retrieval of embeddings
- REST API: Easy integration with web applications
- Batch Processing: Support for processing large datasets
- Swagger Documentation: Interactive API documentation
- CORS Support: Cross-origin resource sharing enabled
- Health Checks: Built-in health monitoring
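
To make the `topK` and `threshold` parameters of the search endpoint concrete, here is a minimal brute-force sketch in Python. It assumes cosine similarity as the scoring function, which this README does not confirm; the function names and the in-memory `stored` list are hypothetical stand-ins for the embeddings kept in SQLite.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, stored, top_k=10, threshold=0.7):
    """Return up to top_k (text, score) pairs with score >= threshold, best first.

    `stored` is an iterable of (text, vector) pairs; in the real service
    these would be loaded from the embeddings database.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    hits = [(text, score) for text, score in scored if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    stored = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    for text, score in search([1.0, 0.0], stored, top_k=2, threshold=0.5):
        print(text, round(score, 3))
```

A linear scan like this is simple and exact, and is usually adequate for small datasets; larger collections typically call for an approximate nearest-neighbor index.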
The API can be configured through appsettings.json:
{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
},
"AllowedHosts": "*"
}
The embedding service is configured in the E5MultilingualEmbeddings class:
- Maximum sequence length: 512 tokens
- Model path: DotNetVectorSearch.Core/Onnx/model_O4.onnx
- Tokenizer path: DotNetVectorSearch.Core/Onnx/sentencepiece.bpe.model
- DotNetVectorSearch.Core: Core embedding functionality
- DotNetVectorSearch.Prepare: Depends on Core
- DotNetVectorSearch.WebAPI: Depends on Core
- .NET 9.0
- Microsoft.ML.OnnxRuntime
- Microsoft.ML.Tokenizers
- SQLite
- ASP.NET Core Web API
- Swagger/OpenAPI
- Model files not found: Ensure model_O4.onnx and sentencepiece.bpe.model are in the DotNetVectorSearch.Core/Onnx/ directory
- Database not found: Run the Prepare project first to generate the embeddings.db file
- Memory issues: The ONNX model requires sufficient RAM; consider adjusting batch sizes for large datasets
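
For the memory issue in particular, the usual remedy is to split the input into smaller batches so that only a bounded number of texts is embedded at once. The Python sketch below shows the idea; the helper name `batched` is hypothetical, and how the Prepare project itself sizes its batches is not described in this README.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    texts = [f"Text {i}" for i in range(1, 8)]
    for batch in batched(texts, 3):
        # Each batch could be sent as the "texts" array of
        # POST /api/embeddings/batch.
        print(len(batch), batch)
```

Smaller batches lower peak memory use at the cost of more model invocations, so the right size depends on available RAM and typical text length.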
The application uses Microsoft.Extensions.Logging for comprehensive logging. Check the console output for detailed error messages and debugging information.
- Author: PatrickChoDev
- GitHub: https://github.com/PatrickChoDev
- E5 Multilingual model for text embeddings
- Microsoft ONNX Runtime team
- .NET community