To set up the environment automatically, run:

    bash scripts/setup_environment.sh

This will create a conda environment named sembench with Python 3.12 and all required dependencies.

Custom environment name:

    bash scripts/setup_environment.sh my_custom_env_name

If you prefer to set up manually, follow these steps:
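
    # Create the environment
    conda create -n sembench python=3.12 -y

    # Install the scientific base packages from conda-forge
    conda install -n sembench -c conda-forge -y \
        numpy pandas scikit-learn scipy matplotlib seaborn pillow

    # Install PyTorch with CUDA 12.4 support
    conda install -n sembench -c pytorch -c nvidia \
        pytorch torchvision torchaudio pytorch-cuda=12.4 -y

    # Activate the environment, then install the remaining packages with pip
    conda activate sembench
    pip install -r requirements.txt

Environment Name: sembench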
Python Version: 3.12.12
Installation Strategy: Conda for base packages + Pip for specialty packages
- Conda packages provide optimized builds for scientific computing (NumPy, SciPy, PyTorch)
- CUDA support is handled automatically through conda's PyTorch installation
- System-specific packages (lotus, palimpzest, thalamusdb) are only available on PyPI
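
One way to confirm that the conda PyTorch build actually picked up CUDA is a quick check from inside the activated environment. The snippet below is a minimal sketch, not part of this repository: the helper name `describe_torch` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch():
    """Summarize the PyTorch install; `describe_torch` is a hypothetical helper."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, e.g. the environment has not been activated.
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the build was compiled against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

With the setup described here, `cuda_build` would be expected to report a 12.4 build. `cuda_available` can still be `False` on a machine without a GPU or a compatible driver.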
| Package | Version | Purpose |
|---|---|---|
| Python | 3.12.12 | Base interpreter |
| NumPy | 2.3.4 | Numerical computing |
| Pandas | 2.3.3 | Data manipulation |
| PyTorch | 2.6.0 | Deep learning |
| Scikit-learn | 1.7.2 | Machine learning |
| SciPy | 1.16.3 | Scientific computing |
| Matplotlib | 3.10.7 | Plotting |
| Seaborn | 0.13.2 | Statistical visualization |
Plus: CUDA 12.4 libraries (libcublas, libcufft, libcurand, etc.)
| System | Package | Version |
|---|---|---|
| Lotus | lotus-ai | 1.1.3 |
| Palimpzest | palimpzest | 0.8.2 |
| ThalamusDB | thalamusdb | 0.1.15 |
| BigQuery | google-cloud-bigquery | 3.38.0 |
- `google-cloud-bigquery` 3.38.0
- `google-cloud-storage` 3.5.0
- `google-cloud-bigquery-storage` 2.34.0
- `db-dtypes` 1.4.3
- `transformers` 4.57.1
- `sentence-transformers` 3.4.1
- `openai` 2.7.1
- `litellm` 1.75.9
- `anthropic` 0.72.0
- `chromadb` 1.3.4
- `faiss-cpu` 1.12.0
- `smolagents` 1.22.0 (for palimpzest)
- `cdlib` 0.4.0 (for evaluation)
- `gradio` 5.49.1 (for UI)
- `datasets` 4.4.1 (for data loading)
- `evaluate` 0.4.6 (for metrics)
- `opencv-python` 4.12.0.88 (for image processing)
- And 100+ other dependencies...
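
To compare an existing environment against the versions listed above, the installed distributions can be queried with the standard library's `importlib.metadata`. This is a minimal sketch under stated assumptions: the `EXPECTED` mapping copies only a few pins from the tables above, and the helper name `check_versions` is hypothetical rather than something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the tables above; extend as needed.
EXPECTED = {
    "numpy": "2.3.4",
    "pandas": "2.3.3",
    "lotus-ai": "1.1.3",
    "palimpzest": "0.8.2",
    "thalamusdb": "0.1.15",
    "google-cloud-bigquery": "3.38.0",
}


def check_versions(expected):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All checked packages match the expected versions.")
```

Run it inside the activated environment. An empty result means every checked package matches its listed version.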

Activate the environment, run a test query against each system, and deactivate the environment when finished:

    conda activate sembench

    # Test Lotus
    python3 src/run.py --systems lotus --use-cases movie --queries 1 --model gemini-2.5-flash --scale-factor 2000
    # Test Palimpzest
    python3 src/run.py --systems palimpzest --use-cases movie --queries 1 --model gemini-2.5-flash --scale-factor 2000
    # Test ThalamusDB
    python3 src/run.py --systems thalamusdb --use-cases movie --queries 1 --model gemini-2.5-flash --scale-factor 2000
    # Test BigQuery
    python3 src/run.py --systems bigquery --use-cases movie --queries 1 --model gemini-2.5-flash --scale-factor 2000

    conda deactivate

The following packages were discovered to be required but were not initially in requirements.txt:
- `google-cloud-storage` - Required for BigQuery storage operations
- `google-cloud-bigquery-storage` - Optimizes BigQuery data fetching
- `db-dtypes` - Required for BigQuery data type handling
- `cdlib` - Required for evaluation metrics
- `smolagents` - Required by palimpzest
- `prettytable` - Palimpzest dependency
- `psutil` - Palimpzest dependency
- `PyLD` - Palimpzest dependency
- `tabulate` - Palimpzest dependency
- `together` - Palimpzest dependency
All of these have been added to the final requirements.txt.
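
A quick way to confirm that a given copy of requirements.txt contains these late additions is to scan it for their names. The following is a rough sketch, not a full requirements parser: the helper names (`normalize`, `listed_packages`, `missing_from`) are hypothetical, names are compared using the PEP 503 normalization rule, and lines that are pip options (starting with `-`) are simply skipped.

```python
import re
from pathlib import Path

# Package names taken from the list above.
LATE_ADDITIONS = [
    "google-cloud-storage", "google-cloud-bigquery-storage", "db-dtypes",
    "cdlib", "smolagents", "prettytable", "psutil", "PyLD", "tabulate", "together",
]


def normalize(name):
    """Normalize a project name as in PEP 503 (case-insensitive; -, _ and . equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def listed_packages(requirements_text):
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_from(requirements_text, wanted=LATE_ADDITIONS):
    """Return the wanted packages that do not appear in the requirements."""
    present = listed_packages(requirements_text)
    return [name for name in wanted if normalize(name) not in present]


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        print(missing_from(path.read_text()) or "All late additions are listed.")
    else:
        print("requirements.txt not found in the current directory.")
```

Run from the repository root, the script prints any of the packages above that are still absent from requirements.txt.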