This is the official repository of the paper "GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search", allowing reproduction of the experiments and ablation studies presented in the paper. The repository contains code for attacking retrieval models by crafting adversarial passages (with the GASLITE method) that poison the targeted retrieval corpus, and for evaluating these attacks.
ℹ️ Currently the repo is meant for reproducing the attack and evaluation from the original work. We intend to continue developing this attack and to turn this repo into a useful framework for retrieval attacks.
For a quick demonstration of the attack, see the demo notebook, or run it in Colab; it showcases the attack on concept-specific queries with a single adversarial passage.
The project requires Python 3.8.5 or later and installing the dependencies with `pip install -r requirements.txt` (preferably in an isolated environment: `conda create -n gaslite-env python=3.8.5`).
- For recording metrics to wandb, you may need to log in to wandb.
- When cloning the project, make sure to load the included submodules, e.g., with `git clone --recurse-submodules`. For details on dependencies, see the Dependencies section.
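Putting the setup together, a typical installation might look like the following sketch (the repository URL and cloned directory name are placeholders; substitute the actual ones):

```bash
# Clone the project together with its submodules
git clone --recurse-submodules <REPO_URL>   # <REPO_URL> is a placeholder
cd gaslite                                  # adjust to the cloned directory name

# Create an isolated environment and install the dependencies
conda create -n gaslite-env python=3.8.5
conda activate gaslite-env
pip install -r requirements.txt

# (Optional) log in to Weights & Biases to record metrics
wandb login
```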
Run one of the following attack scripts to craft adversarial passage(s). Results will be saved to a JSON file in `./results/`.
- `attack0_know-all.sh`: attack a single query with GASLITE.
- `attack1_know-what.sh`: attack a specific concept (e.g., all the queries related to Harry Potter) with GASLITE.
- `attack2_know-nothing.sh`: attack a whole dataset (e.g., MSMARCO's eval set) with GASLITE.
To further modify the attack parameters, refer to the configuration files in `./config/` and use Hydra's CLI override syntax, as sketched below.
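For instance, an attack script can be run as-is, and attack parameters can be overridden with Hydra's `key=value` CLI syntax; the script path, entry point, and override keys below are illustrative placeholders — consult `./config/` for the actual parameter names:

```bash
# Run the "know-what" attack (targeting a concept) with its default configuration:
bash ./scripts/attack1_know-what.sh   # script location is assumed; adjust as needed

# Attack parameters can be overridden with Hydra's `key=value` CLI syntax, e.g.
# (entry point and keys below are placeholders; see ./config/ for the real names):
#   python <entrypoint>.py dataset=msmarco attack.n_iter=100
```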
Evaluation with `covering.py`'s API: this module can be used to evaluate retrieval-corpus poisoning attacks (such as GASLITE). In particular, the method `evaluate_retrieval(..)`, given a set of adversarial passages, evaluates the poisoning attack on a retrieval model with common measures (including the visibility of these passages in the top-10 retrieved passages) w.r.t. the targeted queries.
- Cache retrieval results with `cache-retrieval.sh`. This script caches a model's similarity scores on a dataset (to a JSON in `./data/cached_evals/`). It is a prerequisite for the following steps, so that this heavy computation is not repeated for each attack evaluation. For the models evaluated in the paper, this cache was uploaded to HuggingFace, so this step is not required. The script can be used with any BEIR-supported dataset and SentenceTransformer embedding model.
- Cache query partitions with `cache-q-partitions.sh`. This script caches the partition of queries, using a specified method that defaults to k-means. The resulting partition is saved to a JSON in `./data/cached_clustering/`. These partitions can be used for exploration, to simulate the perfect attack, or to run a multi-budget attack.
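As a rough sketch, the two caching steps could be run as follows (script locations are assumed, and dataset/model selection is configured within the scripts or their config files):

```bash
# Step 1: cache a model's retrieval similarities on a dataset
# (writes a JSON under ./data/cached_evals/; can be skipped for models whose
#  cache was already uploaded to HuggingFace)
bash ./scripts/cache-retrieval.sh

# Step 2: cache the query partitions (defaults to k-means)
# (writes a JSON under ./data/cached_clustering/)
bash ./scripts/cache-q-partitions.sh
```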
This project utilizes the following projects:
- retrieval-datasets-similarities: cached similarities of various retrieval datasets and models, downloaded and used within the project for evaluating the attack (to avoid recalculating these per attack run).
- GPT2 on BERT's tokenizer: a nanoGPT fork and weights that use BERT's tokenizer (instead of GPT-2's); used for crafting fluent adversarial passages.
Some code snippets are loosely inspired by the following codebases:
- Corpus Poisoning Attack for Dense Retrievers
- BEIR Benchmark
- AutoPrompt Attack (Automatic Prompt Construction for Masked Language Models)
- GCG Attack (Universal and Transferable Attacks on Aligned Language Models)
- ARCA Attack (Automatically Auditing Large Language Models via Discrete Optimization)
If you find this work useful, please cite our paper as follows:
@article{bentov2024gasliteingretrievalexploringvulnerabilities,
title={{GASLITE}ing the {R}etrieval: {E}xploring {V}ulnerabilities in {D}ense {E}mbedding-based {S}earch},
author={Matan Ben-Tov and Mahmood Sharif},
year={2024},
eprint={2412.20953},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2412.20953},
}
