Config-first RAG starter kit that pairs OpenAI File Search, ChatKit, and Evals with a pluggable ingestion, retrieval, and evaluation toolkit.
Launch a working knowledge retrieval app in a few commands:

```bash
# 1. Backend environment
python3 -m venv .venv
source .venv/bin/activate
make dev  # or: make install

# 2. Credentials
cp .env.example .env
echo "OPENAI_API_KEY=sk-..." >> .env

# 3. Update configuration
# Edit configs/default.openai.yaml to point to your documents.

# 4. Ingest corpus
make ingest

# 5. Run backend
make run-app-backend

# 6. Run frontend (in another terminal)
make run-app-frontend

# 7. Open http://localhost:5172 to chat with the assistant. All answers are
#    grounded with citations that link back to the ingested documents.

# 8. Evals: generate synthetic data for evaluation and upload to OpenAI Evals
make eval
```

- YAML-first configuration – define ingestion, retrieval, and synthesis behavior without editing code.
- Multiple vector stores – ship with OpenAI File Search or go custom with a local Qdrant DB.
- Typed pipelines – ingestion, retrieval, and synthesis flows built with Pydantic models and Typer/FastAPI services.
- ChatKit UI – chat with the knowledge base through ChatKit components.
- Evaluation harness – generate synthetic data and run evals with OpenAI Evals or local grading.
- `cli/` – Typer CLI (`rag`) entrypoint and config parsing.
- `ingestion/` – loaders, chunkers, preprocessors, and orchestrated pipelines.
- `retrieval/` – query expansion, filtering, reranking, and response assembly.
- `stores/` – adapters for OpenAI File Search and custom vector stores (Qdrant example included).
- `app/backend/` – FastAPI application that powers the chat API.
- `app/frontend/` – Vite + React UI using Tailwind CSS and ChatKit components.
- `configs/` – starter default YAMLs you can copy and customize.
- `evals/` – evaluation harness, dataset generation, and reporters.
- `templates/` – additional templates for different vector stores, chunking strategies, and retrieval methods.
- `prompts/` – prompts for the different aspects of the knowledge retrieval workflow (query expansion, reranking, local judging, etc.).
For a more customized setup than the quickstart provides, follow the steps below.
- Python 3.10 or later
- Node.js 18.18 or later (for the optional web UI)
- An OpenAI API key (`OPENAI_API_KEY`) with access to File Search and Evals
Create a virtual environment and install dependencies:

```bash
python3 -m venv .venv
source .venv/bin/activate
make dev  # or: make install
```

Copy `.env.example` to `.env` and fill in the values that apply to your environment. At minimum set `OPENAI_API_KEY`. Optional variables:
```bash
OPENAI_ORG_ID=...
OPENAI_PROJECT_ID=...
VECTOR_STORE_ID=vs_...
```

Use the CLI to copy a YAML config template into `configs/` (replace `my` with a project name):
```bash
rag init --template openai > configs/my.openai.yaml
```

Templates include `openai` and `custom-qdrant`, along with different options for chunking data. Adjust `data.paths` to point at the documents you want to ingest. Example datasets live in `data/example_data/`.
Set the `RAG_CONFIG` environment variable to the path of your config.
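For example, in the shell where you will run the commands below (assuming a POSIX shell):

```bash
export RAG_CONFIG=configs/my.openai.yaml
```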
If you chose the `custom-qdrant` template, start Qdrant first:

```bash
make qdrant-up
```

This starts a local Qdrant instance on port 6333.
Then ingest your corpus:

```bash
rag ingest --config configs/my.openai.yaml
```

The CLI will create or reuse a vector store, upload documents, and print the resulting `vector_store_id`. Persist that value in your `.env`.
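For example (the `vs_...` value below is a placeholder for the ID the CLI prints):

```bash
echo "VECTOR_STORE_ID=vs_..." >> .env
```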
Start the FastAPI backend (it reads `RAG_CONFIG`, defaulting to `configs/default.openai.yaml`):

```bash
make run-app-backend
```

Start the web UI in another terminal:

```bash
make run-app-frontend
```

The UI runs at http://localhost:5172 by default and communicates with the backend at http://localhost:8000.
Run the evaluation harness:

```bash
rag eval --config configs/my.openai.yaml
```

The harness can either use an existing dataset (`evals.mode: user`) or synthesize one (`evals.mode: auto`). When `evals.openai_evals.enabled: true`, results are mirrored to the OpenAI Evals dashboard.
Every aspect of the pipeline is driven by YAML. Use `rag init --template <name>` to scaffold a config, then customize the sections described below.
- `openai` – default template backed by OpenAI File Search. Works out of the box with the supplied example dataset.
- `custom-qdrant` – connects to a locally hosted Qdrant instance. Pair with `make qdrant-up` to start Docker services automatically.
Add `--chunking <strategy>` when scaffolding to pick a predefined chunking approach (see the next section).
Chunking is controlled under `chunking` in the config:

- `recursive` – recursively backs off from headings to paragraphs, sentences, then tokens to hit the target window.
- `heading` – uses heading hierarchy (`#`, ALL CAPS, numbered headings) to keep coherent sections.
- `hybrid` – falls back to recursive when headings are missing (default).
- `xml_aware` – purpose-built for XML manuals; splits on semantic tags and converts tables to Markdown.
- `custom` – load a custom chunker via `custom_chunker.{module_path,class_name,init_args}`.
Common parameters:
- `target_token_range: [min, max]` – token window to aim for before chunking stops.
- `overlap_tokens` – overlap between adjacent chunks to maintain context.
- `rules` – strategy-specific options (heading detection hints, XML tag boundaries, etc.).
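Putting this together, a `chunking` block might look like the sketch below. The `strategy` key and the contents of `rules` are illustrative assumptions; the strategy values and the other parameter names come from the lists above.

```yaml
chunking:
  strategy: hybrid                # assumed key; one of the strategies listed above
  target_token_range: [300, 800]  # token window to aim for
  overlap_tokens: 50              # overlap between adjacent chunks
  rules:                          # strategy-specific; keys below are illustrative
    heading_hints: ["#", "ALL CAPS", "numbered"]
```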
Select your store with `vector_store.backend`:

- `openai_file_search` – managed vector store that handles uploads, dedupe, and metadata. Configure `vector_store_name`, `vector_store_id` (reuse an existing store instead of creating a new one), `expiry_days`, `chunking.max_chunk_size_tokens`, and `chunking.chunk_overlap_tokens` (see the sketch after this list).
- `custom` – bring your own implementation. The included Qdrant adapter is enabled with:

```yaml
vector_store:
  backend: custom
  custom:
    kind: qdrant
    qdrant:
      url: http://localhost:6333
      collection: my_collection
      distance: cosine
      ef: 64
      m: 32
```
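For the managed backend, the same section might look like this sketch (the nesting under `openai_file_search` mirrors the custom example and is an assumption; values are illustrative):

```yaml
vector_store:
  backend: openai_file_search
  openai_file_search:             # assumed nesting, mirroring the custom example
    vector_store_name: my-knowledge-base
    # vector_store_id: vs_...     # uncomment to reuse an existing store
    expiry_days: 30
    chunking:
      max_chunk_size_tokens: 800
      chunk_overlap_tokens: 100
```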
You can reference your own module by setting `custom.kind: plugin` and providing `plugin.{module_path,class_name,init_args}`, which lets you plug in any vector database of your choice.
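A sketch of that plugin form, assuming the same layout as the Qdrant example; the module path, class name, and init args are placeholders for your own code:

```yaml
vector_store:
  backend: custom
  custom:
    kind: plugin
    plugin:
      module_path: my_project.stores.my_store  # placeholder module
      class_name: MyVectorStore                # placeholder class
      init_args:                               # passed to the constructor
        url: http://localhost:9000             # placeholder value
```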
Four optional stages live under `query`:

- `query.expansion.enabled` – uses an LLM to generate alternate phrasings. Configure `model`, `prompt_path`, `variants`, and `style`.
- `query.hyde.enabled` – adds HyDE (Hypothetical Document Embeddings); an LLM writes a synthetic passage that is embedded alongside the query.
- `query.similarity_filter.enabled` – drops results below a similarity `threshold` before reranking.
- `query.rerank.enabled` – applies an LLM cross-encoder reranker; configure `model`, `prompt_path`, `max_candidates`, and `score_threshold`.
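Combined, a `query` section might look like the sketch below (model names, prompt filenames, and numeric values are illustrative; the key names come from the list above):

```yaml
query:
  expansion:
    enabled: true
    model: gpt-4o-mini                         # illustrative model choice
    prompt_path: prompts/expansion/default.md  # illustrative filename
    variants: 3
    style: concise
  hyde:
    enabled: false
  similarity_filter:
    enabled: true
    threshold: 0.4
  rerank:
    enabled: true
    model: gpt-4o-mini                         # illustrative model choice
    prompt_path: prompts/rerank/default.md     # illustrative filename
    max_candidates: 20
    score_threshold: 0.5
```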
Prompts live under `prompts/expansion/`, `prompts/hyde/`, and `prompts/rerank/`. Edit those files to change the guidance without touching code.
Under `synthesis` you can set:

- `model` – response model (e.g., `gpt-4o-mini`, `o4-mini-high`, etc.).
- `system_prompt` – inline string or a path to a prompt file (the default lives at `prompts/system/assistant.md`).
- `structured_outputs` – enable schema-based responses.
- `reasoning_effort` – hint the model on effort (`low`, `medium`, `high`).
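For example, a minimal `synthesis` block using those keys (values are illustrative):

```yaml
synthesis:
  model: gpt-4o-mini                         # illustrative model choice
  system_prompt: prompts/system/assistant.md # default prompt path
  structured_outputs: false
  reasoning_effort: medium                   # low | medium | high
```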
Use the `env` block to specify default values for credentials such as `OPENAI_API_KEY`, `OPENAI_PROJECT_ID`, `VECTOR_STORE_ID`, etc. Actual environment variables override these when set.
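A sketch of the `env` block, assuming flat key-value pairs (all values are placeholders):

```yaml
env:
  OPENAI_API_KEY: sk-...          # a real environment variable wins if set
  OPENAI_PROJECT_ID: proj_...
  VECTOR_STORE_ID: vs_...
```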
The UI is built with Vite, React 19, Tailwind CSS, and `@openai/chatkit-react`. Helpful commands:

```bash
npm run dev      # start dev server with hot reload
npm run build    # create production bundle
npm run preview  # preview the production build locally
npm run lint     # eslint + typescript checks
```

The evaluation harness supports both local grading and hosted OpenAI Evals.
Set `evals.mode: auto` to synthesize evaluation data from your corpus. The harness:

- Samples ingested chunks.
- Asks an LLM to create realistic user questions.
- Stores the dataset under `evals/datasets/` (JSONL format).
Re-run the command to expand coverage or refresh stale questions.
Switch to `evals.mode: user` and provide `evals.dataset_path`. Each JSONL record should include:

- `id` – unique identifier.
- `question` – the user query.
- `citation_text` – supporting passage used for groundedness.
- `correct_answer` – ideal answer used for EM/F1 scoring (required).
- Metadata (e.g., `source_id`, `page`, `char_start`, `char_end`, `difficulty`, `tags`).
See `evals/datasets/schema.py` for the full schema; the CLI validates these fields before each run and canonicalizes answers internally for consistent scoring.
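An illustrative record with those fields (all values are made-up placeholders; the authoritative field set lives in `evals/datasets/schema.py`):

```json
{"id": "q-0001", "question": "How do I reset the device?", "citation_text": "Hold the power button for ten seconds to restore factory settings.", "correct_answer": "Hold the power button for ten seconds.", "source_id": "device-manual.md", "page": 4, "char_start": 1200, "char_end": 1275, "difficulty": "easy", "tags": ["setup"]}
```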
Keep `evals.openai_evals.enabled: true` to sync local runs to the hosted platform. Configure:

- `graders` – choose groundedness, relevance, or custom rubrics.
- `run_name` – friendly label for dashboards.
- `organization_id` / `project_id` – override defaults when needed.
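A sketch of the `evals` section combining these options (grader names and the run label are illustrative):

```yaml
evals:
  mode: auto                             # or: user, with dataset_path set
  openai_evals:
    enabled: true
    graders: [groundedness, relevance]   # illustrative grader names
    run_name: nightly-rag-eval           # illustrative dashboard label
    # organization_id: org_...           # optional overrides
    # project_id: proj_...
```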
Results are printed locally and reports are written to `evals/reports/` (Markdown, HTML, and any OpenAI Evals link stubs).
We welcome community contributions! Please read CONTRIBUTING.md for guidance on
how to file issues, propose changes, and follow our coding standards.
If you believe you have discovered a vulnerability, please refer to SECURITY.md for
responsible disclosure instructions. Do not open a public issue for security reports.
This project is licensed under the MIT License.
- FastAPI for the backend framework.
- ChatKit components for the chat UI.
- Lucide icons (MIT licensed).
- Tailwind CSS utilities for styling.