A production-ready, microservices-based RAG stack for ingesting PDF, DOCX, and web pages, chunking them hierarchically, vectorizing with `bge-small-en-v1.5`, storing vectors in Weaviate, tracking parent chunks in MongoDB, and answering questions via a FastAPI query engine plus an OpenRouter LLM.

## Features
- End-to-end pipeline: ingest → hierarchical chunking (parent/child) → embeddings → Weaviate + MongoDB → RAG query
- Strict local embeddings using `BAAI/bge-small-en-v1.5` (pulled once, then served locally)
- Three microservices:
  - API Gateway (file uploads, routing)
  - ETL Service (load, split, embed, upload)
  - Query Engine (retrieve + generate via OpenRouter)
- Typed FastAPI endpoints, robust logging, health checks, and test scripts
- Docker Compose for one-command spin up
## Prerequisites

- Python 3.10+
- Docker + Docker Compose (recommended)
- Weaviate (Cloud or Self-hosted) & API key
- MongoDB instance (Atlas or local)
- OpenRouter API key (or compatible OpenAI API endpoint)
- Disk space for the local HF model cache (`bge-small-en-v1.5`)
## Quick Start

### Clone the repository

```bash
git clone https://github.com/Treespunking/End_To_End_RAG_System.git
cd End_To_End_RAG_System
```

### Download the embedding model

```bash
python download_model.py
# caches BAAI/bge-small-en-v1.5 to ./models
```
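`download_model.py` pre-pulls the model so the containers never hit the Hugging Face Hub at runtime. A minimal sketch of what such a script does, assuming the standard `huggingface_hub` API (the actual script may differ):

```python
# Sketch: pre-download BAAI/bge-small-en-v1.5 into the local cache directory.
# The cache path is an assumption -- keep it in sync with HF_CACHE_DIR.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/bge-small-en-v1.5",
    cache_dir="./models",  # fetches model weights, config, and tokenizer files
)
```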
### Configure environment

Create a `.env` file in the project root:

```env
# External services
WEAVIATE_URL=https://your-weaviate-endpoint
WEAVIATE_API_KEY=your_weaviate_key
MONGODB_URI=mongodb+srv://<user>:<pass>@<cluster>/rag_system
# LLM
OPENROUTER_API_KEY=your_openrouter_key
OPENROUTER_MODEL=mistralai/mistral-7b-instruct:free
# App metadata (optional)
APP_URL=http://localhost
APP_NAME=RAG-Query-Engine
```

### Start the stack

```bash
docker compose up -d --build
```

### Services
- API Gateway: http://localhost:8000
- ETL Service: http://localhost:8001
- Query Engine: http://localhost:8002
Health checks are built in; API Gateway depends on the other two being healthy.
## Local Development (without Docker)

Recommended only if you're comfortable installing system dependencies (Poppler, Tesseract, etc.).
- Create three virtualenvs or reuse one (order matters due to deps):
  - ETL: `pip install -r requirements-etl.txt`
  - Query: `pip install -r requirements-query.txt`
  - Gateway: `pip install -r requirements-gateway.txt`
- Ensure `typing_extensions>=4.7.0` is installed first if needed.
- Export environment variables (see `.env` above). Also set `HF_CACHE_DIR=./models` (or an absolute path).
- Run the services separately:
```bash
# Terminal 1
uvicorn etl_service:app --host 0.0.0.0 --port 8001

# Terminal 2
uvicorn query_engine:app --host 0.0.0.0 --port 8002

# Terminal 3
uvicorn api_gateway:app --host 0.0.0.0 --port 8000
```

## Ingest Documents

### PDF

```bash
curl -F "file=@path/to/file.pdf" http://localhost:8000/ingest/pdf
```

### DOCX
curl -F "file=@path/to/file.docx" http://localhost:8000/ingest/docxWeb URL
```bash
curl -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  -d "url=https://example.com/page" \
  http://localhost:8000/ingest/web
```

## Ask a Question

```bash
curl -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  -d "question=What is Retrieval-Augmented Generation?" \
  http://localhost:8000/query
```

## Health & Status

```bash
curl http://localhost:8000/health
curl http://localhost:8000/status
```
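The same calls also work from Python. A convenience sketch with `requests`, equivalent to the curl examples above (assuming the endpoints return JSON bodies):

```python
# Sketch: exercise the gateway endpoints from Python instead of curl.
import requests

BASE = "http://localhost:8000"

# Ingest a PDF (multipart upload, field name "file")
with open("path/to/file.pdf", "rb") as f:
    print(requests.post(f"{BASE}/ingest/pdf", files={"file": f}).json())

# Ask a question (URL-encoded form field "question")
resp = requests.post(
    f"{BASE}/query",
    data={"question": "What is Retrieval-Augmented Generation?"},
)
print(resp.json())  # response shape assumed to be JSON
```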
## Project Structure

```text
.
├── api_gateway.py # FastAPI gateway (uploads, routing)
├── etl_service.py # ETL: load → split → embed → store
├── query_engine.py # Retriever + LLM generation
├── docker-compose.yml # Three-service orchestration
├── Dockerfile.api
├── Dockerfile.etl
├── Dockerfile.query
├── requirements-gateway.txt
├── requirements-etl.txt
├── requirements-query.txt
├── download_model.py # Pull BAAI/bge-small-en-v1.5 to ./models
├── test_etl.py # Ingestion tester (PDF/DOCX/Web)
├── test_query.py # Query tester
└── uploads/ # Mounted upload dir (created at runtime)
```
## API Endpoints (Gateway)

- `POST /ingest/pdf`: multipart upload (`file`) → starts ETL for PDF
- `POST /ingest/docx`: multipart upload (`file`) → starts ETL for DOCX
- `POST /ingest/web`: form field (`url`) → starts ETL for a web page
- `POST /query`: form field (`question`) → answers via RAG
- `GET /health`: gateway health ping
- `GET /status`: reports health of ETL & Query Engine

The ETL service exposes `POST /process` and `GET /health`; the Query Engine exposes `POST /ask` and `GET /health`.
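For orientation, a minimal sketch of how the gateway's `/query` route can forward a question to the Query Engine's `POST /ask`. The service URL, payload shape, and response format are assumptions, not the exact implementation in `api_gateway.py`:

```python
# Sketch of the gateway-to-query-engine hop, assuming a form-field payload.
import httpx
from fastapi import FastAPI, Form

app = FastAPI()
QUERY_ENGINE_URL = "http://localhost:8002"  # inside Compose, likely the service name instead

@app.post("/query")
async def query(question: str = Form(...)):
    # Forward the question to the Query Engine and relay its JSON answer.
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(f"{QUERY_ENGINE_URL}/ask", data={"question": question})
    resp.raise_for_status()
    return resp.json()
```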
## How It Works

- Load: PDFs via `UnstructuredPDFLoader`, DOCX via `Docx2txtLoader`, and web pages via `WebBaseLoader`.
- Chunk: parent chunks with `RecursiveCharacterTextSplitter` (token-aware), then child chunks via NLTK sentence splits + KMeans clustering to group semantically related sentences (see the sketch after this list).
- Embed: `HuggingFaceEmbeddings` using the locally cached `bge-small-en-v1.5` (384-dim).
- Store:
  - Weaviate: child chunks + vectors in the `ChildChunk` collection with rich metadata.
  - MongoDB: parent chunks / materialized parent store for lineage.
- Query: retriever (similarity + score threshold) → RAG prompt → OpenRouter LLM → answer.
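A condensed sketch of the Chunk and Embed steps, using the LangChain and scikit-learn APIs named above. Chunk sizes, the sentences-per-chunk heuristic, and the input file are illustrative, not the exact values in `etl_service.py`:

```python
# Sketch: hierarchical chunking -- token-aware parent splits, then child chunks
# built by clustering sentence embeddings so related sentences stay together.
import nltk
import numpy as np
from sklearn.cluster import KMeans
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

nltk.download("punkt", quiet=True)      # sentence tokenizer data
nltk.download("punkt_tab", quiet=True)  # needed by newer NLTK releases

embedder = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    cache_folder="./models",  # matches HF_CACHE_DIR; no network call if pre-downloaded
)

# Token-aware parent splitter (sizes are illustrative)
parent_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=100
)

def child_chunks(parent_text: str, sentences_per_chunk: int = 5) -> list[str]:
    """Group semantically related sentences via KMeans over their embeddings."""
    sentences = nltk.sent_tokenize(parent_text)
    if len(sentences) <= sentences_per_chunk:
        return [parent_text]
    vectors = np.array(embedder.embed_documents(sentences))  # shape (n, 384)
    n_clusters = max(1, len(sentences) // sentences_per_chunk)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(vectors)
    # Reassemble each cluster, preserving original sentence order
    return [" ".join(s for s, lab in zip(sentences, labels) if lab == k)
            for k in range(n_clusters)]

parents = parent_splitter.split_text(open("sample.txt").read())
children = [c for p in parents for c in child_chunks(p)]
```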
## Testing

```bash
python test_etl.py
# or specific:
python test_etl.py pdf ./docs/sample.pdf
python test_etl.py docx ./docs/sample.docx
python test_etl.py web https://example.com/page

python test_query.py "How does RAG improve answer accuracy?"
```

## Configuration Notes

- Models: keep `HF_CACHE_DIR=/app/models` in Docker (volume-mount `./models:/app/models`).
- OpenRouter: set `OPENROUTER_API_KEY` and (optionally) `OPENROUTER_MODEL` (see the sketch after this list).
- Weaviate: set `WEAVIATE_URL` and `WEAVIATE_API_KEY`. Ensure the cluster is reachable.
- MongoDB: set `MONGODB_URI`. The ETL creates/uses the `rag_system` DB and its `parents` collection.
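A minimal sketch of the generation call against OpenRouter's OpenAI-compatible chat-completions endpoint; `APP_URL`/`APP_NAME` map to OpenRouter's optional attribution headers. The prompt assembly in `query_engine.py` may differ:

```python
# Sketch: answer generation via OpenRouter's OpenAI-compatible API.
import os
import requests

def generate_answer(prompt: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "HTTP-Referer": os.getenv("APP_URL", "http://localhost"),  # optional attribution
            "X-Title": os.getenv("APP_NAME", "RAG-Query-Engine"),      # optional attribution
        },
        json={
            "model": os.getenv("OPENROUTER_MODEL", "mistralai/mistral-7b-instruct:free"),
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```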
## Troubleshooting

- Embeddings model not found: run `python download_model.py` before starting the stack.
- Weaviate not ready: verify credentials and network; the gateway waits for health, but also check `docker compose logs etl-service` and `docker compose logs query-engine`.
- Tokenization errors: ensure `tiktoken` is installed and `typing_extensions>=4.7.0`.
- Large files: adjust parent/child chunk sizes or ETL timeouts in code if needed.