🛡️ OWASP Security Assistant (RAG)


Retrieval-Augmented Generation Chatbot for Web Security

Features • Architecture • Installation • Usage

A specialized cybersecurity chatbot that combines Google's Gemini 1.5 Flash model with a Retrieval-Augmented Generation (RAG) engine. A general-purpose chatbot answers only from its training data. This tool instead ingests your own security documents (text files such as OWASP guides, cheat sheets, and internal policies) into a Pinecone vector database, so it can give grounded, source-cited answers about vulnerabilities, remediation, and company-specific policies.

🚀 Key Features

🧠 RAG Intelligence
Custom Knowledge Base: Ingest your own .txt files (policies, cheat sheets) so the bot answers with your organization's context.
Vector Search: Uses `all-MiniLM-L6-v2` embeddings and Pinecone to retrieve the passages most relevant to each question.
Source Citing: Each answer names the document(s) it was drawn from.

🤖 Powerful LLM Backend
Gemini 1.5 Flash: Fast responses, a large context window (up to 1M input tokens), and up to 8,192 output tokens per reply for detailed explanations.
Relaxed Safety Settings: The "Dangerous Content" safety filter is loosened so the model can explain attacks such as SQLi and XSS instead of refusing.
Context Aware: Remembers previous turns in the conversation.

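The relaxed safety setting could look roughly like this with the `google-generativeai` SDK. This is a minimal sketch, not the project's `llm_service.py`: the helper names, the `BLOCK_ONLY_HIGH` threshold, and the default temperature are assumptions.

```python
"""Sketch: a Gemini model with the Dangerous Content filter relaxed.

Hypothetical (not taken from llm_service.py): the helper names, the
BLOCK_ONLY_HIGH threshold, and the default temperature.
"""

MODEL_NAME = "gemini-1.5-flash"


def build_safety_settings(threshold: str = "BLOCK_ONLY_HIGH") -> list[dict[str, str]]:
    # Loosen only the Dangerous Content category so attack techniques
    # (SQLi, XSS, ...) can be explained; unlisted categories keep their defaults.
    return [{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": threshold}]


def make_model(api_key: str, temperature: float = 0.7):
    # Imported lazily so build_safety_settings() works without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(
        MODEL_NAME,
        safety_settings=build_safety_settings(),
        generation_config={"temperature": temperature},
    )
```

With the SDK installed, `make_model(os.environ["GEMINI_API_KEY"]).generate_content("Explain reflected XSS.").text` would return the model's answer. Loosening a single category keeps the other harm filters at their default thresholds.
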
💻 Interactive UI
Streamlit Interface: A clean, web-based chat interface.
Real-time Formatting: Renders code blocks, Markdown, and security alerts.
Session Management: Clear the chat history and restart a session at any time.

🛡️ Security Specialized
OWASP Knowledge: The system prompt includes specific instructions for the OWASP Top 10 categories.
Dynamic Guidance: Adapts its advice to the vulnerability type (e.g., SQLi-specific Python code).
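The Context Aware and Session Management features amount to keeping a bounded message list per session. A minimal sketch, assuming a hypothetical `ChatHistory` class and a 10-turn limit; in a Streamlit app such an object would typically be stored in `st.session_state` so it survives the script reruns Streamlit performs on each interaction.

```python
"""Sketch: bounded per-session chat history with a reset (hypothetical class)."""
from dataclasses import dataclass, field


@dataclass
class ChatHistory:
    max_turns: int = 10  # one turn = a user message plus the assistant's reply
    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once more than max_turns turns are stored.
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            del self.messages[:excess]

    def as_text(self) -> str:
        # Plain-text transcript that can be prepended to the next prompt.
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)

    def clear(self) -> None:
        # What a "Clear history" button would call.
        self.messages.clear()
```

Capping the history keeps prompts short enough that retrieved document chunks, not old chat turns, dominate the model's context.
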

🏗️ Architecture

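The system moves away from purely local models to a hybrid approach: embeddings are computed locally with SentenceTransformers, while vector storage (Pinecone) and answer generation (Gemini) run in the cloud for performance and scalability: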

Ingestion: A Python script (ingest.py) reads text files -> splits them into chunks -> converts each chunk to a vector (SentenceTransformers) -> uploads the vectors to Pinecone.
Retrieval: User asks a question -> the question is embedded with the same model -> the system fetches the top 5 most similar chunks from Pinecone.
Generation: The retrieved chunks and the user's question are sent to Gemini 1.5 Flash, which writes a single answer grounded in, and citing, those chunks.
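The retrieval and generation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `retrieval.py`: a bag-of-words `toy_embed` stands in for `all-MiniLM-L6-v2`, an in-memory list stands in for Pinecone, and the actual call to Gemini is left out; `build_prompt` only shows one way chunks and the question might be combined.

```python
"""Sketch of retrieve-then-generate with a toy embedding.

Hypothetical stand-ins: toy_embed replaces the all-MiniLM-L6-v2 encoder,
a Python list replaces the Pinecone index, and the prompt wording is illustrative.
"""
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Bag-of-words counts; the real system uses 384-dim sentence embeddings.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[dict], top_k: int = 5) -> list[dict]:
    # Embed the question with the same function used for the chunks, then rank.
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c["text"])), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, hits: list[dict]) -> str:
    # Label each chunk with its source file so the LLM can cite it.
    context = "\n\n".join(f"[Source: {h['source']}]\n{h['text']}" for h in hits)
    return (
        "Answer the security question using only the context below and cite sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Tagging each chunk with its file name is what makes source citing possible: the model sees the source label right next to the text it draws on.
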

📁 Project Structure

OWASP_BERT/
├── app.py                  # Main Streamlit Web Application
├── ingest.py               # Script to upload your documents to Pinecone
├── chatbot_modules/        # Core Logic
│   ├── chatbot.py          # Orchestrator
│   ├── components.py       # Initializes Pinecone/Models
│   ├── retrieval.py        # RAG Logic (Search & Rerank)
│   ├── llm_service.py      # Google Gemini API Handler
│   ├── config.py           # Settings (API Keys, Paths)
│   └── ...
├── data/                   # Folder where you put your .txt files to be ingested
├── Prompt_Templates/       # System prompts for the LLM
└── legacy_code/            # (Optional) Old notebooks/BERT experiments

⚙️ Installation

1. Prerequisites

Python 3.9+
A Google Gemini API Key (Free tier available at aistudio.google.com)
A Pinecone API Key (Free tier available at pinecone.io)

2. Setup

Clone the repository and install dependencies:

# Install required packages
pip install -r requirements.txt
# Or install the core packages directly
pip install pinecone streamlit google-generativeai sentence-transformers python-dotenv
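To confirm the installation, a small check like the following (a hypothetical helper, not part of the repository) reports which of the packages above cannot be imported. Note that `python-dotenv` is imported as `dotenv`, `sentence-transformers` as `sentence_transformers`, and `google-generativeai` as `google.generativeai`.

```python
"""Report which required packages cannot be imported (hypothetical helper)."""
import importlib.util

# Import names for the pip packages installed above.
REQUIRED = ["pinecone", "streamlit", "google.generativeai", "sentence_transformers", "dotenv"]


def missing_modules(names: list[str]) -> list[str]:
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package (e.g. `google`) is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED)
    print("All dependencies found." if not gaps else f"Missing: {', '.join(gaps)}")
```
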

3. Environment Variables

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_key_here
PINECONE_API_KEY=your_pinecone_key_here
PINECONE_ENVIRONMENT=us-east-1
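One way the application might read these values at startup, sketched with `python-dotenv`. The helper names are hypothetical, and the project's `config.py` may load the variables differently.

```python
"""Load and validate the .env settings (hypothetical helper names)."""
import os
from collections.abc import Mapping

REQUIRED_VARS = ("GEMINI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")


def missing_env_vars(env: Mapping[str, str], required=REQUIRED_VARS) -> list[str]:
    # A variable counts as missing if it is absent or blank.
    return [name for name in required if not env.get(name, "").strip()]


def load_settings() -> dict[str, str]:
    try:
        from dotenv import load_dotenv  # provided by python-dotenv

        # Finds a .env file and copies its values into os.environ;
        # variables already set in the shell are not overwritten by default.
        load_dotenv()
    except ImportError:
        pass  # fall back to whatever is already in the environment
    missing = missing_env_vars(os.environ)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast with a named list of missing keys is easier to debug than an authentication error from Gemini or Pinecone later on.
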

🎯 Usage Workflow

Step 1: Add Knowledge (Ingestion)

The bot starts with an empty brain. You must feed it data.

Create a folder named data in the root directory (if it does not already exist).
Add .txt files containing security info (e.g., OWASP_Top_10.txt, Company_Policy.txt).
Run the ingestion script:

python ingest.py

Output: "✅ Ingestion Complete! Your bot now has knowledge."
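The ingestion step might look roughly like this. It is a sketch under assumptions, not the repository's `ingest.py`: the 500-character chunks with 50-character overlap, the `{file}-{n}` record IDs, the index name `owasp-index`, and the batch size of 100 are all hypothetical.

```python
"""Sketch of an ingestion pass: read .txt files, chunk, embed, upsert.

Hypothetical choices (not taken from ingest.py): chunk size and overlap,
the "{file}-{n}" record IDs, the index name, and the batch size.
"""
from pathlib import Path
from typing import Callable, Sequence

Encoder = Callable[[list[str]], Sequence[Sequence[float]]]


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size character windows; consecutive chunks share `overlap` characters.
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text.strip():
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_records(data_dir: Path, encode: Encoder) -> list[dict]:
    # One Pinecone-style record per chunk; metadata keeps the text and source file.
    records = []
    for path in sorted(data_dir.glob("*.txt")):
        chunks = chunk_text(path.read_text(encoding="utf-8"))
        if not chunks:
            continue
        for i, (chunk, vec) in enumerate(zip(chunks, encode(chunks))):
            records.append({
                "id": f"{path.stem}-{i}",
                "values": [float(x) for x in vec],
                "metadata": {"source": path.name, "text": chunk},
            })
    return records


def upload(records: list[dict], api_key: str, index_name: str = "owasp-index", batch: int = 100) -> None:
    # Requires the `pinecone` package and an existing index of matching dimension.
    from pinecone import Pinecone

    index = Pinecone(api_key=api_key).Index(index_name)
    for start in range(0, len(records), batch):
        index.upsert(vectors=records[start:start + batch])
```

In practice `encode` could be `SentenceTransformer("all-MiniLM-L6-v2").encode`, which returns 384-dimensional vectors, so the Pinecone index would need dimension 384. Storing the chunk text and file name in `metadata` lets retrieval return readable passages and cite their source.
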

Step 2: Run the Chatbot

Launch the web interface:

python -m streamlit run app.py

The app will open in your browser at http://localhost:8501.

🔧 Configuration (`config.py`)

You can tweak the bot's behavior in chatbot_modules/config.py:

CONFIG = {
    "USE_PINECONE": True,           # Enable RAG
    "USE_LOCAL_LLM": False,         # Keep False (We are using Gemini)
    "EMBEDDING_MODEL_PATH": "all-MiniLM-L6-v2",
    "MAX_CONTEXT_TOKENS": 2500,     # How much text to read from documents
    "TEMPERATURE": 0.7,             # Randomness: lower = more deterministic, higher = more varied (Gemini accepts 0.0 to 2.0)
}
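`MAX_CONTEXT_TOKENS` caps how much retrieved document text reaches the model. A minimal sketch of such a cap, assuming a rough 4-characters-per-token estimate rather than a real tokenizer; the project may count or truncate differently.

```python
"""Sketch: keep retrieved chunks within a MAX_CONTEXT_TOKENS budget.

Assumption (hypothetical): ~4 characters per token instead of a real tokenizer.
"""
MAX_CONTEXT_TOKENS = 2500  # mirrors CONFIG["MAX_CONTEXT_TOKENS"]


def approx_tokens(text: str) -> int:
    # Rough heuristic for English text; never reports zero for a chunk.
    return max(1, len(text) // 4)


def pack_context(chunks: list[str], max_tokens: int = MAX_CONTEXT_TOKENS) -> list[str]:
    # Keep chunks in relevance order until the next one would exceed the budget.
    kept, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Stopping at the first chunk that does not fit (rather than skipping it) preserves the relevance order the retriever produced.
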

🤝 Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request
