Veer-Parikh/WebSec-Advisor

🛡️ OWASP Security Assistant (RAG)

Retrieval-Augmented Generation Chatbot for Web Security

Features • Architecture • Installation • Usage

Python Streamlit Gemini Pinecone

A specialized cybersecurity chatbot that combines Google's Gemini 1.5 Flash model with a Retrieval-Augmented Generation (RAG) engine. Unlike a general-purpose model used on its own, this tool ingests your own security documents (PDFs, text files, OWASP guides) into a Pinecone vector database and draws on them to give source-cited answers about vulnerabilities, remediation, and company-specific policies.


🚀 Key Features

🧠 RAG Intelligence

  • Custom Knowledge Base: Ingest your own .txt files (policies, cheat sheets) so the bot knows your context.
  • Vector Search: Uses all-MiniLM-L6-v2 embeddings and Pinecone to retrieve passages by semantic similarity rather than exact keyword match.
  • Source Citing: The bot tells you exactly which document it used to answer the question.
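
The retrieval idea behind these features can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: the bag-of-words `embed` function is a toy stand-in for all-MiniLM-L6-v2 vectors, the in-memory list stands in for a Pinecone index, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for all-MiniLM-L6-v2 vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, chunks, k=5):
    """Return up to k chunks with non-zero similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]

# Each chunk keeps its source file name so the answer can cite it.
chunks = [
    {"source": "OWASP_Top_10.txt", "text": "sql injection happens when user input reaches a query"},
    {"source": "Company_Policy.txt", "text": "passwords must be rotated every ninety days"},
]
hits = top_k("how does sql injection work", chunks)
```

Storing the source file name alongside each chunk is what makes source citing possible: whatever is retrieved already carries the name of the document it came from.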

🤖 Powerful LLM Backend

  • Gemini 1.5 Flash: High-speed, large context window (8k+ tokens) for detailed explanations.
  • Safety Filter Bypass: The Gemini safety setting for the "Dangerous Content" category is relaxed so the bot can discuss attacks such as SQLi and XSS, which is essential for explaining how they work and how to fix them.
  • Context Aware: Remembers previous turns in the conversation.
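
As a rough sketch of what the relaxed safety configuration might look like, the snippet below builds a safety-settings list that touches only the dangerous-content category. The helper name `build_safety_settings` is hypothetical, and the assumption that the project passes settings in this list-of-dicts form is not confirmed by this README.

```python
def build_safety_settings(dangerous_threshold="BLOCK_NONE"):
    """Return a settings list that relaxes only the dangerous-content category.

    Other harm categories are left out, so they keep the API defaults.
    """
    return [
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": dangerous_threshold,
        },
    ]

settings = build_safety_settings()

# Assumption: with the google-generativeai package installed, the list would
# typically be handed to the model like this:
#   genai.GenerativeModel("gemini-1.5-flash", safety_settings=settings)
```

Relaxing a single category rather than all of them keeps the remaining filters active, which is usually the more conservative choice for a tool that only needs to talk about attack techniques.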

💻 Interactive UI

  • Streamlit Interface: Clean, web-based chat interface.
  • Real-time formatting: Renders code blocks, markdown, and security alerts beautifully.
  • Session Management: Clear history and restart sessions easily.

🛡️ Security Specialized

  • OWASP Knowledge: Pre-prompted with specific instructions for OWASP Top 10 categories.
  • Dynamic Guidance: Automatically adjusts advice based on the vulnerability type (e.g., specific Python code for SQLi).
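
One simple way such dynamic guidance could work is keyword matching on the user's question. The sketch below is purely illustrative: the `GUIDANCE` table, its wording, and `pick_guidance` are hypothetical and are not taken from the project's Prompt_Templates.

```python
# Hypothetical mapping from vulnerability keywords to extra prompt instructions.
GUIDANCE = {
    "sql injection": "Show parameterized queries in Python.",
    "sqli": "Show parameterized queries in Python.",
    "xss": "Show output encoding and a Content-Security-Policy header.",
}
DEFAULT_GUIDANCE = "Give general OWASP Top 10 remediation advice."

def pick_guidance(question):
    """Return the guidance for the first keyword found in the question."""
    lowered = question.lower()
    for keyword, guidance in GUIDANCE.items():
        if keyword in lowered:
            return guidance
    return DEFAULT_GUIDANCE

advice = pick_guidance("How do I prevent SQL injection in my login form?")
```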

🏗️ Architecture

The system moves away from purely local models to a hybrid cloud approach for performance and scalability:

  1. Ingestion: Python script (ingest.py) reads text files -> Converts to Vectors (SentenceTransformers) -> Uploads to Pinecone.
  2. Retrieval: User asks a question -> System finds top 5 relevant chunks from Pinecone.
  3. Generation: Relevant chunks + User Question are sent to Gemini 1.5 Flash to generate a unified, accurate answer.
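
The generation step can be pictured as plain string assembly: retrieved chunks are numbered, tagged with their source, and placed ahead of the question. This is a minimal sketch; the prompt wording and the `build_prompt` helper are assumptions, not the project's actual template.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context_lines = []
    for i, chunk in enumerate(chunks, start=1):
        # Number each chunk and name its source so the model can cite it.
        context_lines.append(f"[{i}] (source: {chunk['source']}) {chunk['text']}")
    context = "\n".join(context_lines)
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is SQL injection?",
    [{
        "source": "OWASP_Top_10.txt",
        "text": "Injection flaws occur when untrusted data is sent to an interpreter.",
    }],
)
```

The resulting string is what would be sent to the LLM in step 3.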

📁 Project Structure

OWASP_BERT/
├── app.py                  # Main Streamlit Web Application
├── ingest.py               # Script to upload your documents to Pinecone
├── chatbot_modules/        # Core Logic
│   ├── chatbot.py          # Orchestrator
│   ├── components.py       # Initializes Pinecone/Models
│   ├── retrieval.py        # RAG Logic (Search & Rerank)
│   ├── llm_service.py      # Google Gemini API Handler
│   ├── config.py           # Settings (API Keys, Paths)
│   └── ...
├── data/                   # Folder where you put your .txt files to be ingested
├── Prompt_Templates/       # System prompts for the LLM
└── legacy_code/            # (Optional) Old notebooks/BERT experiments

⚙️ Installation

1. Prerequisites

2. Setup

Clone the repository and install dependencies:

# Install required packages
pip install -r requirements.txt
# Ensure Pinecone and Streamlit are installed
pip install pinecone streamlit google-generativeai sentence-transformers python-dotenv

3. Environment Variables

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_key_here
PINECONE_API_KEY=your_pinecone_key_here
PINECONE_ENVIRONMENT=us-east-1
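
A startup check along these lines can catch a missing key before the app tries to call Gemini or Pinecone. This is a sketch, not the contents of config.py: `load_settings` and `REQUIRED_KEYS` are hypothetical names, and it assumes python-dotenv's `load_dotenv()` has already copied the .env values into the process environment.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT")

def load_settings(env=None):
    """Read the required keys from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}

# Passing a plain dict here keeps the example runnable without a .env file.
settings = load_settings({
    "GEMINI_API_KEY": "your_gemini_key_here",
    "PINECONE_API_KEY": "your_pinecone_key_here",
    "PINECONE_ENVIRONMENT": "us-east-1",
})
```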

🎯 Usage Workflow

Step 1: Add Knowledge (Ingestion)

The bot starts with an empty brain. You must feed it data.

  1. Create a folder named data in the root directory.
  2. Add .txt files containing security info (e.g., OWASP_Top_10.txt, Company_Policy.txt).
  3. Run the ingestion script:
python ingest.py

Output: "✅ Ingestion Complete! Your bot now has knowledge."
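
For a sense of what the first half of ingestion involves, the sketch below reads every .txt file in a folder and splits it into overlapping chunks. It is an assumption-laden illustration, not the real ingest.py: the chunk size, the overlap, the record layout, and the function names are all hypothetical, and the embedding and Pinecone upload steps are left out.

```python
from pathlib import Path

def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of `size` characters, overlapping by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def load_chunks(data_dir="data"):
    """Read every .txt file in data_dir and return chunk records with their source."""
    records = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for n, chunk in enumerate(chunk_text(text)):
            records.append({"id": f"{path.name}-{n}", "source": path.name, "text": chunk})
    return records
```

Overlapping chunks are a common choice because a sentence that straddles a chunk boundary still appears whole in at least one chunk.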

Step 2: Run the Chatbot

Launch the web interface:

python -m streamlit run app.py

The app will open in your browser at http://localhost:8501.


🔧 Configuration (config.py)

You can tweak the bot's behavior in chatbot_modules/config.py:

CONFIG = {
    "USE_PINECONE": True,           # Enable RAG
    "USE_LOCAL_LLM": False,         # Keep False (We are using Gemini)
    "EMBEDDING_MODEL_PATH": "all-MiniLM-L6-v2",
    "MAX_CONTEXT_TOKENS": 2500,     # How much text to read from documents
    "TEMPERATURE": 0.7,             # Creativity (0.0 = Strict, 1.0 = Creative)
}
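
To illustrate how a setting like MAX_CONTEXT_TOKENS might be applied, the sketch below keeps retrieved chunks in ranked order until an approximate token budget is spent. The `fit_to_budget` helper is hypothetical, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def fit_to_budget(chunks, max_tokens=2500):
    """Keep chunks in ranked order until the approximate token budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        # Crude estimate: one token per whitespace-separated word.
        cost = len(chunk.split())
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept

# With a budget of 5 "tokens", the third chunk no longer fits.
selected = fit_to_budget(["a b c", "d e", "f g h i"], max_tokens=5)
```

A lower budget means shorter prompts and faster responses, at the cost of giving the model less supporting text to draw on.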

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Open a Pull Request

About

A security-focused RAG chatbot that uses Gemini 2.5 Flash and Pinecone to provide accurate, context-grounded answers on OWASP Top 10 vulnerabilities using custom security documents.
