A Retrieval-Augmented Generation (RAG) system for providing intelligent repair advice and assessments based on a large database of repair cases. The system uses OpenAI's GPT models for generation and Pinecone for efficient vector similarity search.
The system combines:
- OpenAI GPT Models for generating repair advice
- Pinecone Vector Database for efficient similarity search
- OpenRepair Dataset as the knowledge source
- Modern Web Interface for easy interaction
1. **Clone & Install**

   ```bash
   git clone <repository-url>
   cd repair_rag
   pip install -r requirements.txt
   ```

2. **Configure Environment**

   Create a `.env` file in the `repair_rag` directory:

   ```env
   # OpenAI Configuration
   OPENAI_API_KEY=your_openai_api_key_here
   OPENAI_MODEL=gpt-4o-mini-2024-07-18
   EMBEDDING_MODEL=text-embedding-3-small

   # Pinecone Configuration
   PINECONE_API_KEY=your_pinecone_api_key_here
   PINECONE_ENVIRONMENT=gcp-starter
   PINECONE_INDEX=repair-kb
   PINECONE_NAMESPACE=repair-cases
   ```

3. **Start the API Server**

   ```bash
   cd repair_rag
   python run.py api
   ```

4. **Access the Web Interface**

   - Open `index.html` in your browser
   - Start asking repair-related questions!
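Besides the web interface, you can query the server from a script. The sketch below assumes a `/query` route accepting a `{"question": ...}` JSON body; these are illustrative assumptions, so check the FastAPI routes in `app/main.py` for the actual endpoint names:

```python
# Hypothetical client for the local API server; route and payload shape
# are assumptions -- see app/main.py for the real FastAPI endpoints.
import json
import urllib.request

def build_query_payload(question):
    """Assumed request body shape for the question endpoint."""
    return {"question": question}

def ask_repair_question(question, base_url="http://localhost:8000"):
    """POST the question to the (assumed) /query route and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/query",  # hypothetical route name
        data=json.dumps(build_query_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```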
```
repair_rag/
├── app/                         # Core application code
│   ├── main.py                  # FastAPI server
│   ├── rag_engine.py            # RAG implementation
│   └── __init__.py              # Package marker
├── data/                        # Data storage
├── index.html                   # Main web interface
├── results.html                 # Results display
├── styles.css                   # UI styling
├── run.py                       # Application launcher
├── ingest_data_to_pinecone.py   # Data ingestion
├── parts_url_generator.py       # Parts URL utilities
└── requirements.txt             # Python dependencies
```
- Intelligent Repair Analysis: Combines vector similarity search with GPT-powered analysis
- Comprehensive Advice:
- Repair feasibility assessment
- Cost-benefit analysis
- Required tools and parts
- Step-by-step guidance
- Multi-language Support: Automatically detects and responds in the user's language
- Similar Case Analysis: Shows relevant past repair cases
- Parts & Tools Links: Direct links to purchase necessary items
To add new repair cases to the existing database:
1. **Prepare Your Data**

   Create a JSON file with the following structure (add more case objects to the array as needed):

   ```json
   [
     {
       "brand": "Device brand",
       "product_category": "Category of the device",
       "problem": "Description of the problem",
       "repair_status": "Fixed/Not Fixed/etc.",
       "repair_barrier": "Any barriers to repair",
       "year_of_manufacture": "Year",
       "country": "Country",
       "tools_required": ["Tool1", "Tool2"],
       "parts_needed": ["Part1", "Part2"],
       "estimated_cost": "Cost estimate",
       "repair_description": "Detailed repair description"
     }
   ]
   ```

2. **Ingest to Pinecone**

   ```bash
   python ingest_data_to_pinecone.py --data-path your_data.json
   ```

   Options:

   - `--batch-size`: Number of vectors per batch (default: 50)
   - `--embedding-model`: OpenAI embedding model to use
   - `--index-name`: Pinecone index name
   - `--namespace`: Namespace within the index
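The ingestion flow can be sketched roughly as below. This is an illustration of the batching-and-upsert pattern, not the contents of `ingest_data_to_pinecone.py`; the helper names are invented, and it assumes the `openai` and `pinecone` Python packages with API keys in the environment:

```python
# Rough sketch of batched ingestion (illustrative, not the actual script).
import json

def batch(items, size=50):
    """Yield successive batches of `size` items (mirrors --batch-size)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def case_to_text(case):
    """Flatten one repair case into a single string for embedding."""
    return " | ".join(
        str(case.get(k, ""))
        for k in ("brand", "product_category", "problem", "repair_description")
    )

def ingest(data_path, index_name="repair-kb", namespace="repair-cases"):
    # Deferred imports: openai / pinecone packages plus API keys are
    # only needed when this actually runs.
    from openai import OpenAI
    from pinecone import Pinecone

    cases = json.load(open(data_path))
    oai = OpenAI()                        # reads OPENAI_API_KEY from the env
    index = Pinecone().Index(index_name)  # reads PINECONE_API_KEY from the env
    for n, chunk in enumerate(batch(cases, 50)):
        resp = oai.embeddings.create(
            model="text-embedding-3-small",
            input=[case_to_text(c) for c in chunk],
        )
        vectors = [
            {"id": f"case-{n * 50 + i}", "values": e.embedding, "metadata": c}
            for i, (e, c) in enumerate(zip(resp.data, chunk))
        ]
        index.upsert(vectors=vectors, namespace=namespace)
```

Batching keeps each embeddings request and each Pinecone upsert within practical payload limits, which is why the script exposes `--batch-size`.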
1. **API Key Errors**

   - Ensure both OpenAI and Pinecone API keys are in `.env`
   - Check API key permissions and quotas

2. **Connection Issues**

   - Verify the API server is running (`python run.py api`)
   - Check that port 8000 isn't already in use

3. **No Results**

   - Ensure the Pinecone index contains data
   - Check query formatting and language
- OpenAI has rate limits for both embeddings and completions
- Pinecone has query per second (QPS) limits
- Use batch processing for large data ingestion
- Consider implementing caching for frequent queries
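One minimal way to cache frequent queries is to memoize the query embedding, so repeated questions don't re-hit the embeddings endpoint. This is a sketch assuming the `openai` package, not code from `rag_engine.py`:

```python
# Minimal embedding cache sketch (assumes the `openai` package and
# OPENAI_API_KEY in the environment; adapt to rag_engine.py's client setup).
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_embedding(text, model="text-embedding-3-small"):
    """Embed `text`, caching results so repeated queries cost one API call."""
    from openai import OpenAI  # deferred so the module imports without the SDK
    client = OpenAI()
    # Return a tuple (hashable / immutable), matching lru_cache requirements.
    return tuple(client.embeddings.create(model=model, input=[text]).data[0].embedding)
```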
- Uses Pinecone for vector similarity search
- Dimension: 1536 (OpenAI text-embedding-3-small)
- Metric: Cosine similarity
- Namespace: Organizes different data sources
- Embeddings: text-embedding-3-small
- Generation: gpt-4o-mini-2024-07-18
- Temperature: 0.1 (for factual responses)
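Putting the pieces together, one retrieval-augmented answer looks roughly like the sketch below: embed the question, fetch the nearest repair cases from Pinecone, and generate with the configured model at low temperature. Function names are illustrative, not the actual `rag_engine.py` API:

```python
# Illustrative end-to-end RAG query (not the actual rag_engine.py code).

def build_prompt(question, cases):
    """Combine the user question with retrieved repair cases."""
    context = "\n\n".join(f"Case: {c}" for c in cases)
    return (
        "You are a repair advisor. Use the past repair cases below to "
        f"assess the user's problem.\n\n{context}\n\nQuestion: {question}"
    )

def answer(question, top_k=5):
    # Deferred imports: only needed when this actually runs with API keys set.
    from openai import OpenAI
    from pinecone import Pinecone

    oai = OpenAI()
    index = Pinecone().Index("repair-kb")
    # 1. Embed the question with the configured embedding model.
    emb = oai.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    # 2. Retrieve the top-k most similar past cases.
    hits = index.query(
        vector=emb, top_k=top_k,
        namespace="repair-cases", include_metadata=True,
    )
    cases = [m["metadata"] for m in hits["matches"]]
    # 3. Generate advice grounded in the retrieved cases.
    resp = oai.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        temperature=0.1,  # low temperature for factual responses
        messages=[{"role": "user", "content": build_prompt(question, cases)}],
    )
    return resp.choices[0].message.content
```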
- Pure HTML/CSS/JavaScript
- No framework dependencies
- Responsive design
- Cross-browser compatible
- **OpenAI Costs:**
  - Embeddings: $0.02 per million tokens (text-embedding-3-small)
  - Completions: varies by model and usage
- **Pinecone Costs:**
  - Free tier available
  - Pricing based on vector count and QPS
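At the embedding rate quoted above, ingestion cost is easy to estimate (check current OpenAI pricing before relying on the rate):

```python
# Back-of-envelope embedding cost at $0.02 per 1M tokens
# (the text-embedding-3-small rate quoted above).
def embedding_cost_usd(num_tokens, rate_per_million=0.02):
    return num_tokens / 1_000_000 * rate_per_million

print(embedding_cost_usd(500_000))  # 0.01 -- half a million tokens is about one cent
```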
- Built on OpenRepair Dataset
- Uses OpenAI's API
- Pinecone Vector Database
- [Add your license information]