
Jira Knowledge Base Chatbot (EN/AR, RAG, Citations, Permissions)

A private, bilingual (English + Arabic) knowledge base chatbot that answers employee questions using your Jira tickets as the single source of truth. It pulls and cleans issues from Jira (or a CSV/XLSX export), indexes them for hybrid retrieval (keyword + semantic embeddings), and returns grounded answers with ticket citations—respecting user permissions. Built to be simple to deploy and easy to trust.

Highlights

Ask in natural language (EN/AR): “Resume CR”, “What are the fees?”, “ما هي الرسوم؟”

Grounded answers with sources: Each reply cites the exact Jira keys (e.g., SW-9491) and links to Jira.

Hybrid search: BM25 keyword + semantic embeddings + (optional) re-ranker.

Acronyms & typos: Handles CR → change request, pluralization, minor misspellings.

Attachments (OCR): Optional OCR of PDFs/images so content is discoverable.

Permissions-aware: Enforces Jira-based visibility (no leaks).

Bilingual UI + RTL: Arabic interface with right-side menu; English with left-side menu.

Two ingestion modes:

CSV/XLSX import (fastest to start)

Direct Jira Sync backend (secure token; webhooks for freshness)

Why not just use Jira search?

Chat answers vs. issue lists: The bot summarizes and cites; Jira lists require manual reading.

Smarter matching: Semantic understanding, synonyms, acronyms, typos, EN/AR.

Conversational refinement: “only resolved”, “filter to Vijandra”, “last 30 days”.

Broader coverage: Titles, descriptions, comments, and (optionally) attachment text.

Architecture

[Architecture diagram]

Ingestion: Pull issues, descriptions, comments, labels, custom fields, (optional) attachments→OCR.

Indexing: Normalize Jira wiki markup, chunk text, create embeddings, hybrid searchable corpus.

Answering: Retrieve top-k, fuse scores, enforce grounding & citations, apply permissions.

UI: Next.js/React (or Base44), Arabic RTL with right-anchored menu; English LTR with left menu.
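The "chunk text" step in the indexing stage can be sketched as follows. This is a minimal illustration, not the project's actual chunker: the function name `chunk_text`, the character-based window, and the default sizes are all assumptions. Overlapping windows are one common way to keep a sentence that straddles a boundary retrievable from at least one chunk.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")

    # Collapse runs of whitespace so chunk sizes are predictable.
    text = " ".join(text.split())

    chunks: list[str] = []
    step = max_chars - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

Each chunk would then be embedded and stored alongside its jira_key so that answers can cite the originating ticket.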

Data schema (CSV/XLSX import)

Required columns

jira_key (e.g., SW-9491)

project

summary

Optional

description, status, priority, assignee, labels

Cleaning rules

Remove Jira wiki image tags: ![...]!

Convert [text|url] → text (url)

Flatten newlines; trim whitespace.

Tip: Invalid keys (not matching ^[A-Z][A-Z0-9]+-\d+$) are skipped and logged.
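The schema and cleaning rules above could be applied at import time roughly like this. It is a sketch using only the Python standard library; the function names `clean_text` and `load_rows` are hypothetical, and the image pattern is the one quoted in Troubleshooting, which also strips any other text that sits between two exclamation marks.

```python
import csv
import io
import logging
import re

log = logging.getLogger("import")

KEY_RE = re.compile(r"^[A-Z][A-Z0-9]+-\d+$")       # key pattern from the tip above
IMAGE_RE = re.compile(r"![^!]*!")                   # Jira wiki image tags
LINK_RE = re.compile(r"\[([^\]|]+)\|([^\]]+)\]")    # [text|url]
REQUIRED = ("jira_key", "project", "summary")


def clean_text(value: str) -> str:
    """Apply the cleaning rules: drop image tags, rewrite links, flatten whitespace."""
    value = IMAGE_RE.sub("", value)
    value = LINK_RE.sub(r"\1 (\2)", value)
    return " ".join(value.split())


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text, enforce required columns, skip and log invalid keys."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows: list[dict[str, str]] = []
    for row_no, row in enumerate(reader, start=1):
        key = (row.get("jira_key") or "").strip()
        if not KEY_RE.match(key):
            log.warning("skipping data row %d: invalid Jira key %r", row_no, key)
            continue
        # Ignore overflow cells (key None) and treat missing cells as empty.
        rows.append({k: clean_text(v or "") for k, v in row.items() if k is not None})
    return rows
```

Because the csv module handles quoted fields, a multiline description is read as a single cell as long as the export quotes it properly.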

🔐 Direct Jira Sync backend (optional)

For secure, automated syncing from Jira (pagination, retries, cleaning) use the included FastAPI backend:

Endpoints

POST /sync/start { jql?: string } → starts sync job

GET /sync/status?job_id=... → job state, rows, csv_url

GET /export/csv → latest export

POST /webhook/jira → accept Jira webhooks (for deltas)
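A client could drive the sync endpoints above with a small polling loop. The sketch below is illustrative only: the response field names follow the list above ("state", "rows", "csv_url"), but the state values "done" and "failed" are assumptions, as are the function names. The status fetcher is passed in as a callable so the loop can be exercised without a running backend.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable


def http_get_status(base_url: str, job_id: str) -> dict[str, Any]:
    """Call GET /sync/status?job_id=... and decode the JSON body."""
    url = f"{base_url}/sync/status?" + urllib.parse.urlencode({"job_id": job_id})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_sync(
    get_status: Callable[[str], dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job finishes; "done"/"failed" are assumed state names."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("state")
        if state == "done":
            return status
        if state == "failed":
            raise RuntimeError(f"sync job {job_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync job {job_id} still {state!r} after {timeout}s")
        sleep(interval)
```

With a live backend, usage would look like `wait_for_sync(lambda j: http_get_status("http://localhost:8080", j), job_id)`.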

Env

JIRA_BASE = https://your-domain.atlassian.net
JIRA_EMAIL = [email protected]
JIRA_TOKEN = <api_token> # keep secret; never expose to frontend
EXPORT_CSV_PATH = ./data/export.csv
EXPORT_CSV_FIELDS = jira_key,project,summary,description,status,priority,assignee,labels

Security: protect the backend (VPN, IP allowlist, or API key), store tokens in a secret manager, and consider OAuth 3LO.
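One way the backend might read these variables at startup is shown below. This is a sketch, not the shipped configuration code: the `JiraSettings` class and `load_settings` function are hypothetical names, and the defaults simply mirror the example values above. Failing fast on missing variables and keeping the token out of `repr` output are small habits that help avoid leaking secrets into logs.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

DEFAULT_FIELDS = "jira_key,project,summary,description,status,priority,assignee,labels"


@dataclass(frozen=True)
class JiraSettings:
    base: str
    email: str
    token: str = field(repr=False)  # excluded from repr so it does not land in logs
    export_csv_path: str = "./data/export.csv"
    export_csv_fields: tuple[str, ...] = tuple(DEFAULT_FIELDS.split(","))


def load_settings(env: Mapping[str, str] = os.environ) -> JiraSettings:
    """Build settings from environment variables, failing fast if any are missing."""
    missing = [n for n in ("JIRA_BASE", "JIRA_EMAIL", "JIRA_TOKEN") if not env.get(n)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    fields = env.get("EXPORT_CSV_FIELDS", DEFAULT_FIELDS)
    return JiraSettings(
        base=env["JIRA_BASE"].rstrip("/"),
        email=env["JIRA_EMAIL"],
        token=env["JIRA_TOKEN"],
        export_csv_path=env.get("EXPORT_CSV_PATH", "./data/export.csv"),
        export_csv_fields=tuple(f.strip() for f in fields.split(",") if f.strip()),
    )
```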

🌍 Internationalization & RTL

Language switch (EN/AR) persists (cookie/localStorage).

Sets dir="rtl" on the page for Arabic: the side menu moves to the right and directional icons are flipped.

Logical CSS utilities (text-start/end, ps/pe, ltr:/rtl:) to avoid left/right bugs.

Numbers/dates localized via Intl (ar-EG/en-US).

Bot replies in the current UI language; citations (ticket keys) remain unchanged.

🧠 Retrieval & Answering

Hybrid search:

BM25 over summary, labels, description (field boosts, phrase queries).

Embeddings (OpenAI text-embedding-3-large or your choice).

Fused score: e.g., 0.6*bm25 + 0.4*cosine.
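The weighted fusion above can be sketched in a few lines. BM25 scores are unbounded while cosine similarities are not, so this example first rescales each score set to the 0..1 range; that min-max normalization step is an assumption added for illustration and is not stated in this README. The function names are likewise hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to 0..1 so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}  # all candidates tie
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(
    bm25: dict[str, float],
    cosine: dict[str, float],
    w_bm25: float = 0.6,
    w_cos: float = 0.4,
) -> list[tuple[str, float]]:
    """Combine per-ticket scores as w_bm25*bm25 + w_cos*cosine, best first."""
    b, c = min_max(bm25), min_max(cosine)
    # A ticket missing from one retriever contributes 0 for that component.
    fused = {k: w_bm25 * b.get(k, 0.0) + w_cos * c.get(k, 0.0) for k in set(b) | set(c)}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

A ticket found by both retrievers tends to outrank one found by only a single retriever, which is the usual motivation for hybrid search.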

Query rewrite:

Acronyms & synonyms (e.g., “resume” → resume|reactivate|restore|reopen).

Stemming/pluralization (fee↔fees) and Arabic equivalents (e.g., رسوم ↔ fees).
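A dictionary-based version of the query rewrite could look like the sketch below. The synonym table is a tiny illustrative sample built from the examples in this README, and `expand_query` is a hypothetical name; stemming is assumed to be handled by the search engine's analyzer and is not implemented here.

```python
# Illustrative sample only; a real table would be maintained per deployment.
SYNONYMS: dict[str, list[str]] = {
    "resume": ["reactivate", "restore", "reopen"],
    "cr": ["change request"],
    "fees": ["fee", "رسوم"],
    "fee": ["fees", "رسوم"],
    "رسوم": ["fees", "fee"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> list[list[str]]:
    """Return one group of alternatives per query token (original term first)."""
    groups: list[list[str]] = []
    for token in query.lower().split():
        variants = [token]
        for alt in synonyms.get(token, []):
            if alt not in variants:
                variants.append(alt)
        groups.append(variants)
    return groups
```

Each group can then be turned into an OR clause for the keyword index, for example resume|reactivate|restore|reopen, while the original query text is still sent to the embedding model unchanged.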

Guardrails:

“Answer from snippets only; always cite keys”.

Confidence gating → if low, show “Closest matches” list (not a fake answer).

Per-user filtering via Jira permissions mapping / RLS.
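The guardrails above can be combined into a single gate that runs before any answer is generated. The sketch below makes several simplifying assumptions: permissions are modeled as a set of project keys the user may see, the confidence threshold of 0.5 is arbitrary, and the `Hit` class, the `gate` function, and the mode names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    jira_key: str
    project: str
    score: float   # fused relevance score in 0..1
    snippet: str


def gate(hits: list[Hit], allowed_projects: set[str],
         min_score: float = 0.5, top_k: int = 3) -> dict:
    """Filter by permission first, then decide between answer and fallback."""
    # Tickets the user cannot see are dropped before anything else happens,
    # so they can never reach the prompt or the citation list.
    visible = sorted(
        (h for h in hits if h.project in allowed_projects),
        key=lambda h: h.score,
        reverse=True,
    )[:top_k]

    if not visible:
        return {"mode": "no_info", "citations": []}
    citations = [h.jira_key for h in visible]
    if visible[0].score < min_score:
        # Low confidence: show the closest matches instead of a made-up answer.
        return {"mode": "closest_matches", "citations": citations}
    return {"mode": "answer", "citations": citations,
            "snippets": [h.snippet for h in visible]}
```

Only the snippets returned in "answer" mode would be passed to the language model, together with the instruction to answer from snippets only and cite the keys.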

🚀 Quick Start

A) CSV/XLSX mode (fastest)

Prepare an export with the schema above.

Start the app (Base44/Next.js) and upload the file.

Ask: “Resume CR” → you should see top tickets with citations.

B) Direct Jira mode (secure & fresh)

Run the FastAPI backend (see /backend or provided zip).

From the UI, open Sync from Jira → enter JQL (e.g., ORDER BY updated DESC).

Poll progress; when ready, the UI imports the generated CSV automatically.

(Optional) Configure Jira webhook to /webhook/jira for near-real-time updates.

🧪 Pilot & Quality

Suggested pilot (2 weeks)

Users: Testing team.

Must-pass queries: “Resume CR”, “fees”, “UBO”, Arabic variants, typos.

Metrics (targets):

Top-3 hit rate ≥ 85%

“No info” rate ≤ 10%

P95 latency ≤ 2.0s

100% of answers include ≥ 1 citation

0 permission leaks

RAG eval set

Add verify/eval/qrels.jsonl with ~40 Q→expected keys and a script to measure Top-k accuracy.
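A measurement script for the eval set might look like the following. The JSONL record shape, {"query": ..., "expected": [...]}, is an assumption since the file format is not defined here, and `top_k_hit_rate` is a hypothetical name. The search function is passed in so the same script can score any retriever configuration.

```python
import json
from typing import Callable, Iterable


def top_k_hit_rate(
    qrels_lines: Iterable[str],
    search: Callable[[str], list[str]],
    k: int = 3,
) -> float:
    """Fraction of queries with at least one expected key in the top-k results."""
    total = 0
    hits = 0
    for line in qrels_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        item = json.loads(line)
        expected = set(item["expected"])
        retrieved = search(item["query"])[:k]
        total += 1
        if expected & set(retrieved):
            hits += 1
    if total == 0:
        raise ValueError("no queries found in qrels input")
    return hits / total
```

Run against a file with `with open("verify/eval/qrels.jsonl", encoding="utf-8") as f: rate = top_k_hit_rate(f, search)` and compare the result with the Top-3 target of 0.85.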

🛠️ Local Development

UI (example Next.js)

pnpm install
pnpm dev

Backend (FastAPI)

pip install -r backend/requirements.txt
export JIRA_BASE=...
export JIRA_EMAIL=...
export JIRA_TOKEN=...
uvicorn app.main:app --host 0.0.0.0 --port 8080

Config

BACKEND_URL = http://localhost:8080
EMBEDDING_MODEL = text-embedding-3-large
TOP_K = 8

🧰 Troubleshooting

“Invalid Jira key format in row X”

Row is missing/invalid jira_key or a multiline description broke parsing. Fix: clean with regex ![^!]*! (remove image tags), flatten newlines, ensure proper quoting.

“No information found” but ticket exists

Add synonyms/acronyms, enable phrase queries, increase field boosts for summary, check language detection.

Arabic layout wrong

Ensure dir="rtl" is set on the page and use logical CSS (ltr:/rtl: variants, text-start/end).

Permission leaks

Verify per-user/project filters / RLS policies are applied before retrieval.

🗺️ Roadmap

Optional direct-to-vector mode (skip CSV; write to pgvector).

Re-ranker (cross-encoder) for tighter Top-1.

Admin synonym editor (EN/AR).

Per-project dashboards (usage, accuracy, gaps).

OAuth 3LO for delegated Jira access.

🤝 Contributing

Fork & branch from main.

Use clear commit messages.

Add tests for retrieval and Arabic/RTL.

Open a PR with a brief description and screenshots (if UI).

📄 License

MIT (or your preferred license)

📷 Screenshots (placeholders)

Chat (EN) – LTR with left menu

Chat (AR) – RTL with right menu

Sync from Jira modal

“Why shown” panel with field matches & scores
