ChronoChat

ChronoChat is a video-RAG platform built on top of Ollama that enables users to chat with video content using ordinary text LLMs rather than vision/video-language models (VLMs). It supports both YouTube and local uploads and uses retrieval-augmented generation (RAG) to answer questions from video transcripts, frames, and captions. Powered by local LLMs, ChronoChat streams responses in real time and also supports image and PDF uploads.

[Demo video: demo.mp4]

Note

ChronoChat is ideal for:

  • ✅ Interviews, tutorials, and educational content
  • ❌ Not suited for animations or silent videos

⚠️ Requires a GPU for optimal performance

๐Ÿ Getting Started

1. 📦 Set Up Python Environment

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

2. 🔨 Install Dependencies for ChronoChat

python cli.py install

3. ⚙️ Install PyTorch with CUDA (Recommended)

For GPU acceleration, install the CUDA-enabled version of PyTorch:
Visit https://pytorch.org/get-started/locally/ to get the correct command for your system.
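
For example, the selector typically produces a command of this form for a CUDA 12.1 setup (use whatever command it generates for your system):

pip install torch --index-url https://download.pytorch.org/whl/cu121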

💡 If you don't have an NVIDIA GPU or don't want CUDA, skip this step

4. 🎞️ Install FFmpeg

ChronoChat requires ffmpeg for processing video and audio.
Download from: https://ffmpeg.org/download.html
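
FFmpeg is also available from most package managers; these are the standard commands, not ChronoChat-specific ones:

# Debian/Ubuntu
sudo apt install ffmpeg

# macOS (Homebrew)
brew install ffmpeg

Verify the installation with ffmpeg -version.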

5. 🤖 Install Ollama

If you haven't already, install Ollama from https://ollama.com.
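
On Linux, Ollama's official one-line installer is:

curl -fsSL https://ollama.com/install.sh | sh

macOS and Windows installers are available from https://ollama.com/download. You will also need at least one model pulled locally; for example (the model choice here is just an illustration):

ollama pull llama3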

6. 🖥️ Start the Ollama Server

ollama serve
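
To confirm the server is up, query Ollama's default port (11434); it should reply "Ollama is running":

curl http://localhost:11434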

7. 🚀 Launch ChronoChat

python cli.py start

Then open your browser at: http://localhost:3000

✨ Key Features

  • ๐Ÿ” Video RAG: Uses CLIP, Whisper, and BLIP embeddings for frame, audio, and caption-based retrieval.
  • ๐Ÿง  LLM Planning: Models generate reasoning chains, plan actions, and adapt to single or multi-video chats.
  • ๐Ÿ”Œ Streaming Responses: Live WebSocket chat with markdown rendering and response progress updates.
  • ๐ŸŽฅ Multi-Video Support: Search and reason across multiple videos in a single conversation.
  • ๐Ÿ“Ž Attach Files: Supports uploading PDFs and images.
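
As a rough illustration of the streaming chat, here is a minimal Python client sketch. The endpoint URL and message format are assumptions for illustration only; the real contract is defined by the FastAPI chat router in the backend.

import asyncio
import websockets  # pip install websockets

async def chat(query: str) -> None:
    # Hypothetical endpoint; check the backend chat router for the actual route.
    async with websockets.connect("ws://localhost:8000/ws/chat") as ws:
        await ws.send(query)
        # Print streamed chunks as they arrive until the server closes the socket.
        async for chunk in ws:
            print(chunk, end="", flush=True)

asyncio.run(chat("Summarize the first five minutes of the video."))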

🧱 Architecture

---
config:
  look: handDrawn
  theme: neutral
---

graph TD
  subgraph "Frontend (Next.js)"
    Sidebar["📂 Chats & Videos"]
    UploadUI["📦 Upload videos"]
    ChatUI["💬 Chat interface"]
    APIClient["🔗 REST client"]
    WSClient["🔄 WebSocket client"]
  end

  subgraph "Backend (FastAPI & Async Worker)"
    ChatRouter["🗨️ Chat router"]
    MediaRouter["🎬 Media router"]
    VideoRAG["🧠 VideoRAG engine"]
    ContextExtractor["🔍 Context extractor"]
    Retriever["📦 ChromaDB retriever"]
    LLMClient["🤖 LLM client"]
    Worker["⚙️ Ingestion worker"]
    MediaDB["🗄️ ChromaDB"]
    MediaStorage["📁 Video and metadata storage"]
    VideoQueue["📮 Processing queue"]
  end

  Sidebar --> ChatUI
  UploadUI --> APIClient
  ChatUI -- "File upload" --> APIClient
  ChatUI <-- "Text query" --> WSClient

  APIClient <--> MediaRouter
  WSClient <--> ChatRouter
  ChatRouter --> VideoRAG
  VideoRAG <-- "Video query" --> ContextExtractor
  VideoRAG <-- "Other query" --> LLMClient
  ContextExtractor <--> Retriever
  ContextExtractor <--> LLMClient
  Retriever <--> MediaDB

  MediaRouter --> MediaStorage
  MediaRouter --> VideoQueue
  VideoQueue --> Worker
  Worker --> MediaDB

โš™๏ธ Tech Stack

Layer      | Tools
-----------|------------------------------------------------
Frontend   | Next.js, TailwindCSS, Shadcn, TypeScript
Backend    | FastAPI, AsyncIO, SQLite, ChromaDB
Embeddings | CLIP (frames), Whisper (audio), BLIP (captions)
LLM        | Ollama
Storage    | Local files, ChromaDB vectors, SQLite

🧠 How It Works

  1. Ingest Video: Extracts audio, frames, and captions from YouTube/local videos.
  2. Embed Content: Computes multimodal embeddings and stores them in ChromaDB (see the sketch after this list).
  3. Chat Interaction: LLM receives the user query and selects a retrieval mode.
  4. RAG Flow: Relevant chunks are retrieved based on video context.
  5. Response Streaming: Final output is streamed to the user in real time.
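
A minimal sketch of steps 2 and 4 using the ChromaDB Python client. The collection name, metadata fields, and placeholder vectors are illustrative assumptions, not ChronoChat's actual identifiers:

import chromadb

client = chromadb.PersistentClient(path="./media_db")         # hypothetical path
collection = client.get_or_create_collection("video_chunks")  # hypothetical name

# Step 2: store a precomputed multimodal embedding (e.g. a CLIP frame vector).
frame_embedding = [0.1] * 512  # stand-in for a real CLIP vector
collection.add(
    ids=["video1_frame_0001"],
    embeddings=[frame_embedding],
    metadatas=[{"video": "video1", "timestamp": 12.5, "kind": "frame"}],
    documents=["caption: a person presenting at a whiteboard"],
)

# Step 4: retrieve the chunks closest to the embedded user query.
query_embedding = [0.1] * 512  # in practice, the query embedded with the same model
results = collection.query(query_embeddings=[query_embedding], n_results=5)
print(results["documents"])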
