Action Recognition with Vision-Language Models

This project demonstrates zero-shot and few-shot action recognition on the UCF101 dataset using OpenAI's CLIP model. It includes methods for evaluation, selective fine-tuning, and visualizing model embeddings to compare different approaches.

Features

  • Zero-Shot Classification: Evaluates a pre-trained CLIP model on action recognition without any task-specific training on the dataset (a minimal sketch follows this list).
  • Few-Shot Fine-Tuning: Implements a strategy to fine-tune the model on a small subset of underperforming classes.
  • Temporal Analysis: Compares the model's performance using single-frame input versus short-clip (temporal) input.
  • Visualization: Includes methods to visualize the model's embedding space with t-SNE and similarity scores with heatmaps.
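
As an illustration of the zero-shot approach, the sketch below is not the notebook's exact code: the video path, prompt template, and class names are placeholders. It samples a few frames from a video, averages their CLIP image embeddings, and picks the class whose text prompt is most similar. Averaging frame embeddings is also the basis of the short-clip (temporal) comparison; a single-frame input corresponds to num_frames=1.

import cv2
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical class names; the notebook derives the full list from the UCF101 split files.
class_names = ["ApplyEyeMakeup", "Basketball", "PlayingGuitar"]
prompts = [f"a video of a person doing {name}" for name in class_names]
text_tokens = clip.tokenize(prompts).to(device)

def sample_frames(video_path, num_frames=8):
    # Return `num_frames` evenly spaced RGB frames from the video.
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

with torch.no_grad():
    frames = sample_frames("path/to/video.avi")                # placeholder path
    images = torch.stack([preprocess(f) for f in frames]).to(device)
    image_features = model.encode_image(images)                # one embedding per frame
    video_feature = image_features.mean(dim=0, keepdim=True)   # temporal average over frames
    video_feature = video_feature / video_feature.norm(dim=-1, keepdim=True)

    text_features = model.encode_text(text_tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Cosine similarity between the clip embedding and each class prompt.
    probs = (100.0 * video_feature @ text_features.T).softmax(dim=-1)
    print("predicted:", class_names[probs.argmax().item()])
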

Dataset

This project uses the UCF101 – Action Recognition Data Set. You can download it from the official website.


Setup and Installation

1. Create and Activate Virtual Environment

First, create a virtual environment.

python -m venv venv

Activate the environment.

On Windows:

.\venv\Scripts\activate

On macOS / Linux:

source venv/bin/activate

2. Install Dependencies

It's recommended to install PyTorch first to ensure GPU compatibility.

Install GPU-enabled PyTorch (Windows/Linux, CUDA 12.1):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Note: The CUDA version (cu121) may need to be adjusted based on your system's NVIDIA driver.
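
After installing, a quick check with plain PyTorch (nothing project-specific) confirms whether the GPU build was picked up:

import torch

# Prints True if the installed build can see a CUDA device, then the device name.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
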

Install remaining packages:

pip install pandas matplotlib seaborn scikit-learn opencv-python tqdm
pip install git+https://github.com/openai/CLIP.git

Alternatively, install everything at once from the requirements file:

pip install -r requirements.txt

How to Run

  1. Download the Dataset: Get the UCF101 videos together with the train/test split files and place them in the project directory.
  2. Configure Paths: Open the Jupyter Notebook (.ipynb) and update the UCF101_VID_PATH and UCF101_SPLIT_PATH variables in the configuration cell to point to your dataset locations (an illustrative configuration cell follows this list).
  3. Execute Notebook: Run the notebook cells sequentially to perform the analysis, from data loading and zero-shot evaluation to fine-tuning and visualization (a t-SNE sketch of the embedding space appears below).
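
For orientation, the configuration cell might look roughly like this; the values are placeholders and should point at wherever you extracted the videos and the official split files:

# Illustrative values only; adjust to your own directory layout.
UCF101_VID_PATH = "data/UCF-101"              # class subfolders containing the .avi videos
UCF101_SPLIT_PATH = "data/ucfTrainTestlist"   # train/test split files (e.g. trainlist01.txt)
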

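As a rough idea of the embedding-visualization step, the sketch below uses placeholder data rather than the notebook's code; it projects CLIP features to 2-D with t-SNE and colors each point by class:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder arrays; in the notebook these would be the CLIP video embeddings
# collected during evaluation and their ground-truth class indices.
features = np.random.randn(200, 512).astype(np.float32)
labels = np.random.randint(0, 5, size=200)

points = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)

plt.figure(figsize=(8, 6))
scatter = plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=10)
plt.colorbar(scatter, label="class index")
plt.title("t-SNE of CLIP video embeddings")
plt.show()
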