This project demonstrates zero-shot and few-shot action recognition on the UCF101 dataset using OpenAI's CLIP model. It includes methods for evaluation, selective fine-tuning, and visualizing model embeddings to compare different approaches.
- Zero-Shot Classification: Evaluates a pre-trained CLIP model on action recognition without any prior training on the dataset.
- Few-Shot Fine-Tuning: Implements a strategy to fine-tune the model on a small subset of underperforming classes.
- Temporal Analysis: Compares the model's performance using single-frame input versus short-clip (temporal) input.
- Visualization: Includes methods to visualize the model's embedding space with t-SNE and similarity scores with heatmaps.
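The zero-shot pipeline above boils down to: embed sampled frames, mean-pool them into one clip feature, and pick the class whose prompt embedding has the highest cosine similarity. The following minimal sketch shows that scoring logic with random stand-in vectors instead of real CLIP features (the function name, prompt wording, and toy data are illustrative, not from the notebook):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length, as CLIP does before comparing features."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_predict(frame_embeddings, text_embeddings):
    """Score one clip against per-class prompt embeddings.

    frame_embeddings: (num_frames, dim) image features for sampled frames.
    text_embeddings:  (num_classes, dim) features for prompts such as
                      "a video of a person doing <action>".
    Returns the index of the best-matching class.
    """
    # Temporal pooling: average the frame features into a single clip feature.
    clip_feature = l2_normalize(frame_embeddings.mean(axis=0))
    text_features = l2_normalize(text_embeddings)
    # Cosine similarity reduces to a dot product of unit vectors.
    similarities = text_features @ clip_feature
    return int(np.argmax(similarities))

# Toy demo with random stand-in embeddings (dim=512 matches ViT-B/32).
rng = np.random.default_rng(0)
texts = rng.normal(size=(101, 512))                   # one prompt per UCF101 class
frames = texts[42] + 0.1 * rng.normal(size=(8, 512))  # 8 frames near class 42
print(zero_shot_predict(frames, texts))               # -> 42
```

With real CLIP features the only change is where the embeddings come from (`model.encode_image` per frame and `model.encode_text` per prompt); the pooling and argmax are the same.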
This project uses the UCF101 – Action Recognition Data Set. You can download it from the official website.
First, create a virtual environment:

```bash
python -m venv venv
```

Activate the environment.

On Windows:

```bash
.\venv\Scripts\activate
```

On macOS / Linux:

```bash
source venv/bin/activate
```

It's recommended to install PyTorch first to ensure GPU compatibility.
Install GPU-enabled PyTorch (Windows/Linux, CUDA 12.1+):

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Note: the CUDA version (cu121) may need to be adjusted to match your system's NVIDIA driver.
Install the remaining packages:

```bash
pip install pandas matplotlib seaborn scikit-learn opencv-python tqdm
pip install git+https://github.com/openai/CLIP.git
```

Or simply run:

```bash
pip install -r requirements.txt
```

- Download the Dataset: Download the UCF101 dataset and the train/test split files, and place them in the project directory.
- Configure Paths: Open the Jupyter Notebook (`.ipynb`) and update the `UCF101_VID_PATH` and `UCF101_SPLIT_PATH` variables in the configuration cell to point to your dataset locations.
- Execute Notebook: Run the notebook cells sequentially to perform the analysis, from data loading and zero-shot evaluation to fine-tuning and visualization.
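A configuration cell along these lines is what the path variables refer to. The directory names below are placeholders, and the early existence check is an optional convenience, not part of the notebook:

```python
from pathlib import Path

# Placeholder paths: point these at your local copies of the dataset.
UCF101_VID_PATH = Path("data/UCF-101")             # class subfolders of .avi videos
UCF101_SPLIT_PATH = Path("data/ucfTrainTestlist")  # official train/test split files

# Fail-fast check so a wrong path surfaces before any video decoding starts.
for p in (UCF101_VID_PATH, UCF101_SPLIT_PATH):
    if not p.exists():
        print(f"Warning: {p} not found; update the path in the configuration cell.")
```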