
Bouldering Video (Action) Segmentation

This repository contains the code related to my 120-hour research project on Bouldering Video (Action) Segmentation.

Below you will find instructions on how to train the model and run inference using the repository's code.

For a detailed report on the model and the approach that was used, please refer to the PDF report of the research project.

[Figure: model overview]

Pre-Trained Models

A detailed list of pre-trained models and their weights can be found in models-weights/mlp.README.md or models-weights/lstm.README.md. A condensed list is also given just below.

| Backbone | Backbone Type | MLP Accuracy | LSTM Accuracy | #Parameters | Weights |
| --- | --- | --- | --- | --- | --- |
| yolo | FRAME_BY_FRAME | 65.01% ± 4.41 | 69.94% ± 2.40 | 2.9M | MLP / LSTM |
| dino | FRAME_BY_FRAME | 80.58% ± 4.56 | 83.20% ± 3.26 | 22.1M | MLP / LSTM |
| r3d | TEMPORAL | 84.27% ± 5.03 | 85.68% ± 3.64 | 31.6M | MLP / LSTM |
| i3d | TEMPORAL | 76.53% ± 8.81 | 79.38% ± 5.11 | 12.7M | MLP / LSTM |
| clip | FRAME_BY_FRAME | 76.49% ± 2.36 | 79.92% ± 2.74 | 151.3M | MLP / LSTM |
| x3d-xs | TEMPORAL | 82.11% ± 3.73 | 83.87% ± 2.68 | 3.0M | MLP / LSTM |
| x3d-s | TEMPORAL | 85.28% ± 4.54 | 85.84% ± 3.63 | 3.0M | MLP / LSTM |
| x3d-m | TEMPORAL | 85.01% ± 4.75 | 86.61% ± 2.32 | 3.0M | MLP / LSTM |
| x3d-l | TEMPORAL | 84.65% ± 4.10 | 85.91% ± 2.32 | 5.3M | MLP / LSTM |
| s3d-k | TEMPORAL | 78.08% ± 5.20 | 78.04% ± 3.33 | 7.9M | MLP / LSTM |
| s3d-h | TEMPORAL | 59.37% ± 9.10 | 47.19% ± 6.22 | 9.7M | MLP / LSTM |
| slowfast | TEMPORAL | 84.22% ± 3.07 | 82.49% ± 4.61 | 33.6M | MLP / LSTM |
| vivit | TEMPORAL | 78.42% ± 3.70 | 81.41% ± 3.75 | 88.6M | MLP / LSTM |
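
For convenience, the backbones listed above can be loaded directly through torch.hub, as done in the detailed inference example further below. The snippet here is a minimal sketch using the "mlp" entrypoint with the x3d-s backbone; presumably any backbone name from the first column can be passed in the same way.

import torch

# Load a pre-trained feature extractor and MLP classifier pair from torch.hub.
# The entrypoint and arguments mirror the call used in the detailed inference
# example below; swap backbone_name for another backbone listed in the table.
extractor, classifier = torch.hub.load(
    "raideno/bouldering-video-segmentation",
    "mlp",
    backbone_name="x3d-s",
    pretrained=True,
)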

Inference

Note: Besides the explanation below, a notebook version with a detailed step-by-step guide can also be found in the example.inference.ipynb notebook.

First of all you need to install the package:

pip install git+https://github.com/raideno/bouldering-video-segmentation.git

Now you'll need to download the weights of the model you wish to use from the Pre-Trained Models section.

Once this is done, you can use the code below to test the model on a video. An example video can be found here if necessary: example-bouldering-video.mp4.

Note: For a detailed inference example, you can refer to either bouldering_video_segmentation/inference.py or example.inference.ipynb.

from tas_helpers.visualization import SegmentationVisualizer

from bouldering_video_segmentation.inference import segment_video
from bouldering_video_segmentation.utils import LabelEncoderFactory

# --- --- --- ---

VIDEO_PATH = "./climbing-video.mp4"

# --- --- --- ---

# Run the model on the video and obtain one predicted label per frame.
frames_predictions = segment_video(VIDEO_PATH)

# --- --- --- ---

label_encoder = LabelEncoderFactory.get()

# Plot the per-frame predictions as a segmentation timeline.
visualizer = SegmentationVisualizer(
    labels_values=label_encoder.transform(label_encoder.classes_),
    labels_names=label_encoder.classes_
)

visualizer.plot_segmentation(
    frames_labels=frames_predictions,
    fps=25
)
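
If you want to post-process the predictions instead of only plotting them, you can map the integer labels back to class names and dump them to disk. The sketch below assumes that LabelEncoderFactory.get() returns a scikit-learn style LabelEncoder (i.e. it exposes inverse_transform); the output file name is arbitrary.

import csv

import numpy as np

# Convert the integer predictions back into human-readable class names.
frames_labels_names = label_encoder.inverse_transform(np.asarray(frames_predictions))

# Write one (frame index, predicted class) row per frame.
with open("frames-predictions.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["frame", "label"])
    for index, name in enumerate(frames_labels_names):
        writer.writerow([index, name])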

A more detailed usage example can be found below:

import tqdm
import torch
import numpy as np

from video_dataset.video import read_video
from tas_helpers.visualization import SegmentationVisualizer
from bouldering_video_segmentation.utils import LabelEncoderFactory

# --- --- ---

SEGMENT_SIZE = 32
VIDEO_FILE_PATH = "./example-video.mp4"

# --- --- ---

# Load the feature extractor and the classifier head from torch.hub.
extractor, classifier = torch.hub.load("raideno/bouldering-video-segmentation", "mlp", backbone_name="x3d-xs", pretrained=True)

video = read_video(VIDEO_FILE_PATH)

features = []
predictions = []

for i in tqdm.tqdm(iterable=range(0, len(video), SEGMENT_SIZE), desc="[processing-video-segments]:"):
    segment = video[i:i+SEGMENT_SIZE]

    # NOTE: required to be transposed to (Channel, Time, Height, Width)
    segment = segment.transpose(3, 0, 1, 2)
    feature = extractor.transform_and_extract(segment)
    feature = feature.unsqueeze(0)
    prediction = classifier(feature)

    features.append(feature)
    predictions.append(prediction)

# Take the most likely class for each segment.
_, labels = torch.max(torch.stack(predictions), dim=2)

# Expand the per-segment labels back to per-frame labels.
frames_predictions = np.repeat(labels.detach().numpy(), SEGMENT_SIZE)

# --- --- ---

label_encoder = LabelEncoderFactory.get()

visualizer = SegmentationVisualizer(
    labels_values=label_encoder.transform(label_encoder.classes_),
    labels_names=label_encoder.classes_
)

visualizer.plot_segmentation(
    frames_labels=frames_predictions,
    fps=25
)
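
Frame-level predictions can also be collapsed into (start, end, label) segments, which is often easier to inspect than a per-frame list. The helper below is a minimal, self-contained sketch and is not part of the package.

def group_frames_into_segments(frames_labels, fps=25):
    # Collapse runs of consecutive identical frame labels into
    # (start_seconds, end_seconds, label) tuples.
    segments = []
    start = 0
    for index in range(1, len(frames_labels) + 1):
        if index == len(frames_labels) or frames_labels[index] != frames_labels[start]:
            segments.append((start / fps, index / fps, frames_labels[start]))
            start = index
    return segments

for start_seconds, end_seconds, label in group_frames_into_segments(frames_predictions, fps=25):
    print(f"{start_seconds:7.2f}s -> {end_seconds:7.2f}s : label {label}")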

Training the Model

  1. Start by cloning the GitHub repository: git clone git@github.com:raideno/bouldering-video-segmentation.git.
  2. Download the dataset available at https://google.com.
  3. Follow the instructions in the notebook experiments/preliminary.ipynb.
  4. Depending on which model you want to train, follow the instructions in one of the two following notebooks (an illustrative training-loop sketch is also given after this list):
    1. MLP: experiments/mlp.experiments.ipynb.
    2. LSTM: experiments/lstm.experiments.ipynb.
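
For orientation only, the general shape of training a classifier head on pre-extracted features is sketched below. This is a generic PyTorch loop, not the repository's actual training code; the dataloader, hyper-parameters, and device handling are assumptions, and the real pipelines live in the notebooks listed above.

import torch

# Illustrative only: a generic training loop for a classifier head operating
# on pre-extracted features; the actual pipeline is in the experiment notebooks.
def train_classifier(classifier, dataloader, epochs=10, learning_rate=1e-3, device="cpu"):
    classifier = classifier.to(device)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=learning_rate)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        total_loss = 0.0
        for features, labels in dataloader:
            features, labels = features.to(device), labels.to(device)

            optimizer.zero_grad()
            loss = criterion(classifier(features), labels)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

        print(f"[epoch {epoch + 1}/{epochs}]: loss={total_loss / len(dataloader):.4f}")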

Accessing the Dataset

The dataset contains pictures and videos of climbers who did not necessarily agree to have their footage made publicly available. To request access to the dataset, please contact me at: nadirkichou@hotmail.fr.
