This repository contains the code for my 120-hour research project on Bouldering Video (Action) Segmentation.
Below you'll find instructions on how to train the models and run inference using the repository's code.
For a detailed report on the model and the approach that was used, please refer to the PDF report of the research project.
A detailed list of pre-trained models and their weights can be found in models-weights/mlp.README.md or models-weights/lstm.README.md. A condensed list can also be found just below.
| Backbone | Backbone Type | MLP Accuracy | LSTM Accuracy | #Parameters | Weights |
|---|---|---|---|---|---|
| yolo | FRAME_BY_FRAME | 65.01% ± 4.41 | 69.94% ± 2.40 | 2.9M | MLP / LSTM |
| dino | FRAME_BY_FRAME | 80.58% ± 4.56 | 83.20% ± 3.26 | 22.1M | MLP / LSTM |
| r3d | TEMPORAL | 84.27% ± 5.03 | 85.68% ± 3.64 | 31.6M | MLP / LSTM |
| i3d | TEMPORAL | 76.53% ± 8.81 | 79.38% ± 5.11 | 12.7M | MLP / LSTM |
| clip | FRAME_BY_FRAME | 76.49% ± 2.36 | 79.92% ± 2.74 | 151.3M | MLP / LSTM |
| x3d-xs | TEMPORAL | 82.11% ± 3.73 | 83.87% ± 2.68 | 3.0M | MLP / LSTM |
| x3d-s | TEMPORAL | 85.28% ± 4.54 | 85.84% ± 3.63 | 3.0M | MLP / LSTM |
| x3d-m | TEMPORAL | 85.01% ± 4.75 | 86.61% ± 2.32 | 3.0M | MLP / LSTM |
| x3d-l | TEMPORAL | 84.65% ± 4.10 | 85.91% ± 2.32 | 5.3M | MLP / LSTM |
| s3d-k | TEMPORAL | 78.08% ± 5.20 | 78.04% ± 3.33 | 7.9M | MLP / LSTM |
| s3d-h | TEMPORAL | 59.37% ± 9.10 | 47.19% ± 6.22 | 9.7M | MLP / LSTM |
| slowfast | TEMPORAL | 84.22% ± 3.07 | 82.49% ± 4.61 | 33.6M | MLP / LSTM |
| vivit | TEMPORAL | 78.42% ± 3.70 | 81.41% ± 3.75 | 88.6M | MLP / LSTM |
Note: Besides the explanation below, a detailed step-by-step guide can also be found in the example.inference.ipynb notebook.
First of all, you need to install the package:

```bash
pip install git+https://github.com/raideno/bouldering-video-segmentation.git
```

Now you'll need to download the weights of the model you wish to use from the Pre-Trained Models section.
Once this is done, you can use the code below to test the model on a video. If needed, an example video can be found here: example-bouldering-video.mp4.
Note: For a detailed inference example, refer to bouldering_video_segmentation/inference.py or example.inference.ipynb.
```python
from tas_helpers.visualization import SegmentationVisualizer

from bouldering_video_segmentation.inference import segment_video
from bouldering_video_segmentation.utils import LabelEncoderFactory

# --- --- --- ---
VIDEO_PATH = "./climbing-video.mp4"
# --- --- --- ---

frames_predictions = segment_video(VIDEO_PATH)

# --- --- --- ---

label_encoder = LabelEncoderFactory.get()

visualizer = SegmentationVisualizer(
    labels_values=label_encoder.transform(label_encoder.classes_),
    labels_names=label_encoder.classes_
)

visualizer.plot_segmentation(
    frames_labels=frames_predictions,
    fps=25
)
```

A more detailed usage example can be found below:
```python
import tqdm
import torch
import numpy as np

from video_dataset.video import read_video

from tas_helpers.visualization import SegmentationVisualizer
from bouldering_video_segmentation.utils import LabelEncoderFactory

# --- --- ---
SEGMENT_SIZE = 32
VIDEO_FILE_PATH = "./example-video.mp4"
# --- --- ---

extractor, classifier = torch.hub.load("raideno/bouldering-video-segmentation", "mlp", backbone_name="x3d-xs", pretrained=True)

video = read_video(VIDEO_FILE_PATH)

features = []
predictions = []

for i in tqdm.tqdm(iterable=range(0, len(video), SEGMENT_SIZE), desc="[processing-video-segments]:"):
    segment = video[i:i+SEGMENT_SIZE]

    # NOTE: required to be transposed to (Channel, Time, Height, Width)
    segment = segment.transpose(3, 0, 1, 2)

    feature = extractor.transform_and_extract(segment)
    feature = feature.unsqueeze(0)

    # Run the classification head without tracking gradients.
    with torch.no_grad():
        prediction = classifier(feature)

    features.append(feature)
    predictions.append(prediction)

# Each prediction has shape (1, num_classes); take the argmax over the class dimension.
_, labels = torch.max(torch.stack(predictions), dim=2)

# Expand the per-segment labels back into per-frame labels.
frames_predictions = np.repeat(labels.numpy(), SEGMENT_SIZE)

# --- --- ---

label_encoder = LabelEncoderFactory.get()

visualizer = SegmentationVisualizer(
    labels_values=label_encoder.transform(label_encoder.classes_),
    labels_names=label_encoder.classes_
)

visualizer.plot_segmentation(
    frames_labels=frames_predictions,
    fps=25
)
```
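If you need timestamped segments rather than a plot, the per-frame predictions can be collapsed into (label, start, end) runs. A minimal sketch, reusing `frames_predictions` and `label_encoder` from the example above and assuming the label encoder follows scikit-learn's `LabelEncoder` API; the `segments_from_frames` helper is hypothetical, not part of the package:

```python
def segments_from_frames(frames_labels, label_encoder, fps=25):
    # Collapse consecutive identical frame labels into (name, start_s, end_s) segments.
    segments = []
    start = 0
    for i in range(1, len(frames_labels) + 1):
        if i == len(frames_labels) or frames_labels[i] != frames_labels[start]:
            # inverse_transform is assumed from the scikit-learn LabelEncoder API.
            name = label_encoder.inverse_transform([frames_labels[start]])[0]
            segments.append((name, start / fps, i / fps))
            start = i
    return segments

for name, start_s, end_s in segments_from_frames(frames_predictions, label_encoder):
    print(f"{name}: {start_s:.2f}s -> {end_s:.2f}s")
```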
To train a model yourself, follow the steps below:

- Start by cloning the GitHub repository: `git clone git@github.com:raideno/bouldering-video-segmentation.git`.
- Download the dataset available at https://google.com.
- Follow the instructions in the notebook `experiments/preliminary.ipynb`.
- Depending on which model you want to train, follow the instructions in one of the two following notebooks:
As the dataset contains pictures and videos of climbers who didn't necessarily agree to have their recordings made publicly available, please contact me at nadirkichou@hotmail.fr to request access to the dataset.
