This repository contains the code for the TUM Social Computing team at the GermEval 2022 shared task. Our SVR models were trained with Python 3.9.
- Clone this repository
- Set up a Python 3.9 environment
- Install the requirements with
  pip install -r utils/requirements.txt
- Download the spaCy pipeline with
  python -m spacy download de_core_news_sm
- Download the SVR models from here and place them in the models folder.
- The data used for training and evaluation is in the data/ directory. You can download it either from the competition homepage or from the original GitHub repository. For the latter, use the ratings.csv file and adapt the sentence and label columns in the settings.
- Adapt the paths and column names in utils/settings.py to your version of the data.
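The settings step boils down to telling the code where the CSV file lives and which columns hold the sentence and the label. The following is a minimal sketch of that idea using only the standard library, assuming a CSV file with a header row, one text column and one numeric label column. The names TRAIN_PATH, SENTENCE_COLUMN, LABEL_COLUMN, DELIMITER and load_data are hypothetical and not taken from utils/settings.py; check that file for the names the repository actually uses.

```python
import csv
from pathlib import Path

# Hypothetical settings (not the repository's actual names): adapt them to your copy of the data.
TRAIN_PATH = Path("data") / "training_set.csv"  # hypothetical file name
SENTENCE_COLUMN = "sentence"  # hypothetical column holding the German sentence
LABEL_COLUMN = "label"        # hypothetical column holding the numeric rating
DELIMITER = ","


def load_data(path, sentence_column=SENTENCE_COLUMN, label_column=LABEL_COLUMN,
              delimiter=DELIMITER):
    """Read sentences and float labels from a CSV file that has a header row."""
    sentences, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        # Fail early with a clear message if the configured columns do not exist.
        missing = {sentence_column, label_column} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in {path}: {sorted(missing)}")
        for row in reader:
            sentences.append(row[sentence_column])
            labels.append(float(row[label_column]))
    return sentences, labels
```

If you switch between the competition files and ratings.csv, only the path and the two column names should need to change, which is why they are kept as module-level settings rather than hard-coded in the loader.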
Our SVR models are included in this repository in the models folder. The fine-tuned DistilBERT model is available on Hugging Face and can be found here.
To run the respective models, use the following commands from the command line:
python support_vector_regression.py
python support_vector_regression.py --only_statistics
python eval_distilbert.py
python eval_distilbert.py --embedding
This stores .npy files with the embedding vectors of the training data in the data folder.
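To inspect the stored embeddings, for example to check that there is one vector per training sentence, they can be loaded back with NumPy. This is a minimal sketch assuming each .npy file holds a 2-D array with one row per sentence; the file name train_embeddings.npy and the helper load_embeddings are hypothetical, so look in the data folder for the file names eval_distilbert.py actually writes.

```python
from pathlib import Path

import numpy as np

# Hypothetical file name: replace it with the .npy file written by eval_distilbert.py --embedding.
EMBEDDINGS_PATH = Path("data") / "train_embeddings.npy"


def load_embeddings(path=EMBEDDINGS_PATH):
    """Load an embedding matrix and check that it is 2-D (sentences x dimensions)."""
    embeddings = np.load(path)
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    return embeddings
```

The number of rows should match the number of training sentences. Assuming the vectors are hidden states of a DistilBERT base model, each row would have 768 entries.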
To analyze the relevant features in the SVR models, use the feature_relevance_analysis.py module.
python feature_relevance_analysis.py -s
The -s flag samples the data to speed up the SHAP value calculation. If you want to evaluate the combined model, add the -c flag.
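SHAP computation time grows with the number of rows that have to be explained, so explaining a random subset is a common way to get results faster. The sketch below shows one way the two flags could be parsed with argparse and how a reproducible subset could be drawn. The destination names sample and combined, the default sample size of 100, the seed and the helper sample_rows are all hypothetical; feature_relevance_analysis.py may implement this differently.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the two flags described above (hypothetical wiring)."""
    parser = argparse.ArgumentParser(description="Feature relevance analysis (sketch)")
    # -s: explain a random subset of the data instead of all rows.
    parser.add_argument("-s", dest="sample", action="store_true",
                        help="sample the data to speed up the SHAP value calculation")
    # -c: analyze the combined model instead of the plain SVR model.
    parser.add_argument("-c", dest="combined", action="store_true",
                        help="evaluate the combined model")
    return parser.parse_args(argv)


def sample_rows(rows, n=100, seed=42):
    """Return at most n rows, drawn without replacement with a fixed seed."""
    rows = list(rows)
    if len(rows) <= n:
        return rows
    return random.Random(seed).sample(rows, n)
```

Fixing the seed keeps the sampled subset, and therefore the SHAP values, identical between runs, so feature rankings from different runs remain comparable.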
To recreate the SVR models, run the following command.
python support_vector_regression.py --training_mode
To repeat the DistilBERT fine-tuning, use the finetune_distilbert.py module.