Text Complexity Assessment of German Text

This repository contains the code of the TUM Social Computing team for the GermEval 2022 shared task. Our SVR models were trained with Python 3.9.

Setup

Clone this repository.
Set up a Python 3.9 environment.
Install the requirements with pip install -r utils/requirements.txt
Download the spaCy pipeline with python -m spacy download de_core_news_sm
Download the SVR models from here and place them in the models folder.
The data used for training and evaluation is in the data/ directory. You can download it either from the competition homepage or from the original GitHub repository. For the latter, use the ratings.csv file and adapt the sentence and label columns in the settings.
Adapt the paths and column names in utils/settings.py to your version of the data.
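
To illustrate the last setup step, here is a minimal sketch of settings-style constants and a loader that reads sentence and label pairs from a CSV file with the standard library. The constant names, the column names, and the load_ratings helper are assumptions made for this example; the actual names are defined in utils/settings.py.

```python
import csv
from pathlib import Path

# Hypothetical settings; the real names live in utils/settings.py.
DATA_PATH = Path("data/ratings.csv")
SENTENCE_COLUMN = "sentence"  # assumed column name, adapt to your file
LABEL_COLUMN = "label"        # assumed column name, adapt to your file


def load_ratings(path, sentence_column, label_column):
    """Return (sentences, labels) read from a CSV file with a header row."""
    sentences, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            sentences.append(row[sentence_column])
            labels.append(float(row[label_column]))
    return sentences, labels
```

If a column name does not match the file header, the loader raises a KeyError, which is a quick way to check that the settings fit your version of the data.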

Use pretrained models

Our SVR models are uploaded to this repository in the models folder. The fine-tuned DistilBERT model is uploaded to HuggingFace and can be found here. To run the respective models, use the following commands from the command line:

Combination of neural embedding with text statistics

python support_vector_regression.py

SVR with statistics only

python support_vector_regression.py --only_statistics
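
This README does not list which text statistics the SVR uses. As a rough illustration of the kind of surface features such a model could consume, the sketch below computes three simple statistics per sentence. The text_statistics function and its feature names are hypothetical and are not taken from the repository.

```python
import re


def text_statistics(sentence):
    """Compute a few illustrative surface statistics for one sentence."""
    # \w matches Unicode letters in Python 3, so umlauts and ß are kept.
    words = re.findall(r"\w+", sentence)
    n_words = len(words)
    n_chars = sum(len(word) for word in words)
    return {
        "n_words": n_words,
        "avg_word_length": n_chars / n_words if n_words else 0.0,
        # The threshold of 6 characters is arbitrary, chosen for the example.
        "n_long_words": sum(len(word) > 6 for word in words),
    }
```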

Fine-tuned DistilBERT

python eval_distilbert.py

Create neural embeddings with fine-tuned DistilBERT

This will store .npy files with the embedding vectors of the training data in the data folder.

python eval_distilbert.py --embedding
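
The stored .npy files can be read back with numpy.load. The sketch below shows one plausible way to join embedding vectors with text statistics, namely column-wise concatenation. Whether the combined model joins its features exactly this way is an assumption, and the combine_features helper is hypothetical.

```python
import numpy as np


def combine_features(embeddings, statistics):
    """Concatenate per-sentence embeddings and statistics column-wise."""
    embeddings = np.asarray(embeddings, dtype=float)
    statistics = np.asarray(statistics, dtype=float)
    # Both matrices need one row per sentence, in the same order.
    if embeddings.shape[0] != statistics.shape[0]:
        raise ValueError("both inputs need one row per sentence")
    return np.hstack([embeddings, statistics])
```

For example, embeddings = np.load(path) with the path of one of the generated files yields an array that can be passed as the first argument.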

Generate explanations

To analyze the relevant features in the SVR models, use the feature_relevance_analysis.py module.

python feature_relevance_analysis.py -s

The -s flag samples the data to speed up the SHAP value calculation. If you want to evaluate the combined model, add the -c parameter.
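
Sampling helps because the cost of computing SHAP values generally grows with the number of rows being explained. The sketch below shows the general idea of drawing a reproducible random subset of feature rows. The sample_rows helper, its seed, and the sample size are assumptions for illustration and do not describe how the -s flag is implemented.

```python
import numpy as np


def sample_rows(features, n_samples, seed=0):
    """Draw up to n_samples rows without replacement, keeping row order."""
    features = np.asarray(features)
    rng = np.random.default_rng(seed)
    size = min(n_samples, features.shape[0])
    index = rng.choice(features.shape[0], size=size, replace=False)
    # Sorting the indices preserves the original ordering of the rows.
    return features[np.sort(index)]
```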

Train models

To recreate the SVR models, run the following command.

python support_vector_regression.py --training_mode

To rerun the DistilBERT fine-tuning, use the finetune_distilbert.py module.
