GitHub - EBConlin/NLP_finalproject: Full-Text Scientific Argumentation Mining

The model infrastructure is in place, but the model itself is underperforming. The changes below are new, and we still need to test what effect they have.

1. Adding an LSTM on top of frozen pretrained SciBERT embeddings, which has been shown to be an effective, cheap way to fine-tune in these cases.
2. Implementing negative sampling, i.e. downsampling the negative instances.
3. Intelligently duplicating labels. Right now there is only one relation label per relation. Since "parts of same" and "contradiction" are reciprocal relationships, we can add a label to both of the tokens involved instead of keeping only one. While "supports" is not reciprocal, we can cheat by adding a new reciprocal relation, "supported-by."

Between downsampling the negative instances in (2) and doubling the number of positive instances in (3), we intend to deal with the severe class imbalance shown here.
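The two data-balancing steps above can be sketched in plain Python. This is only an illustration under assumptions, not the code in the notebooks: relations are assumed to be `(head, tail, label)` triples, the negative label is assumed to be `"none"`, and the function names and the `keep_ratio` value are hypothetical.

```python
import random
from collections import Counter

# Label names follow the README; "none" as the negative label is an assumption.
SYMMETRIC = {"parts of same", "contradiction"}
INVERSE = {"supports": "supported-by"}
NEGATIVE = "none"


def add_reciprocal_labels(relations):
    """Mirror every positive relation so both directions carry a label."""
    positives = {}
    for head, tail, label in relations:
        if label == NEGATIVE:
            continue
        positives[(head, tail)] = label
        if label in SYMMETRIC:
            # Symmetric relations keep the same label in the other direction.
            positives.setdefault((tail, head), label)
        elif label in INVERSE:
            # "supports" gets the new inverse label "supported-by".
            positives.setdefault((tail, head), INVERSE[label])
    # Drop negatives whose pair has just become a positive instance.
    negatives = [
        (h, t, l) for h, t, l in relations
        if l == NEGATIVE and (h, t) not in positives
    ]
    return [(h, t, l) for (h, t), l in positives.items()] + negatives


def downsample_negatives(relations, keep_ratio, seed=0):
    """Keep all positives and a random keep_ratio fraction of the negatives."""
    rng = random.Random(seed)
    positives = [r for r in relations if r[2] != NEGATIVE]
    negatives = [r for r in relations if r[2] == NEGATIVE]
    n_keep = int(len(negatives) * keep_ratio)
    return positives + rng.sample(negatives, n_keep)


# Toy data: 3 positive relations and 20 negative pairs.
relations = [
    ("c1", "c2", "supports"),
    ("c2", "c3", "contradiction"),
    ("c3", "c4", "parts of same"),
] + [(f"n{i}", f"n{i + 1}", NEGATIVE) for i in range(20)]

balanced = downsample_negatives(add_reciprocal_labels(relations), keep_ratio=0.25)
counts = Counter(label for _, _, label in balanced)
print(counts)
```

Mirroring is done before downsampling so that a pair promoted to a positive instance is never also sampled as a negative. The right `keep_ratio` would have to be chosen by experiment.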

Repository files (5 commits):
NLP_final_project (1).ipynb
NLP_poster_econlin.pptx (3).pdf
Paper Figures.pdf
Parsing.ipynb
README.md
clean_data_list (2).pkl
labeled_directory.pkl
