Given a corpus of scientific documents, extract entities related to a scientific domain. Select any 50 papers from the CORD-19 dataset (https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge) and complete the following steps:
- Use an ontology to define the vocabulary of entities you want to extract (you can use a domain ontology, e.g. the Human Disease Ontology, http://www.obofoundry.org/ontology/doid.html)
- Extract scientific entities from the documents. Focus only on those entities that are present in the ontology
- Link entities to the ontology
- Extract relationships between entities
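
The steps above can be sketched end to end as follows. This is only a minimal illustration, not a prescribed solution: it assumes dictionary (string) matching for extraction and sentence-level co-occurrence as a stand-in for relationship extraction. The vocabulary `VOCAB`, its `EX:` identifiers, and the helper names `build_matcher`, `extract_entities`, and `extract_relations` are all hypothetical; in practice the vocabulary would be loaded from the chosen ontology file and the text from the selected CORD-19 papers.

```python
import re
from itertools import combinations

# Hypothetical mini-vocabulary: lowercase surface form -> ontology ID.
# The "EX:" IDs are placeholders, not real ontology identifiers.
VOCAB = {
    "pneumonia": "EX:0001",
    "viral pneumonia": "EX:0002",
    "asthma": "EX:0003",
}


def build_matcher(vocab):
    """Compile one case-insensitive regex covering every vocabulary term."""
    # Longest terms first so "viral pneumonia" is preferred over "pneumonia".
    terms = sorted(vocab, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def extract_entities(text, vocab, matcher):
    """Return (surface text, ontology ID, start offset) for each mention.

    Looking up the ID of the matched term is the linking step.
    """
    return [
        (m.group(0), vocab[m.group(0).lower()], m.start())
        for m in matcher.finditer(text)
    ]


def extract_relations(text, vocab, matcher):
    """Return unordered pairs of ontology IDs that share a sentence.

    Co-occurrence is an assumption made for this sketch; it only
    suggests that two entities may be related, not how.
    """
    relations = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        ids = sorted({eid for _, eid, _ in extract_entities(sentence, vocab, matcher)})
        relations.update(combinations(ids, 2))
    return relations


if __name__ == "__main__":
    matcher = build_matcher(VOCAB)
    sample = "Viral pneumonia was common. Patients with asthma developed pneumonia."
    for mention in extract_entities(sample, VOCAB, matcher):
        print(mention)
    print(extract_relations(sample, VOCAB, matcher))
```

Restricting matches to vocabulary terms keeps extraction limited to entities present in the ontology, at the cost of missing synonyms and spelling variants that the vocabulary does not list.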