Description
Using the corpus from the NIPS dataset, train an LDA model.
There are already implementations for LDA:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
https://radimrehurek.com/gensim/models/ldamodel.html
Create scripts (under src/papers/models/) that use these packages and expose a function to train a model on a corpus passed in as a parameter.
Expose a function for extracting topics from new, unseen documents.
Create a notebook for the process: load the NIPS corpus and call the train and predict functions. Remember to split the dataset before training, and test the prediction part on the unseen documents.
The notebook should print the topics extracted for the preprocessed documents alongside those extracted for the unprocessed ones, for comparison.
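The notebook flow might be sketched as follows. Everything here is an assumption for illustration: the in-line documents list stands in for the NIPS corpus (the loading step is not specified in this issue), preprocess is a hypothetical placeholder for the project's real preprocessing, and the split ratio and topic count are arbitrary.

```python
import re

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Stand-in documents; the notebook would load the NIPS papers here instead.
documents = [
    "Neural networks learn distributed representations of data.",
    "Bayesian inference combines priors with observed evidence.",
    "Gradient descent optimizes the weights of neural networks.",
    "Markov chain Monte Carlo draws samples for Bayesian inference.",
    "Convolutional networks classify images with learned filters.",
    "Variational inference approximates Bayesian posteriors.",
    "Recurrent neural networks model sequences of tokens.",
    "Gaussian processes place priors over functions.",
]


def preprocess(text):
    """Hypothetical preprocessing: lowercase, keep alphabetic tokens over 2 chars."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if len(t) > 2)


def fit_and_predict(train_docs, test_docs, n_topics=2):
    """Train LDA on train_docs and return topic distributions for test_docs."""
    vectorizer = CountVectorizer()
    model = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    model.fit(vectorizer.fit_transform(train_docs))
    return model.transform(vectorizer.transform(test_docs))


# Split before training so predictions are made on unseen documents only.
train_docs, test_docs = train_test_split(
    documents, test_size=0.25, random_state=0
)

raw_topics = fit_and_predict(train_docs, test_docs)
clean_topics = fit_and_predict(
    [preprocess(d) for d in train_docs],
    [preprocess(d) for d in test_docs],
)

# Print the dominant topic per held-out document, raw versus preprocessed.
for doc, raw, clean in zip(test_docs, raw_topics, clean_topics):
    print(doc[:40], "| raw:", raw.argmax(), "| preprocessed:", clean.argmax())
```

Topic indices from the two models are not aligned with each other, so the comparison is more meaningful when the top words of each topic are printed as well.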