Train a basic LDA model using the NIPS corpus #8

@liadmagen

Using the NIPS dataset's corpus, train an LDA model.

LDA implementations already exist in scikit-learn and gensim:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
https://radimrehurek.com/gensim/models/ldamodel.html

Create scripts (under src/papers/models/) exposing a function that uses these packages to train a model on a corpus passed in as a parameter.
Also expose a function for extracting topics from new, unseen documents.

Create a notebook for the process: loading the NIPS corpus and calling the train and predict functions. Remember to split the dataset before training, and test the prediction part on the unseen documents.
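
The split-then-predict flow could be sketched roughly as follows. This assumes the corpus has already been loaded into a plain list of strings (loading the NIPS files is left out because the data path is not specified here), and `split_train_predict` is a hypothetical helper name. The model is fitted on the training part only, and topics are inferred for the held-out part.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split


def split_train_predict(documents, n_topics=10, test_size=0.2, random_state=0):
    """Hold out part of the corpus, train on the rest, infer topics for the held-out part."""
    train_docs, test_docs = train_test_split(
        documents, test_size=test_size, random_state=random_state
    )

    # The vocabulary and the model are both fitted on the training documents only.
    vectorizer = CountVectorizer(stop_words="english")
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=random_state)
    lda.fit(vectorizer.fit_transform(train_docs))

    # Unseen documents are only transformed, never fitted on.
    test_topics = lda.transform(vectorizer.transform(test_docs))
    dominant = test_topics.argmax(axis=1)  # index of the most probable topic per document
    return test_docs, test_topics, dominant
```

In the notebook, the returned `dominant` array can be printed next to each held-out document to inspect the predictions.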

The notebook should print the topics extracted from the preprocessed documents alongside those extracted from the unprocessed ones, so the two can be compared.
