This project trains and validates models that predict user interactions on the OTTO Recommender System dataset. The steps below cover downloading the data, training the models, validating them, and generating predictions.
Make sure the following Python packages are installed before running the code:
pip install pandas numpy lightgbm cudf gensim
- Memory: At least 128 GB of RAM is recommended.
- Runtime: The code may take up to 48 hours to run fully.
- GPU: It is highly recommended to use Kaggle's GPU environment for faster processing.
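Because a full run can take many hours, it may be worth confirming up front that the listed packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of this project.

```python
import importlib.util

# Packages named in the install command above.
REQUIRED = ["pandas", "numpy", "lightgbm", "cudf", "gensim"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each package can be found; it does not verify versions or that a GPU is visible.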
Download the required datasets into the input folder from the following links:
- OTTO Chunk Data in Parquet Format
- OTTO Validation Data
- OTTO Recommender System Competition Data
- OTTO Full Optimized Memory Footprint
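After downloading, a quick listing of the input folder can confirm the files landed where the notebooks expect them. The sketch below assumes the datasets are stored as `.parquet` files somewhere under `input`; the actual file names and extensions depend on the datasets, and `list_parquet_files` is a hypothetical helper.

```python
from pathlib import Path


def list_parquet_files(root="input"):
    """Return all .parquet files under `root`, sorted by path.

    Returns an empty list if `root` does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(root_path.rglob("*.parquet"))


if __name__ == "__main__":
    files = list_parquet_files("input")
    print(f"Found {len(files)} parquet file(s) under input/")
    for path in files[:10]:
        print(" ", path)
```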
Train the Word2Vec model using the word2vec-train.ipynb notebook. It is recommended to use Kaggle's GPU for faster training.
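A common way to apply Word2Vec to session data is to treat each session's ordered item IDs as one "sentence". The sketch below illustrates that idea only; the notebook's actual preprocessing and hyperparameters are not shown in this README, so the function names, the `(session, aid, ts)` tuple layout, and the values of `vector_size` and `window` are all assumptions.

```python
from collections import defaultdict


def sessions_to_sentences(events):
    """Group (session, aid, ts) events into per-session item sequences.

    Each session becomes one "sentence" of item IDs (as strings),
    ordered by timestamp.
    """
    by_session = defaultdict(list)
    for session, aid, ts in events:
        by_session[session].append((ts, aid))
    return [
        [str(aid) for _, aid in sorted(items)]
        for _, items in sorted(by_session.items())
    ]


def train_word2vec(sentences, vector_size=32, window=3):
    """Train a gensim Word2Vec model; hyperparameters are placeholders."""
    from gensim.models import Word2Vec  # imported lazily; assumes gensim 4.x

    return Word2Vec(sentences=sentences, vector_size=vector_size,
                    window=window, min_count=1, sg=1, workers=4)


# Toy events: (session, aid, ts). Session 1 has three events out of order.
events = [(1, 101, 10), (1, 102, 20), (2, 201, 5), (1, 103, 15)]
sentences = sessions_to_sentences(events)
```

Calling `train_word2vec(sentences)` on real data would return a model whose `wv` attribute holds the item embeddings.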
Run the following validation steps sequentially:
- Recall Program: code/recall_valid.ipynb
  - Recommendation: Use Kaggle’s GPU for this task.
- Feature Preparation: code/feature_prepare_valid.ipynb
- Ranking Model: code/rank_model_valid.ipynb
  - The model can be run for three types of interactions: clicks, carts, and orders.
  - You can modify the parameter t to specify the type (t=clicks/carts/orders).
  - The default recall quantity is 50, but you can increase it up to 250 for better results (more recall generally improves the score).
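The ranking model can only score items that the recall stage returned, which is one reason a larger recall quantity tends to help. The toy sketch below measures how many ground-truth items are covered by the top-n candidates; it is not the notebooks' code, the name `candidate_recall` is hypothetical, and n=2 and n=4 merely stand in for quantities such as 50 and 250.

```python
def candidate_recall(candidates, ground_truth, n):
    """Fraction of ground-truth items found in the top-`n` candidates.

    `candidates` maps session -> ranked list of item IDs.
    `ground_truth` maps session -> list of items actually interacted with.
    """
    hits = 0
    total = 0
    for session, truth in ground_truth.items():
        top_n = set(candidates.get(session, [])[:n])
        hits += len(top_n & set(truth))
        total += len(truth)
    return hits / total if total else 0.0


# Toy data: ranked candidates and true interactions for two sessions.
candidates = {1: [10, 11, 12, 13], 2: [20, 21, 22, 23]}
ground_truth = {1: [12], 2: [23, 99]}

recall_small = candidate_recall(candidates, ground_truth, n=2)
recall_large = candidate_recall(candidates, ground_truth, n=4)
print(recall_small, recall_large)
```

Item 99 never appears among the candidates, so no ranking model could recover it; widening the candidate set is the only way to raise that ceiling, at the cost of more rows to featurize and score.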
Run the following steps sequentially for test data to generate the final submission file:
- Recall Program: code/recall_test.ipynb
  - Recommendation: Use Kaggle’s GPU for this task.
- Feature Preparation: code/feature_prepare_test.ipynb
- Ranking Model: code/rank_model_test.ipynb
  - This will generate the final submission file: submission.csv.
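For orientation, the sketch below writes predictions in the `session_type,labels` layout used for OTTO competition submissions, with up to 20 space-separated item IDs per row. How `rank_model_test.ipynb` actually builds the file is not shown here, so `write_submission`, the dictionary layout, and the example IDs are assumptions for illustration.

```python
import csv
import io


def write_submission(predictions, handle, max_items=20):
    """Write `session_type,labels` rows with space-separated item IDs.

    `predictions` maps (session, event_type) -> ranked list of item IDs.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["session_type", "labels"])
    for (session, event_type), aids in sorted(predictions.items()):
        labels = " ".join(str(aid) for aid in aids[:max_items])
        writer.writerow([f"{session}_{event_type}", labels])


# Made-up predictions for one session and two interaction types.
predictions = {
    (12899779, "clicks"): [59625, 1253524],
    (12899779, "carts"): [59625],
}
buffer = io.StringIO()
write_submission(predictions, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a file opened with `open("submission.csv", "w", newline="")`.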