This project, implemented in MATLAB, explores two key machine learning concepts: anomaly detection using Gaussian distributions and building a movie recommender system with collaborative filtering. It's designed as a hands-on exercise to understand and implement these algorithms.
The project is divided into two main parts:
- Anomaly Detection: This section focuses on identifying unusual data points by modeling the data with a multivariate Gaussian distribution and flagging points whose estimated probability density falls below a threshold.
- Movie Recommender System: This part implements a collaborative filtering algorithm to predict movie ratings and provide personalized recommendations.
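
As a rough illustration of the anomaly detection idea, the sketch below fits a Gaussian with independent features to a small made-up dataset and flags low-density points. The data, the variable names, and the threshold value are assumptions chosen for this example and are not taken from the project files. The variance is normalized by the number of samples, which is also an assumption about the convention in use.

```matlab
% Toy data (made up): five similar 2-D points and one obvious outlier.
X = [1.0 2.0; 1.1 1.9; 0.9 2.1; 1.0 2.2; 1.2 2.0; 5.0 8.0];

% Fit a Gaussian with independent features: per-feature mean and variance.
mu = mean(X, 1);
sigma2 = var(X, 1, 1);            % second argument 1 => normalize by m, not m - 1

% Probability density of each row under the diagonal-covariance Gaussian.
n = size(X, 2);
Xc = bsxfun(@minus, X, mu);       % center each feature
quad = sum(bsxfun(@rdivide, Xc .^ 2, sigma2), 2);
p = (2 * pi)^(-n / 2) * prod(sigma2)^(-0.5) * exp(-0.5 * quad);

% Flag points whose density is below a threshold.
epsilon = 0.01;                   % hypothetical value, picked by hand for this toy data
anomalies = find(p < epsilon);
disp(anomalies);
```

Here the threshold is hard-coded for simplicity. In the project, selectThreshold.m is described as choosing epsilon from labeled data using the F1 score.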
Below is a breakdown of the key files in this project and their roles:
- ex8.m: The main script that guides you through both the anomaly detection and recommender system exercises.
- ex8.mlx, ex8_companion1.mlx, ex8_companion2.mlx: Interactive MATLAB live scripts that provide a more detailed, step-by-step walkthrough of the concepts.
- cofiCostFunc.m: Computes the cost and gradient for the collaborative filtering algorithm.
- checkCostFunction.m: A utility to verify that the gradient computation in cofiCostFunc.m is correct.
- computeNumericalGradient.m: A helper function for checkCostFunction.m that computes a numerical approximation of the gradient.
- estimateGaussian.m: Estimates the mean and variance for each feature in the dataset.
- multivariateGaussian.m: Computes the probability density function of a multivariate Gaussian distribution.
- selectThreshold.m: Finds the best threshold (epsilon) for anomaly detection using the F1 score.
- visualizeFit.m: A utility to visualize the dataset and the fitted Gaussian distribution.
- loadMovieList.m: Loads the list of movies from movie_ids.txt.
- normalizeRatings.m: Normalizes movie ratings for the recommender system.
- fmincg.m: A function minimizer used to train the collaborative filtering model.
- movie_ids.txt: A text file containing the list of all movies and their corresponding IDs.
- ex8_movies.mat, ex8_movieParams.mat: MATLAB data files containing the movie ratings data and pre-trained model parameters.
- ex8data1.mat, ex8data2.mat: Datasets for the anomaly detection exercise.
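
To make the roles of cofiCostFunc.m, checkCostFunction.m, and computeNumericalGradient.m more concrete, here is a minimal sketch of a regularized collaborative filtering cost, its analytic gradient, and a central-difference gradient check. It uses the common squared-error formulation over rated entries only. The tiny matrices, the variable names, and the value of lambda are assumptions made up for illustration, and the project's own functions may be organized differently.

```matlab
% Tiny made-up problem: 3 movies, 2 users, 2 latent features.
Y = [5 0; 4 1; 0 5];                  % ratings (0 where unrated)
R = [1 0; 1 1; 0 1];                  % R(i,j) = 1 if user j rated movie i
X = [0.5 -0.2; 0.1 0.4; -0.3 0.8];    % movie features, one row per movie
Theta = [0.9 0.3; -0.4 0.7];          % user parameters, one row per user
lambda = 1.5;                         % hypothetical regularization strength

% Regularized squared-error cost, counting only rated entries.
costFn = @(X, Theta) 0.5 * sum(sum(((X * Theta' - Y) .* R) .^ 2)) ...
    + (lambda / 2) * (sum(X(:) .^ 2) + sum(Theta(:) .^ 2));

% Analytic gradients of the cost above.
E = (X * Theta' - Y) .* R;            % prediction error on rated entries
X_grad = E * Theta + lambda * X;
Theta_grad = E' * X + lambda * Theta;

% Numerical gradients by central differences, one parameter at a time.
h = 1e-4;
numX = zeros(size(X));
for k = 1:numel(X)
    d = zeros(size(X)); d(k) = h;
    numX(k) = (costFn(X + d, Theta) - costFn(X - d, Theta)) / (2 * h);
end
numTheta = zeros(size(Theta));
for k = 1:numel(Theta)
    d = zeros(size(Theta)); d(k) = h;
    numTheta(k) = (costFn(X, Theta + d) - costFn(X, Theta - d)) / (2 * h);
end

% Relative difference between the two; it should be very small.
numGrad = [numX(:); numTheta(:)];
anaGrad = [X_grad(:); Theta_grad(:)];
relDiff = norm(numGrad - anaGrad) / norm(numGrad + anaGrad);
fprintf('Relative difference: %g\n', relDiff);
```

A relative difference many orders of magnitude below 1 suggests the analytic gradient agrees with the cost function. A large value usually points to a missing mask (R) or regularization term.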
To run this project, simply execute the main script in MATLAB:

    ex8

The script is divided into sections and pauses after each one. Press Enter to proceed to the next part of the exercise. The script outputs visualizations and relevant metrics for both the anomaly detection and recommender system tasks.