Movie Recommendation System

This project suggests movies similar to a user's favorite. Using TF-IDF and cosine similarity, it analyzes movie titles, years, ratings, genres, and languages to provide a list of recommendations.

Built at Stellar Indiathon

Created on 15th June 2024

The problem Movie Recommendation System solves

The Movie Recommendation System is designed to make choosing a movie easier by providing personalized recommendations. It converts textual movie features into TF-IDF vectors and compares them with cosine similarity, so it can suggest films that closely match a movie the user already likes.

Key Uses

Personalized Movie Recommendations:

Discover New Films: Users can explore new movies similar to their favorites, expanding their viewing horizons.
Diverse Options: By integrating both Indian and international movie datasets, the system offers a wide range of suggestions, catering to varied tastes and preferences.

Efficient Decision-Making:

Time-Saving: Users can quickly find movies they are likely to enjoy without sifting through endless options.
Reduced Choice Overload: The system narrows down choices to the top 30 recommendations, making the decision process more manageable.

Enhanced User Experience:

Tailored Suggestions: The system takes into account various movie attributes such as genre, cast, director, and keywords, ensuring that recommendations are well-suited to user preferences.
User-Friendly Interface: Simple input of a favorite movie title leads to immediate and relevant recommendations.

How It Works

Data Integration:

Combines features from both Indian and international movies, including genres, keywords, taglines, cast, and directors.
Uses a comprehensive dataset to provide a rich set of options.
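The data integration step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names in FEATURE_COLUMNS, the helper combine_features, and the placeholder rows are all assumptions, since the write-up names the feature types but not the dataset schema.

```python
import pandas as pd

# Assumed column names; the write-up lists these feature types but not the exact schema.
FEATURE_COLUMNS = ["genres", "keywords", "tagline", "cast", "director"]


def combine_features(df, columns=FEATURE_COLUMNS):
    """Join the text columns of each row into one space-separated string."""
    # Only use the columns that actually exist in this dataframe.
    present = [c for c in columns if c in df.columns]
    # Replace missing values first so they do not turn into literal "nan"/"None" text.
    return df[present].fillna("").astype(str).agg(" ".join, axis=1)


# Placeholder rows standing in for the two real datasets.
indian = pd.DataFrame({
    "title": ["Indian Movie A", "Indian Movie B"],
    "genres": ["Drama Sport", "Action"],
    "keywords": ["family wrestling", None],
    "cast": ["Actor One", "Actor Two"],
    "director": ["Director X", "Director Y"],
})
international = pd.DataFrame({
    "title": ["World Movie C"],
    "genres": ["Action Thriller"],
    "keywords": ["spy chase"],
    "cast": ["Actor Three"],
    "director": ["Director Z"],
})

# ignore_index=True gives one clean 0..n-1 index across both datasets.
movies = pd.concat([indian, international], ignore_index=True)
movies["combined"] = combine_features(movies)
print(movies[["title", "combined"]])
```

Resetting the index during concatenation keeps row positions in the dataframe aligned with row positions in any matrix built from the combined column later on.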

Advanced Processing:

Utilizes TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert movie features into numerical vectors.
Employs cosine similarity to measure the closeness between the user's favorite movie and other movies in the dataset.
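As a small illustration of these two steps, the sketch below vectorizes three made-up feature strings with scikit-learn's TfidfVectorizer and scores them against the first one using cosine_similarity. The strings and the favorite_index variable are hypothetical; the project's real inputs are the combined movie features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up combined feature strings, one per movie.
combined_features = [
    "action thriller spy chase",
    "action thriller heist",
    "romance musical family",
]

# Each row of the sparse matrix is the TF-IDF vector of one movie.
vectorizer = TfidfVectorizer()
feature_vectors = vectorizer.fit_transform(combined_features)

# Similarity of the favorite movie (row 0 here) to every movie, itself included.
favorite_index = 0
scores = cosine_similarity(feature_vectors[favorite_index], feature_vectors).ravel()
print(scores)
```

A movie that shares terms with the favorite gets a score between 0 and 1, while a movie with no terms in common scores 0. The favorite always scores 1 against itself, which is why it is usually dropped from the results.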

Recommendation Generation:

Finds the closest match to the user's input movie title.
Computes similarity scores and ranks movies based on these scores.
Presents the top 30 unique movie recommendations without duplicates, ensuring a variety of choices.
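One way the three steps above could fit together is sketched below. The write-up does not say how the closest title match is found, so using difflib.get_close_matches from the standard library is an assumption, as are the recommend function, the toy titles, and the hand-written similarity matrix. Leaving the matched movie out of its own results is also an illustrative choice.

```python
import difflib

import numpy as np


def recommend(query, titles, similarity, top_n=30):
    """Return up to top_n unique titles most similar to the closest match for query."""
    # Fuzzy-match the user's input against the known titles (assumed approach).
    matches = difflib.get_close_matches(query, titles, n=1)
    if not matches:
        return []
    matched_title = matches[0]
    index = titles.index(matched_title)

    # Indices of all movies, ordered from most to least similar.
    ranked = np.argsort(similarity[index])[::-1]

    seen = {matched_title}  # skip the input movie and any repeated titles
    results = []
    for i in ranked:
        title = titles[i]
        if title in seen:
            continue
        seen.add(title)
        results.append(title)
        if len(results) == top_n:
            break
    return results


# Toy data: "Space Quest 2" appears twice, as it might after merging two datasets.
titles = ["Space Quest", "Space Quest 2", "Love Story", "Space Quest 2"]
similarity = np.array([
    [1.0, 0.8, 0.1, 0.8],
    [0.8, 1.0, 0.2, 1.0],
    [0.1, 0.2, 1.0, 0.2],
    [0.8, 1.0, 0.2, 1.0],
])

print(recommend("Space Qest", titles, similarity, top_n=2))
```

Tracking seen titles in a set is what keeps a movie that appears in both datasets from taking up two of the 30 slots.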

Challenges I ran into

During the development of the Movie Recommender System, one significant challenge I encountered was related to indexing and slicing feature vectors after TF-IDF vectorization.

Issue:
After vectorizing the combined movie features using TF-IDF, I needed to split these feature vectors back into their respective datasets (Indian movies and other international movies). However, managing the indexing correctly between these datasets proved tricky, especially when computing cosine similarities later on.

Resolution:
To overcome this challenge:

Double-Checking Indices: I carefully double-checked the indices used to split the feature vectors. This involved ensuring that the slicing operation correctly aligned with the number of rows in each dataset after concatenation.

Debugging with Print Statements: I used print statements extensively to inspect the dimensions and contents of the feature vectors at various stages of processing. This helped me pinpoint where the indexing issues were arising.

Consulting Documentation and Forums: I referred to the documentation of libraries like scipy.sparse and pandas to clarify how slicing and indexing operations should be performed correctly with sparse matrices and dataframes.

Incremental Testing: I implemented incremental testing, starting from small subsets of the data to larger datasets. This approach allowed me to catch indexing errors early and gradually scale up the processing.

By carefully managing indices and thoroughly testing each step of the data processing pipeline, I successfully resolved the indexing issue and ensured that the Movie Recommender System could accurately compute cosine similarities and provide reliable movie recommendations based on user input. This experience highlighted the importance of meticulous data handling and systematic debugging techniques in developing robust data-driven applications.
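The index bookkeeping described in this section might look something like the sketch below. It is an illustration under assumptions rather than the project's code: the feature strings are made up, and to_global_index is a hypothetical helper. The assertions play the role of the print-statement checks mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up feature strings for the two datasets.
indian_features = ["drama family village", "action police city"]
international_features = ["action spy city", "romance paris", "drama courtroom"]

# Vectorize both datasets together so they share one vocabulary.
combined = indian_features + international_features
feature_vectors = TfidfVectorizer().fit_transform(combined)

# Split the sparse matrix back at the row where the second dataset starts.
n_indian = len(indian_features)
indian_vectors = feature_vectors[:n_indian]
international_vectors = feature_vectors[n_indian:]

# Sanity checks: row counts match the source data, columns match each other.
assert indian_vectors.shape[0] == len(indian_features)
assert international_vectors.shape[0] == len(international_features)
assert indian_vectors.shape[1] == international_vectors.shape[1]


def to_global_index(local_index, is_indian):
    """Map a row position within one dataset to its row in the combined matrix."""
    return local_index if is_indian else n_indian + local_index


# Rows are Indian movies, columns are international movies.
cross = cosine_similarity(indian_vectors, international_vectors)
print(cross.shape)
```

Fitting the vectorizer once on the concatenated data is what makes the two slices comparable. Fitting it separately on each dataset would produce matrices with different columns, and cosine similarity between them would be meaningless or fail outright.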

Technologies used

pandas

Python

Jupyter Notebook

Scikit-learn (including TfidfVectorizer and cosine_similarity)

SciPy (specifically for sparse matrix operations)
