Amazon Product Recommendation System

Matrix factorization-based recommender engine for Amazon Electronics, built from 7.8 million user ratings and leveraging collaborative filtering for personalized suggestions.

View Jupyter Notebook

Context

With millions of electronics products and user reviews, Amazon's default recommendation engine often surfaces the most popular items, overwhelming shoppers and missing nuanced personal preferences. The business need: help users discover the right products for them, while increasing conversion rates for sellers by providing smarter, data-driven recommendations.

Dataset: 7.8 million Amazon Electronics ratings (user_id, product_id, rating, timestamp)
Problem: High sparsity and rating skew; most users rate only a few items, and most ratings are 4.0 or 5.0
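
As a rough illustration of the sparsity and rating skew described above, the sketch below computes both from a ratings table. It assumes a pandas DataFrame with the user_id, product_id, and rating columns listed in the dataset description, and at most one rating per user-product pair. The function name sparsity_and_skew is hypothetical and not taken from the notebook.

```python
import pandas as pd


def sparsity_and_skew(ratings: pd.DataFrame) -> dict:
    """Summarize how sparse and how positively skewed a ratings table is.

    Assumes columns user_id, product_id, rating and at most one rating
    per (user_id, product_id) pair.
    """
    n_users = ratings["user_id"].nunique()
    n_products = ratings["product_id"].nunique()

    # Share of user-product cells that have no rating at all.
    sparsity = 1.0 - len(ratings) / (n_users * n_products)

    # Share of ratings that are 4.0 or higher (the "positive" ratings).
    positive_share = float((ratings["rating"] >= 4.0).mean())

    # Typical number of ratings a single user has given.
    median_ratings_per_user = float(ratings.groupby("user_id").size().median())

    return {
        "sparsity": sparsity,
        "positive_share": positive_share,
        "median_ratings_per_user": median_ratings_per_user,
    }
```

A sparsity close to 1.0 together with a low median rating count per user is the pattern that motivates filtering for active users and popular products before factorizing.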

Action

Data preprocessing: Kept only positive ratings (≥ 4.0) and only users and products with at least 50 interactions, to reduce noise and cold-start issues
Exploratory analysis: Visualized rating distribution and long-tail patterns with Seaborn/Matplotlib; analyzed user and item engagement
User-item matrix construction: Pivoted filtered data into a sparse matrix (828 active users × 38 popular products)
Collaborative filtering: Applied Singular Value Decomposition (SVD) for matrix factorization using SciPy and NumPy
Recommendation logic: For each user, ranked unrated products by predicted score to generate top-N personalized recommendations
Code example: Designed a reusable Python function to retrieve each user’s top product recommendations based on predicted preferences
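
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the function names (build_user_item_matrix, predict_scores, recommend_top_n), the number of latent factors, the order of filtering (users first, then products, in a single pass), and the choice to treat missing ratings as 0 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds


def build_user_item_matrix(ratings, min_rating=4.0, min_interactions=50):
    """Keep positive ratings, drop low-activity users and products, then pivot."""
    positive = ratings[ratings["rating"] >= min_rating]

    # Assumption: filter users first, then products, in one pass.
    user_counts = positive["user_id"].value_counts()
    active_users = user_counts[user_counts >= min_interactions].index
    positive = positive[positive["user_id"].isin(active_users)]

    product_counts = positive["product_id"].value_counts()
    popular_products = product_counts[product_counts >= min_interactions].index
    positive = positive[positive["product_id"].isin(popular_products)]

    # Rows are users, columns are products; missing ratings become 0.
    return positive.pivot_table(
        index="user_id", columns="product_id", values="rating", fill_value=0.0
    )


def predict_scores(user_item, n_factors=10):
    """Approximate the user-item matrix with a truncated SVD."""
    matrix = csr_matrix(user_item.to_numpy(dtype=float))
    # svds needs 1 <= k < min(matrix.shape), so the matrix must be at least 2 x 2.
    k = max(1, min(n_factors, min(matrix.shape) - 1))
    u, s, vt = svds(matrix, k=k)
    predicted = u @ np.diag(s) @ vt
    return pd.DataFrame(predicted, index=user_item.index, columns=user_item.columns)


def recommend_top_n(user_id, user_item, predicted, n=5):
    """Rank the products a user has not rated by predicted score."""
    unrated = user_item.loc[user_id] == 0
    return predicted.loc[user_id][unrated].sort_values(ascending=False).head(n)
```

With a ratings DataFrame loaded, a call such as recommend_top_n(some_user_id, user_item, predict_scores(user_item)) would return up to five unrated products for that user, ordered by predicted score. Filling missing ratings with 0 keeps the example short; mean-centering each user's ratings before factorizing is a common alternative.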

Result

Successfully generated individualized Top-5 recommendations for each engaged user
Improved recommendation relevance relative to popularity-driven suggestions; click-through and conversion gains are simulated, not measured in production
Provided sellers with actionable insights into user tastes and item popularity, informing marketing and inventory decisions
Delivered a scalable, interpretable codebase for future hybrid or deep learning enhancements

Learning

Learned the critical importance of data filtering—especially around activity thresholds and positive ratings—for both performance and interpretability in large-scale recommender systems
Gained hands-on experience with collaborative filtering and dimensionality reduction, and how SVD reveals hidden patterns in user-product interactions
Appreciated the need for hybrid models and web deployment (e.g., with Flask/Streamlit) to move from prototype to real-world impact
Deepened my skills in data wrangling (Pandas), visualization (Seaborn/Matplotlib), and designing production-ready recommendation pipelines

Jupyter Notebook

Explore the full notebook with code, EDA, and modeling below: