Amazon Product Recommendation System
Matrix factorization-based recommender engine for Amazon Electronics, built from 7.8 million user ratings and leveraging collaborative filtering for personalized suggestions.
Context
With millions of electronics products and user reviews, Amazon's default recommendation engine often surfaces the most popular items, overwhelming shoppers and missing nuanced personal preferences. The business need: help users discover the right products for them, while increasing conversion rates for sellers by providing smarter, data-driven recommendations.
- Dataset: 7.8 million Amazon Electronics ratings (user_id, product_id, rating, timestamp)
- Problem: High sparsity and rating skew; most users rate only a few items, and most ratings are 4.0 or 5.0
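The sparsity and skew described above can be quantified with a few Pandas aggregations. The following is a minimal sketch, not the notebook's actual code: the function name `rating_summary` and the toy rows are hypothetical, and only the column names (user_id, product_id, rating, timestamp) come from the dataset description.

```python
import pandas as pd


def rating_summary(ratings: pd.DataFrame) -> dict:
    """Summarize sparsity and rating skew (hypothetical helper).

    Assumes each (user_id, product_id) pair appears at most once.
    """
    n_users = ratings["user_id"].nunique()
    n_items = ratings["product_id"].nunique()
    return {
        "n_users": n_users,
        "n_items": n_items,
        # Fraction of the user x product grid that actually holds a rating
        "density": len(ratings) / (n_users * n_items),
        # Share of ratings that are 4.0 or higher
        "share_positive": (ratings["rating"] >= 4.0).mean(),
        # Typical number of ratings per user
        "median_ratings_per_user": ratings.groupby("user_id").size().median(),
    }


# Toy rows for illustration only, not taken from the Amazon data
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "product_id": ["p1", "p2", "p1", "p2", "p3", "p4"],
    "rating": [5.0, 4.0, 5.0, 2.0, 4.0, 5.0],
    "timestamp": [1, 2, 3, 4, 5, 6],
})
summary = rating_summary(toy)
print(summary)
```

On the full dataset, a density close to zero and a high share of positive ratings would correspond to the sparsity and skew noted above.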
Action
- Data preprocessing: Kept only positive ratings (≥ 4.0) and only users and products with at least 50 interactions, reducing noise and cold-start issues
- Exploratory analysis: Visualized rating distribution and long-tail patterns with Seaborn/Matplotlib; analyzed user and item engagement
- User-item matrix construction: Pivoted filtered data into a sparse matrix (828 active users × 38 popular products)
- Collaborative filtering: Applied Singular Value Decomposition (SVD) for matrix factorization using SciPy and NumPy
- Recommendation logic: For each user, ranked unrated products by predicted score to generate top-N personalized recommendations
- Code example: Designed a reusable Python function to retrieve each user’s top product recommendations based on predicted preferences
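The steps above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the notebook's implementation: all function names are hypothetical, the filter is applied in a single pass (positive ratings first, then activity counts), missing ratings are filled with 0, and the number of latent factors `k` is an arbitrary choice. `scipy.sparse.linalg.svds` is used for the truncated SVD; the toy data uses a threshold of 2 interactions instead of 50 so it stays small.

```python
import numpy as np
import pandas as pd
from scipy.sparse.linalg import svds


def filter_ratings(ratings, min_rating=4.0, min_interactions=50):
    """Keep positive ratings, then drop users/products below the activity threshold."""
    positive = ratings[ratings["rating"] >= min_rating]
    user_counts = positive["user_id"].value_counts()
    item_counts = positive["product_id"].value_counts()
    keep_users = user_counts[user_counts >= min_interactions].index
    keep_items = item_counts[item_counts >= min_interactions].index
    mask = positive["user_id"].isin(keep_users) & positive["product_id"].isin(keep_items)
    return positive[mask]


def build_matrix(filtered):
    """Pivot to a user x product matrix; unrated cells become 0 (an assumption)."""
    return filtered.pivot_table(
        index="user_id", columns="product_id", values="rating", aggfunc="mean"
    ).fillna(0.0)


def predict_scores(matrix, k=2):
    """Rank-k reconstruction via truncated SVD; k must be < min(matrix.shape)."""
    u, s, vt = svds(matrix.to_numpy(dtype=float), k=k)
    return pd.DataFrame(u @ np.diag(s) @ vt, index=matrix.index, columns=matrix.columns)


def recommend_top_n(user_id, matrix, predicted, n=5):
    """Return up to n products the user has not rated, ordered by predicted score."""
    unrated = matrix.loc[user_id] == 0
    scores = predicted.loc[user_id][unrated]
    return scores.sort_values(ascending=False).head(n).index.tolist()


# Toy data for illustration only, not taken from the Amazon dataset
rows = [
    ("u1", "p1", 5.0), ("u1", "p2", 4.0), ("u1", "p3", 5.0), ("u1", "p4", 2.0),
    ("u2", "p1", 4.0), ("u2", "p2", 5.0), ("u2", "p4", 4.0),
    ("u3", "p2", 5.0), ("u3", "p3", 4.0), ("u3", "p4", 5.0),
    ("u4", "p1", 5.0), ("u4", "p3", 4.0), ("u4", "p4", 5.0),
    ("u5", "p1", 5.0),  # inactive user, removed by the activity threshold
]
toy = pd.DataFrame(rows, columns=["user_id", "product_id", "rating"])

matrix = build_matrix(filter_ratings(toy, min_interactions=2))
predicted = predict_scores(matrix, k=2)
recs = recommend_top_n("u1", matrix, predicted, n=5)
print(recs)
```

Treating 0 as "unrated" is safe here only because the positive-rating filter removes every rating below 4.0, so no real rating can be 0. On the 828 × 38 matrix described above, `k` could be any value from 1 to 37.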
Result
- Successfully generated individualized Top-5 recommendations for each engaged user
- Substantially improved recommendation relevance, used here as a proxy for higher click-through and conversion rates
- Provided sellers with actionable insights into user tastes and item popularity, informing marketing and inventory decisions
- Delivered a scalable, interpretable codebase for future hybrid or deep learning enhancements
Learning
- Learned how much data filtering, especially activity thresholds and positive-rating cutoffs, matters for both performance and interpretability in large-scale recommender systems
- Gained hands-on experience with collaborative filtering and dimensionality reduction, and saw how SVD reveals hidden patterns in user-product interactions
- Appreciated the need for hybrid models and web deployment (e.g., with Flask/Streamlit) to move from prototype to real-world impact
- Deepened my skills in data wrangling (Pandas), visualization (Seaborn/Matplotlib), and designing production-ready recommendation pipelines
Jupyter Notebook
Explore the full notebook with code, EDA, and modeling below: