Fundamentals of the Recommendation System

Based on the [Advanced Machine Learning with TensorFlow on Google Cloud Platform](https://www.coursera.org/learn/recommendation-models-gcp) course, this article walks through the fundamental recommendation system models: content-based, collaborative filtering, and knowledge-based.


Content-Based Model

Using attributes of items to recommend new items to a user

How does it work?

Step 1. User-Item Rating Matrix

| 🙎‍♀️ / Movie | Rating |
|---|---|
| Minions | 7 |
| A Star is Born | 4 |
| Aladdin | 10 |

Let’s just assume that we are given the table above. In it, the user 🙎‍♀️ gave a 7-star rating to “Minions”, a 4-star rating to “A Star is Born”, and a 10-star rating to “Aladdin”; we have no rating information about any other movie.

Step 2. Item Feature Matrix

| Movie \ Genre | Fantasy | Action | Cartoon | Drama | Comedy |
|---|---|---|---|---|---|
| Minions | 0 | 0 | 1 | 0 | 1 |
| A Star is Born | 0 | 0 | 0 | 1 | 0 |
| Aladdin | 1 | 0 | 0 | 0 | 1 |

For the rated movies, we now build an item feature matrix of genres. We could also use themes, actors/directors, professional ratings, movie summary text, stills from the movie, or the movie trailer as features. In the table above, “Minions” belongs to the Cartoon and Comedy categories, “A Star is Born” is a Drama, and “Aladdin” is a Fantasy and Comedy movie.

Step 3. User Feature Vector

Then take the dot product of the user-item rating vector with the item feature matrix: weight each movie’s feature row by the user’s rating for it and sum the results column-wise, as below.

| Fantasy | Action | Cartoon | Drama | Comedy |
|---|---|---|---|---|
| 10 | 0 | 7 | 4 | 17 |

This row vector lives in the five-dimensional genre space we use to represent movies. Normalizing it (dividing each entry by the total, 10 + 0 + 7 + 4 + 17 = 38) gives the user feature vector.

| Fantasy | Action | Cartoon | Drama | Comedy |
|---|---|---|---|---|
| 0.26 | 0 | 0.18 | 0.11 | 0.45 |

This is the user feature vector. Note that the “0” for the Action genre does not mean the user dislikes Action; it only means that none of the movies they have rated so far has the Action feature.

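The arithmetic of Steps 1-3 fits in a few lines of NumPy. The sketch below just replays the numbers above; the variable names (`ratings`, `item_features`, `user_vector`) are illustrative placeholders rather than anything from the course.

```python
import numpy as np

genres = ["Fantasy", "Action", "Cartoon", "Drama", "Comedy"]

# Step 1: the user's ratings for the movies she has rated.
ratings = np.array([7, 4, 10])            # Minions, A Star is Born, Aladdin

# Step 2: item feature matrix (one row per rated movie, one column per genre).
item_features = np.array([
    [0, 0, 1, 0, 1],                      # Minions
    [0, 0, 0, 1, 0],                      # A Star is Born
    [1, 0, 0, 0, 1],                      # Aladdin
])

# Step 3: weight each movie's features by its rating and sum column-wise,
# i.e. the rating vector dotted with the item feature matrix.
weighted_sum = ratings @ item_features    # [10, 0, 7, 4, 17]

# Normalize so the entries sum to 1, giving the user feature vector.
user_vector = weighted_sum / weighted_sum.sum()
print(genres)
print(user_vector.round(2))               # [0.26 0.   0.18 0.11 0.45]
```
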
Step 4. User Rating Prediction

So how do we use these pieces to build the content-based recommendation? Now we have a new item feature matrix, shown below. The movies in it could be either seen or unseen by the user (in practice we can simply predict ratings for all of them and drop the already-seen movies at the final recommendation step). Here, however, we only consider four unseen movies.

| Movie \ Genre | Fantasy | Action | Cartoon | Drama | Comedy |
|---|---|---|---|---|---|
| Harry Potter | 1 | 1 | 0 | 0 | 1 |
| The Dark Knight Rises | 1 | 1 | 0 | 1 | 0 |
| Incredible | 0 | 1 | 1 | 0 | 1 |
| Memento | 0 | 0 | 0 | 1 | 0 |

Then multiply the user feature vector component-wise with each new movie’s feature vector and sum row-wise, i.e. take the dot product. This gives the dot-product similarity between the user and each of the four movies.

| Harry Potter | The Dark Knight Rises | Incredible | Memento |
|---|---|---|---|
| 0.71 | 0.37 | 0.63 | 0.11 |

These are the predicted ratings for each movie: the higher the score, the more likely (hopefully) the user is to enjoy the movie. So we should recommend “Harry Potter” and “Incredible” to this user, but not the other two. Simple, isn’t it?

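Step 4 is then a single matrix-vector product. Continuing the NumPy sketch (again, `unseen_features` and `scores` are made-up names used only for illustration):

```python
import numpy as np

# User feature vector from Step 3 (Fantasy, Action, Cartoon, Drama, Comedy).
user_vector = np.array([0.26, 0.0, 0.18, 0.11, 0.45])

unseen = ["Harry Potter", "The Dark Knight Rises", "Incredible", "Memento"]
unseen_features = np.array([
    [1, 1, 0, 0, 1],                      # Harry Potter
    [1, 1, 0, 1, 0],                      # The Dark Knight Rises
    [0, 1, 1, 0, 1],                      # Incredible
    [0, 0, 0, 1, 0],                      # Memento
])

# Step 4: component-wise multiply and sum row-wise == one dot product per movie.
scores = unseen_features @ user_vector    # [0.71, 0.37, 0.63, 0.11]

# Rank the unseen movies by predicted score, highest first.
for title, score in sorted(zip(unseen, scores), key=lambda x: -x[1]):
    print(f"{title}: {score:.2f}")
```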

Collaborative-Filtering Model

Using similarities between users and items simultaneously, learned from past user-item interactions, to recommend new items to a user

Matrix Factorization

Assume we have a user-item interaction matrix for multiple users, indicating whether or not each user has watched each movie.

| User \ Movie | Harry Potter | Incredible | Shrek | Dark Knight Rises | Memento |
|---|---|---|---|---|---|
| 🙎‍♀️ | O | | O | O | |
| 🙎 | | O | | | O |
| 🙎🏾‍♂️ | O | O | O | | |
| 🙍🏼‍♀️ | | | | O | O |

As more users and movies are collected, the user-item matrix grows larger and sparser, so it normally needs to be shrunk down to a more tractable size through matrix factorization. The factorization splits the matrix into row factors and column factors that are essentially user and item embeddings. Let A be the whole user-item interaction matrix; it is decomposed into U (user embeddings) and V (item embeddings), both much smaller:

\(A \approx U \times V^T\)

Each user and each item becomes a k-dimensional point in a shared embedding space. The embeddings can be learned from the data (for example via SVD, or related compression techniques such as PCA), keeping only the generalities the data supports, called latent factors. This also saves space: the factors need k × (#users + #items) numbers instead of #users × #items for the full matrix, a saving whenever k is smaller than half the harmonic mean of the number of users and items. The weak spot of collaborative filtering is the cold-start problem: we have no interaction data at all for brand-new users or items. In that case, we can fall back on averages over the existing users/items until enough interactions have been collected.

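To make the \(A \approx U \times V^T\) idea concrete, here is a minimal sketch that factorizes the small watch matrix above with a truncated SVD, one of the techniques mentioned for learning embeddings. Production collaborative filtering typically learns the factors differently (for example with alternating least squares on the sparse matrix), so treat this purely as an illustration; all names are placeholders.

```python
import numpy as np

# Binary watch matrix from the table above (1 = watched, 0 = not watched).
A = np.array([
    [1, 0, 1, 1, 0],                      # 🙎‍♀️
    [0, 1, 0, 0, 1],                      # 🙎
    [1, 1, 1, 0, 0],                      # 🙎🏾‍♂️
    [0, 0, 0, 1, 1],                      # 🙍🏼‍♀️
], dtype=float)

k = 2                                     # number of latent factors

# Truncated SVD: keep only the top-k singular values/vectors.
u, s, vt = np.linalg.svd(A, full_matrices=False)
U = u[:, :k] * s[:k]                      # user embeddings, shape (4, k)
V = vt[:k, :].T                           # item embeddings, shape (5, k)

# A is approximated by the product of the two much smaller factors.
A_hat = U @ V.T
print(np.round(A_hat, 2))
```

With k = 2, the factors store 2 × (4 + 5) = 18 numbers versus 4 × 5 = 20 for the full matrix, a modest saving here that grows quickly at realistic scale.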

Summary

pros and cons

| Method | Content-Based | Collaborative Filtering | Knowledge-Based |
|---|---|---|---|
| pros | - No need for data about other users<br>- Can recommend niche items | - No domain knowledge<br>- Serendipity<br>- Great starting point | - No interaction data needed<br>- Usually high-fidelity data from user self-reporting |
| cons | - Need domain knowledge<br>- Only safe recommendations | - Cold start<br>- Sparsity<br>- No context features | - Need user data<br>- Need to be careful with privacy concerns |

structured/unstructured information that can be used

| Type | Content-Based | Collaborative Filtering | Knowledge-Based |
|---|---|---|---|
| structured | - Genres<br>- Themes<br>- Actors/directors involved<br>- Professional ratings | - User ratings<br>- User views<br>- User wishlist/cart history<br>- User purchase/return history | - Demographic information<br>- Location/country/language<br>- Genre preferences<br>- Global filters |
| unstructured | - Movie summary text<br>- Stills from the movie<br>- Movie trailer<br>- Professional reviews | - User reviews<br>- User-answered questions<br>- User-submitted photos<br>- User-submitted videos | - User ‘about me’ snippets |