
Collaborative Filtering Deep Dive: Matrix Factorization, BPR, and NCF

From user-item matrix to learned embeddings: matrix factorization geometry, BPR (Bayesian Personalized Ranking) for implicit feedback where you have clicks rather than ratings, Neural Collaborative Filtering, and when to use each.

The Core Insight: Users Who Agreed Before Will Agree Again

Collaborative filtering doesn't need to understand what an item is. It only needs to know who interacted with what. If user A and user B both bought the same 10 books, and user A bought an 11th book, collaborative filtering predicts user B will like the 11th book too. The intelligence is entirely in the co-occurrence structure of the interaction matrix, not in any understanding of content.
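To make the co-occurrence idea concrete, here is a minimal user-based sketch in plain Python. It is an illustration, not how production CF is built. For each item the target user lacks, it adds up how many items the target shares with each user who has that item. The function `recommend_by_overlap` and the toy `histories` data are hypothetical and mirror the book example above.

```python
from collections import Counter

def recommend_by_overlap(target, histories, n=1):
    """User-based CF from raw co-occurrence: score each unseen item by the
    overlap between the target user and the users who interacted with it."""
    seen = histories[target]
    scores = Counter()
    for other, items in histories.items():
        if other == target:
            continue
        overlap = len(seen & items)  # number of co-interacted items
        if overlap == 0:
            continue  # no shared history, no signal
        for item in items - seen:  # items the neighbor has that the target lacks
            scores[item] += overlap
    return [item for item, _ in scores.most_common(n)]

shared = {f"book_{k}" for k in range(1, 11)}  # the 10 books A and B both bought
histories = {
    "A": shared | {"book_11"},
    "B": set(shared),
    "C": {"book_99"},  # no overlap with B, so C contributes nothing
}
print(recommend_by_overlap("B", histories))  # ['book_11']
```

Note that no item content appears anywhere: the recommendation comes purely from who bought what.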

This is simultaneously collaborative filtering's greatest strength and its greatest weakness. Strength: it captures latent taste signals that content-based approaches miss entirely — the vibe of a movie, the aesthetic of a product, the feel of a restaurant that no metadata captures. Weakness: it requires interaction data. New users have no co-occurrence signal. New items have no co-occurrence signal. The cold start problem is structural, not incidental.

Matrix Factorization: The Classic Approach

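The user-item interaction matrix R has shape (n_users × n_items). Most entries are missing, because each user has interacted with only a small fraction of the items. Matrix factorization approximates R as the product of two low-rank matrices: R ≈ U × V^T, where U has shape (n_users × k) and V has shape (n_items × k). k is the latent dimension (typically 32-256). Each row of U is a user embedding and each row of V is an item embedding. The predicted rating for user i and item j is the dot product of their embeddings, u_i · v_j. The implementation below also adds a global bias and per-user and per-item bias terms, which absorb effects such as a user who rates everything generously or an item that is rated highly across the board.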

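import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, n_users, n_items, k=64):
        super().__init__()
        # Latent factors: one k-dimensional vector per user and per item
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)
        # Scalar biases: per-user, per-item, and one global offset
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)
        self.global_bias = nn.Parameter(torch.zeros(1))
        
        # Initialize factors with small values and biases at zero
        nn.init.normal_(self.user_emb.weight, std=0.01)
        nn.init.normal_(self.item_emb.weight, std=0.01)
        nn.init.zeros_(self.user_bias.weight)
        nn.init.zeros_(self.item_bias.weight)
    
    def forward(self, user_ids, item_ids):
        u = self.user_emb(user_ids)  # (batch, k)
        v = self.item_emb(item_ids)  # (batch, k)
        dot = (u * v).sum(dim=1)     # row-wise dot product u_i · v_j
        # squeeze(-1) keeps the batch dimension even when the batch size is 1
        bias = (self.user_bias(user_ids).squeeze(-1)
                + self.item_bias(item_ids).squeeze(-1)
                + self.global_bias)
        return dot + bias

# Training: minimize squared error on observed ratings plus L2 regularization
# Loss = sum over observed (i, j) of (r_ij - r̂_ij)^2 + λ(||U||^2 + ||V||^2),
# where r̂_ij = u_i · v_j + user_bias[i] + item_bias[j] + global_bias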
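The regularized squared-error loss described above can be sketched as a full-batch training loop. This is a minimal sketch under several assumptions: a toy set of (user, item, rating) triples, Adam instead of the SGD or ALS solvers often used for matrix factorization, a global bias fixed to the mean observed rating rather than learned, and illustrative hyperparameters. `train_mf` is a hypothetical helper that rebuilds the embeddings inline so the block runs on its own.

```python
import torch
import torch.nn as nn

def train_mf(users, items, ratings, n_users, n_items, k=8, lam=1e-3, lr=0.05, epochs=300):
    """Fit biased MF on observed (user, item, rating) triples; return a predictor and the train MSE."""
    torch.manual_seed(0)  # reproducible initialization
    user_emb, item_emb = nn.Embedding(n_users, k), nn.Embedding(n_items, k)
    user_bias, item_bias = nn.Embedding(n_users, 1), nn.Embedding(n_items, 1)
    nn.init.normal_(user_emb.weight, std=0.1)
    nn.init.normal_(item_emb.weight, std=0.1)
    nn.init.zeros_(user_bias.weight)
    nn.init.zeros_(item_bias.weight)
    mu = ratings.mean()  # global bias, fixed to the mean observed rating in this sketch

    params = [m.weight for m in (user_emb, item_emb, user_bias, item_bias)]
    opt = torch.optim.Adam(params, lr=lr)

    def predict(u, i):
        dot = (user_emb(u) * item_emb(i)).sum(dim=1)
        return dot + user_bias(u).squeeze(-1) + item_bias(i).squeeze(-1) + mu

    for _ in range(epochs):
        opt.zero_grad()
        sq_err = ((ratings - predict(users, items)) ** 2).sum()  # observed entries only
        l2 = lam * (user_emb.weight.pow(2).sum() + item_emb.weight.pow(2).sum())
        (sq_err + l2).backward()
        opt.step()

    with torch.no_grad():
        mse = ((ratings - predict(users, items)) ** 2).mean().item()
    return predict, mse

# Toy explicit-feedback data: 4 users, 5 items, 1-5 star ratings
users = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
items = torch.tensor([0, 1, 2, 0, 1, 3, 2, 3, 1, 3, 4])
ratings = torch.tensor([5., 4., 1., 5., 4., 1., 2., 5., 4., 2., 5.])

predict, train_mse = train_mf(users, items, ratings, n_users=4, n_items=5)
print(f"train MSE: {train_mse:.4f}")
with torch.no_grad():
    # Fill in a missing cell: user 2 never rated item 0
    print(predict(torch.tensor([2]), torch.tensor([0])))
```

Only the observed cells contribute to the loss; the missing cells are exactly what `predict` fills in afterward. Real datasets need mini-batches sampled from the observed entries, since full-batch updates only work at toy scale.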

Implicit vs Explicit Feedback

Explicit feedback is ratings: a user explicitly rates an item 4 out of 5. Clean signal, but rare — most users don't rate things. Implicit feedback is behavioral: clicks, views, purchases, time spent, scroll depth. Abundant, but noisy. A click doesn't mean the user liked the item. Not clicking doesn't mean they wouldn't like it — they may never have seen it.

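BPR (Bayesian Personalized Ranking) handles implicit feedback by learning that a user prefers items they interacted with over items they did not. For each observed interaction (user u, item i), sample a negative item j that u hasn't interacted with, then maximize P(score(u,i) > score(u,j)), modeled as σ(score(u,i) − score(u,j)) where σ is the logistic sigmoid. In practice this means minimizing −ln σ(score(u,i) − score(u,j)) plus L2 regularization. This pairwise objective suits implicit data better than pointwise regression because it learns a preference ordering, not absolute ratings: an unclicked item is treated only as less preferred than a clicked one, not as a confirmed dislike.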
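A minimal BPR sketch in PyTorch, assuming a toy click log, uniform negative sampling with rejection of observed items, a plain dot-product scorer, and illustrative hyperparameters. For each observed (u, i) it draws a fresh negative j every epoch and minimizes −ln σ(x_ui − x_uj) plus an L2 penalty on the embeddings involved. The helpers `sample_negatives` and `train_bpr` are hypothetical names for this example.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_negatives(users, n_items, user_pos, rng):
    """Uniformly draw, for each user in the batch, one item that user has NOT interacted with.
    Assumes every user has at least one unobserved item (otherwise this never terminates)."""
    neg = []
    for u in users.tolist():
        j = rng.randrange(n_items)
        while j in user_pos[u]:  # rejection sampling: retry observed items
            j = rng.randrange(n_items)
        neg.append(j)
    return torch.tensor(neg)

def train_bpr(pairs, n_users, n_items, k=8, lam=1e-4, lr=0.05, epochs=300, seed=0):
    """Learn user/item embeddings from implicit (user, item) pairs with the BPR loss."""
    torch.manual_seed(seed)
    rng = random.Random(seed)
    user_pos = {u: set() for u in range(n_users)}
    for u, i in pairs:
        user_pos[u].add(i)
    users = torch.tensor([u for u, _ in pairs])
    pos = torch.tensor([i for _, i in pairs])

    user_emb, item_emb = nn.Embedding(n_users, k), nn.Embedding(n_items, k)
    nn.init.normal_(user_emb.weight, std=0.1)
    nn.init.normal_(item_emb.weight, std=0.1)
    opt = torch.optim.Adam([user_emb.weight, item_emb.weight], lr=lr)

    for _ in range(epochs):
        neg = sample_negatives(users, n_items, user_pos, rng)  # fresh negatives every epoch
        p, q_i, q_j = user_emb(users), item_emb(pos), item_emb(neg)
        x_ui = (p * q_i).sum(dim=1)  # score of the observed item
        x_uj = (p * q_j).sum(dim=1)  # score of the sampled negative
        # -ln sigmoid(x_ui - x_uj) is near 0 when the observed item clearly outranks the negative
        loss = -F.logsigmoid(x_ui - x_uj).mean()
        loss = loss + lam * (p.pow(2).sum() + q_i.pow(2).sum() + q_j.pow(2).sum()) / len(pairs)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return user_emb.weight.detach(), item_emb.weight.detach()

# Toy clicks: users 0-2 click items 0-2, users 3-5 click items 3-5, items 6-7 get no clicks
pairs = [(u, i) for u in range(3) for i in range(3)] + [(u, i) for u in range(3, 6) for i in range(3, 6)]
U, V = train_bpr(pairs, n_users=6, n_items=8)
scores = U @ V.T  # (n_users, n_items): higher means ranked earlier
print(scores[0].argsort(descending=True)[:3])  # top-3 items for user 0
```

The scores carry no rating scale; only their order per user matters, which is exactly what the pairwise loss optimizes.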

Neural Collaborative Filtering

Standard matrix factorization uses the dot product as its interaction function. NCF replaces it with an MLP: instead of computing u_i · v_j, concatenate [u_i; v_j] and pass the result through several fully-connected layers. In principle, this lets the model learn non-linear user-item interactions that a dot product can't express. The trade-offs: inference is slower, training needs more data, and retrieval can't use approximate nearest-neighbor search such as FAISS, because the score is no longer an inner product between a user vector and a precomputed item vector. Scoring every candidate requires its own MLP forward pass. In practice, NCF is used to rank a shortlist of candidates, not for retrieval over the full catalog.
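A minimal NCF-style sketch in PyTorch. It concatenates the user and item embeddings and runs them through an MLP tower. The tower sizes, ReLU activations, and single-logit output (intended for a binary cross-entropy loss on clicked vs. sampled-negative pairs) are assumptions for illustration. `score_all_items` is a hypothetical helper that shows the retrieval cost: ranking a catalog of N items for one user takes N MLP evaluations, because there is no fixed item matrix to hand to a maximum-inner-product index.

```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    """MLP-based collaborative filtering: score = MLP([u_i; v_j]).
    Layer sizes are illustrative assumptions."""

    def __init__(self, n_users, n_items, k=32, hidden=(64, 32, 16)):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, k)
        self.item_emb = nn.Embedding(n_items, k)
        layers, in_dim = [], 2 * k  # input is the concatenation [u_i; v_j]
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # one logit per (user, item) pair
        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)  # logits; e.g. pair with nn.BCEWithLogitsLoss

@torch.no_grad()
def score_all_items(model, user_id, n_items):
    """Brute-force scoring: one MLP evaluation per candidate item."""
    items = torch.arange(n_items)
    users = torch.full((n_items,), user_id, dtype=torch.long)
    return model(users, items)

model = NCF(n_users=100, n_items=500)
scores = score_all_items(model, user_id=7, n_items=500)
print(scores.shape)  # torch.Size([500])
top10 = scores.topk(10).indices  # exact top-k, computed by scoring every item
```

With a few hundred candidates from an upstream retriever, this per-pair cost is affordable, which is why the MLP scorer fits the ranking stage better than retrieval.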

When Collaborative Filtering Fails

Cold start: new users and new items have no interaction history. With no co-occurrence signal, their embeddings receive no training updates, so any recommendations for them are effectively random. Fix: fall back to content-based or popularity-based recommendations.
Popularity bias: popular items dominate the training signal. The model learns to recommend popular items to everyone, suppressing long-tail recommendations that might be exactly what a specific user wants.
Filter bubbles: if the model only recommends things similar to what the user has already seen, the user gets trapped in a feedback loop. Diversity-promoting objectives or exploration mechanisms (bandits) help break the loop.
Sparsity: when most users have interacted with very few items, the co-occurrence signal is too sparse to learn meaningful embeddings. Minimum interaction thresholds (e.g., only train on users with ≥5 interactions) help but reduce coverage.
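The sparsity and cold-start fixes above often work together: filtering the training set by a minimum interaction count leaves some users outside the CF model, and those users need a fallback. Below is a minimal sketch of that routing in plain Python, assuming a popularity fallback (the content-based option is omitted) and a single-pass user filter with the threshold of 5 from the example above. `make_recommender`, `popular_items`, and `cf_stub` are hypothetical names; `cf_stub` stands in for a trained model's top-n lookup.

```python
from collections import Counter

MIN_INTERACTIONS = 5  # the example threshold from the text

def filter_min_interactions(interactions, min_count=MIN_INTERACTIONS):
    """Keep only rows whose user has at least `min_count` interactions (single pass)."""
    per_user = Counter(u for u, _ in interactions)
    return [(u, i) for u, i in interactions if per_user[u] >= min_count]

def popular_items(interactions, n, exclude=frozenset()):
    """Globally most-interacted items, skipping ones the user has already seen."""
    counts = Counter(i for _, i in interactions)
    return [item for item, _ in counts.most_common() if item not in exclude][:n]

def make_recommender(interactions, cf_recommend):
    """Route users kept for training to the CF model; everyone else gets the popularity fallback."""
    warm_users = {u for u, _ in filter_min_interactions(interactions)}
    seen = {}
    for u, i in interactions:
        seen.setdefault(u, set()).add(i)

    def recommend(user, n=3):
        if user in warm_users:
            return cf_recommend(user, n)
        return popular_items(interactions, n, exclude=seen.get(user, set()))
    return recommend

def cf_stub(user, n):
    """Hypothetical stand-in for a trained CF model's top-n lookup."""
    return [f"cf_pick_{k}" for k in range(n)]

# Toy log: alice is warm (6 interactions); bob and carol fall below the threshold
interactions = (
    [("alice", f"item{k}") for k in range(6)]
    + [("bob", "item0"), ("bob", "item1"), ("carol", "item0"), ("carol", "item2")]
)
recommend = make_recommender(interactions, cf_stub)
print(recommend("alice"))  # ['cf_pick_0', 'cf_pick_1', 'cf_pick_2']
print(recommend("bob"))    # ['item2', 'item3', 'item4']
print(recommend("dave"))   # ['item0', 'item1', 'item2']
```

The coverage cost is visible here: bob and carol have real history but still get generic popularity picks, because the threshold kept them out of training.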

Interview trap: 'collaborative filtering doesn't need content features so it's simpler to build.' Wrong. Handling cold start, popularity bias, and data sparsity makes CF systems substantially more complex to operate than content-based systems. The interaction matrix is also expensive to store and update at scale.
