Candidate Generation vs Ranking: The Two-Stage Recommender Pipeline

Why almost every large-scale production recommender is two-stage: recall-optimized candidate generation (millions → thousands), then precision-optimized ranking (thousands → tens). Feature engineering for ranking, multi-objective loss (relevance + diversity + freshness), and the latency math.

Why One Model Can't Do Both Jobs

The recommender system at Swiggy needs to pick 20 restaurants from 200,000. The ideal model would score every restaurant with perfect accuracy and return the top 20. But suppose a model accurate enough to do this takes 50 ms per restaurant: scored one at a time, 200,000 × 50 ms = 10,000 seconds per request, or nearly three hours. That's not a production system; that's a science project.

The solution is a cascade: a fast, approximate first stage (candidate generation) that retrieves 500-2000 candidates, followed by a slow, accurate second stage (ranking) that scores only those candidates. The first stage optimizes for recall — get all the good items in the candidate set, even if it also retrieves some bad ones. The second stage optimizes for precision — within the candidate set, put the best items at the top.
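
The latency arithmetic can be checked with a few lines of Python. This is a back-of-envelope sketch: the catalog size and the 50 ms per-item cost come from the example above, while the 200 ms request budget, the 10 ms retrieval cost, and the 1,000-candidate count are illustrative assumptions. It also shows that shrinking the candidate set is not enough on its own: 1,000 candidates at 50 ms each, scored one after another, would still take 50 seconds, so the ranker has to be batched or much cheaper per item.

```python
# Back-of-envelope latency math. Only CATALOG_SIZE and HEAVY_MODEL_MS_PER_ITEM
# come from the article's example; every other number is an assumption.
CATALOG_SIZE = 200_000
HEAVY_MODEL_MS_PER_ITEM = 50.0


def sequential_latency_s(n_items: int, ms_per_item: float) -> float:
    """Total latency in seconds when items are scored one after another."""
    return n_items * ms_per_item / 1000.0


def per_item_budget_ms(total_budget_ms: float, retrieval_ms: float, n_candidates: int) -> float:
    """Ranking time available per candidate after retrieval has spent its share."""
    return (total_budget_ms - retrieval_ms) / n_candidates


brute_force_s = sequential_latency_s(CATALOG_SIZE, HEAVY_MODEL_MS_PER_ITEM)
cascade_naive_s = sequential_latency_s(1_000, HEAVY_MODEL_MS_PER_ITEM)
budget_ms = per_item_budget_ms(total_budget_ms=200.0, retrieval_ms=10.0, n_candidates=1_000)

print(f"brute force over the catalog: {brute_force_s:,.0f} s")
print(f"1,000 candidates scored sequentially: {cascade_naive_s:,.0f} s")
print(f"per-candidate ranking budget: {budget_ms:.2f} ms")
```

Under these assumed numbers the ranker gets about 0.19 ms of amortized time per candidate, which is why ranking models score candidates in batches rather than one call per item.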

Candidate Generation: Fast + High Recall

Multiple retrieval signals feed candidate generation in parallel. Each signal retrieves candidates independently, then results are merged and deduplicated. Typical signals at a food delivery company:

Two-tower retrieval: user embedding → ANN search over restaurant embeddings. High-quality personalized recall. ~1-5ms.
Collaborative filtering: users similar to you liked these restaurants. Adds diversity beyond personal history.
Content-based: restaurants matching the user's stated cuisine preferences or recent search queries.
Popularity/trending: top restaurants in the user's delivery zone right now. Cold start fallback.
Re-engagement: restaurants the user has ordered from before but not recently. High conversion, easy personalization.
Contextual: it's Sunday morning → breakfast places get boosted. Rainy day → comfort food. Temporal signals.
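
As a rough illustration of the two-tower signal, the sketch below retrieves the top-k restaurants by dot product between a user embedding and restaurant embeddings. It uses exact search over a tiny in-memory dictionary as a stand-in for an ANN index; the embeddings, restaurant ids, and the `retrieve_top_k` helper are all hypothetical.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_vec, item_vecs, k):
    """Exact top-k item ids by dot product (a production system would query an ANN index)."""
    return heapq.nlargest(k, item_vecs, key=lambda rid: dot(user_vec, item_vecs[rid]))


# Hypothetical 3-dimensional embeddings; real towers produce far larger vectors.
restaurant_embeddings = {
    "r_biryani": [0.9, 0.1, 0.0],
    "r_dosa": [0.1, 0.9, 0.1],
    "r_pizza": [0.2, 0.1, 0.9],
    "r_kebab": [0.8, 0.0, 0.3],
}
user_embedding = [1.0, 0.2, 0.1]

top2 = retrieve_top_k(user_embedding, restaurant_embeddings, k=2)
print(top2)
```

Exact search is linear in catalog size, which is the cost an ANN index avoids by trading a little recall for speed.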

Merging strategies: union of all candidates (deduplicated), or weighted union where signals with better historical precision get more candidate slots. Typically 500-2000 candidates reach the ranking stage.
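
A weighted union might look like the sketch below. The signal names, the integer slot weights, and the `merge_candidates` helper are hypothetical; the only ideas taken from the text are deduplication and giving better-performing signals more slots.

```python
def merge_candidates(signal_results, slot_weights, max_candidates):
    """Weighted union: each signal gets a share of slots, duplicates are dropped.

    signal_results: dict of signal name -> list of item ids, best first.
    slot_weights:   dict of signal name -> integer weight (hypothetical values).
    """
    total = sum(slot_weights.values())
    merged, seen = [], set()
    # First pass: give each signal its quota, highest-weighted signal first.
    for name in sorted(signal_results, key=lambda n: -slot_weights[n]):
        quota = max_candidates * slot_weights[name] // total
        taken = 0
        for item in signal_results[name]:
            if taken >= quota or len(merged) >= max_candidates:
                break
            if item not in seen:
                seen.add(item)
                merged.append(item)
                taken += 1
    # Second pass: fill slots left empty by rounding or deduplication.
    for name in signal_results:
        for item in signal_results[name]:
            if len(merged) >= max_candidates:
                return merged
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


signals = {
    "two_tower": ["a", "b", "c", "d"],
    "popular": ["b", "e", "f"],
    "reengage": ["g", "a"],
}
weights = {"two_tower": 5, "popular": 3, "reengage": 2}
merged = merge_candidates(signals, weights, max_candidates=5)
print(merged)
```

Skipped duplicates do not count against a signal's quota here, so a signal that overlaps heavily with a stronger one still contributes new items.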

Ranking: Slow + High Precision

The ranking model scores each candidate given rich features that were too expensive to use during retrieval. Cross-features between user and item: has this specific user ordered from this specific restaurant before? Cross-features with context: is it lunchtime and is this restaurant typically ordered for lunch? Real-time features: what's the current delivery time estimate for this restaurant given current driver availability?

Ranking models are typically gradient-boosted trees (LightGBM, XGBoost) or DNNs with a wide-and-deep architecture. They take hundreds of features and produce a score per candidate. The feature set is where the real ML engineering work happens: feature engineering, point-in-time correct joins from the feature store, and ensuring no leakage from future data.
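
A point-in-time correct lookup means each training example only sees feature values that existed when the event happened. The sketch below shows the idea with a sorted in-memory history and binary search; the `feature_as_of` helper and the rating history are hypothetical, and a real feature store would expose its own API for this.

```python
from bisect import bisect_right


def feature_as_of(history, event_ts):
    """Return the latest feature value recorded at or before event_ts, else None.

    history: list of (timestamp, value) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical restaurant rating snapshots: (unix-style timestamp, avg rating).
rating_history = [(100, 4.1), (200, 4.3), (300, 3.9)]

print(feature_as_of(rating_history, 250))  # value known at t=250
print(feature_as_of(rating_history, 50))   # nothing recorded yet
```

Joining on the latest value instead (3.9 here) for an impression logged at t=250 would leak future information into training. Whether a value written at exactly the event timestamp counts as available is a design choice; a stricter variant would use `bisect_left`.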

# Ranking model feature categories
features = {
    # User features (from user profile store)
    "user_avg_order_value": float,
    "user_cuisine_preferences": list,  # top-5 historical cuisines
    "user_price_sensitivity": float,   # derived from historical choices
    
    # Item features (from restaurant catalog)  
    "restaurant_avg_rating": float,
    "restaurant_delivery_time_p50": float,
    "restaurant_order_volume_7d": int,
    
    # Cross features (user × item interactions)
    "user_ordered_from_restaurant_count": int,  # personal history
    "user_ordered_cuisine_match_score": float,  # cuisine preference alignment
    
    # Context features (real-time)
    "current_delivery_eta_minutes": float,  # live, from dispatch system
    "time_since_last_order_hours": float,
    "is_lunch_hour": bool,
    
    # Position/session features (for de-biasing)
    "candidate_position_in_list": int,  # for position bias correction
}
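
One common way to use a position feature, and an assumption here since the article does not spell out its method, is to train on the logged display position and then feed the same constant position for every candidate at serving time, so the score reflects the item rather than the slot. The `with_position` helper and `SERVING_POSITION` constant are hypothetical.

```python
SERVING_POSITION = 0  # hypothetical constant used for every candidate at serving time


def with_position(features, logged_position=None):
    """Copy a feature row and set the position feature for training or serving.

    Pass logged_position when building training rows from impression logs;
    omit it at serving time so every candidate gets the same position.
    """
    row = dict(features)
    row["candidate_position_in_list"] = (
        SERVING_POSITION if logged_position is None else logged_position
    )
    return row


base = {"restaurant_avg_rating": 4.3, "is_lunch_hour": True}
train_row = with_position(base, logged_position=7)
serve_row = with_position(base)
print(train_row["candidate_position_in_list"], serve_row["candidate_position_in_list"])
```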

The Objective: What Are You Actually Ranking For?

Click-through rate (CTR) is easy to measure but optimizing for CTR produces clickbait. A restaurant with a great photo gets clicks; a restaurant that reliably delivers good food gets repeat orders. Most mature recommenders optimize for a business metric — order completion rate, gross merchandise value, or long-term retention — not raw clicks. This requires delayed labels (order completed 45 minutes later) and careful attribution.
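
Delayed labels need an attribution rule that decides which impression gets credit for a later order. The sketch below uses a simple one: an impression is positive if the same user completed an order from the same restaurant within a fixed window afterwards. The two-hour window, the tuple layout, and the `label_impressions` helper are assumptions for illustration.

```python
ATTRIBUTION_WINDOW_S = 2 * 60 * 60  # hypothetical: credit orders completed within 2 hours


def label_impressions(impressions, completed_orders, window_s=ATTRIBUTION_WINDOW_S):
    """Label each impression 1 if the same user completed an order from the same
    restaurant within window_s seconds after the impression, else 0.

    impressions:      list of (user_id, restaurant_id, shown_ts).
    completed_orders: list of (user_id, restaurant_id, completed_ts).
    """
    completions = {}
    for user, restaurant, ts in completed_orders:
        completions.setdefault((user, restaurant), []).append(ts)
    labels = []
    for user, restaurant, shown_ts in impressions:
        hits = completions.get((user, restaurant), [])
        converted = any(0 <= done_ts - shown_ts <= window_s for done_ts in hits)
        labels.append(int(converted))
    return labels


impressions = [("u1", "r1", 1000), ("u1", "r2", 1000), ("u2", "r1", 5000)]
completed_orders = [("u1", "r1", 1000 + 45 * 60), ("u2", "r1", 4000)]
labels = label_impressions(impressions, completed_orders)
print(labels)
```

The third impression stays negative because the order completed before the restaurant was shown. Training data for recent impressions is only final once the window has elapsed, which is the practical cost of optimizing for completed orders instead of clicks.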

The most important question in any ranking interview: 'what metric are you optimizing and why?' If the answer is CTR, ask what happens to user satisfaction. If the answer is GMV, ask what happens to new restaurant discovery. Every objective has a shadow metric it sacrifices. Knowing this is the senior signal.

Re-ranking: Business Rules Layer

After ranking, a re-ranking layer applies business constraints the ML model shouldn't learn: sponsored restaurants get boosted by contracted position, restaurants currently out of delivery range are filtered, restaurants that would result in very long ETAs during peak periods get penalized, diversity constraints ensure at least 2 cuisine types in the top 10. Re-ranking is where the business talks to the model.
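
The four rules above can be sketched as a small post-processing function. Everything specific here is an assumption: the field names, the 60-minute ETA threshold, the 0.5 penalty factor, and the order in which rules are applied. In this sketch sponsored placement runs last, so it takes precedence over the diversity rule.

```python
from collections import Counter


def rerank(ranked, sponsored_slots, max_eta_min=60.0, eta_penalty=0.5,
           top_n=10, min_cuisines=2):
    """Apply business rules to model-scored candidates and return ordered ids.

    ranked: list of dicts with keys id, score, cuisine, in_range, eta_min.
    sponsored_slots: dict of restaurant id -> contracted 0-based position.
    """
    # 1. Hard filter: drop restaurants outside delivery range.
    pool = [dict(r) for r in ranked if r["in_range"]]
    # 2. Soft penalty: scale down scores of restaurants with very long ETAs.
    for r in pool:
        if r["eta_min"] > max_eta_min:
            r["score"] *= eta_penalty
    pool.sort(key=lambda r: r["score"], reverse=True)
    # 3. Diversity: while the top_n has too few cuisines, swap its lowest-ranked
    #    item from an over-represented cuisine for the best new-cuisine item below.
    top, rest = pool[:top_n], pool[top_n:]
    while len({r["cuisine"] for r in top}) < min_cuisines:
        counts = Counter(r["cuisine"] for r in top)
        newcomer = next((r for r in rest if r["cuisine"] not in counts), None)
        victim = next((r for r in reversed(top) if counts[r["cuisine"]] > 1), None)
        if newcomer is None or victim is None:
            break
        top.remove(victim)
        top.append(newcomer)
        rest.remove(newcomer)
        rest.insert(0, victim)
    result = top + rest
    # 4. Sponsored placement: move contracted restaurants to their positions.
    for rid, pos in sorted(sponsored_slots.items(), key=lambda kv: kv[1]):
        item = next((r for r in result if r["id"] == rid), None)
        if item is not None:
            result.remove(item)
            result.insert(pos, item)
    return [r["id"] for r in result]


ranked = [
    {"id": "a", "score": 0.9, "cuisine": "indian", "in_range": True, "eta_min": 30},
    {"id": "b", "score": 0.8, "cuisine": "indian", "in_range": False, "eta_min": 25},
    {"id": "c", "score": 0.7, "cuisine": "indian", "in_range": True, "eta_min": 90},
    {"id": "d", "score": 0.6, "cuisine": "indian", "in_range": True, "eta_min": 20},
    {"id": "e", "score": 0.5, "cuisine": "italian", "in_range": True, "eta_min": 35},
    {"id": "f", "score": 0.1, "cuisine": "indian", "in_range": True, "eta_min": 15},
]
final_order = rerank(ranked, sponsored_slots={"f": 2}, top_n=2)
print(final_order)
```

Keeping these rules outside the model means a contract change or a new diversity target is a configuration edit rather than a retraining job. A real system would also need an explicit policy for conflicts between rules, such as a sponsored slot that pushes the only second-cuisine item out of the top 10.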
