GenAI Systems Lab
AI Engineering 12 min read

The No-Spec System Design Round: A Framework for Ambiguous ML Problems

Most candidates fail ambiguous system design rounds by jumping straight to boxes. This article lays out a 4-phase framework: clarification (5 non-negotiable minutes), explicit assumption surfacing, one clear architecture argument with defended tradeoffs, and preemptive failure mode analysis. It closes with the ML design question that separates senior from staff.

The highest-signal system design questions have no clean spec. 'Design a recommendation system for 500M users.' 'Build a search ranking system from scratch.' 'Design an eval framework for our LLM product.' The interviewer is not withholding information to be difficult — they're watching how you handle the thing that's normal in their job: ambiguity.

Why Candidates Fail This Round

Most candidates start drawing architecture boxes immediately. This is the wrong move. You're designing a system for a problem you don't understand yet. The first 5 minutes should produce no boxes — only questions.

The failure mode: the candidate jumps to 'okay so we'd have a two-tower retrieval model, then a ranking model, then a reranker' before knowing the product type, the traffic pattern, the latency requirement, or what 'recommendation' means in this context. The interviewer marks the candidate down on the spot.

Phase 1: Clarification (5 minutes, non-negotiable)

Before any design work, surface the constraints that will define the architecture. Four categories:

Phase 2: State Your Assumptions Explicitly

After clarification, some things will still be unknown. Don't skip them — name them and make a bet.

// Good assumption surfacing:
"I'll assume:
- Implicit feedback only (clicks/purchases), no explicit ratings
- Latency budget: 150ms p99 end-to-end
- Cold start is a real problem: 20% of users are <1 week old
- We're optimizing for 7-day retention, not single-session CTR

If any of these are wrong, the architecture changes significantly.
Should I proceed on these or correct them?"

// What most candidates do:
"Okay so I'll design this system..."
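One way to practice this habit is to write each assumption down next to what would change if it turned out to be wrong. The sketch below is only an illustration: the `Assumption` class, the `render` and `what_changes` helpers, and every `if_wrong` consequence are hypothetical and are not part of the framework itself. The assumption statements are the ones from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    statement: str  # the bet, as stated to the interviewer
    if_wrong: str   # illustrative consequence (an assumption of this sketch)


ASSUMPTIONS = [
    Assumption("Implicit feedback only (clicks/purchases), no explicit ratings",
               "explicit ratings become an additional training signal"),
    Assumption("Latency budget: 150ms p99 end-to-end",
               "a tighter budget limits how heavy the ranking stage can be"),
    Assumption("Cold start is a real problem: 20% of users are <1 week old",
               "a dedicated cold-start path matters much less"),
    Assumption("We're optimizing for 7-day retention, not single-session CTR",
               "the training objective and primary experiment metric change"),
]


def render(assumptions: list[Assumption]) -> str:
    """Format the assumptions as the short script you would say out loud."""
    lines = ["I'll assume:"]
    lines += [f"- {a.statement}" for a in assumptions]
    lines.append("Should I proceed on these or correct them?")
    return "\n".join(lines)


def what_changes(assumptions: list[Assumption]) -> list[str]:
    """List, per assumption, what the design would have to revisit."""
    return [f"If not '{a.statement}': {a.if_wrong}" for a in assumptions]


print(render(ASSUMPTIONS))
```

Pairing each bet with its consequence makes the "architecture changes significantly" claim concrete, so the interviewer can see which correction matters most.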

Phase 3: One Clear Architecture Argument

Don't present options. Make a decision and defend it. Interviewers at high-TC (total compensation) companies want to see that you have taste: the ability to look at the constraints and arrive at a design, rather than present a menu.

Phase 4: Preempt the Follow-Ups

After presenting the design, proactively identify the two or three things most likely to fail in production.

The ML System Design Question That Separates Senior from Staff

'How do you evaluate whether the system is actually getting better?' This is the question most candidates answer wrong at senior level and right at staff level.

Senior answer: 'We'd track CTR, conversion, NDCG@10.'

Staff answer: 'Offline metrics are necessary but not sufficient: improving the metric is not the same as improving the user outcome. We'd run an A/B test with a proper power analysis before assuming any offline improvement is real. For the online experiment, the primary metric is 7-day retention, the guardrail metrics are session length and return rate, and the secondary metrics are CTR and conversion. We'd run for 2 weeks minimum so that novelty effects have time to fade. And we'd keep a 5% holdout group to measure the cumulative effect of model improvements over quarters.'
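To make the power analysis and the 5% holdout concrete, here is a minimal sketch under stated assumptions. It uses the standard normal-approximation sample size formula for comparing two proportions, and a salted hash for stable holdout assignment. The function names, the salt, the 40% baseline retention, and the 1-point minimum detectable lift are all hypothetical values chosen for illustration, not figures from a real system.

```python
import hashlib
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, min_detectable_lift: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm for a two-sided two-proportion z-test.

    Normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p_treat = p_control + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_detectable_lift ** 2)


def in_holdout(user_id: str, salt: str = "holdout-v1",
               fraction: float = 0.05) -> bool:
    """Deterministically place roughly `fraction` of users in a long-lived holdout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # map first 32 bits to [0, 1)
    return bucket < fraction


# Hypothetical inputs: 40% baseline 7-day retention, detect a 1-point absolute lift.
n = sample_size_per_arm(p_control=0.40, min_detectable_lift=0.01)
print(f"users per arm: {n}")
print(f"user-123 in holdout: {in_holdout('user-123')}")
```

Hashing the user ID with a fixed salt keeps each user in the same group across sessions and model launches, which is what lets the holdout measure cumulative effect. The sample size only covers statistical power; the 2-week minimum for novelty effects is a separate constraint on duration.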

Common Ambiguous Design Questions and the First Clarification to Ask
