A Framework for AI System Design Interviews (Staff+ Level)
6-axis characterisation, architecture shape selection, reliability budgets, and how to structure a 45-minute system design answer.
AI system design interviews differ from traditional system design interviews. You're not just designing a scalable service: you're designing a system with probabilistic components, uncertain output quality, and failure modes that don't show up in unit tests. Interviewers at staff+ level expect you to address this difference explicitly.
The 6-axis characterisation (do this first)
Before drawing any boxes, characterise the problem on 6 axes. This forces precision and signals experience:
| Axis | Questions to answer |
|---|---|
| Quality vs. speed | What's the latency SLA? Can we afford streaming? Does quality trump speed? |
| Scale | QPS, document count, context length, user count — order of magnitude |
| Data freshness | Does knowledge need to be real-time? Daily? How stale is acceptable? |
| Personalisation | Per-user context? Multi-tenant? Global shared context? |
| Failure tolerance | What's the blast radius of a wrong answer? Is hallucination an incident? |
| Regulatory/compliance | PII handling? Data residency? Audit trails? |
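One way to keep yourself honest about the six axes is to write them down as a typed record before sketching anything. The following is a minimal sketch, not a standard schema: the class name `Characterisation`, its field names, and the example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Freshness(Enum):
    """How stale the system's knowledge is allowed to be (illustrative levels)."""
    REAL_TIME = "real_time"
    DAILY = "daily"
    STATIC = "static"


@dataclass(frozen=True)
class Characterisation:
    """One or two fields per axis from the table; all names are hypothetical."""
    latency_budget_ms: int          # Quality vs. speed
    peak_qps: float                 # Scale
    document_count: int             # Scale
    freshness: Freshness            # Data freshness
    per_user_context: bool          # Personalisation
    wrong_answer_is_incident: bool  # Failure tolerance
    handles_pii: bool               # Regulatory/compliance

    def summary(self) -> str:
        """Render a one-line recap to say out loud before drawing boxes."""
        return (
            f"latency<={self.latency_budget_ms}ms, ~{self.peak_qps:g} QPS, "
            f"{self.document_count:,} docs, freshness={self.freshness.value}, "
            f"per_user={self.per_user_context}, "
            f"incident_on_wrong={self.wrong_answer_is_incident}, "
            f"pii={self.handles_pii}"
        )


# Made-up numbers for a hypothetical customer-support assistant.
support_bot = Characterisation(
    latency_budget_ms=2000,
    peak_qps=50,
    document_count=200_000,
    freshness=Freshness.DAILY,
    per_user_context=True,
    wrong_answer_is_incident=False,
    handles_pii=True,
)
print(support_bot.summary())
```

Filling in every field forces an order-of-magnitude answer for each axis, which is the point of doing the characterisation first.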
Choosing your architecture shape
Based on the 6-axis characterisation, you'll land on one of four shapes:
- Simple RAG: knowledge retrieval, low real-time requirements, single-hop questions
- Agentic RAG: complex queries, multi-hop reasoning, tool use needed
- Fine-tuned model: behaviour change needed, not just knowledge; stable task definition
- Hybrid pipeline: different query types route to different sub-systems
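The mapping from requirements to the four shapes above can be written as a small decision function. This is a rough sketch under stated assumptions: the flag names are hypothetical, and the precedence order (hybrid first, then fine-tuning, then agentic, then simple RAG) is one reasonable choice rather than a rule from this article.

```python
from enum import Enum


class Shape(Enum):
    SIMPLE_RAG = "Simple RAG"
    AGENTIC_RAG = "Agentic RAG"
    FINE_TUNED = "Fine-tuned model"
    HYBRID = "Hybrid pipeline"


def choose_shape(
    *,
    needs_behaviour_change: bool,
    task_definition_stable: bool,
    needs_multi_hop_or_tools: bool,
    mixed_query_types: bool,
) -> Shape:
    """Pick an architecture shape from four yes/no answers (assumed precedence)."""
    # Different query types routing to different sub-systems: hybrid.
    if mixed_query_types:
        return Shape.HYBRID
    # Fine-tuning only pays off when behaviour must change AND the task is stable.
    if needs_behaviour_change and task_definition_stable:
        return Shape.FINE_TUNED
    # Multi-hop reasoning or tool use points to an agentic loop.
    if needs_multi_hop_or_tools:
        return Shape.AGENTIC_RAG
    # Default: single-hop knowledge retrieval.
    return Shape.SIMPLE_RAG


print(choose_shape(
    needs_behaviour_change=False,
    task_definition_stable=True,
    needs_multi_hop_or_tools=True,
    mixed_query_types=False,
).value)
```

In an interview the value is less in the function than in being able to state which answer pushed you toward which shape.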
The components every AI system needs
| Component | Why it matters | Common mistake |
|---|---|---|
| Eval pipeline | You can't measure quality without one | Skipping it until something breaks |
| Observability | You can't debug what you can't see | Only logging errors, not quality signals |
| Fallback strategy | LLMs fail, so you need graceful degradation | Hard-coding one path with no fallback |
| Rate limiting | Runaway agents burn budget fast | No per-user or per-session limits |
| Human-in-the-loop | High-consequence actions need approval gates | Automating high-blast-radius actions with no approval step |
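Two of the rows above, fallback strategy and rate limiting, can be illustrated together. The sketch below assumes each model backend is just a callable from prompt to text; `SessionBudget`, `answer_with_fallback`, and the two stand-in backends are hypothetical names, not part of any real library.

```python
from typing import Callable, Sequence


class BudgetExceeded(Exception):
    """Raised when a session has used up its call allowance."""


class SessionBudget:
    """Caps model calls per session so a runaway agent cannot burn budget."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self._used: dict[str, int] = {}

    def charge(self, session_id: str) -> None:
        used = self._used.get(session_id, 0)
        if used >= self.max_calls:
            raise BudgetExceeded(session_id)
        self._used[session_id] = used + 1


def answer_with_fallback(
    prompt: str,
    session_id: str,
    budget: SessionBudget,
    backends: Sequence[Callable[[str], str]],
    canned_reply: str = "Sorry, I can't answer that right now.",
) -> str:
    """Try each backend in order; degrade to a canned reply if all fail."""
    for backend in backends:
        budget.charge(session_id)  # every attempt counts against the session
        try:
            return backend(prompt)
        except Exception:
            continue  # a real system would log the failure here
    return canned_reply


# Stand-in backends for the demo: the primary always times out.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def small_model(prompt: str) -> str:
    return f"small: {prompt}"


budget = SessionBudget(max_calls=3)
print(answer_with_fallback("hi", "session-1", budget, [flaky_primary, small_model]))
```

Charging the budget outside the `try` block is deliberate: a `BudgetExceeded` error should stop the request rather than be swallowed as just another backend failure.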
Structuring your 45-minute answer
- 0–5 min: clarify requirements, do the 6-axis characterisation out loud
- 5–15 min: high-level architecture — name the shape, draw the data flow
- 15–30 min: deep-dive on the hardest component (usually retrieval or eval)
- 30–40 min: failure modes — what breaks, how you detect it, how you recover
- 40–45 min: scale and cost — rough numbers, bottlenecks, how it changes at 10× traffic
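For the final scale-and-cost segment, a back-of-envelope calculation is usually enough. The sketch below assumes a sustained average QPS and per-1K-token pricing; the function name and every number in the example are placeholders, not real vendor prices.

```python
def daily_llm_cost_usd(
    qps: float,
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Estimate daily model spend from average traffic and token counts."""
    requests_per_day = qps * 86_400  # seconds in a day
    cost_per_request = (
        input_tokens / 1000 * usd_per_1k_input
        + output_tokens / 1000 * usd_per_1k_output
    )
    return requests_per_day * cost_per_request


# Placeholder assumptions: 5 QPS, 2,000 prompt tokens (mostly retrieved
# context), 300 completion tokens, and made-up prices per 1K tokens.
baseline = daily_llm_cost_usd(5, 2000, 300, 0.001, 0.002)
at_10x = daily_llm_cost_usd(50, 2000, 300, 0.001, 0.002)
print(f"baseline: ${baseline:,.0f}/day, at 10x traffic: ${at_10x:,.0f}/day")
```

Because this model is linear in QPS, the 10× figure is just ten times the baseline; the interesting discussion is what breaks that linearity, such as caching, smaller models for easy queries, or rate limits at the provider.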
The question interviewers are really asking: do you think about AI systems like a production engineer or like someone who's only built demos? Raising eval pipelines, failure modes, and cost budgets unprompted is the signal that separates staff+ engineers from senior ones.