A Framework for AI System Design Interviews (Staff+ Level)

6-axis characterisation, architecture shape selection, reliability budgets, and how to structure a 45-minute system design answer.

AI system design interviews differ from traditional system design interviews. You're not just designing a scalable service: you're designing a system with probabilistic components, uncertain output quality, and failure modes that don't show up in unit tests. Interviewers at staff+ level expect you to address this difference explicitly.

The 6-axis characterisation (do this first)

Before drawing any boxes, characterise the problem on 6 axes. This forces precision and signals experience:

Axis	Questions to answer
Quality vs. speed	What's the latency SLA? Can we afford streaming? Does quality trump speed?
Scale	QPS, document count, context length, user count — order of magnitude
Data freshness	Does knowledge need to be real-time? Daily? How stale is acceptable?
Personalisation	Per-user context? Multi-tenant? Global shared context?
Failure tolerance	What's the blast radius of a wrong answer? Is hallucination an incident?
Regulatory/compliance	PII handling? Data residency? Audit trails?
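
One way to keep the characterisation honest is to write it down as a record and check which axes are still unanswered before moving on to architecture. The sketch below is illustrative only: the class name `Characterisation` and every field name are assumptions made for this example, one per axis in the table above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Characterisation:
    """One field per axis. None means the question has not been answered yet.

    All field names are hypothetical, chosen for this sketch.
    """
    latency_sla_ms: Optional[int] = None              # Quality vs. speed
    peak_qps: Optional[int] = None                    # Scale
    max_staleness_hours: Optional[float] = None       # Data freshness
    per_user_context: Optional[bool] = None           # Personalisation
    wrong_answer_is_incident: Optional[bool] = None   # Failure tolerance
    handles_pii: Optional[bool] = None                # Regulatory/compliance

    def unanswered(self) -> list[str]:
        """Return the axes that still need a clarifying question."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Example: three axes pinned down, three still open.
spec = Characterisation(latency_sla_ms=2000, peak_qps=50, handles_pii=True)
print(spec.unanswered())
# → ['max_staleness_hours', 'per_user_context', 'wrong_answer_is_incident']
```

In an interview the same idea works verbally: name each axis, state your assumption, and flag the ones you still need the interviewer to confirm.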

Choosing your architecture shape

Based on the 6-axis characterisation, you'll land on one of four shapes:

Simple RAG: knowledge retrieval, low real-time requirements, single-hop questions
Agentic RAG: complex queries, multi-hop reasoning, tool use needed
Fine-tuned model: behaviour change needed, not just knowledge; stable task definition
Hybrid pipeline: different query types route to different sub-systems
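
The mapping from problem traits to shape can be sketched as a small decision function. This is a simplified heuristic built from the one-line descriptions above, not a rule the article prescribes: the function name `choose_shape`, its boolean inputs, and the order of the checks are all assumptions.

```python
from enum import Enum


class Shape(Enum):
    SIMPLE_RAG = "Simple RAG"
    AGENTIC_RAG = "Agentic RAG"
    FINE_TUNED = "Fine-tuned model"
    HYBRID = "Hybrid pipeline"


def choose_shape(*, mixed_query_types: bool, behaviour_change: bool,
                 stable_task: bool, multi_hop: bool, needs_tools: bool) -> Shape:
    """Pick a starting architecture shape (hypothetical heuristic)."""
    # Different query types need different sub-systems: route between them.
    if mixed_query_types:
        return Shape.HYBRID
    # Fine-tuning fits when behaviour must change and the task is stable.
    if behaviour_change and stable_task:
        return Shape.FINE_TUNED
    # Multi-hop reasoning or tool use points to an agentic loop.
    if multi_hop or needs_tools:
        return Shape.AGENTIC_RAG
    # Default: single-hop knowledge retrieval.
    return Shape.SIMPLE_RAG


print(choose_shape(mixed_query_types=False, behaviour_change=False,
                   stable_task=True, multi_hop=True, needs_tools=False).value)
# → Agentic RAG
```

Real decisions are rarely this clean; the value of stating the rule out loud is that the interviewer can challenge a specific branch.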

The components every AI system needs

Component	Why it matters	Common mistake
Eval pipeline	You can't measure quality without one	Skipping it until something breaks
Observability	You can't debug what you can't see	Only logging errors, not quality signals
Fallback strategy	LLM calls fail, so you need graceful degradation	Hard-coding one path with no fallback
Rate limiting	Runaway agents burn budget fast	No per-user or per-session limits
Human-in-the-loop	High-consequence actions need approval gates	Automating actions with a large blast radius without approval
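
Two of these components, the fallback strategy and per-session rate limiting, fit in a short sketch. Everything here is hypothetical: `SessionBudget`, `answer_with_fallback`, and the two stand-in backends are names invented for illustration, and a real system would catch narrower exception types and persist the counters outside process memory.

```python
from typing import Callable, Sequence


class BudgetExceeded(Exception):
    """Raised when a session has used up its allowed model calls."""


class SessionBudget:
    """Caps model calls per session so a runaway loop cannot burn budget."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls: dict[str, int] = {}

    def charge(self, session_id: str) -> None:
        used = self.calls.get(session_id, 0)
        if used >= self.max_calls:
            raise BudgetExceeded(session_id)
        self.calls[session_id] = used + 1


def answer_with_fallback(query: str, session_id: str,
                         backends: Sequence[Callable[[str], str]],
                         budget: SessionBudget,
                         canned: str = "Sorry, I can't answer that right now.",
                         ) -> tuple[str, str]:
    """Try each backend in order; return (answer, name of the path that served it)."""
    for backend in backends:
        try:
            budget.charge(session_id)  # every attempt counts against the budget
            return backend(query), getattr(backend, "__name__", repr(backend))
        except BudgetExceeded:
            break  # out of budget: stop trying and degrade
        except Exception:
            continue  # broad catch for the sketch only: fall through to next backend
    return canned, "canned"


# Hypothetical backends standing in for real model calls.
def flaky_primary(query: str) -> str:
    raise TimeoutError("model timed out")


def small_model(query: str) -> str:
    return f"short answer to: {query}"


budget = SessionBudget(max_calls=3)
print(answer_with_fallback("what is RAG?", "s1", [flaky_primary, small_model], budget))
# → ('short answer to: what is RAG?', 'small_model')
```

Returning the name of the path that served the answer gives observability a quality signal to log, not only an error count: a rising share of `small_model` or `canned` responses is visible before users complain.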

Structuring your 45-minute answer

0–5 min: clarify requirements, do the 6-axis characterisation out loud
5–15 min: high-level architecture — name the shape, draw the data flow
15–30 min: deep-dive on the hardest component (usually retrieval or eval)
30–40 min: failure modes — what breaks, how you detect it, how you recover
40–45 min: scale and cost — rough numbers, bottlenecks, how it changes at 10× traffic
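
For the final five minutes, the rough numbers can come from a back-of-envelope calculation like the one below. The function name and all inputs are assumptions for illustration; in particular the per-token prices are placeholders, not any vendor's actual pricing.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000, assuming a 30-day month


def monthly_cost_usd(qps: float, input_tokens: int, output_tokens: int,
                     usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Estimate monthly model spend from average traffic and token counts."""
    requests = qps * SECONDS_PER_MONTH
    per_request = (input_tokens / 1000 * usd_per_1k_input
                   + output_tokens / 1000 * usd_per_1k_output)
    return requests * per_request


# Placeholder workload: 10 QPS average, 2,000 input and 500 output tokens per request.
base = monthly_cost_usd(10, 2000, 500, usd_per_1k_input=0.001, usd_per_1k_output=0.002)
at_10x = monthly_cost_usd(100, 2000, 500, usd_per_1k_input=0.001, usd_per_1k_output=0.002)
print(f"baseline: ${base:,.0f}/month, at 10x traffic: ${at_10x:,.0f}/month")
```

With these placeholder inputs each request costs $0.003, which gives about $77,760 per month at baseline and ten times that at 10× traffic. Because cost scales linearly with traffic in this model, the more interesting interview discussion is usually what breaks the linearity: caching, smaller models for easy queries, or shorter retrieved context.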

The question interviewers are really asking: do you think about AI systems like a production engineer or like someone who's only built demos? Talking about eval pipelines, failure modes, and cost budgets unprompted is the signal that separates staff+ engineers from senior ones.
