LLM Interview Question Patterns: What Senior Engineers Actually Ask
The 8 question categories, common traps, and how to structure 4-layer answers, from 'explain self-attention' to 'design a RAG evaluation pipeline'.
LLM engineering interviews have converged on a set of question categories that show up consistently across Google, Meta, Anthropic, OpenAI, and AI-native startups. Knowing the categories lets you prepare efficiently rather than guessing what might come up.
The 8 question categories
| Category | What they're testing | Example questions |
|---|---|---|
| Architecture fundamentals | Do you understand the mechanics? | Explain self-attention. What is positional encoding for? |
| RAG design | Can you build a production retrieval system? | Design a RAG pipeline for a 10M-document corpus. How do you handle stale docs? |
| Evaluation | Do you know how to measure quality? | How would you evaluate a RAG system? What's faithfulness vs. answer relevance? |
| Failure modes | Have you shipped things that broke? | What fails in a RAG pipeline? How do you debug a hallucinating agent? |
| Agent systems | Can you build multi-step systems? | Design a ReAct agent for X. How do you prevent infinite loops? |
| Cost/latency | Do you think about production economics? | How would you reduce inference cost by 50%? What's TTFT and why does it matter? |
| System design | Can you architect at scale? | Design an LLM-powered search for an e-commerce site with 1M products. |
| Trade-offs | Can you reason about decisions? | RAG vs. fine-tuning for domain adaptation — when would you choose each? |
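For the "Explain self-attention" question in the first row, it can help to have the mechanism in your head as code. The following is a minimal sketch of single-head scaled dot-product self-attention in plain Python, assuming no masking, no batching, and fixed illustrative weight matrices rather than learned ones. The function names (`matmul`, `softmax`, `self_attention`) are chosen for this example and do not come from any library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is (tokens x d_model); w_q, w_k, w_v project it to queries, keys, values.
    Returns (output, attention_weights).
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every query is scored against every key in one step, so distant tokens
    # interact directly instead of through a chain of recurrent states.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: 3 tokens with 2 features each, identity projections (illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

The score matrix is tokens x tokens, which is also where the quadratic cost in sequence length comes from. That is a natural trade-off to raise once you have covered the mechanism.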
The 4-layer answer structure
For technical questions, structure answers in 4 layers. This signals depth without rambling:
- Layer 1 — Definition: what is it? One sentence. Precise.
- Layer 2 — Mechanism: how does it work? Two to three sentences, no hand-waving.
- Layer 3 — Trade-offs: when does it fail? What's the cost? What's the alternative?
- Layer 4 — Production experience: when have you used it or seen it break?
Most candidates answer at Layer 1 or 2 and stop. The interview is won at Layers 3 and 4. If you don't have production experience, use the labs here to generate real examples: "I reproduced the missing context failure on a 500-chunk corpus and measured a 23% precision drop" is far better than a textbook definition.
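A claim like a measured precision drop is only convincing if you can say how it was computed. Here is a minimal sketch of one way to do that, assuming precision@k as the metric and a small hand-labeled set of relevant chunk IDs per query. The query data and helper names below are hypothetical, made up for illustration, and are not results from any real run.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved chunks that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k):
    """Average precision@k over (retrieved_ids, relevant_ids) pairs."""
    return sum(precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


def relative_drop(baseline, degraded):
    """Relative change from baseline, e.g. 0.25 means a 25% drop."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - degraded) / baseline


# Hypothetical labels: the same two queries before and after injecting a failure.
baseline_runs = [
    (["c1", "c2", "c3", "c4"], {"c1", "c2", "c3"}),
    (["c7", "c8", "c9", "c5"], {"c7", "c9"}),
]
degraded_runs = [
    (["c1", "c6", "c3", "c4"], {"c1", "c2", "c3"}),
    (["c7", "c8", "c2", "c5"], {"c7", "c9"}),
]

before = mean_precision_at_k(baseline_runs, k=4)
after = mean_precision_at_k(degraded_runs, k=4)
print(f"precision@4 before={before:.3f} after={after:.3f} "
      f"relative drop={relative_drop(before, after):.1%}")
```

When you quote a number in an interview, say whether the drop is relative or absolute and which k you used. That detail is part of what separates a Layer 4 answer from a memorized one.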
The traps interviewers use
- "Just explain it simply": they want to see whether you can simplify without dropping precision
- "What would you do differently?" after you answer — they're testing whether you can self-critique
- Giving you a system with no eval — they're waiting to see if you notice and call it out
- Asking about a technique and then asking when you wouldn't use it — they want the failure mode
- "How would you debug that?" — they want a systematic process, not guessing
Top 10 questions to prepare cold
- Explain self-attention and why it works better than RNNs for long sequences
- Design a RAG system for a customer support bot. What metrics would you track?
- What is RAGAS and what does faithfulness actually measure?
- How does prompt caching work and when does it pay off?
- What are the failure modes of a ReAct agent in production?
- Fine-tuning vs. RAG: give me a concrete scenario where you'd choose each
- How would you detect hallucinations in a RAG system at scale?
- What is positional encoding and what problem does it solve?
- Design a model routing system that reduces inference cost by 60%
- How do you build an eval pipeline before you have ground truth labels?
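For the model routing question, a quick back-of-the-envelope calculation shows whether a target such as 60% is reachable at all. The sketch below assumes every request uses the same number of tokens on either model, ignores the cost of the router itself, and takes the small-to-large price ratio as an input. The ratio of 0.1 used here is a hypothetical figure, not real pricing, and the function names are invented for this example.

```python
def blended_cost_reduction(fraction_to_small, cost_ratio):
    """Cost reduction vs. sending everything to the large model.

    fraction_to_small: share of requests routed to the small model (0..1).
    cost_ratio: small-model cost divided by large-model cost (0 <= ratio < 1).
    """
    if not 0.0 <= fraction_to_small <= 1.0:
        raise ValueError("fraction_to_small must be in [0, 1]")
    if not 0.0 <= cost_ratio < 1.0:
        raise ValueError("cost_ratio must be in [0, 1)")
    return fraction_to_small * (1.0 - cost_ratio)


def required_fraction(target_reduction, cost_ratio):
    """Share of traffic that must go to the small model to hit the target."""
    if not 0.0 <= cost_ratio < 1.0:
        raise ValueError("cost_ratio must be in [0, 1)")
    max_reduction = 1.0 - cost_ratio  # reached when all traffic is routed
    if target_reduction > max_reduction:
        raise ValueError("target is unreachable even if all traffic is routed")
    return target_reduction / max_reduction


# Hypothetical: the small model costs 10% of the large one, target is 60% savings.
share = required_fraction(target_reduction=0.6, cost_ratio=0.1)
print(f"route {share:.1%} of requests to the small model")
```

The arithmetic is the easy part. The harder follow-up is whether that share of traffic can be routed without a quality regression, which brings the answer back to evaluation.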