GenAI Systems Lab
AI Engineering 13 min read

LLMOps: What Production AI Actually Needs That Tutorials Skip

Observability, prompt versioning, latency budgets, cost tracking, model routers, A/B testing, and rollback strategies. The full production checklist.

Every production AI system needs the same set of infrastructure that tutorial content skips. This is the checklist. If you can't check every box, your system is not production-ready — it's a demo that's somehow in production.

Before you deploy

Observability (what to instrument)

Prompt management

Prompts are code. They have versions, they cause regressions, and they need to be deployed safely. At minimum: store prompts in version control with semantic versioning, run your eval suite before promoting a new prompt version, and maintain the ability to roll back to a previous prompt in under 5 minutes.
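A minimal sketch of those three requirements, assuming an in-memory registry as a stand-in for version control. The `PromptRegistry` class, its method names, and the `run_evals` callback are all hypothetical names chosen for illustration, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PromptRegistry:
    """Hypothetical in-memory stand-in for prompts kept in version control."""
    versions: dict[str, str] = field(default_factory=dict)  # "1.1.0" -> prompt text
    history: list[str] = field(default_factory=list)        # promoted versions, oldest first

    def register(self, version: str, text: str) -> None:
        # Published versions are immutable: a change means a new version number.
        if version in self.versions:
            raise ValueError(f"version {version} already exists")
        self.versions[version] = text

    def promote(self, version: str, run_evals: Callable[[str], bool]) -> bool:
        """Promote a version only if the eval suite passes on its text."""
        if not run_evals(self.versions[version]):
            return False
        self.history.append(version)
        return True

    def rollback(self) -> str:
        """Drop the active version and return the one promoted before it."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def active(self) -> str:
        if not self.history:
            raise RuntimeError("no version has been promoted yet")
        return self.history[-1]


registry = PromptRegistry()
registry.register("1.0.0", "You are a support assistant.")
registry.register("1.1.0", "You are a concise support assistant.")

# The lambdas stand in for a real eval suite run.
registry.promote("1.0.0", run_evals=lambda text: True)
registry.promote("1.1.0", run_evals=lambda text: True)
active_before = registry.active
active_after = registry.rollback()
```

Because rollback only moves a pointer to a version that already passed its evals, it needs no rebuild or redeploy, which is what makes a rollback window of a few minutes realistic.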

The most common LLMOps failure: a well-intentioned prompt tweak that ships without running evals and degrades the model's behaviour on edge cases that weren't manually tested. Eval gates before promotion are non-negotiable.
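One way to make such a gate catch edge-case regressions is to compare scores per eval case rather than on the aggregate. The sketch below assumes each eval case yields a score between 0 and 1; the `eval_gate` function, the case names, and the `max_drop` threshold are illustrative assumptions.

```python
def eval_gate(baseline: dict[str, float], candidate: dict[str, float],
              max_drop: float = 0.02) -> list[str]:
    """Return the eval case IDs where the candidate drops by more than max_drop.

    An empty list means the gate passes.
    """
    regressions = []
    for case_id, base_score in baseline.items():
        # A case missing from the candidate run is treated as a score of 0.0.
        new_score = candidate.get(case_id, 0.0)
        if base_score - new_score > max_drop:
            regressions.append(case_id)
    return sorted(regressions)


# Hypothetical scores: the tweak improves the common cases but hurts one edge case.
baseline = {"happy_path": 0.95, "empty_input": 0.90, "non_english": 0.88}
candidate = {"happy_path": 1.00, "empty_input": 0.98, "non_english": 0.80}

average_improved = sum(candidate.values()) > sum(baseline.values())
blocked_by = eval_gate(baseline, candidate)
```

In this example the average score goes up, yet the gate still blocks promotion because `non_english` regressed. An average-only check would have let the change ship.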

Ongoing operations

Cadence    What to review
Daily      Cost vs. budget, error rate, P99 latency, flagged outputs
Weekly     Quality signal trends, eval score vs. baseline, top failure patterns
Monthly    Full eval suite run, prompt performance review, model upgrade consideration
Quarterly  RAG index freshness audit, eval set expansion, cost optimisation review

We had no eval pipeline. We had no prompt versioning. We shipped. Costs went up 40% after a well-intentioned system prompt rewrite that nobody tested. Good intentions aren't a deployment strategy.
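The daily row of the review table can be automated as a simple threshold check. This is a sketch under assumptions: the `DailyStats` fields, the threshold values, and the nearest-rank percentile method are illustrative choices, and flagged outputs are left out because they usually need human review.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    cost_usd: float
    requests: int
    errors: int
    latencies_ms: list[float]


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile, using integer arithmetic for the rank."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def daily_alerts(stats: DailyStats, budget_usd: float,
                 max_error_rate: float, p99_budget_ms: float) -> list[str]:
    """Return one alert string per daily threshold that was exceeded."""
    alerts = []
    if stats.cost_usd > budget_usd:
        alerts.append("cost over budget")
    if stats.requests and stats.errors / stats.requests > max_error_rate:
        alerts.append("error rate high")
    if stats.latencies_ms and p99(stats.latencies_ms) > p99_budget_ms:
        alerts.append("p99 latency over budget")
    return alerts


# Hypothetical day: spend is 40% over budget and 2% of requests are very slow.
stats = DailyStats(cost_usd=140.0, requests=1000, errors=5,
                   latencies_ms=[200.0] * 98 + [5000.0] * 2)
alerts = daily_alerts(stats, budget_usd=100.0,
                      max_error_rate=0.01, p99_budget_ms=2000.0)
```

A check like this would have surfaced the 40% cost increase on the first day after the rewrite shipped.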

Model upgrade strategy

When a new model version drops, don't assume it's a drop-in replacement. Always run your full eval suite against the new model before promoting, compare on your tail distribution (not just average quality), and check latency and cost deltas. A model that's 10% better on average but 30% worse on your P99 tail is not an upgrade.
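The average-versus-tail comparison can be sketched as follows, assuming per-example eval scores where higher is better. The function names, the 10% tail fraction, and the 5% tolerance are assumptions for illustration. Latency and cost deltas could be added to the report in the same way.

```python
from statistics import mean


def tail_mean(scores: list[float], fraction: float = 0.1) -> float:
    """Mean of the worst `fraction` of scores (always at least one score)."""
    k = max(1, int(len(scores) * fraction))
    return mean(sorted(scores)[:k])


def compare_models(current: list[float], candidate: list[float]) -> dict[str, float]:
    """Relative change in average and tail quality; positive favours the candidate."""
    return {
        "avg_change": mean(candidate) / mean(current) - 1,
        "tail_change": tail_mean(candidate) / tail_mean(current) - 1,
    }


def is_upgrade(report: dict[str, float], max_tail_drop: float = 0.05) -> bool:
    """Accept only if the average improves and the tail does not fall too far."""
    return report["avg_change"] > 0 and report["tail_change"] >= -max_tail_drop


# Hypothetical eval scores: the candidate is stronger on typical inputs
# but much weaker on the hardest one.
current_scores = [0.8] * 9 + [0.5]
candidate_scores = [0.9] * 9 + [0.35]

report = compare_models(current_scores, candidate_scores)
promote = is_upgrade(report)
```

With these numbers the candidate is roughly 10% better on average and 30% worse on the tail, so `is_upgrade` rejects it.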
