The AI Launch Checklist: What to Verify Before Going Live
Eval gate, fallback behaviour, latency SLA, cost guardrails, safety review, legal sign-off, monitoring setup — the full pre-launch checklist for AI features.
Every AI feature launch that goes badly wrong has one thing in common: someone said 'it seems to work' and skipped the checklist. This is the checklist.
Not theory. Not aspirational best practices. Things that will actually save you from the specific failures that happen to real teams shipping real AI features.
Before you write a line of code
- ✓ Define what success looks like: not 'users like it' but specific, measurable criteria (task completion rate, quality score, latency SLA)
- ✓ Define the failure mode you care most about: hallucination? misuse? discriminatory outputs? Your mitigation strategy depends on knowing this
- ✓ Data privacy review complete: you know where user data goes, you have a data processing agreement (DPA) with your model provider where required, and PII handling is documented
- ✓ Legal review for your domain: healthcare, finance, legal, and HR uses of AI have specific regulatory requirements — know them before you build
Before you ship to production
- ✓ Eval suite exists: at least 100 examples with expected outputs and a judge (rubric-based or LLM-as-judge); run it and know your baseline score
- ✓ Red team done: 30+ adversarial prompts tested manually — prompt injection, jailbreaks, edge cases specific to your domain
- ✓ Hallucination handling: you've tested what the model does when it doesn't know the answer — does it hallucinate confidently, or admit uncertainty?
- ✓ Guardrails live: input/output filtering for your specific content policy is implemented and tested
- ✓ Cost estimate approved: monthly token cost at expected volume modelled, reviewed, and budgeted
- ✓ Latency verified in staging: P50 and P99 measured under realistic load, not just manual testing
- ✓ Fallback path tested: what happens when the model API is down, rate-limited, or returns an error? Test the fallback.
- ✓ Prompt versioned: system prompt is in version control, not hardcoded — you can roll back in under 5 minutes
- ✓ Rate limits set: per-user token quotas to prevent abuse and runaway costs
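A minimal sketch of the eval gate, assuming you supply two callables of your own: `generate` (calls your model) and `judge` (exact match, a rubric, or an LLM-as-judge). Both names and the `min_pass_rate` threshold are illustrative, not any particular framework's API. The gate refuses to score a suite smaller than 100 examples and returns the pass rate so you can record it as your baseline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalExample:
    prompt: str
    expected: str


def run_eval_gate(
    examples: list[EvalExample],
    generate: Callable[[str], str],      # hypothetical: your model call
    judge: Callable[[str, str], bool],   # hypothetical: (output, expected) -> pass?
    min_pass_rate: float,
    min_examples: int = 100,
) -> tuple[float, bool]:
    """Return (pass_rate, gate_passed); refuse to run on an undersized suite."""
    if len(examples) < min_examples:
        raise ValueError(
            f"eval suite has {len(examples)} examples; need at least {min_examples}"
        )
    passed = sum(judge(generate(ex.prompt), ex.expected) for ex in examples)
    pass_rate = passed / len(examples)
    return pass_rate, pass_rate >= min_pass_rate
```

Rerun it on every prompt or model change and compare the result against the recorded baseline.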
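The monthly cost model is simple arithmetic. A sketch, assuming per-million-token pricing with separate input and output rates; the function name is hypothetical, and the prices in the example below are placeholders, not any provider's actual rates.

```python
def monthly_token_cost(
    requests_per_day: float,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_million: float,   # USD per 1M input tokens (placeholder)
    output_price_per_million: float,  # USD per 1M output tokens (placeholder)
    days_per_month: int = 30,
) -> float:
    """Projected monthly spend in USD at the expected request volume."""
    cost_per_request = (
        avg_input_tokens * input_price_per_million
        + avg_output_tokens * output_price_per_million
    ) / 1_000_000
    return cost_per_request * requests_per_day * days_per_month
```

For example, 10,000 requests a day at 1,500 input and 500 output tokens, priced at a hypothetical $3 and $15 per million tokens, comes to about $3,600 a month.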
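To turn staging load-test samples into P50 and P99 figures, a nearest-rank percentile is enough. A sketch, assuming you already collect per-request latencies in milliseconds; the function names and SLA budgets are hypothetical.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples_ms:
        raise ValueError("no latency samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_latency_sla(
    samples_ms: list[float], p50_budget_ms: float, p99_budget_ms: float
) -> dict:
    """Compare measured P50/P99 against placeholder SLA budgets."""
    p50 = percentile(samples_ms, 50)
    p99 = percentile(samples_ms, 99)
    return {
        "p50_ms": p50,
        "p99_ms": p99,
        "meets_sla": p50 <= p50_budget_ms and p99 <= p99_budget_ms,
    }
```

Python's `statistics.quantiles` offers interpolated percentiles if you prefer. Either way, P99 needs a decent sample size to mean much: with 100 requests it is effectively your second-slowest one.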
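A sketch of a testable fallback path, assuming your model client raises a single error type on outages, rate limits, and server errors (`ModelUnavailable` here is hypothetical; map your client's real exceptions onto it). The fallback itself might be a secondary model, a cached answer, or a non-AI path, whatever your product defines.

```python
import time
from typing import Callable


class ModelUnavailable(Exception):
    """Hypothetical error for outages, rate limits, and 5xx responses."""


def call_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try the primary model with exponential backoff, then degrade.

    Returns (response, source) so the fallback rate can be monitored.
    """
    for attempt in range(retries + 1):
        try:
            return primary(prompt), "primary"
        except ModelUnavailable:
            if attempt < retries:
                sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, ...
    return fallback(prompt), "fallback"
```

Injecting `sleep` lets a test force the failure path without waiting, and returning the source lets you chart how often users get the fallback.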
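Per-user token quotas can start as a counter keyed by user and day. A minimal in-memory sketch; the class name and limits are hypothetical, and a real deployment would keep the counters in shared storage so every server instance sees the same totals.

```python
import time
from collections import defaultdict
from typing import Callable


class DailyTokenQuota:
    """In-memory per-user daily token quota (single-process sketch)."""

    def __init__(self, max_tokens_per_day: int, clock: Callable[[], float] = time.time):
        self.max_tokens = max_tokens_per_day
        self.clock = clock
        # (user_id, UTC day number) -> tokens used; old days are never pruned here
        self.usage: dict[tuple[str, int], int] = defaultdict(int)

    def _key(self, user_id: str) -> tuple[str, int]:
        return user_id, int(self.clock() // 86_400)

    def try_consume(self, user_id: str, tokens: int) -> bool:
        """Record usage and return True, or return False if it would exceed the quota."""
        key = self._key(user_id)
        if self.usage[key] + tokens > self.max_tokens:
            return False
        self.usage[key] += tokens
        return True
```

Output tokens are only known after the call, so you may need to reserve an estimate up front and reconcile afterwards.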
Launch day
- ✓ Monitoring live: error rate, P99 latency, LLM quality sampling, and cost alerts are all configured and tested
- ✓ Canary deployment: 5–10% of traffic for the first 24–48 hours — enough real traffic to catch issues, small enough to limit blast radius
- ✓ On-call designated: someone specific is watching the dashboards during the first 24 hours
- ✓ User feedback mechanism: thumbs up/down or 'report this response' button is live — you need a feedback signal
- ✓ Rollback plan: the exact steps to disable the feature or revert to previous prompt are written down and shared with the team
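Canary routing should be sticky, so the same user sees the same experience across requests. A sketch using a salted SHA-256 hash to bucket users, with an `enabled` flag doubling as the kill switch from the rollback plan; the function name and salt are hypothetical, and a feature-flag service can do the same job.

```python
import hashlib


def in_canary(
    user_id: str,
    rollout_percent: float,
    enabled: bool = True,
    salt: str = "ai-feature-v1",  # hypothetical; change it to reshuffle buckets
) -> bool:
    """Deterministically route a stable slice of users to the new AI feature."""
    if not enabled:  # kill switch: everyone gets the old path
        return False
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < rollout_percent * 100
```

Raising `rollout_percent` from 5 toward 100 only adds users to the canary; nobody already in it is moved out.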
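The launch-day monitoring signals reduce to thresholds checked on a schedule. A sketch, assuming your metrics pipeline already aggregates these four values over some window; the metric names and threshold values are placeholders to replace with your own SLA and budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertThresholds:
    max_error_rate: float       # fraction of failed requests, e.g. 0.02
    max_p99_latency_ms: float
    max_daily_cost_usd: float
    min_quality_score: float    # mean score from sampled, judged responses


def fired_alerts(metrics: dict[str, float], t: AlertThresholds) -> list[str]:
    """Return the names of alerts whose thresholds are breached."""
    checks = [
        ("error_rate", metrics["error_rate"] > t.max_error_rate),
        ("p99_latency", metrics["p99_latency_ms"] > t.max_p99_latency_ms),
        ("daily_cost", metrics["daily_cost_usd"] > t.max_daily_cost_usd),
        ("quality", metrics["quality_score"] < t.min_quality_score),
    ]
    return [name for name, breached in checks if breached]
```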
First two weeks post-launch
- ✓ Quality review: sample 50 random production responses manually — read them, classify them, find the failure modes you didn't anticipate
- ✓ Eval set expansion: add the interesting production cases (especially failures) to your eval set
- ✓ Cost vs. budget: actual vs. projected cost comparison — identify which features or query types are driving unexpected cost
- ✓ User feedback review: read every piece of negative feedback — it contains your next iteration priorities
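For the manual quality review, a seeded uniform sample keeps the 50 responses reproducible so reviewers can discuss the same set. A sketch, assuming production logs are available as a list of records; the function names are hypothetical and the failure-mode labels are whatever your reviewers assign.

```python
import random
from collections import Counter
from typing import Optional


def sample_for_review(
    records: list[dict], k: int = 50, seed: Optional[int] = None
) -> list[dict]:
    """Uniform random sample without replacement, capped at the number of records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def failure_mode_summary(labels: list[str]) -> list[tuple[str, int]]:
    """Count reviewer-assigned labels, most frequent first."""
    return Counter(labels).most_common()
```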
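To see which query types drive unexpected cost, group actual spend by type and compare it with the projection. A sketch, assuming each log record carries hypothetical `query_type` and `cost_usd` fields.

```python
from collections import defaultdict


def cost_variance_by_type(
    records: list[dict], projected_usd: dict[str, float]
) -> list[tuple[str, float, float, float]]:
    """Return (query_type, actual, projected, actual - projected), largest overrun first."""
    actual: dict[str, float] = defaultdict(float)
    for r in records:
        actual[r["query_type"]] += r["cost_usd"]
    rows = []
    for qtype in set(actual) | set(projected_usd):
        a = actual.get(qtype, 0.0)
        p = projected_usd.get(qtype, 0.0)
        rows.append((qtype, a, p, a - p))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Query types you never projected show up with a projected cost of zero, so their entire spend counts as variance.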
The two items most commonly skipped and most often regretted: the red team (everyone thinks their use case is safe until it isn't) and the fallback path (everyone assumes the API will be up until it isn't). Do both. Don't ship without them.