AI Governance in Production: Frameworks, Audits, and What Actually Works
EU AI Act, NIST AI RMF, and internal governance programs at Anthropic, Google, and Microsoft. What risk tiers mean for your product, how to structure model cards and audit trails, red-teaming cadences, and the governance gaps most teams miss before going to market.
Why Governance Is Now an Engineering Problem
AI governance used to be a policy team's concern. Now it's a deployment blocker. The EU AI Act, the NIST AI RMF, and enterprise risk reviews translate into concrete engineering deliverables: model cards, audit trails, red-teaming evidence, and risk-tier documentation. If you're shipping AI to enterprise customers, governance is part of your build.
The EU AI Act: Risk Tiers
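The EU AI Act classifies AI systems into four risk tiers:
- Unacceptable risk: prohibited outright. Examples include social scoring and real-time remote biometric identification in publicly accessible spaces for law enforcement (with narrow exceptions).
- High risk: AI that is a safety component of regulated products such as medical devices, plus AI used in listed areas such as employment (HR systems), credit scoring, and education. These systems require a conformity assessment, automatic event logging, and human oversight mechanisms, among other obligations.
- Limited risk: systems such as chatbots, which carry transparency obligations only (for example, telling users they are interacting with an AI).
- Minimal risk: everything else, with no tier-specific obligations.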
Practical implication: if your AI system makes or materially shapes decisions about people's employment, credit, education, or healthcare, it is likely in the high-risk tier. That triggers a conformity assessment, mandatory logging, and human oversight mechanisms, including the ability to override or stop the system, before it can be placed on the EU market.
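A first-pass triage of an internal AI inventory can encode the tier logic above. This is an illustrative sketch, not a legal classification: the category names (`social_scoring`, `employment`, and so on), the functions `triage_risk_tier` and `required_work_items`, and the work-item labels are all hypothetical, and the Act's actual prohibited-practice and high-risk lists are longer and carry exceptions. It returns only the strictest matching tier, although in practice obligations can stack (a high-risk chatbot still owes users the chatbot disclosure).

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical, simplified category sets based on the examples above.
# The Act's real lists are longer and include exceptions.
PROHIBITED_PRACTICES = {"social_scoring", "realtime_public_biometric_id"}
HIGH_RISK_DOMAINS = {"employment", "credit", "education", "healthcare", "medical_device"}
TRANSPARENCY_ONLY = {"chatbot", "synthetic_media"}


def triage_risk_tier(use_case: str, domains: set[str]) -> RiskTier:
    """Return the strictest matching tier for an inventory entry (first pass only)."""
    if use_case in PROHIBITED_PRACTICES:
        return RiskTier.UNACCEPTABLE
    # The domain a system operates in outranks its interface.
    if domains & HIGH_RISK_DOMAINS:
        return RiskTier.HIGH
    if use_case in TRANSPARENCY_ONLY:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL


def required_work_items(tier: RiskTier) -> list[str]:
    """Map a tier to hypothetical engineering work items before EU launch."""
    if tier is RiskTier.UNACCEPTABLE:
        raise ValueError("prohibited practice: cannot be deployed in the EU")
    if tier is RiskTier.HIGH:
        return ["conformity_assessment", "audit_logging", "human_oversight"]
    if tier is RiskTier.LIMITED:
        return ["ai_interaction_disclosure"]
    return []


tier = triage_risk_tier("chatbot", {"employment"})
print(tier, required_work_items(tier))
```

Note that a chatbot used for hiring lands in `RiskTier.HIGH`, not `LIMITED`: what the system decides about, not how users talk to it, drives the tier.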
NIST AI Risk Management Framework
The NIST AI RMF (AI 100-1) provides a voluntary framework structured around four functions: GOVERN (establish AI risk culture and policies), MAP (identify and categorize risks), MEASURE (analyze and assess risks), MANAGE (prioritize and treat risks). Unlike the EU AI Act, it's not legislation — but it's increasingly referenced in US government contracts and enterprise procurement.
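One way to make the four functions operational is a risk register that records, per risk, which functions have produced evidence so far. The RMF does not prescribe a schema; the `RiskEntry` fields, the example risk, and the `missing_functions` helper below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class RmfFunction(Enum):
    GOVERN = "govern"
    MAP = "map"
    MEASURE = "measure"
    MANAGE = "manage"


@dataclass
class RiskEntry:
    """One row of a hypothetical AI risk register (the RMF prescribes no schema)."""
    risk_id: str
    description: str
    owner: str
    due: date
    # Evidence recorded so far, keyed by the RMF function that produced it.
    evidence: dict[RmfFunction, str] = field(default_factory=dict)

    def missing_functions(self) -> list[RmfFunction]:
        """Functions with no recorded evidence yet, in GOVERN/MAP/MEASURE/MANAGE order."""
        return [fn for fn in RmfFunction if fn not in self.evidence]


entry = RiskEntry(
    risk_id="R-001",
    description="Support assistant misstates the refund policy",
    owner="support-ml-team",
    due=date(2025, 6, 30),
)
entry.evidence[RmfFunction.MAP] = "risk identified in design review"
entry.evidence[RmfFunction.MEASURE] = "refund-policy eval suite results"
print([fn.value for fn in entry.missing_functions()])  # ['govern', 'manage']
```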
What Good Governance Actually Looks Like in Code
- Model cards: document model capabilities, limitations, training data sources, and known failure modes — before deployment, not after
- Audit logs: log every LLM call with prompt, response, user ID, and model version — you need this for incident investigation
- Red-teaming cadence: structured adversarial testing before every major model update and on a recurring schedule between updates, documented and signed off
- Human-in-the-loop gates: explicit override mechanisms for high-stakes decisions, not just soft suggestions
- Drift monitoring: detect when model behavior changes post-deployment — output distribution shift is a governance risk
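A minimal sketch of the audit-log bullet, assuming a generic `call_model` callable rather than any particular SDK: every call, successful or not, appends one JSON line with the prompt, response, user ID, and model version, plus a request ID and timestamp. The function name `logged_llm_call` and the record fields are hypothetical, and a local JSON Lines file stands in for whatever append-only store you actually use.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Callable


def logged_llm_call(
    call_model: Callable[[str], str],
    prompt: str,
    user_id: str,
    model_version: str,
    log_path: Path,
) -> str:
    """Call the model and append one JSON record per call, even on failure."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "model_version": model_version,
        "prompt": prompt,
        "response": None,
        "status": "ok",
    }
    try:
        record["response"] = call_model(prompt)
        return record["response"]
    except Exception as exc:
        # Failed calls matter for incident investigation too.
        record["status"] = f"error: {type(exc).__name__}"
        raise
    finally:
        # Append-only JSON Lines: one line per call, written once the outcome is known.
        with log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


# Demo with a stand-in model; a real deployment would wrap its own client call.
log_file = Path(tempfile.mkdtemp()) / "audit.jsonl"
reply = logged_llm_call(str.upper, "hello", "user-42", "model-v1", log_file)
print(reply)  # HELLO
print(log_file.read_text(encoding="utf-8"))
```

Because prompts and user IDs can contain personal data, a production audit log also needs access controls and a retention policy.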
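The difference between an explicit override and a soft suggestion can be enforced in code. The sketch below refuses to finalize a high-stakes decision without a named reviewer, and a reviewer's override replaces the model's recommendation outright. `Decision` and `finalize` are hypothetical names, and the `high_stakes` flag is assumed to come from your own risk classification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    subject_id: str
    model_recommendation: str  # e.g. "approve" or "reject"
    high_stakes: bool          # assumed to come from your own risk classification
    reviewer: Optional[str] = None
    final: Optional[str] = None


def finalize(
    decision: Decision,
    reviewer: Optional[str] = None,
    override: Optional[str] = None,
) -> Decision:
    """Finalize a decision; high-stakes ones are blocked without a named human reviewer."""
    if decision.high_stakes and reviewer is None:
        # A hard stop, not a warning: the decision stays unfinalized.
        raise PermissionError(f"{decision.subject_id}: human sign-off required")
    if override is not None and reviewer is None:
        raise ValueError("an override must be attributed to a reviewer")
    decision.reviewer = reviewer
    # A reviewer's override replaces the model recommendation outright.
    decision.final = override if override is not None else decision.model_recommendation
    return decision


loan = Decision("applicant-7", model_recommendation="reject", high_stakes=True)
print(finalize(loan, reviewer="analyst-3", override="approve").final)  # approve
```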
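For drift monitoring, one common metric (a swapped-in choice; the article does not name one) is the Population Stability Index: the sum over categories of (c - b) * ln(c / b), where b and c are each category's share of outputs in a baseline window and a current window. The sketch assumes outputs have already been bucketed into labels such as `answer` or `refusal` by some upstream classifier; the `psi` function name, the `eps` floor, and the sample windows are illustrative.

```python
import math
from collections import Counter


def psi(baseline: list[str], current: list[str], eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of categorical output labels."""
    b_counts, c_counts = Counter(baseline), Counter(current)
    total = 0.0
    for label in set(baseline) | set(current):
        # Floor proportions at eps so a label absent from one window avoids log(0).
        b = max(b_counts[label] / len(baseline), eps)
        c = max(c_counts[label] / len(current), eps)
        total += (c - b) * math.log(c / b)
    return total


# Hypothetical windows: outputs already labeled by an upstream classifier.
baseline = ["answer"] * 90 + ["refusal"] * 10
current = ["answer"] * 70 + ["refusal"] * 30
print(round(psi(baseline, current), 2))  # 0.27
```

Here the refusal rate rising from 10% to 30% gives a PSI of roughly 0.27. A widely used rule of thumb from credit-risk monitoring treats values above 0.25 as a significant shift, but that threshold is a convention rather than a standard and should be calibrated on your own traffic.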
The Gaps Most Teams Miss
The most common governance failures in production AI: no audit trail (which makes incident investigation impossible), model cards written after launch (so known limitations never inform the decision to ship), and red-teaming treated as a one-time checkbox rather than a continuous process. The teams that get this right treat governance artifacts as engineering deliverables with owners and deadlines.
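Treating governance artifacts as deliverables can be as simple as a CI check that blocks a release when they are missing. In this hypothetical sketch, `ReleaseCandidate` and `governance_blockers` are illustrative names, the required model-card fields mirror the bullet list above, and red-team sign-off is tied to the exact model version so that a sign-off for a previous model does not carry over.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the model-card bullet above; extend with your own required sections.
REQUIRED_CARD_FIELDS = (
    "capabilities",
    "limitations",
    "training_data_sources",
    "known_failure_modes",
)


@dataclass
class ReleaseCandidate:
    model_version: str
    model_card: dict[str, str]
    red_team_signoff_version: Optional[str]  # model version the sign-off covered
    audit_logging_enabled: bool


def governance_blockers(rc: ReleaseCandidate) -> list[str]:
    """Return human-readable reasons this release must not ship (empty means OK)."""
    blockers = []
    missing = [name for name in REQUIRED_CARD_FIELDS if not rc.model_card.get(name)]
    if missing:
        blockers.append("model card missing: " + ", ".join(missing))
    # A sign-off for an earlier model version does not carry over.
    if rc.red_team_signoff_version != rc.model_version:
        blockers.append(f"no red-team sign-off for {rc.model_version}")
    if not rc.audit_logging_enabled:
        blockers.append("audit logging is disabled")
    return blockers


rc = ReleaseCandidate(
    model_version="v3",
    model_card={"capabilities": "summarize tickets", "limitations": "English only"},
    red_team_signoff_version="v2",
    audit_logging_enabled=True,
)
for reason in governance_blockers(rc):
    print("BLOCKED:", reason)
```

A CI job could fail the build whenever the returned list is non-empty, which turns each gap into a visible, owned blocker before launch rather than a surprise after it.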
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.