
AI Governance in Production: Frameworks, Audits, and What Actually Works

EU AI Act, NIST AI RMF, and internal governance programs at Anthropic, Google, and Microsoft. What risk tiers mean for your product, how to structure model cards and audit trails, red-teaming cadences, and the governance gaps most teams miss before going to market.

Why Governance Is Now an Engineering Problem

AI governance used to be a policy team's concern. Now it's a deployment blocker. The EU AI Act, NIST AI RMF, and enterprise risk reviews translate into concrete technical deliverables: model cards, audit trails, red-teaming evidence, and risk tier documentation. If you're shipping AI to enterprise customers, governance is part of your build.

The EU AI Act: Risk Tiers

The EU AI Act classifies AI systems into four risk tiers. Unacceptable risk: prohibited outright (social scoring; real-time remote biometric identification in public spaces by law enforcement, outside narrow exceptions). High risk: medical devices, HR systems, and credit scoring, which require conformity assessments, audit logs, and human oversight mechanisms. Limited risk: chatbots and similar systems, which carry transparency requirements only. Minimal risk: everything else, with no mandatory requirements.

Practical implication: if your AI system makes or materially influences decisions about people's employment, credit, education, or healthcare, assume you're in the high-risk tier until a legal review says otherwise. That tier triggers conformity assessment requirements, mandatory logging, and human override mechanisms before EU deployment.
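As a rough sketch of how these tiers could feed an engineering checklist, the Python below maps a use-case label to a tier and to the controls this article lists for that tier. The labels (`social_scoring`, `chatbot`, and so on), the control names, and both function names are hypothetical. The mapping is a first-pass triage aid and does not replace classification against the Act's own text.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical labels for illustration only; the Act's own definitions
# are far more detailed and need legal review.
PROHIBITED_USES = {"social_scoring"}
HIGH_RISK_DOMAINS = {"employment", "credit", "education", "healthcare"}
TRANSPARENCY_ONLY_USES = {"chatbot"}

# Controls per tier, taken from the tier descriptions in this article.
CONTROLS = {
    RiskTier.UNACCEPTABLE: ["do_not_deploy"],
    RiskTier.HIGH: ["conformity_assessment", "mandatory_logging", "human_override"],
    RiskTier.LIMITED: ["transparency_notice"],
    RiskTier.MINIMAL: [],
}


def triage_risk_tier(use_case: str) -> RiskTier:
    """First-pass triage of a use-case label into a risk tier."""
    if use_case in PROHIBITED_USES:
        return RiskTier.UNACCEPTABLE
    if use_case in HIGH_RISK_DOMAINS:
        return RiskTier.HIGH
    if use_case in TRANSPARENCY_ONLY_USES:
        return RiskTier.LIMITED
    # Anything unrecognized falls through to the lowest tier, so an
    # unlabeled system should be reviewed by a person, not trusted here.
    return RiskTier.MINIMAL


def required_controls(use_case: str) -> list[str]:
    """Return a copy of the control checklist for the triaged tier."""
    return list(CONTROLS[triage_risk_tier(use_case)])
```

Calling `required_controls("credit")` returns the three high-risk controls, which a release pipeline could then compare against what has actually been built.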

NIST AI Risk Management Framework

The NIST AI RMF (AI 100-1) provides a voluntary framework structured around four functions: GOVERN (establish AI risk culture and policies), MAP (identify and categorize risks), MEASURE (analyze and assess risks), MANAGE (prioritize and treat risks). Unlike the EU AI Act, it's not legislation — but it's increasingly referenced in US government contracts and enterprise procurement.
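One way to make the four functions tangible is a risk register whose fields line up with them. The sketch below is an assumption about how a team might do that. The `RiskEntry` fields, the 1 to 5 scales, and the likelihood-times-impact score are common risk-matrix conventions chosen for illustration and are not prescribed by the framework.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    description: str  # MAP: the risk that was identified
    likelihood: int   # MEASURE: 1 (rare) to 5 (frequent), hypothetical scale
    impact: int       # MEASURE: 1 (minor) to 5 (severe), hypothetical scale
    treatment: str    # MANAGE: what will be done about it
    owner: str        # GOVERN: who is accountable

    @property
    def score(self) -> int:
        """Simple likelihood x impact score used for ordering."""
        return self.likelihood * self.impact


def prioritize(register: list[RiskEntry]) -> list[RiskEntry]:
    """MANAGE step: return entries ordered from highest score to lowest."""
    return sorted(register, key=lambda entry: entry.score, reverse=True)


register = [
    RiskEntry("Prompt injection via uploaded files", 3, 4,
              "Sanitize inputs, add red-team cases", "platform-team"),
    RiskEntry("Biased ranking of job applicants", 2, 5,
              "Human review of all rejections", "hr-product-team"),
]

for entry in prioritize(register):
    print(entry.score, entry.description, "->", entry.owner)
```

The point of the structure is that every identified risk ends up with a measurement, a treatment, and a named owner, rather than living in a policy document nobody checks.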

What Good Governance Actually Looks Like in Code

Model cards: document model capabilities, limitations, training data sources, and known failure modes — before deployment, not after
Audit logs: log every LLM call with prompt, response, user ID, and model version — you need this for incident investigation
Red-teaming cadence: structured adversarial testing before major model updates, documented and signed off
Human-in-the-loop gates: explicit override mechanisms for high-stakes decisions, not just soft suggestions
Drift monitoring: detect when model behavior changes post-deployment — output distribution shift is a governance risk
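
The audit log item is the easiest of these to show in code. The sketch below wraps a model call and appends one JSON line per call with the four fields named above, plus a request ID and UTC timestamp that are added here as assumptions. `AuditRecord`, `write_audit_record`, and `logged_call` are hypothetical names, and `model_fn` stands in for whatever client function actually calls your model.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, TextIO


@dataclass
class AuditRecord:
    prompt: str
    response: str
    user_id: str
    model_version: str
    # Assumed extras: a unique ID and a UTC timestamp per call.
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def write_audit_record(record: AuditRecord, sink: TextIO) -> None:
    """Append one record to the sink as a single JSON line."""
    sink.write(json.dumps(asdict(record)) + "\n")


def logged_call(
    model_fn: Callable[[str], str],
    prompt: str,
    user_id: str,
    model_version: str,
    sink: TextIO,
) -> str:
    """Call the model, write an audit record, and return the response."""
    response = model_fn(prompt)
    write_audit_record(
        AuditRecord(prompt, response, user_id, model_version), sink
    )
    return response
```

In use, `sink` could be a file opened in append mode, for example `open("audit.jsonl", "a", encoding="utf-8")`. One JSON object per line keeps the log easy to filter by `user_id` or `model_version` during an incident investigation.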

The Gaps Most Teams Miss

The most common governance failures in production AI are a missing audit trail (which makes incident investigation impossible), model cards written after launch (they capture known issues but miss deployment-time unknowns), and red-teaming treated as a checkbox rather than a continuous process. The teams that get this right treat governance artifacts as engineering deliverables with owners and deadlines.
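
Treating artifacts as deliverables can be enforced mechanically. The following is a minimal sketch of a release check, assuming each artifact is tracked with an owner and a due date. `GovernanceArtifact` and `blocking_artifacts` are hypothetical names, and the blocking rule (unowned, or incomplete and due on or before the release date) is one possible policy among many.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GovernanceArtifact:
    name: str
    owner: str  # empty string means nobody is accountable yet
    due: date
    completed: bool = False


def blocking_artifacts(
    artifacts: list[GovernanceArtifact], release_date: date
) -> list[str]:
    """Return names of artifacts that should block the release."""
    blockers = []
    for artifact in artifacts:
        unowned = not artifact.owner.strip()
        overdue = not artifact.completed and artifact.due <= release_date
        if unowned or overdue:
            blockers.append(artifact.name)
    return blockers


artifacts = [
    GovernanceArtifact("model card", "ml-team", date(2025, 3, 1), completed=True),
    GovernanceArtifact("red-team report", "security", date(2025, 3, 10)),
    GovernanceArtifact("audit log review", "", date(2025, 6, 1)),
]

blockers = blocking_artifacts(artifacts, release_date=date(2025, 3, 15))
print(blockers)
```

With these sample values the red-team report blocks because it is incomplete and past due, and the audit log review blocks because it has no owner. Wiring a check like this into CI turns a missing artifact into a failed build, which is harder to ignore than an open ticket.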
