GenAI Systems Lab Open interactive version →
AI Engineering 9 min read

Guardrails for LLMs: Input/Output Filtering in Production

How guardrail pipelines work — input classifiers, output validators, topic filters, PII redaction, and toxicity detection. What fails at scale.

Guardrails are the safety layer between your users and your model. They intercept inputs before they reach the LLM and outputs before they reach the user, filtering, transforming, or blocking content that violates your policies.

Input guardrails

def check_input(user_message: str) -> tuple[bool, str]:
    """Returns (is_allowed, reason)"""

    # 1. PII detection (fast regex + NER)
    if contains_pii(user_message):
        return False, "pii_detected"

    # 2. Topic relevance (small classifier, <10ms)
    if not is_on_topic(user_message, allowed_topics=["product", "support"]):
        return False, "off_topic"

    # 3. Injection risk (embedding similarity to known attacks)
    if injection_score(user_message) > 0.85:
        return False, "injection_detected"

    return True, "ok"

Output guardrails

Architecture: where to place guardrails

Guardrails can run synchronously (blocking — adds latency) or asynchronously (non-blocking — you deliver the response and log violations for review). For safety-critical applications, synchronous input + output checks are mandatory. For high-volume consumer applications, async output checking with human review is more practical.

The fastest guardrails run in 5–20ms (regex, small classifiers). The most accurate run in 100–500ms (LLM-based judges). Design your pipeline to run fast checks first and only invoke expensive checks when the cheap ones raise flags.

Off-the-shelf vs. custom

OptionLatencyAccuracyCustomisability
Llama Guard (Meta)50–200msGood for common categoriesFine-tuneable
Azure Content Safety100–300msStrong on CSAM, violence, hateLimited
Guardrails AIVariesModular, schema validationHigh — composable
NeMo Guardrails100–400msDialogue flows + policiesHigh
Custom classifier5–50msBest for domain-specificFull control

The cost of guardrails — latency budget

Guardrails add latency. A full synchronous pipeline (input check → LLM → output check) can add 100–600ms depending on which classifiers you use. For real-time chat this is often unacceptable. The solution: run fast synchronous checks (regex, small classifier, <20ms) and offload slow checks (LLM judge, NLI model) to async post-processing that logs violations for review. Only synchronously block on high-confidence, high-severity signals.

Guardrail typeLatencySync or async?Use for
Regex / pattern match<1msSyncPII, obvious injection patterns
Small classifier (DistilBERT)5–20msSyncToxicity, topic filter, jailbreak
Llama Guard50–200msSync (critical) / async (standard)Safety categories
LLM-as-judge300–800msAsync onlyHallucination check, faithfulness

Explore guardrails in Concepts →: See input and output filtering in action on the platform.

→ Interactive: The AI Guardrails module in Systems Lab walks through guardrail patterns, failure modes, and decision frameworks.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →