GenAI Systems Lab Open interactive version →
AI Engineering 9 min read

Bias in LLM Outputs: Sources, Types, and What You Can Detect

Training data bias, demographic representation, positional bias in RAG, and confirmation bias in reasoning. How to surface and measure these in your system.

LLMs don't generate bias from nowhere. They learn it from us — from the text we wrote, the decisions we recorded, the stories we told. The uncomfortable truth is that an LLM trained on the internet will reflect the internet: its brilliance and its prejudices, its expertise and its blind spots.

This isn't a reason to not build with LLMs. It's a reason to build with your eyes open — to know the types of bias, where they come from, and what you can actually detect and mitigate versus what requires ongoing human oversight.

Types of bias in LLM outputs

TypeWhat it looks likeExample
Representation biasUnder- or over-representation of groups in training dataModel defaults to male pronouns for 'engineer', female for 'nurse'
Stereotype amplificationModel exaggerates group patterns beyond what training data showsConsistently associates certain ethnicities with crime in creative writing
Performance disparityModel quality degrades for certain languages/dialects/accentsWeaker reasoning in African American Vernacular English vs. Standard American English
Allocation biasModel systematically advantages or disadvantages groups in decisionsResume screener rates equivalent CVs lower for certain names
SycophancyModel agrees with the user's apparent beliefs regardless of truthChanges its assessment of a political claim when told which party the user supports
Recency/salience biasOver-weights recent or frequently-discussed eventsAssumes every business is a tech startup if context is ambiguous

Where bias enters

Training data

The web over-represents English, over-represents wealthy countries, over-represents male voices in certain domains, and contains historical text from periods with explicit discrimination. A model trained on this data learns these patterns as features, not bugs — unless explicit effort is made to counteract them.

RLHF and fine-tuning

Human feedback is not neutral. Annotators have their own cultural backgrounds, language preferences, and implicit assumptions about what a 'good' answer looks like. If the annotator pool is not diverse, RLHF can encode a narrow view of quality. Some alignment research suggests RLHF may amplify sycophancy — the model learns to please, not to be accurate.

Your prompt and context

Priming effects are real. Prompts that mention certain groups, use certain frames, or carry implicit assumptions shift model outputs measurably. An evaluation task described as 'written by a student in a disadvantaged school' generates harsher feedback than the identical essay described neutrally.

What you can detect

# Test if model treats equivalent CVs differently based on perceived demographics
NAMES_SET_A = ["Emily Walsh", "Michael Johnson", "Sarah Chen"]
NAMES_SET_B = ["Lakisha Washington", "Jamal Williams", "María García"]

def audit_bias(resume_template, evaluation_prompt):
    results = {}
    for name_a, name_b in zip(NAMES_SET_A, NAMES_SET_B):
        resume_a = resume_template.replace("{NAME}", name_a)
        resume_b = resume_template.replace("{NAME}", name_b)

        score_a = llm(evaluation_prompt + resume_a)
        score_b = llm(evaluation_prompt + resume_b)

        results[f"{name_a} vs {name_b}"] = {
            "score_a": extract_score(score_a),
            "score_b": extract_score(score_b),
            "delta": extract_score(score_a) - extract_score(score_b)
        }
    return results

The 'Are Emily and Lakisha scored the same?' test is not a comprehensive bias audit. It catches one dimension of one type of bias. Real bias auditing is multi-dimensional, ongoing, and requires domain expertise. A passing pairwise test does not mean your system is unbiased.

Mitigations that actually work

Run a bias audit →: Test your prompts for systematic disparities in the Playground module.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →