GenAI Systems Lab

Probabilistic Graphical Models: Bayesian Networks, MRFs, Latent Variable Models, and EM

Bayesian networks factorize the joint distribution along a DAG. D-separation reads conditional independence off the graph, and collider bias shows how conditioning on a common effect creates spurious correlation. MRFs and CRFs cover undirected models and sequence labeling. Latent variable models (GMM, LDA, VAE) raise the question of when to use EM vs. variational inference.

Probabilistic graphical models (PGMs) are the language of structured uncertainty. When your data has known conditional independence structure — a document topic influences its words, a user's intent influences their query — PGMs let you encode that structure into the model rather than forcing a neural net to learn it from scratch.

Bayesian Networks: Directed Graphical Models

A Bayesian network encodes the joint distribution P(X1, ..., Xn) as a product of conditional distributions, one per node given its parents: P(X1, ..., Xn) = ∏_i P(Xi | Parents(Xi)). Equivalently, each node is conditionally independent of its non-descendants given its parents.

# Joint distribution factorizes along the graph
# For a simple chain A → B → C:
# P(A, B, C) = P(A) × P(B|A) × P(C|B)

# D-separation: X and Y are conditionally independent given Z
# if every path between X and Y is blocked once Z is observed
# Three patterns:
# Chain:    X → Z → Y | X ⊥ Y | Z  (observing Z blocks the path)
# Fork:     X ← Z → Y | X ⊥ Y | Z  (observing Z blocks the path)
# Collider: X → Z ← Y | X ⊥ Y marginally, but X ⊥̸ Y | Z
#           (observing Z, or any descendant of Z, OPENS the path)
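
The chain case can be checked numerically. The following is a minimal sketch, assuming three binary variables and made-up conditional probability tables (the numbers and the helper names `chain_blocked_given_b` and `marginally_independent` are illustrative, not from any library): it builds the joint from the factorization and tests independence by enumeration.

```python
import numpy as np

# Hypothetical CPTs for binary variables in the chain A -> B -> C.
p_a = np.array([0.7, 0.3])                 # P(A)
p_b_given_a = np.array([[0.9, 0.1],        # P(B | A=0)
                        [0.2, 0.8]])       # P(B | A=1)
p_c_given_b = np.array([[0.6, 0.4],        # P(C | B=0)
                        [0.1, 0.9]])       # P(C | B=1)

# joint[a, b, c] = P(A=a) * P(B=b | A=a) * P(C=c | B=b)
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]


def chain_blocked_given_b(joint):
    """True if P(A, C | B=b) = P(A | B=b) * P(C | B=b) for every b."""
    for b in range(joint.shape[1]):
        p_ac = joint[:, b, :] / joint[:, b, :].sum()
        product = np.outer(p_ac.sum(axis=1), p_ac.sum(axis=0))
        if not np.allclose(p_ac, product):
            return False
    return True


def marginally_independent(joint):
    """True if P(A, C) = P(A) * P(C) with B summed out."""
    p_ac = joint.sum(axis=1)
    return bool(np.allclose(p_ac, np.outer(p_ac.sum(axis=1), p_ac.sum(axis=0))))


print("A indep. of C given B:", chain_blocked_given_b(joint))
print("A indep. of C marginally:", marginally_independent(joint))
```

With these tables, A and C are dependent until B is observed, which is the blocking behavior the chain pattern describes.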

The collider pattern is counterintuitive: X and Y are marginally independent, but conditioning on their common effect Z creates dependence. Example: Disease and Injury are independent causes of Hospitalization. Among hospitalized patients, Disease and Injury become negatively correlated: if a hospitalized person is not injured, disease becomes the more likely explanation.
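
The hospitalization example can be worked through by exact enumeration. This is a sketch under assumed numbers: the priors and the table `p_h1` below are invented for illustration, and `p_disease` is a hypothetical helper.

```python
import numpy as np

# Hypothetical numbers: Disease and Injury are independent causes of Hospitalization.
p_d = np.array([0.90, 0.10])    # P(Disease = 0), P(Disease = 1)
p_i = np.array([0.95, 0.05])    # P(Injury = 0),  P(Injury = 1)
# p_h1[d, i] = P(Hospitalized = 1 | Disease = d, Injury = i)
p_h1 = np.array([[0.01, 0.60],
                 [0.50, 0.90]])

# joint[d, i, h] built from the collider factorization P(D) P(I) P(H | D, I)
joint = np.empty((2, 2, 2))
joint[:, :, 1] = p_d[:, None] * p_i[None, :] * p_h1
joint[:, :, 0] = p_d[:, None] * p_i[None, :] * (1.0 - p_h1)


def p_disease(joint, h=None, i=None):
    """P(Disease = 1 | evidence) by enumeration; None means 'not observed'."""
    table = joint
    if h is not None:
        table = table[:, :, [h]]
    if i is not None:
        table = table[:, [i], :]
    return table[1].sum() / table.sum()


prior = p_disease(joint)                          # no evidence
given_injury_only = p_disease(joint, i=1)         # H unobserved: path stays closed
given_h = p_disease(joint, h=1)                   # hospitalized
given_h_not_injured = p_disease(joint, h=1, i=0)  # hospitalized, not injured
given_h_injured = p_disease(joint, h=1, i=1)      # hospitalized, injured

print(prior, given_injury_only, given_h, given_h_not_injured, given_h_injured)
```

Without observing Hospitalization, learning about Injury leaves the probability of Disease unchanged. Once Hospitalization is observed, ruling out Injury raises the probability of Disease and confirming Injury lowers it.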

Markov Random Fields: Undirected Graphical Models

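MRFs encode non-negative potential functions over cliques of connected variables (pairwise potentials over edges are the most common special case). There is no direction, which makes them a good fit for symmetric relationships like pixel neighborhoods in images or word co-occurrences in text. The joint distribution is a product of potential functions over cliques, normalized by the partition function Z. Computing Z means summing over every joint configuration, which is usually intractable for large or densely connected graphs.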
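
To make the role of the partition function concrete, here is a small sketch of a pairwise MRF on a three-node chain of binary variables. The graph, the potential table `psi`, and the function names are assumptions chosen for illustration; the brute-force sum is only feasible because the model is tiny.

```python
import itertools
import numpy as np

# Hypothetical pairwise MRF: chain x0 - x1 - x2 of binary variables.
edges = [(0, 1), (1, 2)]
# psi[a, b] > 0 rewards neighbors that agree (an illustrative choice).
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])


def unnormalized(x):
    """Product of edge potentials for one configuration x."""
    return float(np.prod([psi[x[i], x[j]] for i, j in edges]))


def partition_function(n_vars):
    """Brute-force Z: one term per configuration, 2**n_vars terms in total."""
    configs = itertools.product([0, 1], repeat=n_vars)
    return sum(unnormalized(x) for x in configs)


Z = partition_function(3)
p_all_zero = unnormalized((0, 0, 0)) / Z   # P(x0=0, x1=0, x2=0)
print(Z, p_all_zero)
```

The loop in `partition_function` visits 2**n configurations, so the same code on a few hundred pixels would never finish. That exponential cost is why approximate inference is the norm for MRFs.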

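Where MRFs appear in ML: CRFs (Conditional Random Fields) are discriminative MRFs that model P(labels | inputs) directly, and they are widely used for sequence labeling (NER, POS tagging). In a linear-chain CRF, the transition matrix holds a learned score for each pair of adjacent labels, encoding how compatible it is for one label to follow another.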
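
The effect of the transition matrix is easiest to see at decoding time. Below is a minimal Viterbi sketch for a linear-chain CRF, assuming per-position emission scores are already given (in practice they would come from a feature function or a neural encoder). The label set O/B/I and all score values are made up for illustration.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Highest-scoring label path for a linear-chain CRF.

    emissions:   (T, K) score of each label at each position
    transitions: (K, K) score of moving from label i (row) to label j (column)
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = best path ending in i at t-1, then i -> j, then emit j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())


# Hypothetical labels: 0 = O, 1 = B, 2 = I. O -> I is heavily penalized.
emissions = np.array([[2.0, 0.5, 0.0],
                      [0.0, 1.0, 1.5],
                      [0.5, 0.0, 1.0]])
transitions = np.array([[0.5, 0.0, -10.0],
                        [0.0, -1.0, 1.0],
                        [0.0, 0.0, 0.5]])

path, score = viterbi(emissions, transitions)
greedy = emissions.argmax(axis=1).tolist()   # per-position argmax, ignores transitions
print(path, score, greedy)
```

Picking the best label independently at each position yields O, I, I, which contains the penalized O → I transition. Viterbi instead returns O, B, I, because the transition scores make that path higher-scoring overall.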

Latent Variable Models

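Latent variable models assume observed data X is generated from unobserved (latent) variables Z. Learning requires marginalizing over Z, that is, summing or integrating over all possible values of the latent variables. Exact marginalization is usually intractable: a sum over discrete Z grows exponentially with the number of latent variables, and an integral over continuous Z rarely has a closed form.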

The EM Algorithm

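For latent variable models where the posterior P(Z|X) is tractable, EM converges to a stationary point (typically a local maximum) of the marginal likelihood P(X|θ), and no iteration decreases that likelihood. E-step: compute Q(Z) = P(Z|X, θ_old). M-step: update θ to maximize E_Q[log P(X,Z|θ)]. Repeat until the likelihood stops improving.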
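
The reason the two steps cannot lower the likelihood follows from a standard decomposition of the log marginal likelihood. The derivation below is a sketch in the notation used above; the symbol \mathcal{L} for the lower bound is introduced here for convenience.

```latex
\begin{align*}
\log P(X \mid \theta)
  &= \underbrace{\mathbb{E}_{Q}\!\left[\log \frac{P(X, Z \mid \theta)}{Q(Z)}\right]}_{\mathcal{L}(Q,\,\theta)}
   + \mathrm{KL}\!\left(Q(Z) \,\|\, P(Z \mid X, \theta)\right) \\
\text{E-step: } & Q(Z) = P(Z \mid X, \theta_{\text{old}})
   \;\Rightarrow\; \mathrm{KL} = 0, \quad
   \mathcal{L}(Q, \theta_{\text{old}}) = \log P(X \mid \theta_{\text{old}}) \\
\text{M-step: } & \theta_{\text{new}} = \arg\max_{\theta}\, \mathbb{E}_{Q}\!\left[\log P(X, Z \mid \theta)\right]
   \;\Rightarrow\; \mathcal{L}(Q, \theta_{\text{new}}) \ge \mathcal{L}(Q, \theta_{\text{old}}) \\
\text{Hence: } & \log P(X \mid \theta_{\text{new}})
   \ge \mathcal{L}(Q, \theta_{\text{new}})
   \ge \log P(X \mid \theta_{\text{old}})
\end{align*}
```

The M-step can ignore the entropy of Q because it does not depend on θ, so maximizing E_Q[log P(X,Z|θ)] is the same as maximizing the bound. The first inequality in the last line holds because the KL term is non-negative.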

# EM for Gaussian Mixture Model (sketch)
# E-step: compute responsibilities r_nk = P(z_n = k | x_n, params)
# r_nk = π_k * N(x_n | μ_k, Σ_k) / [ sum_j π_j * N(x_n | μ_j, Σ_j) ]

# M-step: update parameters using weighted MLE
# N_k = sum_n r_nk  (effective number of points in cluster k)
# π_k = N_k / N
# μ_k = (sum_n r_nk * x_n) / N_k
# Σ_k = (sum_n r_nk * (x_n - μ_k)(x_n - μ_k)^T) / N_k   (using the updated μ_k)
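
The sketch above translates fairly directly into NumPy. The version below is one possible implementation, not a reference one: the function names, the random-data-point initialization, the log-space E-step, and the small `reg` term added to each covariance for numerical safety are all choices made for this example.

```python
import numpy as np


def log_gaussian(X, mu, cov):
    """Row-wise log N(x | mu, cov) for X of shape (N, D)."""
    D = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("nd,nd->n", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)


def em_gmm(X, K, n_iter=50, seed=0, reg=1e-6):
    """EM for a Gaussian mixture; returns (pi, mu, cov, log-likelihood trace)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]       # init means at data points
    base_cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(D)
    cov = np.stack([base_cov] * K)
    trace = []
    for _ in range(n_iter):
        # E-step: r_nk is proportional to pi_k * N(x_n | mu_k, cov_k), in log space
        log_p = np.stack(
            [np.log(pi[k]) + log_gaussian(X, mu[k], cov[k]) for k in range(K)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1)  # log of the denominator
        r = np.exp(log_p - log_norm[:, None])
        trace.append(log_norm.sum())                   # log-likelihood before update
        # M-step: weighted maximum likelihood updates
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(D)
    return pi, mu, cov, np.array(trace)


# Synthetic data: two 2-D clusters (made up for this example).
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal([-3.0, 0.0], 0.5, size=(200, 2)),
    data_rng.normal([3.0, 0.0], 0.5, size=(200, 2)),
])
pi, mu, cov, trace = em_gmm(X, K=2)
print(pi, mu, trace[0], trace[-1])
```

The recorded log-likelihood trace should be non-decreasing up to floating-point noise, which is a useful sanity check when debugging an EM implementation. Because EM only finds a local optimum, the result depends on initialization, so running from several seeds and keeping the best final log-likelihood is a common precaution.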
