Model Registry With MLflow: Versioning, Stage Transitions, and Audit Trails
What a model registry actually does and why 'models folder in S3' is not a registry. MLflow stage transitions (Staging → Production → Archived), loading by stage name in serving code, and what metadata belongs in every registered version.
What a Model Registry Actually Does
A model registry is a versioned catalog of trained models. It answers: which model is in production right now, what version is it, when was it trained, on what data, with what evaluation metrics, and who approved the deployment? Without a registry, these questions are answered by Slack archaeology.
The registry is not the serving infrastructure. It's the metadata and artifact store that sits between training and deployment. Models are promoted through stages (Staging → Production → Archived) with explicit transitions that create an audit trail.
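As a rough illustration of that audit trail, the sketch below models a registry entry as a plain Python dataclass, independent of MLflow. The names `RegistryEntry`, `Transition`, and `approved_by` are hypothetical, chosen only to show that each stage change is appended as a record rather than overwriting state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stage names mirror the ones MLflow uses; "None" is the stage of a fresh version.
ALLOWED_STAGES = ("None", "Staging", "Production", "Archived")


@dataclass(frozen=True)
class Transition:
    """One immutable audit record: who moved the version, from where, to where."""
    from_stage: str
    to_stage: str
    approved_by: str
    at: datetime


@dataclass
class RegistryEntry:
    name: str
    version: int
    stage: str = "None"
    history: list = field(default_factory=list)

    def transition(self, to_stage: str, approved_by: str) -> None:
        """Change stage and append an audit record; unknown stages are rejected."""
        if to_stage not in ALLOWED_STAGES:
            raise ValueError(f"unknown stage: {to_stage!r}")
        self.history.append(
            Transition(self.stage, to_stage, approved_by, datetime.now(timezone.utc))
        )
        self.stage = to_stage


entry = RegistryEntry(name="churn_predictor", version=7)
entry.transition("Staging", approved_by="ci-eval")
entry.transition("Production", approved_by="avinash")
```

After these two calls, `entry.history` holds one record per transition, which is the kind of information a real registry keeps alongside the artifact.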
MLflow Model Registry — Core Workflow
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# 1. TRAINING: log run + register model
with mlflow.start_run() as run:
    # ... train model ...

    # Log parameters and metrics
    mlflow.log_params({"n_estimators": 200, "max_depth": 8, "min_samples_leaf": 10})
    mlflow.log_metrics({"auc_roc": 0.91, "precision_at_k": 0.74, "ndcg": 0.82})

    # Log dataset fingerprint to prevent future "what data was used?" questions
    # (training_data_sha256 is assumed to be computed beforehand from the training set)
    mlflow.log_param("training_data_hash", training_data_sha256)
    mlflow.log_param("training_cutoff", "2024-03-01")

    # Log the fitted estimator so the runs:/ URI below has an artifact to point at
    # (`model` is assumed to be the estimator produced by the elided training step)
    mlflow.sklearn.log_model(model, "model")

# Register model to registry (creates version N)
model_uri = f"runs:/{run.info.run_id}/model"
registered = mlflow.register_model(model_uri, "churn_predictor")
new_version = registered.version  # e.g. 7; avoids hardcoding the version number

# 2. EVALUATION: transition to Staging after passing eval
# Note: stage transitions are deprecated as of MLflow 2.9 in favor of model version aliases.
client = MlflowClient()
client.transition_model_version_stage(
    name="churn_predictor",
    version=new_version,
    stage="Staging",
    archive_existing_versions=False,  # don't auto-archive current Staging
)

# 3. LOAD FROM STAGING for integration tests
model = mlflow.sklearn.load_model("models:/churn_predictor/Staging")

# 4. PROMOTE to Production after approval
client.transition_model_version_stage(
    name="churn_predictor",
    version=new_version,
    stage="Production",
    archive_existing_versions=True,  # archive old Production version
)
client.update_model_version(
    name="churn_predictor",
    version=new_version,
    description="Sprint 4 model. AUC 0.91 vs 0.88 previous. Approved by @avinash 2024-03-15.",
)
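Stages are deprecated in MLflow 2.9 and later in favor of model version aliases, so the same promotion can also be expressed by moving an alias. The sketch below is one possible shape for that. `set_registered_model_alias` and the `models:/<name>@<alias>` URI form are part of MLflow's registry API; the helper names and the `champion` alias are illustrative assumptions, and the client is passed in so the logic can be exercised without a tracking server.

```python
def alias_uri(model_name: str, alias: str) -> str:
    """Build the registry URI that resolves an alias to its current version."""
    return f"models:/{model_name}@{alias}"


def promote(client, model_name: str, version: str, alias: str = "champion") -> str:
    """Point `alias` at `version` and return the URI serving code should load.

    `client` is expected to be an mlflow.tracking.MlflowClient, or a test double
    exposing the same set_registered_model_alias(name, alias, version) method.
    """
    client.set_registered_model_alias(model_name, alias, version)
    return alias_uri(model_name, alias)


# Usage against a real registry (assumes an MLflow version with alias support):
#   import mlflow.sklearn
#   from mlflow.tracking import MlflowClient
#   uri = promote(MlflowClient(), "churn_predictor", "7")
#   model = mlflow.sklearn.load_model(uri)
```

Because an alias points at exactly one version of a model, reassigning it moves it off the previous version, which plays the role of `archive_existing_versions=True` in the stage-based flow.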
Loading Production Model in Serving Code
# Serving code always loads by stage name, not version number.
# When you promote v8 to Production, serving picks it up on the next restart.
import mlflow.sklearn
from fastapi import FastAPI

app = FastAPI()


def load_production_model(model_name: str):
    """Load current production model, decoupled from version numbers."""
    return mlflow.sklearn.load_model(f"models:/{model_name}/Production")


# Warm model on startup
model = load_production_model("churn_predictor")


# In FastAPI
@app.post("/predict")
async def predict(features: dict):
    # Assumes the request JSON lists features in the same order used at training time
    row = list(features.values())
    return {"score": float(model.predict_proba([row])[0, 1])}
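One fragile point in the handler above is `list(features.values())`, which depends on the order of keys in the request body. A possible hardening, sketched below, maps the payload onto a fixed column order before calling the model. `FEATURE_ORDER` and its feature names are hypothetical placeholders; in practice the list would come from whatever the training pipeline recorded.

```python
# Hypothetical feature names; the real order must match the training columns.
FEATURE_ORDER = ("tenure_months", "monthly_charges", "support_tickets")


def to_feature_row(features: dict, order=FEATURE_ORDER) -> list:
    """Return feature values in training order, rejecting malformed payloads."""
    missing = [name for name in order if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    unexpected = sorted(set(features) - set(order))
    if unexpected:
        raise ValueError(f"unexpected features: {unexpected}")
    return [float(features[name]) for name in order]
```

Inside the handler, the prediction call would then become `model.predict_proba([to_feature_row(features)])[0, 1]`, so a client that sends keys in a different order still gets a correct score.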
What to Store in the Registry
- Hyperparameters: every param used in training, not just the ones you tuned.
- Evaluation metrics: all metrics on all splits (train, val, test). Not just the headline number.
- Data lineage: hash of training dataset, cutoff date, version of preprocessing code.
- Model artifact: the serialized model file (pickle, ONNX, SavedModel).
- Inference dependencies: conda.yaml or requirements.txt that pins every dependency version.
- Approval trail: who approved promotion, when, and with what justification.
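Two items from this list lend themselves to a small helper: the dataset hash and a check that nothing required is missing before a version is registered. The following is a minimal sketch using only the standard library. The key names in `REQUIRED_KEYS` are assumptions for illustration, not an MLflow convention.

```python
import hashlib

# Hypothetical set of metadata keys a team might require on every version.
REQUIRED_KEYS = (
    "training_data_hash",
    "training_cutoff",
    "preprocessing_version",
    "approved_by",
)


def fingerprint_file(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large datasets do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_metadata(metadata: dict) -> list:
    """Return the required keys that are absent or empty, in REQUIRED_KEYS order."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]
```

The result of `fingerprint_file` is the sort of value that could be passed as `training_data_hash` when logging a run, and a non-empty result from `missing_metadata` is a reasonable reason to block registration.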
The minimum viable registry is a spreadsheet with model version, git hash, training date, and AUC. It's embarrassing but it's better than nothing. Migrate to MLflow when the spreadsheet becomes a source of arguments.
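For teams at the spreadsheet stage, even that can be made append-only with a few lines of code. The sketch below keeps the four columns named above in a CSV file. The function names and the file layout are illustrative assumptions.

```python
import csv
from pathlib import Path

FIELDS = ["model_version", "git_hash", "training_date", "auc"]


def append_version(path, row: dict) -> None:
    """Append one model version to the CSV, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def latest_version(path) -> dict:
    """Return the most recently appended row (all values come back as strings)."""
    with Path(path).open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    if not rows:
        raise LookupError("registry file has no versions yet")
    return rows[-1]
```

Appending rather than editing rows preserves history, which is the one property of a real registry that a hand-edited spreadsheet tends to lose first.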
Registry Alternatives