Feature Stores: Solving Training-Serving Skew and Point-in-Time Correctness

The training-serving skew problem that feature stores exist to solve. Offline vs. online store architecture, point-in-time correct joins to prevent leakage, materialization cadence, and when you genuinely don't need a feature store. Feast code throughout.

Why Feature Stores Exist

Imagine building a churn prediction model. The data scientist computes rolling 30-day engagement features in a Jupyter notebook. The model ships to production. The engineer re-implements the same rolling window logic in the serving stack — in a different language, with a slightly different timestamp handling convention. Six months later you debug why the production model underperforms the offline evaluation. It's the feature skew.

A feature store solves three problems: (1) the training-serving skew from re-implementing features in multiple places; (2) the latency of computing expensive features at serving time; (3) the discoverability and reuse of features across teams.
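To make the first problem concrete, here is a minimal sketch of how skew can creep in. The session data, the function names, and the specific window-boundary mismatch are all invented for illustration; real skew often comes from subtler differences such as time zones or late-arriving events.

```python
from datetime import datetime, timedelta

# Hypothetical session start times for one user (illustrative data only).
sessions = [
    datetime(2024, 1, 31, 0, 0),   # exactly 30 days before the cutoff
    datetime(2024, 2, 10, 9, 30),
    datetime(2024, 2, 25, 18, 0),
]
as_of = datetime(2024, 3, 1, 0, 0)


def rolling_30d_sessions_training(sessions, as_of):
    """Notebook version: window is [as_of - 30d, as_of)."""
    start = as_of - timedelta(days=30)
    return sum(start <= s < as_of for s in sessions)


def rolling_30d_sessions_serving(sessions, as_of):
    """Re-implemented version: window is (as_of - 30d, as_of]."""
    start = as_of - timedelta(days=30)
    return sum(start < s <= as_of for s in sessions)


train_value = rolling_30d_sessions_training(sessions, as_of)
serve_value = rolling_30d_sessions_serving(sessions, as_of)
print(train_value, serve_value)  # the two paths disagree for the same user
```

Both functions look like "sessions in the last 30 days", yet they return different values for the same input because they treat the window boundary differently. Defining the feature once and reading it from both paths is what removes this class of bug.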

The Two Stores

Most feature stores pair an offline store with an online store. They are different systems optimized for different access patterns.

Offline store: a column-oriented storage system (Parquet on S3, BigQuery, Redshift) optimized for bulk historical reads. Used for training data generation: 'for every (user, timestamp) pair in my training set, give me the feature values as of that timestamp'. Must support point-in-time correct joins to avoid leakage: a feature value computed after the label event must not appear in that event's training row.

Online store: a key-value store (Redis, DynamoDB, Cassandra) optimized for low-latency single-entity lookups. Used at serving time: 'give me the current feature vector for user_id=12345 in <5ms'. Contains only the latest feature values, not history.
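The difference in access patterns can be sketched in plain Python. This is a toy model under stated assumptions, not how any particular feature store is implemented: a list of rows stands in for the offline store, a dict stands in for the online store, and the names `offline_lookup` and `online_lookup` are hypothetical.

```python
from datetime import datetime

# Offline stand-in: full history, one row per (entity, timestamp).
offline_rows = [
    {"user_id": 12345, "ts": datetime(2024, 1, 1), "rolling_30d_sessions": 4},
    {"user_id": 12345, "ts": datetime(2024, 1, 8), "rolling_30d_sessions": 7},
    {"user_id": 67890, "ts": datetime(2024, 1, 5), "rolling_30d_sessions": 2},
]

# Online stand-in: only the latest row per entity, keyed for direct lookup.
online_store = {
    12345: {"ts": datetime(2024, 1, 8), "rolling_30d_sessions": 7},
    67890: {"ts": datetime(2024, 1, 5), "rolling_30d_sessions": 2},
}


def offline_lookup(rows, user_id, as_of):
    """Historical read: latest row for user_id with ts <= as_of, else None."""
    candidates = [r for r in rows if r["user_id"] == user_id and r["ts"] <= as_of]
    return max(candidates, key=lambda r: r["ts"], default=None)


def online_lookup(store, user_id):
    """Serving read: a single key lookup that returns the current value."""
    return store.get(user_id)


past_row = offline_lookup(offline_rows, 12345, datetime(2024, 1, 3))
current_row = online_lookup(online_store, 12345)
print(past_row["rolling_30d_sessions"], current_row["rolling_30d_sessions"])
```

The offline read can answer "what did we know on January 3rd?", while the online read can only answer "what do we know now?", which is the trade it makes for a constant-time lookup.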

# Feast (open-source feature store) — minimal working pattern
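from datetime import datetime, timedelta

from feast import BigQuerySource, Entity, FeatureStore, FeatureView, Field
from feast.types import Float32, Int64

# Offline source. The table name is a placeholder, not a real project.
bigquery_source = BigQuerySource(
    table="my_project.my_dataset.user_engagement",
    timestamp_field="event_timestamp",
)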

# Define entity (the 'key' concept)
user = Entity(name="user_id", description="User identifier")

# Define feature view — specifies source, TTL, and schema
user_engagement = FeatureView(
    name="user_engagement",
    entities=[user],
    ttl=timedelta(days=7),   # max age of a feature value: joins look back at most 7 days
    schema=[
        Field(name="rolling_30d_sessions",   dtype=Int64),
        Field(name="avg_session_duration_s",  dtype=Float32),
        Field(name="days_since_last_login",   dtype=Int64),
    ],
    source=bigquery_source,   # offline source
)

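# Materialization: copy the latest offline values into the online store.
# Assumes the definitions above have been registered with `feast apply`.
store = FeatureStore(repo_path=".")
store.materialize_incremental(end_date=datetime.utcnow())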

# Online retrieval at serving time (<5ms)
feature_vector = store.get_online_features(
    features=["user_engagement:rolling_30d_sessions",
              "user_engagement:avg_session_duration_s"],
    entity_rows=[{"user_id": 12345}]
).to_dict()
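Conceptually, incremental materialization copies the rows that arrived since the previous run into the online store, and how often it runs bounds how stale served features can be. The sketch below is a toy model of that idea, not Feast's implementation: `materialize_window`, `staleness`, and the data are hypothetical.

```python
from datetime import datetime, timedelta

# Toy offline history (illustrative values only).
offline_rows = [
    {"user_id": 12345, "ts": datetime(2024, 1, 1), "rolling_30d_sessions": 4},
    {"user_id": 12345, "ts": datetime(2024, 1, 8), "rolling_30d_sessions": 7},
    {"user_id": 67890, "ts": datetime(2024, 1, 5), "rolling_30d_sessions": 2},
]


def materialize_window(rows, online, start, end):
    """Upsert rows with start < ts <= end, oldest first so the newest wins.

    Returns `end`, which the caller keeps as the watermark for the next run.
    """
    for row in sorted(rows, key=lambda r: r["ts"]):
        if start < row["ts"] <= end:
            online[row["user_id"]] = row
    return end


def staleness(online, user_id, now):
    """Age of the value that would be served for user_id at time `now`."""
    return now - online[user_id]["ts"]


online = {}

# First run covers everything up to Jan 6.
watermark = materialize_window(offline_rows, online, datetime.min, datetime(2024, 1, 6))
first_run_value = online[12345]["rolling_30d_sessions"]

# Second run only scans rows after the previous watermark.
watermark = materialize_window(offline_rows, online, watermark, datetime(2024, 1, 10))
second_run_value = online[12345]["rolling_30d_sessions"]

age = staleness(online, 67890, datetime(2024, 1, 10))
print(first_run_value, second_run_value, age)
```

Between the two runs the online store keeps serving the older value for user 12345, so the materialization cadence should be chosen against how quickly the feature changes and how much staleness the model tolerates.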

Point-in-Time Correct Joins (Avoiding Leakage)

When generating training data, you must retrieve the feature value that was actually available at the time of each training event, not the current value. If a user churned on March 1st, their 30-day rolling engagement should be computed over the 30 days leading up to March 1st, not taken from the latest batch run.

import pandas as pd

# Training events: each row is (user_id, event_timestamp, label)
training_events = pd.DataFrame({
    "user_id":         [1001, 1002, 1001, 1003],
    "event_timestamp": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-01", "2024-02-10"]),
    "churned":         [0, 1, 0, 1],
})

# Point-in-time correct join: Feast handles this automatically
training_data = store.get_historical_features(
    entity_df=training_events,      # must include an event_timestamp column, used for the PIT join
    features=["user_engagement:rolling_30d_sessions",
              "user_engagement:avg_session_duration_s"],
).to_df()
# Each row now has the most recent feature value at or before its event_timestamp
# (and within the feature view's ttl), not the latest value
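What a point-in-time join does can be sketched outside Feast with pandas `merge_asof`. This is an illustration of the technique rather than Feast's internal query: the `feature_history` table and its values are invented, and `tolerance` is used here as a rough analogue of a feature view's ttl.

```python
import pandas as pd

events = pd.DataFrame({
    "user_id": [1001, 1002, 1001],
    "event_timestamp": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-01"]),
    "churned": [0, 1, 0],
})

# Hypothetical feature history: one row per batch computation per user.
feature_history = pd.DataFrame({
    "user_id": [1001, 1001, 1001, 1002],
    "feature_timestamp": pd.to_datetime(
        ["2024-01-10", "2024-01-28", "2024-02-05", "2024-01-18"]
    ),
    "rolling_30d_sessions": [12, 9, 3, 5],
})

# Point-in-time join: for each event, take the latest feature row for the same
# user whose timestamp is at or before the event, looking back at most 7 days.
pit = pd.merge_asof(
    events.sort_values("event_timestamp"),
    feature_history.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="user_id",
    direction="backward",
    tolerance=pd.Timedelta(days=7),
)

# Naive join for contrast: attach each user's most recent value to every event.
latest = feature_history.sort_values("feature_timestamp").groupby("user_id").tail(1)
naive = events.merge(latest[["user_id", "rolling_30d_sessions"]], on="user_id", how="left")

print(pit[["user_id", "event_timestamp", "rolling_30d_sessions"]])
print(naive[["user_id", "event_timestamp", "rolling_30d_sessions"]])
```

In the naive result, both events for user 1001 receive the value computed on February 5th, which is after either event and therefore leaks future information into training. The as-of join gives each event only what was known at the time.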

When You Don't Need a Feature Store

Feature stores add operational complexity. They're worth it when: (a) you have >3 models sharing features, (b) you've already been burned by training-serving skew, or (c) you need sub-5ms feature retrieval for a high-traffic serving path. For a single model with batch predictions, a well-named column in a database is often enough.

If you do need one, common options include:

Feast (open source, flexible backend)
Tecton (managed, enterprise)
Hopsworks (open source, includes MLflow-like experiment tracking)
Vertex AI Feature Store (GCP)
SageMaker Feature Store (AWS)
