Managing AI Engineers: Performance, Growth, and Retention in ML Teams
AI engineers stagnate at three transitions. Calibrate on process (experiment design, failure documentation), not outcomes. Watch for the graveyard problem: models that get trained but never deployed. Growth conversations owned by the engineer outperform manager-directed plans.
AI engineers are not interchangeable with software engineers. They have different growth curves, different motivations, and different retention risks. Managing them well requires understanding what makes the work meaningful to them — and what destroys that meaning.
The AI Engineer Growth Model
AI engineers typically stagnate at one of three transitions: (1) moving from running experiments to designing them, (2) moving from designing experiments to making production systems reliable, (3) moving from execution to organizational leverage. Each transition requires a different type of support.
- Junior → mid: teach experiment design, including what makes a good ablation, how to isolate variables, and how to tell when a negative result is a real result.
- Mid → senior: teach production thinking, including latency budgets, failure modes, monitoring, and the cost of being wrong in production.
- Senior → staff: teach organizational leverage, including influencing without authority, making decisions under uncertainty, and mentoring.
Performance Calibration in AI Teams
AI work is hard to calibrate because outcomes are uncertain and attribution is murky. Who gets credit for a model improvement: the engineer who designed the experiment, the one who found the data issue, or the one who wrote the infrastructure?
Calibrate on process, not outcomes. An engineer who ran a rigorous experiment that returned a negative result is performing well. An engineer who shipped an improvement through lucky hyperparameter tuning without understanding why it worked is not.
What to Look for at Each Level
The Retention Problem
AI engineers typically leave for one of three reasons: (1) they're no longer learning, (2) their work isn't getting used, or (3) they don't believe in the product direction. A manager can fix the first two directly. The third is a signal about the business, not the team.
- Not learning: rotate engineers through different problem types. An engineer who has run RAG experiments for two years will leave; one who has done RAG, then serving infrastructure, then evals is still growing.
- Work not used: this is the graveyard problem, where models get trained but never deployed. Enforce a deployment gate: nothing goes into production training without a deployment plan.
- Product direction: be honest about it in 1:1s. If you can't give your engineers a credible answer about why the product matters, you have an organizational problem.
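A deployment gate only works if it is checked at the moment a training job is launched, not remembered after the fact. The sketch below assumes a hypothetical internal job-submission step; `TrainingRequest`, `DeploymentPlan`, `gate_problems`, `approve`, and the three required plan fields are illustrative names, not an existing tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentPlan:
    """Hypothetical minimum plan required before a production training run."""
    target_service: str  # where the trained model will be served
    success_metric: str  # what must improve for the model to ship
    owner: str           # who is accountable for getting it deployed


@dataclass
class TrainingRequest:
    """Hypothetical request to launch a production training job."""
    name: str
    plan: Optional[DeploymentPlan] = None


def gate_problems(request: TrainingRequest) -> list[str]:
    """Return the reasons a request fails the gate; an empty list means it passes."""
    if request.plan is None:
        return [f"{request.name}: no deployment plan"]
    problems = []
    for field_name in ("target_service", "success_metric", "owner"):
        # A blank or whitespace-only field counts as missing.
        if not getattr(request.plan, field_name).strip():
            problems.append(f"{request.name}: deployment plan is missing {field_name}")
    return problems


def approve(request: TrainingRequest) -> bool:
    """Approve only requests that pass every gate check."""
    return not gate_problems(request)


if __name__ == "__main__":
    unplanned = TrainingRequest("reranker-v7")
    planned = TrainingRequest(
        "reranker-v8",
        DeploymentPlan("search-api", "NDCG@10 on held-out queries", "search-team"),
    )
    print(gate_problems(unplanned))  # ['reranker-v7: no deployment plan']
    print(approve(planned))          # True
```

Returning a list of reasons instead of a bare boolean tells the engineer exactly what the plan is missing, which keeps the gate from feeling like arbitrary bureaucracy.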
Growth Conversations That Actually Work
Most growth conversations fail because they're vague ('you need to have more impact') or positional ('to get to senior you need to...'). Effective growth conversations are specific, behavioral, and self-directed.
- Ask: what do you want to be able to do in 12 months that you can't do now? Then: what's the gap between where you are now and that goal, and what's the smallest experiment we could run to close it?
- Ownership: the engineer owns the growth plan. Your job is to create the conditions: the right project, the right feedback, the right exposure.
- Cadence: review the plan quarterly, not just at the annual performance cycle.
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.