Staff AI Engineer: What to Do in Week 1 (and What Not To)

Read the on-call runbook before the codebase. Interview users before forming architecture opinions. Find the highest-leverage model improvement before proposing anything. Plus: the week-1 anti-patterns that signal overconfidence, and what the staff interview question about week 1 actually wants.

The most common staff+ failure mode is proposing architecture changes before understanding what's actually failing in production. Week 1 at a new staff role is not about demonstrating competence by moving fast. It's about building the foundation to move correctly later.

Day 1-2: The Runbook Before the Code

Read the on-call runbook before you read the codebase. The runbook is the ground truth about what breaks, how often, and what the workarounds are. It records what the team has learned the hard way, which is exactly what the architecture diagrams leave out.

What to extract from the runbook: which alerts fire most frequently? Which incidents require manual intervention? Which fixes are 'restart the service' vs. 'requires engineering time'? Which systems have no runbook entry? Those are the dark matter, the things that break silently.

What this tells you: the runbook maps directly to the team's technical debt and monitoring gaps. The things that fire alerts every Tuesday at 3am are where you should spend your first month, not on the rewrite you're already planning.

Red flag if you skip this: you'll propose changes that break things the runbook was protecting against. The team will notice. Trust is expensive to rebuild.
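The runbook questions above can be turned into a quick tally once you have an export of alert history. The sketch below is only an illustration: the alert names, service names, and the tuple layout are all hypothetical, and a real export from your paging tool will look different.

```python
from collections import Counter

# Hypothetical alert history: (alert_name, service, needed_manual_fix).
ALERTS = [
    ("feature_store_lag", "feature-pipeline", True),
    ("feature_store_lag", "feature-pipeline", True),
    ("p99_latency_high", "model-server", False),
    ("feature_store_lag", "feature-pipeline", False),
    ("index_stale", "retrieval", True),
]

# Hypothetical inventory: services the team owns, and those with a runbook entry.
OWNED_SERVICES = {"feature-pipeline", "model-server", "retrieval", "reranker"}
RUNBOOK_SERVICES = {"feature-pipeline", "model-server"}


def summarize_alerts(alerts):
    """Return (all alert counts, manual-intervention counts), most frequent first."""
    fired = Counter(name for name, _, _ in alerts)
    manual = Counter(name for name, _, needed_manual in alerts if needed_manual)
    return fired.most_common(), manual.most_common()


def runbook_gaps(owned, documented):
    """Services the team owns that have no runbook entry at all."""
    return sorted(owned - documented)


fired, manual = summarize_alerts(ALERTS)
gaps = runbook_gaps(OWNED_SERVICES, RUNBOOK_SERVICES)

print("Most frequent alerts:", fired)
print("Needed manual intervention:", manual)
print("No runbook entry:", gaps)
```

The third list is the one to read first: owned services with no runbook entry are the candidates for silent failure.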

Day 2-4: User Interviews Before Architecture Opinions

Talk to the people who use the system you now own before forming any opinion about what should change. Internal users (product managers, downstream engineers, data scientists) know things about system behavior that aren't in any document.

Questions to ask: what do you wish the system did that it doesn't? What do you work around? What breaks your workflow? What would you pay for (in engineering time) if you could have it?

What this gives you: a ranked list of actual user pain points vs. the architecture-aesthetic opinions that tend to dominate in engineering discussions.

The trap: skipping user interviews because you already know what's wrong technically. You know what's wrong technically from the outside. You don't know what's wrong from the user's perspective until you ask.
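One way to get from interview notes to a ranked list is to tag each complaint and count how many distinct people raised it, so one loud voice doesn't dominate. This is a minimal sketch; the interviewee names and pain-point tags are made up for illustration.

```python
from collections import defaultdict

# Hypothetical interview notes: (interviewee, pain-point tag).
NOTES = [
    ("pm_alice", "stale recommendations"),
    ("ds_bob", "no offline eval set"),
    ("eng_carol", "stale recommendations"),
    ("pm_alice", "stale recommendations"),  # repeated mention, same person
    ("eng_carol", "slow batch export"),
    ("ds_bob", "stale recommendations"),
]


def rank_pain_points(notes):
    """Rank pain points by how many distinct people raised them."""
    people_by_pain = defaultdict(set)
    for person, pain in notes:
        people_by_pain[pain].add(person)
    # Sort by distinct-people count (descending), then by name to break ties.
    return sorted(
        ((pain, len(people)) for pain, people in people_by_pain.items()),
        key=lambda item: (-item[1], item[0]),
    )


ranking = rank_pain_points(NOTES)
for pain, people_count in ranking:
    print(f"{people_count} people: {pain}")
```

Counting people rather than mentions is a design choice, not a rule. If one downstream team is blocked entirely, a single mention may still outrank everything else.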

Day 3-5: Find the Highest-Leverage Model Improvement

Before any architectural work, identify the single highest-leverage model improvement — the change with the best ratio of expected impact to implementation risk. This is usually not the most exciting change, but it's the one that builds trust fastest.
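The impact-to-risk ratio can be made explicit with a rough scoring pass. The sketch below assumes a 1 to 5 scale for both numbers and uses invented candidate names; the scores are judgment calls, and the point is only to force the comparison onto paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_impact: float      # assumed scale: 1 (minor) to 5 (major)
    implementation_risk: float  # assumed scale: 1 (easily reversible) to 5 (risky)

    @property
    def leverage(self) -> float:
        """Ratio of expected impact to implementation risk."""
        return self.expected_impact / self.implementation_risk


# Hypothetical candidates and scores, for illustration only.
candidates = [
    Candidate("rewrite ranking service", 5, 5),
    Candidate("fix stale feature join", 4, 1),
    Candidate("fine-tune on cold-start users", 3, 2),
]

best = max(candidates, key=lambda c: c.leverage)
print(f"Highest leverage: {best.name} ({best.leverage:.1f})")
```

With these made-up scores the unexciting fix wins by a wide margin, which is the usual shape of the result.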

How to find it: look at error analysis from the last 3 production incidents. What failure mode appears most frequently? Is it a data issue, a model issue, or a deployment issue? The answer determines the intervention.

Data issue: the model is correct given its training data, but the training data doesn't represent the production distribution. Fix: data collection change, not model change.

Model issue: the model is making systematic errors on a specific input type. Fix: targeted fine-tuning or a rule layer, not a full retraining.

Deployment issue: the model is correct but the surrounding system (retrieval, feature pipeline, serving) is introducing errors. Fix: the surrounding system, not the model.

The highest-leverage intervention is almost always upstream of the model.
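The triage above amounts to labeling each recent incident with one of three categories and looking up the matching fix. A minimal sketch follows, assuming the labels were assigned by hand after reading each postmortem; the incident IDs and labels are hypothetical.

```python
from collections import Counter

# Interventions paraphrased from the triage described above.
INTERVENTION = {
    "data": "change data collection, not the model",
    "model": "targeted fine-tuning or a rule layer, not a full retraining",
    "deployment": "fix the surrounding system (retrieval, feature pipeline, serving)",
}

# Hypothetical labels for the last 3 production incidents.
INCIDENTS = [
    ("INC-1", "deployment"),
    ("INC-2", "data"),
    ("INC-3", "deployment"),
]


def dominant_failure_mode(incidents):
    """Return (category, count, suggested intervention) for the most common category."""
    counts = Counter(category for _, category in incidents)
    category, count = counts.most_common(1)[0]
    return category, count, INTERVENTION[category]


category, count, fix = dominant_failure_mode(INCIDENTS)
print(f"{category} issue in {count} of {len(INCIDENTS)} incidents -> {fix}")
```

Three incidents is a small sample, so treat the result as a starting hypothesis to check against the runbook and the user interviews rather than a conclusion.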

Week 1 Anti-Patterns

Proposing a rewrite: you don't know what you don't know. The system that looks messy from the outside has embedded knowledge about failure modes that a rewrite would lose. If you propose a rewrite in week 1, the team will assume you haven't understood the existing system, because you haven't.

Presenting a 30/60/90-day plan before listening: a plan built before you understand the system is based on assumptions, not knowledge. It signals overconfidence, not competence.

Optimizing for visibility over impact: staff engineers who optimize for being seen (in meetings, on Slack, in docs) rather than for actual system improvement are common and easy to spot. Pick the unglamorous high-impact work over the glamorous low-impact work.

Ignoring the social architecture: every engineering system has a social system underneath it. Who owns what decisions? Who needs to be consulted? Who blocks? Getting this wrong costs more than getting the technical architecture wrong.

The Staff Interview Question This Maps To

'You join as a staff ML engineer at a company with a broken recommendation system. In the first 30 days, what do you do?'

The answer they want: read the runbook → interview users and downstream teams → run error analysis on production failures → identify the highest-leverage intervention → propose one concrete improvement with a reversibility plan.

The answer they don't want: 'I'd rebuild it on a modern two-tower architecture.'
