The Minimum Tech Stack for Every AI Role, Level, and Company Tier
Exactly what tools, frameworks, and skills are expected from AI Engineers, ML Engineers, AI PMs, FDEs, and MLOps engineers — at junior, mid, senior, and staff level.
What does 'knowing the stack' mean for an AI Engineer vs. an ML Engineer vs. an AI PM vs. a Field Developer Engineer? These roles share vocabulary but have almost completely different minimum competency sets. This is the definitive reference: what each role actually needs, at each level, at each company tier — and what gaps will get you screened out.
This post covers six roles: AI Engineer, ML Engineer, MLOps/LLMOps Engineer, Technical AI PM, Non-Technical AI PM, and Field Developer / Solutions Engineer. For each role, skills are layered: a junior needs everything in the junior table; a senior needs everything in the junior, mid, and senior tables.
How to read this guide
Each level is additive — senior means you have everything from junior and mid, plus the senior additions. Company tier shapes depth and specialisation but rarely changes the baseline. Where tier matters significantly (e.g., FAANG expects internal tooling fluency; frontier labs expect JAX/Triton), it's called out explicitly in the tier notes after each role.
Role 1: AI Engineer
AI Engineers build products and systems on top of foundation models. They use APIs, build RAG pipelines, design agent workflows, write evals, and own the LLM-powered feature end-to-end. They are not training models — they are building with models.
Junior AI Engineer (0–2 years)
| Category | Must have |
|---|---|
| Language | Python — functions, classes, async basics, virtual envs |
| LLM APIs | OpenAI or Anthropic SDK — basic chat completions, streaming, error handling |
| Prompting | System/user/assistant message structure, temperature, max_tokens |
| Data handling | JSON parsing, basic Pandas, reading CSVs and text files |
| Version control | Git — commit, branch, PR workflow |
| Environment | Can run a local dev server, understands .env files and API keys |
| Basic RAG | Can build a simple retrieval pipeline: embed → search → generate |
| Vector basics | Knows what cosine similarity is, has used one vector store (Chroma, Pinecone, or Qdrant) |
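
The embed → search → generate flow and the cosine similarity mentioned in the table can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the vocabulary and documents are made up, and the "generate" step stops at building the prompt you would send to an LLM.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocab."""
    words = text.lower().split()
    return [words.count(term) for term in vocab]

def retrieve(query, documents, vocab, top_k=1):
    """Search step: rank documents by cosine similarity to the query."""
    query_vec = toy_embed(query, vocab)
    scored = [(cosine_similarity(query_vec, toy_embed(doc, vocab)), doc)
              for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(query, context_docs):
    """Generate step (input side): the prompt that would be sent to an LLM."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Made-up vocabulary and documents, for illustration only.
VOCAB = ["refund", "shipping", "password", "reset", "days"]
DOCS = [
    "refund requests are processed within 5 days",
    "shipping takes 3 to 7 days",
    "reset your password from the login page",
]

top = retrieve("how do i reset my password", DOCS, VOCAB)
prompt = build_prompt("how do i reset my password", top)
```

In a real pipeline the embedding call and the ranking would be handled by an embedding API and a vector store; the shape of the three steps stays the same.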
Mid AI Engineer (2–5 years)
| Category | Must have |
|---|---|
| RAG pipeline | Full RAG: chunking strategy, embedding model choice, hybrid search (BM25 + vector), reranking |
| Frameworks | LangChain or LlamaIndex — knows when to use and when to avoid them |
| Evals | Can build and run a basic offline eval suite. Knows LLM-as-judge, exact match, RAGAS |
| Structured output | Tool use / function calling, JSON schema validation, retry-on-error pattern |
| Agents | Has built at least one multi-step agent with tool use. Knows ReAct pattern |
| Observability | LangSmith or similar for tracing LLM calls. Can debug a broken agent from traces |
| Deployment | Docker basics, can deploy a FastAPI or Flask endpoint to a cloud provider |
| Prompt management | Prompts in version control, not hardcoded. Understands prompt caching |
| Cost awareness | Can estimate monthly token costs, knows price differences across model tiers |
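
The "JSON schema validation, retry-on-error pattern" from the Structured output row might look like the sketch below. It assumes a hypothetical two-field schema and a `call_model` callable that takes a prompt and returns text; neither comes from a specific SDK. A hand-rolled type check stands in for a real JSON Schema validator.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}

def validate(raw_text):
    """Parse model output and check it against the expected fields and types."""
    data = json.loads(raw_text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return data

def call_with_retry(call_model, prompt, max_attempts=3):
    """Call the model; on invalid output, retry with the error appended to the prompt."""
    last_error = None
    for _ in range(max_attempts):
        if last_error is None:
            attempt_prompt = prompt
        else:
            attempt_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({last_error}). "
                "Return only valid JSON."
            )
        try:
            return validate(call_model(attempt_prompt))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")

# Fake model for demonstration: replies with prose once, then with valid JSON.
replies = iter([
    "Sure! Here is the JSON...",
    '{"sentiment": "positive", "confidence": 0.92}',
])

def fake_model(prompt):
    return next(replies)

result = call_with_retry(fake_model, "Classify: 'Great product'")
```

Feeding the validation error back into the retry prompt is one common design choice; capping attempts keeps a misbehaving model from burning tokens indefinitely.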
Senior AI Engineer (5–8 years)
| Category | Must have |
|---|---|
| System design | Can design a full production AI system: retrieval, generation, guardrails, observability, fallback |
| Multi-agent | Supervisor / pipeline / mesh patterns. Handles agent state, retries, failure recovery |
| Evals at scale | CI-gated eval pipeline, LLM judge calibration, eval set maintenance strategy |
| Fine-tuning basics | Can explain LoRA/QLoRA trade-offs, knows when fine-tuning beats prompting |
| Guardrails | Input/output filtering pipeline, Llama Guard or Perspective API integration |
| Model selection | Can benchmark 3 models on their specific task and make a cost/quality recommendation |
| MCP / tool design | Designs tool contracts with clear schemas, error surfaces, and retry semantics |
| Infra | Kubernetes basics, CI/CD with GitHub Actions, knows how to set rate limits and circuit breakers |
| Mentoring | Can review junior/mid PRs on AI systems and explain the tradeoffs |
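
The circuit breaker named in the Infra row can be sketched as a small wrapper around any upstream call, such as an LLM API request. This is one simple variant under assumed defaults (three failures, 30-second cool-down); the class and parameter names are illustrative, not taken from a particular library.

```python
import time

class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed (calls allowed)

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure will re-open the circuit.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A caller would typically catch the "circuit open" error and route to a fallback model or a cached response instead of surfacing the failure to the user.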
Staff / Principal AI Engineer (8+ years)
| Category | Must have |
|---|---|
| Platform thinking | Designs shared AI infra: model gateway, eval platform, prompt registry, cost dashboards |
| Strategy | Can make the build-vs-buy-vs-fine-tune call with data to back it up |
| Cross-team | Shapes how multiple product teams use AI — consistency, safety, shared tooling |
| Frontier awareness | Knows the capability curve of major model releases and their implications for the product |
| Research translation | Can read ML papers and determine whether a technique is relevant and can realistically be taken to production |
| Hiring bar | Can design AI engineering interview loops and calibrate what 'good' looks like |
AI Engineer — Company tier differences
| Tier | Stack differences |
|---|---|
| Early-stage startup | Full stack often required (Next.js + backend + AI layer). Vercel AI SDK, Supabase pgvector. Ship fast, minimal tooling. |
| Growth-stage (Series B–D) | Dedicated AI team forming. LangSmith, DataDog, Sentry expected. GitHub Actions CI. Cost tracking required. |
| Enterprise | Azure OpenAI Service or AWS Bedrock (not direct API). Compliance tooling. Databricks or Snowflake for data. Heavy documentation. |
| FAANG / Big Tech | Internal model gateways and prompt registries. Custom eval frameworks. Production ML infra at scale. |
| Frontier AI Lab | May train models, not just use them. JAX or PyTorch at training scale. Direct access to unreleased models. |
Role 2: ML Engineer
ML Engineers own the model training and serving pipeline. They work closer to the model weights than AI Engineers. In 2025, most new ML Engineering work is LLM-adjacent: fine-tuning, RLHF pipelines, inference optimisation, and training infrastructure.
Junior ML Engineer (0–2 years)
| Category | Must have |
|---|---|
| Language | Python — comfortable with OOP, type hints, pytest |
| ML frameworks | PyTorch — build a neural network, understand forward/backward pass, optimisers |
| Data | NumPy, Pandas, HuggingFace datasets. Can load, inspect, and preprocess a dataset |
| Training basics | Training loop from scratch: forward pass, loss, .backward(), optimiser step |
| Experiment tracking | MLflow or W&B — log metrics, compare runs, save checkpoints |
| HuggingFace | Transformers library — load a pretrained model, run inference, fine-tune with Trainer API |
| Notebooks | Jupyter for experimentation, knows when to move to scripts |
| Cloud basics | Has trained a model on a cloud VM or managed service (SageMaker, Vertex, or Colab Pro) |
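
The four steps in the Training basics row (forward pass, loss, backward pass, optimiser step) can be shown without any framework. The sketch below fits a line to toy data in plain Python so it runs anywhere; the hand-derived gradients stand in for what `.backward()` would compute in PyTorch, and the learning rate and step count are arbitrary choices for this example.

```python
# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0        # model parameters, initialised to zero
learning_rate = 0.05   # arbitrary value that is stable for this data

def mse(w, b):
    """Mean squared error of the line w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mse(w, b)

for step in range(2000):
    # 1. Forward pass: compute predictions from the current parameters
    preds = [w * x + b for x in xs]
    # 2. Loss: mean squared error
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # 3. Backward pass: gradients of the loss with respect to w and b
    #    (in PyTorch this is what loss.backward() fills in)
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)
    # 4. Optimiser step: plain gradient descent
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
```

After training, `w` and `b` should sit very close to 2 and 1. The PyTorch version has the same four-step shape, with autograd replacing step 3 and an optimiser object replacing step 4.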
Mid ML Engineer (2–5 years)
| Category | Must have |
|---|---|
| Fine-tuning | LoRA / QLoRA — has fine-tuned a 7B+ model on a custom dataset |
| Distributed training | DataParallel or DistributedDataParallel. Understands gradient synchronisation |
| Data pipelines | Reproducible data processing: versioned datasets, deterministic splits, deduplication |
| Model serving | TorchServe, FastAPI + model loading, or vLLM. Understands batching and throughput |
| Evaluation | Task-specific metrics (BLEU, ROUGE, accuracy, F1), custom eval harness |
| Inference optimisation | Quantisation (GPTQ/AWQ), knows INT4 vs FP16 quality/speed tradeoff |
| Model registry | MLflow Model Registry or HuggingFace Hub — version and deploy models properly |
| Containerisation | Docker for ML — GPU Docker, model artifact management, reproducible environments |
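
One piece of the INT4 vs FP16 trade-off in the Inference optimisation row is simple arithmetic: how much memory the weights need. The helper below is a back-of-envelope sketch that counts weights only. It ignores the KV cache, activations, and the per-group scales that formats such as GPTQ and AWQ store alongside the quantised weights, so real footprints are somewhat larger.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_7b = 7e9  # a 7B-parameter model

fp16_gb = weight_memory_gb(params_7b, 16)  # 14.0 GB
int4_gb = weight_memory_gb(params_7b, 4)   # 3.5 GB
```

The 4x reduction is what lets a 7B model fit on a single consumer GPU; the quality side of the trade-off still has to be measured on your own eval set.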
Senior ML Engineer (5–8 years)
| Category | Must have |
|---|---|
| Large-scale training | FSDP, DeepSpeed ZeRO stages, gradient checkpointing. Can train 30B+ models on multi-GPU |
| RLHF pipeline | Has implemented a reward model + PPO training loop, or a DPO loop (which needs no separate reward model) |
| Infra design | GPU cluster setup, job scheduling (SLURM or K8s), distributed checkpoint strategy |
| Speculative decoding | Understands draft/verify pattern and when it applies |
| Custom CUDA/Triton | Can write a custom kernel for a performance bottleneck (or at minimum can read one) |
| Data flywheel | Designs feedback loops: production signals → training data → model improvement |
| ML platform | Owns the shared training infra for a team — experiment reproducibility, cost attribution |
| Research reading | Can read and implement key papers (LoRA, Flash Attention, etc.) within a sprint |
Staff / Principal ML Engineer (8+ years)
| Category | Must have |
|---|---|
| Architecture decisions | Selects base models, training approaches, and serving strategies for org-wide use |
| Hardware strategy | GPU procurement decisions: H100 vs A100 vs inference chips. ROI calculations |
| Compute efficiency | End-to-end FLOPs budget management across training and serving |
| Novel techniques | Evaluates and productionises techniques from recent papers before they're mainstream |
| Org-level impact | Training and serving infra decisions affect multiple product teams |
ML Engineer — Company tier differences
| Tier | Stack differences |
|---|---|
| Early-stage startup | Fine-tuning via HuggingFace + Modal or RunPod. No dedicated infra. Often hybrid AI Engineer + MLE role. |
| Growth-stage | Dedicated MLE role. W&B required. Modal/Lambda Labs for compute. MLflow for registry. |
| Enterprise | AWS SageMaker, Azure ML, or GCP Vertex. Databricks MLflow. Compliance and data governance heavy. |
| FAANG | Internal training frameworks (Meta's fairseq, Google's T5X/Flax). Enormous compute budgets. Specialised MLE tracks. |
| Frontier AI Lab | JAX + XLA is common (e.g., Google DeepMind, formerly DeepMind and Google Brain). Triton kernels. Training on thousands of GPUs. First-principles ML. |
Role 3: MLOps / LLMOps Engineer
MLOps Engineers own the infrastructure that makes AI systems reliable in production: training pipelines, serving infrastructure, monitoring, cost management, and the developer experience for AI teams. As LLMs become dominant, the role shifts toward LLMOps: prompt versioning, eval pipelines, observability, and model gateways.
Junior MLOps Engineer (0–2 years)
| Category | Must have |
|---|---|
| Cloud | AWS, GCP, or Azure — compute, storage, IAM basics. Can provision a GPU instance |
| Containers | Docker — build, run, push images. Understands Dockerfile best practices for ML |
| CI/CD | GitHub Actions or CircleCI — can write a pipeline that tests and deploys code |
| Python | Strong enough to write automation scripts, Makefile targets, data processing jobs |
| Experiment tracking | MLflow or W&B — set up tracking server, log runs, compare experiments |
| Monitoring basics | CloudWatch or Prometheus — can set up basic service health alerts |
Mid MLOps Engineer (2–5 years)
| Category | Must have |
|---|---|
| Orchestration | Kubernetes — pods, deployments, services, HPA. Can deploy a model serving endpoint |
| Workflow pipelines | Airflow, Prefect, or Kubeflow — orchestrate multi-step ML pipelines |
| Model serving | Seldon, BentoML, TorchServe, or vLLM — latency-optimised serving with health checks |
| LLM observability | LangSmith, Helicone, or Arize — trace LLM calls, track token costs, flag failures |
| Prompt management | Git-based prompt versioning. Eval gates before prompt promotion to production |
| Feature store | Feast or Tecton basics — online vs. offline feature pipelines |
| Cost tracking | Per-model, per-feature LLM cost dashboards. Budget alerts. Token quota enforcement |
| IaC | Terraform or Pulumi — provision ML infra as code, not click-ops |
Senior MLOps Engineer (5–8 years)
| Category | Must have |
|---|---|
| Platform design | Designs the internal AI platform: model registry, gateway, eval framework, observability stack |
| Model gateway | Builds a routing layer: rate limiting, model fallback, A/B traffic splitting, cost attribution |
| Eval CI/CD | Eval pipeline that gates prompt and model changes. Regression detection before prod |
| Multi-cloud | Can design and operate ML infra across providers. Vendor lock-in avoidance strategy |
| Security | API key management, audit logging, data isolation, PII scrubbing in LLM pipelines |
| SRE for LLMs | Incident response for AI failures, runbooks, latency regression diagnosis |
| Capacity planning | Models GPU and API quota requirements against product growth forecasts |
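
The Eval CI/CD row describes a pipeline that blocks a prompt or model change when quality regresses. A minimal sketch of the gating logic is below. The metric names, scores, and thresholds are hypothetical; a real pipeline would load the scores from an eval run rather than hard-code them.

```python
def eval_gate(baseline_scores, candidate_scores,
              max_regression=0.02, min_score=0.80):
    """Return (passed, reasons). Scores are per-metric values in [0, 1]."""
    reasons = []
    for metric, baseline in baseline_scores.items():
        candidate = candidate_scores.get(metric)
        if candidate is None:
            reasons.append(f"{metric}: missing from candidate run")
            continue
        if candidate < min_score:
            reasons.append(
                f"{metric}: {candidate:.2f} is below the floor of {min_score:.2f}")
        if baseline - candidate > max_regression:
            reasons.append(
                f"{metric}: regressed from {baseline:.2f} to {candidate:.2f}")
    return (not reasons, reasons)

# Hypothetical scores: current production prompt vs. the proposed change.
baseline = {"faithfulness": 0.91, "answer_relevance": 0.88}
candidate = {"faithfulness": 0.85, "answer_relevance": 0.89}

passed, reasons = eval_gate(baseline, candidate)
```

In CI, the job would exit with a non-zero status when `passed` is false and print `reasons`, which is what stops the change from being promoted.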
MLOps/LLMOps — Company tier differences
| Tier | Stack differences |
|---|---|
| Early-stage startup | Often no dedicated MLOps. Modal or Replicate for hosting. Railway or Render for APIs. Minimal monitoring. |
| Growth-stage | First MLOps hire. Buildkite/GitHub Actions CI, DataDog for monitoring, LangSmith for traces. |
| Enterprise | AWS SageMaker Pipelines or Azure ML Pipelines. Kubeflow or Vertex. Databricks for data. Compliance logging. |
| FAANG | Internal platforms (Meta's FBLearner, Google's Vertex internals). Dedicated LLMOps teams. Custom gateways. |
| Frontier Lab | Training infra at scale. SLURM cluster management. Custom checkpointing. GPU utilisation optimisation is its own specialty. |
Role 4: Technical AI PM
Technical AI PMs can read code, write prompts, build prototypes, and evaluate model outputs. They bridge research/engineering and product. They don't need to build production systems — but they need to understand them deeply enough to spec them precisely, debug quality issues, and make model trade-off calls.
Junior Technical AI PM (0–2 years)
| Category | Must have |
|---|---|
| APIs | Can call OpenAI or Anthropic API in Python or via Postman. Understands request structure |
| Prompting | Can write and iterate on system prompts. Understands few-shot, chain-of-thought, output format control |
| Token literacy | Knows what tokens are, how context windows work, and how pricing works |
| Basic RAG | Can explain what RAG is, why you'd use it, and what can go wrong |
| Evals basics | Understands the concept of a golden eval set and LLM-as-judge |
| Product sense | Can write a user story for an AI feature that includes failure modes |
| Data reading | Can read a confusion matrix, understand precision/recall trade-offs at a conceptual level |
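
For the Data reading row, precision and recall both come straight from the counts in a confusion matrix. The sketch below uses made-up numbers for a hypothetical spam filter to show how the two metrics can diverge.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical spam filter: 80 spam emails caught (tp),
# 20 legitimate emails wrongly flagged (fp), 40 spam emails missed (fn).
precision, recall = precision_recall(tp=80, fp=20, fn=40)
# precision = 80 / 100 = 0.80: of everything flagged, 80% really was spam
# recall    = 80 / 120 ≈ 0.67: of all the spam, about two thirds was caught
```

The product decision is which error is more expensive: tightening the filter raises precision but usually lowers recall, and the reverse.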
Mid Technical AI PM (2–5 years)
| Category | Must have |
|---|---|
| Prototype building | Can build a working RAG or agent demo using LangChain/LlamaIndex to validate a product idea |
| Eval ownership | Owns the eval set for their AI feature. Can write judging rubrics and set pass/fail thresholds |
| AI PRD | Writes PRDs with: model spec, failure mode table, eval plan, guardrails requirements |
| Model selection | Can compare models on a benchmark task and articulate cost/quality/latency trade-offs |
| Observability | Uses LangSmith or similar to understand what the model is actually doing in production |
| Guardrails literacy | Can spec input/output filtering requirements for a feature and work with eng to implement |
| A/B testing LLMs | Understands how to run experiments on AI features (not the same as deterministic A/B tests) |
| Hallucination triage | Can diagnose why a model hallucinated on a specific input and propose a fix |
Senior Technical AI PM (5–8 years)
| Category | Must have |
|---|---|
| AI system design | Can sketch a production AI architecture (RAG pipeline, agent system, eval loop) on a whiteboard |
| Eval strategy | Designs the multi-layer eval strategy for a product area: unit, integration, production |
| Model partnerships | Can evaluate model providers, negotiate commercial terms, and manage vendor relationships |
| Safety governance | Owns the AI risk framework for their product area. Runs or coordinates red-team exercises |
| Exec communication | Can explain model quality regressions, cost spikes, and AI limitations to C-level stakeholders |
| Build vs. buy | Makes the call on fine-tuning vs. prompting vs. external service with data to back it up |
| Platform influence | Shapes how the AI platform team prioritises tooling based on product team needs |
Staff Technical AI PM (8+ years)
| Category | Must have |
|---|---|
| AI strategy | Defines the AI product vision and 2–3 year roadmap for a product area or business unit |
| Research awareness | Tracks frontier model capabilities and anticipates how they shift product opportunities |
| Cross-functional | Aligns safety, legal, engineering, and business on AI governance policies |
| Thought leadership | Published AI product perspectives (internal or external) that influence the field |
Technical AI PM — Company tier differences
| Tier | Stack differences |
|---|---|
| Early-stage startup | More hands-on than typical PM — expected to build prototypes, write prompts, and review eval results directly |
| Growth-stage | Dedicated AI PM role. Expected to own eval pipeline, run model benchmarks, write AI PRDs independently |
| Enterprise | Compliance and governance skills critical. Azure/AWS AI service literacy. Working with legal on AI risk |
| FAANG | Works with internal models. Deep familiarity with internal eval frameworks and model cards required |
| Frontier Lab | TPM-style role. Deep technical depth, often with an eng or research background. Shapes research priorities |
Role 5: Non-Technical AI PM
Non-technical AI PMs come from product, business, or domain backgrounds. They don't write code. But they need to be sophisticated enough to spec AI features precisely, challenge engineering decisions with evidence, and avoid the two classic failure modes: over-trusting the model and under-specifying the requirements.
Junior Non-Technical AI PM (0–2 years)
| Category | Must have |
|---|---|
| Hands-on usage | Power user of ChatGPT, Claude, Gemini — knows their strengths, limitations, and prompt strategies |
| Basic prompting | Can write a system prompt and iterate on it without engineering help |
| Vocabulary | Fluent in: tokens, context window, hallucination, RAG, embeddings, temperature, fine-tuning, evals |
| Failure modes | Can name and describe the 5 main ways LLMs fail (hallucination, context limits, injection, bias, inconsistency) |
| AI product examples | Has studied 3+ AI products deeply — how they work, what problems they solve, how they fail |
| Data intuition | Comfortable reading a bar chart of model scores. Understands what 'better on evals' means |
Mid Non-Technical AI PM (2–5 years)
| Category | Must have |
|---|---|
| AI PRD | Writes AI feature PRDs with model requirements, failure mode tables, and eval criteria |
| Eval literacy | Can review an eval dashboard, identify regressions, and ask the right questions of engineering |
| User research | Runs user research specifically around AI trust, confusion, and error handling expectations |
| Vendor evaluation | Can evaluate AI tool vendors: asks about model cards, SLAs, data retention, compliance |
| Guardrails spec | Can define the content policy for an AI feature and translate it into engineering requirements |
| Cost literacy | Understands token costs, can estimate monthly AI feature spend, knows the cost levers |
| Metrics design | Defines success metrics for AI features that aren't just engagement (quality, trust, task completion) |
Senior Non-Technical AI PM (5–8 years)
| Category | Must have |
|---|---|
| AI strategy | Builds the AI roadmap for a product area: sequenced investments, build-vs-buy calls, maturity model |
| Risk frameworks | Runs AI risk assessments: identifies high-risk use cases, proposes mitigations, documents decisions |
| Safety ownership | Works with legal, compliance, and safety to define and enforce AI usage policies |
| Competitive intel | Tracks competitor AI features systematically. Identifies product differentiation through AI capabilities |
| Exec storytelling | Can communicate AI product strategy, progress, and risks to board-level stakeholders clearly |
Non-Technical AI PM — Company tier differences
| Tier | Stack differences |
|---|---|
| Early-stage startup | Often not a distinct role — founder or generalist PM owns AI product. Must be hands-on with the model directly. |
| Growth-stage | First AI PM hire. Expected to be self-sufficient on prompting, eval reading, and vendor research. |
| Enterprise | Heavy compliance, procurement, and stakeholder management load. Legal fluency around AI risk required. |
| FAANG | Works alongside a technical AI PM or TPM. Focuses on market strategy, user research, and business model. |
Role 6: Field Developer Engineer / Solutions Engineer
FDEs (also called Solutions Engineers, Developer Advocates, or AI Customer Engineers) work at the interface between a model provider or AI platform and its enterprise customers. They write demo code, lead customer workshops, debug integration issues, and translate customer requirements into product feedback. The role requires both technical depth and customer-facing communication.
Junior FDE / Solutions Engineer (0–2 years)
| Category | Must have |
|---|---|
| API fluency | Can demo any core API feature live, from scratch, without notes. Handles unexpected questions confidently |
| Sample code | Has built 5+ small working demos across different use cases (RAG, agents, summarisation, classification) |
| Language breadth | Python required. JavaScript/TypeScript strongly preferred (much enterprise integration work is done in JS) |
| Explanation skills | Can explain embeddings, RAG, and function calling to a non-technical developer audience |
| Debugging | Can diagnose API errors, rate limit issues, and bad outputs in front of a customer without panicking |
| Documentation | Deep familiarity with the API docs, model cards, and changelog. Knows where to look quickly |
Mid FDE / Solutions Engineer (2–5 years)
| Category | Must have |
|---|---|
| Architecture guidance | Can review a customer's AI architecture and identify failure points, cost inefficiencies, or missing guardrails |
| Integration depth | Has built full integrations: CRM, enterprise search, customer data platforms. Knows OAuth, webhooks, enterprise auth |
| Workshop facilitation | Runs customer workshops: prompt engineering, RAG design, agent patterns. Can handle live Q&A from senior engineers |
| Competitive knowledge | Deep comparative knowledge: where the product wins, where it doesn't, and how to position honestly |
| Escalation | Can triage a complex customer issue, write a clear internal escalation report, and follow it to resolution |
| Feedback loop | Turns customer pain points into structured product feedback. Has relationships with PM and engineering |
| Vertical knowledge | Deep expertise in 1–2 industries (fintech, healthcare, legal) — knows the compliance, data, and use case landscape |
Senior FDE / Solutions Engineer (5–8 years)
| Category | Must have |
|---|---|
| Reference architecture | Authors and maintains reference architectures for key customer use cases |
| Executive engagement | Can lead an executive briefing on AI strategy, discuss risk, ROI, and roadmap at C-level |
| Technical depth | Can go 5 levels deep on any product feature — from API parameter to serving infrastructure |
| Partner ecosystem | Knows the partner landscape (system integrators, consultancies) and manages key technical relationships |
| Enablement | Builds training content and technical enablement programs for partner engineers |
| Product influence | Has shaped product priorities through sustained, evidence-based customer feedback |
FDE / Solutions Engineer — Company tier differences
| Tier | Stack differences |
|---|---|
| AI-native startup | Wears many hats: part sales engineer, part DevRel, part customer success. Must be an excellent communicator and fast learner. |
| Growth-stage | Dedicated SE team forming. Expected to build polished demo environments and run customer PoCs independently. |
| Enterprise AI vendor | Deep enterprise integration expertise: SSO, compliance, data residency, procurement. Custom PoC development. |
| FAANG (cloud AI) | Scale and breadth: support hundreds of customers. Strong documentation and self-serve tooling skills required. |
Universal skills every AI role needs
Regardless of role, six skills are expected at every level and in every company. These are what get you past the initial screen in an AI interview.
| Skill | What 'competent' looks like |
|---|---|
| LLM mental model | Can explain what happens inside a transformer at a conceptual level. Knows tokens, embeddings, attention. |
| Hallucination literacy | Can explain why models hallucinate, name 3 common triggers, and describe mitigation strategies for each. |
| RAG conceptual | Can explain naive RAG end-to-end, name its 3 main failure modes, and describe one architectural improvement. |
| Cost thinking | Has a rough intuition for token costs. Can back-of-envelope estimate monthly LLM spend for a feature. |
| Safety awareness | Knows prompt injection, jailbreaks, and output filtering. Can identify unsafe AI feature designs. |
| Eval mindset | Understands why manual spot-checks alone can't validate an LLM feature, and what eval automation requires. |
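
The back-of-envelope estimate in the Cost thinking row is a short calculation. The sketch below uses placeholder traffic numbers and placeholder per-million-token prices; they are not any provider's actual rates, so substitute the current price sheet for the model you use.

```python
def monthly_cost_usd(requests_per_day, input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million,
                     days=30):
    """Estimated monthly spend for one feature, in USD."""
    per_request = (input_tokens * input_price_per_million
                   + output_tokens * output_price_per_million) / 1_000_000
    return requests_per_day * days * per_request

# Placeholder assumptions: 10,000 requests/day, 1,500 input and 300 output
# tokens per request, priced at $1.00 (input) and $4.00 (output) per million.
estimate = monthly_cost_usd(10_000, 1_500, 300, 1.00, 4.00)
# roughly 810 USD per month
```

Output tokens are usually priced higher than input tokens, so capping response length is often the cheapest lever to pull first.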
The stack evolution: what to add in 2025–2026
The AI stack moves fast. These are the skills that are transitioning from 'nice-to-have' to 'expected' over the next 12–18 months:
- MCP (Model Context Protocol): already expected at senior AI engineer level; will be expected at mid level within 12 months
- Agentic evaluation: testing multi-step agent workflows with success rate and error recovery metrics — rapidly becoming standard
- Multi-modal pipelines: vision + text is moving from experimental to production; expected at mid level for AI engineers building consumer products
- Reasoning model usage: knowing when to invoke o1/o3-class models vs. standard models, and how to structure prompts differently for them
- AI governance documentation: model cards, data cards, AI impact assessments — expected in enterprise and regulated industries at all seniority levels
- Vibe coding literacy: engineers who can't accelerate their own coding with AI tools (Cursor, Copilot, Claude Code) are at a compounding disadvantage
The biggest career risk in AI right now is over-specialising in one framework (LangChain, LlamaIndex) or one model provider. Framework APIs can change substantially within months. The durable skills are the conceptual foundations: how retrieval works, how agents are structured, how evals are designed. The specific library API is not one of them.