GenAI Systems Lab
AI Engineering 18 min read

The Minimum Tech Stack for Every AI Role, Level, and Company Tier

Exactly what tools, frameworks, and skills are expected from AI Engineers, ML Engineers, AI PMs, FDEs, and MLOps engineers — at junior, mid, senior, and staff level.

What does 'knowing the stack' mean for an AI Engineer vs. an ML Engineer vs. an AI PM vs. a Field Developer Engineer? These roles share vocabulary but have almost completely different minimum competency sets. This is the definitive reference: what each role actually needs, at each level, at each company tier — and what gaps will get you screened out.

This post covers six roles: AI Engineer, ML Engineer, MLOps/LLMOps Engineer, Technical AI PM, Non-Technical AI PM, and Field Developer Engineer (FDE) / Solutions Engineer. For each role, skills are layered: a junior must have everything in the junior table; a senior must have everything in the junior, mid, and senior tables.

How to read this guide

Each level is additive — senior means you have everything from junior and mid, plus the senior additions. Company tier shapes depth and specialisation but rarely changes the baseline. Where tier matters significantly (e.g., FAANG expects internal tooling fluency; frontier labs expect JAX/Triton), it's called out explicitly in the tier notes after each role.


Role 1: AI Engineer

AI Engineers build products and systems on top of foundation models. They use APIs, build RAG pipelines, design agent workflows, write evals, and own the LLM-powered feature end-to-end. They are not training models — they are building with models.

Junior AI Engineer (0–2 years)

Category | Must have
Language | Python: functions, classes, async basics, virtual envs
LLM APIs | OpenAI or Anthropic SDK: basic chat completions, streaming, error handling
Prompting | System/user/assistant message structure, temperature, max_tokens
Data handling | JSON parsing, basic Pandas, reading CSVs and text files
Version control | Git: commit, branch, PR workflow
Environment | Can run a local dev server, understands .env files and API keys
Basic RAG | Can build a simple retrieval pipeline: embed → search → generate
Vector basics | Knows what cosine similarity is, has used one vector store (Chroma, Pinecone, or Qdrant)
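
The embed → search → generate pipeline from the table can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pattern: `embed` is a toy bag-of-words counter standing in for a real embedding model, the "vector store" is a plain list, and `build_prompt` stops short of calling an LLM. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the generation prompt from the retrieved context."""
    context = "\n".join(search(query, docs, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "refunds are processed within five days",
    "the office is closed on public holidays",
]
print(build_prompt("how long do refunds take", docs))
```

Swapping the toy `embed` for a real model and the list for a vector store changes the quality of retrieval, not the shape of the pipeline.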

Mid AI Engineer (2–5 years)

Category | Must have
RAG pipeline | Full RAG: chunking strategy, embedding model choice, hybrid search (BM25 + vector), reranking
Frameworks | LangChain or LlamaIndex; knows when to use and when to avoid them
Evals | Can build and run a basic offline eval suite. Knows LLM-as-judge, exact match, RAGAS
Structured output | Tool use / function calling, JSON schema validation, retry-on-error pattern
Agents | Has built at least one multi-step agent with tool use. Knows ReAct pattern
Observability | LangSmith or similar for tracing LLM calls. Can debug a broken agent from traces
Deployment | Docker basics, can deploy a FastAPI or Flask endpoint to a cloud provider
Prompt management | Prompts in version control, not hardcoded. Understands prompt caching
Cost awareness | Can estimate monthly token costs, knows price differences across model tiers
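
The retry-on-error pattern for structured output is easier to see in code. The sketch below assumes a hypothetical `call_model` function that takes a prompt and returns the model's raw text; the two-field schema and the wording of the retry message are also illustrative assumptions, and a real system would more likely validate against a full JSON schema.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}


def validate(raw: str) -> dict:
    """Parse the reply and check required fields. Raises ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} missing or not {expected_type.__name__}")
    return data


def call_with_retry(call_model, prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate the reply, and feed the error back on failure."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
            # Tell the model what was wrong so the next attempt can correct it.
            prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Reply with valid JSON only."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error
```

Capping the attempts matters as much as the retry itself: every retry costs tokens and latency, so the caller needs a clear failure to handle rather than an open-ended loop.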

Senior AI Engineer (5–8 years)

Category | Must have
System design | Can design a full production AI system: retrieval, generation, guardrails, observability, fallback
Multi-agent | Supervisor / pipeline / mesh patterns. Handles agent state, retries, failure recovery
Evals at scale | CI-gated eval pipeline, LLM judge calibration, eval set maintenance strategy
Fine-tuning basics | Can explain LoRA/QLoRA trade-offs, knows when fine-tuning beats prompting
Guardrails | Input/output filtering pipeline, Llama Guard or Perspective API integration
Model selection | Can benchmark 3 models on their specific task and make a cost/quality recommendation
MCP / tool design | Designs tool contracts with clear schemas, error surfaces, and retry semantics
Infra | Kubernetes basics, CI/CD with GitHub Actions, knows how to set rate limits and circuit breakers
Mentoring | Can review junior/mid PRs on AI systems and explain the tradeoffs
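
Circuit breakers and fallbacks from the table above fit together roughly as follows. This is a simplified sketch: the thresholds are arbitrary, `primary` and `fallback` are hypothetical callables wrapping two model endpoints, and a production breaker would also need thread safety and metrics.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open state).
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def generate(prompt, primary, fallback, breaker):
    """Try the primary model unless the breaker is open; otherwise use the fallback."""
    if breaker.allow():
        try:
            result = primary(prompt)
            breaker.record_success()
            return result
        except Exception:
            breaker.record_failure()
    return fallback(prompt)
```

The point of the breaker is that once the primary is known to be down, requests skip straight to the fallback instead of each paying the timeout first.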

Staff / Principal AI Engineer (8+ years)

Category | Must have
Platform thinking | Designs shared AI infra: model gateway, eval platform, prompt registry, cost dashboards
Strategy | Can make the build-vs-buy-vs-fine-tune call with data to back it up
Cross-team | Shapes how multiple product teams use AI: consistency, safety, shared tooling
Frontier awareness | Knows the capability curve of major model releases and their implications for the product
Research translation | Can read ML papers and determine if the technique is relevant and productionisable
Hiring bar | Can design AI engineering interview loops and calibrate what 'good' looks like

AI Engineer — Company tier differences

Tier | Stack differences
Early-stage startup | Full stack often required (Next.js + backend + AI layer). Vercel AI SDK, Supabase pgvector. Ship fast, minimal tooling.
Growth-stage (Series B–D) | Dedicated AI team forming. LangSmith, DataDog, Sentry expected. GitHub Actions CI. Cost tracking required.
Enterprise | Azure OpenAI Service or AWS Bedrock (not direct API). Compliance tooling. Databricks or Snowflake for data. Heavy documentation.
FAANG / Big Tech | Internal model gateways and prompt registries. Custom eval frameworks. Production ML infra at scale.
Frontier AI Lab | May train models, not just use them. JAX or PyTorch at training scale. Direct access to unreleased models.

Role 2: ML Engineer

ML Engineers own the model training and serving pipeline. They work closer to the model weights than AI Engineers. In 2025, most new ML Engineering work is LLM-adjacent: fine-tuning, RLHF pipelines, inference optimisation, and training infrastructure.

Junior ML Engineer (0–2 years)

Category | Must have
Language | Python: comfortable with OOP, type hints, pytest
ML frameworks | PyTorch: build a neural network, understand forward/backward pass, optimisers
Data | NumPy, Pandas, HuggingFace datasets. Can load, inspect, and preprocess a dataset
Training basics | Training loop from scratch: forward pass, loss, .backward(), optimiser step
Experiment tracking | MLflow or W&B: log metrics, compare runs, save checkpoints
HuggingFace | Transformers library: load a pretrained model, run inference, fine-tune with Trainer API
Notebooks | Jupyter for experimentation, knows when to move to scripts
Cloud basics | Has trained a model on a cloud VM or managed service (SageMaker, Vertex, or Colab Pro)
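
The four steps of a training loop (forward pass, loss, backward pass, optimiser step) can be shown without any framework. The sketch below fits a single weight to toy data in plain Python, with the gradient derived by hand; in PyTorch the same four steps correspond to calling the model, computing the loss, `loss.backward()`, and `optimizer.step()`. The data, learning rate, and epoch count are arbitrary choices for illustration.

```python
def train(xs, ys, lr=0.01, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    loss = float("inf")
    for _ in range(epochs):
        # 1. Forward pass: compute predictions with the current weight.
        preds = [w * x for x in xs]
        # 2. Loss: mean squared error between predictions and targets.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # 3. Backward pass: dLoss/dw = (2/n) * sum((pred - y) * x).
        grad = 2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # 4. Optimiser step: plain SGD update.
        w -= lr * grad
    return w, loss


w, final_loss = train([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f"learned w = {w:.4f}, last loss = {final_loss:.2e}")
```

Autograd removes the need to derive step 3 by hand, but being able to write it out once is a good check that the loop is understood rather than copied.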

Mid ML Engineer (2–5 years)

Category | Must have
Fine-tuning | LoRA / QLoRA: has fine-tuned a 7B+ model on a custom dataset
Distributed training | DataParallel or DistributedDataParallel. Understands gradient synchronisation
Data pipelines | Reproducible data processing: versioned datasets, deterministic splits, deduplication
Model serving | TorchServe, FastAPI + model loading, or vLLM. Understands batching and throughput
Evaluation | Task-specific metrics (BLEU, ROUGE, accuracy, F1), custom eval harness
Inference optimisation | Quantisation (GPTQ/AWQ), knows INT4 vs FP16 quality/speed tradeoff
Model registry | MLflow Model Registry or HuggingFace Hub: version and deploy models properly
Containerisation | Docker for ML: GPU Docker, model artifact management, reproducible environments
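
Deterministic splits and deduplication, from the data pipelines row, are often implemented by hashing the example itself rather than shuffling with a random seed. The sketch below is one way to do that under simple assumptions: examples are plain strings, duplicates are exact matches after whitespace and case normalisation, and the function names are hypothetical.

```python
import hashlib


def stable_bucket(text: str, buckets: int = 100) -> int:
    """Map text to a bucket in [0, buckets) that never changes between runs."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def dedup_and_split(examples: list[str], val_percent: int = 10):
    """Drop normalised duplicates, then assign each example to train or val by hash."""
    seen = set()
    train, val = [], []
    for text in examples:
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if key in seen:
            continue
        seen.add(key)
        if stable_bucket(key) < val_percent:
            val.append(text)
        else:
            train.append(text)
    return train, val
```

Because assignment depends only on the example's content, adding new data later never moves an existing example from train to validation, which a seeded shuffle cannot guarantee.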

Senior ML Engineer (5–8 years)

Category | Must have
Large-scale training | FSDP, DeepSpeed ZeRO stages, gradient checkpointing. Can train 30B+ models on multi-GPU
RLHF pipeline | Has trained a reward model and run a PPO loop, or run DPO directly on preference data
Infra design | GPU cluster setup, job scheduling (SLURM or K8s), distributed checkpoint strategy
Speculative decoding | Understands draft/verify pattern and when it applies
Custom CUDA/Triton | Can write a custom kernel for a performance bottleneck (or at minimum can read one)
Data flywheel | Designs feedback loops: production signals → training data → model improvement
ML platform | Owns the shared training infra for a team: experiment reproducibility, cost attribution
Research reading | Can read and implement key papers (LoRA, Flash Attention, etc.) within a sprint
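
The draft/verify pattern behind speculative decoding is compact enough to sketch. The version below is the greedy simplification: a cheap draft model proposes `k` tokens, the target model checks them, and the longest agreeing prefix is kept. Sampling-based variants use a probabilistic accept/reject rule instead, and real systems verify all positions in one batched forward pass. `draft_next` and `target_next` are hypothetical functions that return the next token for a given context.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One draft/verify round with greedy decoding. Returns the tokens to append."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verify phase: the target model checks each proposed position in order.
    accepted = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            # First disagreement: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted
```

In this greedy form the output is identical to decoding with the target model alone; the gain is that a good draft lets the target confirm several tokens per round instead of one. It helps most when the draft agrees with the target often and is much cheaper to run.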

Staff / Principal ML Engineer (8+ years)

Category | Must have
Architecture decisions | Selects base models, training approaches, and serving strategies for org-wide use
Hardware strategy | GPU procurement decisions: H100 vs A100 vs inference chips. ROI calculations
Compute efficiency | End-to-end FLOPs budget management across training and serving
Novel techniques | Evaluates and productionises techniques from recent papers before they're mainstream
Org-level impact | Training and serving infra decisions affect multiple product teams
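
A FLOPs budget usually starts from a back-of-envelope estimate. The sketch below uses the widely quoted rule of thumb that training a dense transformer costs roughly 6 × parameters × tokens floating-point operations. The hardware peak and utilisation figures in the example are placeholder assumptions, not vendor specifications, and the function names are hypothetical.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: about 6 * N * D."""
    return 6 * params * tokens


def gpu_hours(flops: float, peak_flops_per_s: float, utilisation: float) -> float:
    """Convert a FLOPs budget into GPU-hours at a given sustained utilisation."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    seconds = flops / (peak_flops_per_s * utilisation)
    return seconds / 3600


# Example: a 7B-parameter model trained on 1T tokens.
budget = training_flops(7e9, 1e12)
# Assumed accelerator: 1e15 FLOP/s peak at 40% sustained utilisation (placeholders).
hours = gpu_hours(budget, peak_flops_per_s=1e15, utilisation=0.4)
print(f"{budget:.2e} FLOPs, about {hours:,.0f} GPU-hours")
```

The utilisation term is where most of the real budget is won or lost, which is why it appears as an explicit input rather than being folded into the peak figure.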

ML Engineer — Company tier differences

Tier | Stack differences
Early-stage startup | Fine-tuning via HuggingFace + Modal or RunPod. No dedicated infra. Often hybrid AI Engineer + MLE role.
Growth-stage | Dedicated MLE role. W&B required. Modal/Lambda Labs for compute. MLflow for registry.
Enterprise | AWS SageMaker, Azure ML, or GCP Vertex. Databricks MLflow. Compliance and data governance heavy.
FAANG | Internal training frameworks (Meta's fairseq, Google's T5X/Flax). Enormous compute budgets. Specialised MLE tracks.
Frontier AI Lab | JAX + XLA is common (e.g., Google DeepMind). Triton kernels. Training at 1000s of GPUs. First-principles ML.

Role 3: MLOps / LLMOps Engineer

MLOps Engineers own the infrastructure that makes AI systems reliable in production: training pipelines, serving infrastructure, monitoring, cost management, and the developer experience for AI teams. As LLMs become dominant, the role shifts toward LLMOps: prompt versioning, eval pipelines, observability, and model gateways.

Junior MLOps Engineer (0–2 years)

Category | Must have
Cloud | AWS, GCP, or Azure: compute, storage, IAM basics. Can provision a GPU instance
Containers | Docker: build, run, push images. Understands Dockerfile best practices for ML
CI/CD | GitHub Actions or CircleCI: can write a pipeline that tests and deploys code
Python | Strong enough to write automation scripts, Makefile targets, data processing jobs
Experiment tracking | MLflow or W&B: set up tracking server, log runs, compare experiments
Monitoring basics | CloudWatch or Prometheus: can set up basic service health alerts

Mid MLOps Engineer (2–5 years)

Category | Must have
Orchestration | Kubernetes: pods, deployments, services, HPA. Can deploy a model serving endpoint
Workflow pipelines | Airflow, Prefect, or Kubeflow: orchestrate multi-step ML pipelines
Model serving | Seldon, BentoML, TorchServe, or vLLM: latency-optimised serving with health checks
LLM observability | LangSmith, Helicone, or Arize: trace LLM calls, track token costs, flag failures
Prompt management | Git-based prompt versioning. Eval gates before prompt promotion to production
Feature store | Feast or Tecton basics: online vs. offline feature pipelines
Cost tracking | Per-model, per-feature LLM cost dashboards. Budget alerts. Token quota enforcement
IaC | Terraform or Pulumi: provision ML infra as code, not click-ops
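
An eval gate before prompt promotion can be as simple as comparing the candidate prompt's pass rate against the production baseline on the same eval set. The sketch below assumes each eval case has already been scored to a boolean; the two thresholds and the function names are illustrative assumptions, not recommended values.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("eval results must not be empty")
    return sum(results) / len(results)


def promotion_gate(
    baseline: list[bool],
    candidate: list[bool],
    min_pass_rate: float = 0.90,
    max_regression: float = 0.02,
) -> tuple[bool, str]:
    """Decide whether a candidate prompt may replace the baseline in production."""
    base, cand = pass_rate(baseline), pass_rate(candidate)
    if cand < min_pass_rate:
        return False, f"pass rate {cand:.2%} is below the floor of {min_pass_rate:.2%}"
    if base - cand > max_regression:
        return False, f"regression of {base - cand:.2%} against baseline {base:.2%}"
    return True, f"pass rate {cand:.2%} (baseline {base:.2%})"


ok, reason = promotion_gate([True] * 95 + [False] * 5, [True] * 96 + [False] * 4)
print(ok, reason)
```

In a CI pipeline the boolean would decide the job's exit code, and the reason string would go into the build log so the author of the prompt change can see why it was blocked.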

Senior MLOps Engineer (5–8 years)

Category | Must have
Platform design | Designs the internal AI platform: model registry, gateway, eval framework, observability stack
Model gateway | Builds a routing layer: rate limiting, model fallback, A/B traffic splitting, cost attribution
Eval CI/CD | Eval pipeline that gates prompt and model changes. Regression detection before prod
Multi-cloud | Can design and operate ML infra across providers. Vendor lock-in avoidance strategy
Security | API key management, audit logging, data isolation, PII scrubbing in LLM pipelines
SRE for LLMs | Incident response for AI failures, runbooks, latency regression diagnosis
Capacity planning | Models GPU and API quota requirements against product growth forecasts
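
A/B traffic splitting in a model gateway is commonly done by hashing a stable identifier, so the same user always sees the same variant without the gateway storing any state. A minimal sketch, assuming string user IDs and a hypothetical experiment name:

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: int = 10) -> str:
    """Deterministically assign a user to 'treatment' or 'control'."""
    # Including the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


# Hypothetical routing table: variant -> model name.
ROUTES = {"control": "model-current", "treatment": "model-candidate"}

variant = assign_variant("user-42", "new-model-rollout")
print(variant, "->", ROUTES[variant])
```

Logging the variant alongside token counts for each request is what makes per-variant cost attribution possible later.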

MLOps/LLMOps — Company tier differences

Tier | Stack differences
Early-stage startup | Often no dedicated MLOps. Modal or Replicate for hosting. Railway or Render for APIs. Minimal monitoring.
Growth-stage | First MLOps hire. Buildkite/GitHub Actions CI, DataDog for monitoring, LangSmith for traces.
Enterprise | AWS SageMaker Pipelines or Azure ML Pipelines. Kubeflow or Vertex. Databricks for data. Compliance logging.
FAANG | Internal platforms (Meta's FBLearner, Google's Vertex internals). Dedicated LLMOps teams. Custom gateways.
Frontier Lab | Training infra at scale. SLURM cluster management. Custom checkpointing. GPU utilisation optimisation is its own specialty.

Role 4: Technical AI PM

Technical AI PMs can read code, write prompts, build prototypes, and evaluate model outputs. They bridge research/engineering and product. They don't need to build production systems — but they need to understand them deeply enough to spec them precisely, debug quality issues, and make model trade-off calls.

Junior Technical AI PM (0–2 years)

Category | Must have
APIs | Can call OpenAI or Anthropic API in Python or via Postman. Understands request structure
Prompting | Can write and iterate on system prompts. Understands few-shot, chain-of-thought, output format control
Token literacy | Knows what tokens are, how context windows work, and how pricing works
Basic RAG | Can explain what RAG is, why you'd use it, and what can go wrong
Evals basics | Understands the concept of a golden eval set and LLM-as-judge
Product sense | Can write a user story for an AI feature that includes failure modes
Data reading | Can read a confusion matrix, understand precision/recall trade-offs at a conceptual level
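
The precision/recall trade-off in the last row comes down to two ratios taken from the confusion matrix. The worked example below uses made-up counts for a hypothetical content filter that flags unsafe outputs.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision: of everything flagged, how much was right.
    Recall: of everything that should have been flagged, how much was caught."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts: 80 unsafe outputs correctly flagged (true positives),
# 20 safe outputs wrongly flagged (false positives),
# 40 unsafe outputs missed (false negatives).
precision, recall = precision_recall(tp=80, fp=20, fn=40)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here the filter is right 80% of the time when it flags something but catches only two thirds of the unsafe outputs. Making it stricter would usually raise recall and lower precision, and which direction matters more is a product decision.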

Mid Technical AI PM (2–5 years)

Category | Must have
Prototype building | Can build a working RAG or agent demo using LangChain/LlamaIndex to validate a product idea
Eval ownership | Owns the eval set for their AI feature. Can write judging rubrics and set pass/fail thresholds
AI PRD | Writes PRDs with: model spec, failure mode table, eval plan, guardrails requirements
Model selection | Can compare models on a benchmark task and articulate cost/quality/latency trade-offs
Observability | Uses LangSmith or similar to understand what the model is actually doing in production
Guardrails literacy | Can spec input/output filtering requirements for a feature and work with eng to implement
A/B testing LLMs | Understands how to run experiments on AI features (not the same as deterministic A/B tests)
Hallucination triage | Can diagnose why a model hallucinated on a specific input and propose a fix

Senior Technical AI PM (5–8 years)

Category | Must have
AI system design | Can sketch a production AI architecture (RAG pipeline, agent system, eval loop) on a whiteboard
Eval strategy | Designs the multi-layer eval strategy for a product area: unit, integration, production
Model partnerships | Can evaluate model providers, negotiate commercial terms, and manage vendor relationships
Safety governance | Owns the AI risk framework for their product area. Runs or coordinates red-team exercises
Exec communication | Can explain model quality regressions, cost spikes, and AI limitations to C-level stakeholders
Build vs. buy | Makes the call on fine-tuning vs. prompting vs. external service with data to back it up
Platform influence | Shapes how the AI platform team prioritises tooling based on product team needs

Staff Technical AI PM

Category | Must have
AI strategy | Defines the AI product vision and 2–3 year roadmap for a product area or business unit
Research awareness | Tracks frontier model capabilities and anticipates how they shift product opportunities
Cross-functional | Aligns safety, legal, engineering, and business on AI governance policies
Thought leadership | Published AI product perspectives (internal or external) that influence the field

Technical AI PM — Company tier differences

Tier | Stack differences
Early-stage startup | More hands-on than typical PM: expected to build prototypes, write prompts, and review eval results directly
Growth-stage | Dedicated AI PM role. Expected to own eval pipeline, run model benchmarks, write AI PRDs independently
Enterprise | Compliance and governance skills critical. Azure/AWS AI service literacy. Working with legal on AI risk
FAANG | Works with internal models. Deep familiarity with internal eval frameworks and model cards required
Frontier Lab | TPM-style role. Deep technical depth, often with an eng or research background. Shapes research priorities

Role 5: Non-Technical AI PM

Non-technical AI PMs come from product, business, or domain backgrounds. They don't write code. But they need to be sophisticated enough to spec AI features precisely, challenge engineering decisions with evidence, and avoid the two classic failure modes: over-trusting the model and under-specifying the requirements.

Junior Non-Technical AI PM (0–2 years)

Category | Must have
Hands-on usage | Power user of ChatGPT, Claude, Gemini: knows their strengths, limitations, and prompt strategies
Basic prompting | Can write a system prompt and iterate on it without engineering help
Vocabulary | Fluent in: tokens, context window, hallucination, RAG, embeddings, temperature, fine-tuning, evals
Failure modes | Can name and describe the 5 main ways LLMs fail (hallucination, context limits, injection, bias, inconsistency)
AI product examples | Has studied 3+ AI products deeply: how they work, what problems they solve, how they fail
Data intuition | Comfortable reading a bar chart of model scores. Understands what 'better on evals' means

Mid Non-Technical AI PM (2–5 years)

Category | Must have
AI PRD | Writes AI feature PRDs with model requirements, failure mode tables, and eval criteria
Eval literacy | Can review an eval dashboard, identify regressions, and ask the right questions of engineering
User research | Runs user research specifically around AI trust, confusion, and error handling expectations
Vendor evaluation | Can evaluate AI tool vendors: asks about model cards, SLAs, data retention, compliance
Guardrails spec | Can define the content policy for an AI feature and translate it into engineering requirements
Cost literacy | Understands token costs, can estimate monthly AI feature spend, knows the cost levers
Metrics design | Defines success metrics for AI features that aren't just engagement (quality, trust, task completion)

Senior Non-Technical AI PM (5–8 years)

Category | Must have
AI strategy | Builds the AI roadmap for a product area: sequenced investments, build-vs-buy calls, maturity model
Risk frameworks | Runs AI risk assessments: identifies high-risk use cases, proposes mitigations, documents decisions
Safety ownership | Works with legal, compliance, and safety to define and enforce AI usage policies
Competitive intel | Tracks competitor AI features systematically. Identifies product differentiation through AI capabilities
Exec storytelling | Can communicate AI product strategy, progress, and risks to board-level stakeholders clearly

Non-Technical AI PM — Company tier differences

Tier | Stack differences
Early-stage startup | Often not a distinct role: founder or generalist PM owns AI product. Must be hands-on with the model directly.
Growth-stage | First AI PM hire. Expected to be self-sufficient on prompting, eval reading, and vendor research.
Enterprise | Heavy compliance, procurement, and stakeholder management load. Legal fluency around AI risk required.
FAANG | Works alongside a technical AI PM or TPM. Focuses on market strategy, user research, and business model.

Role 6: Field Developer Engineer / Solutions Engineer

FDEs (also called Solutions Engineers, Developer Advocates, or AI Customer Engineers) work at the interface between a model provider or AI platform and its enterprise customers. They write demo code, lead customer workshops, debug integration issues, and translate customer requirements into product feedback. The role requires both technical depth and customer-facing communication.

Junior FDE / Solutions Engineer (0–2 years)

Category | Must have
API fluency | Can demo any core API feature live, from scratch, without notes. Handles unexpected questions confidently
Sample code | Has built 5+ small working demos across different use cases (RAG, agents, summarisation, classification)
Language breadth | Python required. JavaScript/TypeScript strongly preferred (most enterprise integration is JS)
Explanation skills | Can explain embeddings, RAG, and function calling to a non-technical developer audience
Debugging | Can diagnose API errors, rate limit issues, and bad outputs in front of a customer without panicking
Documentation | Deep familiarity with the API docs, model cards, and changelog. Knows where to look quickly

Mid FDE / Solutions Engineer (2–5 years)

Category | Must have
Architecture guidance | Can review a customer's AI architecture and identify failure points, cost inefficiencies, or missing guardrails
Integration depth | Has built full integrations: CRM, enterprise search, customer data platforms. Knows OAuth, webhooks, enterprise auth
Workshop facilitation | Runs customer workshops: prompt engineering, RAG design, agent patterns. Can handle live Q&A from senior engineers
Competitive knowledge | Deep comparative knowledge: where the product wins, where it doesn't, and how to position honestly
Escalation | Can triage a complex customer issue, write a clear internal escalation report, and follow it to resolution
Feedback loop | Turns customer pain points into structured product feedback. Has relationships with PM and engineering
Vertical knowledge | Deep expertise in 1–2 industries (fintech, healthcare, legal): knows the compliance, data, and use case landscape

Senior FDE / Solutions Engineer (5–8 years)

Category | Must have
Reference architecture | Authors and maintains reference architectures for key customer use cases
Executive engagement | Can lead an executive briefing on AI strategy, discuss risk, ROI, and roadmap at C-level
Technical depth | Can go 5 levels deep on any product feature, from API parameter to serving infrastructure
Partner ecosystem | Knows the partner landscape (system integrators, consultancies) and manages key technical relationships
Enablement | Builds training content and technical enablement programs for partner engineers
Product influence | Has shaped product priorities through sustained, evidence-based customer feedback

FDE / Solutions Engineer — Company tier differences

Tier | Stack differences
AI-native startup | Wears many hats: part sales engineer, part DevRel, part customer success. Must be an excellent communicator and fast learner.
Growth-stage | Dedicated SE team forming. Expected to build polished demo environments and run customer PoCs independently.
Enterprise AI vendor | Deep enterprise integration expertise: SSO, compliance, data residency, procurement. Custom PoC development.
FAANG (cloud AI) | Scale and breadth: support hundreds of customers. Strong documentation and self-serve tooling skills required.

Universal skills every AI role needs

Regardless of role, six skills are expected at every level and in every company. They are what gets you past the basics filter in any AI interview.

Skill | What 'competent' looks like
LLM mental model | Can explain what happens inside a transformer at a conceptual level. Knows tokens, embeddings, attention.
Hallucination literacy | Can explain why models hallucinate, name 3 common triggers, and describe mitigation strategies for each.
RAG conceptual | Can explain naive RAG end-to-end, name its 3 main failure modes, and describe one architectural improvement.
Cost thinking | Has a rough intuition for token costs. Can back-of-envelope estimate monthly LLM spend for a feature.
Safety awareness | Knows prompt injection, jailbreaks, and output filtering. Can identify unsafe AI feature designs.
Eval mindset | Understands why you can't manually test an LLM feature and what eval automation requires.
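
The back-of-envelope cost estimate mentioned under cost thinking is a single multiplication chain. The sketch below shows its shape; the traffic figures and per-million-token prices are placeholders chosen for easy arithmetic, not any provider's actual pricing.

```python
def monthly_cost(
    requests_per_day: float,
    input_tokens: float,
    output_tokens: float,
    price_in_per_mtok: float,
    price_out_per_mtok: float,
    days: int = 30,
) -> float:
    """Estimated monthly spend in the same currency as the per-million-token prices."""
    per_request = (
        input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    ) / 1_000_000
    return requests_per_day * days * per_request


# Placeholder scenario: 10,000 requests/day, 1,500 input and 300 output tokens each,
# priced at 3.00 per million input tokens and 15.00 per million output tokens.
estimate = monthly_cost(10_000, 1_500, 300, 3.00, 15.00)
print(f"about {estimate:,.0f} per month")
```

Writing the estimate this way exposes the levers directly: fewer input tokens (shorter context, caching), fewer output tokens, a cheaper model tier, or fewer requests reaching the model at all.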

The stack evolution: what to add in 2025–2026

The AI stack moves fast. These are the skills that are transitioning from 'nice-to-have' to 'expected' over the next 12–18 months:

The biggest career risk in AI right now is over-specialising on one framework (LangChain, LlamaIndex) or one model provider. Frameworks change every 6 months. The durable skills are the conceptual foundations: how retrieval works, how agents are structured, how evals are designed — not the specific library API.
