
What Does an ML Engineer Actually Do in 2025?

How the ML Engineer role has evolved since the LLM revolution: what has changed, what is still core (training, MLOps, model serving), and how to position yourself.

The ML Engineer title covers a wide range, from building training pipelines for billion-parameter models to deploying fine-tuned classifiers in production microservices. Before you apply, it helps to understand what the role actually involves, how it differs from AI Engineer and Data Scientist, and what the career path looks like.

What ML engineers actually do

ML Engineers sit at the intersection of software engineering and machine learning research. They write production code, but the code trains and serves models. Day-to-day work includes: building and maintaining training pipelines, curating and versioning training datasets, running experiments and tracking results, deploying models to serving infrastructure, monitoring model performance in production, and collaborating with researchers to productionise new techniques.

The distinction from Data Scientists: ML Engineers own the production path. A Data Scientist builds a model in a notebook; an ML Engineer turns it into a service that handles 10K requests per minute, fails gracefully, and can be retrained and redeployed in an hour.
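
To put the 10K requests per minute figure in perspective, here is a rough capacity sketch using Little's law (requests in flight = arrival rate × latency). The latency, per-replica concurrency, and headroom values are hypothetical assumptions chosen for illustration, not measurements from any real service.

```python
import math

def replicas_needed(requests_per_minute, latency_s, concurrency_per_replica, headroom=0.5):
    """Rough replica estimate for a model-serving endpoint.

    All inputs other than the request rate are illustrative assumptions.
    """
    rps = requests_per_minute / 60.0
    # Little's law: average number of requests in flight at any moment.
    in_flight = rps * latency_s
    # Plan to use only part of each replica so spikes and failures have slack.
    usable_slots = concurrency_per_replica * (1.0 - headroom)
    return rps, in_flight, math.ceil(in_flight / usable_slots)

# Hypothetical service: 100 ms per request, 4 concurrent requests per replica.
rps, in_flight, replicas = replicas_needed(
    10_000, latency_s=0.100, concurrency_per_replica=4
)
print(f"{rps:.1f} req/s, {in_flight:.1f} in flight, {replicas} replicas")
```

Under these assumptions the load is about 167 requests per second and needs 9 replicas. Real sizing would come from load tests rather than a formula, but the arithmetic shows why serving is an engineering problem and not just a modelling one.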

ML Engineer vs AI Engineer — the 2025 distinction

Dimension	ML Engineer	AI Engineer
Primary work	Training + fine-tuning models	Building on top of foundation models
Core skill	PyTorch / JAX, distributed training	Prompt engineering, RAG, agents, evals
Output	Model weights + serving infrastructure	LLM-powered applications
Infra depth	Deep — owns GPUs, distributed systems	Moderate — uses managed APIs
Math depth	High — loss functions, gradients	Moderate — uses models as black boxes
2025 demand	High at labs and large tech	Rapidly growing across all sectors

Core technical skills

Python at a production level — not just scripts, but services with tests, types, and CI
PyTorch or JAX — building, training, and debugging neural networks from scratch
Distributed training — data parallelism, model parallelism, FSDP, DeepSpeed
ML infrastructure — experiment tracking (MLflow, W&B), model registry, artifact storage
Data pipelines — building reliable, reproducible data processing at scale
Model serving — TorchServe, ONNX Runtime, TensorRT, vLLM, or Triton Inference Server
Cloud ML platforms — SageMaker, Vertex AI, or Azure ML for managed training jobs
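
As an illustration of the data parallelism mentioned in the list above, here is a minimal sketch in plain Python rather than PyTorch or JAX. Each simulated worker computes a gradient on its own shard, and a stand-in for an all-reduce averages them. The one-parameter model, the toy data, and the helper names `grad_mse` and `all_reduce_mean` are all hypothetical; real frameworks perform this step with collective communication across GPUs.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)

# Toy dataset (y = 2x), split evenly across two simulated workers.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0
num_workers = 2
shard_size = len(xs) // num_workers

local_grads = []
for rank in range(num_workers):
    lo, hi = rank * shard_size, (rank + 1) * shard_size
    # Each worker sees only its own shard of the batch.
    local_grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))

synced_grad = all_reduce_mean(local_grads)
full_batch_grad = grad_mse(w, xs, ys)
print(local_grads, synced_grad, full_batch_grad)
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, which is why every worker can apply the same update and keep its copy of the weights in sync.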

What companies want in 2025

Pre-2022, most ML engineering roles focused on classical models: tabular data, recommendation systems, NLP classifiers. Post-2022, the majority of new ML engineering hiring is LLM-adjacent: fine-tuning foundation models, building RLHF pipelines, scaling infrastructure for frontier model training, or deploying and serving large models efficiently.

The most in-demand specialisations: LLM fine-tuning (LoRA, QLoRA, full fine-tune at scale), inference optimisation (quantisation, speculative decoding, vLLM deployment), and training infrastructure (GPU cluster management, distributed training debugging).
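
To make the appeal of LoRA concrete, the sketch below counts trainable parameters for a single weight matrix. LoRA freezes the original weight and learns a low-rank update B @ A, so the trainable count is rank × (d_in + d_out) instead of d_in × d_out. The hidden size of 4096 and rank of 8 are hypothetical values picked for illustration, not a recommendation.

```python
def full_params(d_in, d_out):
    """Trainable parameters when fine-tuning the whole d_out x d_in matrix."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA update B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank

d_model = 4096  # hypothetical hidden size
rank = 8        # hypothetical LoRA rank

full = full_params(d_model, d_model)
lora = lora_params(d_model, d_model, rank)
fraction = lora / full
print(f"full: {full:,}  LoRA: {lora:,}  fraction: {fraction:.4%}")
```

For this one matrix, LoRA trains 65,536 parameters instead of 16,777,216, roughly 0.39%. That gap is what makes fine-tuning large models feasible on modest hardware.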

Career progression

Level	Scope	Key milestone
Junior MLE	Executes well-defined tasks on existing pipelines	Ships first model to production
Mid MLE	Owns a model or pipeline end-to-end	Halves training time or serving cost
Senior MLE	Leads cross-functional ML projects	Designs the ML architecture for a new product
Staff MLE	Sets technical direction for an ML platform or area	Influences decisions across multiple teams or products
Principal MLE	Org-level impact on ML strategy	Drives multi-year technical roadmap

How to get in

The clearest path from SWE to MLE: build a project that requires training a model from scratch — not fine-tuning an existing one. Build the data pipeline, write the training loop, deploy the model, and monitor it. Show this project in interviews. Complement it with a strong understanding of transformers, backpropagation, and distributed systems.
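
For a sense of what "write the training loop" means at its smallest, here is a dependency-free sketch: synthetic data, a train/validation split, hand-written gradients, and a held-out evaluation. The linear model, the data generator, and the hyperparameters are hypothetical stand-ins. A real project would use PyTorch or JAX and a far larger model, but the shape of the loop is the same.

```python
import random

def make_data(n, seed=0):
    """Synthetic regression data: y = 3x + 0.5 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.05)) for x in xs]

def mse(w, b, data):
    """Mean squared error of the model y_hat = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, epochs=200, lr=0.1):
    """Full-batch gradient descent with hand-derived gradients."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

data = make_data(200)
train_set, val_set = data[:160], data[160:]  # hold out 20% for validation

w, b = train(train_set)
val_loss = mse(w, b, val_set)
print(f"w={w:.3f} b={b:.3f} val_loss={val_loss:.5f}")
```

The learned parameters should land close to the generating values of 3.0 and 0.5. Swapping the hand-written gradients for autograd and the list of tuples for a data loader is the first step towards the kind of project described above.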

The Karpathy path: watch Andrej Karpathy's 'Let's build GPT: from scratch, in code, spelled out', implement it yourself, then train a GPT-2-style model on a small dataset. This project, described confidently in interviews, opens more MLE doors than any certification.
