Continued Pretraining: When Fine-Tuning Isn't Deep Enough

When domain adaptation requires continued pretraining on unlabelled text rather than supervised fine-tuning. Medical, legal, and code domains — what it takes and when it's worth it.

Instruction fine-tuning teaches a model how to respond. Continued pretraining teaches a model what to know. The distinction matters when your domain is genuinely out-of-distribution from the base model's pretraining data — where the vocabulary, concepts, and reasoning patterns of your domain are so specialised that no amount of instruction tuning on labelled examples will fully close the gap.

Medicine. Law. Highly specialised scientific domains. Proprietary internal codebases with unique conventions. These are the domains where continued pretraining on unlabelled text becomes the right tool.

What continued pretraining actually does

Continued pretraining runs the standard language modelling objective (predict the next token) on a large corpus of domain-specific text — without any instruction-response structure. The model doesn't learn to answer questions; it learns the statistical patterns, terminology, and reasoning structures of the domain.
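The objective described above can be sketched in a few lines. This is a minimal illustration rather than a training loop: `predict_probs` is a hypothetical stand-in for a real model, and the uniform toy model exists only to make the arithmetic checkable.

```python
import math

def next_token_loss(token_ids, predict_probs):
    """Average next-token cross-entropy over one unlabelled sequence.

    token_ids:     list of integer token ids (raw domain text, already tokenised)
    predict_probs: hypothetical callable, prefix -> {token_id: probability}
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    # Every token after the first is a target; there is no prompt/response split.
    for t in range(1, len(token_ids)):
        prefix, target = token_ids[:t], token_ids[t]
        prob = predict_probs(prefix).get(target, 1e-12)  # floor avoids log(0)
        total += -math.log(prob)
    return total / (len(token_ids) - 1)

def uniform_model(prefix, vocab_size=4):
    # Toy model (assumption for illustration): ignores the prefix entirely.
    return {tok: 1.0 / vocab_size for tok in range(vocab_size)}

loss = next_token_loss([0, 3, 1, 2], uniform_model)
print(round(loss, 4))  # log(4), about 1.3863
```

A real run would replace `uniform_model` with the base model's forward pass and minimise this loss over the whole domain corpus.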

After continued pretraining, you still need instruction fine-tuning on top to teach the model how to use that knowledge in response to instructions. Continued pretraining followed by instruction fine-tuning is the standard two-stage pipeline for deep domain adaptation.

In short: continued pretraining changes what the model knows, and instruction fine-tuning changes how the model responds. When you need both, the order is always continued pretraining first, then instruction fine-tuning on top of the domain-adapted base.
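One practical way to see the difference between the two stages is in which tokens contribute to the loss. The sketch below assumes a common convention that the article itself does not specify: instruction fine-tuning masks prompt tokens with `-100` (the default ignore index of PyTorch's cross-entropy loss), while continued pretraining trains on every token. The function names are hypothetical.

```python
IGNORE = -100  # assumption: PyTorch-style ignore index, skipped by the loss

def pretraining_labels(token_ids):
    """Stage 1 (continued pretraining): every token is a prediction target."""
    return list(token_ids)

def instruction_labels(prompt_ids, response_ids):
    """Stage 2 (instruction fine-tuning): only response tokens are targets.

    Masking the prompt is a common choice, not a universal rule.
    """
    return [IGNORE] * len(prompt_ids) + list(response_ids)

# Stage order from the text: domain text first, instruction pairs second.
stage_1 = pretraining_labels([5, 6, 7])
stage_2 = instruction_labels([1, 2], [3, 4])
print(stage_1)  # [5, 6, 7]
print(stage_2)  # [-100, -100, 3, 4]
```

The first stage has no notion of a prompt at all, which is why it can consume raw domain text directly.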

When continued pretraining is worth it

When it's not worth it

Data requirements and preparation

Continued pretraining data is unlabelled: just raw text from your domain. Quality still matters enormously. A 10B-token corpus of high-quality medical literature will generally produce a better model than 50B tokens of scraped web text that merely happens to mention medical topics.
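As a rough illustration of what "quality" can mean in practice, the sketch below applies two basic filters: dropping very short documents and removing exact duplicates after whitespace and case normalisation. The `min_words` threshold and the choice of filters are assumptions for illustration; a production pipeline would typically add near-duplicate detection and domain-specific quality checks.

```python
import hashlib
import re

def normalize(text):
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()

def clean_corpus(docs, min_words=50):
    """Keep the first copy of each sufficiently long document.

    min_words is a hypothetical threshold; tune it for your domain.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to carry much domain signal
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalisation
        seen.add(digest)
        kept.append(doc)
    return kept

docs = ["a b c d e", "A  b c d   e", "too short", "x y z w v"]
print(clean_corpus(docs, min_words=5))  # ['a b c d e', 'x y z w v']
```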

Continued pretraining vs. RAG

| Aspect | Continued Pretraining | RAG |
| --- | --- | --- |
| Knowledge type | Statistical patterns, reasoning structures, vocabulary | Specific factual claims, citations |
| Update cost | High (requires retraining) | Low (update the index) |
| Knowledge freshness | Static until next training run | Can be updated in real time |
| Hallucination risk | Doesn't reduce on facts outside training corpus | Reduces for facts in the retrieved documents |
| Best for | Deep domain vocabulary + reasoning | Dynamic factual knowledge + citation |
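The comparison can be condensed into a small decision helper. This is only a heuristic sketch: the function and parameter names are hypothetical, and it assumes the two approaches can be combined when a project has both kinds of need.

```python
def suggest_approach(needs_domain_vocabulary_or_reasoning,
                     needs_fresh_facts,
                     needs_citations):
    """Map project needs onto the 'Best for' row of the comparison.

    Returns a list because the two approaches are treated here as
    complementary (an assumption, not a claim from the comparison itself).
    """
    approaches = []
    if needs_domain_vocabulary_or_reasoning:
        approaches.append("continued pretraining")
    if needs_fresh_facts or needs_citations:
        approaches.append("RAG")
    return approaches

# A clinical assistant that must reason in domain terms and cite current guidance.
print(suggest_approach(True, True, True))    # ['continued pretraining', 'RAG']
# A support bot over a frequently changing knowledge base.
print(suggest_approach(False, True, False))  # ['RAG']
```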
