LoRA in Practice: A Step-by-Step Production Fine-Tuning Guide
From dataset prep to adapter merge to deployment. The exact config choices that matter — rank, alpha, target modules, learning rate, epochs — with what goes wrong at each step.
The LoRA paper tells you the theory. This post tells you what actually happens when you run it — the config decisions that matter, what goes wrong at each step, and how to go from a raw dataset to a merged adapter ready for production.
Most teams waste their first two or three LoRA runs by making the same preventable mistakes: wrong rank, wrong target modules, no baseline eval, no validation split. This is the guide to not doing that.
Step 1: Dataset preparation
LoRA is only as good as its training data. Before writing a single line of training code, your dataset needs to be in the right shape.
- Format: every example must follow exactly one consistent format. The model learns the format as strongly as it learns the content. Inconsistency teaches inconsistency.
- Quality: remove low-quality examples aggressively. 500 excellent examples usually beat 5,000 mediocre ones (see LIMA, which fine-tuned on only 1,000 curated examples).
- Diversity: ensure topic, length, and instruction type diversity. A homogeneous dataset produces a model that handles only the cases it trained on.
- Validation split: hold out 5–10% of examples for evaluation. Never train and evaluate on the same set.
- Deduplication: near-duplicate examples cause overfitting. Use MinHash or embedding similarity to detect and remove them.
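The deduplication and validation-split steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline: it uses exact Jaccard similarity over word 3-grams (the quantity MinHash approximates at scale), and the function names, the 0.8 threshold, and the 10% holdout are assumptions chosen for the example.

```python
import random
import re


def shingles(text, n=3):
    """Return the set of word n-grams for a text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(examples, threshold=0.8):
    """Keep an example only if it is not too similar to one already kept.

    This compares every pair exactly (O(n^2)), which is fine for a few
    thousand examples. MinHash estimates the same similarity cheaply
    when the dataset is too large for pairwise comparison.
    """
    kept, kept_shingles = [], []
    for ex in examples:
        s = shingles(ex)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(ex)
            kept_shingles.append(s)
    return kept


def train_val_split(examples, val_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Deduplicate before splitting, so a near-copy of a training example cannot end up in the validation set and inflate your scores.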
Step 2: Base model selection
The base model determines the ceiling of your fine-tuned model. A 7B model fine-tuned on excellent data will generally not match a 70B model on complex reasoning tasks. Choose the smallest model that can plausibly handle your task zero-shot, then fine-tune it.
Prefer instruct-tuned base models over raw pretrained models for most fine-tuning tasks. Instruct-tuned models already know the conversation format and follow instructions reasonably well — your fine-tuning data then only needs to teach the domain-specific patterns, not the basic interaction format.
Step 3: LoRA configuration
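```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder: set to the Hugging Face ID or local path of your base model
base_model_id = "your-org/your-base-model"
model = AutoModelForCausalLM.from_pretrained(base_model_id)

config = LoraConfig(
    r=16,             # rank: start here, experiment up/down
    lora_alpha=32,    # scaling factor: typically 2x rank
    target_modules=[  # which layers to adapt (Llama-style names; other architectures may differ)
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # FFN (include for better results)
    ],
    lora_dropout=0.05,  # helps prevent overfitting on small datasets
    bias="none",        # don't train bias terms
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# On a 7B Llama-style model: ~40M trainable params / 7B total = ~0.6%
```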
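To see where the ~40M figure comes from, the count can be worked out by hand. Each adapted weight matrix of shape (d_in, d_out) gains two small matrices, A (r × d_in) and B (d_out × r), so it adds r × (d_in + d_out) trainable parameters. The sketch below assumes Llama-2-7B-style dimensions (hidden size 4096, FFN size 11008, 32 layers); those numbers are an assumption for illustration, so substitute your own model's shapes.

```python
def lora_param_count(shapes, rank, num_layers):
    """Count LoRA parameters across all layers.

    Each adapted (d_in, d_out) matrix adds A (rank x d_in) and
    B (d_out x rank), i.e. rank * (d_in + d_out) parameters.
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed dimensions (Llama-2-7B-style); not taken from the article.
HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 32

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

total = lora_param_count(shapes, rank=16, num_layers=LAYERS)
print(f"{total:,} trainable LoRA parameters")  # 39,976,960
```

The count scales linearly with rank, so r=8 halves it and r=32 doubles it. Models that use grouped-query attention have smaller k_proj and v_proj matrices, so their totals come out somewhat lower.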
Step 4: Training hyperparameters
| Hyperparameter | Starting Value | When to Adjust |
|---|---|---|
| Learning rate | 2e-4 | Lower (1e-4) if loss is unstable; higher (3e-4) if loss decreases too slowly |
| Batch size | 4–16 (per device) | Larger = more stable gradients; limited by VRAM |
| Gradient accumulation | 4–8 steps | Use to achieve effective batch size of 32–64 |
| Epochs | 1–3 | More epochs = more overfitting risk on small datasets |
| Warmup steps | 10–50 | Prevents early training instability |
| LR scheduler | cosine | A common default; cosine decay often does as well as or better than linear |
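Two rows of this table interact in ways that are easy to get wrong: effective batch size (per-device batch × accumulation steps × number of GPUs) and the shape of a warmup-plus-cosine schedule. The following is a minimal sketch of both in plain Python. The function names are made up for this example, and the exact warmup formula in your training framework may differ slightly.

```python
import math


def effective_batch_size(per_device, grad_accum_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_steps=30):
    """Linear warmup from 0 to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# A batch of 8 per device with 4 accumulation steps on one GPU
print(effective_batch_size(8, 4))  # 32

# Learning rate at a few points of a 1,000-step run
for step in (0, 15, 30, 500, 1000):
    print(step, f"{lr_at_step(step, total_steps=1000):.2e}")
```

If you change the number of GPUs, recompute the effective batch size and adjust the accumulation steps to keep it in the 32–64 range, otherwise a learning rate that was stable before may stop being stable.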
Step 5: What to watch during training
- Training loss curve: should decrease steadily. Spikes indicate learning rate too high or data issues.
- Validation loss: should decrease with training loss. If validation loss rises while training loss falls, you're overfitting.
- GPU utilisation: should be >80%. Lower means your data loading or batch size is the bottleneck.
- Gradient norm: clip gradients (max_grad_norm=1.0) and log the norm. Persistently high or spiking norms indicate instability.
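The overfitting signal described above (validation loss rising while training loss keeps falling) is simple to check automatically from logged losses. The helper below is a hypothetical sketch, not part of any library; the `patience` of two consecutive evals is an assumption you should tune to how noisy your validation loss is.

```python
def overfitting_started(train_losses, val_losses, patience=2):
    """Return the index of the first eval where overfitting begins, or None.

    Overfitting is flagged when validation loss rises for `patience`
    consecutive evals while training loss falls over the same evals.
    The returned index is the first eval in that streak, so the best
    checkpoint is the one just before it.
    """
    streak = 0
    for i in range(1, min(len(train_losses), len(val_losses))):
        val_up = val_losses[i] > val_losses[i - 1]
        train_down = train_losses[i] < train_losses[i - 1]
        streak = streak + 1 if (val_up and train_down) else 0
        if streak >= patience:
            return i - patience + 1
    return None


train = [2.0, 1.5, 1.2, 1.0, 0.8, 0.6]
val = [2.1, 1.7, 1.5, 1.55, 1.6, 1.7]
print(overfitting_started(train, val))  # 3
```

Here validation loss bottoms out at index 2, so that is the checkpoint to keep. Evaluating and saving checkpoints at the same interval makes this lookup straightforward.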
Step 6: Evaluation before merging
Never merge a LoRA adapter into the base model before evaluating it. Use your held-out validation set and your production eval suite. Run the adapter-loaded model against your baseline (base model + best system prompt) on the same eval. If the fine-tuned model isn't clearly better on your target metric, do not proceed to deployment.
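One way to make "clearly better" concrete is a paired comparison on per-example scores from the same eval set. The sketch below is an illustration under assumptions: the function name is hypothetical, scores are assumed to be higher-is-better numbers in the same example order for both models, and the 0.02 minimum gain is an arbitrary placeholder you should replace with a threshold that matters for your product.

```python
from statistics import mean


def compare_to_baseline(baseline_scores, tuned_scores, min_gain=0.02):
    """Compare per-example scores for two models on the same eval set."""
    if len(baseline_scores) != len(tuned_scores):
        raise ValueError("both models must be scored on the same examples")
    gain = mean(tuned_scores) - mean(baseline_scores)
    pairs = list(zip(baseline_scores, tuned_scores))
    wins = sum(t > b for b, t in pairs)      # examples the fine-tune improved
    losses = sum(t < b for b, t in pairs)    # examples the fine-tune regressed
    return {
        "mean_gain": gain,
        "wins": wins,
        "losses": losses,
        "proceed": gain >= min_gain and wins > losses,
    }


baseline = [0, 1, 0, 1, 0, 0, 1, 0]  # base model + best system prompt
tuned = [1, 1, 1, 1, 0, 1, 1, 0]     # adapter-loaded model
print(compare_to_baseline(baseline, tuned))
```

Looking at wins and losses separately, rather than only the mean, surfaces regressions: a fine-tune that improves the average while breaking cases the baseline handled deserves a closer look before deployment.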
Step 7: Merging and exporting
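```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: set these to your base model ID and saved adapter directory
base_model_id = "your-org/your-base-model"
adapter_path = "./lora_adapter"

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Merge adapter weights into base model weights
merged_model = peft_model.merge_and_unload()

# Save as a standard model (no PEFT dependency at inference)
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")
# Result: a standard checkpoint that loads without the PEFT library
# Inference speed is identical to the base model
```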
Merging is irreversible in practice: once saved, the merged weights cannot be split back into base model and adapter. Always keep the base model and the separate adapter checkpoint. If you need to change the fine-tune, retrain the adapter and merge again.
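Under the hood, merging adds the scaled low-rank update to each adapted weight: W_merged = W + (alpha / r) · B · A. The toy example below, in plain Python with tiny made-up matrices, is a sketch of that arithmetic rather than PEFT's actual implementation. It shows why the merged model gives the same output as base plus adapter while needing only one matrix multiply at inference.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy values: 2x2 weight, rank-1 adapter, alpha = 2 (so the scale is 2.0).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, d_in)
lora_b = [[0.5], [1.0]]  # shape (d_out, rank)
x = [[3.0], [4.0]]       # input column vector

merged = merge_lora(w, lora_a, lora_b, alpha=2, rank=1)

# Adapter path: base output plus the scaled low-rank update.
low_rank = matmul(lora_b, matmul(lora_a, x))
adapter_out = [[b[0] + 2.0 * d[0]] for b, d in zip(matmul(w, x), low_rank)]

# Merged path: a single matrix multiply.
merged_out = matmul(merged, x)

print(merged)       # [[2.0, 2.0], [2.0, 5.0]]
print(adapter_out)  # [[14.0], [26.0]]
print(merged_out)   # [[14.0], [26.0]]
```

The merged matrix has the same shape as the original weight, which is why the exported model has no extra latency. It also shows why you cannot recover A and B from the merged weights alone: only their scaled product was added in.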