
LoRA in Practice: A Step-by-Step Production Fine-Tuning Guide

From dataset prep to adapter merge to deployment. The exact config choices that matter — rank, alpha, target modules, learning rate, epochs — with what goes wrong at each step.

The LoRA paper tells you the theory. This post tells you what actually happens when you run it — the config decisions that matter, what goes wrong at each step, and how to go from a raw dataset to a merged adapter ready for production.

Most teams waste their first two or three LoRA runs by making the same preventable mistakes: wrong rank, wrong target modules, no baseline eval, no validation split. This is the guide to not doing that.

Step 1: Dataset preparation

LoRA is only as good as its training data. Before writing a single line of training code, your dataset needs to be in the right shape.

Format: every example must follow exactly one consistent format. The model learns the format as strongly as it learns the content. Inconsistency teaches inconsistency.
Quality: remove low-quality examples aggressively. 500 excellent examples will usually beat 5,000 mediocre ones (see LIMA, which reached strong results with 1,000 carefully curated examples).
Diversity: ensure topic, length, and instruction type diversity. A homogeneous dataset produces a model that handles only the cases it trained on.
Validation split: hold out 5–10% of examples for evaluation. Never train and evaluate on the same set.
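Deduplication: near-duplicate examples encourage memorization and, if they end up on both sides of the split, leak training data into your validation set. Use MinHash or embedding similarity to detect and remove them before you split.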
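The deduplication and validation-split items above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline: it uses exact Jaccard similarity over word trigrams as a stand-in for MinHash (which approximates the same quantity at scale), and the function names and the 0.8 threshold are assumptions chosen for the example.

```python
import random
import re


def shingles(text: str, n: int = 3) -> set[str]:
    """Lowercased word n-grams; punctuation and casing are ignored."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(examples: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first example of each near-duplicate group (O(n^2), fine for small sets)."""
    kept, kept_shingles = [], []
    for text in examples:
        s = shingles(text)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


def train_val_split(examples: list[str], val_fraction: float = 0.1, seed: int = 0):
    """Shuffle deterministically, then hold out val_fraction of the examples."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


raw = [
    "Summarise the following report in two sentences.",
    "Summarise the following report in two sentences!",  # near-duplicate
    "Translate this paragraph into French.",
]
clean = dedupe(raw)
train, val = train_val_split(clean)
```

Deduplicating first and splitting second keeps a near-duplicate of a training example from landing in the validation set. For datasets beyond a few thousand examples, the pairwise loop would need to be replaced with real MinHash/LSH or an embedding index.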

Step 2: Base model selection

The base model determines the ceiling of your fine-tuned model. A 7B model fine-tuned on excellent data will generally not match a 70B model on complex reasoning tasks. Choose the smallest model that can plausibly handle your task zero-shot, then fine-tune it.

Prefer an instruct-tuned model over a raw pretrained checkpoint as your starting point for most fine-tuning tasks. Instruct-tuned models already know the conversation format and follow instructions reasonably well, so your fine-tuning data only needs to teach the domain-specific patterns, not the basic interaction format.

Step 3: LoRA configuration

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# base_model_id: Hub ID or local path of the model chosen in Step 2
model = AutoModelForCausalLM.from_pretrained(base_model_id)

config = LoraConfig(
    r=16,                    # rank: start here, experiment up/down
    lora_alpha=32,           # scaling factor, typically 2× rank
    target_modules=[         # which layers to adapt (Llama/Mistral-style names; other architectures differ)
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",       # FFN (include for better results)
    ],
    lora_dropout=0.05,       # prevent overfitting on small datasets
    bias="none",             # don't train bias terms
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
# e.g. for a 7B Llama-style model: ~40M trainable / ~7B total = ~0.6%
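The trainable-parameter figure is easy to check by hand. LoRA adds two matrices to each adapted linear layer, A (r × d_in) and B (d_out × r), so each layer costs r × (d_in + d_out) parameters. The sketch below assumes Llama-2-7B-style shapes (hidden size 4096, FFN size 11008, 32 layers, key/value projections the same size as query), which the config above does not state; models with grouped-query attention or different widths will give different numbers.

```python
def lora_param_count(r: int, hidden: int = 4096, intermediate: int = 11008,
                     n_layers: int = 32) -> int:
    """Trainable LoRA parameters when adapting all seven projection modules.

    Shapes are an assumption (Llama-2-7B-style); adjust for your model.
    """
    # q_proj, k_proj, v_proj, o_proj: hidden -> hidden
    attention = 4 * r * (hidden + hidden)
    # gate_proj, up_proj: hidden -> intermediate; down_proj: intermediate -> hidden
    ffn = 3 * r * (hidden + intermediate)
    return n_layers * (attention + ffn)


for rank in (8, 16, 32):
    print(rank, f"{lora_param_count(rank):,}")
# rank 16 gives 39,976,960, matching the ~40M figure
```

The count scales linearly with rank, so doubling r doubles adapter size and optimizer state while the frozen base model stays the same size.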

Step 4: Training hyperparameters

Hyperparameter	Starting Value	When to Adjust
Learning rate	2e-4	Lower (1e-4) if loss is unstable; higher (3e-4) if loss is falling too slowly
Batch size	4–16 (per device)	Larger = more stable gradients; limited by VRAM
Gradient accumulation	4–8 steps	Use to achieve effective batch size of 32–64
Epochs	1–3	More epochs = more overfitting risk on small datasets
Warmup steps	10–50	Prevents early training instability
LR scheduler	cosine	Cosine decay is a solid default and often edges out linear decay
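The rows of this table interact: per-device batch size and gradient accumulation together set the effective batch size, which sets the number of optimizer steps, which in turn determines how much of the run the warmup occupies. The sketch below works through that arithmetic and a linear-warmup-plus-cosine schedule. The function names are made up for the example, the step count is an approximation, and the schedule is a plain reimplementation for illustration rather than the exact code any training library runs.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * n_devices


def optimizer_steps(n_examples: int, epochs: int, eff_batch: int) -> int:
    """Approximate number of optimizer steps for the whole run."""
    return math.ceil(n_examples / eff_batch) * epochs


def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4,
          warmup_steps: int = 30) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))


eff = effective_batch_size(per_device=8, grad_accum=4)          # 32
total = optimizer_steps(n_examples=2000, epochs=3, eff_batch=eff)  # 189
schedule = [lr_at(s, total) for s in range(total + 1)]
```

With 2,000 examples the whole run is under 200 optimizer steps, so 30 warmup steps is already about 16% of training. On small datasets it is worth computing the step count before choosing a warmup value.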

Step 5: What to watch during training

Training loss curve: should decrease steadily. Spikes indicate learning rate too high or data issues.
Validation loss: should decrease with training loss. If validation loss rises while training loss falls, you're overfitting.
GPU utilisation: should be >80%. Lower usually means data loading is the bottleneck or the batch size is too small to keep the GPU busy.
Gradient norm: log it, and clip it (max_grad_norm=1.0). Repeated spikes or a steadily growing norm indicate instability.
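The validation-loss rule above can be turned into a simple check over logged evaluation losses. This is a minimal sketch: `best_checkpoint` and its `patience` parameter are hypothetical names for this example, and the loss values are invented to show the shape of an overfitting run.

```python
def best_checkpoint(val_losses: list[float], patience: int = 2,
                    min_delta: float = 0.0):
    """Return (best_index, stop_index).

    best_index: evaluation with the lowest validation loss seen so far.
    stop_index: evaluation at which `patience` consecutive non-improvements
                were reached, or None if that never happened.
    """
    best_idx, best_loss, bad_evals = 0, float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_idx, best_loss, bad_evals = i, loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_idx, i
    return best_idx, None


# Illustrative values: validation loss bottoms out at the third evaluation
val_losses = [1.20, 1.05, 0.98, 1.01, 1.07, 1.15]
best, stop = best_checkpoint(val_losses)
print(best, stop)  # 2 4
```

The adapter worth evaluating is the one saved at `best`, not the last one. If you train with the Hugging Face Trainer, its `EarlyStoppingCallback` offers similar patience-based stopping, provided evaluation and checkpoint saving are enabled.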

Step 6: Evaluation before merging

Never merge a LoRA adapter into the base model before evaluating it. Use your held-out validation set and your production eval suite. Run the adapter-loaded model against your baseline (base model + best system prompt) on the same eval. If the fine-tuned model isn't clearly better on your target metric, do not proceed to deployment.
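One way to make "clearly better" concrete is a paired comparison: score the baseline and the adapter-loaded model on the same eval examples, then look at the per-example differences. The sketch below assumes each eval example yields a single numeric score (higher is better). The function name, the 0.02 minimum gain, and the sample scores are all placeholders to replace with your own metric and threshold.

```python
from statistics import mean


def compare_runs(baseline: list[float], finetuned: list[float],
                 min_gain: float = 0.02) -> dict:
    """Paired comparison of per-example scores from the same eval set."""
    if len(baseline) != len(finetuned):
        raise ValueError("both runs must be scored on the same examples")
    deltas = [f - b for b, f in zip(baseline, finetuned)]
    wins = sum(d > 0 for d in deltas)
    losses = sum(d < 0 for d in deltas)
    gain = mean(deltas)
    return {
        "mean_gain": gain,
        "wins": wins,
        "losses": losses,
        # Gate: average improvement above threshold AND more wins than regressions
        "ship": gain >= min_gain and wins > losses,
    }


# Placeholder scores for four eval examples
baseline_scores = [0.60, 0.70, 0.50, 0.80]
finetuned_scores = [0.70, 0.75, 0.50, 0.90]
report = compare_runs(baseline_scores, finetuned_scores)
```

Counting wins and losses alongside the mean catches the case where a fine-tune improves the average by doing much better on a few examples while regressing on many others. With a small eval set, treat a marginal pass as inconclusive rather than as a green light.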

Step 7: Merging and exporting

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# base_model_id: Hub ID or local path of the base model
# adapter_path: directory containing the trained LoRA adapter

# Load base model, tokenizer, and adapter
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)

# Merge adapter weights into base model weights
merged_model = peft_model.merge_and_unload()

# Save as standard model (no PEFT dependency at inference)
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")

# Result: a standard model directory that loads without the PEFT library
# Inference speed identical to the base model

Treat merging as one-way: merge_and_unload() folds the adapter into the base weights and drops the adapter layers, so the saved merged model can't be 'unmerged'. Always keep the base model and the separate adapter checkpoint. To update the model, retrain the adapter and merge again.
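What the merge does to each adapted layer is small enough to show on toy matrices. For standard LoRA, the merged weight is W' = W + (alpha / r) × B × A, where A is r × d_in and B is d_out × r. The sketch below uses plain Python lists and made-up values to show that the merged layer produces the same output as the base layer plus the adapter path. It illustrates the arithmetic only and is not PEFT's actual implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge(W, A, B, alpha, r):
    """W' = W + (alpha / r) * B @ A, the per-layer LoRA merge."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy layer: d_in = 3, d_out = 2, rank r = 1 (values are made up)
W = [[1, 0, 2], [0, 1, -1]]   # frozen base weight, d_out x d_in
A = [[1, 2, 3]]               # LoRA A, r x d_in
B = [[0.5], [-1.0]]           # LoRA B, d_out x r
alpha, r = 2, 1
x = [[1], [1], [1]]           # one input column vector

merged = merge(W, A, B, alpha, r)

# Adapter-style forward pass: base output plus scaled low-rank update
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_out = [[b[0] + (alpha / r) * l[0]] for b, l in zip(base_out, lora_out)]

merged_out = matmul(merged, x)  # single matmul, same result
```

After merging there is one matrix multiply per layer instead of three, which is why inference speed matches the base model. The merged checkpoint stores only W', with no record of which part came from the adapter, so keeping the adapter checkpoint is what lets you rebuild or swap it later.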
