LoRA (Low-Rank Adaptation of Large Language Models) is a technique for fine-tuning large language models (LLMs) efficiently: it freezes the original model weights and trains small low-rank matrices added alongside them. This greatly reduces the compute and memory needed to adapt large models such as GPT or BERT.
How LoRA Works
Concept of Low-Rank Decomposition:
LoRA assumes that the weight changes required for fine-tuning have low rank, so they can be represented with far fewer parameters than the full weight matrix.
It decomposes each weight update matrix into the product of two much smaller matrices. These smaller matrices are trainable, while the original model weights remain frozen.
Integration:
LoRA adds these low-rank matrices to specific layers of the model (e.g., transformer layers), particularly the projection layers of the attention mechanism.
During training and inference, the product of the low-rank matrices is added to the output of the original frozen weights. After training, it can also be merged into those weights, so inference needs no extra computation.
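To make the idea concrete, here is a minimal pure-Python sketch of the update described above. It is not the PEFT implementation. It assumes the usual LoRA formulation h = Wx + (alpha / r) * B(Ax), with B initialized to zeros. The helper names (matvec, lora_forward, merge_weights) and the tiny matrix sizes are made up for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x). W is frozen; only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def merge_weights(W, A, B, alpha, r):
    """Fold the low-rank update into W so inference needs a single matrix."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# Illustrative sizes (assumptions): a 4 x 6 weight with a rank-2 update
rng = random.Random(0)
d_out, d_in, r, alpha = 4, 6, 2, 4
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]  # r x d_in, random init
B_zero = [[0.0] * r for _ in range(d_out)]                         # d_out x r, zero init
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# With B at zero, the adapted layer reproduces the frozen layer exactly
h_base = matvec(W, x)
h_init = lora_forward(W, A, B_zero, x, alpha, r)

# Pretend training moved B away from zero, then compare unmerged and merged paths
B_trained = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]
h_trained = lora_forward(W, A, B_trained, x, alpha, r)
h_merged = matvec(merge_weights(W, A, B_trained, alpha, r), x)

print("matches frozen layer at init:", h_init == h_base)
print("max merged/unmerged difference:", max(abs(a - b) for a, b in zip(h_trained, h_merged)))
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one. The merged path gives the same output as the unmerged one, which is why a trained adapter does not have to add cost at inference time.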
Benefits of LoRA
Reduced Compute Requirements:
Since only the low-rank matrices are trainable, gradients and optimizer state are kept only for that small set of parameters, which significantly lowers the memory footprint and computational cost of training.
This is especially helpful for fine-tuning large models on resource-constrained devices (e.g., GPUs with limited VRAM).
Efficiency:
Fine-tuning with LoRA is often faster because it updates fewer parameters compared to full-model fine-tuning.
Scalability:
Multiple tasks can be fine-tuned on the same base model by storing different sets of LoRA parameters without duplicating the entire model.
Compatibility:
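LoRA is modular. You can swap different pretrained LoRA adapters in and out of the same base model, one per task. An adapter only fits the base model it was trained on, so it cannot simply be moved to a different base model.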
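A quick back-of-the-envelope calculation shows where the savings come from. The sketch below assumes GPT-2 small's shapes (hidden size 768, a fused attention projection from 768 to 2304, 12 layers, roughly 124M parameters in total) and a LoRA rank of 8 applied to that one projection. These numbers and the function names are illustrative assumptions, not measurements.

```python
def full_param_count(d_out, d_in):
    """Parameters in one dense d_out x d_in weight matrix."""
    return d_out * d_in

def lora_param_count(d_out, d_in, r):
    """Parameters LoRA adds for that matrix: B is d_out x r and A is r x d_in."""
    return r * (d_out + d_in)

# Assumed GPT-2 small shapes (not taken from the script's output)
hidden, qkv, layers, rank = 768, 2304, 12, 8

full_per_layer = full_param_count(qkv, hidden)
lora_per_layer = lora_param_count(qkv, hidden, rank)
lora_total = layers * lora_per_layer

print(f"attention projection weights per layer: {full_per_layer:,}")  # 1,769,472
print(f"LoRA parameters per layer:              {lora_per_layer:,}")  # 24,576
print(f"LoRA parameters across all layers:      {lora_total:,}")      # 294,912

# Storage for several tasks: one shared base plus one small adapter per task,
# compared with keeping a fully fine-tuned copy of the model for each task
base_params = 124_439_808  # assumed parameter count of GPT-2 small
tasks = 5
with_lora = base_params + tasks * lora_total
full_copies = tasks * base_params
print(f"parameters stored with LoRA adapters:   {with_lora:,}")
print(f"parameters stored with full copies:     {full_copies:,}")
```

Under these assumptions, each adapter is well under 1% of the size of the base model, which is what makes keeping one adapter per task practical.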
Let’s jump to the coding.
To install the necessary packages for the script, you can use the following pip install command:
pip install transformers peft datasets torch
This command will install the transformers, peft, datasets, and torch libraries, which are required for the script to run.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
import torch

# Step 1: Load the base model and tokenizer
model_name = "gpt2"  # Replace with the desired model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    # GPT-2 ships without a padding token, so reuse the end-of-sequence token
    tokenizer.pad_token = tokenizer.eos_token

# Step 2: Configure and apply LoRA
lora_config = LoraConfig(
    r=8,                        # Low-rank dimension
    lora_alpha=32,              # Scaling factor
    target_modules=["c_attn"],  # Target modules for LoRA (check model's architecture)
    lora_dropout=0.1,           # Dropout for LoRA layers
    bias="none",                # Whether to tune biases
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Confirm that only the LoRA matrices are trainable

# Step 3: Load and preprocess the dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
# WikiText contains many blank lines; drop them so no example is pure padding
dataset = dataset.filter(lambda example: example["text"].strip() != "")

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Step 4: Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # Named `evaluation_strategy` in older transformers releases
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    save_strategy="epoch",
    save_total_limit=2,
)

# Step 5: Initialize the Trainer with a custom compute_loss function
class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments (such as num_items_in_batch) that newer Trainer versions pass
        outputs = model(**inputs)
        logits = outputs.logits
        # Labels are the input tokens; padded positions are set to -100 so the loss ignores them
        labels = inputs["input_ids"].clone()
        labels[inputs["attention_mask"] == 0] = -100
        # Shift so that tokens < n predict n
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten the tokens
        loss_fct = torch.nn.CrossEntropyLoss(ignore_index=-100)
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
        return (loss, outputs) if return_outputs else loss

    def training_step(self, model, inputs, *args, **kwargs):
        # The base training_step runs the forward pass and backpropagation, and the
        # Trainer loop then steps the optimizer and scheduler and zeroes the gradients
        # once per batch. Calling them again here would apply every update twice,
        # so this override simply delegates to the base class.
        return super().training_step(model, inputs, *args, **kwargs)

trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
)

# Step 6: Fine-tune the model with LoRA
trainer.train()

# Step 7: Save the fine-tuned model (for a PEFT model this writes only the adapter weights)
model.save_pretrained("./lora_finetuned_model")
tokenizer.save_pretrained("./lora_finetuned_model")
print("Fine-tuning complete! The model is saved at './lora_finetuned_model'.")
Here’s a quick explanation of the code:
Imports:
The script imports necessary libraries and modules, including transformers, peft, datasets, and torch.
Load Model and Tokenizer:
The base model (gpt2) and tokenizer are loaded using AutoModelForCausalLM and AutoTokenizer from the transformers library.
If the tokenizer does not have a padding token, it sets the padding token to the end-of-sequence token.
Configure and Apply LoRA:
A LoraConfig object is created with specific parameters for low-rank adaptation (LoRA).
The base model is modified using get_peft_model to apply the LoRA configuration.
Load and Preprocess Dataset:
The wikitext-2-raw-v1 dataset is loaded using the datasets library.
A tokenize_function is defined to tokenize the dataset examples.
The dataset is tokenized using the map method.
Define Training Arguments:
Training arguments are defined using TrainingArguments from the transformers library, specifying parameters like output directory, evaluation strategy, learning rate, batch size, number of epochs, weight decay, logging directory, and save strategy.
Custom Trainer Class:
A CustomTrainer class is defined, inheriting from Trainer.
The compute_loss method is overridden to compute a next-token cross-entropy loss: the logits at each position are compared with the token that follows.
The training_step method is also overridden; it is the method the Trainer calls once per batch to run the forward pass and backpropagation.
Initialize Trainer:
An instance of CustomTrainer is created with the model, training arguments, and tokenized datasets.
Fine-tune the Model:
The train method of the CustomTrainer instance is called to fine-tune the model.
Save the Fine-tuned Model:
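The trained LoRA adapter and the tokenizer are saved to the specified directory. Because the model is wrapped by PEFT, save_pretrained writes only the small adapter weights, not a full copy of GPT-2.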
Print Completion Message:
A message is printed to indicate that fine-tuning is complete and the model is saved.
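The shifted loss used in compute_loss can be illustrated on a toy example. This is a pure-Python sketch under stated assumptions, not the code PyTorch runs: the function names, the 3-token vocabulary, and the sample logits are invented for illustration. It also skips targets equal to -100, which is the value PyTorch's CrossEntropyLoss ignores by default and a common way to exclude padding.

```python
import math

IGNORE_INDEX = -100  # targets with this value do not contribute to the loss

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def shifted_cross_entropy(logits, labels):
    """Mean next-token loss: logits at position t are scored against the label at t + 1."""
    shift_logits = logits[:-1]  # the last position has nothing left to predict
    shift_labels = labels[1:]   # the first token is never a prediction target
    losses = [
        -log_softmax(row)[target]
        for row, target in zip(shift_logits, shift_labels)
        if target != IGNORE_INDEX
    ]
    return sum(losses) / len(losses)

# Toy sequence over a 3-token vocabulary; the final position is padding
labels = [0, 2, 1, IGNORE_INDEX]

# A model with no preference between tokens scores ln(3) per predicted token
uniform_logits = [[0.0, 0.0, 0.0] for _ in labels]
uniform_loss = shifted_cross_entropy(uniform_logits, labels)
expected_uniform = math.log(3)

# A model that favors the correct next token gets a lower loss
confident_logits = [
    [0.0, 0.0, 5.0],  # position 0 predicts token 2
    [0.0, 5.0, 0.0],  # position 1 predicts token 1
    [0.0, 0.0, 0.0],  # position 2 would predict padding, so it is ignored
    [0.0, 0.0, 0.0],  # dropped by the shift
]
confident_loss = shifted_cross_entropy(confident_logits, labels)

print("uniform loss:", uniform_loss)
print("confident loss:", confident_loss)
```

The shift is what turns a single sequence into next-token prediction targets, and ignoring padded positions keeps the padding from dominating the average.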
This script fine-tunes a GPT-2 model using LoRA on the Wikitext-2 dataset and saves the fine-tuned model and tokenizer.
Happy coding!