How To Build Your Own ChatGPT / Llama Model With Hugging Face
The Plan
Pretraining
Supervised Fine-tuning
Training a Reward model
Proximal Policy Optimization
The Plan
There is a lot of value in understanding the basics of pre-training a model and aligning it to follow instructions. I want to demonstrate here how little code it takes to obtain a model like ChatGPT (GPT-3.5 Turbo, actually) or Llama 2. I am simply going to follow the blueprint provided in the InstructGPT paper:
Here are the steps to train a ChatGPT-like model:
We need to pre-train a model for Language Modeling. We feed the model a large amount of text data so it learns the statistical patterns of that data. The model learns the typical distribution of words and tokens in human-written text and becomes good at generating human-like text.
We need to fine-tune the model in a supervised manner by showing it examples of good question–answer pairs. The model becomes good at generating answers to questions.
We need to train a Reward model. The Reward model is trained to rank answers to a question from best to worst and to predict a score that assesses the quality of an answer (a rough sketch of this scoring idea follows the list).
We further fine-tune the previous model with Reinforcement Learning, using the Reward model as the training signal. This helps the model learn the difference between a good answer and an even better one.
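To make the Reward model step concrete, here is a minimal sketch of the pairwise ranking idea from the InstructGPT paper. This is my own illustration, not code from this post: I am assuming a GPT-2 backbone with a scalar classification head (GPT2ForSequenceClassification with num_labels=1) standing in for the Reward model, and a single hand-written chosen/rejected pair.

import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# num_labels=1 turns the classification head into a scalar "reward" head
reward_model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

# One preference pair: the same question with a better and a worse answer
chosen = tokenizer("Q: What is the capital of France? A: Paris.", return_tensors="pt")
rejected = tokenizer("Q: What is the capital of France? A: Bananas.", return_tensors="pt")

score_chosen = reward_model(**chosen).logits
score_rejected = reward_model(**rejected).logits

# Pairwise ranking loss: push the score of the chosen answer above the rejected one
loss = -torch.nn.functional.logsigmoid(score_chosen - score_rejected).mean()
loss.backward()

Trained over many such human-ranked pairs, the scalar output becomes the score that is later used as the reward signal during the Reinforcement Learning step.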
Pretraining
Constructing the model backbone
To understand how to pre-train a model for Language Modeling, one can learn from the way GPT-3 was trained (“Language Models are Few-Shot Learners”). Let’s install a couple of libraries first:
pip install transformers datasets trl torchview
Let’s first create a base model by using the Transformers package from Hugging Face. I am going to start with GPT-2 as the backbone model:
from transformers import GPT2Config, GPT2Model

# Default GPT-2 configuration: 12 layers, 12 attention heads, 768-dimensional embeddings
config = GPT2Config()
# The weights are randomly initialized: this backbone has not been trained yet
gpt2_model = GPT2Model(config)
gpt2_model
This is a PyTorch model. For any transformer model, we need a tokenizer to map text to tokens:
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
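As a quick illustration (my own addition, not from the post), the tokenizer converts a string into integer token IDs and can map them back to the original text:

ids = tokenizer("Hello world!")["input_ids"]
print(ids)                    # a short list of integer token IDs
print(tokenizer.decode(ids))  # "Hello world!"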
We could visualize this model by using the torchview package:
from torchview import draw_graph
inputs = tokenizer("Hello world!", return_tensors="pt")
model_graph = draw_graph(
    gpt2_model,
    input_data=inputs,
)
model_graph.visual_graph
This gives me the default GPT-2 model. I could decide to make it ridiculously small if I wanted to use it on my phone, for example:
config = GPT2Config(
    n_head=2,   # number of attention heads (default: 12)
    n_layer=3,  # number of transformer blocks (default: 12)
    n_embd=6    # embedding size (default: 768)
)
gpt2_model = GPT2Model(config)
In the above example, I changed the configuration of the model by reducing the number of attention heads from 12 to 2, the number of transformer blocks from 12 to 3, and the embedding size from 768 to 6. That won’t be a very performant model! You can look at the default configuration:
config = GPT2Config()
config
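To see just how much smaller the model gets, here is a quick check (my own addition, not from the post) comparing the parameter counts of the tiny configuration above with the default one:

tiny_model = GPT2Model(GPT2Config(n_head=2, n_layer=3, n_embd=6))
default_model = GPT2Model(GPT2Config())
print(sum(p.numel() for p in tiny_model.parameters()))     # a few hundred thousand parameters, mostly in the token embeddings
print(sum(p.numel() for p in default_model.parameters()))  # roughly 124 million parameters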
We could get a Llama model in a similar manner:
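A minimal sketch of what that could look like, assuming the analogous LlamaConfig and LlamaModel classes from the Transformers package (the small sizes below are my own illustrative choices, not values from the post):

from transformers import LlamaConfig, LlamaModel

llama_config = LlamaConfig(
    num_attention_heads=2,   # default: 32
    num_hidden_layers=3,     # default: 32
    hidden_size=16,          # default: 4096
    intermediate_size=64,    # default: 11008
)
llama_model = LlamaModel(llama_config)
llama_model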