The AiEdge Newsletter

How To Build Your Own ChatGPT / Llama Model With Hugging Face

Damien Benveniste
Oct 16, 2023

  • The Plan

  • Pretraining

  • Supervised Fine-tuning

  • Training a Reward model

  • Proximal Policy Optimization


The Plan

There is a lot of value in understanding the basics of pre-training a model and aligning it to follow instructions. I want to demonstrate here how little code it takes to obtain a model like ChatGPT (GPT-3.5 Turbo, actually) or Llama 2. I am going to follow the blueprint provided in the InstructGPT paper:

(Figure: the InstructGPT training pipeline, from https://arxiv.org/pdf/2203.02155.pdf)

Here are the steps to train a ChatGPT-like model:

  1. We need to pre-train a model for Language Modeling. We feed the model a large amount of text data so that it learns the statistical patterns of human-generated text: the distribution of words and tokens. The model becomes good at generating human-like text.

  2. We need to fine-tune the model in a supervised manner by showing it good pairs of questions and answers. The model becomes good at generating answers to questions (minimal sketches of steps 2 through 4 follow this list).

  3. We need to train a Reward model. The Reward model is trained to rank answers to a question from best to worst, and it predicts a score that assesses the quality of an answer.

  4. We further fine-tune the previous model with Reinforcement Learning, using the Reward model as the source of rewards. This helps the model learn the difference between a good answer and an even better one.
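
To make step 2 concrete, here is a minimal sketch of supervised fine-tuning with the trl library we will install below. I am assuming the SFTTrainer API from trl as of late 2023 (a model name, a dataset with a "text" column, and the dataset_text_field argument); the two-example dataset is purely illustrative:

from datasets import Dataset
from trl import SFTTrainer

# Toy instruction data; a real run would use thousands of curated pairs
train_dataset = Dataset.from_dict({
    "text": [
        "Question: What is the capital of France?\nAnswer: Paris.",
        "Question: What is 2 + 2?\nAnswer: 4.",
    ]
})

trainer = SFTTrainer(
    model="gpt2",                  # model name or a preloaded model
    train_dataset=train_dataset,
    dataset_text_field="text",     # column containing the training text
    max_seq_length=128,
)
trainer.train()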
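
For step 3, a sketch of Reward model training, assuming trl's RewardTrainer and the column names it expects (input_ids_chosen, input_ids_rejected, and their attention masks); the preference pair is made up for illustration:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TrainingArguments)
from trl import RewardTrainer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# A single scalar output acts as the quality score
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# Each row pairs a preferred ("chosen") answer with a worse ("rejected") one
pairs = Dataset.from_dict({
    "chosen": ["Question: What is 2 + 2?\nAnswer: 4."],
    "rejected": ["Question: What is 2 + 2?\nAnswer: 5."],
})

def tokenize(row):
    chosen = tokenizer(row["chosen"], truncation=True)
    rejected = tokenizer(row["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

trainer = RewardTrainer(
    model=model,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="reward_model", remove_unused_columns=False),
    train_dataset=pairs.map(tokenize, remove_columns=["chosen", "rejected"]),
)
trainer.train()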
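
Finally, for step 4, a sketch of the PPO loop, again assuming the trl API of that era; in a real run the reward would come from the Reward model's score for each (query, response) pair rather than the constant used here:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# A causal LM with an extra value head, since PPO needs value estimates
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=1, mini_batch_size=1),
    model=model,
    tokenizer=tokenizer,
)

query = tokenizer("Question: What is 2 + 2?\nAnswer:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, return_prompt=False, max_new_tokens=8)[0]

# Placeholder reward; normally the Reward model scores the full exchange
reward = torch.tensor(1.0)
stats = ppo_trainer.step([query], [response], [reward])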

Pretraining

Constructing the model backbone

To understand how to pre-train a model for Language Modeling, one can learn from the way GPT-3 was trained (“Language Models are Few-Shot Learners”). Let’s install a couple of libraries first:

pip install transformers datasets trl torchview

Let’s first create a base model by using the Transformers package from Hugging Face. I am going to start with GPT-2 as the backbone model:

from transformers import GPT2Config, GPT2Model

config = GPT2Config()
gpt2_model = GPT2Model(config)
gpt2_model

This is a PyTorch model. For any transformer model, we need a tokenizer to map words to tokens:

from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
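
As a quick check, we can see that mapping in action; the exact IDs come from GPT-2’s learned vocabulary:

tokens = tokenizer("Hello world!")
print(tokens["input_ids"])   # one integer ID per token, e.g. [15496, 995, 0]
# The "Ġ" prefix marks a token that begins with a space
print(tokenizer.convert_ids_to_tokens(tokens["input_ids"]))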

We could visualize this model by using the torchview package:

from torchview import draw_graph

inputs = tokenizer("Hello world!", return_tensors="pt")
model_graph = draw_graph(
    gpt2_model, 
    input_data=inputs, 
)

model_graph.visual_graph

This gives me the default GPT-2 model. I could decide to make it ridiculously small if I wanted to use it on my phone, for example:

config = GPT2Config(
    n_head=2,
    n_layer=3,
    n_embd=6
)
gpt2_model = GPT2Model(config)

In the above example, I changed the configuration of the model by reducing the number of attention heads from 12 to 2, the number of transformer blocks from 12 to 3, and the embedding size from 768 to 6. That won’t be a very performant model! You can look at the default configuration:
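
To quantify the difference, we can count parameters: the default configuration has roughly 124 million, while the shrunken one has only a few hundred thousand, most of them in the token embedding table:

from transformers import GPT2Config, GPT2Model

def count_params(model):
    # Total number of weights in the model
    return sum(p.numel() for p in model.parameters())

print(count_params(GPT2Model(GPT2Config())))   # default: ~124M parameters
print(count_params(GPT2Model(GPT2Config(n_head=2, n_layer=3, n_embd=6))))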

config = GPT2Config()
config

We could get a Llama model in a similar manner:
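
As a sketch of what that looks like, assuming the analogous LlamaConfig and LlamaModel classes from Transformers (note that the default LlamaConfig mirrors the 7B architecture, so I scale it down here to keep the randomly initialized model manageable):

from transformers import LlamaConfig, LlamaModel

# The default LlamaConfig matches Llama 7B; shrink it for experimentation
config = LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=8,
)
llama_model = LlamaModel(config)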
