The AiEdge Newsletter

How To Build Your Own ChatGPT / Llama Model With Hugging Face

Damien Benveniste
Oct 16, 2023

  • The Plan

  • Pretraining

  • Supervised Fine-tuning

  • Training a Reward model

  • Proximal Policy Optimization


The Plan

There is a lot of value in understanding the basics of pre-training a model and aligning it to follow instructions. I want to demonstrate here how little code it takes to obtain a model like ChatGPT (GPT-3.5 Turbo, actually) or Llama 2. I am going to follow the blueprint provided in the InstructGPT paper:

(Figure: the InstructGPT training pipeline, from https://arxiv.org/pdf/2203.02155.pdf)

Here are the steps to train a ChatGPT-like model:

  1. We need to pre-train a model for Language Modeling. We feed the model a large amount of text data so that it learns the statistical patterns of human-generated text: the distribution of words and tokens. The model becomes good at generating human-like text.

  2. We need to fine-tune the model in a supervised manner by showing it good pairs of questions and answers. The model becomes good at generating answers to questions (minimal sketches of steps 2 through 4 follow this list).

  3. We need to train a Reward model. The Reward model is trained to rank answers to a question from best to worst, and it predicts a score that assesses the quality of an answer.

  4. We further fine-tune the previous model with Reinforcement Learning, using the Reward model as the source of rewards. This helps the model learn the difference between a good answer and an even better one.
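
To make step 2 concrete, here is a minimal sketch of supervised fine-tuning with the trl library we will install below. I am assuming the SFTTrainer API from trl as of late 2023 (a model name, a dataset with a "text" column, and the dataset_text_field argument); the two-example dataset is purely illustrative:

from datasets import Dataset
from trl import SFTTrainer

# Toy instruction data; a real run would use thousands of curated pairs
train_dataset = Dataset.from_dict({
    "text": [
        "Question: What is the capital of France?\nAnswer: Paris.",
        "Question: What is 2 + 2?\nAnswer: 4.",
    ]
})

trainer = SFTTrainer(
    model="gpt2",                  # model name or a preloaded model
    train_dataset=train_dataset,
    dataset_text_field="text",     # column containing the training text
    max_seq_length=128,
)
trainer.train()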
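
For step 3, a sketch of Reward model training, assuming trl's RewardTrainer and the column names it expects (input_ids_chosen, input_ids_rejected, and their attention masks); the preference pair is made up for illustration:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          TrainingArguments)
from trl import RewardTrainer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# A single scalar output acts as the quality score
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# Each row pairs a preferred ("chosen") answer with a worse ("rejected") one
pairs = Dataset.from_dict({
    "chosen": ["Question: What is 2 + 2?\nAnswer: 4."],
    "rejected": ["Question: What is 2 + 2?\nAnswer: 5."],
})

def tokenize(row):
    chosen = tokenizer(row["chosen"], truncation=True)
    rejected = tokenizer(row["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

trainer = RewardTrainer(
    model=model,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="reward_model", remove_unused_columns=False),
    train_dataset=pairs.map(tokenize, remove_columns=["chosen", "rejected"]),
)
trainer.train()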
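
Finally, for step 4, a sketch of the PPO loop, again assuming the trl API of that era; in a real run the reward would come from the Reward model's score for each (query, response) pair rather than the constant used here:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# A causal LM with an extra value head, since PPO needs value estimates
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=1, mini_batch_size=1),
    model=model,
    tokenizer=tokenizer,
)

query = tokenizer("Question: What is 2 + 2?\nAnswer:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, return_prompt=False, max_new_tokens=8)[0]

# Placeholder reward; normally the Reward model scores the full exchange
reward = torch.tensor(1.0)
stats = ppo_trainer.step([query], [response], [reward])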

Pretraining

Constructing the model backbone

To understand how to pre-train a model for Language Modeling, one can learn from the way GPT-3 was trained (“Language Models are Few-Shot Learners”). Let’s install a couple of libraries first:

pip install transformers datasets trl torchview

Let’s first create a base model by using the Transformers package from Hugging Face. I am going to start with GPT-2 as the backbone model:

from transformers import GPT2Config, GPT2Model

config = GPT2Config()
gpt2_model = GPT2Model(config)
gpt2_model

This is a PyTorch model. For any transformer model, we need a tokenizer to map words to tokens:

from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
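
As a quick check, we can see that mapping in action; the exact IDs come from GPT-2’s learned vocabulary:

tokens = tokenizer("Hello world!")
print(tokens["input_ids"])   # one integer ID per token, e.g. [15496, 995, 0]
# The "Ġ" prefix marks a token that begins with a space
print(tokenizer.convert_ids_to_tokens(tokens["input_ids"]))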

We could visualize this model by using the torchview package:

from torchview import draw_graph

inputs = tokenizer("Hello world!", return_tensors="pt")
model_graph = draw_graph(
    gpt2_model, 
    input_data=inputs, 
)

model_graph.visual_graph

This gives me the default GPT-2 model. I could decide to make it ridiculously small if I wanted to use it on my phone, for example:

config = GPT2Config(
    n_head=2,
    n_layer=3,
    n_embd=6
)
gpt2_model = GPT2Model(config)

In the above example, I changed the configuration of the model by reducing the number of attention heads from 12 to 2, the number of transformer blocks from 12 to 3, and the embedding size from 768 to 6. That won’t be a very performant model! You can look at the default configuration:
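
To quantify the difference, we can count parameters: the default configuration has roughly 124 million, while the shrunken one has only a few hundred thousand, most of them in the token embedding table:

from transformers import GPT2Config, GPT2Model

def count_params(model):
    # Total number of weights in the model
    return sum(p.numel() for p in model.parameters())

print(count_params(GPT2Model(GPT2Config())))   # default: ~124M parameters
print(count_params(GPT2Model(GPT2Config(n_head=2, n_layer=3, n_embd=6))))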

config = GPT2Config()
config

We could get a Llama model in a similar manner:
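
As a sketch of what that looks like, assuming the analogous LlamaConfig and LlamaModel classes from Transformers (note that the default LlamaConfig mirrors the 7B architecture, so I scale it down here to keep the randomly initialized model manageable):

from transformers import LlamaConfig, LlamaModel

# The default LlamaConfig matches Llama 7B; shrink it for experimentation
config = LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=8,
)
llama_model = LlamaModel(config)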
