The AiEdge Newsletter

GPT-4: The largest Large Language Model yet!

Let's break the secret!

Damien Benveniste
Mar 15
GPT-4 is here, and it is probably the biggest Large Language Model yet! A lot of the information is still secret, but there are educated guesses we can make. We are going to look at:

  • The Architecture

  • The Training process

  • The number of parameters

  • When the creators start to fear their creation


GPT-4 was just released yesterday on Pi Day (March 14th, 2023) and it looks delicious! Well, actually, so far you can only join the waiting list to get access to the API: GPT-4. But at least now we have more information to separate fantasy from reality. Here is the GPT-4 technical paper: “GPT-4 Technical Report“. The main features are:

  • It is multi-modal with text and image data as input

  • It is fine-tuned to mitigate harmful content

  • It is bigger!

Let’s dig into it!


DataInterview (sponsored)

Aspiring to become an ML Engineer? Join the 𝗠𝗟𝗘 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗕𝗼𝗼𝘁𝗰𝗮𝗺𝗽 led by ML Tech Leads from Google and Meta:

  •  4 Weeks of intensive prep from March 20 through April 13

  • Get in-depth coverage of coding, system design, ML system design

  • Prep with 100+ questions and detailed solutions asked in actual interviews

  • Break the senior and staff bar

  • Behavioral and leadership interview prep

  • Weekly office hours

  • Private Slack groups

Coupon Code (Takes $400 off): aiedgemle

Enroll today


The Architecture

OpenAI finished training GPT-4 back in August 2022, and they spent the past 7 months studying it and making sure it is "safe" for launch! GPT-4 takes text and image prompts as input and generates text.

From the paper:

“Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”

OpenAI is trying to keep the architecture a secret, but there are many guesses we can make. First, they made it clear that it is a Transformer model, and following the GPT-1, GPT-2, and GPT-3 tradition, it is very likely a decoder-only architecture pre-trained with the next-word prediction task. To include image inputs, the image needs to be encoded into the latent space using a ConvNet or a Vision Transformer (ViT). From the ViT paper (“An image is worth 16x16 words: Transformers for image recognition at scale”), we know that ViTs outperform ConvNets given enough data, and using the attention mechanism would help build cross "text-image" attention.
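To make this concrete, here is a minimal sketch of one plausible way to feed image features into a decoder-only Transformer: encode the image with a (frozen) ViT, project the resulting patch embeddings into the text embedding space, and prepend them to the text tokens so that the causal attention can attend across both modalities. Everything here (class names, dimensions, the prefix-token approach) is an assumption for illustration; OpenAI has not disclosed how GPT-4 actually wires images into the model.

```python
# Hypothetical sketch of a multi-modal decoder-only Transformer.
# The prefix-token design and all names/dimensions are assumptions for
# illustration, not OpenAI's actual GPT-4 architecture.
import torch
import torch.nn as nn

class TinyMultimodalDecoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4,
                 vit_dim=768, n_image_tokens=16):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project ViT patch embeddings into the text embedding space.
        self.image_proj = nn.Linear(vit_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # An encoder stack with a causal mask behaves like a decoder-only model.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.n_image_tokens = n_image_tokens

    def forward(self, image_feats, text_ids):
        # image_feats: (B, n_image_tokens, vit_dim) patch features from a frozen ViT
        # text_ids:    (B, T) token ids of the text prompt
        img = self.image_proj(image_feats)            # (B, n_img, d_model)
        txt = self.token_emb(text_ids)                # (B, T, d_model)
        x = torch.cat([img, txt], dim=1)              # image tokens as a prefix
        L = x.size(1)
        # Causal mask: True marks positions that may NOT be attended to.
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), 1)
        h = self.blocks(x, mask=mask)
        return self.lm_head(h[:, self.n_image_tokens:])  # next-token logits for text

# Example usage with random data:
model = TinyMultimodalDecoder()
image_feats = torch.randn(2, 16, 768)            # pretend ViT output
text_ids = torch.randint(0, 32000, (2, 10))      # pretend token ids
logits = model(image_feats, text_ids)            # (2, 10, 32000)
```

In practice, cross-attention layers could be used instead of prefix tokens; both designs exist in the published multi-modal literature, and which one GPT-4 uses is unknown.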

The Training

We know they used the same training method as InstructGPT and ChatGPT, but GPT-4 is further fine-tuned with a set of rule-based reward models (RBRMs); a minimal sketch of how these pieces fit together follows the list:

  • They first pre-trained the model to predict the next word on tons of internet data and data licensed from third-party providers.

  • Then they sampled typical human prompts and asked labelers to write down the correct outputs. They fine-tuned the model in a supervised learning manner. 

  • Then, they sampled human prompts and generated multiple outputs from the model. A labeler is then asked to rank those outputs. The resulting data is used to train a Reward model.

  • They then sampled more human prompts and used them to fine-tune the supervised fine-tuned model with the Proximal Policy Optimization (PPO) algorithm (“Proximal Policy Optimization Algorithms“), a Reinforcement Learning algorithm. The prompt is fed to the PPO model, the Reward model generates a reward value, and the PPO model is iteratively fine-tuned using the rewards and the prompts.

InstructGPT and ChatGPT training method

  • The RBRMs are there to mitigate harmful behaviors. There is a set of zero-shot GPT-4 classifiers that provide an additional reward signal for the PPO model. The model is fine-tuned such that it is rewarded for refusing to generate harmful content. For example, GPT-4 will refuse to explain how to build a bomb if prompted to do so.

Table 6 in the GPT-4 Technical Report
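Here is a rough sketch of how those pieces could fit together in the reinforcement learning loop: the learned Reward model (trained on human rankings) and the rule-based reward models each score a sampled response, and PPO updates the policy using the combined reward. The function names and the simple additive combination are placeholders assumed for illustration, not OpenAI's actual implementation.

```python
# Hedged sketch of RLHF with an extra rule-based reward signal (RBRMs).
# reward_model, rbrm_classifiers, policy.generate and ppo_update are
# hypothetical placeholders, not real OpenAI APIs.

def combined_reward(prompt, response, reward_model, rbrm_classifiers,
                    rbrm_weight=1.0):
    """Scalar reward for one (prompt, response) pair."""
    # Preference-based reward learned from human rankings of model outputs.
    r_human = reward_model(prompt, response)
    # Rule-based rewards: zero-shot classifiers that check, for example,
    # whether a request for harmful content was correctly refused.
    r_rules = sum(clf(prompt, response) for clf in rbrm_classifiers)
    return r_human + rbrm_weight * r_rules

def rlhf_step(policy, prompts, reward_model, rbrm_classifiers, ppo_update):
    """One schematic RLHF iteration: sample, score, update with PPO."""
    responses = [policy.generate(p) for p in prompts]
    rewards = [combined_reward(p, r, reward_model, rbrm_classifiers)
               for p, r in zip(prompts, responses)]
    # PPO adjusts the policy to increase the likelihood of high-reward outputs
    # while staying close to the supervised fine-tuned model.
    ppo_update(policy, prompts, responses, rewards)
```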

The number of parameters

OpenAI CEO Sam Altman, on the rumor that GPT-4 could have 100 trillion parameters: “people are begging to be disappointed and they will be”.

OpenAI is trying to hide the number of parameters, but it is actually not too difficult to estimate it! In Figure 1 of the GPT-4 Technical Report, they plot the final loss as a function of the compute needed to train the model.

Figure 1 of the GPT-4 Technical report

It happens that a very similar plot exists in the GPT-3 paper (“Language Models are Few-Shot Learners“), in Figure 3.1, and we can see that the point with a validation loss slightly below 2 corresponds to GPT-3 13B (the GPT-3 model with ~13B parameters):

Figure 3.1 of the GPT-3 paper (https://arxiv.org/pdf/2005.14165.pdf)

To check that, let’s use the scaling law given in the GPT-3 paper:

\(L(C)=2.57\cdot C^{-0.048}\)

and the corresponding compute values (in PetaFLOP/s-days) given in Table D.1 of the GPT-3 paper:

Table D.1 of the GPT-3 paper (https://arxiv.org/pdf/2005.14165.pdf)

Plugging in those compute values, we have (a quick numerical check follows the list):

  • For GPT-3 6.7B: L(1.39E+02) ≃ 2.03

  • For GPT-3 13B: L(2.68E+02) ≃ 1.96

  • For GPT-3 175B: L(3.64E+03) ≃ 1.73
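Here is that check in a few lines of Python, using the scaling law above and the compute figures from Table D.1:

```python
# Check L(C) = 2.57 * C^(-0.048), with C in PetaFLOP/s-days
# (compute values taken from Table D.1 of the GPT-3 paper).
compute_pf_days = {
    "GPT-3 6.7B": 1.39e2,
    "GPT-3 13B": 2.68e2,
    "GPT-3 175B": 3.64e3,
}

def loss(c):
    return 2.57 * c ** -0.048

for name, c in compute_pf_days.items():
    print(f"{name}: L({c:.2e}) ≈ {loss(c):.3f}")
# Prints values around 2.03, 1.97 and 1.73, matching the numbers above
# to within rounding.
```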

So the point at ~10,000× less compute than GPT-4 could potentially be GPT-3 175B, but Figure 2 of the GPT-4 Technical Report showcases another, bigger model at ~1,000× less compute than GPT-4, and that one is more likely GPT-3 175B:

Figure 2 of the GPT-4 Technical report

Table D.1 of the GPT-3 paper shows a clear linear relation between compute and model size, which leads us to believe that GPT-4 was trained with ~10,000× more compute, and therefore has ~10,000× more parameters, than GPT-3 13B: on the order of 100 trillion parameters! This is a surprising estimate considering that OpenAI CEO Sam Altman said “people are begging to be disappointed and they will be” about the rumor that GPT-4 could have 100 trillion parameters.
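Spelling out the arithmetic behind that extrapolation (assuming, as read from the figures above, that GPT-4 used ~10,000× the compute of GPT-3 13B and that parameters scale linearly with compute):

\(N_{GPT\text{-}4} \approx 10{,}000 \times N_{GPT\text{-}3\,13B} \approx 10{,}000 \times 13 \times 10^{9} = 1.3 \times 10^{14}\)

That is roughly 130 trillion parameters, which is indeed on the order of the rumored 100 trillion.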


Here are the latest articles you may have missed:

  • The AiEdge+: Explainable AI - LIME and SHAP

  • Advanced Data Manipulation with Pandas

  • The AiEdge+: All the Transformers Applications

  • Exploratory Data Analysis with Pandas

To receive all the full articles and features of the AiEdge Newsletter, consider subscribing:


When the creators start to fear their creation

With GPT-4, we have reached the next stage of AI, not only because of the capabilities of the model, but also because of the tests that had to be run on it to ensure it was safe.

  • A preliminary model evaluation was run by the Alignment Research Center (ARC) to ensure the model would not be able to replicate autonomously. It is scary to think we have reached a point where this has to be tested!

  • They tested whether the model could provide the necessary information to proliferators seeking to develop, acquire, or disperse nuclear, radiological, biological and chemical weapons.

  • They are testing whether the model can identify individuals when augmented with outside data.

  • They contracted external cybersecurity experts to test GPT-4’s ability to aid in computer vulnerability discovery, assessment, and exploitation.

  • They tested GPT-4’s ability to interact with other systems to achieve tasks that could be adversarial in nature.

We are at a point where OpenAI warns us about the future economic impact of the tool, predicting that numerous jobs will be replaced by GPT-4. We are really catching up with science fiction!
