How Do We Create Tokens From Words in LLMs?

Introduction to LLMs

This is the 5th video in the Introduction to LLMs series. Check out the Table of Contents for more information.

  • Word-level vs Character-level vs Subword-level embeddings

  • The Byte Pair Encoding Strategy

  • Special Tokens

  • The Hugging Face Tokenizer

  • Visualizing Attention with the Padding Token
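
As a rough illustration of the byte pair encoding strategy listed above, here is a minimal sketch of how merge rules could be learned from word counts. It is not the code used in the video or in any particular library: the function names (`get_pair_counts`, `merge_pair`, `learn_bpe`) and the toy word counts are hypothetical, and production tokenizers add details such as byte-level fallback and end-of-word markers.

```python
from collections import Counter


def get_pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(corpus, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Start from single characters and repeatedly merge the most frequent pair."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(corpus)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)
        corpus = merge_pair(corpus, best)
        merges.append(best)
    return merges, corpus


# Hypothetical toy word counts, chosen only for illustration.
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, corpus = learn_bpe(word_freqs, num_merges=2)
print(merges)
print(corpus)
```

With this toy input, the shared suffix of "newest" and "widest" is merged into a single subword after two steps, which is the intuition behind subword vocabularies: frequent character sequences become tokens, while rare words stay split into smaller pieces.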

