The AiEdge+: All the Transformers Applications
There is more to Transformers than just Large Language Models!
Today we dig into the field that is spicing up the news these days: Transformers! Everybody is talking about ChatGPT, OpenAI, the new Bing … Well, the culprit behind such exciting turmoil in the AI world is the Transformer. We are going to look at:
The history of Transformers
The different applications of Transformers
More GitHub repos, Medium-like articles and YouTube videos at the end of the post
Size REALLY matters! Well, at least for Transformers! We have already reached models on the order of a trillion parameters, and we are just getting started. I am hoping future advancements will allow us to reach similar or better performance with significantly fewer parameters.
The history of Transformers
It all started with "Attention Is All You Need" back in 2017. But it really took off at the end of 2018, when BERT by Google responded to the success of ELMo. 2019 was a great year to be alive! Each month, a new Sesame Street character was popping up to outperform the previous models on NLP tasks! BERT was really about language understanding, mostly encoding sentences into a latent space without specific NLP tasks in mind.
There has been a lot of effort put into more specific applications. For example, XLM by Facebook marked the advent of transformers for cross-lingual language representation, encoding and decoding sentences in any language. Have you heard of ChatGPT 🤣? It is a generative text transformer, and GPT-1 by OpenAI is one of the earliest precursors in this category.
A lot of domain-specific models have also been built, as the base BERT models were not great for specialized text data. For example, BioBERT is trained on biomedical texts, SciBERT on scientific texts, and FinBERT on financial texts.
A lot of work has also gone into modifying the original architecture: the transformer block itself (Transformer-XL, Switch Transformer), the multi-head self-attention mechanism (Longformer, Big Bird), or the training efficiency (T5).
Transformers gained success in NLP tasks due to their superiority over LSTM networks, but they also find applications in computer vision (Vision Transformer), speech recognition (Wav2vec 2.0), video (TimeSformer), graphs, and reinforcement learning (Decision Transformer). They are also a building block of current generative diffusion models such as Stable Diffusion and DALL-E 2.
To learn more about Transformers, I recommend "Transformers for Machine Learning: A Deep Dive". I find that all the books written by those guys are actually really great! A lot of the historical survey done in this post comes from that book.
By the way, if you don't see your "favorite" Transformer in this timeline, don't be mad, this is just a small survey!
The different applications of Transformers
If you think about Transformers, chances are you are thinking about NLP applications, but how can we use Transformers for data types other than text? Actually, you can use Transformers on any data that you are able to express as a sequence of vectors, which is what Transformers feed on! Typically, any sequence or time series of data points should fit the bill, as sketched in the example below.
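As a quick illustration, here is a minimal sketch in PyTorch (the class name, dimensions, and data are made up for the example, not taken from any specific model) of projecting an arbitrary sequence of vectors, say a multivariate time series, into the model dimension and feeding it to a standard Transformer encoder:

```python
# Minimal sketch: any (batch, sequence_length, feature_dim) tensor can be
# projected into the model dimension and fed to a Transformer encoder.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class SequenceTransformer(nn.Module):
    def __init__(self, input_dim=16, d_model=64, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        # Project each raw data point (e.g. 16 sensor readings) into the model dimension
        self.input_proj = nn.Linear(input_dim, d_model)
        # Learned positional embeddings so the model knows the order of the sequence
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, x):
        # x: (batch, seq_len, input_dim) -- any sequence of vectors
        batch, seq_len, _ = x.shape
        positions = torch.arange(seq_len, device=x.device).unsqueeze(0)
        h = self.input_proj(x) + self.pos_emb(positions)
        return self.encoder(h)  # (batch, seq_len, d_model) contextualized vectors

# Example: a batch of 8 time series, 100 time steps, 16 features each
model = SequenceTransformer()
out = model(torch.randn(8, 100, 16))
print(out.shape)  # torch.Size([8, 100, 64])
```

The only part that really changes from one data type to another is the input projection (and possibly the positional encoding): the encoder itself just consumes a sequence of vectors, whether they come from word embeddings, image patches, audio frames, or sensor readings.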