Transformers and the Power of Positional Encoding [Transformers Series]

How Transformers Find Order In Data

CodeCompass · May 09, 2024

Get a list of personally curated and freely accessible ML, NLP, and computer vision resources for FREE on newsletter sign-up.

Consider sharing this with someone who wants to know more about machine learning.


Not sure where to begin? You can read the last post in the Transformers Series below:

"Attention, Please!": A Visual Guide To The Attention Mechanism [Transformers Series]

"Attention, Please!": A Visual Guide To The Attention Mechanism [Transformers Series]

CodeCompass
·
May 2, 2024
Read full story

The Rise of the Transformer

Before delving into positional encoding, let's briefly touch upon the rise of Transformers. In 2017, the introduction of the Transformer architecture in the seminal paper “Attention is All You Need” revolutionized the field of natural language processing (NLP) and eventually the whole of machine learning.

Unlike earlier sequence models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), Transformers leveraged attention mechanisms to capture long-range dependencies efficiently. This breakthrough paved the way for Transformer-based models like BERT, GPT, and T5, which achieved remarkable success across NLP tasks.


Introduction

At the heart of the Transformer architecture lies the self-attention mechanism, enabling the model to weigh the relevance of different tokens in the input sequence dynamically.

Transformer: the all-powerful in the land of machine learning. Courtesy: PeakPx

However, unlike sequential models such as RNNs, which inherently capture positional information through the order in which they process tokens, Transformers treat input sequences as unordered sets. This lack of positional understanding poses a significant challenge for tasks where the order of elements is crucial, such as language understanding.
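To make this concrete, here is a minimal NumPy sketch (my own illustration, not code from this post) of a single self-attention step with identity query/key/value projections. Shuffling the input tokens merely shuffles the output rows in the same way, so without positional information the model cannot distinguish one ordering from another.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity Q/K/V projections, for illustration."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ X                              # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))        # 5 tokens, 8-dimensional embeddings
perm = rng.permutation(5)          # a shuffled token order

# Shuffling the tokens only shuffles the outputs: attention itself sees no order.
print(np.allclose(self_attention(X)[perm], self_attention(X[perm])))  # True
```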

This is where positional encoding comes into play. Positional encoding is a mechanism for injecting positional information into the input embeddings, enabling the Transformer to discern the sequential order of tokens. By encoding each token's position within the sequence, positional encoding gives the model the sense of order it needs to process sequences effectively.
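As a quick illustration, the sketch below (my own NumPy example, assuming the classic sinusoidal scheme from "Attention is All You Need" rather than any particular variant discussed later in this post) builds the encoding matrix and adds it to the token embeddings.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...). Assumes even d_model."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # the 2i values
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

# The encoding is simply added to the token embeddings before the first layer.
seq_len, d_model = 10, 16
embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
inputs = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(inputs.shape)  # (10, 16)
```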

In this article, we discuss the importance of positional encoding in Transformers.

Transformer architecture [1].

1. What is Positional Encoding?

1.1 An Analogy: Reading a Book
