[Jupyter Notebook] Build Your Own Open-source RAG Using LangChain, LLAMA 3 and Chroma
Construct a local, zero-cost RAG pipeline using free, open-source tools
0. What is Bran Stark’s age?
Here is what you will be able to do by the end of this hands-on tutorial:
Give your favorite LLM (ChatGPT, LLAMA 3, Mixtral, Gemini, Claude, etc.) some documents, ask it a question, and see it respond. All of this runs locally, using open-source libraries, with zero API and tooling costs.
I was re-watching Game of Thrones and thought: wouldn’t it be fun to give the LLM the script from an episode and ask it a question that requires fetching the relevant context and then generating the correct response?
Me: “What is Bran’s age?”
Before I show you the RAG response: do you know what Bran’s age was in Game of Thrones (Season 1)?
I was not sure what the correct answer was either. The RAG pipeline responded to my question with some reasoning as well as sources.
RAG output: “According to the script, Bran is 8 years old. Reasoning: Jaime asks Bran “How old are you, boy?” and Bran responds with ‘Eight.’”
And was the LLM correct? I went through the script and indeed, there it was: Jaime asking him, “How old are you, boy?” and Bran answering, “Eight”!
1. Recap: What is RAG?
Last time, we discussed what RAG or retrieval augmented generation is.
Making an LLM perform well on your custom use case can be a costly affair. Pre-training GPT-4 reportedly required tens of terabytes of data, over $100 million, and hundreds of cutting-edge GPUs.
TL;DR: RAG overcomes the limitations of LLMs by bringing in external sources of information as relevant context. It does so at a fraction of the cost of pre-training a model from scratch or fine-tuning an existing model.
It works much like a student in an open-book exam. When faced with a question, the student can look up the latest information in textbooks or online resources, ensuring their answer is accurate and up-to-date.
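To make the open-book analogy concrete, here is a minimal sketch of the retrieve-then-generate flow in plain Python. The `embed` and `llm` callables are placeholders for whatever embedding model and LLM you plug in; this is an illustration of the idea, not any specific library’s API.

```python
from math import sqrt
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rag_answer(
    question: str,
    chunks: list[str],
    embed: Callable[[str], list[float]],  # any text-embedding function
    llm: Callable[[str], str],            # any text-generation function
    k: int = 3,
) -> str:
    # 1. "Open the book": rank every chunk by similarity to the question.
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    # 2. Keep the k most relevant chunks as context.
    context = "\n\n".join(ranked[:k])
    # 3. Generate an answer grounded in that context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```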
Check out all the details here, with visuals to help you understand the concept of RAG and what it is all about:
2. Building An Open-Source RAG Pipeline From Scratch
Today, we see how you can build your own RAG pipeline.
YES! We will dive into the details and build a RAG pipeline from scratch.
I made a couple of conscious decisions about the implementation. To build the RAG pipeline, I chose:
Open-source tools: You won’t need to purchase API keys from OpenAI or other vendors to get this running.
Google Colab: The whole session runs in Google Colab, so you can try it out yourself (a minimal setup sketch follows this list).
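For reference, here is roughly what the Colab setup looks like. The package list is an assumption based on the tools named in the title (LangChain, Chroma, Llama 3); the notebook itself is the source of truth, and Ollama is just one of several ways to serve Llama 3 locally.

```python
# Install the open-source stack (package names as of writing; pin versions
# in your own notebook if something breaks).
!pip install -q langchain langchain-community chromadb sentence-transformers

# One way (of several) to serve Llama 3 locally is Ollama. In Colab you
# would start "ollama serve" in a background cell before pulling the model.
!curl -fsSL https://ollama.com/install.sh | sh
!ollama pull llama3
```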
The poll from last time worked out quite well. There was a clear interest in seeing what Retrieval Augmented Generation looks like in Python. And here we are!
3. Why Should You Care About RAG?
Recall from last time that RAG helps overcome limitations that come with all LLMs:
Integrate Sources of Information to Generate Accurate and Relevant Responses
Remedy Out-of-Date Pre-Trained Models
Model Grounding and Trustworthy LLM Responses
Enhanced Contextual Understanding
4. Hands-on Jupyter Notebook
It’s time to get your hands dirty!
You can find the Jupyter Notebook with the implementation below. I have added text annotations to help you better understand what is going on. Additionally, I have broken the process of creating a RAG pipeline from scratch (using open-source tools) into smaller parts; a condensed sketch of those parts follows.
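Here is a condensed sketch of those parts wired together with LangChain, Chroma, and Llama 3. Treat it as an outline rather than the notebook’s exact code: the file name `script.txt` and the embedding model are illustrative, and it assumes Ollama is serving Llama 3 locally.

```python
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# 1. Load the episode script and split it into overlapping chunks.
docs = TextLoader("script.txt").load()  # "script.txt" is a placeholder path
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks and index them in a local Chroma vector store.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
store = Chroma.from_documents(chunks, embeddings)

# 3. Wire the retriever and the local LLM into a question-answering chain.
llm = Ollama(model="llama3")  # assumes Ollama is running locally
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=store.as_retriever(),
    return_source_documents=True,  # so answers come back with their sources
)

result = qa.invoke({"query": "What is Bran's age?"})
print(result["result"])
```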
Embeddings lie at the center of RAG. Apple and Netflix have used embeddings to solve similar problems in their products. Similarity between embedding vectors is used to rank and retrieve the most relevant context for a query.
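As a small illustration of that ranking step, here is how cosine similarity between sentence-transformer embeddings surfaces the relevant chunk. The two chunks are toy stand-ins for real script excerpts.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "What is Bran's age?"
chunks = [  # toy stand-ins for real script chunks
    "JAIME: How old are you, boy? BRAN: Eight.",
    "NED: Winter is coming.",
]

# Embed the query and the chunks, then score chunks by cosine similarity.
q_emb = model.encode(query, convert_to_tensor=True)
c_embs = model.encode(chunks, convert_to_tensor=True)
scores = util.cos_sim(q_emb, c_embs)[0]

best = int(scores.argmax())
print(chunks[best])  # the chunk with Jaime's question scores highest
```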