The Code Compass

[Jupyter Notebook] Build Your Own Open-source RAG Using LangChain, LLAMA 3 and Chroma

Constructing your local, zero-cost RAG pipeline using free, open-source tools

CodeCompass
Jul 23, 2024


Game of Thrones | Official Website for the HBO Series | HBO.com
Courtesy: Game Of Thrones (HBO)

0. What is Bran Stark’s age?

Here is what you will be able to do by the end of this hands-on tutorial:

Give your favorite LLM (ChatGPT, LLAMA 3, Mixtral, Gemini, Claude, etc.) some documents, ask it a question and see it respond. This will be running locally, using open-source libraries. Zero API and tooling costs.

I was re-watching Game of Thrones and thought: wouldn't it be fun to give the LLM scripts from the episode and ask it a question that requires fetching the relevant context before generating the correct response?

Main title card for Game of Thrones
Courtesy: Game Of Thrones (HBO)

Me: “What is Bran’s age?”

Before I show you the RAG response, do you know how old Bran was in Game of Thrones (Season 1)?

Jaime Lannister: The Things I Do For Love | geekysuperhero
Courtesy: Game Of Thrones (HBO)

I was not sure what the correct answer was either. The RAG pipeline then responds to my question with some reasoning as well as sources.

RAG output: “According to the script, Bran is 8 years old. Reasoning: Jaime asks Bran “How old are you, boy?” and Bran responds with ‘Eight.’”

And was the LLM correct? I went through the script and, indeed, there it was: Jaime asking him “How old are you, boy?” and Bran answering “Eight”!

Excerpt from the Game Of Thrones script.

1. Recap: What is RAG?

Last time, we discussed what RAG, or retrieval-augmented generation, is.

Making an LLM perform well on your custom use case can be a costly affair. Pre-training GPT-4 reportedly required tens of terabytes of data, around $100M, and hundreds of cutting-edge GPUs or more.

TL;DR: RAG overcomes the limitations of LLMs by bringing in external sources of information as relevant context. It does so at a fraction of the cost of pre-training a model from scratch or fine-tuning an existing model.

It works much like a student in an open-book exam. When faced with a question, the student can look up the latest information in textbooks or online resources, ensuring their answer is accurate and up-to-date.
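To make the open-book analogy concrete, here is a minimal sketch of the "look it up, then answer" flow. It is not the notebook's implementation: the word-overlap scoring is a simple stand-in for embedding-based retrieval, the helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical, and two of the three chunks are made-up placeholders. No LLM is called; the sketch stops at the augmented prompt that would be sent to one.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k=1):
    """Return the k chunks that share the most words with the question.

    Assumption: word overlap stands in for embedding similarity here.
    """
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, contexts):
    """Put the retrieved chunks in front of the question (the 'open book')."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
    )

# Toy corpus: the first chunk is based on the exchange quoted in this post,
# the other two are invented placeholders.
chunks = [
    "JAIME: How old are you, boy? BRAN: Eight.",
    "A toy placeholder chunk about a feast in the great hall.",
    "Another toy placeholder chunk about ravens and letters.",
]

question = "How old is Bran?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The printed prompt is what the model would see: the retrieved script line first, then the question. Swapping `retrieve` for a vector-store lookup changes how the context is found, not the overall shape of the flow.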

Check out all the details here, with visuals to help you understand the concept of RAG and what it is all about:

What is Retrieval Augmented Generation? A Visual Guide On RAGs in the Context of LLMs

2. Building An Open-Source RAG Pipeline From Scratch

Today, we see how you can build your own RAG pipeline.

YES! We will dive into the details and build a RAG pipeline from scratch.

I made a couple of conscious choices for the implementation. To build the RAG pipeline, I use:

  • Open-source tools: You won’t need to purchase API keys from OpenAI or other vendors to get this running.

  • Google Colab: The whole session will use Google Colab, so you can try it out yourself.

The results of last time’s poll were loud and clear: there was strong interest in seeing what Retrieval Augmented Generation looks like in Python. And here we are!

3. Why Should You Care About RAG?

Recall from last time that RAG helps overcome limitations common to all LLMs:

  • Integrate Sources of Information to Generate Accurate and Relevant Responses

  • Remedy Out-of-Date Pre-Trained Models

  • Model Grounding and Trustworthy LLM Responses

  • Enhanced Contextual Understanding


4. Hands-on Jupyter Notebook

It’s time to get your hands dirty!

You can find the Jupyter Notebook with the implementation below. I have added text annotations to help you better understand what is going on. Additionally, I have broken the process of creating a RAG pipeline from scratch (using open-source tools) into smaller parts.

Embeddings lie at the center of RAG. Apple and Netflix have used embeddings to solve similar problems in their products. The similarity between the query's embedding vector and each document chunk's embedding vector is used to rank the chunks and retrieve the most relevant context.
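As a rough illustration of ranking by embedding similarity, the sketch below scores a query vector against a few chunk vectors with cosine similarity and sorts them. The three-dimensional vectors, the chunk ids, and the function names are all hypothetical; real embeddings come from an embedding model and have hundreds of dimensions, and a vector store such as Chroma performs this kind of search for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, chunk_vecs):
    """Return (chunk_id, score) pairs sorted from most to least similar."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-made toy vectors (assumption: not produced by any real embedding model).
chunk_vecs = {
    "bran_age": [0.9, 0.1, 0.0],
    "feast": [0.1, 0.8, 0.3],
    "ravens": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # toy embedding of a question about Bran's age

ranking = rank_by_similarity(query_vec, chunk_vecs)
for chunk_id, score in ranking:
    print(f"{chunk_id}: {score:.3f}")
```

The highest-scoring chunks are the ones passed to the LLM as context. Cosine similarity is one common choice because it compares the direction of the vectors and ignores their length.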

This post is for paid subscribers
