Large Language Models, popularly known as LLMs, are one of the most hotly debated topics in the AI industry. We are all aware of the capabilities of ChatGPT by OpenAI. Using LLMs to our advantage opens up a lot of new possibilities for working with our own data.

But you can’t expect everything from LLMs, because of their design limitations and a number of other factors. Combining them with Vector Search, however, gives us a new type of app called Retrieval Augmented Generation (RAG) applications. So let’s see what RAG applications are, how they can be used to tackle new problems, how they can be developed using Couchbase, and take a detailed look at how Vector Search makes these applications possible.

Before we get into the background of the application, here is an architectural diagram of what we are building and how LangChain ties into it:

Retrieval augmented generation (RAG)

RAG provides a way to augment the output of an LLM with targeted information without modifying the underlying model itself, so that the targeted information can be more up-to-date than the LLM’s training data as well as specific to a particular organization and industry. That means the generative AI system can provide more contextually appropriate answers to prompts and base those answers on current data. Let’s understand this concept using a real-life example.

Let’s say you belong to organization X, which has a ton of data stored in its database, and you are in charge of developing an application that takes user input and produces answers based on the data present in that database.

Initially you may think this looks easy, right? If you are aware of LLMs and how to leverage them, then it’s a simple task. You just choose OpenAI’s LLMs, or Llama and Mistral models if you want to be cost-effective, and simply send user questions to the LLM and get the results.

But there is a big problem here…

For example, let’s assume you’re using the Llama 2 7B model.

Now, this model is trained on a huge amount of data from the public internet. Ask it almost any question, even questions about your org X, and it gives you something close to the correct answer.

Now let’s make a small change to the problem statement. The ton of data present in your database is no longer public data, but private. That means Llama 2 is unaware of your data and will no longer give you correct answers.

With the above scenario in mind, consider the user question, “What are the updates to component C in org X?”

So, how to solve this?

You might think: why don’t we pass the entire data present in the database along with the prompt, so the LLM can use the data as context and answer the question? But here is the big problem: all LLMs have a constraint called the token limit. Without getting into what tokens are, for now consider 1 token == 1 word.

Sadly, the token limit of Llama 2 is 4096 tokens (words). If the entire data present in your database amounts to 10M words, it becomes impossible to pass all of it as context.

The solution to the above problem is called RAG. In RAG, we select the portion of the data in your database that is most closely related to the user’s query. The portion size is chosen such that:

Portion size < token limit

Now we pass the extracted data as context along with the query and get good results. This is RAG. But how do we get the portion of data that is closely related to the user query while keeping its size within the token limit? That is solved using the concept of Vector Search.

What is vector search?

Vector search leverages machine learning (ML) to capture the meaning and context of unstructured data, including text and images, transforming it into a numeric representation. Frequently used for semantic search, vector search finds similar data using approximate nearest neighbor (ANN) algorithms. 

Couchbase Server version 7.6.0 and above comes with this Vector Search feature. The important thing here is that no external libraries, modules, or extra setup are required: having at least one Search node does the job.

Couchbase internally uses the FAISS framework provided by Facebook to perform vector search.

Building a RAG application

Now let’s get to the actual work of developing a RAG application end-to-end using the Couchbase Vector Search functionality.

In this walkthrough, we are going to develop a chat-with-your-PDFs application.

Before moving on, note that there are several ways to create the app. One of them is the LangChain framework, which is what we will use to develop this RAG application.

App 1: Building using LangChain framework

Step 1: Setting up a Couchbase database

You can set up Couchbase Server on EC2, a virtual machine, your local machine, etc.

Follow this link to set up the Couchbase cluster. Make sure you have these services enabled (others are optional):

    • Data
    • Search

Note: Make sure you install Couchbase Server version 7.6.0 or above to perform vector search. We will develop this application using Python in a macOS environment.

Once the cluster is up and running, create a new project <project_name> and create a new Python file called app.py.

Now in the project terminal, execute the below command:
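The exact command from the original post isn’t reproduced here; as an assumption, a typical dependency install for this walkthrough would be the Couchbase Python SDK, LangChain with its Couchbase and OpenAI integrations, and pypdf for PDF parsing:

pip install couchbase langchain langchain-community langchain-couchbase langchain-openai pypdf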

Now go to the UI and create a bucket named project. For this walkthrough, we will go with the default scope and collection.

There are different ways to generate vector embeddings. A popular one is OpenAI, and that is what we will use to generate vector embeddings here.

Copy the below code to app.py:
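The exact code from the original post isn’t shown here, so the following is a minimal sketch. It assumes the langchain-couchbase integration (the vector store class is named CouchbaseVectorStore in older releases and CouchbaseSearchVectorStore in newer ones) and an index named vector-index, which we import in Step 2. Credentials and the connection string are placeholders:

# app.py
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_openai import OpenAIEmbeddings
from langchain_couchbase.vectorstores import CouchbaseVectorStore

# Connect to the Couchbase cluster (replace the credentials with your own)
auth = PasswordAuthenticator("Administrator", "password")
cluster = Cluster("couchbase://localhost", ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

# Embedding model used to vectorize both the documents and the queries
embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

# Vector store backed by the bucket/scope/collection and the search index from Step 2
vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name="project",
    scope_name="_default",
    collection_name="_default",
    embedding=embeddings,
    index_name="vector-index",
)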

If you hosted Couchbase in a VM, make sure you replace localhost with the public IP of the VM.

Step 2: Importing the search index

The Vector Search feature in Couchbase requires a search index. There are multiple ways to create the index, but to make things easy and fast, below is the index JSON. Copy it and import it via:

  • UI > Search > Add index (top right) > Import

Index.json
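The full definition from the original post isn’t reproduced here, so the following is an abbreviated sketch of what a vector search index definition generally looks like. The field name (embedding), dimension, and similarity metric are assumptions and must match your embedding model (e.g., 1536 dimensions for OpenAI’s text-embedding-ada-002); check the Couchbase Vector Search documentation for the exact schema:

{
  "name": "vector-index",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "project",
  "params": {
    "doc_config": {
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_mapping": {
        "enabled": true,
        "dynamic": true,
        "properties": {
          "embedding": {
            "enabled": true,
            "fields": [
              {
                "name": "embedding",
                "type": "vector",
                "dims": 1536,
                "similarity": "dot_product",
                "index": true
              }
            ]
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  }
}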

Step 3: Loading the data

Now it’s time to store all the PDF data as chunks, along with their vector embeddings, in the database.


Note: Read this detailed blog on chunking, data gathering, etc. It’s highly recommended to go through the blog to have a clear understanding of what we will be covering in the later steps.


There are loaders for the different types of documents you may want to upload. For example, if your source data is in .txt format, then add the following code to your app.py:
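A minimal sketch, assuming the vector_store object from Step 1 and a placeholder file path; LangChain’s TextLoader and RecursiveCharacterTextSplitter handle loading and chunking, and add_documents embeds and stores the chunks:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the raw text and split it into chunks that fit comfortably in the context window
loader = TextLoader("data/source.txt")   # placeholder path
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in Couchbase via the vector store from Step 1
vector_store.add_documents(chunks)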

But suppose your source type is PDF; then:
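A similar sketch with LangChain’s PyPDFLoader (again, the path is a placeholder):

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("data/source.pdf")  # placeholder path
docs = loader.load()                     # one Document per page

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

vector_store.add_documents(chunks)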

Not only are PDFs supported, LangChain offers loaders for multiple formats such as:

    • CSV
    • HTML
    • JSON
    • Markdown, and more

Learn more about LangChain document loaders.

Step 4: Inferring results

Now we are ready to send queries to our application:
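A sketch of a simple retrieval chain, assuming ChatOpenAI as the LLM and the vector_store from Step 1; LangChain offers several ways to wire this up, so treat the prompt and chain construction below as illustrative:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")  # reads OPENAI_API_KEY from the environment

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following context:\n{context}\n\nQuestion: {question}"
)

# The retriever returns the chunks whose embeddings are closest to the query embedding
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What are the updates to component C in org X?"))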

App 2: Creating the app from scratch

Before starting, as described in the previous section, spin up your cluster with a bucket named project. Also, follow Step 2 of the previous section, making sure to import the search index.

Step 1: Setting up Couchbase

If you are going with defaults, then your app.py should look something like this:
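A minimal sketch, assuming default credentials; replace the connection string, username, and password with your own:

# app.py
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Connect to the cluster (use the VM's public IP instead of localhost if applicable)
auth = PasswordAuthenticator("Administrator", "password")
cluster = Cluster("couchbase://localhost", ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

# Handles to the bucket and the default scope/collection
cb_bucket = cluster.bucket("project")
cb_scope = cb_bucket.scope("_default")
cb_coll = cb_scope.collection("_default")

SEARCH_INDEX_NAME = "vector-index"  # the search index imported earlier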

Now that the Couchbase collection handles and search index are ready, let’s move on to the data loading part.

Step 2: Data loading

To keep things modular, create a new Python file named load.py.

There are multiple ways to extract data from PDFs. To make it easy, let’s use the pypdf-based loader from LangChain:

load.py
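A sketch, assuming a local file named docs.pdf (the path is a placeholder):

# load.py
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("docs.pdf")  # placeholder path to your PDF
pages = loader.load()             # one Document per page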

This pages variable is a list of text blocks extracted from the PDF. Let’s merge all the content into one variable:
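For example (the variable name is arbitrary):

# Concatenate the text of every page into a single string
full_text = "\n".join(page.page_content for page in pages)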

Before chunking, we need to set up the embedding model. In this case, let’s go with sentence-transformers/paraphrase-distilroberta-base-v1 from Hugging Face.

This model gives you vector embeddings of 768 dimensions.

load.py
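A sketch using the sentence-transformers package directly:

# load.py (continued)
from sentence_transformers import SentenceTransformer

# Produces 768-dimensional embeddings
model = SentenceTransformer("sentence-transformers/paraphrase-distilroberta-base-v1")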

Now our model is ready. Let’s push the documents. We can use the RecursiveCharacterTextSplitter from LangChain.

This splitter gives you chunks of the requested size. We will then compute the vector embedding for each chunk using the model above and push each document into the database.

So, each document will have two fields: the raw chunk text and its vector embedding.

load.py
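A sketch of the splitter; the chunk size and overlap below are illustrative values:

# load.py (continued)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(full_text)  # a list of text chunks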

Now that our chunks are ready, we can find the embeddings for each chunk and push it to the database.

load.py
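A sketch of the upsert loop; the field names text and embedding are assumptions and must match the field indexed by your search index:

# load.py (continued)
import uuid

for chunk in chunks:
    doc = {
        "text": chunk,                              # the raw chunk text
        "embedding": model.encode(chunk).tolist(),  # its 768-dimensional vector
    }
    cb_coll.upsert(str(uuid.uuid4()), doc)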

Wondering where cb_coll came from? It’s the collection handle we created in app.py. To pass it in, let’s wrap everything in load.py in a function that accepts cb_coll as a parameter.

So, finally your load.py should look like this:
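Putting the pieces above together, a consolidated sketch of load.py:

# load.py
import uuid

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# Embedding model: 768-dimensional vectors
model = SentenceTransformer("sentence-transformers/paraphrase-distilroberta-base-v1")


def load_data(cb_coll, pdf_path="docs.pdf"):
    """Extract the PDF, chunk it, embed each chunk, and upsert it into Couchbase."""
    pages = PyPDFLoader(pdf_path).load()
    full_text = "\n".join(page.page_content for page in pages)

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(full_text)

    for chunk in chunks:
        doc = {
            "text": chunk,
            "embedding": model.encode(chunk).tolist(),
        }
        cb_coll.upsert(str(uuid.uuid4()), doc)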

Now let’s go to app.py, import this module, and call the load_data function.

app.py
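A sketch of the call:

# app.py (continued)
from load import load_data

load_data(cb_coll)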

This will push the docs in the required format, and our search index will be updated as well. Now it’s time to do vector search.

Step 3: Vector Search

In Couchbase there are multiple ways to do this, one of them being the Search REST API called via curl, as sketched below.
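As a sketch, a KNN query can be sent to the Search service’s REST endpoint (port 8094 by default) with curl from Python’s subprocess module. The index name, field name, and credentials below are the same assumptions used earlier, and the query is embedded with the model from load.py:

import json
import subprocess

from load import model  # reuse the embedding model to embed the query

query = "What are the updates to component C in org X?"
query_vector = model.encode(query).tolist()

payload = {
    "fields": ["*"],
    "knn": [
        {
            "field": "embedding",   # the vector field defined in the search index
            "vector": query_vector,
            "k": 4,                 # number of nearest chunks to return
        }
    ],
}

result = subprocess.run(
    [
        "curl", "-s", "-XPOST",
        "-H", "Content-Type: application/json",
        "-u", "Administrator:password",
        "http://localhost:8094/api/index/vector-index/query",
        "-d", json.dumps(payload),
    ],
    capture_output=True,
    text=True,
)

print(result.stdout)  # JSON response whose hits contain the k nearest doc IDs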

The result.stdout contains the k nearest doc IDs. You can extend the script to perform a get on all the returned IDs and combine the results to build the final context. We then pass this context along with the prompt to the LLM to get the desired results.
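A sketch of that extension; the text field name is the same assumption as in load.py, and the final prompt can be sent to whichever LLM you prefer:

# Parse the search response, fetch the matching chunks, and build the final context
response = json.loads(result.stdout)
doc_ids = [hit["id"] for hit in response.get("hits", [])]

context = "\n\n".join(
    cb_coll.get(doc_id).content_as[dict]["text"] for doc_id in doc_ids
)

prompt = (
    "Answer the question using only the following context:\n"
    f"{context}\n\nQuestion: {query}"
)
# Send `prompt` to the LLM of your choice (OpenAI, Llama 2, etc.) to get the answer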


Author

Posted by Sanjivani Patra - Software Engineer
