Vector Search with a Python Couchbase SDK raise couchbase.exceptions.QueryIndexNotFoundException

roberace · April 30, 2024, 6:14pm

Couchbase Server 7.6.1-3200 (EE)
Bucket = ‘chatui’, Collection = ‘default’

I am following this page: Run a Vector Search with a Couchbase SDK | Couchbase Docs

I have tried doing a vector search:

def get_retriever(self, query: str, k: int = 4) -> VectorStoreRetriever:
    # Generate the query vector using the embeddings model
    query_vector = self.embeddings.embed_query(query)

    try:
        # Construct the vector search request
        search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
            VectorSearch.from_vector_query(VectorQuery('vectors', query_vector, num_candidates=k))
        )

        #scope = self.bucket.default_scope()
        query_index_manager = self.cluster.query_indexes()
        indexes = query_index_manager.get_all_indexes('chatui')
        index_names = {index.name for index in indexes}  # Set of all index names for quick lookup
        for name in index_names:
                print(name)
        # Execute the vector search request
        result = self.cluster.search("vectors_index", search_req, SearchOptions(limit=k, fields=["text", "metadata"]))

        # Extract the documents and vectors from the search result
        documents = []
        vectors = []
        for row in result.rows():
            document_data = row.fields
            if "text" in document_data and "metadata" in document_data:
                for text_chunk in document_data["text"]:
                    documents.append(Document(page_content=text_chunk, metadata=document_data["metadata"]))
                vectors.extend(document_data["vectors"])

        if not documents:
            raise ValueError("No relevant documents found")

        # Create a vector store using the VectorFactory
        vectorstore, _, _ = VectorFactory.create_vector_storage(vector_type="faiss", embedding_size=len(vectors[0]))

        # Add the extracted vectors to the vector store's index
        vectorstore.add_texts([doc.page_content for doc in documents], [doc.metadata for doc in documents])

        # Create a VectorStoreRetriever with the documents and vector store
        retriever = VectorStoreRetriever(documents=documents, vectorstore=vectorstore.vectorstore)

        return retriever

    except CouchbaseException as ex:
        import traceback
        traceback.print_exc()
        raise ex

The document structure looks like this:

{
"metadata": {
"source": "Mitologia Celta",
"file_id": "662f75f5596c6e6c64527484",
"user_id": "user123",
"file_size": 1201455
},
"text": ["Reina de las hadas irlandesas. Hija de Manannan. Se la confunde a veces con Morrigan, a veces con Dana.\n\n\n\n\n\n\n\n\nLUNED. Es una hada dotada de poderes sobrenaturales. Imagen de una antigua divinidad de amor y magia.\n\n\n\nGIGANTES\n===\n\nLos gigantes aparecen en toda la tradición Celta. Aunque hay muchos, parece ser que existió un grupo muy importante que fueron los Fomore. Este era un pueblo misterioso de la tradición irlandesa. Los Fomore no invadieron Irlanda, pero la amenzaron constantemente. Son gigantes que vivían en las islas que rodean a Irlanda. Estos son algunos de los gigantes más famosos."],
"text_length": 2,
"vectors": [
[-0.006697558332234621, -0.014219328761100769, -0.018976973369717598, -0.023895440623164177, -0.021067658439278603, 0.0023051127791404724, -0.024069664999842644, 0.013053370639681816, -0.011391544714570045, -0.013636349700391293, 0.009656010195612907, 0.03704262524843216, 0.008476649411022663, 0.0016727143665775657, -0.010031260550022125,

There is an index on the field "vectors" called "vector_index":
CREATE INDEX vector_index ON chatui(vectors)
We can confirm that it exists, because creating it again returns:
[
{
"code": 4300,
"msg": "The index vector_index already exists.",
"reason": {
"name": "vector_index"
}
}
]

But it still does not work: the exception is raised every time, even though the index is there.

File “C:\Projects Python\evo-brains-backend-v2\core\vectorstorage\couchbase.py”, line 294, in get_retriever
raise ex
File “C:\Projects Python\evo-brains-backend-v2\core\vectorstorage\couchbase.py”, line 270, in get_retriever
for row in result.rows():
File “c:\Users\rgutierrez.mourente\AppData\Local\anaconda3\envs\evopy2\Lib\site-packages\couchbase\search.py”, line 136, in next
raise ex
File “c:\Users\rgutierrez.mourente\AppData\Local\anaconda3\envs\evopy2\Lib\site-packages\couchbase\search.py”, line 130, in next
return self._get_next_row()
^^^^^^^^^^^^^^^^^^^^
File “c:\Users\rgutierrez.mourente\AppData\Local\anaconda3\envs\evopy2\Lib\site-packages\couchbase\search.py”, line 121, in _get_next_row
raise ErrorMapper.build_exception(row)
couchbase.exceptions.QueryIndexNotFoundException: QueryIndexNotFoundException(<ec=17, category=couchbase.common, message=index_not_found (17), context=SearchErrorContext({‘last_dispatched_to’: ‘127.0.0.1:8094’, ‘last_dispatched_from’: ‘127.0.0.1:57172’, ‘retry_attempts’: 0, ‘client_context_id’: ‘b5c992-1ece-cb4a-9289-351455534f79d4’, ‘method’: ‘POST’, ‘path’: ‘/api/index/vectors_index/query’, ‘http_status’: 400, ‘http_body’: '{“error”:“rest_auth: preparePerms, err: index not found”,“request”:{“ctl”:{“timeout”:75000},“explain”:false,“fields”:[“text”,“metadata”],“knn”:[{“field”:“vectors”,“k”:2,“vector”:[0.0058336150482652645,-0.004799766902900294,0.0…

I will be very grateful if someone can help me

mreiche · April 30, 2024, 7:15pm

Hi @roberace - it’s great to see someone trying out vector search. You are so close…

No 's': the index name is vector_index, not vectors_index.

{
"code": 4300,
"msg": "The index vector_index already exists.",
"reason": {
"name": "vector_index"
}
}

roberace · April 30, 2024, 9:19pm

Thanks for the response, mreiche.

OK, I changed the implementation to point to vector_index, but I still receive the same error:

def get_retriever(self, query: str, k: int = 4) -> VectorStoreRetriever:
    # Generate the query vector using the embeddings model
    query_vector = self.embeddings.embed_query(query)

    try:
        # Construct the vector search request
        search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
            VectorSearch.from_vector_query(VectorQuery('vector', query_vector, num_candidates=k))
        )

        #scope = self.bucket.default_scope()
        query_index_manager = self.cluster.query_indexes()
        indexes = query_index_manager.get_all_indexes(COUCHBASE_VECSTORE_DB_NAME)
        index_names = {index.name for index in indexes}  # Set of all index names for quick lookup
        for name in index_names:
                print(name)
        # Execute the vector search request
        result = self.cluster.search("vector_index", search_req, SearchOptions(limit=k, fields=["text", "metadata"]))

        # Extract the documents and vectors from the search result
        documents = []
        vectors = []
        for row in result.rows():
            document_data = row.fields
            if "text" in document_data and "metadata" in document_data:
                for text_chunk in document_data["text"]:
                    documents.append(Document(page_content=text_chunk, metadata=document_data["metadata"]))
                vectors.extend(document_data["vector"])

        if not documents:
            raise ValueError("No relevant documents found")

        # Create a vector store using the VectorFactory
        vectorstore, _, _ = VectorFactory.create_vector_storage(vector_type="faiss", embedding_size=len(vectors[0]))

        # Add the extracted vectors to the vector store's index
        vectorstore.add_texts([doc.page_content for doc in documents], [doc.metadata for doc in documents])

        # Create a VectorStoreRetriever with the documents and vector store
        retriever = VectorStoreRetriever(documents=documents, vectorstore=vectorstore.vectorstore)

        return retriever

    except CouchbaseException as ex:
        import traceback
        traceback.print_exc()
        raise ex

couchbase.exceptions.QueryIndexNotFoundException: QueryIndexNotFoundException(<ec=17, category=couchbase.common, message=index_not_found (17), context=SearchErrorContext({‘last_dispatched_to’: ‘127.0.0.1:8094’, ‘last_dispatched_from’: ‘127.0.0.1:52994’, ‘retry_attempts’: 0, ‘client_context_id’: ‘ed13da-895b-524f-56fe-1ba0c5df1ed733’, ‘method’: ‘POST’, ‘path’: ‘/api/index/vector_index/query’, ‘http_status’: 400, ‘http_body’: '{“error”:“rest_auth: preparePerms, err: index not found”,“request”:{“ctl”:{“timeout”:75000},“explain”:false,“fields”:[“text”,“metadata”],“knn”:[{“field”:“vector”,“k”:2,“vector”:[0.0058336150482652645,-0.004799766902900294,

What could I be doing wrong?

roberace · April 30, 2024, 9:20pm

mreiche · April 30, 2024, 9:31pm

Can you point me to the example or documentation you are following?

Search uses an fts index, but you’ve created a GSI index which is used for query.

roberace · April 30, 2024, 10:23pm

Hi mreiche, you were right!
I have created the vector_index with the Web Console, following the specifications on this page Create a Vector Search Index with the Web Console | Couchbase Docs

Now the search finds the index, but there was a second error to correct: I was running the search at the cluster level

result = self.cluster.search("vector_index", search_req, SearchOptions(limit=k, fields=["vector"]))

instead of on the bucket's scope:

scope = self.bucket.default_scope()
result = scope.search("vector_index", search_req, SearchOptions(limit=k, fields=["vector","text","metadata"]))

If the search is run on the cluster instead, the index is found by its fully qualified name:

result = self.cluster.search("chatui._default.vector_index", search_req, SearchOptions(limit=k, fields=["vector"]))
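The two ways of naming the index can be sketched with a small helper. This is only an illustration: `split_index_name` and `IndexName` are hypothetical names, not part of the Couchbase SDK, and the sketch assumes the fully qualified form is `bucket.scope.index`, as in `chatui._default.vector_index`, with no dots in the scope or index name.

```python
from typing import NamedTuple, Optional


class IndexName(NamedTuple):
    bucket: Optional[str]
    scope: Optional[str]
    index: str


def split_index_name(name: str) -> IndexName:
    """Split 'bucket.scope.index' into its parts; a bare name has no bucket or scope."""
    # Split from the right so a bucket name containing dots stays in one piece.
    parts = name.rsplit(".", 2)
    if len(parts) == 3:
        return IndexName(parts[0], parts[1], parts[2])
    if len(parts) == 1:
        return IndexName(None, None, parts[0])
    raise ValueError(f"unexpected index name: {name!r}")


qualified = split_index_name("chatui._default.vector_index")
print(qualified.index)       # short name, as used with scope.search(...)
print(".".join(qualified))   # fully qualified name, as used with cluster.search(...)
```

The short name goes with the scope-level call and the fully qualified name with the cluster-level call, matching the two working variants shown above.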

In the end it finds the index, but the search does not return any results.

Initially what I do is obtain the embedding of a text similar to the content of the text of the document

query_vector = self.embeddings.embed_query(query)
search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(VectorQuery('vector', query_vector, num_candidates=k))
)

but search does not return any row

So I have tried the following:

I took a single element from the vector of a stored document and plugged a one-element array containing it directly into query_vector:

query_vector = [-0.006697558332234621]
search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(VectorQuery('vector', query_vector, num_candidates=k))
)

but the search doesn’t find results either

raise ValueError("No relevant documents found")

Something must be wrong

roberace · April 30, 2024, 10:31pm

The vector_index definition looks like this:

{
  "type": "fulltext-index",
  "name": "chatui._default.vector_index",
  "uuid": "3c197d352e8360c9",
  "sourceType": "gocbcore",
  "sourceName": "chatui",
  "sourceUUID": "03324e97fac08d21b52d3354c3508270",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": true,
        "properties": {
          "vector": {
            "dynamic": false,
            "enabled": true,
            "fields": [
              {
                "dims": 2048,
                "index": true,
                "name": "vector",
                "similarity": "l2_norm",
                "type": "vector",
                "vector_index_optimized_for": "recall"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  },
  "sourceParams": {}
}

roberace · April 30, 2024, 10:32pm

And this is the function:

def get_retriever(self, query: str, k: int = 4) -> VectorStoreRetriever:
        # Generate the query vector using the embeddings model
        query_vector = self.embeddings.embed_query(query)
        query_vector = [-0.006697558332234621]
        try:
            # Construct the vector search request
            # search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
            #     VectorSearch.from_vector_query(VectorQuery('vector', query_vector, num_candidates=k))
            # )
            search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
                VectorSearch.from_vector_query(VectorQuery('vector', query_vector, num_candidates=k))
            )
            scope = self.bucket.default_scope()            
            # Execute the vector search request
            #result = self.cluster.search("chatui._default.vector_index", search_req, SearchOptions(limit=k, fields=["vector"]))            
            result = scope.search("vector_index", search_req, SearchOptions(limit=k, fields=["vector","text","metadata"]))

            # Extract the documents and vectors from the search result
            documents = []
            vectors = []
            for row in result.rows():
                document_data = row.fields
                if "text" in document_data and "metadata" in document_data:
                    for text_chunk in document_data["text"]:
                        documents.append(Document(page_content=text_chunk, metadata=document_data["metadata"]))
                    vectors.extend(document_data["vector"])

            if not documents:
                raise ValueError("No relevant documents found")

            # Create a vector store using the VectorFactory
            vectorstore, _, _ = VectorFactory.create_vector_storage(vector_type="faiss", embedding_size=len(vectors[0]))

            # Add the extracted vectors to the vector store's index
            vectorstore.add_texts([doc.page_content for doc in documents], [doc.metadata for doc in documents])

            # Create a VectorStoreRetriever with the documents and vector store
            retriever = VectorStoreRetriever(documents=documents, vectorstore=vectorstore.vectorstore)

            return retriever
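One detail in the function above is worth checking independently of the search: `vectors.extend(...)` followed by `len(vectors[0])`. The sketch below shows the difference between `extend` and `append`, assuming the field comes back as a flat list of floats. That is an assumption for illustration; the thread never shows a returned `vector` field, and `row_vector` is a made-up value.

```python
# Hypothetical value of row.fields["vector"] for one row (assumed flat list of floats).
row_vector = [0.1, 0.2, 0.3]

flattened = []
flattened.extend(row_vector)   # adds each float as its own item

nested = []
nested.append(row_vector)      # adds the whole vector as one item

print(len(flattened), len(nested))
print(len(nested[0]))          # the embedding size

try:
    len(flattened[0])          # a float has no len()
except TypeError as exc:
    print("extend() breaks len(vectors[0]):", exc)
```

If the returned field is a single vector, `append` keeps one entry per row, so `len(vectors[0])` gives the embedding size.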

mreiche · April 30, 2024, 10:37pm

If I’m not mistaken, MatchNoneQuery never matches. Which example are you following?

roberace · April 30, 2024, 10:39pm

From this page: Run a Vector Search with a Couchbase SDK | Couchbase Docs

Example: Semantic Search with Color Descriptions

#!/usr/bin/env python3

import os
import sys
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from couchbase.auth import PasswordAuthenticator
from couchbase.exceptions import CouchbaseException
import couchbase.search as search
from couchbase.options import SearchOptions
from couchbase.vector_search import VectorQuery, VectorSearch
from openai import OpenAI

# Change the question as desired
question = "What color hides everything like the night?"

# Make sure to replace OPENAI_API_KEY with your own API Key
openai_api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI()

# Make sure to change CB_USERNAME, CB_PASSWORD, and CB_HOSTNAME to the username, password, and hostname for your database. 
pa = PasswordAuthenticator(os.getenv("CB_USERNAME"), os.getenv("CB_PASSWORD"))
cluster = Cluster("couchbases://" + os.getenv("CB_HOSTNAME") + "/?ssl=no_verify", ClusterOptions(pa))
# Make sure to change the bucket, scope, and index names to match where you stored the sample data in your database. 
bucket = cluster.bucket("vector-sample")
scope = bucket.scope("color")
search_index = "color-index"
try:
    vector = client.embeddings.create(input = [question], model="text-embedding-ada-002").data[0].embedding
    search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
        VectorSearch.from_vector_query(VectorQuery('embedding_vector_dot', vector, num_candidates=2)))
        # Change the limit value to return more results. Change the fields array to return different fields from your Search index.
    result = scope.search(search_index, search_req, SearchOptions(limit=13,fields=["color", "description"]))
    for row in result.rows():
        print("Found row: {}".format(row))
    print("Reported total rows: {}".format(
        result.metadata().metrics().total_rows()))
except CouchbaseException as ex:
    import traceback
    traceback.print_exc()

mreiche · April 30, 2024, 10:42pm

Ok - I see - the Python example.

roberace · April 30, 2024, 10:56pm

What I am trying to do, you know, is to obtain the documents closest in similarity to the questions that the user asks the RAG Agent, to include the document or documents in the context of the Agent’s Prompt, with the purpose of injecting private information into the AI.

There are two options for this. The first is to fetch all the documents from the db, generate the embeddings at runtime in the app, and then run a similarity_search over those embeddings:

question = "Who is Arturo?"
documents = retriever.vectorstore.similarity_search(question, k=2)

This would be the complete example:

# Create embeddings
embeddings = OpenAIEmbeddings()

# Create the vector store
vector_store = CouchbaseVectorStorage(collection_name="", embedding=embeddings)

# Create a retriever
retriever = vector_store.get_vector_storage()

# Create a question-answering chain
qa_chain = RetrievalQA.from_chain_type(
     llm=OpenAI(),
     chain_type="stuff",
     retriever=retriever,
     return_source_documents=True
)

# Define a prompt template
prompt_template = """
Given the following extracted parts of a long document and a question, create a final answer.

{context}

Question: {question}
Final Answer:
"""

# Create a prompt
prompt = PromptTemplate(
     template=prompt_template,
     input_variables=["context", "question"]
)

# Create a custom chain
custom_chain = LLMChain(
     llm=OpenAI(),
     prompt=prompt
)

question = "Who is Arturo?"
documents = retriever.vectorstore.similarity_search(question, k=2)
#documents = vector_store.similarity_search(question, k=2, threshold=0.5)

# Combine the relevant documents into a single context string
context = "\n".join([doc.page_content for doc in documents])

# Run the custom chain with the combined context
result = custom_chain.run(
     question=question,
     context=context
)

print(result)

Advantage: similarity_search works 100% fine.
Disadvantage: it is inefficient, because you need to fetch all the documents in the collection and compute the embeddings again.

The second option, which is the one I was trying, is Vector Search on the server.

This way the vector proximity search is done by the server and you only get the relevant documents, which is what I was trying to do with the Couchbase Python SDK. With this approach the app will work much more efficiently.

roberace · April 30, 2024, 11:08pm

A simple N1QL query is not suitable for this purpose, because what is sought is to find the semantic similarity of the user’s question within the embedding of the documents stored in the db.

roberace · April 30, 2024, 11:10pm

roberace · April 30, 2024, 11:20pm

If the db is not capable of performing semantic proximity vector searches, then you are forced to manually select the most relevant documents by doing N1QL queries to the metadata or attributes such as the text chunks of those documents, so as not to have to bring all the documents from the db to the app, since with FAISS for example, you can do very precise semantic vector searches

If the server does not perform vector semantic search, this reduces the functionality and efficiency of the app. Ideally the server performs this function, so that the AI receives a reduced context closely matched to what the user is asking, which gives more precise responses, less bandwidth, and faster replies.

mreiche · April 30, 2024, 11:30pm

roberace:

        result = self.cluster.search("vector_index", search_req, SearchOptions(limit=k, fields=["text", "metadata"]))

        # Extract the documents and vectors from the search result
        documents = []
        vectors = []
        for row in result.rows():
            document_data = row.fields
            if "text" in document_data and "metadata" in document_data:
                for text_chunk in document_data["text"]:
                    documents.append(Document(page_content=text_chunk, metadata=document_data["metadata"]))
                vectors.extend(document_data["vector"])

        if not documents:
            raise ValueError("No relevant documents found")

Maybe the only issue is that ‘documents’ is not being populated?
How about just printing out ‘result’? And ‘row’?

abhinav · April 30, 2024, 11:42pm

@roberace So firstly you cannot use GSI at the moment to do a vector search query in Couchbase, you’ll need to introduce the search service and use a search index.

So you cannot use the CREATE INDEX command here to set up your index - it seems you’ve figured that out already.

Now there’s been a lot of conversation here, but looking at your earlier logs -

‘path’: ‘/api/index/vectors_index/query’, ‘http_status’: 400

You’re trying to hit a search service endpoint for an index of the name vectors_index - but as you can see that index is not in your search system.

You’ve gone ahead and introduced a search index for the vector field per this comment. So you now have a search index in the system, whose name is - chatui._default.vector_index.

If you’re using the bucket API in the python SDK (whose syntaxes I’m not super familiar with) - you should be using that name^ as opposed to just vector_index.

Would you let me know if this solves your query index not found exception?

p.s. Also, I hope your query vector’s dimension matches 2048 which you’ve assigned for the vector field in your index definition.

nithishr · May 1, 2024, 2:20pm

@roberace I think Abhi’s suggestions should solve the problem with the SDK code.

Just a couple of things to note:

1. When you are using a scoped search index, the index name passed to the search should be just the name, without the bucket & scope:
   scope.search(index_name)
2. If you don’t want to perform hybrid searches, you can use just the VectorQuery in the search request. Here is an example:

search_req = search.SearchRequest.create(
    VectorSearch.from_vector_query(
        VectorQuery(
            vector,
            search_embedding,
            no_of_docs,
        )
    )
)

From the example, it seems like you are using LangChain for the application. Is there any reason to not use the Couchbase integration in LangChain?

roberace · May 1, 2024, 10:51pm

Ok, I’ve been working hard to find out how this works, and I want to share my findings with all of you.

I have simplified the example code as much as possible to, from a minimalist perspective, implement the only things strictly necessary to perform the laboratory test.

I worked with a simple document structure:

{
  "id": "#FFEFD5",
  "color": "papaya whip",
  "brightness": 240.82,
  "colorvect_l2": [255, 239, 213, 146, 218],
  "wheel_pos": "other",
  "verbs": ["soften", "mellow", "lighten"],
  "description": "Papaya whip is a soft and mellow color that can be described as a light shade of peach or coral. ",
  "embedding_model": "text-embedding-ada-002-v2"
}

I have inserted two documents in the collection with this structure

This is the code I used to run the Couchbase vector search:

from couchbase.cluster import Cluster
from couchbase.auth import PasswordAuthenticator
from couchbase.exceptions import CouchbaseException
import couchbase.search as search
from couchbase.options import SearchOptions
from couchbase.vector_search import VectorQuery, VectorSearch
from couchbase.options import ClusterOptions, QueryOptions

authenticator = PasswordAuthenticator('Administrator', 'xxxxxxxx')
# Connect with the authenticator; the credentials do not need to be in the connection string
cluster = Cluster('couchbase://localhost', ClusterOptions(authenticator))
# Open the bucket and collection
bucket = cluster.bucket('chatui')
collection = bucket.default_collection()
scope = bucket.default_scope() 

id = "#FFEFD5"
query = f"SELECT * FROM `{'chatui'}` WHERE id = ?"
options = QueryOptions(positional_parameters=[id])

result_list = cluster.query(query, options)
for row in result_list:
    document_data = row.get('chatui', {})
    print(f"Vectors in the document: {document_data.get('colorvect_l2')}")

query_vector = [0.255, 0.239, 0.213, 0.218, 0.197]

search_index = "color_index"

try:
    search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
        VectorSearch.from_vector_query(VectorQuery('colorvect_l2', query_vector, num_candidates=10))
    )

    result = scope.search(search_index, search_req, SearchOptions(limit=10, fields=["description"]))

    for row in result.rows():
        print("Found row: {}".format(row))

    print("Reported total rows: {}".format(result.metadata().metrics().total_rows()))
    search_meta_data = result.metadata()
    print(search_meta_data)
    
except CouchbaseException as ex:
    import traceback
    traceback.print_exc()

The code first runs a N1QL query to check the value of the ‘colorvect_l2’ field in the two documents inserted into the collection,

and then runs the vector search on the collection.

The query vector is an array of values chosen to be roughly proportional to the vectors stored in the documents,

and the color_index is used to perform the search.

The index I created is the following:

{
  "type": "fulltext-index",
  "name": "chatui._default.color_index",
  "uuid": "60b8cd65a0ee6c82",
  "sourceType": "gocbcore",
  "sourceName": "chatui",
  "sourceUUID": "03324e97fac08d21b52d3354c3508270",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": true,
        "properties": {
          "colorvect_l2": {
            "dynamic": false,
            "enabled": true,
            "fields": [
              {
                "dims": 5,
                "index": true,
                "name": "colorvect_l2",
                "similarity": "l2_norm",
                "type": "vector",
                "vector_index_optimized_for": "recall"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  },
  "sourceParams": {}
}

By executing the code we can see how it has matched the two documents within the collection

Vectors in the document: [255, 239, 213, 146, 218]
Vectors in the document: [132, 228, 215, 198, 236]
Found row: SearchRow(index='chatui._default.color_index_60b8cd65a0ee6c82_4c1c5584', id='a9e8d843-2374-49b9-8f29-0a6ac491f11b_copy', score=4.759954384762136e-06, fields=None, sort=[], locations=None, fragments={}, explanation={})
Found row: SearchRow(index='chatui._default.color_index_60b8cd65a0ee6c82_4c1c5584', id='a9e8d843-2374-49b9-8f29-0a6ac491f11b', score=4.239611444910588e-06, fields=None, sort=[], locations=None, fragments={}, explanation={})
Reported total rows: 2
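As a sanity check on the scores above: with `"similarity": "l2_norm"`, they look consistent with 1 divided by the squared Euclidean distance between the query vector and each document's `colorvect_l2`. This is inferred only from the numbers printed in this post, not taken from Couchbase documentation, and `inverse_squared_l2` is a hypothetical helper written for this check.

```python
# Values copied from the post: the query vector and the two stored vectors.
query_vector = [0.255, 0.239, 0.213, 0.218, 0.197]
doc_vectors = [
    [255, 239, 213, 146, 218],
    [132, 228, 215, 198, 236],
]


def inverse_squared_l2(a, b):
    """Return 1 / (squared Euclidean distance) between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / squared


for vec in doc_vectors:
    print(vec, inverse_squared_l2(query_vector, vec))
```

The two values come out close to 4.24e-06 and 4.76e-06, the same magnitudes as the two `score` fields reported by the server. Small differences in the trailing digits would be expected if the server computes in 32-bit floats.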

The conclusions are the following:

As our friend @abhinav said, the dimension configured in the vector search index must match the length of the array in the field you are searching on.

In this case the array in the “colorvect_l2” field has 5 elements, so the dimension of the vector index has been set to 5.

If the dimension of any of the 3 elements in play differs, the search will not return results.

The three elements are:

The search query vector.
The document's vector field.
The index dimension (dims).

All three must have the same dimension. If any of them does not match, the search returns zero results; it does not raise an exception or report an error:

Reported total rows: 0
SearchMetaData:{'client_context_id': '6f21bf-54de-0947-e174-5891108c1a63e5', 'metrics': {'took': 71866, 'total_rows': 0, 'max_score': 0.0, 'success_partition_count': 1, 'error_partition_count': 0}, 'errors': {}}
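Because a dimension mismatch fails silently, it can help to compare lengths before running the search. The sketch below is one possible pre-check: `vector_dims` and `check_query_vector` are hypothetical helpers, not SDK functions, and the walk assumes the index definition has the same JSON layout as the definitions posted in this thread.

```python
def vector_dims(node, found=None):
    """Collect {field name: dims} for every field of type 'vector' in an index definition."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        if node.get("type") == "vector" and "dims" in node:
            found[node.get("name")] = node["dims"]
        for value in node.values():
            vector_dims(value, found)
    elif isinstance(node, list):
        for item in node:
            vector_dims(item, found)
    return found


def check_query_vector(index_definition, field, query_vector):
    """Raise ValueError if the query vector length differs from the indexed dims."""
    dims = vector_dims(index_definition).get(field)
    if dims is None:
        raise ValueError(f"no vector field {field!r} in the index definition")
    if len(query_vector) != dims:
        raise ValueError(
            f"query vector has {len(query_vector)} elements, index field {field!r} expects {dims}"
        )


# Trimmed copy of the color_index definition from this thread.
color_index = {"params": {"mapping": {"default_mapping": {"properties": {"colorvect_l2": {"fields": [
    {"dims": 5, "index": True, "name": "colorvect_l2", "similarity": "l2_norm", "type": "vector"}
]}}}}}}

# A 5-element query vector matches dims=5, so this call returns without error.
check_query_vector(color_index, "colorvect_l2", [0.255, 0.239, 0.213, 0.218, 0.197])

try:
    # A one-element query vector, like the earlier experiment in this thread.
    check_query_vector(color_index, "colorvect_l2", [-0.006697558332234621])
except ValueError as exc:
    print(exc)
```

The same check could be applied to the stored documents by comparing the length of each document's vector field against the indexed dims.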

remember, each of those three must coincide in its dimension

I hope this can be of help to many more, and thank you very much for your help.

we will stay tuned

roberace · May 1, 2024, 10:58pm

remember all of these must be the same length
