I’m working with a multi-tenant architecture in Couchbase where our documents are tagged with an account_id to identify which account each document belongs to. We are currently implementing vector search functionality and need to restrict searches to a specific subset of documents, specifically those with a particular account_id value.
For example, if a user from account 123 performs a vector search, we want to limit the search results to only the documents where account_id = "123". Our vector search currently does not have this filtering in place, and it’s returning results across all accounts. We would like to know how to integrate this filtering condition directly into the vector search query.
Has anyone tackled a similar problem or could suggest the best way to structure our vector search queries to include this kind of filtering?
Thanks in advance for any guidance or suggestions!
@PShri As @vsr1 mentions, the support you’re asking for, query-time filtering of documents before the vector search runs, will become available in the upcoming release.
Until then, one approach is to use index-time document filtering so that only documents matching the filter (type identifier) are indexed. This will not make much sense if you want to be able to filter documents on multiple account_ids.
Issue: When I try to access Jira, I get an error saying, “You don’t have access to Jira on jira.issues.couchbase.com.”
Document Filtering on Type Identifier:
We currently filter documents based on a type identifier. Could you clarify how this filtering might assist in running vector searches? I’m not sure how it applies in this context, and it would be good if you could explain it.
Vector Search Configuration:
I’m also struggling to get results from what should be a straightforward vector search. Here’s the setup:
• The vector is stored within a nested field in the document structure, as shown below. Despite following the steps in the basic vector search guide, the search returns no results.
• For the query, I directly used an existing vector from one of our embeddings, yet nothing matches.
Document Structure:
... (other fields)
"embedding_map": {
    "text-embedding-3-small": {
        "vector": [1536 elements of float value],
        "updated_at": "timestamp"
    },
    "text-embedding-3-large": {
        "vector": [3072 elements of float value],
        "updated_at": "timestamp"
    }
}
Query used in the web console: (have tried several values for k)
Is it possible that vector search doesn’t support nested fields like this, or is there something I’m missing? Any insights or recommendations would be appreciated. We are using Enterprise Edition 7.6.3.
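For reference, a knn request against a nested field like the one above might be shaped as in the sketch below. This is not the query from the post. It assumes the Search request layout described in the Couchbase vector search docs (a knn array whose entries carry field, vector and k) and that the nested vector is addressed by its dot-joined path. build_knn_request is a hypothetical helper, and the zero vector is only a placeholder for a real embedding.

```python
import json


def build_knn_request(field_path, query_vector, k=5):
    """Build a Search request body that runs only the knn part (hypothetical helper)."""
    return {
        "fields": ["*"],
        # match_none keeps the text query from contributing any hits of its own
        "query": {"match_none": {}},
        "knn": [
            {
                "field": field_path,           # dot-joined path to the nested vector (assumption)
                "vector": list(query_vector),  # must have as many elements as the indexed dims
                "k": k,
            }
        ],
    }


# Placeholder vector; in practice this would be a 1536-element embedding.
request = build_knn_request(
    "embedding_map.text-embedding-3-small.vector", [0.0] * 1536, k=3
)

# Print the request without dumping all 1536 floats.
preview = dict(request, knn=[dict(request["knn"][0], vector="[1536 floats]")])
print(json.dumps(preview, indent=2))
```

If the index does not map that exact path as a vector field, a request like this simply returns no hits, which matches the symptom described above.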
I got the search working. I would still like an answer for question 1 above.
In the index definition, the “insert child field” option cannot be used directly for nested fields. We need to create a child mapping for every level of nesting down to the desired field. The attached screenshot shows the working configuration. I hope this helps someone out there.
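A rough sketch of what "a mapping for every level" looks like in the index definition JSON, built in Python for brevity. nested_vector_mapping is a hypothetical helper. The property names (enabled, dynamic, properties, fields, dims, similarity) follow the layout of exported Search index definitions as I understand it, and dot_product is an assumed similarity setting, so compare the output against your own index JSON before relying on it.

```python
import json


def nested_vector_mapping(path, dims, similarity="dot_product"):
    """Wrap a vector field in one child mapping per path segment (hypothetical helper)."""
    *parents, leaf = path
    # Innermost child mapping: this is where the vector child field is inserted.
    mapping = {
        leaf: {
            "enabled": True,
            "dynamic": False,
            "fields": [
                {
                    "name": leaf,
                    "type": "vector",
                    "dims": dims,
                    "similarity": similarity,  # assumption, pick what fits your embeddings
                    "index": True,
                }
            ],
        }
    }
    # Walk outwards, adding one child mapping for each parent object.
    for segment in reversed(parents):
        mapping = {
            segment: {"enabled": True, "dynamic": False, "properties": mapping}
        }
    return mapping


properties = nested_vector_mapping(
    ["embedding_map", "text-embedding-3-small", "vector"], dims=1536
)

# Type mapping for scope _default, collection content_embeddings.
types = {
    "_default.content_embeddings": {
        "enabled": True,
        "dynamic": False,
        "properties": properties,
    }
}
print(json.dumps(types, indent=2))
```

The same helper called with "text-embedding-3-large" and dims=3072 would give the mapping for the second embedding.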
@PShri Happy to hear you figured out how to define your index to capture your document data correctly.
Let me shed more light on (1) now.
We currently filter documents based on a type identifier. Could you clarify how this filtering might assist in running vector searches? I’m not sure how it applies in this context, and it would be good if you could explain it.
Your type identifier may be defaulting to the JSON type field, whose default value is type, but per your index mappings you’re not really leveraging it.
Your index mapping currently takes all documents from scope _default and collection content_embeddings, and indexes the doubly nested fields vector and source_document_updated_at from every document that it receives.
Now let’s say you only want to run vector search over documents that contain account_id:"123" …
You will first need to update the JSON type field’s value to account_id (keep in mind this will work if and only if account_id’s value in your JSON documents is a string).
You will then update your top-level mapping to carry a suffix of .<value>. So in your example that’ll change to …
_default.content_embeddings.123
The remaining nested mappings will remain as is.
With this in place, as documents are streamed to your index, every document is first checked for account_id:"123", and only the documents that match this filter will have their fields indexed.
This way when you perform vector search with this index, you’ll effectively only be considering documents with account_id:"123".
You can add as many mappings for different account_ids as you wish, but obviously this solution will get ugly after a point. So I recommend this as a limited solution until pre-filtering is available as a feature (very soon).
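To make the routing described above concrete, here is a small Python simulation of it. This is not Couchbase code: mapping_key and is_indexed are hypothetical names, and the sketch assumes the default mapping is disabled so that a document matching no type mapping is skipped entirely.

```python
def mapping_key(doc, scope, collection, type_field="account_id"):
    """Return the type-mapping name a document would be matched against, or None."""
    value = doc.get(type_field)
    # The type field only works when its value is a string, as noted above.
    if not isinstance(value, str):
        return None
    return f"{scope}.{collection}.{value}"


def is_indexed(doc, type_mappings, scope="_default", collection="content_embeddings"):
    """True if the document matches one of the configured type mappings."""
    return mapping_key(doc, scope, collection) in type_mappings


# One mapping per account that should be searchable through this index.
type_mappings = {"_default.content_embeddings.123"}

docs = [
    {"id": "a", "account_id": "123"},  # matches the mapping, so it is indexed
    {"id": "b", "account_id": "456"},  # different account, so it is skipped
    {"id": "c", "account_id": 123},    # numeric value, so it is skipped
]
indexed_ids = [d["id"] for d in docs if is_indexed(d, type_mappings)]
print(indexed_ids)
```

Adding "_default.content_embeddings.456" to type_mappings would bring document b into the same index, at which point the index no longer isolates a single account. That is why one index per account is the cleaner reading of this workaround, and also why it stops scaling as the number of accounts grows.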