FTS search and vector search

I have code that runs both FTS (Full-Text Search) and vector search together. Upon checking the results, it seems that vector search is not performed within the filtered values from FTS, but rather, the filtering and vector search are done separately, and then the results are combined. How can I resolve this issue? Below is my code.

from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from couchbase.auth import PasswordAuthenticator
from couchbase.exceptions import CouchbaseException
import couchbase.search as search
from couchbase.options import SearchOptions
from couchbase.vector_search import VectorQuery, VectorSearch
import csv
import json
from openai import OpenAI
import random
from couchbase.n1ql import N1QLQuery

client=OpenAI(api_key='api')



cluster = Cluster(
    "couchbase://ip",
    authenticator=PasswordAuthenticator(
        "id","pw"
    )
)
question="something"
bucket = cluster.bucket("my_bucket")

scope = bucket.scope("my_scope")

authorities = random.sample(range(1, 3001), 100)
authorities = [str(auth) for auth in authorities]


permissions_query = ' or '.join([f'authority:"{auth}"' for auth in authorities])
search_index = "my-index"


try:
    vector = client.embeddings.create(input = [question], model="text-embedding-3-small").data[0].embedding
    search_req = search.SearchRequest.create(search.MatchQuery(permissions_query)).with_vector_search(
        VectorSearch.from_vector_query(VectorQuery('title_body_vector', vector, num_candidates=100)))
        # Change the limit value to return more results. Change the fields array to return different fields from your Search index.
    result = scope.search(search_index, search_req, SearchOptions(limit=10,fields=["title","body"]))
    for row in result.rows():
        print("Found row: {}".format(row))
    print("Reported total rows: {}".format(
        result.metadata().metrics().total_rows()))
except CouchbaseException as ex:
    import traceback
    traceback.print_exc()

Hi @leolee
I think this is a result of the default setting being to OR the FTS and vector results together. You can change it to AND them instead - please see https://docs.couchbase.com/python-sdk/current/howtos/full-text-searching-with-sdk.html#combining-fts-and-vector-queries

Hi, thanks for your reply. Can you show me an example of how to use vector_query_combination?

vector = client.embeddings.create(input = [question], model="text-embedding-3-small").data[0].embedding
request = (search.SearchRequest.create(search.MatchQuery(permissions_query))
           .with_vector_search(VectorSearch.from_vector_query(VectorQuery('title_body_vector',
                                                                          vector))))
start_time = time()

result = scope.search(search_index, request,VectorSearchOptions(vector_query_combination=VectorQueryCombination.AND))

Is this right?

@leolee I’m no expert on the Python SDK, perhaps @jcasey can check that example?

Hi @leolee
VectorQueryCombination can be added to the VectorSearch via the VectorSearchOptions. A snippet (adapted from your example) is below.

request = (search.SearchRequest.create(search.MatchQuery(permissions_query))
           .with_vector_search(VectorSearch([VectorQuery('title_body_vector', vector)],
                                            VectorSearchOptions(vector_query_combination=VectorQueryCombination.AND))))

I hope this helps.

Thank you for your response. However, even when the FTS results are 0, vector search results still appear. I thought that using an AND operation would yield results that satisfy both FTS and vector search functions. Is there no such feature?

Can you show your code please?

from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from couchbase.auth import PasswordAuthenticator
from couchbase.exceptions import CouchbaseException
import couchbase.search as search
from couchbase.options import SearchOptions, VectorSearchOptions
from couchbase.vector_search import VectorQuery, VectorSearch, VectorQueryCombination
import csv
import json
from openai import OpenAI
import random
from couchbase.n1ql import N1QLQuery
import pandas as pd
from time import time


client=OpenAI(api_key='api')

df = pd.read_csv('my_csv.csv')
df["Questions"]

time_list=[]
cluster = Cluster(
    "couchbase://address",
    authenticator=PasswordAuthenticator(
        "id","pw"
    )
)
bucket = cluster.bucket("my_b")
scope = bucket.scope("my_s")

for i in df["Questions"]:
    question="my_Q"

    authorities = random.sample(range(3001, 4001), 10)
    authorities = [str(auth) for auth in authorities]


    permissions_query = ' or '.join([f'authority:"{auth}"' for auth in authorities])
    search_index = "vector-sample"

    try:
        vector = client.embeddings.create(input = [question], model="text-embedding-3-small").data[0].embedding
        request = (search.SearchRequest.create(search.MatchQuery(permissions_query))
                   .with_vector_search(VectorSearch(
                       [VectorQuery('title_body_vector', vector)],
                       VectorSearchOptions(vector_query_combination=VectorQueryCombination.AND))))
        result = scope.search(search_index, request,SearchOptions(limit=5,fields=['title']))
        for row in result.rows():
            print("Found row: {}".format(row))
        print("Reported total rows: {}".format(
            result.metadata().metrics().total_rows()))
        break
    except CouchbaseException as ex:
        import traceback
        traceback.print_exc()

This is my code!
Please ignore the question part.

I’m not familiar with the SDK syntax. @jcasey / @mreiche should look into it.

But here are a few things about our current support:

  • If you look at the documentation (Search Request JSON Properties | Couchbase Docs), the knn (vector search) attribute is separate from the query attribute within the search request.
  • Hybrid search requests (those that contain both knn and query) run in two phases under the hood, and the results are currently unioned.
  • We do intend to support intersection of the two, but in a future release.

@leolee Now about your question …

I thought that using an AND operation would yield results that satisfy both FTS and vector search functions. Is there no such feature?

We do not offer a way to specify AND between the knn and query components. There is a knn_operator component that takes [or, and], but it applies only across multiple knn objects (within the array). See the documentation I linked above.
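
To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is only a model of the semantics as described in this thread, not the server's implementation: the function names and document IDs are made up for illustration. It also sketches a client-side workaround (my assumption, not a built-in feature): keep only the vector hits that also matched the FTS query.

```python
def combine_knn(knn_hit_lists, knn_operator="or"):
    """Combine the hit sets of several knn clauses, as knn_operator does."""
    sets = [set(hits) for hits in knn_hit_lists]
    if not sets:
        return set()
    if knn_operator == "and":
        return set.intersection(*sets)
    return set.union(*sets)


def hybrid_hits(query_hits, knn_hit_lists, knn_operator="or"):
    """Model of the current behaviour: query hits are unioned with the knn hits."""
    return set(query_hits) | combine_knn(knn_hit_lists, knn_operator)


def client_side_and(query_hits, knn_hits):
    """Workaround sketch: keep only vector hits that also matched the query."""
    allowed = set(query_hits)
    return [doc_id for doc_id in knn_hits if doc_id in allowed]


# Hypothetical document IDs, for illustration only.
query_hits = []                      # the FTS query matched nothing
knn_hits = ["doc7", "doc2", "doc9"]  # kNN still returns its nearest neighbours

print(sorted(hybrid_hits(query_hits, [knn_hits], "and")))  # vector hits still appear
print(client_side_and(query_hits, knn_hits))               # nothing survives
```

Note that the client-side filter only works if the FTS side returns every matching document ID, so it is practical mainly for small filters.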

Thanks for your reply!

Oops, yes, apologies for my mis-steer earlier @leolee. I misspoke when I said knn_operator affects how the FTS query and vector (knn) components are combined.

You can create a SINGLE FTS index with vector and scalar fields (must use the keyword analyzer) and use a single N1QL statement for both conjuncts and disjuncts, with N1QL syntax like the example below. You can also query directly via the SDK/FTS.
Verify in EXPLAIN that a single request goes to FTS and that all the filtering is pushed down. See below for a sample EXPLAIN.

select * from hotel use index (using FTS)
where country = 'France' and
SEARCH(titleemg,
{
    "query": {
        "match_none": {}
    },
    "knn": [{
        "field": "vec",
        "vector":[1.0, 1.2, 1.3],
        "k":3
        }]
})

Explain

{"optimizer_hints":{"hints_followed":["INDEX_FTS(hotel)"]},"plan":{"#operator":"Sequence","~children":[{"#operator":"IndexFtsSearch","bucket":"travel-sample","index":"travel-sample.inventory.searchvec","index_id":"61f85ad2dbbfd154","keyspace":"hotel","namespace":"default","scope":"inventory","search_info":{"field":"\"\"","options":"{\"index\": \"travel-sample.inventory.searchvec\"}","outname":"out","query":"{\"knn\": [{\"field\": \"vec\", \"k\": 3, \"vector\": [1, 1.2, 1.3]}], \"query\": {\"conjuncts\": [{\"field\": \"country\", \"term\": \"France\"}, {\"boost\": null, \"match_none\": {}}]}, \"score\": \"none\"}"},"using":"fts"},{"#operator":"Fetch","bucket":"travel-sample","keyspace":"hotel","namespace":"default","scope":"inventory"},{"#operator":"Parallel","~child":{"#operator":"Sequence","~children":[{"#operator":"Filter","condition":"(((`hotel`.`country`) = \"France\") and search((`hotel`.`titleemg`), {\"knn\": [{\"field\": \"vec\", \"k\": 3, \"vector\": [1, 1.2, 1.3]}], \"query\": {\"match_none\": {}}}))"},{"#operator":"InitialProject","discard_original":true,"preserve_order":true,"result_terms":[{"expr":"self","star":true}]}]}}]},"text":"select * from hotel use index (using FTS)\nwhere country = 'France' and\nSEARCH(titleemg,\n{\n    \"query\": {\n        \"match_none\": {}\n    },\n    \"knn\": [{\n        \"field\": \"vec\",\n        \"vector\":[1.0, 1.2, 1.3],\n        \"k\":3\n        }]\n})"}
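
If you build the statement from Python, a small helper along these lines may be handy. This is only a sketch: build_hybrid_statement is a hypothetical name, it assumes one scalar equality predicate plus one knn clause as in the example above, and it does no escaping of identifiers, so use it with trusted input only.

```python
import json


def build_hybrid_statement(keyspace, filter_field, filter_value,
                           search_target, vector_field, vector, k):
    """Return a N1QL statement with one scalar predicate plus a kNN SEARCH()."""
    search_body = {
        "query": {"match_none": {}},
        "knn": [{"field": vector_field, "vector": list(vector), "k": k}],
    }
    return (
        f"SELECT * FROM {keyspace} USE INDEX (USING FTS) "
        # json.dumps yields a double-quoted literal, which N1QL accepts.
        f"WHERE {filter_field} = {json.dumps(filter_value)} "
        f"AND SEARCH({search_target}, {json.dumps(search_body)})"
    )


statement = build_hybrid_statement(
    "hotel", "country", "France", "titleemg", "vec", [1.0, 1.2, 1.3], 3)
print(statement)
print("EXPLAIN " + statement)  # run this one first to check the pushdown
```

Running the EXPLAIN variant first lets you confirm that the scalar predicate shows up inside the IndexFtsSearch operator, as in the sample plan above.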

You can utilize two vectors together in your index. Here I assume category_vector is a one (1) dimensional vector that represents what you want to filter on. This prevents the k:3 from returning items outside of the category you are searching on.

{
  "query": {  "match_none": {} },
  "knn_operator": "and",
  "knn": [
    {   "field": "title_body_vector",
        "vector": [0.809,-0.1688,..........],
        "k": 3 
    },{ "field": "category_vector",
        "vector": [10],
        "k": 128 
    }
  ]
}

Let me know if this works for you. You may need to adjust ‘k’ for the category_vector (note that it typically has to be large to get this to work).
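
For reference, the same request body can be assembled in Python. This is a sketch under the assumptions above (a one-dimensional category_vector field in the index); build_two_vector_request and the short placeholder embedding are made up for illustration, and the defaults for k simply mirror the JSON example.

```python
import json


def build_two_vector_request(text_vector, category_value, k_text=3, k_category=128):
    """Build a Search request body that ANDs two kNN clauses via knn_operator."""
    return {
        "query": {"match_none": {}},
        "knn_operator": "and",
        "knn": [
            {"field": "title_body_vector", "vector": list(text_vector), "k": k_text},
            {"field": "category_vector", "vector": [category_value], "k": k_category},
        ],
    }


# Placeholder embedding; a real one would come from the embedding model.
body = build_two_vector_request([0.1, -0.2, 0.3], 10)
print(json.dumps(body, indent=2))
```

Keeping k_category as a parameter makes it easy to experiment with larger values if documents from the wanted category are being dropped.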
