How can I combine the knn object and the query object?

I saw the page [page] "Example: Running a Simple Vector Similarity Query", where I read: "The Search Service combines the Vector search results from a knn object with the traditional query object by using an OR function. If the same documents match the knn and query objects, the Search Service ranks those documents higher in search results."
The docs say the knn object and the query object are combined using OR. Can I change that to AND, i.e. "query" object AND "knn" object?

Currently you cannot do this in the basic JSON syntax. However, you can write a hybrid SQL++ vector search like the following:

SELECT color, verbs, brightness
FROM `vector-sample`.color.rgb AS t1
WHERE 
    brightness < 20
AND SEARCH(t1, {
  "query": {  "match_none": {} },
  "knn": [{
    "field": "colorvect_l2",
    "vector": [0.0, 0.0, 128.0],
    "k": 3 }]}
)
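For intuition, here is a minimal Python sketch of how I read the query above: the knn clause picks the k nearest vectors, and the SQL++ predicate is then ANDed with those hits. The toy documents, their values, and the function name knn_then_filter are made up for illustration, and the brute-force distance is only a stand-in for the approximate index. Under that reading, fewer than k rows can come back.

```python
import math

# Toy documents (made-up values, not taken from the vector-sample dataset).
docs = [
    {"id": "navy",  "brightness": 15, "colorvect_l2": [0.0, 0.0, 128.0]},
    {"id": "blue",  "brightness": 29, "colorvect_l2": [0.0, 0.0, 255.0]},
    {"id": "dark",  "brightness": 11, "colorvect_l2": [0.0, 0.0, 100.0]},
    {"id": "black", "brightness": 0,  "colorvect_l2": [0.0, 0.0, 0.0]},
    {"id": "red",   "brightness": 76, "colorvect_l2": [255.0, 0.0, 0.0]},
]

def knn_then_filter(docs, field, vector, k, predicate):
    """Hypothetical helper: take the k nearest docs, then AND with a predicate."""
    # Exact L2 ranking stands in for the approximate knn search.
    ranked = sorted(docs, key=lambda d: math.dist(d[field], vector))
    nearest = ranked[:k]
    # The predicate only sees the k nearest hits, not the whole collection.
    return [d["id"] for d in nearest if predicate(d)]

hits = knn_then_filter(
    docs, "colorvect_l2", [0.0, 0.0, 128.0], 3,
    lambda d: d["brightness"] < 20,
)
print(hits)
```

In this toy data "black" satisfies brightness < 20 but is not among the 3 nearest vectors, so it never reaches the predicate and only two rows are returned.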

Alternatively, you could create a 1D vector enum_vect to represent a category (or a department or an org_id) and do something like:

SELECT id
FROM `vector-sample`.color.rgb AS t1
WHERE 
  brightness < 20 
AND 
  enum_vect[0] = 27
AND 
SEARCH(t1, {
  "query": {  "match_none": {} },
  "knn_operator": "and",
  "knn": [
    {   "field": "colorvect_l2",
        "vector": [0.0, 0.0, 127.7],
        "k": 3 
    },{ "field": "enum_vect",
        "vector": [27],
        "k": 30 
    }
  ]
})

Note that for the 1D enum_vect we use a high k to get more matches, and we also set ( "knn_operator": "and" ) to AND our vectors together.

Next, we add the SQL++ predicate ( AND enum_vect[0] = 27 ) so that only category 27 is returned. Without it, a "near" category could leak into the results, because the vector side of enum_vect is still approximate.

Furthermore, remember that every time you use "knn" you are doing an approximate search. Depending on the completion order of the scatter-gather operations, especially when the vector search has to sort identical values, you might get different results and a different number of items back. So the intersection between the vectors might differ between runs of the same query. Yes, I know this is a bit weird.
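A toy Python model of that effect, not a description of how the Search Service is implemented: when many documents tie on the enum_vect distance, which of them land in the top k depends on the order they arrive in, and so does the intersection with the other knn clause. The names top_k and tie_order are hypothetical, with tie_order standing in for the scatter-gather completion order.

```python
def top_k(scored, k, tie_order):
    """Return ids of the k closest hits; ties are broken by arrival order."""
    ranked = sorted(scored, key=lambda p: (p[1], tie_order.index(p[0])))
    return [doc_id for doc_id, _ in ranked[:k]]

# Four docs in the same category: identical distance on enum_vect.
enum_hits = [("a", 0.0), ("b", 0.0), ("c", 0.0), ("d", 0.0)]
# Ids returned by the other knn clause (made-up values).
color_ids = {"c", "d", "e"}

# Same query, two different arrival orders.
run1 = top_k(enum_hits, 2, ["a", "b", "c", "d"])
run2 = top_k(enum_hits, 2, ["d", "c", "b", "a"])

# "knn_operator": "and" modelled as a set intersection.
and1 = sorted(set(run1) & color_ids)
and2 = sorted(set(run2) & color_ids)
print(and1, and2)
```

The first run intersects to nothing and the second to two documents, which is why a generous k on the tied field helps.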

This sort of pre-filter hack works best when dot_product is used to index all the vectors (yes, both colorvect_l2 and enum_vect); otherwise you can get some very large scores that do not sort.

SELECT id
FROM `vector-sample`.color.rgb AS t1
WHERE 
  t1.brightness < 20 
AND 
  t1.enum_vect[0] = 10
AND 
SEARCH(t1, {
  "query": {  "match_none": {} },
  "knn_operator": "and",
  "knn": [
    {   "field": "colorvect_l2",
        "vector": [0.0, 0.0, 127.0001],
        "k": 3 
    },{ "field": "enum_vect",
        "vector": [10.0001],
        "k": 300
    }
  ]
})

In all cases we add a small number to the query vector to avoid a perfect vector match, which would make the scores so large that they don't sort. We don't do this in the SQL++ part, where the predicate stays exact.
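A small Python sketch of the nudge, assuming purely for illustration that the score behaves like 1 / L2 distance (the real scoring may differ). The helper names nudge and inverse_distance_score and the eps default are hypothetical.

```python
import math

def nudge(vector, eps=1e-4):
    """Return a copy of the query vector with eps added to the last component."""
    return vector[:-1] + [vector[-1] + eps]

def inverse_distance_score(a, b):
    """Assumed scoring model: 1 / L2 distance, unbounded on an exact match."""
    d = math.dist(a, b)
    return math.inf if d == 0 else 1.0 / d

stored = [0.0, 0.0, 127.0]

# Exact match: the assumed score is unbounded.
exact_score = inverse_distance_score(stored, [0.0, 0.0, 127.0])
# Nudged query: the score is large but finite, so it can still be sorted.
nudged_score = inverse_distance_score(stored, nudge([0.0, 0.0, 127.0]))
print(exact_score, nudged_score)
```

The value in the SQL++ predicate stays un-nudged, since that comparison is an exact equality.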
