I saw this page [page], "Example: Running a Simple Vector Similarity Query", and read:
"The Search Service combines the Vector search results from a knn object with the traditional query object by using an OR function. If the same documents match the knn and query objects, the Search Service ranks those documents higher in search results."
The docs say the knn object and the query object are combined using OR. Can I change that to AND, i.e. 'query' object AND 'knn' object?
Currently you cannot do this in the basic JSON syntax. However, you can write a hybrid SQL++ vector search that ANDs a SQL++ predicate with the SEARCH() function, like the following:
```sql
SELECT color, verbs, brightness
FROM `vector-sample`.color.rgb AS t1
WHERE
  brightness < 20
  AND SEARCH(t1, {
    "query": { "match_none": {} },
    "knn": [{
      "field": "colorvect_l2",
      "vector": [0.0, 0.0, 128.0],
      "k": 3
    }]
  })
```
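To make the OR/AND difference concrete, here is a toy Python model of how two result sets combine. It is a conceptual sketch, not how Couchbase executes queries: the document ids, brightness values, and distances are hypothetical, the kNN search is replaced by an exact top-k sort, the brightness predicate stands in for either the JSON "query" object or the SQL++ WHERE clause, and it assumes the SQL++ predicate is applied to the k hits that SEARCH() returns.

```python
# Toy model of OR vs AND result combination (hypothetical data, not Couchbase internals).
# doc id -> (brightness, distance from the query vector)
docs = {
    "red": (54, 181.0),
    "blue": (29, 1.5),
    "navy": (13, 0.5),
    "darkblue": (14, 1.0),
    "midnightblue": (19, 2.0),
    "black": (0, 128.0),
}


def knn_top_k(k):
    """Exact top-k by distance, standing in for the approximate knn search."""
    return set(sorted(docs, key=lambda d: docs[d][1])[:k])


def predicate_hits():
    """Docs matching the non-vector condition (brightness < 20)."""
    return {d for d, (brightness, _) in docs.items() if brightness < 20}


knn_hits = knn_top_k(3)        # {"navy", "darkblue", "blue"}
other_hits = predicate_hits()  # {"navy", "darkblue", "midnightblue", "black"}

# JSON request: knn and query are ORed, so a doc matching either one is returned.
or_hits = knn_hits | other_hits
# SQL++ "WHERE brightness < 20 AND SEARCH(...)": a doc must match both.
and_hits = knn_hits & other_hits

print(sorted(or_hits))
print(sorted(and_hits))
```

In this model the AND version returns two rows even though k is 3, because the filter runs after the top-k cut, so a predicate can shrink the result below k.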
Alternatively, you could create a 1D vector field, enum_vect, to represent a category (or a department, or an org_id) and do something like:
```sql
SELECT id
FROM `vector-sample`.color.rgb AS t1
WHERE
  brightness < 20
  AND enum_vect[0] = 27
  AND SEARCH(t1, {
    "query": { "match_none": {} },
    "knn_operator": "and",
    "knn": [
      { "field": "colorvect_l2",
        "vector": [0.0, 0.0, 127.7],
        "k": 3
      },
      { "field": "enum_vect",
        "vector": [27],
        "k": 30
      }
    ]
  })
```
Note that for the 1D enum_vect we use a high k to get more matches, and we use "knn_operator": "and" to AND the two vector searches together.
Next, we use the SQL++ predicate AND enum_vect[0] = 27 to ensure we only return category 27. The vector side of enum_vect is still approximate, so without this exact check a "near" category could leak into the results.
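A toy Python model of this pre-filter trick: two top-k searches are intersected (the role "knn_operator": "and" plays here), then an exact category check plays the role of the SQL++ AND enum_vect[0] = 27. The documents, distances, and categories are hypothetical, and exact sorting stands in for the approximate kNN search, so treat this as a sketch of the idea only.

```python
# Toy model of "knn_operator": "and" plus an exact SQL++ filter (hypothetical data).
# doc id -> (distance from the colour query vector, category stored in enum_vect[0])
docs = {
    "a": (0.3, 27),
    "b": (0.4, 26),
    "c": (0.9, 27),
    "d": (1.2, 28),
    "e": (5.0, 27),
}
CATEGORY = 27


def top_k(distance, k):
    """Exact top-k by a distance function, standing in for an approximate knn search."""
    return set(sorted(docs, key=distance)[:k])


def category_distance(d):
    """1D enum_vect distance: how far a doc's category is from CATEGORY."""
    return abs(docs[d][1] - CATEGORY)


colour_hits = top_k(lambda d: docs[d][0], k=3)
category_hits = top_k(category_distance, k=4)

# "knn_operator": "and" -> a doc must be in both knn result sets.
knn_and = colour_hits & category_hits
# Neighbouring category 26 ("b") leaks in, so the exact SQL++ check removes it.
result = {d for d in knn_and if docs[d][1] == CATEGORY}

# With a small k on enum_vect, a genuine category-27 match ("c") is lost.
small_k = colour_hits & top_k(category_distance, k=1)

print(sorted(knn_and), sorted(result), sorted(small_k))
```

The last line shows why a high k on enum_vect helps: with k=1 the intersection drops "c", a real category-27 document.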
Furthermore, remember that every time you use "knn" you are doing an approximate search. Depending on the completion order of the scatter-gather operations, especially when many documents tie on the same score in the vector search, you might get different results and a different number of items back. So the intersection between the vector searches might differ between runs of the same query. Yes, I know this is a bit weird.
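Here is a toy Python illustration of how tied scores plus completion order can change results. It assumes, purely for illustration, that partition results are concatenated in completion order and then stably sorted by score; Couchbase's actual merge logic may differ. The partitions, scores, and ids are hypothetical.

```python
# Toy model: the same hits merged in two different completion orders.
# Each partition returns (score, doc_id); "a1" and "b1" tie on score 0.9.
partition_a = [(0.9, "a1"), (0.5, "a2")]
partition_b = [(0.9, "b1"), (0.5, "b2")]


def gather(arrivals, k):
    """Concatenate partition results in arrival order, then keep the top k by score.

    Python's sort is stable (also with reverse=True), so tied hits keep arrival order.
    """
    merged = [hit for partition in arrivals for hit in partition]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return {doc for _, doc in merged[:k]}


run1 = gather([partition_a, partition_b], k=1)  # partition A finished first
run2 = gather([partition_b, partition_a], k=1)  # partition B finished first

# Intersecting with a second (hypothetical) knn result set, as "knn_operator": "and" would.
other_knn_hits = {"a1", "a2"}
print(run1 & other_knn_hits)  # {'a1'}
print(run2 & other_knn_hits)  # set()
```

The same query returns one item in the first run and none in the second, purely because of which partition answered first.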
This sort of pre-filter hack works best when all vectors are indexed with dot_product (yes, both colorvect_l2 and enum_vect); otherwise you can get some very large scores that do not sort correctly.
```sql
SELECT id
FROM `vector-sample`.color.rgb AS t1
WHERE
  t1.brightness < 20
  AND t1.enum_vect[0] = 10
  AND SEARCH(t1, {
    "query": { "match_none": {} },
    "knn_operator": "and",
    "knn": [
      { "field": "colorvect_l2",
        "vector": [0.0, 0.0, 127.0001],
        "k": 3
      },
      { "field": "enum_vect",
        "vector": [10.0001],
        "k": 300
      }
    ]
  })
```
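In all cases we add a small number to the query vector (for example, 127.0001 and 10.0001 above) to avoid a perfect vector match, which makes the scores so large they don't sort. But we don't do this in the SQL++ part, where t1.enum_vect[0] = 10 must match exactly.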
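The large-score problem can be sketched with a hypothetical inverse-distance score. This is an assumption for illustration only; the exact formula Couchbase uses for l2_norm scoring may differ, and so may the way it combines scores from several knn objects (summing is assumed here). In this model an exact match has distance 0, so its score is unbounded (modelled as infinity), and once two documents both carry an infinite score, adding the other vector's score can no longer separate them. Nudging the query vector by 0.0001, as in the query above, keeps the score large but finite, and a dot-product score has no such singularity.

```python
import math


def l2_score(query, stored):
    """ASSUMED inverse squared-distance score; an exact match is treated as infinite."""
    d2 = sum((q - s) ** 2 for q, s in zip(query, stored))
    return math.inf if d2 == 0 else 1.0 / d2


def dot_score(query, stored):
    """Dot-product similarity: finite even for an exact match."""
    return sum(q * s for q, s in zip(query, stored))


stored = [0.0, 0.0, 127.0]

exact = l2_score([0.0, 0.0, 127.0], stored)      # inf
nudged = l2_score([0.0, 0.0, 127.0001], stored)  # roughly 1e8, but finite

# If knn scores were summed (assumed), two exact matches whose other
# scores differ (0.5 vs 0.9) end up tied and can no longer be ordered.
combined = [exact + 0.5, exact + 0.9]
print(exact, nudged, combined[0] == combined[1])
print(dot_score([0.0, 0.0, 127.0], stored))  # 16129.0
```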