Hi,
I am a little bit perplex about the execution time of queries like SELECT META(myBucket
).id FROM myBucket
WHERE META(myBucket
).id LIKE “prefix:%”…
The execution time of this query is exploding the more documents there are in the bucket, like it was doing a full scan of the bucket instead of using the primary index, as I would expect.
My test goes as follows:
- Create 500 documents with the id being prefix:
- Doing the select as above, it executes in average around 800ms on my local 1 node vagrant cluster
- Now I had 100000 documents whose prefix is not shared. id being anotherPrefix:
- The query above takes 12s on average to be executed
I would expect the query to be executed at around the same amount of time, at least, it is my understanding of how a primary index on document id should work and allow to quickly find the right ids by pattern.
The explain on the query shows it uses the primary index:
[{"~children":[{“namespace”:“default”,“keyspace”:“myBucket”,“index”:“primaryindex”,"#operator":“PrimaryScan”,“using”:“gsi”},{"~child":{"~children":[{“namespace”:“default”,“keyspace”:“myBucket”,"#operator":“Fetch”},{"#operator":“Filter”,“condition”:"((meta(myBucket
).id
) like “prefix:%”)"},{"#operator":“InitialProject”,“result_terms”:[{“expr”:"(meta(myBucket
).id
)"}]},{"#operator":“FinalProject”}],"#operator":“Sequence”},"#operator":“Parallel”}],"#operator":“Sequence”}]
Am i doing something completely wrong?
Interestingly, I also added the document id to my document and created a secondary index on it. In this case, the query SELECT id FROM myBucket
WHERE id LIKE “prefix:%” only take 200ms on average.
The explain is similar to the primary one:
[{"~children":[{“keyspace”:“myBucket”,“spans”:[{“Range”:{“High”:["“prefix;”"],“Low”:["“prefix:”"],“Inclusion”:1},“Seek”:null}],"#operator":“IndexScan”,“using”:“gsi”,“limit”:9.223372036854776E18,“namespace”:“default”,“index”:“secondaryIndexOnId”},{"~child":{"~children":[{“namespace”:“default”,“keyspace”:“myBucket”,"#operator":“Fetch”},{"#operator":“Filter”,“condition”:"((myBucket
.id
) like “prefix:%”)"},{"#operator":“InitialProject”,“result_terms”:[{“expr”:"(myBucket
.id
)"}]},{"#operator":“FinalProject”}],"#operator":“Sequence”},"#operator":“Parallel”}],"#operator":“Sequence”}]
The main difference being the range scan added to the index scan, making the query running in constant time of 200ms, with or without the extra 100000 documents…
Why is the range query not added when using the META().id query style?
Many thanks for your help.