Some of these are missing “key1”. I want to query for all documents which are not missing “key1”, and then return the document in the original structure but with “key2”…“keyN” omitted. It needs to go into a Java object in the end, so specifically I want to return this:
I am trying to find a good way to query this. The following seems to be the obvious choice:
SELECT meta().id as id, docType, {"key1": primaryObject.`key1`} as primaryObject
FROM BUCKET_NAME
WHERE docType = "MY_TYPE" AND primaryObject.`key1` IS NOT MISSING
LIMIT [some integer]
However this runs very slowly. Even for a limit of 10, sometimes it just hits my 75 second timeout and fails. Is there a better way to run this query? As far as indices, currently I have an index on docType and nothing else.
EDIT: “key1” could be one of ~200 different values, and I need to have the ability to query using any one of them.
CREATE INDEX ix1 ON BUCKET_NAME(primaryObject.`key1`) WHERE docType = "MY_TYPE";
SELECT meta().id as id, docType, {"key1": primaryObject.`key1`} as primaryObject
FROM BUCKET_NAME USE INDEX (ix1)
WHERE docType = "MY_TYPE" AND primaryObject.`key1` IS NOT MISSING
LIMIT [some integer]
Great thank you. As a followup, is there a way to make this variable? I.e. I need to be able to run this arbitrarily on “key2” or “keyN” as well. I believe that I could make the 200 or so indices needed to keep the approach you’ve given, but I think my team will not be happy about it.
In that case use following. It must fetch those documents.
CREATE INDEX ix2 ON BUCKET_NAME(DISTINCT OBJECT_NAMES(primaryObject) ) WHERE docType = "MY_TYPE";
Set query parameters $k to the "keyN" or replace with value
SELECT meta().id as id, docType, {$k: primaryObject.[$k]} as primaryObject
FROM BUCKET_NAME
WHERE docType = "MY_TYPE" AND ANY v IN OBJECT_NAMES(primaryObject) SATISIFIES v = $k END
LIMIT [some integer]
Thanks a lot, that gets me under the timeout in all relevant cases. It takes 1 minute to get 500 of 40000 total results from 5 million docs, and for a small number I can get results in 10-20 seconds. I’m wondering now though if it’s worth modifying the document structure somehow. Is there anything I could do on that side? The keys are arbitrary strings which I cannot modify. Otherwise maybe I will cache the results for each key.
CREATE INDEX ix2 ON BUCKET_NAME(ALL OBJECT_NAMES(primaryObject) ) WHERE docType = "MY_TYPE";
Set query parameters $k to the "keyN" or replace with value
SELECT RAW meta(b).id
FROM BUCKET_NAME AS b
UNNEST OBJECT_NAMES(b) AS ok
WHERE b.docType = "MY_TYPE" AND ok =$k
LIMIT [some integer]
Use above query get the document keys (make sure it covered), then use SDK subdoc API get the relevant part of the document reactively.
Why is it that if I add another AND condition to my WHERE clause (AND meta().id LIKE [pattern]), suddenly the time skyrockets again? Each query on its own runs quickly.
Ok thanks I’ll read the source you gave. In the end will I need to drop the “meta().id LIKE” if I want to keep the performance up, or is it just a matter of adding another index? By the way, the new index you recommended is a dramatic performance improvement over the first one, so thanks again for that.
I need to do both We should probably go back and just add some data fields but right now I’m stuck searching the document Id’s, which are like “ATTRIBUTE1::ATTRIBUTE2”.