Hi again,
we are trying to achieve a covered index on nested objects that comply with certain conditions in several levels of the hierarchy. I attach the code below the reproduce our issue.
The SELECT will be covered if we remove WHEN var2.cId = "c3" from the index creation, and query without AND var2.cId = "c3", but including the condition on the second layer of nested objects seems to break the index covering. Is there a way of doing this without increasing the size of the index?
Thank you for your time.
Regards,
iago
INSERT INTO `test` ( KEY, VALUE )
VALUES(
"id1A",
{
"type":"TYPEA",
"subId":"id1",
"valueA": [
{
"cId" : "c1",
"valueB": [
{
"cId":"c3",
"valueC": "testValueA"
}
]
},
{
"cId" : "c2",
"valueB": [
{
"cId":"c4",
"valueC": "testValueB"
}
]
}
]
}
);
CREATE PRIMARY INDEX `PRIMARY_INDEX` ON `test`
CREATE INDEX `TEST_INDEX_VALUEB` ON `test` (
ALL (ARRAY (
ALL (ARRAY var2.`valueC` FOR var2 IN var1.`valueB`
WHEN `var2`.`cId` = "c3" END))
FOR `var1` IN `valueA`
WHEN `var1`.`cId` = "c1" END))
WHERE `type` = "TYPEA"
SELECT var2.valueC
FROM test AS a JOIN test AS b ON a.subId = b.subId
UNNEST a.valueA var1
UNNEST var1.valueB var2
WHERE a.type = "TYPEA"
AND b.type = "TYPEB"
AND var1.cId = "c1"
AND var2.cId = "c3"
AND var2.valueC = "testValueA"
LIMIT 1
Thank you for the quick reply. I thought MB-33485 was something else. In that issue, the engine detects the covering index, but doesn’t return the value. Here, it is the other way around, the covering index is not detected, but the value is returned. I just wanted to clarify, in case my wording on the issue was confusing. I will wait then for the issue to be fixed, any chances it will be for 6.0.2? any ETA you can share?
Covering is optimization you can use the following index.
CREATE INDEX `TEST_INDEX_VALUEB` ON `test` (
ALL (ARRAY (
ALL (ARRAY var2.`valueC` FOR var2 IN var1.`valueB`
WHEN `var2`.`cId` = "c3" END))
FOR `var1` IN `valueA`
WHEN `var1`.`cId` = "c1" END), valueA)
WHERE `type` = "TYPEA"
Hi again,
yes, I gathered this from your previous solution. Please bear in mind that the examples I give are only to explain our issue, but they do not resemble the real data we need to work with. We are assessing couchbase for usage on data with many levels of nested objects, and an object like valueA can be quite large in a real scenario. My guess is that this approach will make our indexes quite large, having a big toll on the load times and memory consumption. We are generating indexes automatically from ad-hoc data structures, which could potentially mean several thousand indexes in the database. Please, correct me if I’m wrong making this assumption.
Thank you again for your time.