How to aggregate an array with ARRAY_AGG and get performance

Hi Couchbase community :slight_smile:

I need help. I have already spent a long time to improve my request exec time.
I can’t get time under 9s.
Maybe my indexes are not good or my request has been not writted correctly.

SELECT ARRAY_AGG(META(categories).id)[0] AS _ID, ARRAY_AGG(META(categories).cas)[0] AS _CAS, ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(productIds.`COUNTRY`), 1)) AS productIds
FROM  `test-data` AS categories
LET visibleChildren = (
  SELECT RAW COUNT(*)
  FROM `test-data` AS childCategories USE KEYS categories.childIds 
  WHERE childCategories.origin = "COUNTRY" OR (childCategories.origin = "GENERAL" AND NOT ARRAY_CONTAINS(childCategories.deletedFor, "COUNTRY"))
)[0]
WHERE categories._class = "com.test.category.Category"
AND visibleChildren = 0
AND ANY p IN OBJECT_PAIRS(categories.productIds) SATISFIES p.name = "COUNTRY" AND ARRAY_COUNT(p.val) <> 0 END
AND ( ( categories.origin = "GENERAL" AND categories.metadata.configurations.GENERAL.enabled = true )  OR ( categories.origin = "COUNTRY" ))
AND ANY conf IN OBJECT_PAIRS(categories.metadata.configurations) SATISFIES [conf.name, conf.val.enabled] = ["COUNTRY", false] END
AND (NOT ARRAY_CONTAINS(deletedFor, "COUNTRY") OR categories.deletedFor IS NOT VALUED)

My indexes are:

CREATE INDEX `idx_1` 
ON `test-data`(origin, metadata.configurations.GENERAL.enabled, DISTINCT ARRAY [conf.name, conf.val.enabled] FOR conf IN OBJECT_PAIRS(metadata.configurations) END,  deletedFor, childIds) 
WHERE _class = "com.test.category.Category"
USING GSI;
 CREATE INDEX idx_2 ON `test-data`(DISTINCT ARRAY ARRAY_COUNT(p.val) FOR p IN OBJECT_PAIRS(productIds) END, origin)
    WHERE _class = "com.test.category.Category"
    USING GSI;

MY model is tree based one and I use couchbase version 5.1.

Thank you for your anwsers.

Please post the sample document. Also say if “COUNTRY” is changing or constant

Thank you for answering.
Here a sample document:

{
  "metadata": {
    "isRootCategory": false,
    "configurations": {
      "GENERAL": {
        "enabled": true,
        "order": 2147483647
      },
      "COUNTRY1": {
        "enabled": false,
        "order": 1
      },
      "COUNTRY2": {
        "enabled": true,
        "order": 1
      }
    },
    "name": "test1",
    "creationDate": 1521121826719
  },
  "deletedFor": ["COUNTRY1"],
  "origin": "GENERAL",
  "childIds": [
    "0bb400d9-b3e3-4b5c-96eb-0b78ed0c9617",
    "fd744fb2-89cf-4f1c-8915-3bf9e366aaa8",
    "d14cc61f-1ffd-4e72-a314-38be3cea2648",
    "361e64fd-e2e1-4ddb-8345-1688230f7e53"
  ],
  "productIds": {
    "COUNTRY1": [],
    "COUNTRY2": ["12345", "abcde"],
  },
  "ancestorIds": [
    "cf9e2142-8948-41f8-84bb-4cd1363b19a9",
    "3362507d-2798-4873-a972-9b877674d66c",
    "6c70c149-486d-4790-be3a-846b2da8f55a",
    "c8d8e5e6-b565-47ce-b493-83940298816a"
  ],
  "_class": "com.test.category.Category"
}

COUNTRY is a dictionary key for configurations and productIds so it is changing considering the request.

Thank you

what are various values of origin. How big is productIds?

Can you pick COUNTRY1 and post the request for this so that will know which is varying and which are constant

The productIds can hold more than 1000 elements per COUNTRY (1, 2, 3 …)
Origin various values are GENERAL, COUNTRY1, COUNTRY2, …, COUNTRY30 so the productIds and configurations can have as many keys than origins.

In this request, GENERAL is constant and not COUNTRY

post the query for “COUNTRY1” or say “US”.

SELECT ARRAY_AGG(META(categories).id)[0] AS _ID, ARRAY_AGG(META(categories).cas)[0] AS _CAS, ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(productIds.`US`), 1)) AS productIds
FROM  `test-data` AS categories
LET visibleChildren = (
  SELECT RAW COUNT(*)
  FROM `test-data` AS childCategories USE KEYS categories.childIds 
  WHERE childCategories.origin = "US" OR (childCategories.origin = "GENERAL" AND NOT ARRAY_CONTAINS(childCategories.deletedFor, "US"))
)[0]
WHERE categories._class = "com.test.category.Category"
AND visibleChildren = 0
AND ANY p IN OBJECT_PAIRS(categories.productIds) SATISFIES p.name = "US" AND ARRAY_COUNT(p.val) <> 0 END
AND ( ( categories.origin = "GENERAL" AND categories.metadata.configurations.`GENERAL`.enabled = true )  OR ( categories.origin = "US" ))
AND ANY conf IN OBJECT_PAIRS(categories.metadata.configurations) SATISFIES [conf.name, conf.val.enabled] = ["US", false] END
AND (NOT ARRAY_CONTAINS(deletedFor, "US") OR categories.deletedFor IS NOT VALUED

Is this right do u need [“US”, true]

No for this request it’s always [“US”, false] but for another request indeed I have [ “US”, true].

Will be able to use latest version 6.0.0

Yes in the near future but currently we need to have a solution working on 5.1

I am not sure if this will work in 5.1  but you can try it

CREATE INDEX ix1 ON `test-data`  (origin,
          DISTINCT ARRAY [p.name, ARRAY_COUNT(p.val) > 0, metadata.configurations.[p.name].enabled,
                          ARRAY_CONTAINS(IFMISSINGORNULL(deletedFor, []) ,p.name)]
                   FOR p IN OBJECT_PAIRS(productIds) END,
          metadata.configurations.`GENERAL`.enabled
         ) WHERE _class = "com.test.category.Category";

SELECT ARRAY_AGG(META(c).id)[0] AS _ID,
       ARRAY_AGG(META(c).cas)[0] AS _CAS,
       ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(c.productIds.`US`), 1)) AS productIds
FROM  `test-data`  AS c USE INDEX (ix1)
LET visibleChildren = (
  SELECT RAW COUNT(1)
  FROM `test-data` AS cc USE KEYS c.childIds
  WHERE cc.origin = "US" OR (cc.origin = "GENERAL" AND NOT ARRAY_CONTAINS(cc.deletedFor, "US"))
)[0]
WHERE c._class = "com.test.category.Category"
     AND ((c.origin = "GENERAL" AND c.metadata.configurations.`GENERAL`.enabled = true )  OR ( c.origin = "US" ))
     AND visibleChildren = 0
     AND ANY p IN OBJECT_PAIRS(c.productIds)
         SATISFIES [p.name, ARRAY_COUNT(p.val) > 0, c.metadata.configurations.[p.name].enabled,
                   ARRAY_CONTAINS(IFMISSINGORNULL(c.deletedFor, []) ,p.name)] = ["US", true, false, false] END;
1 Like

Thank you! Il will try and give you a feedback quickly.

@vsr1 Thank you a lot. It take now 5s. Any idea on how you would go to improve it.

Use request profiling Look page 341 in https://blog.couchbase.com/wp-content/uploads/2017/10/N1QL-A-Practical-Guide-2nd-Edition.pdf

The query is complex and have limited options. If 6.0.0 can think few more improvements.

Try the following if helps anything further. If you are not interested _ID, _CAS values from same document, you can use MAX or MIN instead of ARRAY_AGG() which is costly to form array if lot of rows.

CREATE INDEX ix2 ON `test-data` (origin,
          DISTINCT ARRAY [p.name, ARRAY_COUNT(p.val) > 0, metadata.configurations.[p.name].enabled,
                          ARRAY_CONTAINS(IFMISSINGORNULL(deletedFor, []) ,p.name)]
                   FOR p IN OBJECT_PAIRS(productIds) END,
          metadata.configurations.`GENERAL`.enabled, childIds
         ) WHERE _class = "com.test.category.Category";

SELECT ARRAY_AGG(META(c).id)[0] AS _ID,
       ARRAY_AGG(META(c).cas)[0] AS _CAS,
       ARRAY_DISTINCT(ARRAY_FLATTEN(ARRAY_AGG(c.productIds.`US`), 1)) AS productIds
FROM ( SELECT RAW c.id
       FROM ( SELECT META(c).id, c.childIds
              FROM  `test-data` AS c USE INDEX (ix2)
              WHERE c._class = "com.test.category.Category"
                    AND c.origin = "GENERAL" AND c.metadata.configurations.`GENERAL`.enabled = true
                   AND ANY p IN OBJECT_PAIRS(c.productIds)
                       SATISFIES [p.name, ARRAY_COUNT(p.val) > 0, c.metadata.configurations.[p.name].enabled,
                          ARRAY_CONTAINS(IFMISSINGORNULL(c.deletedFor, []) ,p.name)] = ["US", true, false, false] END
              UNION
              SELECT META(c).id, c.childIds
              FROM  `test-data` AS c USE INDEX (ix2)
              WHERE c._class = "com.test.category.Category"
                   AND c.origin = "US"
                   AND ANY p IN OBJECT_PAIRS(c.productIds)
                       SATISFIES [p.name, ARRAY_COUNT(p.val) > 0, c.metadata.configurations.[p.name].enabled,
                          ARRAY_CONTAINS(IFMISSINGORNULL(c.deletedFor, []) ,p.name)] = ["US", true, false, false] END) AS c
       LET visibleChildren = ( SELECT RAW COUNT(1)
                               FROM `test-data` AS cc USE KEYS c.childIds
                               WHERE cc.origin = "US" OR (cc.origin = "GENERAL"
                                     AND NOT ARRAY_CONTAINS(cc.deletedFor, "US"))
                            )[0]
       WHERE visibleChildren == 0 ) AS ckeys
JOIN `test-data` AS c ON KEYS ckeys;
1 Like

Excellent @vsr1. Now the execTime is around 2s . Thank you a lot !