Cbq_engine consumes all the memory available and crashes eventually

sparky · August 11, 2019, 5:13am

Following is a simple query that expected to return aggregated results from an array within the documents based on a few predicates.
SELECT ARRAY_AGG({items.sessionid,items.text,items.topic,items.workflow, items.workflownodeid,items.sentiment, items.score,items.topmatches,items.sendername,items.sendertype,items.senderemail,items.timestamp,items.ts}) as items FROM `orders` AS r UNNEST r.items AS items WHERE r._type = "orders" AND r.storeid = 147 AND items.ts >= "1552238267" AND items.ts <= "1564804800" GROUP BY r.agentid

I have around 35K documents in the db of which only 1500 documents satisfy this requirement. The bucket ram usage is around 348MB and disk usage is 95MB.

The document structure is:

      "order": {
          "_id": "o.12345.12",
          "_type": "orders",
          "storeid": 1234,
          "agentid": "123c345",
          "items" : [
              {
                  "id" : "1234",
                  "customer" : "John Doe",
                  "customerid" : 1234,
                  "itemid": 12344,
                  "itemdesc" : "Coffee table",
                  "ts": "1559721510883",
                  "category": "furniture"
              },
              {
                  "id" : "1235",
                  "customer" : "John Doe",
                  "customerid" : 1234,
                  "itemid": 12345,
                  "itemdesc" : "Desk lamp",
                  "ts": "1559721510900",
                  "category": "Lights and shades"

              }
          ]
      }
  }

The query is expected to return an aggregated document in the form.

[
  {
      "items": [
        {
            "id" : "1234",
            "customer" : "John Doe",
            "customerid" : 1234,
            "itemid": 12344,
            "itemdesc" : "Coffee table",
            "ts": "1559721510883",
            "category": "furniture"
        },
        {
            "id" : "1235",
            "customer" : "John Doe",
            "customerid" : 1234,
            "itemid": 12345,
            "itemdesc" : "Desk lamp",
            "ts": "1559721510900",
            "category": "Lights and shades"

        }
      ]
  }  
]

As soon as the query is issued (testing with Couchbase console now), cbq-engine takes all server resources and crashes soon.
cbq-engine-memory

This is a two node cluster and both nodes are having data, index, query and search services configured. Server is community edition 6.0.0 build 1693. I’ve also created indexes on _type, _storeid and also an array index on ts along with _type and _storeid. I can see that these indexes are used in explain.

I’ve data for last 6 months in the db. Query returns with results in reasonable time if time duration in the query is small (for eg. 3-10 days). This happens mostly for large duration. ie a couple of months or more.

Is this an expected behavior from the db? Looking at the number of documents, I’m sure that it shouldn’t take more than a couple of GB even if the server has to remap entire data in memory. Is there a way I can optimise the query (sorry… i’m a novice in N1QL and couchbase). Appreciate some help here as our roll out is delayed due to this issue.

vsr1 · August 11, 2019, 4:06pm

Query using Unnest and aggregation and may required memory.

You can also ty the following

CREATE INDEX ix1 ON `orders`( storeid, DISTINCT ARRAY item.ts FOR item IN items END) WHERE _type = "orders";

SELECT ARRAY_FLATTEN(aitems,2) AS items, r.agentid
FROM `orders` AS r
WHERE r._type = "orders"
      AND r.storeid = 147
      AND ANY item IN r.items SATISFIES item.ts BETWEEN "1552238267" AND  "1564804800" END
GROUP BY r.agentid
LETTING aitems = ARRAY_AGG (ARRAY {item.sessionid, item.text, item.topic, item.workflow,
                                   item.workflownodeid, item.sentiment, item.score,
                                   item.topmatches, item.sendername, item.sendertype,
                                   item.senderemail, item.timestamp, item.ts}
                            FOR item IN r.items WHEN item.ts BETWEEN "1552238267" AND  "1564804800" END);

sparky · August 11, 2019, 5:42pm

Thank you. Memory usage reduced by half (3.6GB for the same data set) Couchbase doesn’t crash now and I do get the search result (yet to verify the result set). The query took 1 minute. I noticed the query is not using the index created, instead using another index that uses (type, agentid). I’ll try dropping that index and rerunning the query. Any other suggestions to improve performance? I feel 1 minute is too long for such a small dataset.

vsr1 · August 11, 2019, 10:50pm

The query and index should be optimal. If not using that index try with USE INDEX option (FROM orders AS r USE INDEX (ix1)).
If you want whole object of array ARRAY_AGG(ARRAY item FOR item IN r.items WHEN …)

Checkout Optimize N1QL Performance Using Request Profiling Page 341 of https://blog.couchbase.com/wp-content/uploads/2017/10/N1QL-A-Practical-Guide-2nd-Edition.pdf

sparky · August 12, 2019, 10:02am

Thanks for your help. Though it takes time, I don’t see the memory build up beyond certain limit now.

Topic		Replies	Views
How to correctly create new documents with group by Couchbase Server n1ql	2	582	February 12, 2019
N1ql query with unnest array_concat against large dataset consumes "all" memory SQL++ query , n1ql , index	8	655	June 15, 2023
Rather fast subqueries make original query really slow even with empty data sets SQL++	9	3141	June 1, 2016
How to aggregate an array with ARRAY_AGG and get performance SQL++ query , java , n1ql , index	15	2929	June 28, 2019
Cbq-engine consuming massive memory for simple queries Couchbase Server	14	4522	January 24, 2017

Cbq_engine consumes all the memory available and crashes eventually

Related topics