Couchbase query nodes keep crashing

Hi all,

we have a cluster in MDS with the following topology:

  • Data service nodes
  • Analytics service nodes
  • Index/Query service nodes

We are experiencing a daily failover of the Index/Query nodes, with Critical Memory consumption warnings.
We suspect that the main problem is the Query service, since we have already seen query nodes consume too much memory and fail over.

The memory assigned to the services respects the threshold of the ASK node,
but memory usage keeps going over 90%.

Is there a best practice in the MDS architecture to avoid this type of disruption? Or is there some limit that can be set on the Query service?

Do you suspect any issue other than the Query service?

Do you have any indexes that you don’t need? Do you have any primary indexes?

Yeah it’s a production environment with 3 index/query nodes.

Indexes use the standard storage mode (Plasma), and about 100 GiB is reserved for the Index service on each node.
We have hundreds of GSI indexes, some of them with 1 replica.
And typically a primary index for each bucket (about 28), plus primary indexes for specific collections.

Do you suspect the problem resides in the indexes?

Yes, from what you stated.

We are experiencing an every day failover of the Index/Query nodes with the warnings of Critical Memory consumption.

Please read "What is the Couchbase Primary Index? Learn Primary Uses".
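
To take stock of the primary indexes before deciding which ones to drop, something like the following might help. It is only a sketch: it assumes the Query service REST endpoint is reachable on the default port 8093, that the `system:indexes` catalog marks primary indexes with `is_primary`, and the host name and credentials in the commented usage are placeholders, not values from this thread.

```python
import base64
import json
import urllib.parse
import urllib.request

# N1QL statement listing every primary index known to the system catalog.
PRIMARY_INDEXES_STATEMENT = (
    "SELECT i.* FROM system:indexes AS i WHERE i.is_primary = true"
)


def build_query_request(host, user, password, statement, port=8093):
    """Build (without sending) a POST request for the Query service REST endpoint."""
    url = f"http://{host}:{port}/query/service"
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def run_query(request, timeout=30):
    """Send the request and return the 'results' array of the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("results", [])


# Usage (placeholder host and credentials, replace with your own):
# req = build_query_request("query-node.example.com", "Administrator",
#                           "password", PRIMARY_INDEXES_STATEMENT)
# for row in run_query(req):
#     print(row)
```

The same statement can also be pasted into the Query Workbench or cbq; the Python wrapper is just one convenient way to run it against each query node.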

It could also be the execution of queries, but generally those issues would first show up on the side of the callers of those queries.

If you are on Enterprise Edition (EE), try to find which queries are causing the memory consumption by checking system:completed_requests
(long request times, high fetch counts, high index scan counts) and see if you can optimize those queries.
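
As a rough sketch of that check: pull the rows from `system:completed_requests` (via the Query Workbench, cbq, or an SDK) and rank them client-side by how much work they did. The field names used here (`phaseCounts` with `fetch`, `indexScan` and `primaryScan` counters) reflect my understanding of the row layout and should be checked against your server version. The helper name `heaviest_requests` is made up for this example. Note also that, by default, only requests that run longer than a configured threshold are logged there.

```python
# N1QL statement returning every logged completed request.
COMPLETED_REQUESTS_STATEMENT = "SELECT r.* FROM system:completed_requests AS r"


def heaviest_requests(rows, top=5):
    """Return the `top` rows with the largest combined fetch and scan counts.

    `rows` is a list of dicts as returned by the statement above. Rows
    without phase counters are treated as having done no fetches or scans.
    """
    def weight(row):
        counts = row.get("phaseCounts") or {}
        return (
            counts.get("fetch", 0)
            + counts.get("indexScan", 0)
            + counts.get("primaryScan", 0)
        )

    return sorted(rows, key=weight, reverse=True)[:top]


# Usage with rows already fetched from the cluster:
# for row in heaviest_requests(rows, top=10):
#     print(row.get("elapsedTime"), row.get("phaseCounts"), row.get("statement"))
```

Queries that show a large primary scan count are usually the first candidates for a dedicated secondary index.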