Couchbase query nodes keep crashing

fabiosst · March 8, 2024, 8:16pm

Hi all,

We have a cluster using Multi-Dimensional Scaling (MDS) with the following topology:

Data service nodes
Analytics service nodes
Index/Query service nodes

We are experiencing daily failovers of the Index/Query nodes, with Critical Memory consumption warnings.
We suspect that the main problem is the Query service (we have already seen query nodes consume too much memory and fail over).

The memory assigned to the services respects the threshold of the ASK node,
but memory usage keeps going over 90%.

Is there a best practice for an MDS architecture to avoid this type of disruption? Or some limit that can be set on the Query service?

Do you suspect any issue other than the Query service?

mreiche · March 8, 2024, 8:22pm

Do you have any indexes that you don’t need? Do you have any primary indexes?

fabiosst · March 8, 2024, 8:32pm

Yeah, it’s a production environment with 3 index/query nodes.

Indexes use the standard storage mode (Plasma), and about 100 GiB is reserved for the Index service on each node.
We have hundreds of GSI indexes, some of them with 1 replica,
and typically a primary index for each bucket (about 28), plus primary indexes for specific collections.

Do you suspect the problem resides in the indexes?

mreiche · March 8, 2024, 8:51pm

Yes, based on what you stated:

"We are experiencing daily failovers of the Index/Query nodes, with Critical Memory consumption warnings."

Please read "What is the Couchbase Primary Index? Learn Primary Uses".

It could also be the execution of queries, but generally those issues would first be noticed by the callers of those queries.
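
To take stock of the primary indexes asked about above, something like the following could help. It is only a sketch: the statement reads the system:indexes catalog and filters on its is_primary field, and the Python helper groups the returned rows by keyspace. The sample rows and the helper names are made up for illustration, and the statement is assumed to be run separately (Query Workbench, cbq, or an SDK).

```python
# SQL++ statement to run against the cluster; is_primary marks primary indexes.
PRIMARY_INDEXES_STATEMENT = (
    "SELECT i.name, i.bucket_id, i.scope_id, i.keyspace_id, i.state "
    "FROM system:indexes AS i WHERE i.is_primary = true"
)


def keyspace_path(row):
    """Return 'bucket' or 'bucket.scope.collection' for one system:indexes row."""
    if row.get("bucket_id"):
        # Collection-level index: keyspace_id holds the collection name.
        return f"{row['bucket_id']}.{row['scope_id']}.{row['keyspace_id']}"
    # Bucket-level index: keyspace_id holds the bucket name.
    return row["keyspace_id"]


def primary_indexes_by_keyspace(rows):
    """Map each keyspace path to the names of its primary indexes."""
    result = {}
    for row in rows:
        result.setdefault(keyspace_path(row), []).append(row["name"])
    return result


# Hypothetical rows, shaped like the output of the statement above.
sample_rows = [
    {"name": "#primary", "keyspace_id": "orders", "state": "online"},
    {"name": "def_primary", "bucket_id": "orders", "scope_id": "eu",
     "keyspace_id": "invoices", "state": "online"},
]

if __name__ == "__main__":
    for keyspace, names in primary_indexes_by_keyspace(sample_rows).items():
        print(keyspace, names)
```

Any keyspace in that listing whose primary index is not actually used by a query is a candidate for dropping.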

vsr1 · March 8, 2024, 9:39pm

If you are on Enterprise Edition (EE), try to find which queries are causing the memory consumption by checking system:completed_requests
(high requestTime, high fetch count, high index scan count) and see if you can optimize those queries.
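
A rough sketch of that check, assuming the statement is run separately (Query Workbench, cbq, or an SDK) and the rows are then ranked in Python. The phaseCounts field names are the ones system:completed_requests normally exposes; the sample rows and the helper name are invented for illustration. Note that by default only requests slower than the completed-requests threshold are logged there.

```python
# SQL++ statement: pull the requests with the most document fetches.
COMPLETED_REQUESTS_STATEMENT = (
    "SELECT statement, requestTime, elapsedTime, "
    "phaseCounts.`fetch` AS fetches, phaseCounts.`indexScan` AS indexScans "
    "FROM system:completed_requests "
    "ORDER BY phaseCounts.`fetch` DESC LIMIT 20"
)


def heaviest_requests(rows, top=5):
    """Rank rows by fetch count, then by index scan count, highest first."""
    def weight(row):
        # Covered queries fetch nothing, so the field can be absent or null.
        return (row.get("fetches") or 0, row.get("indexScans") or 0)

    return sorted(rows, key=weight, reverse=True)[:top]


# Hypothetical rows, shaped like the output of the statement above.
sample_rows = [
    {"statement": "SELECT META().id FROM orders WHERE status = 'open'",
     "indexScans": 1200},
    {"statement": "SELECT * FROM orders",
     "fetches": 500000, "indexScans": 500000},
    {"statement": "SELECT * FROM customers WHERE country = 'IT'",
     "fetches": 80000, "indexScans": 80000},
]

if __name__ == "__main__":
    for row in heaviest_requests(sample_rows, top=2):
        print(row["fetches"], row["statement"])
```

Statements that fetch or scan an entire keyspace, such as an unfiltered SELECT * served by a primary index, tend to surface at the top of this ranking and are the first candidates for a more selective index.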
