Hi, I am running a 4-node cluster of Couchbase version 3.0.1.
Sometimes 1 or 2 of the nodes fail.
I see in dmesg that memcached was killed due to OOM, for example:
[9601142.036601] Out of memory: Kill process 16358 (memcached) score 575 or sacrifice child
[9601142.041487] Killed process 16358 (memcached) total-vm:9115908kB, anon-rss:8847792kB, file-rss:0kB
- Do you know the reason?
- How can this be fixed?
I should add that each node (EC2) in the cluster has 15GB of RAM, and 10GB is allocated to the only bucket.
Based on what you posted, the bucket is staying inside the 10GB quota you’ve assigned to it: memcached’s resident memory at the time of the kill was about 8.4GB (anon-rss:8847792kB). The likely reason is that something else running on this system is driving it out of memory, and the OOM killer is then picking the process using the most memory, which happens to be memcached, and killing it.
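To make that arithmetic explicit, here is a minimal Python sketch that parses the "Killed process" line from the dmesg output above and converts the resident memory to GiB. The function names and the regular expression are illustrative, not part of any Couchbase tooling, and treating the 10GB quota as 10 GiB is an assumption.

```python
import re

# Matches the "Killed process" line format shown in the dmesg output above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_killed_line(line):
    """Return a dict of fields from an OOM 'Killed process' line, or None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_kb": int(m.group("total_vm")),
        "anon_rss_kb": int(m.group("anon_rss")),
        "file_rss_kb": int(m.group("file_rss")),
    }


def kb_to_gib(kb):
    """Convert kernel-reported kB (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


line = (
    "[9601142.041487] Killed process 16358 (memcached) "
    "total-vm:9115908kB, anon-rss:8847792kB, file-rss:0kB"
)
info = parse_killed_line(line)
rss_gib = kb_to_gib(info["anon_rss_kb"] + info["file_rss_kb"])
quota_gib = 10  # assumption: the 10GB bucket quota read as 10 GiB

print(f"{info['name']} resident memory: {rss_gib:.2f} GiB")
print("within bucket quota" if rss_gib <= quota_gib else "over bucket quota")
```

With the line from the question this reports roughly 8.44 GiB resident, which is under the quota and leaves about 6.5GB of the node's 15GB to be accounted for by everything else.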
The best fix is to find the process whose memory use is growing beyond what you expect.
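One way to look for that process is to list what is resident in memory on the node, largest first. The following is a rough sketch, assuming a Linux host where /proc/<pid>/status exposes the Name and VmRSS fields; the helper names are hypothetical, and standard tools such as top or ps would give you the same information.

```python
import os


def parse_status(status_text):
    """Return (name, rss_kb) from the text of a /proc/<pid>/status file.

    Kernel threads have no VmRSS line, so their RSS is reported as 0.
    """
    name, rss_kb = None, 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_by_rss(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_by_rss():
        print(f"{rss_kb / (1024 * 1024):8.2f} GiB  pid={pid:<7} {name}")
```

Running this periodically (for example from cron) and keeping the output would show which process is growing in the run-up to the next OOM kill, since a single snapshot after the fact may not catch it.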
By the way, if you find another process running as user ‘couchbase’ that is the cause (which is possible), it’d be best to upgrade to a much newer release. 3.0.1 is quite dated now.