We have a problem with a 3-node cluster running Couchbase Community 4.1.1.
A second such cluster running the same configuration does not exhibit this issue.
The cluster regularly reports two nodes losing contact with each other and then, in the same second, reports that they are up again. This happens over and over, roughly every 2 to 6 minutes.
The third node of the cluster isn’t mentioned in the logs.
It’s suspicious that the recovery comes immediately after the lost connectivity.
Is there a way to determine whether this is an underlying network issue or a problem with Couchbase clustering?
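One thing I could try, as a rough sketch only: repeatedly open a TCP connection from one node to the other and record any failed or slow attempt, then compare those timestamps with the Couchbase log entries. The helper names `probe` and `watch`, the thresholds, and the choice of port 4369 (the default Erlang port mapper port, which I am assuming is what the nodes use here) are my own assumptions.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try one TCP connect; return (ok, seconds_taken, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


def watch(host, port, rounds, interval=1.0, slow=0.5):
    """Probe `rounds` times and return only the failed or slow attempts."""
    problems = []
    for _ in range(rounds):
        ok, took, err = probe(host, port)
        if not ok or took > slow:
            stamp = time.strftime("%H:%M:%S")
            problems.append((stamp, ok, round(took, 3), err))
        time.sleep(interval)
    return problems


# Hypothetical usage, run on serverA against serverB (port is an assumption):
# for line in watch("serverB.nyk.mycompany.com", 4369, rounds=600):
#     print(line)
```

If the probe shows failures or stalls at the same times as the log entries, that would point at the network; if it stays clean while the nodes still flap, the clustering layer looks more likely. A plain TCP connect will not catch every kind of network fault, so a clean result is only a hint.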
I had to increase the auto-failover timeout to 90 seconds, because with 60 seconds or less the cluster got into a mess due to this problem. We want to run this cluster with a much shorter failover period (e.g. 30 seconds), so this is a nuisance.
This is a copy of the admin web page log showing two incidents of the problem:
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverB.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason, connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason, net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverA.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverB.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason, connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason, net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
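To check how regular the incidents are over a longer stretch of log, the pasted entries can be parsed with something like the following. This is only a rough sketch: the regular expression is written against the log text exactly as it appears above, and the names `ENTRY`, `parse_events`, and `incident_gaps` are made up for illustration.

```python
import re
from datetime import datetime

# One log entry as pasted from the admin web page. The "." around the node
# names matches whichever quote character the page rendered.
ENTRY = re.compile(
    r"Node .(?P<observer>ns_1@[^\s’']+). saw that node .(?P<peer>ns_1@[^\s’']+). "
    r"(?P<event>came up|went down)\."
    r"(?:\s*Details:\s*\[\{nodedown_reason,\s*(?P<reason>\w+)\}\])?"
    r".*?(?P<time>\d{2}:\d{2}:\d{2}) - \w{3} (?P<date>\w{3} \d{1,2}, \d{4})",
    re.DOTALL,
)


def parse_events(text):
    """Return one dict per log entry found in the pasted text."""
    events = []
    for m in ENTRY.finditer(text):
        when = datetime.strptime(
            m["date"] + " " + m["time"], "%b %d, %Y %H:%M:%S"
        )
        events.append(
            {
                "observer": m["observer"],
                "peer": m["peer"],
                "event": m["event"],
                "reason": m["reason"],  # None for "came up" entries
                "when": when,
            }
        )
    return events


def incident_gaps(events):
    """Seconds between consecutive distinct 'went down' timestamps."""
    downs = sorted({e["when"] for e in events if e["event"] == "went down"})
    return [(b - a).total_seconds() for a, b in zip(downs, downs[1:])]
```

For the two incidents shown here, the "went down" timestamps are 04:36:43 and 04:42:28, so the gap is 345 seconds. Run over a full day of log, the list of gaps and the `reason` values per observer should show whether the pattern is periodic and whether it is always serverB that reports `net_tick_timeout`.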