Couchbase cluster reports frequent nodes down/up

We have a problem with a 3-node cluster running Couchbase Community 4.1.1.
A second such cluster running the same configuration does not exhibit this issue.

The cluster regularly reports nodes losing contact with each other and then, in the same second, reports that they are up again. This happens over and over, roughly every 2 to 6 minutes.
The third node of the cluster isn’t mentioned in the logs.
It’s suspicious that the recovery comes immediately after the lost connectivity.
Is there a way to determine if this is some underlying network issue or some problem with Couchbase clustering?

I had to increase the auto-failover timeout to 90 seconds, because trying 60 or less just left the cluster in a mess due to this problem. We want to run this cluster with a much shorter failover period (e.g. 30 seconds), so this is a nuisance.

This is a copy from the admin web page logs showing 2 incidents of the problem I refer to:

Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverB.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason,net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017

Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverA.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverB.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason,net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
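
To see how regular the incidents are and which node reports which reason, the pasted entries can be grouped by timestamp. The following is only a rough sketch: it assumes the log text is copied from the admin page in exactly the shape shown above (curly quotes included), and the names `EVENT_RE`, `parse_events`, `down_incidents` and `gaps_seconds` are made up for this example, not part of any Couchbase tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: entries look like the ones pasted from the admin page above.
# \s* after "nodedown_reason," tolerates the line break seen in the paste.
EVENT_RE = re.compile(
    r"Node ‘(?P<observer>[^’]+)’ saw that node ‘(?P<peer>[^’]+)’ "
    r"(?P<event>came up|went down)\."
    r"(?: Details: \[\{nodedown_reason,\s*(?P<reason>\w+)\}\])?"
    r".*?(?P<time>\d{2}:\d{2}:\d{2}) - (?P<date>\w{3} \w{3} \d{1,2}, \d{4})",
    re.DOTALL,
)

# A few entries copied from the two incidents above.
SAMPLE = """\
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ came up. Tags: [] ns_node_disco004 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason,
connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason,
net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:42:28 - Tue Feb 21, 2017
Node ‘ns_1@serverA.nyk.mycompany.com’ saw that node ‘ns_1@serverB.nyk.mycompany.com’ went down. Details: [{nodedown_reason,
connection_closed}] ns_node_disco005 ns_1@serverA.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
Node ‘ns_1@serverB.nyk.mycompany.com’ saw that node ‘ns_1@serverA.nyk.mycompany.com’ went down. Details: [{nodedown_reason,
net_tick_timeout}] ns_node_disco005 ns_1@serverB.nyk.mycompany.com 04:36:43 - Tue Feb 21, 2017
"""


def parse_events(text):
    """Return one dict per 'came up' / 'went down' entry found in text."""
    events = []
    for m in EVENT_RE.finditer(text):
        when = datetime.strptime(
            f"{m['date']} {m['time']}", "%a %b %d, %Y %H:%M:%S"
        )
        events.append({
            "observer": m["observer"],
            "peer": m["peer"],
            "event": m["event"],
            "reason": m["reason"],  # None for 'came up' entries
            "when": when,
        })
    return events


def down_incidents(events):
    """Group 'went down' entries by timestamp: {when: {observer: reason}}."""
    incidents = defaultdict(dict)
    for e in events:
        if e["event"] == "went down":
            incidents[e["when"]][e["observer"]] = e["reason"]
    return dict(sorted(incidents.items()))


def gaps_seconds(incidents):
    """Seconds between consecutive incidents (incidents must be time-sorted)."""
    times = list(incidents)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Calling `gaps_seconds(down_incidents(parse_events(SAMPLE)))` on the two incidents above gives a single gap of 345 seconds (04:36:43 to 04:42:28). In both incidents serverB is the node reporting `net_tick_timeout`, while serverA only sees `connection_closed`.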

Couchbase's clustering logic has to be really bad, or really broken, to be given a hugely generous 60-second timeout window and still ALWAYS fail over when the underlying servers are running with no issues.
What is wrong with it?

Hi Leon,
Sorry you didn’t get a response earlier. What you’re seeing isn’t normal. You might check whether a service on serverB is crashing and restarting. It’s also possible that one or more of your servers is overloaded and slow to respond, or that something is wrong in your network config, but I would start by looking carefully at the logs on serverB.
-Will
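
To follow up on the network angle, one option is to run a small TCP probe from one node to another and compare the times of any failed or slow connects with the down/up timestamps in the cluster log. This is a minimal sketch, not a Couchbase tool: `probe` and `watch` are hypothetical helper names, the 0.5 second "slow" threshold is arbitrary, and the port in the usage note (4369, the Erlang port mapper) is an assumption to check against the port list for your version.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect; return (ok, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False, time.monotonic() - start


def watch(host, port, interval=1.0, rounds=10, slow=0.5):
    """Probe repeatedly; return (clock_time, status, seconds) for each
    attempt that failed or took longer than `slow` seconds."""
    anomalies = []
    for _ in range(rounds):
        ok, elapsed = probe(host, port)
        if not ok or elapsed > slow:
            status = "ok-but-slow" if ok else "failed"
            anomalies.append((time.strftime("%H:%M:%S"), status, elapsed))
        time.sleep(interval)
    return anomalies
```

For example, running `watch("serverB.nyk.mycompany.com", 4369, rounds=600)` on serverA would probe once a second for about ten minutes, long enough to span at least one incident at the reported frequency. A clean run during an incident would point away from basic TCP reachability and toward the node itself (load, or a service restarting), though a connect test like this cannot rule out every network problem.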