We are doing some tests with a multi-node configuration and we are not able to understand the implemented logic.
In a 2-node configuration with 1 bucket and 1 replica we can see the total number of items/documents in each node (active + replica). Interestingly, it is not split exactly 50/50 in some situations, but the split is fair.
In this configuration, if we kill one node (systemctl stop couchbase-server.service), no operation can be performed on the other node, which is still alive.
All operations return errors.
Why is the query (using the GUI) trying to connect to the dead node?
For the 3-node configuration we have a similar issue: if we kill one node, the cluster doesn't respond successfully to any operation (query, insert, etc.), with the same errors as above, until a failover of the dead node happens.
Is this really the expected behaviour? We know we can configure auto-failover down to 5 seconds for a cluster with more than 2 nodes (with 2 nodes, no failover seems to be performed). Still, 5 seconds of a non-responsive DB sounds like a problem for us.
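For reference, this is how we set the auto-failover timeout in our tests, via couchbase-cli (the host and credentials below are placeholders for our test cluster, not a recommendation):

```shell
# Enable auto-failover with the minimum timeout of 5 seconds.
# Cluster address, username and password are placeholders.
/opt/couchbase/bin/couchbase-cli setting-autofailover \
  --cluster http://localhost:8091 \
  --username Administrator --password password \
  --enable-auto-failover 1 \
  --auto-failover-timeout 5
```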
For a 2-node cluster, it seems only manual intervention lets us use the DB again: a manual failover of the dead node. It is odd, though, that when we try to do that, a prompt tells us we are going to lose data! Why? The other node should have all the vBuckets (active + replica).
Doing some other tests, we see that the claim that replicas can be used for READs when one node is down does not match what we observe.
In a bucket with 10 documents on a 3-node cluster with 1 replica, if one node goes down, some of the documents are not retrievable by key. In our test, 2 of 10 returned "Internal Server Error" using the GUI Document menu, and SQL queries always fail in this scenario.
Only when the failover is executed are all documents accessible. Why not allow reading from the replica? I can understand preventing writing…
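As far as I can tell, the default `get` in the SDKs only ever targets the active vBucket; replica reads have to be requested explicitly (e.g. `getAnyReplica` in the Java SDK 3.x, `get_any_replica` in the Python SDK 3.x). A minimal sketch of the fallback pattern, with the two fetch functions injected so it can be shown without a live cluster (the in-memory dicts below are stand-ins, not real SDK calls):

```python
# Sketch of "read from active, fall back to replica" logic.
# In the real Couchbase Python SDK 3.x the two injected functions would
# roughly correspond to collection.get(...) and
# collection.get_any_replica(...).

def get_with_replica_fallback(key, get_active, get_any_replica):
    """Try the active vBucket first; on failure, read any replica copy."""
    try:
        return get_active(key)
    except Exception:
        # The active node is down (timeout / connection refused).
        # A replica read may return slightly stale data, but it keeps the
        # key readable until failover promotes the replica to active.
        return get_any_replica(key)

# Tiny in-memory demo: the "active" copy of key "k2" is unreachable,
# simulating a downed node that owned that vBucket.
active = {"k1": "v1"}
replica = {"k1": "v1", "k2": "v2"}

def read_active(k):
    if k not in active:
        raise TimeoutError("active node for this vBucket is down")
    return active[k]

print(get_with_replica_fallback("k1", read_active, replica.__getitem__))  # v1
print(get_with_replica_fallback("k2", read_active, replica.__getitem__))  # v2
```

The trade-off, of course, is that the replica may be behind the active copy, which is presumably why the SDK makes replica reads an explicit opt-in rather than the default.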
Waiting for the failover is a real problem for high availability. Setting it to the minimum failover time will cause nodes to be dropped all the time, and using a high value will make the cluster almost completely unavailable in the meantime.
Writing is only available for the vBuckets active on the surviving nodes; writing a key whose vBucket is on the downed node returns an error. We understand the hashing for the vBuckets, but if a node goes down, the option to write to another node should be possible. After the node is back, or after failover, the cluster could rebalance the vBuckets with the new inserts…
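To make the hashing point concrete: to our understanding, Couchbase maps each key to one of 1024 vBuckets with a CRC32-based hash (as in libcouchbase), and each vBucket lives on exactly one active node. A small sketch, where the even 512/512 split across two nodes is a hypothetical illustration (the real map comes from the cluster):

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase default

def vbucket_id(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket (CRC32-based, libcouchbase-style)."""
    crc = zlib.crc32(key) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) & (num_vbuckets - 1)

# Hypothetical 2-node map: vBuckets 0-511 active on node A, 512-1023 on
# node B. If node B dies, every key hashing into 512-1023 is unwritable
# (and unreadable via plain get) until failover promotes the replicas.
for key in (b"user::1", b"user::2", b"order::17"):
    vb = vbucket_id(key)
    node = "A" if vb < 512 else "B"
    print(f"{key!r} -> vBucket {vb} (node {node} in our hypothetical map)")
```

This is why the failures look "random" per key: whether a given document is affected depends only on which vBucket its key hashes to, and the mapping is fixed, so writes cannot simply be redirected to another node without moving the vBucket.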
Summarising: it seems that when a node goes down, the cluster is mostly unusable.
Hi willthrom. Were you able to find answers to your questions?
I recently faced exactly the same issue with 2 nodes during our failover tests on Couchbase 6.5. We also tried with 3 nodes.
What is the point of replication if data cannot be queried from the working node?
We receive “Error performing bulk get operation - cause: unable to complete action after 6 attemps” when the SDK queries the working node, until the auto-failover happens.
How can this be avoided?