In a multi-node cluster configured with one replica, my understanding is that if a master node crashes, a key will lose its data if that data has been neither replicated to the replica node nor persisted to disk locally.
Consider the following scenario.
Instead of the master node, suppose the replica node crashes before applying the replicated data.
My understanding is that, since the master node did not crash, the data will still get persisted to disk.
What will happen to the replica data in the following cases:
(i) If the crashed replica node is brought back before a failover is triggered, will the data get replicated? Until then, does the replica data wait in the replication queue of the master node? If so, is there any threshold on how long the data is preserved in the replication queue?
(ii) Suppose that, before the replica node crashed, the replica data had already been transferred out of the master's replication queue, and the node crashed just before the data was applied. In this scenario, there is no data left in the replication queue to apply to the replica node when it comes back. When the replica node is back up and running (before failover), it will be part of the cluster again. But how will we know that the replica of that specific data is not present in the cluster? Will Couchbase discover automatically that the replica for the master data is missing?
Yes. Once the replica that crashed is back online, it will catch up from where it left off until it is level with the master. There is no replication queue that serves this purpose, so if a replica is down, the master carries on without a hiccup, and a down replica imposes no major overhead on the master. What we do instead is something called a backfill, which ensures we rewind and send data that may no longer be actively in memory.
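To make the backfill idea concrete, here is a toy model in Python. It is not Couchbase code: `ToyMaster`, `memory_window`, and `stream_from` are hypothetical names, the "disk" is a plain dict, and writes are persisted immediately for simplicity. It only illustrates how a catch-up request for an older seqno could be served partly from disk and partly from memory.

```python
from collections import deque


class ToyMaster:
    """Toy model of a master that keeps only recent mutations in memory."""

    def __init__(self, memory_window=3):
        self.high_seqno = 0
        self.disk = {}  # seqno -> (key, value); stands in for persisted data
        self.memory = deque(maxlen=memory_window)  # most recent mutations only

    def write(self, key, value):
        # The master accepts writes whether or not a replica is connected.
        self.high_seqno += 1
        self.disk[self.high_seqno] = (key, value)  # simplification: persist at once
        self.memory.append((self.high_seqno, key, value))
        return self.high_seqno

    def stream_from(self, start_seqno):
        """Yield (source, seqno, key, value) for every seqno > start_seqno."""
        oldest_in_memory = self.memory[0][0] if self.memory else self.high_seqno + 1
        # Backfill: anything older than the in-memory window is read from disk.
        for seqno in range(start_seqno + 1, oldest_in_memory):
            key, value = self.disk[seqno]
            yield ("disk", seqno, key, value)
        # The remainder is served straight from memory.
        for seqno, key, value in self.memory:
            if seqno > start_seqno:
                yield ("memory", seqno, key, value)


master = ToyMaster(memory_window=3)
for i in range(1, 7):
    master.write(f"key{i}", i)

# A replica that went down after seqno 1 reconnects and asks for the rest.
catch_up = list(master.stream_from(1))
for source, seqno, key, value in catch_up:
    print(source, seqno, key, value)
```

In this sketch seqnos 2 and 3 have already dropped out of the in-memory window, so they are backfilled from the dict that stands in for disk, while 4 to 6 come from memory. The master never had to hold a per-replica queue while the replica was down.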
There won't be an issue here. DCP is an ordered protocol, so when the replica wakes up, it will communicate the last seqno it has received and the master will replicate data from that point on. The seqno is not updated if the replica does not receive the data, or if the vbucket cannot process the replicated data, etc.
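The seqno-based resume described above can be sketched as a small Python simulation. This is an illustrative model only, not the DCP wire protocol: `ToyReplica`, `replicate`, and `crash_before_applying` are hypothetical names, and the model assumes the replica's applied data and last seqno survive the restart.

```python
class ToyReplica:
    """Toy replica that records the seqno of the last mutation it applied."""

    def __init__(self):
        self.last_seqno = 0
        self.data = {}

    def apply(self, seqno, key, value):
        # The seqno advances only once the mutation has actually been applied.
        self.data[key] = value
        self.last_seqno = seqno


def replicate(log, replica, crash_before_applying=None):
    """Send, in order, every mutation with seqno > replica.last_seqno.

    Returns False if the simulated crash interrupted the stream.
    """
    for seqno, key, value in log:
        if seqno <= replica.last_seqno:
            continue  # the replica already has this mutation
        if seqno == crash_before_applying:
            return False  # transferred, but the node died before applying it
        replica.apply(seqno, key, value)
    return True


# Ordered mutation history on the master: (seqno, key, value).
log = [(1, "a", "v1"), (2, "b", "v2"), (3, "c", "v3")]
replica = ToyReplica()

# The replica crashes after seqno 2 was sent but before it was applied.
completed = replicate(log, replica, crash_before_applying=2)
seqno_after_crash = replica.last_seqno

# The master keeps taking writes while the replica is down.
log.append((4, "d", "v4"))

# On restart the replica reports its last seqno and the stream resumes from there.
completed_after_restart = replicate(log, replica)
print(seqno_after_crash, replica.last_seqno, replica.data)
```

Because the replica's seqno was never advanced past 1, the mutation that was in flight at the time of the crash is simply sent again on reconnect. Nothing needs to detect the missing replica copy separately.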