Handling durability requirements failure in Couchbase

Recently I’ve started investigating Couchbase Server (for the first time) as a candidate for a project. A particular scenario I’m looking at right now is how to make Couchbase act as a reliable “source of truth”, which is why I’m digging into the durability aspect.

So here is a snippet from ACID Properties and Couchbase:

If the durability requirements fail, then Couchbase may still save the document and eventually distribute it across the cluster. All we know is that it didn’t succeed as far as the SDK knows. You can choose to act on this information to introduce more ACID properties into your application.

So imagine this: I insert/update a document and the primary node fails before the data makes it to any replica. Let’s say the primary is gone for a long time. At this point I don’t know whether the data was written to disk… The scary part here is that “Couchbase may still save the document and eventually distribute it across the cluster”. Meaning, as far as the client can tell, the data didn’t make it, so the user would see an error, but then it may suddenly appear in the system if the primary comes back online.
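For concreteness, this is roughly the kind of call I mean (a minimal sketch against the Java SDK 2.x PersistTo/ReplicateTo API; the cluster address, bucket, and document names are just placeholders):

```java
// Minimal sketch: a write that only reports success once the item is persisted
// on the active node and replicated to at least one replica.
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.ReplicateTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.error.DurabilityException;

public class DurabilityRequirementExample {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("localhost").openBucket("example");

        JsonDocument doc = JsonDocument.create("order::1001",
                JsonObject.create().put("status", "created"));

        try {
            // Block until the SDK observes persistence on the active node and
            // replication to one replica, or a timeout/failure occurs.
            bucket.upsert(doc, PersistTo.MASTER, ReplicateTo.ONE);
        } catch (DurabilityException e) {
            // The durability requirement was not met as far as the SDK can tell;
            // the document may or may not still surface later (the scenario above).
            System.err.println("Durability requirement not met: " + e.getMessage());
        }
    }
}
```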

Am I reading this statement correctly? If I am, what’s the best practice to handle it with Couchbase?

This was discussed on StackOverflow: https://stackoverflow.com/questions/52648356/handling-durability-requirements-failure-in-couchbase

The challenge in answering this one is that it depends on the actions you take operationally. The system is flexible here and, in effect, how you handle it is partially in your control.

One scenario could be that after it has been acknowledged as persisted, but before it has been replicated, the node fails. In that case, if you do not fail over the node, that item will be replicated when the node comes back online.

Another scenario is that you have autofailover enabled, and after the item is received by the primary but before it is replicated or persisted, autofailover kicks in and promotes a replica to primary. In this case, your application will have seen the failure to achieve the requested durability requirement. If the previous primary does come back online, it will resync with the state of the current cluster before it rejoins, meaning the location where the item is now active is the current state.

There isn’t a single best practice, arguably, as there isn’t a single “what do I do in this situation?” answer. Many of our users prioritize availability over higher levels of durability in the face of failure. That means they turn on autofailover and let their applications, which have more context, decide what to do for particular data mutations. Some other users prefer not to use autofailover and have a human involved in deciding what to do next. Yet others may accept a longer period of unavailability while they attempt to recover a node.

I hope that helps. Is there a specific goal you have with this update? We can maybe give some options in that case.


Yeah, I know, as I was the one who kicked it off :smiley:

Nice explanation! Thank you!

So please help me to understand the second scenario with the autofailover. When the former primary gets back online with locally persisted but not replicated items and starts resyncing, would these items get wiped off or something?

As for the goal I have, it’s just about a general investigation on how Couchbase deals with node failures.

Yes, and that’s really intentional. You could look at those previously persisted items as an “alternate history” that didn’t play out. When the failure occurred, the cluster picked a new starting place, got everyone to agree, and started the universe moving forward from there. When the old node recovers and tries to join this universe, it has to do so with a shared understanding of that universe, which potentially means dropping data that wasn’t passed along.

Of course, in practice, since replication is memory-to-memory and Disk IO tends to be higher latency (the replication of an item and the persistence of items are both scheduled concurrently), things will tend to be replicated more than persisted, but there is no guarantee. Also, the app (through the SDK) has some ability to influence outcomes too with the Durability Requirements features we were talking about.

There are some other things we’re thinking about adding in the future (can share more on it later), but I’d expect that’ll just increase flexibility over what we’re talking about here. If that fits into your “investigation”, we can certainly chat directly.


Thank you so much! It’s really awesome and very good news! I’ll share the details you’ve provided on StackOverflow.

Thanks for the clarification, but I still feel the original durability requirement is not met the way it is in other technologies.

The call to Couchbase clearly states what the durability requirement is:
insert the document, persist it to disk, replicate it to the replica, and persist it to the replica’s disk.
The actual expectation is that you wait on the call until these requirements are met, i.e. you persist to disk and replicate to the replica before you say the insert succeeded.
I still don’t like the two options; they should have been:
persist to disk and persist to memory.
If you ask for persist to disk, persist to memory should be implied by default.
If it’s only persist to memory, then the persist to disk can stay asynchronous, as it is currently.

So if you say persist to disk (all nodes), it includes the replicas and memory;
if you say persist to memory (all nodes), it should include the replicas.

The customer knows this would be slower than the default operation and takes that hit knowingly. So why is the API not honoring the call? In this case the call should come back saying that the document couldn’t be inserted, and the insert should be rolled back.

So the situation where one node was written and the other was not shouldn’t arise during this call. You would persist to node 1, then persist to node 2, and before sending the status you would check again that both nodes are active before returning the result. If any node goes down before that, you send an error and roll back the write on the other node. Basically a write-ahead-log concept or similar.
Similar logic is available in Kafka, which gives the option (acks=all) of having the record in sync with all replicas before returning success. The caller waits asynchronously for all the brokers to acknowledge that they received and wrote the record. If the required durability is not met, the record is removed from all brokers.
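To illustrate what I mean, here is a minimal sketch of that Kafka pattern (the broker address, topic, key, and value are placeholders):

```java
// Minimal sketch of a Kafka producer that waits for all in-sync replicas to
// acknowledge the write before reporting success.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AcksAllExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all"); // wait for all in-sync replicas to acknowledge
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous; get() blocks until the acknowledgement
            // (or failure) comes back from the brokers.
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("orders", "order::1001", "created"))
                    .get();
            System.out.println("Written to partition " + meta.partition()
                    + " at offset " + meta.offset());
        }
    }
}
```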

The whole point is that you should make sure the record is available on all nodes before returning success, and after that, if a node goes down, the usual mechanisms kick in.

Let us know what should be done currently to get this pattern, i.e. when we insert, we want to wait until all the nodes are in sync and persisted before returning success. If not, return an error so that we can insert the record again after a retry timeout, or tell the user that the record was not persisted.
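Something along these lines is what I’m after (a minimal sketch using the Java SDK 2.x PersistTo/ReplicateTo parameters, assuming one replica is configured; the retry policy and helper name are my own placeholders):

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.ReplicateTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.error.DocumentAlreadyExistsException;
import com.couchbase.client.java.error.DurabilityException;

public class DurableInsert {

    /**
     * Inserts the document and only reports success once it has been persisted on
     * two nodes (the active node and, with one replica configured, the replica).
     * On a durability failure it retries after a delay; once the attempts are
     * exhausted it reports failure so the caller can tell the user the record
     * was not persisted.
     */
    static boolean insertDurably(Bucket bucket, JsonDocument doc,
                                 int maxAttempts, long retryDelayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                bucket.insert(doc, PersistTo.TWO, ReplicateTo.ONE);
                return true; // durability requirement met
            } catch (DocumentAlreadyExistsException e) {
                // A previous attempt actually landed (e.g. the failed node recovered
                // and the item was redistributed); treat that as success here.
                return true;
            } catch (DurabilityException e) {
                // Requirement not met as far as the SDK knows; wait and retry.
                Thread.sleep(retryDelayMillis);
            }
        }
        return false;
    }
}
```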