Implementing transactions in Couchbase

We are using transactions for updating documents in Couchbase. The logic to update the document is something like this:

  1. fetch the document
  2. update it with the required data
  3. create an anonymous function to update the document in Couchbase
  4. call the transaction.run method to execute the anonymous function (inside this, we only update the document. There is no document fetch operation.)
  5. commit the transaction / in case of an exception roll it back. (this is done inside the anonymous function)

We are observing that, in spite of sending the CAS value during the update operation, the document is being overwritten by a parallel update operation.

I wanted to know if we need to fetch the document (done in step 1 mentioned above) inside the transaction, or can it be done outside of the anonymous function to execute the transaction (before we start executing the anonymous function)? The following link shows an example of the fetch being done inside the transaction : https://docs.couchbase.com/java-sdk/current/howtos/distributed-acid-transactions-from-the-sdk.html#replace

PS: We are using optimistic lock for updating documents.

The fetch and the update must both be done inside the anonymous function called by the transaction framework, just like in the example. All other updates must be done the same way: with a fetch and an update inside an anonymous function called by the transaction framework.

Yes, @mreiche is correct; please see the docs for the reason why (essentially, we need to be able to roll back and retry the full transaction to handle some failure scenarios, so all operations and business logic need to be inside the lambda).
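
To make the "retry the full lambda" point concrete, here is a minimal in-memory sketch. It does not use the Couchbase SDK: `Store`, `Ctx`, `Conflict` and `run` are hypothetical stand-ins invented for this illustration, and the version counter only plays the role of a CAS. It assumes a runner that re-executes the whole lambda when a write conflicts, and shows that a read placed inside the lambda is repeated on the retry, so the second attempt works from current data.

```java
import java.util.function.Consumer;

public class LambdaRetryDemo {
    /** Immutable view of the document: its balance plus a CAS-like version. */
    static final class Snapshot {
        final long cas;
        final int balance;
        Snapshot(long cas, int balance) { this.cas = cas; this.balance = balance; }
    }

    /** Hypothetical one-document store; a stand-in, not the Couchbase API. */
    static final class Store {
        private long cas = 1;
        private int balance;
        Store(int balance) { this.balance = balance; }
        synchronized Snapshot get() { return new Snapshot(cas, balance); }
        /** Writes only if the document still has the version that was read. */
        synchronized boolean replace(long expectedCas, int newBalance) {
            if (expectedCas != cas) return false;
            balance = newBalance;
            cas++;
            return true;
        }
    }

    /** Signals that a write lost the race against another writer. */
    static final class Conflict extends RuntimeException {}

    /** Hypothetical context handed to the lambda; reads and writes go through it. */
    static final class Ctx {
        private final Store store;
        Ctx(Store store) { this.store = store; }
        Snapshot get() { return store.get(); }
        void replace(Snapshot read, int newBalance) {
            if (!store.replace(read.cas, newBalance)) throw new Conflict();
        }
    }

    /** Re-executes the whole lambda until it finishes without a conflict; returns the attempt count. */
    static int run(Store store, Consumer<Ctx> logic) {
        int attempts = 0;
        while (true) { // unbounded here for brevity; a real runner would give up eventually
            attempts++;
            try {
                logic.accept(new Ctx(store));
                return attempts;
            } catch (Conflict e) {
                // fall through and run the lambda again from the top
            }
        }
    }

    /** Returns {attempts, final balance} for a withdrawal of 30 that races with a deposit of 50. */
    static int[] demo() {
        Store store = new Store(100);
        boolean[] depositDone = {false};
        int attempts = run(store, ctx -> {
            Snapshot doc = ctx.get();              // read inside the lambda
            if (!depositDone[0]) {                 // simulate a concurrent deposit, first attempt only
                depositDone[0] = true;
                Snapshot other = store.get();
                store.replace(other.cas, other.balance + 50);
            }
            ctx.replace(doc, doc.balance - 30);    // business logic also inside the lambda
        });
        return new int[] {attempts, store.get().balance};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("attempts=" + result[0] + " balance=" + result[1]);
    }
}
```

In this model the first attempt fails because the deposit changed the version, and the second attempt re-reads 150 and writes 120. Had the read happened before `run`, the retry would have had nothing fresh to work from.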

Thanks for replying.
We do have a retry: once we receive a CasMismatchException from Couchbase, we fetch the document again and restart the transaction. I believe this should be sufficient, although it might be a little inefficient, because we are retrying from the application rather than relying on the Couchbase SDK to retry.

My deeper question is about the mechanics of how conflicting updates to a document are handled:

  1. Is the CAS value of a document checked to detect conflicting transactions?
  2. What is the significance of the ctx.get(collection, "doc-id"); as mentioned here https://docs.couchbase.com/java-sdk/current/howtos/distributed-acid-transactions-from-the-sdk.html#replace? Does it register the transaction-id into the extended attributes of the document ? How does it help in checking that the document is not involved in another transaction?
  3. Most importantly, what if an application does not perform a ctx.get(collection, "doc-id"); and instead directly does a ctx.replace(doc, content); ? Will couchbase not be able to detect conflicting transactional writes ?

The mechanics of Couchbase transactions as explained here do not explain how Couchbase acquires a lock and how it ensures that conflicting updates within a transaction do not take place.

Does it use only CAS, or does it use only the transaction-id in the extended attributes of the document or does it use a combination of both ?

I know that we cannot mix transactional writes and non-transactional writes to a document as mentioned here. So, it seems like CAS probably is not used or, by itself, probably, cannot guarantee data-integrity.

Can you please help because we are facing financial loss in production due to this problem?

Detecting conflicts with other transactions involves a number of steps, and yes, those include CAS checking.

“What is the significance of the ctx.get(collection, "doc-id"); as mentioned here Using Couchbase Transactions | Couchbase Docs? Does it register the transaction-id into the extended attributes of the document? How does it help in checking that the document is not involved in another transaction?”

The doc needs to be fetched inside the transaction for reasons including that we need to check if another transaction has locked it. This is part of why you can’t use a regular non-transactional GetResult passed in from outside the transaction.

“Most importantly, what if an application does not perform a ctx.get(collection, "doc-id"); and instead directly does a ctx.replace(doc, content);? Will couchbase not be able to detect conflicting transactional writes?”

Yes, it would prevent us from detecting such conflicting writes, but the API does not allow it at compile time: ctx.replace() intentionally takes a TransactionGetResult from ctx.get(), not a GetResult.

“The mechanics of Couchbase transactions as explained here do not explain how Couchbase acquires a lock and how it ensures that conflicting updates within a transaction do not take place.”

Locking is currently done through a combination of CAS checks and checking whether the doc has metadata from another transaction. But we don’t want users depending on these undocumented internals, as we need the flexibility to make future improvements and add features.

Thanks for your reply.
We use our organisation’s foundation team’s thin wrapper (adaptor) to interact with the Couchbase SDK. I noticed that they are, indeed, running a ctx.get() just before the ctx.replace() operation.

I believe that you will agree that if the ctx.get() does not place some kind of an atomic “lock” on the document, then it won’t guarantee that write conflicts are detected before the ctx.replace() operation. Otherwise, two parallel threads can simultaneously run a ctx.get() operation, neither will “see” the other transaction, and both will decide to move ahead with the ctx.replace(). If this is indeed the case, then Couchbase will have to serialize the ctx.replace() requests and perform an atomic conflict check and write operation to ensure data loss does not take place.

My question was whether:

  • it is of utmost importance to run a ctx.get() operation before we add any business data to the couchbase document; or
  • can we run a simple non-transactional KV-get operation and add the business updates to the document; and then eventually, just before we update the document using a transaction, we run a ctx.get() and then a ctx.replace() with the prepared document? I know that we will lose the retry mechanism benefits that couchbase provides. But we have a retry from the application code, which repeats the entire sequence of steps as mentioned before starting from the non-transactional KV-get.

“I believe that you will agree that if the ctx.get() does not place some kind of an atomic “lock” on the document, then it won’t guarantee…”

As Graham said, don’t rely on how transactions are implemented. Or how you conjecture they are implemented.

If you’re just updating one document and that document is not otherwise updated in a transaction, then just using CAS is sufficient. No need for a transaction.
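
As a rough illustration of why CAS alone can be enough for a single document, here is an in-memory sketch of the usual optimistic loop: read, modify, write with the version that was read, and start over from the read on a mismatch. `Store`, `Snapshot` and `update` are hypothetical names made up for this example and the version counter merely stands in for a CAS; nothing here is the Couchbase API.

```java
import java.util.function.IntUnaryOperator;

public class CasLoopDemo {
    /** Immutable view of the document: its balance plus a CAS-like version. */
    static final class Snapshot {
        final long cas;
        final int balance;
        Snapshot(long cas, int balance) { this.cas = cas; this.balance = balance; }
    }

    /** Hypothetical one-document store; a stand-in, not the Couchbase API. */
    static final class Store {
        private long cas = 1;
        private int balance;
        Store(int balance) { this.balance = balance; }
        synchronized Snapshot get() { return new Snapshot(cas, balance); }
        /** Writes only if the document still has the version that was read. */
        synchronized boolean replace(long expectedCas, int newBalance) {
            if (expectedCas != cas) return false;
            balance = newBalance;
            cas++;
            return true;
        }
    }

    /** Applies the change to a fresh read; on a version mismatch, re-reads and re-applies it. */
    static void update(Store store, IntUnaryOperator change) {
        while (true) {
            Snapshot doc = store.get();
            if (store.replace(doc.cas, change.applyAsInt(doc.balance))) return;
        }
    }

    /** Four threads each add 1 a thousand times; returns the final balance. */
    static int demo() {
        Store store = new Store(0);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) update(store, b -> b + 1);
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.get().balance;
    }

    public static void main(String[] args) {
        System.out.println("final balance = " + demo());
    }
}
```

The important detail is that the change is expressed as a function of the freshly read value, so a retry recomputes it instead of resubmitting a document built from an older read.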

Only the operations performed within transaction.run are part of the transaction.

The document from your non-transactional get may have been updated before the ctx.get in your transaction. Suppose it was a bank account balance that was increased by a deposit, and your current transaction was a withdrawal starting with the balance from the get outside the transaction (before the deposit). The deposit will be lost and your customer will be unhappy.

You could actually use the non-transactional get followed by the ctx.get, if you compare the CAS from both gets and, if they are different, retry your whole operation from the non-transactional get. But that seems more complicated than just using transactions the way they were meant to be used.
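
The bank account scenario and the CAS-comparison workaround can be sketched with a small in-memory model. Everything below (`Store`, `Snapshot`, `deposit`, the amounts) is hypothetical and only imitates a versioned document; it is not the Couchbase API. `staleOverwrite` builds the new balance from a read taken before a deposit and writes it with a fresh version, so the write succeeds and the deposit disappears. `guardedWithdrawal` compares the versions of the outer and inner reads and restarts from the outer read when they differ.

```java
public class StaleReadDemo {
    /** Immutable view of the document: its balance plus a CAS-like version. */
    static final class Snapshot {
        final long cas;
        final int balance;
        Snapshot(long cas, int balance) { this.cas = cas; this.balance = balance; }
    }

    /** Hypothetical one-document store; a stand-in, not the Couchbase API. */
    static final class Store {
        private long cas = 1;
        private int balance;
        Store(int balance) { this.balance = balance; }
        synchronized Snapshot get() { return new Snapshot(cas, balance); }
        /** Writes only if the document still has the version that was read. */
        synchronized boolean replace(long expectedCas, int newBalance) {
            if (expectedCas != cas) return false;
            balance = newBalance;
            cas++;
            return true;
        }
    }

    /** Simulates another writer adding money to the account. */
    static void deposit(Store store, int amount) {
        Snapshot doc = store.get();
        store.replace(doc.cas, doc.balance + amount);
    }

    /** Content comes from the early read, version from the late read: the deposit is lost. */
    static int staleOverwrite() {
        Store store = new Store(100);
        Snapshot outside = store.get();                    // early read: balance 100
        deposit(store, 50);                                // another writer: balance 150
        Snapshot inside = store.get();                     // late read: current version
        store.replace(inside.cas, outside.balance - 30);   // version matches, content is stale
        return store.get().balance;
    }

    /** Same flow, but restarts from the early read whenever the two versions differ. */
    static int guardedWithdrawal() {
        Store store = new Store(100);
        boolean depositPending = true;
        while (true) {
            Snapshot outside = store.get();
            if (depositPending) {                          // the deposit lands between the two reads
                deposit(store, 50);
                depositPending = false;
            }
            Snapshot inside = store.get();
            if (inside.cas != outside.cas) continue;       // document changed: start over
            if (store.replace(inside.cas, outside.balance - 30)) {
                return store.get().balance;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("stale overwrite:    " + staleOverwrite());
        System.out.println("guarded withdrawal: " + guardedWithdrawal());
    }
}
```

In this model the unguarded version ends at 70 (the deposit of 50 is gone) and the guarded version ends at 120. The guard works, but it is extra bookkeeping that disappears when the read and the business logic both sit between the inner get and the replace.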

I can’t possibly work through every use case that you might have, and all the ways that “clever” use of transactions would cause you to lose data. So follow the documentation. In a single transaction.run - ctx.get, modify the data then ctx.replace.

From what you told me, it seems that ctx.get() registers the transaction-id in the extended attributes of the document. And that serves like a lock on the document. Is my understanding correct ?

Another question: if I run ctx.get() multiple times on the same document within a transaction, will it cause a problem?

“From what you told me, it seems that ctx.get() registers the transaction-id in the extended attributes of the document. And that serves like a lock on the document. Is my understanding correct?”

The implementation is internal. The repository is public - you are free to explore it.

Ok. Can you please answer my second question:

if i run ctx.get() multiple times on the same document within a transaction, then will it cause a problem ?

PS: We are not casual users of your product. We are deploying it worldwide and also paying for it.

There’s nothing in the documentation that advises against multiple ctx.get of a document in the same transaction.

“PS: We are not casual users of your product. We are deploying it worldwide and also paying for it”

This makes it even more critical that you don’t start relying on implementation details that might change. I am not the implementor of transactions. It’s not like I am withholding implementation secrets; I actually don’t know the implementation details.

Yes, but the documentation didn’t advise against what we are doing. And yet, we are expected to follow it to the ‘T’.

I don’t need the exact details. But I would expect that, logically, the API was designed to do something.

My basic question is whether ctx.get() is designed to be a stateful or stateless API and, if it’s stateful, whether it is idempotent.

“My basic question is whether ctx.get() is designed to be a stateful or stateless API and, if it’s stateful, whether it is idempotent.”

I read the question as “will it cause a problem” which is not the same as “is it idempotent”. Without knowing what “a problem” is, I can only defer to the documentation.

ctx.get within a transaction is not idempotent. Within a transaction, the changes that have been made but not yet committed are what is “seen”. A ctx.replace in the transaction would result in a second ctx.get returning the updated document. If there was no modification, then it would return the same value. I suppose that if you wanted to make further modifications, you would need a new GetResult to modify and pass into replace. [Also, if the document was modified by a non-transactional replace, I don’t know what the value would be (Couchbase Transactions only provides isolation from other transactions).]
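
The "a second get sees the uncommitted replace" behaviour can be modelled in a few lines. This is only a sketch of the visible behaviour under a simple assumption (writes are held in a per-transaction buffer until commit); it says nothing about how Couchbase actually stages changes, and `Ctx`, its methods and the map-backed "database" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadYourWritesDemo {
    /** Hypothetical transaction context: writes are buffered until commit. */
    static final class Ctx {
        private final Map<String, String> committed;
        private final Map<String, String> staged = new HashMap<>();
        Ctx(Map<String, String> committed) { this.committed = committed; }

        /** Returns this transaction's pending write if there is one, else the committed value. */
        String get(String id) {
            return staged.containsKey(id) ? staged.get(id) : committed.get(id);
        }

        /** Records the new content without touching the committed data. */
        void replace(String id, String content) { staged.put(id, content); }

        /** Makes all pending writes visible to everyone. */
        void commit() {
            committed.putAll(staged);
            staged.clear();
        }
    }

    /** Returns what is seen: {first get, second get, outside reader before commit, outside reader after commit}. */
    static String[] demo() {
        Map<String, String> db = new HashMap<>();
        db.put("doc-id", "v1");
        Ctx ctx = new Ctx(db);

        String first = ctx.get("doc-id");          // committed value
        ctx.replace("doc-id", "v2");
        String second = ctx.get("doc-id");         // the transaction's own pending write
        String outsideBefore = db.get("doc-id");   // other readers still see the old value
        ctx.commit();
        String outsideAfter = db.get("doc-id");    // now visible to everyone
        return new String[] {first, second, outsideBefore, outsideAfter};
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

In this model the two gets return different values only because a replace happened between them; two gets with no write in between return the same value.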

“the documentation didn’t advise against what we are doing”

Doing what? Overwriting the result of a ctx.get with the (modified) result of a prior non-transactional get? Which results in the loss of a deposit, as described in the bank account example above? The documentation does not advise against that, as that might be the desired outcome (if it was a change of address, for instance). Neither does the transaction documentation give anything like that as an example, as it is inherently dangerous and not very “transactional”: if the application is just going to overwrite regardless of previous values or other updates, it can just use upsert.

I’m not the authority on transactions, and some of my assertions may not be completely accurate, but -

Going back to your original post -

“PS: We are using optimistic lock for updating documents.”

That’s usually sufficient for single document updates. If there are no additional requirements, then maybe you don’t need transactions.

It would help to see the actual code because I’m having difficulty understanding the description.

“4. call the transaction.run method to execute the anonymous function (inside this, we only update the document. There is no document fetch operation.)”

This sounds like it is doing a non-transactional update. ctx.replace takes a TransactionGetResult, which comes from ctx.get. Since you haven’t called any get in the transaction, you wouldn’t have a TransactionGetResult to pass to ctx.replace. Are you using the non-transactional collection.replace()? That executes independently of the transaction. (It could still give CasMismatch, but it’s not from the transaction.)

“5. commit the transaction / in case of an exception roll it back. (this is done inside the anonymous function)”

transaction.run will call commit and rollback; the application code should not be calling them (I’m not even sure that it can). If you are calling collection.replace() instead of ctx.replace, then there are no changes in the transaction; commit (called explicitly or by transaction.run) will be a no-op and always succeed. An exception (such as CasMismatch from collection.replace) thrown in the anonymous method can (maybe?) cause transaction.run to re-execute the anonymous method - but it would continue to fail with CasMismatch as the old cas (which was obtained from the collection.get from outside the transaction.run) will never match.

“We are observing that, in spite of sending the CAS value during the update operation, the document is being overwritten by a parallel update operation”

Let’s see… ok.

“We do have a retry wherein once we receive a CasMismatchException from couchbase, we fetch the document once again and then again restart the transaction.”

Yes. So when the CasMismatch is thrown, it’s preventing a previous modification from being overwritten. But when you retry, re-fetch the (now modified) document, and reapply the changes (or reuse the complete, modified document that previously failed with CasMismatch), you overwrite the changes that caused the CasMismatch.

In case it didn’t get mentioned - if you are updating a document in a transaction in one place, then every place that document is updated should be in a transaction. The transaction mechanism only works when every update uses it.

And using collection.replace (or any method that is not ctx.*) inside a transaction.run is not transactional. While nothing prevents calls to collection.get etc. inside transaction.run, it’s probably not the call you should be making. ctx.get etc. are most likely the correct calls.

To better see what is going on, print out the document and cas after every get operation and before every replace operation.

I haven’t fully caught up on this now-long thread, so apologies if the below has been covered already by Mike.
I would say this is maybe getting beyond the realms of a forum discussion, and since you’re an enterprise customer @pradeep1 it might be an idea to open a support ticket where we can go into more detail, have a look at your code, etc.

“From what you told me, it seems that ctx.get() registers the transaction-id in the extended attributes of the document. And that serves like a lock on the document. Is my understanding correct?”

I don’t want to get too much into the implementation details, because as above we don’t want users to depend on internal factors that may change when future improvements and features are made.

So in broad strokes - we use a mix of optimistic and pessimistic techniques to detect any write-write conflicts with other transactions, and generally those will involve rolling back and retrying one of the transactions. In your specific scenario above we would detect it when one of the transactions does ctx.replace() and discovers that the CAS has been changed by another transaction. It would then trigger a rollback+retry.

We do not put a lock on the document at the ctx.get() point (this would be a fully pessimistic approach). That’s technically an implementation detail and subject to change, but changing it would turn a cheap read into a more expensive write, and we are unlikely to do so.

“it is of utmost importance to run a ctx.get() operation before we add any business data to the couchbase document”

It is, for the reasons hopefully covered above. And the API requires it at compile time: it’s not possible to transactionally modify a document without it.

“if i run ctx.get() multiple times on the same document within a transaction, then will it cause a problem?”

It won’t cause any technical problems, though it seems inefficient.

“can we run a simple non-transactional KV-get operation and add the business updates to the document; and then eventually, just before we update the document using a transaction, we run a ctx.get() and then a ctx.replace() with the prepared document”

It’s somewhat outside the intended use case, and though it might be possible to get it to work, it’s fighting the platform to a degree. I’m struggling to understand why you can’t apply the same business logic between the ctx.get() and the ctx.replace(). Why does it need to be applied to a non-transactional GetResult and then copied to the TransactionGetResult?

We are familiar with your account. You already have ticket #62377 for the transactions questions.

Yes. We have opened a ticket, but we are not seeing enough traction over there.

Thanks for your time. From what i understand:

  1. it’s important to invoke ctx.get(). Anyway, the code will not compile if this is not done. I don’t know if this is a stateful or stateless API, but never mind.
  2. we can invoke ctx.get() any number of times before invoking ctx.replace()

This is sufficient information for me to move ahead.

“We have opened a ticket, but we are not seeing enough traction over there.”

Support is aware of this thread. There’s nothing for them to provide that is not already in this thread.

“I don’t know if this is a stateful or stateless API”

I apologize if my previous answer wasn’t sufficient - If it can be called twice, and the second time it returns something different from the first time - then it is stateful.

“A ctx.replace in the transaction would result in a second ctx.get returning the updated document.”

“we can invoke ctx.get() any number of times before invoking ctx.replace()”

It must be invoked at least once before a ctx.replace(), as the result from ctx.get() is a required input to ctx.replace(). Furthermore, if the document is modified in the transaction, another ctx.get() must be issued to get the modified TransactionGetResult if another ctx.replace() will be called. The original TransactionGetResult will not work, as the CAS will have changed.

If you want to see a framework that uses Couchbase Transactions internally, see spring-data-couchbase/src/main/java/org/springframework/data/couchbase/core/ReactiveReplaceByIdOperationSupport.java at main · spring-projects/spring-data-couchbase · GitHub

The cas from the ctx.get must match the cas of the earlier ctx.get to ensure the document was not modified. Otherwise retryTransactionOnCasMismatch is returned, resulting in the whole lambda being re-executed. The reason for repeating the ctx.get at that point is because there is no place for the original TransactionGetResult to be held in the spring-data model. The original ctx.get() would have been here: spring-data-couchbase/src/main/java/org/springframework/data/couchbase/core/ReactiveFindByIdOperationSupport.java at main · spring-projects/spring-data-couchbase · GitHub

As I understand it, a stateful API is one that changes the state of the DB every time it’s called. I am not referring to audit, logging or observability related changes on the DB, but rather something related to the data or metadata.

And an idempotent API is one that can be called multiple times consecutively, leaving the database in the same state after each call, with no exceptions thrown back to the caller.

ctx.get does not change the document data. What happens to the metadata is internal to the implementation and the application cannot make assumptions. It can be called multiple times.