Setting Durability for collection.insert does not seem to work

We are experiencing a few anomalies with the Python SDK. One that presently has me scratching my head has to do with durability. Specifically, our use case is that our web front end code creates a document and requests that it be inserted into a collection in the _default scope of one of our buckets. Upon successful completion and response from the back end, it then issues a get request to retrieve all documents for that user. Strangely, the list usually does not include the most recently added document. We have tried all durability settings on the insert and the issue persists. Are we doing something wrong?

Here is the most recent iteration of the code. (By the way, we had to add the .value even though the docs did not indicate that it is required. If we do not add it, the first line of code below fails. That is a separate issue I would certainly like to understand: are the docs/examples just wrong?)

        opts = InsertOptions(durability=ServerDurability(Durability.PERSIST_TO_MAJORITY.value))
        result = self.collection.insert(str(document_id), workflow, opts)

We are presently running Couchbase Community Edition, v 7.0.0 on Ubuntu 20.04 LTS

@lroder welcome to the community. Which version of the SDK are you using?
Durability only guarantees that writes are safely stored, and I don’t think that is your problem (if I understand your description correctly).
So you are doing KV operations: first an insert, followed by a get?
Is it possible to share the complete code that also has get ?
@jcasey something to keep track of.

Arun,
Thanks!

SDK version 3.0.10. Curiously, newer SDK versions fail badly in other ways on Ubuntu 20.04 interacting with Couchbase CE 7.0 Beta. I’ve had to stick at 3.0.10, but that’s a discussion for another time. Here’s an excerpt from the code that interacts with Couchbase (separate from the code processing incoming web requests).

Oh, I should mention that the connection to Couchbase is persistent and is currently synchronous, for a couple of reasons. The first is that I could not get async to work (SDK v3.0.10 and various v3.1.x), even following the documentation and attempting a couple of permutations based on things I found on the web. The second is that I don’t think it really needs to be async in the end, because the app itself is served via uvicorn async, and all of the endpoints are implemented using FastAPI async. So, I don’t think forcing the Couchbase connection to be async really changes anything – comments welcome!


def connect(self):
    self.cluster = Cluster(self.host, ClusterOptions(PasswordAuthenticator(self.user, self.passwd)))
    self.bucket = self.cluster.bucket(self.bucket_name)
    self.collection = self.bucket.scope(self.scope_name).collection(self.collection_name)


def query(self, sql, params=[], autocommit=False):

    try:
        result = self.cluster.query(sql, QueryOptions(positional_parameters=params))


def add_workflow_instance_to_cache(self, user_id, university_id, workflow_instance):

    opts = InsertOptions(durability=ServerDurability(Durability.MAJORITY_AND_PERSIST_TO_ACTIVE.value))
    result = self.collection.insert(str(document_id), workflow, opts)


def retrieve_all_workflows_for_uni(self, university_id):
    rows = self.query("SELECT *, meta().id AS _id FROM workflows._default.instances WHERE university = $1",
                      [university_id])

The scenario is that the front end code hits an endpoint which, after validating the incoming request/payload, calls ‘add_workflow_instance_to_cache()’. Upon successful response from Couchbase, an HTTP OK is returned to the front end. The front end then makes a call to ‘retrieve_all_workflows_for_uni()’ to get the list of workflow instances.

In theory, the one that was just added should be in the list. In actuality, it is not always present. For the time being, the front end devs have introduced a time delay before launching the GET request. Not a good solution. Basically, our hope/objective is that there is a way to ensure that the newly inserted document is present at the time of return from the POST request that precipitated the INSERTion of a new workflow document.

Hi @lroder

The issue here is that when you run your N1QL query, it’s checking an index - and that index isn’t necessarily up to date with the mutation you just did yet. What you want are the query consistency levels, AtPlus or RequestPlus, which require the query to be up to date with either a specific set of mutations, or all mutations at the time of the query, respectively. This lets you choose your consistency level (because of course, higher consistency can mean more latency) at read time, rather than having to pay the consistency cost on every write.
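To make the difference concrete, here is a toy, in-memory sketch of a KV store with an index that lags behind it. This is not the Couchbase API: `ToyCluster`, its methods, and the use of a plain sequence number as a stand-in for a mutation token are all made up for illustration. Only the consistency level names (`not_bounded`, `at_plus`, `request_plus`) mirror the real ones.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model (hypothetical): a KV store plus an index that applies mutations lazily."""
    docs: dict = field(default_factory=dict)     # KV side: always current
    indexed: dict = field(default_factory=dict)  # index side: may lag behind
    pending: list = field(default_factory=list)  # mutations not yet indexed
    seqno: int = 0                               # last KV sequence number

    def insert(self, key, value):
        """KV write. Returns a sequence number, standing in for a mutation token."""
        self.seqno += 1
        self.docs[key] = value
        self.pending.append((self.seqno, key, value))
        return self.seqno

    def _catch_up(self, until):
        """Apply pending mutations to the index, up to sequence number `until`."""
        while self.pending and self.pending[0][0] <= until:
            _, key, value = self.pending.pop(0)
            self.indexed[key] = value

    def query(self, consistency="not_bounded", at_least=0):
        """Return the keys visible through the index at the given consistency level."""
        if consistency == "request_plus":
            self._catch_up(self.seqno)   # wait for every write made so far
        elif consistency == "at_plus":
            self._catch_up(at_least)     # wait only for the given mutation
        return sorted(self.indexed)      # not_bounded: whatever is indexed now


cluster = ToyCluster()
token = cluster.insert("wf-1", {"university": 7})
stale = cluster.query()                          # index has not caught up yet
fresh = cluster.query("at_plus", at_least=token)
cluster.insert("wf-2", {"university": 7})
everything = cluster.query("request_plus")
```

Note that `insert` succeeds and the document is in `docs` the whole time. The write is never lost, it is only not yet visible to the index, which is why changing durability on the insert has no effect on the query result.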

Please see docs for more details:

Graham,
Thanks for the explanation! That does help. Needing a bit of guidance as to ‘best practice’ for our use case. Since the UI is the originator of the GET request, there is no continuity/knowledge of the index state when the back end receives the get request. Are you suggesting that perhaps as part of the INSERT, I issue the RYOW and then return HTTP OK? If that’s the case, can I drop the ‘durability’ on the INSERT?

I think you have two options here.

One is to create a new combined endpoint that does the insert, and then does the N1QL query with AtPlus consistency level, passing in the MutationToken returned from the insert. Then returns that query result. This would be the most efficient approach.

The other is to keep your current setup, but have the query endpoint use RequestPlus. Since that’s a little more expensive than the default Unbounded consistency level, you might want to pass that in as a hint from the frontend, which has the context of knowing it’s just done an update.
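A minimal sketch of that hint, assuming the frontend adds a `fresh` flag to the GET request's query string after it has just written something. The flag name and the helper `scan_consistency_for` are hypothetical. The returned strings are the values the Query service accepts for its `scan_consistency` request parameter; mapping them onto the SDK's `QueryOptions(scan_consistency=...)` is left out here.

```python
def scan_consistency_for(request_params):
    """Pick a scan consistency level from a hypothetical `fresh` query-string flag."""
    wants_fresh = str(request_params.get("fresh", "")).lower() in ("1", "true", "yes")
    # Only pay for request_plus when the caller says it has just written something.
    return "request_plus" if wants_fresh else "not_bounded"


after_write = scan_consistency_for({"fresh": "true"})
ordinary = scan_consistency_for({})
```

This way ordinary list views keep the cheaper default, and only the request that immediately follows an insert waits for the index.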

And yes, durability is orthogonal to this. Durability is instead about how resistant you need that data to be to various forms of failure. The durability docs are a useful read here.

Graham,
Thanks! Yeah, I’ve already modified the code to do the first option. Am going to test shortly (multitasking at the moment!). Meanwhile, sort of a dumb question perhaps, but would it make sense if there were some way on the INSERT itself to stipulate that we need the index updated prior to returning a result? It just seems a bit strange/inefficient to me that two separate requests need to be launched across the network to achieve what theoretically should be possible with a single request (clearly, that’s an SDK change, I get it, just wondering aloud since I’m all about efficiency!).
Thanks

Hi @lroder

Well, it’s always going to be two separate requests I think? One for the KV insert, one for the query. Under the hood there’s an efficient replication stream (DCP) from the KV service to the indexing service. So it’s just a question of whether you’re waiting for the indexer to have that mutation either at the mutation point (as you’re suggesting) or at the read point (e.g. AtPlus or RequestPlus) - either way, it’s the same number of requests. And I’d argue that it makes sense to do the wait at the read point. Because it makes sense to me to leave the decision up to the reader, whether they want to wait for that mutation (or all mutations) to be in the index, rather than the writer, which doesn’t necessarily have that context. And it keeps a nice segregation of concerns between the KV layer and the query/index layer.

OK, I attempted to add the mutation state stuff and got the following error in my select statement:
AttributeError: ‘MutationResult’ object has no attribute ‘_mutinfo’

I started paying careful attention to differences between this link:
https://docs.couchbase.com/python-sdk/3.0/howtos/n1ql-queries-with-sdk.html
and the one posted above:
https://docs.couchbase.com/python-sdk/current/howtos/n1ql-queries-with-sdk.html#scan-consistency

and realized that scan consistency is not discussed in the v3.0 docs for the SDK. This is problematic in that I have not been able to get v3.1 (any of the point releases) working in our environment. Should I open another request/thread for that?

Hi @lroder - Unfortunately mutation tokens are not available in the Python 3.x SDK at the moment. See ticket here that is in the works. I can reply to this thread once we have updates in the master branch.

Also, I do not know if or when the CE 7.0 Beta version will be updated, but until then, if you want to use more recent versions of the Python SDK, you can pip install with the following: LCB_TAG=3.1.0 python3 -m pip install couchbase. Probably the biggest thing to note with using the older LCB (the C library the Python SDK wraps) version is that a CASMismatchException will not be raised. LCB did not raise that exception until v 3.1.1.

I do apologize for the state of the acouchbase API (i.e. couchbase + asyncio), we do have some updates in the works here, here and here. Once these tickets are resolved the acouchbase API will be in much better shape. Also, there is a FastAPI example I created not too long ago…maybe that can be of some help.