Random ObjectThreadException errors in Couchbase Python driver when using N1QL queries from multiple threads

When using Python SDK version 3.0.1 or later, a cluster.query call fails randomly when issued from multiple threads.
The simple script below reproduces this quite easily (increase the number of threads if it does not reproduce; normally, at 10 threads, more than half of the queries fail).

COUCHBASE_URI = "changeme"
COUCHBASE_USER = "changeme"
COUCHBASE_PASS = "changeme"

import traceback
from concurrent.futures import ThreadPoolExecutor
from couchbase.cluster import Cluster, ClusterOptions
from couchbase_core.cluster import PasswordAuthenticator

cluster = Cluster(
    COUCHBASE_URI,
    ClusterOptions(PasswordAuthenticator(COUCHBASE_USER, COUCHBASE_PASS)),
)

def query():
    try:
        q = cluster.query("SELECT * FROM ['test'] as ks")
        print(list(q))
    except:
        traceback.print_exc()

pool = ThreadPoolExecutor(10)
for i in range(10):
    pool.submit(query)
pool.shutdown(wait=True)  # wait for all queries to finish before exiting

The observed traceback is quite telling:

Traceback (most recent call last):
  File "/srv/ota-lite/ota_lite/test_couchbase_lockmode.py", line 16, in query
    q = cluster.query("SELECT * FROM ['test'] as ks")
  File "/usr/lib/python3.8/site-packages/couchbase/cluster.py", line 592, in query
    return self._maybe_operate_on_an_open_bucket(CoreClient.query,
  File "/usr/lib/python3.8/site-packages/couchbase/cluster.py", line 611, in _maybe_operate_on_an_open_bucket
    if self._is_6_5_plus():
  File "/usr/lib/python3.8/site-packages/couchbase/cluster.py", line 547, in _is_6_5_plus
    response = self._admin.http_request(path="/pools").value
  File "/usr/lib/python3.8/site-packages/couchbase/management/admin.py", line 159, in http_request
    return self._http_request(type=LCB.LCB_HTTP_TYPE_MANAGEMENT,
couchbase.exceptions.ObjectThreadException: <Couldn't lock. If LOCKMODE_WAIT was passed, then this means that something has gone wrong internally. Otherwise, this means you are using the Connection object from multiple threads. This is not allowed (without an explicit lockmode=LOCKMODE_WAIT constructor argument, C Source=(src/oputil.c,661)>

The problem lies in the _is_6_5_plus check, which internally uses a shared, thread-unsafe connection to determine the Couchbase server version. A better approach would be to let the user pass the server version as a configuration option instead.

There are two workarounds:

  1. Set LOCKMODE_WAIT on the cluster object, which obviously results in a dramatic performance hit.
  2. Call CoreClient.query directly and pass a per-thread bucket object to it (this requires a dozen lines of obscure, hackish code).
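The second workaround boils down to the standard per-thread-object pattern. Here is a minimal, self-contained sketch of that pattern, where the hypothetical FakeBucket class stands in for a real per-thread bucket (actually opening a bucket and calling CoreClient.query are omitted):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeBucket:
    """Hypothetical stand-in for a per-thread Couchbase bucket/CoreClient."""
    def __init__(self):
        self.owner = threading.get_ident()

_tls = threading.local()

def get_bucket():
    # Lazily create one connection object per thread, so no single
    # connection is ever shared across threads.
    if not hasattr(_tls, "bucket"):
        _tls.bucket = FakeBucket()
    return _tls.bucket

def query(results):
    bucket = get_bucket()
    # In the real workaround this is where CoreClient.query would be
    # invoked with the per-thread bucket.
    results.append((threading.get_ident(), id(bucket)))

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(16):
        pool.submit(query, results)

# Each worker thread ends up with exactly one bucket object of its own.
per_thread = {}
for tid, bucket_id in results:
    per_thread.setdefault(tid, set()).add(bucket_id)
assert all(len(buckets) == 1 for buckets in per_thread.values())
```

Because every thread only ever touches its own object, no lockmode tricks are needed; the cost is one open connection per worker thread.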

At the moment we’ve adopted the second workaround in our code, though it would be nice if the Couchbase maintainers fixed the Python SDK itself.

It looks like this bug was introduced while fixing an even worse bug: https://issues.couchbase.com/browse/CCBC-1204.
While that fix may be a relief for single-threaded applications, it doesn’t really help multi-threaded ones, since it introduced the more dangerous traceback shown above. Moreover, after investigating the code a little, it seems that this fix might cause normal KV operations to fail randomly when an application uses both KV and N1QL (and we’ve seen quite a few such failures in our application logs).

Hello @vkhoroz, first of all thank you very much for looking into what is potentially causing this defect.
@jcasey can you look at this ?

Also @vkhoroz, just for your awareness, there is an async Couchbase Python library (acouchbase) that you can use for async operations.

Yes, this is an issue that we are tracking; it has a dependency on LCB (libcouchbase).

In the interim, if possible, you might look into the acouchbase API. Docs on querying can be found here.
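The reason the async API sidesteps this bug is that all queries run concurrently on a single event loop in one thread, so no connection object is ever touched from two threads at once. A minimal sketch of that pattern, with a hypothetical fake_query coroutine standing in for an acouchbase cluster.query call (the real call would await rows from the server):

```python
import asyncio

async def fake_query(n):
    # Hypothetical stand-in for an acouchbase query; the real coroutine
    # would await rows from the server instead of sleeping.
    await asyncio.sleep(0.01)
    return f"row-{n}"

async def main():
    # All ten "queries" run concurrently on one event loop and one
    # thread, so no locking (and no LOCKMODE_WAIT) is needed.
    return await asyncio.gather(*(fake_query(i) for i in range(10)))

rows = asyncio.run(main())
```

With the real acouchbase API the structure is the same: gather the query coroutines instead of submitting jobs to a thread pool.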