Query is fast in couchbase query window but slow in C# code

Hi all,
I am working on a C# program that pulls data from our Couchbase server. I’m running into a very confusing issue where I have two queries that I need to run.

One is both very fast in code and very fast in the Query window on the Couchbase console.

The second one is extremely slow in code, but is just as fast in the Query window on the Couchbase console.

Query A

select meta(mybucket).id from mybucket
where _deleted is missing
and organization=“myorganization”
and type=“Apple”

Query B:

select meta(mybucket).id from mybucket
where _deleted is missing
and organization=“myorganization”
and type=“compilation”
and formTemplate=“AppleCompilation”

In this particular case, each piece of information has two associated records. One is the original data (type Apple), and the other is a compilation of data from the first (ie. AppleCompilation). So they are a 1 to 1.

As such, each query pulls the exact same amount of data.
Also both have an accompanying index in Couchbase.

When I run Query A in the Couchbase Query window, I get 250 results and it takes 30ms.

In C#, if I run the command:

var results = await ClusterHelper.GetBucket(“mybucket”).QueryAsync(query);

It takes 30ms.

Essentially the exact same amount of time.

If I run Query B. It also takes 30ms in the Query window.
However, in the C# code, it takes 900ms.

It takes 30x longer.

I simply have no idea why. Some of these queries have 300,000 results, so when it takes 20 - 30 seconds in the Query window, it’s taking 10 to 15 minutes in C#!

I cannot find any reasoning for this, and after two days of chugging through it, I haven’t gotten any further.

Can someone help shed some light on this subject? Things I can examine?

Thanks

Can you provide a reproducer? Don’t worry that I don’t have the data.
Also ‘raw’ will return strings (versus json) and might be faster.

“select raw meta(mybucket).id from …”

Do you see the same difference if you just return the count(meta(mybucket).id) ?
That would indicate the performance difference is in the results processing.

What version of the .NET SDK are you using? There is no ClusterHelper class in the latest version. Are you using an older version? Or is this some custom wrapper code? If it’s an older SDK, I’d suggest upgrading. If it’s the latest SDK, please post the ClusterHelper code (or perhaps try it without using the ClusterHelper and see if that helps). My suspicions is that there’s an asynchronous coding problem somewhere.

Thanks for the response.

I tried select raw and just used strings, and the result time was the same.
Also, count(meta(mybucket).id) took the same amount of time as pulling the results.

It seems to simply be the line and formTemplate="AppleCompilation"

If I remove it, instead of 250 results, I’ll get 15,000 results. But instead of 900ms, it’ll only take 90.

I’ve tried several different variations of indexes, and they’re all blazing fast in the workbench Query window in Couchbase (one was even as low as 9ms to get my results) but it’s always around 900 - 1000ms in C#. But only for this one query. All of the other queries are just as fast as their Query window counterpart.

The question though is why. It makes no sense to me.

As for providing a reproducer, I’m not 100% sure what you’re asking for. Can you elaborate? Thanks.

It is an older version of the .NET SDK. 2.2.401
So, this is the original Couchbase.ClusterHelper.

We’re simply not at a place where we can be updating to newer versions at the moment.

As I mentioned in another response, nearly every other query is extremely fast, and is just as fast in the C# code as it is when I query the database directly.

It’s simply this one query that takes a massive amount of extra time.

But there’s nothing special or unique about it, so I’m stumped what to do about it or what’s causing it. Especially since the QueryAsync code is in a dll file that I can’t step through and examine during execution.

That tells me the query is executing slower, not the SDK processing results.

That tells me the query executed in the SDK is not the same as the query executed in the UI.

Can you run the query in the ui and in C# and then run this query to see the timing?

SELECT * from system:completed_requests order by requestTime desc limit 2

Augmenting @mreiche’s suggestion, remember to adjust your completed-threshold ( queryCompletedThreshold if using couchbase-cli) to capture both - likely set to 0, run the statement in C# and in the UI then set it back to 1000 (or whatever value you had for it originally).

1 Like

Alright, I was able to figure out the issue!

@mreiche The query you supplied didn’t quite get me the info it was supposed to. Not sure why, but if I did select count(*) from system:completed_requests there were only about 8 results, even though I’ve done dozens.

But it did lead me in the right direction.

I did see one of the queries I’d sent in, and it was of the format select meta(mybucket).id from mybucket where _deleted is missing and organization=$1 and type=$2 and formTemplate=$3

This is how the QueryRequest object stores a query statement. Then, the arguments are stored in an object array. The QueryRequest is then sent in as the parameter for the QueryAsync function.

I decided to try converting the original statement back to actually having the values in it, and then leave the list of arguments blank.

So, instead of sending in organization=$1 and type=$2 and formTemplate=$3 with a list of 3 arguments, I sent in organization="myorganization" and type="compilation" and formTemplate="AppleCompilation" and a list of 0 arguments.

It was lightning fast and just as fast as the UI query window.

So, it seems to me that it’s simply an issue in the Couchbase.ClusterHelper code. Maybe reasons as to why it was deprecated? Maybe having an argument list of more than 2 arguments slowed it way down?
Not sure.

But either way, I found the root cause and have a way around it at the moment. Thanks for pointing me in the right direction.

The issue is that the query plan can’t take into account variables like $1 when selecting an index. For example, if the index has a predicate such as WHERE organization = "myorganization" then you must supply it on the query in that syntax, not using a $1 parameter, or it can’t use the index. This is because it has no way to know at the planning stage that the variable will match that index, so it selects a different index instead.

@btburnett3 That makes perfect sense, and explains a lot of issues. Thanks for the explanation.

See dh’s comment. Only queries excedding the threshold are captured.