N1QL queries slower on an M1 Mac (16GB) than on a free-tier EC2 returning similar results in MySQL

Hi all,
I am tasked with porting our back-end systems from MySQL to Couchbase (a major rewrite of all apps). CB is pretty darn cool when it comes to clusters, replication, etc.

However, I’ve hit a roadblock. We have ~60k products in our db. Simple products: barcodes, descriptions, prices, etc.

MySQL has appropriate indexes and runs on an EC2 small instance (2GB RAM plus swap).

When running a MySQL query on EC2 (EBS network storage), “select * from products”, the results typically take 0-1ms (effectively instant).

I am running Couchbase on an M1 Max with 32GB. I have allocated 18GB to the bucket, 4GB to indexing, and so on, so memory should not be a limiting factor.

I understand Couchbase is different (which is why we want to move towards it), but an equivalent N1QL query (“select meta().id, products.* from products”) on my M1 takes 700ms-1.1s!

I (believe I…) have the right indexes, and I understand that under the hood it retrieves results differently (fetching each document by key from the KV service), but I’d like to resolve this latency issue and I’ve hit a wall.

Any input would be greatly appreciated!

Can you post the query plan? You can get it from the web UI, on the Query tab. Also, in both scenarios, where is the client in relation to the server? Network latency will play a big part.
Thanks.

Thanks for the quick response.

As far as network latency goes, this is all on localhost, executing the queries through the web GUI.

Here is the query plan:

{
  "#operator": "Authorize",
  "#stats": {
    "#phaseSwitches": 4,
    "execTime": "192.207µs",
    "servTime": "18.042µs"
  },
  "privileges": {
    "List": [
      {
        "Target": "default:devbucket.dev.products",
        "Priv": 7,
        "Props": 0
      }
    ]
  },
  "~child": {
    "#operator": "Sequence",
    "#stats": {
      "#phaseSwitches": 2,
      "execTime": "3.166µs"
    },
    "~children": [
      {
        "#operator": "PrimaryScan3",
        "#stats": {
          "#heartbeatYields": 1,
          "#itemsOut": 60021,
          "#phaseSwitches": 240087,
          "execTime": "47.361804ms",
          "kernTime": "441.306666ms",
          "servTime": "20.381571ms"
        },
        "bucket": "devbucket",
        "index": "#primary",
        "index_projection": {
          "primary_key": true
        },
        "keyspace": "products",
        "namespace": "default",
        "scope": "dev",
        "using": "gsi",
        "#time_normal": "00:00.067",
        "#time_absolute": 0.067743375
      },
      {
        "#operator": "Fetch",
        "#stats": {
          "#heartbeatYields": 261,
          "#itemsIn": 60021,
          "#itemsOut": 60021,
          "#phaseSwitches": 240334,
          "execTime": "38.642939ms",
          "kernTime": "28.578401ms",
          "servTime": "447.998827ms"
        },
        "bucket": "devbucket",
        "keyspace": "products",
        "namespace": "default",
        "scope": "dev",
        "#time_normal": "00:00.486",
        "#time_absolute": 0.486641766
      },
      {
        "#operator": "InitialProject",
        "#stats": {
          "#itemsIn": 60021,
          "#itemsOut": 60021,
          "#phaseSwitches": 180068,
          "execTime": "79.833966ms",
          "kernTime": "380.501263ms",
          "state": "running"
        },
        "discard_original": true,
        "result_terms": [
          {
            "expr": "(products.barcode)"
          }
        ],
        "#time_normal": "00:00.079",
        "#time_absolute": 0.079833966
      },
      {
        "#operator": "Stream",
        "#stats": {
          "#itemsIn": 60021,
          "#itemsOut": 60021,
          "#phaseSwitches": 60022,
          "execTime": "48.589217ms"
        },
        "#time_normal": "00:00.048",
        "#time_absolute": 0.048589217
      }
    ],
    "#time_normal": "00:00.000",
    "#time_absolute": 0.000003166
  },
  "~versions": [
    "7.2.1-N1QL",
    "7.2.1-5934-enterprise"
  ],
  "#time_normal": "00:00.000",
  "#time_absolute": 0.00021024899999999997
}

And how are you measuring the MySQL timing?
By the way, using RAW in the SELECT will make it a little faster.
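A minimal sketch of the RAW difference using the Couchbase Java SDK 3.x (the connection string and credentials are placeholder assumptions; the keyspace names are the ones from this thread). RAW returns each row as a bare value instead of a wrapped object, which trims serialization and projection overhead:

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.query.QueryResult;
import java.util.List;

public class RawQueryExample {
    public static void main(String[] args) {
        // Placeholder connection details; adjust for your environment.
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "Administrator", "password");

        // Without RAW, each row arrives as an object, e.g. {"id": "..."}.
        QueryResult wrapped = cluster.query(
            "SELECT META(p).id FROM `devbucket`.`dev`.`products` p");
        List<JsonObject> rows = wrapped.rowsAsObject();

        // With RAW, each row is the bare value itself, readable as a plain String.
        QueryResult raw = cluster.query(
            "SELECT RAW META(p).id FROM `devbucket`.`dev`.`products` p");
        List<String> ids = raw.rowsAs(String.class);

        System.out.println(rows.size() + " wrapped rows, " + ids.size() + " raw ids");
        cluster.disconnect();
    }
}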

Thanks - I tried RAW; it came down to ~450ms.

In MySQL:

“select barcode from products where active = 1;”
… yields 65698 rows in set (0.025 sec)

What’s interesting is that this is an EC2 small, which shouldn’t touch the sides of an M1!

For the MySQL figure - is it timing the retrieval of the rows to the client? The Couchbase timing is.

Also - if you want the fastest performance from Couchbase, it’s usually best to get only the document IDs from the query and stream those into the reactive KV API. This avoids a (potential) extra hop from the query-executing node to the node where the active document resides. If you have a multi-node cluster, that will make a difference.
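A minimal sketch of that pattern with the reactive Java SDK 3.x (connection string and credentials are placeholder assumptions; the keyspace is the devbucket.dev.products one from this thread):

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;
import com.couchbase.client.java.json.JsonObject;
import java.util.List;

public class StreamDocsExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "Administrator", "password");
        ReactiveCollection products = cluster.bucket("devbucket")
            .scope("dev").collection("products").reactive();

        // 1) Ask the query service for the IDs only (no document Fetch on the query side).
        // 2) Fan out the KV gets with bounded concurrency; each get goes straight to the
        //    node that owns the document, skipping the extra hop through the query node.
        List<JsonObject> docs = cluster.reactive()
            .query("SELECT RAW META(p).id FROM `devbucket`.`dev`.`products` p")
            .flatMapMany(result -> result.rowsAs(String.class))
            .flatMap(products::get, 64)                  // up to 64 in-flight KV gets
            .map(getResult -> getResult.contentAsObject())
            .collectList()
            .block();

        System.out.println("Fetched " + docs.size() + " documents");
        cluster.disconnect();
    }
}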

For retrieving all the documents in a collection, a KV range scan has the potential to be even faster (the initial query is not needed).

It is, yeah.

I agree re trying to get IDs only - and typically this is how I’d query the database. Unfortunately, sometimes I need to get the whole lot. In one case, for example, I’m performing a Jaccard search on each (well, most) of the fields within the document. I even looked at implementing the Jaccard function as a UDF, but it was still very slow.
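For reference, Jaccard similarity is |A ∩ B| / |A ∪ B|. A minimal client-side sketch over two token sets (the helper below is hypothetical, purely for illustration, and would run over documents streamed from KV as above):

import java.util.HashSet;
import java.util.Set;

public class Jaccard {
    // Jaccard similarity = |intersection| / |union|; returns 1.0 for two empty sets.
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> x = Set.of("acme", "widget", "blue");
        Set<String> y = Set.of("acme", "widget", "red");
        System.out.println(similarity(x, y)); // prints 0.5
    }
}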

For retrieving all the documents in a collection, a KV range scan has the potential to be even faster (the initial query is not needed).

What do you mean? Sounds like it might be where I need to look next…?

If you are just doing select * from products without any filtering, every key is fetched from the primary index and then each complete document is retrieved from the KV service (that is the Fetch step dominating your plan). If you have an index on (active, barcode), or a partial index on (barcode) where active = 1, and you test with select barcode from products where active = 1, no full-document fetch is needed (it becomes a covered query) and you should see improved performance.

create index ix1 on products(active, barcode)
create index ix2 on products(barcode) where active = 1

HTH.
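A quick way to check the covering behaviour (a sketch under the same placeholder-credentials assumption, using the ix2 partial index above): run EXPLAIN and confirm the plan lists the index under “covers” and contains no Fetch operator, then run the query itself at scope level.

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Scope;
import com.couchbase.client.java.query.QueryResult;
import java.util.List;

public class CoveredQueryExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "Administrator", "password");
        Scope dev = cluster.bucket("devbucket").scope("dev");

        // EXPLAIN shows whether the planner chose ix2 and whether the scan covers the query
        // (no Fetch operator should appear in the plan).
        QueryResult explain = dev.query("EXPLAIN SELECT barcode FROM products WHERE active = 1");
        System.out.println(explain.rowsAsObject().get(0));

        // The covered query itself: answered entirely from the index, no document fetch.
        // RAW + String assumes barcode is stored as a string; use rowsAsObject() otherwise.
        QueryResult result = dev.query("SELECT RAW barcode FROM products WHERE active = 1");
        List<String> barcodes = result.rowsAs(String.class);
        System.out.println(barcodes.size() + " barcodes");

        cluster.disconnect();
    }
}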
