Misconception about scaling (query speed) with several nodes

benjamin_glatzeder · August 30, 2019, 4:30pm

Setup: Couchbase Server 6.5 Beta EE on two nodes. Nodes connected via switch with 1 Gbps.

This post is about my misconception. It isn’t about “this query is slow” or “how can I create an index to speed things up”.

I intentionally excecuted a slow long-running query. Setup is two nodes, both have the query service. Below are screenshots of how many gets per second are executed. As I understand gets per second is a good measure on how fast this particular query runs. The query is the last one run in the screenshot which took several minutes. Both nodes average around 12.5k gets per second. Together around 25k gets per second.

Setup is a single node. Average gets per second is ~ 30k gets per second.

Setup is two nodes. Only one node has query service. Average gets per seconds and graph spikes are very similar to the first setup.

Conclusion is that a single node is faster than 2 nodes. My conception was that several nodes share the query execution. I also hoped that it scales nearly linearly. Clearly I’m wrong and I’m looking for some insight.

Thanks!

benjamin_glatzeder · September 1, 2019, 8:15pm

I did some more testing. For the following graphs I went back to Couchbase Server 6.0 CE on Ubuntu 16.04. Hardware setup is:

2 nodes
same hard drive
same RAM
1x 6 core / 12 thread CPU
1x 4 core / 8 thread CPU

In all three tests following below 100% of documents were resident (in RAM). There were no cache misses, hence the hard drive is not important. This time I ran a query which only took a few secods. The query was similar to this query:

SELECT COUNT(*) 
FROM my_bucket
WHERE type = 'myType'
AND myInteger = 0;

The query used this index:

CREATE INDEX `idx_type` ON `my_bucket`(`type`)

Test A

with 6 core CPU
single node cluster
gets per second: 60k - 70k

Test B

with 4 core CPU
single node cluster
gets per second: 30k

Test C

both CPUs
cluster with 2 nodes. Replication set to 1
gets per second: > 40k

Test C 2.png1144×400 34 KB

Some notes:

Nodes should have equal hardware, especially the CPU
Multidimensional scaling is not supported in CE
gets per second from Test A + gets per second from Test B is not equal to gets per second from Test C
2 nodes are slower than 1 node. Although the 1 node is part of the 2 node cluster.

Question

How many gets per second can be expected if both nodes had the same 6 core CPU? Would it be 2x 60 - 70k -> 120k - 140k?

househippo · September 1, 2019, 10:24pm

You can alway change the query service setting to get more keys per batch. Currently since you index doesn’t fully cover your query it must fetch the keys and the count them.

I would recommend you create a better index like:
CREATE INDEX idx_type_1 ON my_bucket(type) WHERE type = "myType" and myInteger = 0

re-run your query and check for index service time in row 3 , column 4 below and index cache miss ratio in row 4 , column 2. Since you’re running all the service the machines, its better to make the service do less work and move less data around when you want to query them.

You’ll find that you probably have to scale to bigger boxes. You’ll see it in you query with “kernTime”. It will grow and grow as your cluster gets bigger and the service fight for resource from the kernel. The Couchbase service will need to be isolated on different machines for better performance.

benjamin_glatzeder · September 2, 2019, 11:14am

@househippo I understand that covering all by an index is going to speed up the query by a lot. My current dev setup was a single node and I assumed by adding another node the query time would be cut in half. I was wrong about that. 2 nodes are slower than a single node. My findings can be found in my previous post in this thread.

I believe this option does not exist in CE. In EE there are:

query pipeline batch
query pipeline cap
query scan cap
query timeout
query prepared limit
query completed threshold
query log level
query max parallelism
What is the setting to get more keys per batch please?

I created the index. Querying for type and myInteger = 0 shows in the statistics that there was 1 request/sec. All other graphs don’t change.
Querying for type and myInteger = 1 shows 60M bytes returned / sec for 1 second. Scan latency is 4000ns for 1 second. I will need to read some documentation or experiences as I wouldn’t know if this is good performance for my particular hardware.

I was unable to find kernTime in CE. Bigger boxes can mean many different things. My goal for development purposes is to try out many queries. I might not get a query right the first time, need to edit the query run again, and again. So I won’t have everything indexed. This is all to save time while developing. Once I have the right query then I can create and index for production. So for the (document) gets per second should I invest in:

several nodes vs single node
high core count vs high base/boost frequency (i.e. 5Ghz)
fast RAM vs tighter timings

Since I import a backup of the production bucket into my dev cluster I could also only backup a small part of the bucket. That might be the best option. Still interested in the other things though. And in case I didn’t mention it before I’m not complaining about any slowness but rather interested in how to speed things up (without creating an index).

Thanks!

benjamin_glatzeder · September 5, 2019, 4:33pm

Some more info provided by @vsr1:

This answers at least this bullet point:

high core count vs high base/boost frequency (i.e. 5Ghz)

Topic		Replies	Views
N1QL run fast in one node and slow in other node of couchbase cluster with 2 nodes Couchbase Server	13	2407	October 17, 2016
Couchbase : 160x faster with only one node : why? Couchbase Server	1	1727	May 13, 2014
Performance Basic Query Update Couchbase Server	12	2275	September 13, 2019
N1qL query running too slow on cluster SQL++ query , n1ql , index	1	557	August 4, 2022
How much queries can I run in one second? SQL++	6	361	March 18, 2024

Misconception about scaling (query speed) with several nodes

Test A

Test B

Test C

Question

Related topics