We are seeing issues with the latest Couchbase SDK 2.7, where it is not able to scale up.
When I hit the plain Vert.x health URL that returns a JSON payload, I get somewhere around 60k tps. If I replace it with a reactive Couchbase SDK get() call, throughput comes down to 5k. The server the application runs on has 1 CPU core and 4 GB RAM.
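To make the comparison concrete, here is roughly what the two endpoints look like (a hypothetical sketch of the test, not the actual code; the route paths, document key, and setup are illustrative):

```java
import io.vertx.core.json.JsonObject;
import io.vertx.ext.web.Router;
import com.couchbase.client.java.document.JsonDocument;

// Inside a verticle, with `router` created and `bucket` already opened.

// Baseline endpoint: builds a small JSON body in memory, no I/O (~60k tps).
router.get("/health").handler(ctx ->
    ctx.response()
       .putHeader("content-type", "application/json")
       .end(new JsonObject().put("status", "UP").encode()));

// Variant: same handler shape, but the body comes from a reactive get() (~5k tps).
router.get("/health-db").handler(ctx ->
    bucket.async()
          .get("health-doc")                       // hypothetical key
          .subscribe((JsonDocument doc) ->
              ctx.response()
                 .putHeader("content-type", "application/json")
                 .end(doc.content().toString())));
```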
The Couchbase cluster is a three-node cluster with 100% of the data in memory, and there is no CPU increase on the Couchbase nodes during the test.
I can share a reproducer.
I would like to know how this can be resolved through this forum.
Does this mean it performed differently with an earlier 2.7?
From my read of the description of your test, you were earlier returning some in-memory structure directly, and now you’re making a Couchbase get() call, which is obviously going to involve some network I/O, even if on localhost. My guess is you’re testing this with a simple loop, fetching as fast as you can?
If so, what you’re seeing is probably expected. The additional latency of fetching the item from Couchbase would drop your throughput. If you add more concurrency, your throughput would go up. There are more tuning options, but that’s more about optimization than understanding why you see a difference.
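For instance (a sketch, not your code; assumes SDK 2.x’s AsyncBucket with RxJava 1.x and a hypothetical `keys` list), fanning out the gets instead of issuing them one at a time:

```java
import java.util.List;
import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.document.JsonDocument;
import rx.Observable;

// One at a time, throughput ≈ 1/latency; overlapped, it approaches concurrency/latency.
List<JsonDocument> docs = Observable
    .from(keys)                 // keys: a hypothetical List<String> of document IDs
    .flatMap(bucket::get)       // bucket: an AsyncBucket; the gets run concurrently
    .toList()
    .toBlocking()               // blocking here only to keep the example short
    .single();
```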
See also Little’s Law. Since the average wait time is relatively fixed (e.g., your network I/O, whether localhost or actual, plus processing time), adding concurrency means turning up the arrival rate, which improves throughput until something deeper in the system becomes the tall pole.
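To put illustrative numbers on that (made up, not measurements):

```java
// Little's Law: L = λ · W, so the throughput ceiling is λ = L / W.
double w = 0.002;             // assume a 2 ms average round trip (network + processing)
System.out.println(1 / w);    // 1 request in flight    -> ~500 ops/s ceiling
System.out.println(100 / w);  // 100 requests in flight -> ~50,000 ops/s ceiling
```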
Popping this up a level, are you trying to implement a health check? If so, fetching a single document probably isn’t the best way to do this.
We have a couple of health-check APIs, ping() and diagnostics(), that use NOOPs and can fan out to the different services on the system. You may want to check those out in the docs.
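In SDK 2.x that looks roughly like this (a sketch; the report types and exact export methods may vary slightly by SDK version):

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;

Cluster cluster = CouchbaseCluster.create("127.0.0.1");
Bucket bucket = cluster.openBucket("example-bucket");   // hypothetical bucket name

// ping() actively sends lightweight ops to each service and reports per-service latency.
System.out.println(bucket.ping().exportToJson());

// diagnostics() passively reports the state of the connections the SDK already holds.
System.out.println(cluster.diagnostics().exportToJson());
```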
Also, if you’re hitting one key, you’re forcing some overhead internal to the Data Service to access that item. You may not see significant CPU usage because it’s all in memory and you may have a lot of cores, but guaranteed there is a hot lock with a busy loop like this.
Could you explain the difference between tests 1, 2, and 3? What is different, and what is outside what you expected to see?
The point I am trying to prove is that when you increase the number of connections, Couchbase starts taking more time, and that drags down the overall metrics. The health check is not doing any real processing beyond the JSON creation work, yet it still delivers good performance.
Now, Vert.x and Couchbase both have reactive libraries and are meant for the cloud; is this the best we can get with the constrained hardware requirement?
The tests I am running are on a VM with 1 CPU core and 4 GB RAM, and the results shared are from that VM. I get worse performance when I run the same code on OpenShift 3.11 with a pod of the same size as the VM.
Is there any optimization we can do (I am using JsonDocument and could live with RawJsonDocument), or any other consideration, so that we can achieve at least 10,000 tps with, say, 100 concurrent users and a 99th percentile within 10 ms? That way we can set the number of pods based on the number of connections the application is going to get.
Does this mean you ran with multiple concurrent requests? If so, how many?
I’m glad to try to help, of course. I don’t have the time at the moment to read the code and try to reproduce it for you, but we can try to guide you in your investigation.
This sounds like it should be quite doable, yes. I don’t think you need to change the transcoder just yet, unless you have profiled and know you’re CPU bound. My suspicion is still that the difference between your two environments is that the latency goes up (which is not unreasonable), and that drops the throughput for a small number of tight loops. But if you have your 100 concurrent users, the kind of throughput you describe seems quite doable.
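If profiling ever does show you CPU bound on JSON decoding, the cheap switch (a sketch, SDK 2.x) is to ask for RawJsonDocument, which skips building a JsonObject and hands back the raw JSON string:

```java
import com.couchbase.client.java.document.RawJsonDocument;
import rx.Observable;

// The body arrives as the original JSON string, so a pass-through endpoint
// can write it straight to the response without re-encoding.
Observable<RawJsonDocument> doc =
    bucket.async().get("health-doc", RawJsonDocument.class);  // hypothetical key
```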