Uneven number of active requests across index/query nodes
There are 4 index/query nodes.
One of the nodes has at least double the number of requests of the others, and as a result CPU usage alerts are triggering.
How are the requests being issued? From the SDK in a client app? Which SDK, which version? Is the Cluster connection created multiple times? Is the app executed multiple times? Is it just the CPU that is unevenly distributed, or also the number of requests? Does system:completed_requests also show the unbalanced distribution?
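On the connection question: if a client builds a new cluster connection per request or per worker, each new connection may start its own node selection from scratch, which can skew load. A minimal sketch of sharing a single connection with `sync.Once`; `Conn` is a hypothetical placeholder, not a gocb type, and this does not show the gocb 1.6 API.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for an SDK cluster handle (NOT a gocb type).
type Conn struct {
	ID int
}

var (
	connOnce sync.Once
	conn     *Conn
	opened   int // how many times a connection was actually opened
)

// getConn returns the shared connection, opening it only on the first call.
// sync.Once guarantees the function runs exactly once, even under concurrency.
func getConn() *Conn {
	connOnce.Do(func() {
		opened++
		conn = &Conn{ID: opened}
	})
	return conn
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getConn() // every goroutine reuses the same handle
		}()
	}
	wg.Wait()
	fmt.Println("connections opened:", opened)
}
```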
The requests are executed via the SDK:
Go SDK, version 1.6
The Cluster connection is established once
The CPU is unevenly distributed, and so are the requests
Completed requests show a different pattern
The request counts are below.
Node 3 is the one having the issue.
Active requests
active_requests | node |
---|---|
528 | node 3 |
185 | node 4 |
137 | node 2 |
128 | node 1 |
Completed requests
completed_requests | node |
---|---|
133 | node 1 |
103 | node 2 |
99 | node 3 |
62 | node 4 |
528 active requests on one node seems like a lot. Suppose node 3 has 16 cores: if all the requests were CPU-bound, each of those 528 requests would get 1/33 of a core. It's worth looking into what those requests are and perhaps trying to optimize them.
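The same back-of-the-envelope estimate for every node, using the active-request counts from the table above and assuming, as in the reply, 16 cores per node (the real core count should be checked). It is only a rough contention indicator under the all-CPU-bound assumption, not a measurement.

```go
package main

import "fmt"

// coreShare estimates the fraction of a core each active request gets,
// assuming every request is CPU-bound and cores are shared evenly.
// It returns 0 when there are no active requests (no contention to measure).
func coreShare(cores, activeRequests int) float64 {
	if activeRequests <= 0 {
		return 0
	}
	return float64(cores) / float64(activeRequests)
}

func main() {
	const assumedCores = 16 // assumption carried over from the reply
	nodes := []string{"node 1", "node 2", "node 3", "node 4"}
	// Active-request counts copied from the table in this thread.
	active := map[string]int{"node 1": 128, "node 2": 137, "node 3": 528, "node 4": 185}
	for _, n := range nodes {
		s := coreShare(assumedCores, active[n])
		fmt.Printf("%s: %d active, %.3f core per request (about 1/%.0f)\n", n, active[n], s, 1/s)
	}
}
```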
If you are using transactions, all requests within a transaction are sent to the same node.
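A simplified model of why that matters, assuming a hypothetical `Dispatcher` (not an SDK type) that round-robins standalone requests but pins every statement of a transaction to the node picked for its first statement. Under that assumption a single long transaction is enough to pile requests onto one node.

```go
package main

import "fmt"

// Dispatcher is a hypothetical model of client-side node selection.
// Standalone requests go round-robin; requests carrying a transaction ID
// stick to the node first chosen for that transaction.
type Dispatcher struct {
	nodes  []string
	next   int
	pinned map[string]string // txID -> node
}

func NewDispatcher(nodes []string) *Dispatcher {
	return &Dispatcher{nodes: nodes, pinned: make(map[string]string)}
}

// roundRobin returns the next node in order and advances the position.
func (d *Dispatcher) roundRobin() string {
	n := d.nodes[d.next%len(d.nodes)]
	d.next++
	return n
}

// Pick returns the node for a request; txID == "" means no transaction.
func (d *Dispatcher) Pick(txID string) string {
	if txID == "" {
		return d.roundRobin()
	}
	if n, ok := d.pinned[txID]; ok {
		return n // later statements reuse the transaction's node
	}
	n := d.roundRobin()
	d.pinned[txID] = n
	return n
}

func main() {
	d := NewDispatcher([]string{"node 1", "node 2", "node 3", "node 4"})
	counts := map[string]int{}
	// 4 standalone requests, then one transaction with 8 statements.
	for i := 0; i < 4; i++ {
		counts[d.Pick("")]++
	}
	for i := 0; i < 8; i++ {
		counts[d.Pick("tx-1")]++
	}
	fmt.Println(counts)
}
```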
The completed requests look weighted towards the nodes in order (node 1 > node 2 > node 3 > node 4), which is what would happen if the queries were balanced across the nodes in order and (re)started from node 1 multiple times.
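A toy simulation of that hypothesis, assuming a plain round-robin picker whose position resets to node 1 on every (re)start. The per-run query counts are made up for illustration, not taken from this cluster. With this hypothetical workload it prints `[12 8 4 2]`, the same descending shape as the completed-request counts.

```go
package main

import "fmt"

// simulateRuns models a client whose round-robin position resets to node 1
// every time it starts (e.g. the app or its connection is re-created).
// Run i issues perRun[i] queries; the result holds the queries each node received.
func simulateRuns(numNodes int, perRun []int) []int {
	counts := make([]int, numNodes)
	for _, n := range perRun {
		next := 0 // position resets at every (re)start
		for q := 0; q < n; q++ {
			counts[next%numNodes]++
			next++
		}
	}
	return counts
}

func main() {
	// Hypothetical workload: many short runs that each issue only a few queries.
	runs := []int{1, 2, 3, 1, 2, 5, 3, 1, 2, 6}
	fmt.Println(simulateRuns(4, runs))
}
```

Because short runs rarely wrap around to the last nodes, lower-numbered nodes accumulate more requests; long-lived clients that keep one connection would not show this skew in this model.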
Thanks for the insights.
Will review the CPU-bound queries if the issue persists,
and check whether transactions are used.
Will a graceful restart of node 3 help redistribute the requests evenly across all nodes in the cluster?
As far as I know, there is no mechanism to preserve query requests across restarts. The requests would disappear and node 3 might have 0 active requests, but the SDK might retry them.
What timeout is the SDK using on the requests? The default is 75 seconds for query requests. After 75 seconds the query service and/or the SDK should cancel the requests.