We are facing frequent timeouts in our application while fetching data from Couchbase (memcached bucket).
Here are the application logs:
Causing: net.spy.memcached.OperationTimeoutException: Timeout waiting for bulk values: waited 400 ms. Node status: Connection Status { /10.52.67.80:11210 active: true, authed: true, last read: 17 ms ago /10.52.51.29:11210 active: true, authed: true, last read: 83 ms ago /10.52.18.254:11210 active: true, authed: true, last read: 85 ms ago /10.51.18.213:11210 active: true, authed: true, last read: 14 ms ago /10.50.99.13:11210 active: true, authed: true, last read: 78 ms ago /10.50.163.60:11210 active: true, authed: true, last read: 105 ms ago /10.49.162.254:11210 active: true, authed: true, last read: 11 ms ago }
Here 400 ms is the timeout that we have set on the application side. We observe this during load testing: whenever we cross a certain QPS threshold, we start seeing these timeouts.
One more thing we observed: while checking the logs of the Couchbase nodes, we found that only one machine showed lots of warnings like these:
Slow SET operation on connection (10.50.246.249:44598 => 10.50.163.60:11210): 2513 ms
WARNING 168: Slow SET operation on connection (10.50.37.200:43754 => 10.50.163.60:11210): 1149 ms
WARNING 196: Slow SET operation on connection (10.49.117.39:60748 => 10.50.163.60:11210): 1039 ms
WARNING 410: Slow SET operation on connection (10.52.179.44:34924 => 10.50.163.60:11210): 1069 ms
WARNING 79: Slow GET operation on connection (10.53.132.32:43836 => 10.50.163.60:11210): 1590 ms
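To confirm that the slow operations really concentrate on a single node, the server logs can be tallied per server address. The following is only a rough sketch of how that could be done: the class and method names (`SlowOpTally`, `countByServer`, `maxMillis`) are made up for illustration, and the regex assumes the log lines look exactly like the ones pasted above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowOpTally {
    // Assumed format, taken from the log lines above:
    // "Slow SET operation on connection (client:port => server:port): 2513 ms"
    private static final Pattern SLOW_OP = Pattern.compile(
        "Slow (\\w+) operation on connection \\(([^\\s)]+) => ([^\\s)]+)\\): (\\d+) ms");

    /** Returns server address -> number of slow operations, in first-seen order. */
    static Map<String, Integer> countByServer(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = SLOW_OP.matcher(line);
            if (m.find()) {
                // group(3) is the server side of the connection
                counts.merge(m.group(3), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the largest reported duration in ms, or -1 if no line matches. */
    static long maxMillis(List<String> lines) {
        long max = -1;
        for (String line : lines) {
            Matcher m = SLOW_OP.matcher(line);
            if (m.find()) {
                max = Math.max(max, Long.parseLong(m.group(4)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Slow SET operation on connection (10.50.246.249:44598 => 10.50.163.60:11210): 2513 ms",
            "WARNING 168: Slow SET operation on connection (10.50.37.200:43754 => 10.50.163.60:11210): 1149 ms",
            "WARNING 196: Slow SET operation on connection (10.49.117.39:60748 => 10.50.163.60:11210): 1039 ms",
            "WARNING 410: Slow SET operation on connection (10.52.179.44:34924 => 10.50.163.60:11210): 1069 ms",
            "WARNING 79: Slow GET operation on connection (10.53.132.32:43836 => 10.50.163.60:11210): 1590 ms");
        System.out.println(countByServer(lines)); // {10.50.163.60:11210=5}
        System.out.println(maxMillis(lines));     // 2513
    }
}
```

On the lines above, every slow operation lands on 10.50.163.60:11210, although the client addresses differ.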
So we removed this machine from the cluster and repeated the load test, thinking this node might be causing the issue.
But during that test we again observed the same Slow SET operations, this time on a different node, and again only on that one node.
Can anyone help with this?