Very large "last read: xxx ms ago"

We are having problems with our application timing out on reads. We are seeing the following in the logs:

Caused by: net.spy.memcached.OperationTimeoutException: Timeout waiting for value: waited 2,500 ms. Node status: Connection Status {
sg01-cb1dat-005.core.cvent.org/172.21.80.44:11210 active: true, authed: true, last read: 2,790 ms ago
sg01-cb1dat-004.core.cvent.org/172.21.80.43:11210 active: true, authed: true, last read: 1,494 ms ago
sg01-cb1dat-002.core.cvent.org/172.21.80.17:11210 active: true, authed: true, last read: 329,362 ms ago
sg01-cb1dat-006.core.cvent.org/172.21.80.45:11210 active: true, authed: true, last read: 1,494 ms ago
sg01-cb1dat-001.core.cvent.org/172.21.80.16:11210 active: true, authed: true, last read: 329,363 ms ago
sg01-cb1dat-003.core.cvent.org/172.21.80.18:11210 active: true, authed: true, last read: 329,363 ms ago }
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1241)

The "last read" values for some nodes are absurdly large — 329,363 ms is nearly five and a half minutes.

Other applications are not reporting issues (right now, anyway).

Ugh… This is likely due to a load test running against a reduced set of resources. I'm still not sure why we are seeing the timeouts, but the long times are probably accounted for.

Note that the read times can be long simply because nothing has been sent to that particular node recently. As for the "cause", it's not really possible to put enough information into the exception to show it directly. It could be a JVM GC pause, a network issue, virtualization overprovisioning, or a scheduler issue. The SDK sits on top of a pretty tall stack of abstractions, so it doesn't actually know why; it just knows that that much time has passed and the operation hasn't completed yet.
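To make that concrete, here is a minimal, stdlib-only sketch (the class and method names are hypothetical, not the SDK's actual internals) of the bookkeeping behind "last read: N ms ago": the client stamps a timestamp every time it reads bytes off a node's socket, and the status line just reports the elapsed time since that stamp. A healthy node that simply hasn't received any operations will therefore show a huge number.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of per-node "last read" tracking, for illustration only.
class NodeStatus {
    private final String address;
    private volatile long lastReadNanos;

    NodeStatus(String address) {
        this.address = address;
        this.lastReadNanos = System.nanoTime(); // stamped at connect time
    }

    // Called whenever a response is read from this node's socket.
    void recordRead() {
        lastReadNanos = System.nanoTime();
    }

    // Milliseconds since the last socket read. A node that hasn't been
    // sent any operations recently reports a large value here even
    // though the connection is active and healthy.
    long lastReadMsAgo() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastReadNanos);
    }

    @Override
    public String toString() {
        return address + " active: true, authed: true, last read: "
                + lastReadMsAgo() + " ms ago";
    }
}
```

Note that the number says nothing about *why* no read happened — idle node, GC pause, or network stall all look the same from this vantage point.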
