Consecutive GET_LOCKED warnings resulting in slow application response

Hi All,

We are seeing intermittent spikes on CAS, and after digging through memcached.log we found that application response times are slow whenever we receive a consecutive run of the warnings below.

2020-04-03T17:21:10.558695-05:00 WARNING 384: Slow GET_LOCKED operation on connection: 531 ms ([ x.x.x.x.000 - x.x.x.x:0000 ]) opaque:0x0000000

We cross-checked resident docs and disk, and everything looks good. Can someone please shed some light on this?

Do any Couchbase experts have input on this message? We have also seen it during compaction, which results in a performance issue. What might be causing this problem?

Generally, that message is a result of our internal checks of response times against what we expect. We added quite a bit of this back in Server 5.5 (along with the SDK Response Time Observability) to help narrow the scope when performance issues are reported. So, that message on its own isn’t the cause; it just helps you correlate events to determine the cause.
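To help with that correlation, here is a minimal Python sketch that groups the slow-operation warnings by minute so they can be lined up against compaction windows. The regular expression is modeled only on the single warning line quoted above and may need adjusting for other server versions; the function name `summarize_slow_ops` is my own and not part of any Couchbase tooling.

```python
import re

# Assumption: pattern derived from the one warning line quoted in this thread.
SLOW_OP = re.compile(
    r"^(?P<ts>\S+) WARNING \d+: Slow (?P<op>\w+) operation on connection: "
    r"(?P<ms>\d+) ms"
)


def summarize_slow_ops(lines):
    """Map (minute, operation) -> (warning count, worst latency in ms)."""
    summary = {}
    for line in lines:
        m = SLOW_OP.match(line)
        if m is None:
            continue  # not a slow-operation warning
        # The first 16 characters of the ISO timestamp are YYYY-MM-DDTHH:MM.
        key = (m.group("ts")[:16], m.group("op"))
        count, worst = summary.get(key, (0, 0))
        summary[key] = (count + 1, max(worst, int(m.group("ms"))))
    return summary
```

You could feed it the log with something like `summarize_slow_ops(open("memcached.log"))` and then compare the busy minutes against the times compaction was running.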

If you’re seeing this alongside compaction, I might ask if you’re using ext4 as the filesystem? We have found that, not through any fault of Couchbase but just how ext4 and IO scheduling in Linux work, there is a big lock when using ext4 that may prevent other threads which are not involved in that IO from proceeding. Using xfs, at least as of when I did some investigation on this, seems to be the best fix.

This is in the documentation (though not for 6.x, for which I’ve reopened the doc issue, DOC-1758), but it is not a very strong statement. There are many details in MB-16750, MB-18845 and some others.

I don’t know that’s the cause here, but it very much sounds like what I’ve seen before. You might check/confirm in your environment.
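One way to confirm the filesystem on Linux is to look up which mount holds the data directory. The sketch below is only an illustration: `filesystem_for` is a hypothetical helper, and the data path in the usage note is the usual default install location, which may differ in your environment.

```python
import os


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path.

    mounts_text is the content of /proc/mounts (whitespace-separated fields:
    device, mount point, filesystem type, options, ...).
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /opt does not match /optical.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

For example, `filesystem_for("/opt/couchbase/var/lib/couchbase/data", open("/proc/mounts").read())` would report whether that directory sits on ext4 or xfs. Note that /proc/mounts escapes spaces in mount points, which this sketch does not handle.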

Thanks for the explanation. Appreciated very much!

No problem. I’ll be curious to see what you find!