@chvck
I see these timeouts occurring in 2 different use cases -
One case is where timeouts occur throughout the day for random operations. This occurs across all clients. Haven’t been able to figure out a reason for this. The logs attached previously are for this case
Second case is where timeouts occur as a spike on individual clients. This affects about 4k-5k operations and spans a duration of 2-3 seconds. These clients are those with high load so it could be an impact of that. But I would like to know why it occurs at high load too. Attaching the logs for this use case too. (And adding the filtered logs in the message for quick reference). It looks like client got the response back after 3 seconds but it should not ideally take that long. Our end to end 99%tile latency is < 50 ms
Couchbase Errors.zip (27.0 KB)
17475778ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2491 ms. This is not an error
17476583ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17478268ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17479132ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17480771ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2494 ms. This is not an error
17482296ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 1803 ms. This is not an error
17484514ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 622 ms. This is not an error
17484607ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 1057 ms. This is not an error
17487837ms [I0] {23096} [WARN] (server - L:459) <X:11210> (SRV=0x25d76b0,IX=0) Failing command (pkt=0xcc217b0, opaque=11550189, opcode=0x1d) with error LCB_ETIMEDOUT (0x17)
17488673ms [I0] {23096} [TRACE] (confmon - L:252) Refreshing current cluster map
17488673ms [I0] {23096} [ERROR] (server - L:534) <X:11210> (SRV=0x25d76b0,IX=0) Server timed out. Some commands have failed
17488673ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 0 ms. This is not an error
17488799ms [I2] {23096} [TRACE] (confmon - L:252) Refreshing current cluster map
17488799ms [I2] {23096} [ERROR] (server - L:534) <X:11210> (SRV=0x2718780,IX=0) Server timed out. Some commands have failed
17488799ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17489275ms [I0] {23096} [INFO] (confmon - L:145) Not applying configuration received via CCCP. No changes detected. A.rev=4816, B.rev=4816
17489275ms [I0] {23096} [TRACE] (confmon - L:239) Attempting to retrieve cluster map via CCCP
17489275ms [I0] {23096} [INFO] (cccp - L:143) Re-Issuing CCCP Command on server struct 0x25d76b0 (X:11210)
17491568ms [I0] {23096} [ERROR] (server - L:534) <X:11210> (SRV=0x25d76b0,IX=0) Server timed out. Some commands have failed
17491568ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 0 ms. This is not an error
17491683ms [I0] {23096} [ERROR] (cccp - L:164) <NOHOST:NOPORT> Could not get configuration: LCB_ETIMEDOUT (0x17)
17491683ms [I0] {23096} [INFO] (confmon - L:177) Provider 'CCCP' failed
17491683ms [I0] {23096} [TRACE] (confmon - L:201) Maximum provider reached. Resetting index
17491824ms [I0] {23096} [INFO] (bootstrap - L:164) Not requesting a config refresh because of throttling parameters. Next refresh possible in 6849ms or 99 errors. See LCB_CNTL_CONFDELAY_THRESH and LCB_CNTL_CONFERRTHRESH to modify the throttling settings
17492660ms [I0] {23096} [WARN] (server - L:325) <X:11210> (SRV=0x25d76b0,IX=0) Server sent us reply for a timed-out command. (OP=0x1d, RC=0x0, SEQ=11550189)
17492660ms [I0] {23096} [WARN] (server - L:325) <X:11210> (SRV=0x25d76b0,IX=0) Server sent us reply for a timed-out command. (OP=0x1d, RC=0x0, SEQ=11550190)
17494297ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 788 ms. This is not an error
17494937ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2450 ms. This is not an error
17495095ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2264 ms. This is not an error
17497359ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2497 ms. This is not an error
17497388ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17499857ms [I0] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x25d76b0,IX=0) Scheduling next timeout for 2500 ms. This is not an error
17499888ms [I2] {23096} [TRACE] (server - L:538) <X:11210> (SRV=0x2718780,IX=0) Scheduling next timeout for 2500 ms. This is not an error