Unrecoverable error in socket timeouts to KeyValueEndpoint on 11210

bsiggers · July 7, 2017, 8:19pm

Hello all,

We’re currently experiencing some intermittent errors in our java application which uses spring-data-couchbase (2.0.0.RELEASE). We’re running couchbase community edition 4.5.0-2601 on a four-node cluster.

Every 3-7 days, the following error shows up in our Java logs. When it does, it is basically the ‘kiss of death’: most key-value operations from the Java component that logged the error grind to a halt:

2017-07-07 08:13:09.107 WARN [cb-io-1-23] c.c.client.core.endpoint.Endpoint : [xxxxxxx/10.224.165.186:11210][KeyValueEndpoint]: Socket connect took longer than specified timeout.

After this comes a litany of ConcurrentTimeoutExceptions for any key-value operations. Bouncing the java application resolves the issue.

As some additional info, we have started keeping track of open TCP connections on port 11210. We almost always have 16 connections open (which makes sense: kvServiceEndpoints = 4, times 4 Couchbase nodes, gives 16), but once the log entry above shows up, we end up with a permanent maximum of 15 connections, and the missing one never recovers.
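For anyone who wants to track the same thing, here is a minimal sketch of the check described above: compute the expected number of KV sockets and count the ESTABLISHED connections to port 11210 in `netstat -tn` style output. The class and method names are made up for illustration, the column layout assumes Linux `netstat`, and the local address 10.0.0.5 in the sample is a placeholder rather than a real value from this setup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Couchbase SDK.
final class KvConnectionCheck {

    /** Expected KV sockets: one per configured endpoint per node. */
    static int expectedConnections(int kvServiceEndpoints, int nodeCount) {
        return kvServiceEndpoints * nodeCount;
    }

    /**
     * Counts ESTABLISHED connections whose remote address ends with the port.
     * Assumes the Linux `netstat -tn` layout:
     * Proto Recv-Q Send-Q Local-Address Foreign-Address State
     */
    static long countEstablished(List<String> netstatLines, int remotePort) {
        String suffix = ":" + remotePort;
        return netstatLines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(cols -> cols.length >= 6)          // skip headers and short lines
                .filter(cols -> cols[4].endsWith(suffix))   // foreign address column
                .filter(cols -> cols[5].equals("ESTABLISHED"))
                .count();
    }

    public static void main(String[] args) {
        // Sample lines only; in practice these would come from running netstat.
        List<String> sample = Arrays.asList(
                "tcp 0 0 10.0.0.5:51000 10.224.165.186:11210 ESTABLISHED",
                "tcp 0 0 10.0.0.5:51001 10.224.165.186:11210 ESTABLISHED",
                "tcp 0 0 10.0.0.5:51002 10.224.165.186:11210 SYN_SENT");
        int expected = expectedConnections(4, 4);
        long actual = countEstablished(sample, 11210);
        if (actual < expected) {
            System.out.println("Missing KV connections: " + (expected - actual));
        }
    }
}
```

Running something like this on a schedule and alerting when the count stays below the expected value would at least flag the stuck endpoint before the timeouts pile up.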

Our config is as follows:

2017-07-07 17:51:16.314 INFO [main] com.couchbase.client.core.CouchbaseCore : CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword='null', queryEnabled=false, queryPort=8093, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=40, computationPoolSize=40, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=4, viewServiceEndpoints=1, queryServiceEndpoints=4, ioPool=NioEventLoopGroup, coreScheduler=CoreScheduler, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-jvm-core/1.2.3 (git: 1.2.3), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=300000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, queryTimeout=300000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, disconnectTimeout=25000, dnsSrvEnabled=false}

Has anyone else faced similar issues?

daschl · July 12, 2017, 9:23am

@bsiggers would you be open to trying a newer version of spring-data-couchbase (or, if that's not possible, at least manually bumping the SDK version to something newer, like 2.4.6 or so) and seeing if the issue persists?

If it does, it might make sense to grab debug/trace logs from around that time and take a closer look to see what's causing the disruption.
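As a rough sketch of how the debug/trace output could be turned on: the Java SDK prefers SLF4J when it is on the classpath and otherwise falls back to java.util.logging, so the snippet below only applies to the fallback case. In a Spring application that logs through an SLF4J backend, the equivalent would be raising the `com.couchbase.client` logger in that backend's configuration instead. The class name is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes the SDK is logging via java.util.logging.
final class CouchbaseDebugLogging {

    // Keep a strong reference: JUL holds loggers weakly, so a configured
    // level could otherwise be lost after garbage collection.
    private static final Logger SDK_LOGGER = Logger.getLogger("com.couchbase.client");

    static Logger enable(Level level) {
        SDK_LOGGER.setLevel(level);
        // The default console handler filters at INFO, so attach one
        // that lets the finer levels through.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        SDK_LOGGER.addHandler(handler);
        // Avoid printing INFO and above twice via the root handlers.
        SDK_LOGGER.setUseParentHandlers(false);
        return SDK_LOGGER;
    }

    public static void main(String[] args) {
        enable(Level.FINEST); // FINEST is the closest JUL level to trace
    }
}
```

Trace-level output is verbose, so it is probably best enabled only on an instance where the problem is expected to recur.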

bsiggers · July 12, 2017, 4:29pm

Thanks @daschl - we’ll give updating spring-data-couchbase to 2.4.6 a try. It’s something we were also considering but because some of the APIs have changed we were putting it off due to some of the refactoring involved.
