Hello all,
We’re currently experiencing intermittent errors in our Java application, which uses spring-data-couchbase (2.0.0.RELEASE). We’re running Couchbase Community Edition 4.5.0-2601 on a four-node cluster.
What we’re seeing is that every 3-7 days the following warning shows up in our Java logs, and it is basically the ‘kiss of death’: from that point on, most key-value operations from the Java component that logged it grind to a halt:
2017-07-07 08:13:09.107 WARN [cb-io-1-23] c.c.client.core.endpoint.Endpoint : [xxxxxxx/10.224.165.186:11210][KeyValueEndpoint]: Socket connect took longer than specified timeout.
After this comes a long run of TimeoutExceptions (java.util.concurrent.TimeoutException) on key-value operations. Restarting the Java application resolves the issue.
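To check which node each warning refers to, and whether it matches the node that ends up one socket short, the endpoint address can be pulled out of the warning line. This is a minimal sketch that assumes the log format quoted above stays stable; the class name and regular expression are our own, not part of the Couchbase client.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "ip:port" from the KeyValueEndpoint connect-timeout warning.
public class ConnectTimeoutLogScanner {

    // Matches "[hostname/ip:port][KeyValueEndpoint]: Socket connect took longer than specified timeout"
    // and captures the ip:port part after the slash.
    private static final Pattern CONNECT_TIMEOUT = Pattern.compile(
            "\\[[^\\]]*/([0-9.]+:\\d+)\\]\\[KeyValueEndpoint\\]: Socket connect took longer than specified timeout");

    static Optional<String> endpointOf(String logLine) {
        Matcher m = CONNECT_TIMEOUT.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The warning line as it appears in our logs (hostname redacted).
        String line = "2017-07-07 08:13:09.107 WARN [cb-io-1-23] c.c.client.core.endpoint.Endpoint : "
                + "[xxxxxxx/10.224.165.186:11210][KeyValueEndpoint]: Socket connect took longer than specified timeout.";
        System.out.println(endpointOf(line).orElse("no match"));
    }
}
```

Running this over the full log and counting matches per address would show whether the warnings always point at the same node.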
For additional context, we have started tracking open TCP connections on port 11210. We almost always see 16 connections open, which makes sense (kvServiceEndpoints=4 per node × 4 Couchbase nodes = 16). However, once the log entry above appears, the count drops to 15 and stays there permanently; it never recovers.
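To flag the drop from 16 to 15 automatically, a check along these lines could run alongside the connection tracking. It is a minimal sketch: the class and method names are hypothetical, the observed counts would come from whatever tracking is already in place, and the sample values below are illustrative only.

```java
// Hypothetical check: compares open KV sockets (port 11210) with the count
// implied by kvServiceEndpoints per node times the number of KV nodes.
public class KvConnectionCheck {

    // Expected KV sockets = endpoints per node * number of KV nodes.
    static int expectedKvSockets(int kvEndpointsPerNode, int kvNodes) {
        return kvEndpointsPerNode * kvNodes;
    }

    // Number of missing sockets; 0 when the observed count meets or exceeds the expectation.
    static int missingSockets(int expected, int observed) {
        return Math.max(0, expected - observed);
    }

    public static void main(String[] args) {
        int expected = expectedKvSockets(4, 4); // kvServiceEndpoints=4, four-node cluster
        // Illustrative samples only, mirroring the 16 -> 15 drop we see.
        int[] samples = {16, 16, 15, 15};
        for (int observed : samples) {
            int missing = missingSockets(expected, observed);
            System.out.println("observed=" + observed + ", expected=" + expected
                    + (missing > 0 ? ", missing=" + missing : ", ok"));
        }
    }
}
```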
Our config is as follows:
2017-07-07 17:51:16.314 INFO [main] com.couchbase.client.core.CouchbaseCore : CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword='null', queryEnabled=false, queryPort=8093, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=40, computationPoolSize=40, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=4, viewServiceEndpoints=1, queryServiceEndpoints=4, ioPool=NioEventLoopGroup, coreScheduler=CoreScheduler, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-jvm-core/1.2.3 (git: 1.2.3), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=300000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, queryTimeout=300000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, disconnectTimeout=25000, dnsSrvEnabled=false}
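The settings that seem most relevant to the warning are socketConnectTimeout=1000, connectTimeout=5000 and kvTimeout=2500 (milliseconds, as far as we understand). As far as we can tell, socketConnectTimeout is the limit the "Socket connect took longer than specified timeout" message refers to, but that is our assumption. For comparing these values across instances, a small hypothetical helper can split the startup dump into a key/value map; nested values such as ExponentialDelay{...} are kept whole.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses the "{key=value, ...}" body of a CouchbaseEnvironment log line.
public class EnvDumpParser {

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        int start = line.indexOf('{');
        int end = line.lastIndexOf('}');
        if (start < 0 || end <= start) {
            return result; // no environment dump on this line
        }
        String body = line.substring(start + 1, end);
        StringBuilder current = new StringBuilder();
        int depth = 0; // tracks nested braces so their inner commas are not treated as separators
        for (char c : body.toCharArray()) {
            if (c == '{') depth++;
            if (c == '}') depth--;
            if (c == ',' && depth == 0) {
                addPair(current.toString(), result);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addPair(current.toString(), result);
        return result;
    }

    // Splits "key=value" on the first '=' and stores it; ignores fragments without a key.
    private static void addPair(String pair, Map<String, String> out) {
        String trimmed = pair.trim();
        int eq = trimmed.indexOf('=');
        if (eq > 0) {
            out.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        // Excerpt of our startup line; the full line parses the same way.
        String line = "INFO [main] com.couchbase.client.core.CouchbaseCore : CouchbaseEnvironment: "
                + "{kvServiceEndpoints=4, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS; lower=100, upper=100000}, "
                + "socketConnectTimeout=1000, kvTimeout=2500, connectTimeout=5000}";
        Map<String, String> env = parse(line);
        for (String key : new String[] {"socketConnectTimeout", "connectTimeout", "kvTimeout", "kvServiceEndpoints"}) {
            System.out.println(key + " = " + env.get(key));
        }
    }
}
```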
Has anyone else faced similar issues?