We are using Couchbase as our main database, so we want to minimize the downtime as much as we can.
Sometimes one of the Couchbase nodes restarts, and the restart can change the node's IP address.
When that happens, the Java SDK enters a loop trying to reconnect, but it never succeeds because the IP address has changed:
Could not connect to endpoint on reconnect attempt 31, retrying with delay 4096 MILLISECONDS: com.couchbase.client.deps.io.netty.channel.ConnectTimeoutException: connection timed out: couchbase-set-0.couchbase.****.cluster.local/***.***.***.***:11210
Socket connect took longer than specified timeout: connection timed out: couchbase-set-0.couchbase.****svc.cluster.local/***.***.***.***:11210
Is there any way to automatically refresh the Couchbase IP address without any downtime on our Couchbase calls?
Best regards!
Further details:
Couchbase community 5.1.1
Couchbase java sdk 2.5.9
The SDKs regularly retrieve a cluster map from the cluster and adjust their topology to match what the cluster map says. If you see that in the log, it would appear to indicate the cluster is telling the SDK that there is a node there. You could probably confirm this by increasing the log level.
I think we'd need more info on the exact scenario, but I don't believe the cluster itself supports/handles IPs changing on cluster nodes. You can probably make it work by going to hostnames that track the changing IPs. @artem may be able to offer more info on what's happening with the cluster.
The client needs at least one of the nodes in the supplied list/connection string to be responding. As long as they don't all restart and take on new IPs, this will work.
But, as you identified, this would cause a problem of maintaining the list. And yes, our solution for that is DNS SRV records. You can maintain an SRV record with the nodes in the cluster and that'll work.
Note also since you brought up K8s, there's a newer feature called Multi-Network Configurations for the case where the nodes may move and the application is outside the pod. You can read more about it as #39 in the sdk-rfcs and @chaitra.ramarao is working on further requirements for the feature.
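For anyone who wants to check what such an SRV record resolves to from inside the application, here is a minimal sketch that uses only the JDK's JNDI DNS provider. It is not the SDK's own bootstrap code. The class and method names are hypothetical, and the service name passed to `lookup` (for example `_couchbase._tcp.` followed by your domain) is an assumption you should verify against your own DNS setup.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper, for illustration only.
final class SrvSeedNodes {

    // SRV record data has the form "priority weight port target",
    // e.g. "0 0 11210 node1.example.com." (trailing dot = fully qualified).
    static String targetHost(String srvData) {
        String[] fields = srvData.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Not SRV record data: " + srvData);
        }
        String target = fields[3];
        // Drop the trailing dot so the name can be used as a plain hostname.
        return target.endsWith(".") ? target.substring(0, target.length() - 1) : target;
    }

    // Queries the system's default DNS servers for the SRV records of serviceName
    // and returns the target hostnames (the candidate seed nodes).
    static List<String> lookup(String serviceName) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns:");
        InitialDirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(serviceName, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            List<String> hosts = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> records = srv.getAll();
                while (records.hasMore()) {
                    hosts.add(targetHost(records.next().toString()));
                }
            }
            return hosts;
        } finally {
            ctx.close();
        }
    }
}
```

The hostnames returned are what you would expect the client to report as seed nodes loaded from DNS SRV. The advantage is that only the DNS record needs maintaining when nodes are added or removed, not every application's connection string.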
We configured the SRV record in Kubernetes successfully:
Loaded seed nodes from DNS SRV [couchbase-set-0.couchbase.****-staging.svc.cluster.local, couchbase-set-1.couchbase.****-staging.svc.cluster.local].
But after the Couchbase pod restarted, the IP changed from 10.52.3.6 to 10.52.6.219, and our client kept retrying the old IP address forever (only restarting our application resolves this):
[couchbase-set-0.couchbase.****-staging.svc.cluster.local/10.52.3.6:11210][KeyValueEndpoint]: Could not connect to endpoint on reconnect attempt 2102, retrying with delay 4096 MILLISECONDS: com.couchbase.client.deps.io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: couchbase-set-0.couchbase.*****-staging.svc.cluster.local/10.52.3.6:11210
It's already merged in client 2.7.2 which shipped in December 2018.
The property is com.couchbase.forceDnsLookupOnReconnect as indicated in the changeset. Set it to true and you should have what you need (the docs cover how to set these). Looks like we'll need to add it to the docs as well. Filed DOC-4718.
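As a rough sketch of one way to set it, the property can be set as a JVM system property before any Couchbase environment or cluster object is created. This assumes the SDK picks it up from system properties when the environment is built. The class and method names below are hypothetical; only the property name comes from the changeset.

```java
// Hypothetical bootstrap helper, for illustration only.
final class ForceDnsLookup {

    // Property name as given in the changeset (client 2.7.2 and later).
    static final String PROPERTY = "com.couchbase.forceDnsLookupOnReconnect";

    // Call this before the Couchbase environment/cluster is created;
    // setting it afterwards is assumed to have no effect on an existing environment.
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enable();
        // ... create the Couchbase environment and open the cluster here ...
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

The same setting can be passed on the command line instead, as `-Dcom.couchbase.forceDnsLookupOnReconnect=true`, which avoids any ordering concerns inside the application.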