Max Idle Connections error

Hi,

I am using the Kotlin SDK v1.3.0, but I don’t see a way to set maxPoolSize or maxIdleConnections. We are seeing the error below on the server. Could you please help?

Number of connections on XXXX on node [xxxxxx] is 30769.00 which is greater than 1500

We have two clusters and we do the below for both clusters:

Cluster.connectUsingSharedEnvironment(
    connectionString = connection.primaryCluster,
    username = bucketConfig.username,
    password = bucketConfig.password,
    env = clusterConfig
)

Here is my config:

ClusterEnvironment.builder()
    .timeoutConfig {
        TimeoutConfig.Builder()
            .kvTimeout(Duration.ofMillis(1000))
            .connectTimeout(Duration.ofMillis(2000))
            .disconnectTimeout(Duration.ofMillis(2000))
    }
    .build()

The default maximum number of kv connections is 1 and the maximum number of http connections is 12, so 30769 connections cannot be coming from a single instance of the SDK’s cluster. It would appear that the application is creating many instances of the cluster.
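For example, the usual anti-pattern looks something like this (a hypothetical sketch; the names are illustrative, not taken from your code):

import com.couchbase.client.kotlin.Cluster

// Anti-pattern: a new Cluster per request. Every call opens fresh
// sockets (config + kv + http) that are never disconnected.
fun handleRequestLeaky(connStr: String, user: String, pass: String) {
    val cluster = Cluster.connect(connStr, user, pass) // leaks connections
    // ... use cluster without ever calling cluster.disconnect() ...
}

// Fix: create the Cluster once and reuse it for the process lifetime.
object ClusterHolder {
    val cluster: Cluster by lazy {
        Cluster.connect("couchbase://example.com", "user", "pass")
    }
}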


Thanks! But based on the documentation here, the max concurrent key-value connections can be 60k. Can you help us figure out the root cause of this and how we can reduce the number of connections?

We have two clusters and our application is deployed across 200 different K8s pods. So, I’d imagine there should be 200 x 2 = 400 connections.

That’s on the server; it is not controlled by the SDK. Is it even an error message, or is it just informational or a warning? Is there any degraded behavior, such as the server rejecting new connections?

[quote=“giyer7”]We have two clusters and our application is deployed across 200 different K8s pods. So, I’d imagine there should be 200 x 2 = 400 connections.[/quote]

Actually 800 connections, as each SDK cluster will have one connection to fetch the configuration plus one connection for kv requests: 200 pods x 2 clusters x 2 connections = 800. But 30769 is a long way from 800, so apparently you have something else going on. Look in the server logs (memcached.log) to see where all those connections are coming from.

The connections will look something like this. The server IP address is the one with port 11210 (or 11207 for SSL); the other address is the client.

2024-02-23T04:41:07.008058+00:00 INFO 267: HELO [{"a":"java/3.2.6 (Linux 4.18.0-372.71.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_272-b10)","i":"14A0399200000001/0000000084C7F809"}] Mutation seqno, XATTR, XERROR, Select bucket, Snappy, Unordered execution, Tracing, AltRequestSupport, SyncReplication, Collections, PreserveTtl, VAttr, SubdocCreateAsDeleted [ {"ip":"10.130.8.3","port":52378} - {"ip":"10.130.2.14","port":11210} (not authenticated) ]


Hi @giyer7,

Thanks for using the Kotlin SDK :heart:

That timeoutConfig method you’re calling takes a Consumer<TimeoutConfig.Builder>. The SDK expects your code to customize the builder passed to the consumer. Instead, the code sample you shared creates a new builder that ends up getting ignored. As a result, the SDK is not configured with your desired settings.

Here’s what the code should look like after the fix:

val env = ClusterEnvironment.builder()
    .timeoutConfig { builder ->
        builder
            .kvTimeout(Duration.ofMillis(1000))
            .connectTimeout(Duration.ofMillis(2000))
            .disconnectTimeout(Duration.ofMillis(2000))
    }
    .build()

And here’s the equivalent, using some Kotlin sugar:

val env = ClusterEnvironment.builder {
    timeout {
        kvTimeout = 1.seconds
        connectTimeout = 2.seconds
        disconnectTimeout = 2.seconds
    }
}.build()

To also configure the HTTP pool size and idle time:

val env = ClusterEnvironment.builder {
    timeout {
        kvTimeout = 1.seconds
        connectTimeout = 2.seconds
        disconnectTimeout = 2.seconds
    }
    io { 
        // Default is 12 connections.
        // Setting it higher might be good. 
        maxHttpConnections = 64
        
        // Default is 1 second. Anything higher
        // than 4 seconds risks triggering the
        // server's Slowloris mitigation.
        idleHttpConnectionTimeout = 2.seconds
    }
}.build()

But none of that answers the question of why there are too many connections.

In addition to following the advice of my colleague @mreiche , you could check to make sure you are calling cluster.disconnect() when you are done with the cluster, and calling env.shutdown() after all clusters using your shared ClusterEnvironment have been disconnected.
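For instance, here is a minimal sketch of that ordering, assuming two clusters sharing one environment (primaryCluster, secondaryCluster, and sharedEnv stand in for your own references; runBlocking is only there because disconnect() is a suspend function):

import kotlinx.coroutines.runBlocking

fun shutdownEverything() = runBlocking {
    // Disconnect every cluster that shares the environment first...
    primaryCluster.disconnect()
    secondaryCluster.disconnect()
    // ...then shut down the shared environment itself.
    sharedEnv.shutdown()
}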

You could also consider not using a shared ClusterEnvironment. If you don’t share the environment, you don’t need to shut it down yourself – the SDK will take care of that for you.
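In that case, connecting is as simple as this (a sketch with placeholder connection details):

// No shared environment: the SDK creates one internally and
// disposes of it when this cluster is disconnected.
val cluster = Cluster.connect(
    connectionString = "couchbase://example.com",
    username = "user",
    password = "pass",
)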

Thanks,
David


Yes, we are Kotlin all the way! :hand_with_index_finger_and_thumb_crossed:

Thanks, I will make the builder changes as you suggested! It was kind of unclear that the builder was unused; maybe there is a way to better indicate that at compile time.
Also, we are using Cluster.connectUsingSharedEnvironment instead of Cluster.connect. We can switch over to Cluster.connect instead.

Just curious, how/where do we disconnect the cluster in our application? Our application is always up and running, ready to serve requests; it never shuts down.

If you use the Cluster until the process terminates, then there’s no need to disconnect it.

I’ll second @mreiche 's suggestion to look in memcached.log to see where all the connections are coming from.

Thanks,
David


Where are you seeing that message? I don’t find any such message in Couchbase. Is there any other information around it?

This might be a message from our internal alerting systems. But we checked the Couchbase Server admin console, and the number of connections on our cluster with 3 nodes was indeed hovering around 30k per cluster.

[quote=“giyer7, post:9, topic:38183”]This might be a message from our internal alerting systems[/quote]

So to avoid that message, the solution would be to increase that 1500, wherever it is configured in your internal alerting systems, to the 60,000 limit of Couchbase(?)

If you want to see where those connections are coming from, use an OS utility like netstat or lsof.

netstat -anvp tcp | awk 'NR<3 || /11210/ || /11207/'

Or lsof, as below. Hmmm - in my environment it shows many connections which are not coming from SDK clients: 107 connections from various Couchbase processes - indexer, eventing, cbft, etc. And I only have one data node. In an environment with multiple data nodes, I would also expect additional connections between data nodes for exchanging configuration information.

% lsof -n | grep 11210
eventing- 86522 michaelreiche    9u     IPv4 0x502524d6bf6a4627         0t0                 TCP 127.0.0.1:55737->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   10u     IPv4 0x502524d6bef52def         0t0                 TCP 127.0.0.1:63380->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   11u     IPv4 0x502524d6befd71af         0t0                 TCP 127.0.0.1:63389->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   12u     IPv4 0x502524d6bf7d7fcf         0t0                 TCP 127.0.0.1:63400->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   14u     IPv4 0x502524d6bf74c8bf         0t0                 TCP 127.0.0.1:63440->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   15u     IPv4 0x502524d6c0a971af         0t0                 TCP 127.0.0.1:63383->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   16u     IPv4 0x502524d6bf6b8b57         0t0                 TCP 127.0.0.1:63392->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   17u     IPv4 0x502524d6bf80d267         0t0                 TCP 127.0.0.1:63408->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   18u     IPv4 0x502524d6bf754d37         0t0                 TCP 127.0.0.1:63416->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   19u     IPv4 0x502524d6bf696fcf         0t0                 TCP 127.0.0.1:63443->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   10u     IPv4 0x502524d6bf7d8b57         0t0                 TCP 127.0.0.1:63384->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   11u     IPv4 0x502524d6bef1dfcf         0t0                 TCP 127.0.0.1:63393->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   12u     IPv4 0x502524d6c0a9c267         0t0                 TCP 127.0.0.1:63409->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   13u     IPv4 0x502524d6be892b57         0t0                 TCP 127.0.0.1:63417->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   14u     IPv4 0x502524d6bf699267         0t0                 TCP 127.0.0.1:63444->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   27u     IPv4 0x502524d6
.
.
.

Hmm, interesting! Having a huge number of connections is still okay, but the idle connections should be closed and removed from the pool. In our case, the majority of the connections were idle.

Which connections are idle? Ones from the SDK or others?

Thanks for this feedback! We’re acting on it by deprecating the static factory methods that return new config builders. Tracking as JVMCBC-1504 and scheduled for inclusion in the April release.

Looking back at your code I see you were calling the public constructors directly. I suppose we should find a way to deprecate those too.

Thanks,
David


Now I see the error below:

Exception in thread "main" java.lang.IllegalStateException: ClusterEnvironment.Builder.build() may only be called once.
        at com.couchbase.client.kotlin.env.ClusterEnvironment$Builder.build(ClusterEnvironment.kt:229)
        at com.couchbase.client.kotlin.Cluster$Companion.connect(Cluster.kt:772)
        at com.couchbase.client.kotlin.Cluster$Companion.connect(Cluster.kt:745)

I changed my config to use Cluster.connect(), passing in the builder:

private fun primaryCluster() = with(connectionProperties) {
    Cluster.connect(
        connectionString = connection.primaryCluster,
        username = bucketConfig.username,
        password = bucketConfig.password,
        envBuilder = clusterConfig
    )
}

private fun secondaryCluster() = with(connectionProperties) {
    connection.secondaryCluster?.let {
        Cluster.connect(
            connectionString = it,
            username = bucketConfig.username,
            password = bucketConfig.password,
            envBuilder = clusterConfig
        )
    }
}

// In a separate class
val clusterConfig = ClusterEnvironment.builder {
    timeout {
        kvTimeout = keyValueTimeout.toLong().milliseconds
        connectTimeout = timeout.toLong().milliseconds
        disconnectTimeout = timeout.toLong().milliseconds
    }
    io {
        // Default is 12 connections per pool.
        // Setting it higher might be good.
        maxHttpConnections = maxHttpConnectionsPerPool.toInt()

        // Default is 1 second. Anything higher
        // than 4 seconds risks triggering the
        // server's Slowloris mitigation.
        idleHttpConnectionTimeout = timeout.toLong().milliseconds
    }
    this.meter = OpenTelemetryMeter.wrap(openTelemetry)
    this.requestTracer = OpenTelemetryRequestTracer.wrap(openTelemetry)
    this.jsonSerializer = MoshiJsonSerializer(
        Moshi.Builder()
            .addLast(KotlinJsonAdapterFactory())
            .build()
    )
}

Yes, that’s by design. The comment where that exception is thrown explains the [somewhat unfortunate] reason for this limitation:

// Prevent passing the same builder to Cluster.connect()
// multiple times with different connection strings.
// Don't want the connection string params to leak into the builder.

Perhaps you could replace the clusterConfig property with a function that returns a new ClusterEnvironment.Builder each time it’s called.
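Something like this (a sketch; the configuration body is elided):

// Each call mints a fresh builder, so every Cluster.connect()
// gets its own single-use builder.
fun clusterConfig(): ClusterEnvironment.Builder =
    ClusterEnvironment.builder {
        // ... timeout / io / serializer configuration as before ...
    }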

But it will still be two ClusterEnvironment.Builder.build() calls for the two clusters. Is that okay?

Sure, it’s fine. The build() method may only be called once per builder.

Okay. Just to confirm, this should work?

private fun primaryCluster() = with(connectionProperties) {
    Cluster.connect(
        connectionString = connection.primaryCluster,
        username = bucketConfig.username,
        password = bucketConfig.password,
        envBuilder = primaryClusterConfig
    )
}

private fun secondaryCluster() = with(connectionProperties) {
    connection.secondaryCluster?.let {
        Cluster.connect(
            connectionString = it,
            username = bucketConfig.username,
            password = bucketConfig.password,
            envBuilder = secondaryClusterConfig
        )
    }
}

// Part of a config class
val primaryClusterConfig = clusterEnvironmentBuilder()
val secondaryClusterConfig = clusterEnvironmentBuilder()

private fun clusterEnvironmentBuilder() =
    ClusterEnvironment.builder {
        timeout {
            kvTimeout = keyValueTimeout.toLong().milliseconds
            connectTimeout = timeout.toLong().milliseconds
            disconnectTimeout = timeout.toLong().milliseconds
        }
        io {
            // Default is 12 connections per pool.
            // Setting it higher might be good.
            maxHttpConnections = maxHttpConnectionsPerPool.toInt()

            // Default is 1 second. Anything higher
            // than 4 seconds risks triggering the
            // server's Slowloris mitigation.
            idleHttpConnectionTimeout = timeout.toLong().milliseconds
        }
        this.meter = OpenTelemetryMeter.wrap(openTelemetry)
        this.requestTracer = OpenTelemetryRequestTracer.wrap(openTelemetry)
        this.jsonSerializer = MoshiJsonSerializer(
            Moshi.Builder()
                .addLast(KotlinJsonAdapterFactory())
                .build()
        )
    }

Sure, that should work.
