Max Idle Connections error

Hi,

I am using the Kotlin SDK v1.3.0, but I don’t see a way to set maxPoolSize or maxIdleConnections. We are seeing the error below on the server. Could you please help?

Number of connections on XXXX on node [xxxxxx] is 30769.00 which is greater than 1500

We have two clusters and we do the below for both clusters:

Cluster.connectUsingSharedEnvironment(
    connectionString = connection.primaryCluster,
    username = bucketConfig.username,
    password = bucketConfig.password,
    env = clusterConfig
)

Here is my config:

ClusterEnvironment.builder()
    .timeoutConfig {
        TimeoutConfig.Builder()
            .kvTimeout(Duration.ofMillis(1000))
            .connectTimeout(Duration.ofMillis(2000))
            .disconnectTimeout(Duration.ofMillis(2000))
    }
    .build()

The default maximum number of kv connections is 1 and the maximum number of http connections is 12, so 30769 connections cannot be coming from a single instance of the SDK’s cluster. It would appear that the application is creating many instances of the cluster.
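For example, the usual anti-pattern looks something like this (a hypothetical sketch; the names are illustrative, not taken from your code):

import com.couchbase.client.kotlin.Cluster

// Anti-pattern: a new Cluster per request. Every call opens fresh
// sockets (config + kv + http) that are never disconnected.
fun handleRequestLeaky(connStr: String, user: String, pass: String) {
    val cluster = Cluster.connect(connStr, user, pass) // leaks connections
    // ... use cluster without ever calling cluster.disconnect() ...
}

// Fix: create the Cluster once and reuse it for the process lifetime.
object ClusterHolder {
    val cluster: Cluster by lazy {
        Cluster.connect("couchbase://example.com", "user", "pass")
    }
}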


Thanks! But based on the documentation here, the max concurrent key-value connections can be 60k. Can you help us figure out the root cause of this and how we can reduce the number of connections?

We have two clusters and our application is deployed across 200 different K8s pods. So, I’d imagine there should be 200 x 2 = 400 connections.

That’s on the server; it is not controlled by the SDK. Is it even an error message, or is it just informational or a warning? Is there any degraded behavior, such as the server rejecting new connections?

[quote=“giyer7”]We have two clusters and our application is deployed across 200 different K8s pods. So, I’d imagine there should be 200 x 2 = 400 connections.[/quote]

Actually 800 connections, as each SDK cluster will have one connection to fetch the configuration plus one connection for kv requests: 200 pods x 2 clusters x 2 connections = 800. But 30769 is a long way from 800, so apparently you have something else going on. Look in the server logs (memcached.log) to see where all those connections are coming from.

The connections will look something like this. The server IP address is the one with port 11210 (or 11207 for SSL); the other address is the client.

2024-02-23T04:41:07.008058+00:00 INFO 267: HELO [{"a":"java/3.2.6 (Linux 4.18.0-372.71.1.el8_6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_272-b10)","i":"14A0399200000001/0000000084C7F809"}] Mutation seqno, XATTR, XERROR, Select bucket, Snappy, Unordered execution, Tracing, AltRequestSupport, SyncReplication, Collections, PreserveTtl, VAttr, SubdocCreateAsDeleted [ {"ip":"10.130.8.3","port":52378} - {"ip":"10.130.2.14","port":11210} (not authenticated) ]


Hi @giyer7,

Thanks for using the Kotlin SDK :heart:

That timeoutConfig method you’re calling takes a Consumer<TimeoutConfig.Builder>. The SDK expects your code to customize the builder passed to the consumer. Instead, the code sample you shared creates a new builder that ends up getting ignored. As a result, the SDK is not configured with your desired settings.

Here’s what the code should look like after the fix:

val env = ClusterEnvironment.builder()
    .timeoutConfig { builder ->
        builder
            .kvTimeout(Duration.ofMillis(1000))
            .connectTimeout(Duration.ofMillis(2000))
            .disconnectTimeout(Duration.ofMillis(2000))
    }
    .build()

And here’s the equivalent, using some Kotlin sugar:

val env = ClusterEnvironment.builder {
    timeout {
        kvTimeout = 1.seconds
        connectTimeout = 2.seconds
        disconnectTimeout = 2.seconds
    }
}.build()

To also configure the HTTP pool size and idle time:

val env = ClusterEnvironment.builder {
    timeout {
        kvTimeout = 1.seconds
        connectTimeout = 2.seconds
        disconnectTimeout = 2.seconds
    }
    io { 
        // Default is 12 connections.
        // Setting it higher might be good. 
        maxHttpConnections = 64
        
        // Default is 1 second. Anything higher
        // than 4 seconds risks triggering the
        // server's Slowloris mitigation.
        idleHttpConnectionTimeout = 2.seconds
    }
}.build()

But none of that answers the question of why there are too many connections.

In addition to following the advice of my colleague @mreiche , you could check to make sure you are calling cluster.disconnect() when you are done with the cluster, and calling env.shutdown() after all clusters using your shared ClusterEnvironment have been disconnected.
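For instance, here is a minimal sketch of that ordering, assuming two clusters sharing one environment (primaryCluster, secondaryCluster, and sharedEnv stand in for your own references; runBlocking is only there because disconnect() is a suspend function):

import kotlinx.coroutines.runBlocking

fun shutdownEverything() = runBlocking {
    // Disconnect every cluster that shares the environment first...
    primaryCluster.disconnect()
    secondaryCluster.disconnect()
    // ...then shut down the shared environment itself.
    sharedEnv.shutdown()
}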

You could also consider not using a shared ClusterEnvironment. If you don’t share the environment, you don’t need to shut it down yourself – the SDK will take care of that for you.
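In that case, connecting is as simple as this (a sketch with placeholder connection details):

// No shared environment: the SDK creates one internally and
// disposes of it when this cluster is disconnected.
val cluster = Cluster.connect(
    connectionString = "couchbase://example.com",
    username = "user",
    password = "pass",
)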

Thanks,
David


Yes, we are Kotlin all the way! :hand_with_index_finger_and_thumb_crossed:

Thanks, I will make the builder changes as you suggested! It was kind of unclear that the builder was unused; maybe there is a way to better indicate that at compile time.
Also, we are using Cluster.connectUsingSharedEnvironment instead of Cluster.connect. We can switch over to Cluster.connect instead.

Just curious, how/where do we disconnect the cluster in our application? Our application is always up and running, ready to serve requests; it never shuts down.

If you use the Cluster until the process terminates, then there’s no need to disconnect it.

I’ll second @mreiche 's suggestion to look in memcached.log to see where all the connections are coming from.

Thanks,
David


Where are you seeing that message? I don’t find any such message in Couchbase. Is there any other information around it?

This might be a message from our internal alerting systems. But we checked the Couchbase Server admin console, and the number of connections on our cluster with 3 nodes was indeed hovering around 30k per cluster.

[quote=“giyer7, post:9, topic:38183”]This might be a message from our internal alerting systems[/quote]

So to avoid that message, the solution would be to increase that 1500, wherever it is configured in your internal alerting systems, to the 60,000 limit of Couchbase(?)

If you want to see where those connections are coming from, use an OS utility like netstat or lsof.

netstat -anvp tcp | awk 'NR<3 || /11210/ || /11207/'

Or lsof, as below. Hmmm - in my environment it shows many connections which are not coming from SDK clients: 107 connections from various Couchbase processes - indexer, eventing, cbft, etc. And I only have one data node. In an environment with multiple data nodes, I would also expect additional connections between data nodes for exchanging configuration information.

% lsof -n | grep 11210
eventing- 86522 michaelreiche    9u     IPv4 0x502524d6bf6a4627         0t0                 TCP 127.0.0.1:55737->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   10u     IPv4 0x502524d6bef52def         0t0                 TCP 127.0.0.1:63380->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   11u     IPv4 0x502524d6befd71af         0t0                 TCP 127.0.0.1:63389->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   12u     IPv4 0x502524d6bf7d7fcf         0t0                 TCP 127.0.0.1:63400->127.0.0.1:11210 (ESTABLISHED)
eventing- 86522 michaelreiche   14u     IPv4 0x502524d6bf74c8bf         0t0                 TCP 127.0.0.1:63440->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   15u     IPv4 0x502524d6c0a971af         0t0                 TCP 127.0.0.1:63383->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   16u     IPv4 0x502524d6bf6b8b57         0t0                 TCP 127.0.0.1:63392->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   17u     IPv4 0x502524d6bf80d267         0t0                 TCP 127.0.0.1:63408->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   18u     IPv4 0x502524d6bf754d37         0t0                 TCP 127.0.0.1:63416->127.0.0.1:11210 (ESTABLISHED)
cbft      86530 michaelreiche   19u     IPv4 0x502524d6bf696fcf         0t0                 TCP 127.0.0.1:63443->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   10u     IPv4 0x502524d6bf7d8b57         0t0                 TCP 127.0.0.1:63384->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   11u     IPv4 0x502524d6bef1dfcf         0t0                 TCP 127.0.0.1:63393->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   12u     IPv4 0x502524d6c0a9c267         0t0                 TCP 127.0.0.1:63409->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   13u     IPv4 0x502524d6be892b57         0t0                 TCP 127.0.0.1:63417->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   14u     IPv4 0x502524d6bf699267         0t0                 TCP 127.0.0.1:63444->127.0.0.1:11210 (ESTABLISHED)
indexer   86543 michaelreiche   27u     IPv4 0x502524d6
.
.
.

Hmm, interesting! Having a huge number of connections is still okay, but the idle connections should be closed and removed from the pool. In our case, the majority of the connections were idle.

Which connections are idle? Ones from the SDK or others?

Thanks for this feedback! We’re acting on it by deprecating the static factory methods that return new config builders. Tracking as JVMCBC-1504 and scheduled for inclusion in the April release.

Looking back at your code I see you were calling the public constructors directly. I suppose we should find a way to deprecate those too.

Thanks,
David


Now I see the error below:

Exception in thread "main" java.lang.IllegalStateException: ClusterEnvironment.Builder.build() may only be called once.
        at com.couchbase.client.kotlin.env.ClusterEnvironment$Builder.build(ClusterEnvironment.kt:229)
        at com.couchbase.client.kotlin.Cluster$Companion.connect(Cluster.kt:772)
        at com.couchbase.client.kotlin.Cluster$Companion.connect(Cluster.kt:745)

I changed my config to use Cluster.connect(), passing in the builder:

private fun primaryCluster() = with(connectionProperties) {
    Cluster.connect(
        connectionString = connection.primaryCluster,
        username = bucketConfig.username,
        password = bucketConfig.password,
        envBuilder = clusterConfig
    )
}

private fun secondaryCluster() = with(connectionProperties) {
    connection.secondaryCluster?.let {
        Cluster.connect(
            connectionString = it,
            username = bucketConfig.username,
            password = bucketConfig.password,
            envBuilder = clusterConfig
        )
    }
}

// In a separate class
val clusterConfig = ClusterEnvironment.builder {
    timeout {
        kvTimeout = keyValueTimeout.toLong().milliseconds
        connectTimeout = timeout.toLong().milliseconds
        disconnectTimeout = timeout.toLong().milliseconds
    }
    io {
        // Default is 12 connections per pool.
        // Setting it higher might be good.
        maxHttpConnections = maxHttpConnectionsPerPool.toInt()

        // Default is 1 second. Anything higher
        // than 4 seconds risks triggering the
        // server's Slowloris mitigation.
        idleHttpConnectionTimeout = timeout.toLong().milliseconds
    }
    this.meter = OpenTelemetryMeter.wrap(openTelemetry)
    this.requestTracer = OpenTelemetryRequestTracer.wrap(openTelemetry)
    this.jsonSerializer = MoshiJsonSerializer(
        Moshi.Builder()
            .addLast(KotlinJsonAdapterFactory())
            .build()
    )
}

Yes, that’s by design. The comment where that exception is thrown explains the [somewhat unfortunate] reason for this limitation:

// Prevent passing the same builder to Cluster.connect()
// multiple times with different connection strings.
// Don't want the connection string params to leak into the builder.

Perhaps you could replace the clusterConfig property with a function that returns a new ClusterEnvironment.Builder each time it’s called.
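Something like this (a sketch; the configuration body is elided):

// Each call mints a fresh builder, so every Cluster.connect()
// gets its own single-use builder.
fun clusterConfig(): ClusterEnvironment.Builder =
    ClusterEnvironment.builder {
        // ... timeout / io / serializer configuration as before ...
    }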

But it will still be two ClusterEnvironment.Builder.build() calls for the two clusters. Is that okay?

Sure, it’s fine. The build() method may only be called once per builder.

Okay. Just to confirm, this should work?

private fun primaryCluster() = with(connectionProperties) {
    Cluster.connect(
        connectionString = connection.primaryCluster,
        username = bucketConfig.username,
        password = bucketConfig.password,
        envBuilder = primaryClusterConfig
    )
}

private fun secondaryCluster() = with(connectionProperties) {
    connection.secondaryCluster?.let {
        Cluster.connect(
            connectionString = it,
            username = bucketConfig.username,
            password = bucketConfig.password,
            envBuilder = secondaryClusterConfig
        )
    }
}

// Part of a config class
val primaryClusterConfig = clusterEnvironmentBuilder()
val secondaryClusterConfig = clusterEnvironmentBuilder()

private fun clusterEnvironmentBuilder() =
    ClusterEnvironment.builder {
        timeout {
            kvTimeout = keyValueTimeout.toLong().milliseconds
            connectTimeout = timeout.toLong().milliseconds
            disconnectTimeout = timeout.toLong().milliseconds
        }
        io {
            // Default is 12 connections per pool.
            // Setting it higher might be good.
            maxHttpConnections = maxHttpConnectionsPerPool.toInt()

            // Default is 1 second. Anything higher
            // than 4 seconds risks triggering the
            // server's Slowloris mitigation.
            idleHttpConnectionTimeout = timeout.toLong().milliseconds
        }
        this.meter = OpenTelemetryMeter.wrap(openTelemetry)
        this.requestTracer = OpenTelemetryRequestTracer.wrap(openTelemetry)
        this.jsonSerializer = MoshiJsonSerializer(
            Moshi.Builder()
                .addLast(KotlinJsonAdapterFactory())
                .build()
        )
    }

Sure, that should work.
