I’m attempting to integrate this with our monitoring system, but I’m having difficulty understanding what triggers a NodeDisconnectedEvent and how, so that I can test this type of notification.
I’m running Couchbase 4.1.0-5005 Enterprise Edition (build-5005). I have tested with Java SDK 2.2.3 and 2.2.6.
I’m running a 3-node cluster. After bringing up my environment, I see three NodeConnectedEvents on my event bus.
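For context, a minimal sketch of how I’m watching the event bus (node addresses and the bucket name are placeholders) looks roughly like this:

import com.couchbase.client.core.event.CouchbaseEvent;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class EventBusWatcher {
    public static void main(String[] args) {
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.create();

        // Print every event the SDK publishes on its event bus, including
        // NodeConnectedEvent and NodeDisconnectedEvent.
        env.eventBus().get().subscribe(event -> System.out.println("CouchbaseEvent: " + event));

        // Placeholder node addresses and bucket name.
        CouchbaseCluster cluster = CouchbaseCluster.create(env, "node1", "node2", "node3");
        Bucket bucket = cluster.openBucket("default");
    }
}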
I then block access to one of the nodes by dropping all packets to and from it on the client machine:
iptables -A INPUT -s IP_HERE -j DROP
iptables -A OUTPUT -d IP_HERE -j DROP
While I’m using the SDK, every third request times out. I don’t see a NodeDisconnectedEvent until about 20-25 minutes later.
Dropping packets is a bit different from terminating the connection. Is there a regular workload? The way we’ve approached this is that once we see a continuous run of timeouts to a given node (tunable by a threshold), we drop the connection and attempt to rebuild it at the client.
Normally this will happen within seconds or minutes, but it could take as long as 20-25 minutes if there isn’t any workload.
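Purely as an illustration of that approach (the threshold value and the rebuild hook are placeholders, not SDK features), the client-side logic is roughly:

import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: count consecutive timeouts to a node and rebuild the
// connection once a threshold is crossed.
public class TimeoutTracker {
    private final int threshold;
    private final Runnable rebuildConnection; // e.g. close and reopen the bucket
    private final AtomicInteger consecutiveTimeouts = new AtomicInteger();

    public TimeoutTracker(int threshold, Runnable rebuildConnection) {
        this.threshold = threshold;
        this.rebuildConnection = rebuildConnection;
    }

    public void onSuccess() {
        consecutiveTimeouts.set(0); // any successful response resets the streak
    }

    public void onTimeout() {
        if (consecutiveTimeouts.incrementAndGet() >= threshold) {
            consecutiveTimeouts.set(0);
            rebuildConnection.run();
        }
    }
}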
This is a great test by the way. We do something like this regularly.
One way you can probably simulate a NodeDisconnectedEvent is to kill the memcached process on one of the nodes. That will terminate the TCP connections (sending TCP FINs), and the client would then have to rebuild them.
Thanks for the information! I’m not running a regular workload, but I am running some ad-hoc N1QL queries against the cluster after enabling the iptables firewall rules to block one of the nodes.
I’m trying to simulate the connection being broken from the perspective of the client SDK, for example a firewall cutting a stale TCP connection, but without isolating the node from the rest of the cluster, so I’m reluctant to kill the memcached process on the node.
The NodeDisconnectedEvent is triggered at the same time you’d see a node disconnect in the logs, that is, when the node’s internal state transitions from CONNECTED to DISCONNECTED. Most commonly this happens when all sockets to a node go down (shutdown, failover, rebalance out).
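If you want to feed that into your monitoring, a minimal sketch (the alerting line is just a placeholder) is to filter the event bus for that event type:

import com.couchbase.client.core.event.system.NodeDisconnectedEvent;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class DisconnectAlert {
    public static void main(String[] args) {
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.create();

        env.eventBus().get()
            .filter(event -> event instanceof NodeDisconnectedEvent)
            .subscribe(event -> {
                // Placeholder: replace with a call into your monitoring/alerting system.
                System.err.println("Node went DISCONNECTED: " + event);
            });

        // ... create the cluster with this environment and run your workload ...
    }
}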
We don’t do TCP-level keepalive; there is an application-level keepalive (sending various messages over the app protocol in idle states), but it has no direct effect on this.
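For reference, a sketch of where that app-level keepalive interval is tuned, assuming the keepAliveInterval setting on the environment builder in this SDK version (the 10-second value is arbitrary, and again this doesn’t change when the event fires):

import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

public class KeepAliveConfig {
    public static void main(String[] args) {
        // keepAliveInterval is given in milliseconds.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .keepAliveInterval(TimeUnit.SECONDS.toMillis(10))
                .build();
        System.out.println("keepAliveInterval = " + env.keepAliveInterval() + "ms");
    }
}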
Make sure not to just cut one stale TCP connection, but rather perform actions that actually remove the node, like a failover or a rebalance out. If you only cut one TCP socket, the client will try to reconnect (since the node is still part of the server config) and you won’t see the event!
Yes, that aligns with what I am seeing. I was thinking the client would trigger this event on a broken TCP connection, but that is not the case. I see the event when stopping the Couchbase service, on failover, and on node removal.