I’m monitoring NodeConnectedEvent and NodeDisconnectedEvent events for a cluster so that I can log and raise alarms on potential issues from the Java client.
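For context, the alarming described here boils down to keeping per-node state driven by those two events. The sketch below is hypothetical (the class and method names are made up, not SDK API). It assumes the event bus subscriber passes in each node's address as a string, and it leaves out the SDK wiring (e.g. subscribing through `env.eventBus().get()`) so that it compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical alarm tracker, fed with the host of each NodeConnectedEvent /
 * NodeDisconnectedEvent seen on the event bus. Not part of the Couchbase SDK.
 */
final class NodeAlarmTracker {
    enum State { CONNECTED, DISCONNECTED }

    private final Map<String, State> states = new HashMap<>();

    /** NodeConnectedEvent for this host: clears any alarm. */
    synchronized void onConnected(String host) {
        states.put(host, State.CONNECTED);
    }

    /** NodeDisconnectedEvent for this host: raises an alarm. */
    synchronized void onDisconnected(String host) {
        states.put(host, State.DISCONNECTED);
    }

    /** Node removed from the cluster: stop tracking it (and drop its alarm). */
    synchronized void forget(String host) {
        states.remove(host);
    }

    /** Hosts currently in alarm, sorted for stable output. */
    synchronized Set<String> alarmedHosts() {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.DISCONNECTED) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Note that a tracker like this only ever alarms on nodes it has seen connect and then disconnect. A node that is already down when the client starts never produces either event, which is exactly the gap described next.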
I notice that on initialization the client connects to a single node and then discovers the rest of the nodes in the cluster. However, if I start with one of those nodes down and bring it up some time later, I don’t see a NodeConnectedEvent for that node, and the client doesn’t appear to attempt to connect to the now-running node.
Is this behavior by design, or do I have a configuration problem?
Nodes newly added to the cluster do appear to get connected, but I still see no activity against a node that is brought up some time after the client starts.
@jberglund hmm, so you are saying that you add and rebalance a node into the cluster after the connections have been established, and the client is not picking it up?
What does your cluster setup look like, which SDK version are you using, and what kind of workload are you running?
@daschl Couchbase 4.1, three-node cluster (call the nodes node155, node158, and node159), Java SDK 2.2.8.
I stop node158 before starting the client, using /etc/init.d/couchbase-server stop.
Then I start the client and wait for it to spin up, observing the system events as they come in through the environment event bus. I see the NodeConnectedEvent for node155 and then the ConfigUpdatedEvent after the client connects to it.
It then discovers that the other two nodes exist, but I only see a NodeConnectedEvent for node159, which is what I expected since node158 is down.
I wait a minute or two, then start node158 with /etc/init.d/couchbase-server start.
// 10.17.4.158 brought up, no more SYSTEM events received
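Until the client reports this case itself, one way to alarm on it from the application side is to compare the node list from the latest cluster config (the list carried by ConfigUpdatedEvent) with the nodes that have actually produced a NodeConnectedEvent. Below is a hypothetical sketch of that idea. The class, the grace period, and the string host keys are assumptions for illustration, and feeding it from the SDK's events is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical watchdog: flags nodes that appear in the latest cluster config
 * but have had no NodeConnectedEvent within a grace period. Not SDK code.
 */
final class UnconnectedNodeWatchdog {
    private final long graceMillis;
    // host -> time (ms) since which it has been known but not connected
    private final Map<String, Long> unconnectedSince = new HashMap<>();
    private final Set<String> connected = new HashSet<>();

    UnconnectedNodeWatchdog(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Node list from a config update (e.g. a ConfigUpdatedEvent). */
    synchronized void onConfigNodes(Set<String> configNodes, long nowMillis) {
        // Forget nodes that are no longer part of the cluster.
        unconnectedSince.keySet().retainAll(configNodes);
        connected.retainAll(configNodes);
        for (String host : configNodes) {
            if (!connected.contains(host)) {
                unconnectedSince.putIfAbsent(host, nowMillis);
            }
        }
    }

    synchronized void onConnected(String host) {
        connected.add(host);
        unconnectedSince.remove(host);
    }

    synchronized void onDisconnected(String host, long nowMillis) {
        connected.remove(host);
        unconnectedSince.putIfAbsent(host, nowMillis);
    }

    /** Nodes that have stayed unconnected for longer than the grace period. */
    synchronized Set<String> overdue(long nowMillis) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> e : unconnectedSince.entrySet()) {
            if (nowMillis - e.getValue() > graceMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

In the scenario above, node158 appears in the config at bootstrap but never connects, so after the grace period it would show up in `overdue(...)` even though no NodeDisconnectedEvent was ever emitted for it.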
Side note: it would be useful if the ConfigUpdatedEvent contained the name of the cluster whose nodes it is reporting. We configure a single Couchbase environment, so we only get one event bus, but multiple clusters can be configured to share that environment. When we receive a ConfigUpdatedEvent with a new list of nodes (as after a failover and rebalance), we have to map it back to the affected cluster.
Hm, we currently don’t get a cluster name as part of the server config, since it’s “self-contained”. Would you be able to do some kind of identification check on the nodes that are part of the cluster?
Cluster in ConfigUpdatedEvent: since it’s not in the event, we were thinking of looking up each node in the updated list against a cached list of the ClusterManager.rawInfo nodes until we get a hit. That would let us attribute updates to the right cluster from the client side: for example, to detect when a node we saw a disconnected event for has been removed from a cluster (so we no longer care and can clear the alarm), or when a new node has been added that we need to watch.
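The lookup described above could look roughly like the following hypothetical sketch. The registry, the cluster names, and the `Diff` type are illustrative assumptions. Populating each cluster's node set from its ClusterManager info and extracting the node list from the event are not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical registry mapping node addresses back to a cluster, so a config
 * update that carries nodes but no cluster name can be attributed to one.
 */
final class ClusterNodeRegistry {
    static final class Diff {
        final Set<String> added;   // new nodes to start watching
        final Set<String> removed; // nodes that left: alarms can be cleared
        Diff(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    private final Map<String, Set<String>> nodesByCluster = new HashMap<>();

    /** Seed the cache, e.g. from each cluster's ClusterManager info at startup. */
    synchronized void register(String clusterName, Set<String> nodes) {
        nodesByCluster.put(clusterName, new HashSet<>(nodes));
    }

    /** First cluster whose cached nodes overlap the updated node list. */
    synchronized Optional<String> clusterFor(Set<String> updatedNodes) {
        for (Map.Entry<String, Set<String>> e : nodesByCluster.entrySet()) {
            for (String node : updatedNodes) {
                if (e.getValue().contains(node)) {
                    return Optional.of(e.getKey());
                }
            }
        }
        return Optional.empty();
    }

    /** Replace the cached node set and report what was added and removed. */
    synchronized Diff applyUpdate(String clusterName, Set<String> updatedNodes) {
        Set<String> previous = nodesByCluster.getOrDefault(clusterName, new HashSet<>());
        Set<String> added = new HashSet<>(updatedNodes);
        added.removeAll(previous);
        Set<String> removed = new HashSet<>(previous);
        removed.removeAll(updatedNodes);
        nodesByCluster.put(clusterName, new HashSet<>(updatedNodes));
        return new Diff(added, removed);
    }
}
```

This relies on the clusters not sharing node addresses and on at least one node surviving between consecutive updates. If a single update replaced every node of a cluster at once, `clusterFor` would find no match.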
Some more detailed version info:
Couchbase Version: 4.1.0-5005 Enterprise Edition (build-5005) on CentOS release 6.7
couchbase-core-io-1.2.9.jar
couchbase-java-client-2.2.8.jar
rxjava-1.0.17.jar
I wonder how the client should be informed that the node is now available. Perhaps I can enable logging on the client?
Actually, I think you hit a bug there; I’m currently investigating.
During bootstrap we’re swallowing an error on the socket, so the endpoint stays in CONNECTING the whole time and, as a result, you never get the event. This logic differs from the path used when the client is already connected and something happens at runtime, so as far as I can see the issue is isolated to client bootstrap.
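On the logging question, here is a minimal sketch, assuming the client is logging through java.util.logging. As far as I know, the 2.x SDK uses SLF4J or Log4J instead when one of those is on the classpath, in which case the level would be set in that backend's configuration. The logger name "com.couchbase.client" is chosen as the common package prefix of the SDK classes, and the helper class is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper: lower the java.util.logging level for every logger
 * under the "com.couchbase.client" package prefix and print to the console.
 */
final class CouchbaseDebugLogging {
    // Strong reference: JUL holds loggers weakly, so a level set on an
    // otherwise unreferenced logger could be lost after garbage collection.
    private static final Logger COUCHBASE = Logger.getLogger("com.couchbase.client");

    static Logger enable(Level level) {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);            // ConsoleHandler defaults to INFO
        COUCHBASE.setLevel(level);          // child loggers inherit this level
        COUCHBASE.addHandler(handler);
        COUCHBASE.setUseParentHandlers(false); // avoid duplicate root output
        return COUCHBASE;
    }
}
```

Call `CouchbaseDebugLogging.enable(Level.FINE)` once, before creating the environment. Calling it repeatedly would attach duplicate handlers.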
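To make that failure mode concrete, here is a toy model. It is hypothetical and is not the SDK's endpoint implementation. It only illustrates the mechanism described above: connect events come from state transitions, and an endpoint that swallows its connect error stays in CONNECTING, so it never retries and never transitions to CONNECTED.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not SDK code): events are emitted only on state transitions,
 * so an endpoint stuck in CONNECTING never reports "connected".
 */
final class ToyEndpoint {
    enum State { DISCONNECTED, CONNECTING, CONNECTED }

    private State state = State.DISCONNECTED;
    private final boolean retryOnFailure;
    final List<String> events = new ArrayList<>();

    ToyEndpoint(boolean retryOnFailure) {
        this.retryOnFailure = retryOnFailure;
    }

    State state() {
        return state;
    }

    /** One connect attempt; nodeUp simulates whether the socket connect succeeds. */
    void connect(boolean nodeUp) {
        if (state == State.CONNECTED) {
            return;
        }
        transition(State.CONNECTING);
        if (nodeUp) {
            transition(State.CONNECTED);
        } else if (retryOnFailure) {
            transition(State.DISCONNECTED); // runtime-style path: allows a later retry
        }
        // else: error swallowed, endpoint stays in CONNECTING (bootstrap-style path)
    }

    /** Periodic check; only an endpoint in DISCONNECTED attempts to reconnect. */
    void tick(boolean nodeUp) {
        if (state == State.DISCONNECTED) {
            connect(nodeUp);
        }
    }

    private void transition(State next) {
        State prev = state;
        state = next;
        if (next == State.CONNECTED) {
            events.add("connected");
        } else if (prev == State.CONNECTED) {
            events.add("disconnected");
        }
    }
}
```

With `retryOnFailure = false`, a node that is down at bootstrap and comes up later leaves the endpoint in CONNECTING with no events. With `retryOnFailure = true`, the next tick reconnects and emits "connected".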
I’m chasing another issue that I see intermittently and that I suspect could share the same root cause: I get a NodeConnectedEvent followed by a NodeDisconnectedEvent an extremely short time later, typically during the openBucket operation, and I never see a NodeConnectedEvent for that node again.
(My logs weren’t printing the full event string, but this is the output of our event bus subscriber consuming the NodeConnectedEvent.)
2016.07.13 12:29:58:013 EDT
Couchbase Node Connected: 10.17.4.155
I actually have two other apps connecting to .155 at the same time. In this instance, only one of the three apps saw this; the other two connected to .155 just fine.
Okay, to give you some background: those events are sent out based on state transitions in the underlying components. Is it by chance the node you bootstrap from, i.e. the one in the node list you pass to the SDK?
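Because this connect-then-disconnect pattern is intermittent and easy to miss in logs, it may be worth detecting it explicitly. Below is a hypothetical flap detector. The class name and the window size are assumptions, and timestamps are supplied by the caller in milliseconds, for example from the event bus subscriber.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical flap detector: a NodeDisconnectedEvent that follows the same
 * host's NodeConnectedEvent within windowMillis is recorded as a flap.
 */
final class FlapDetector {
    private final long windowMillis;
    private final Map<String, Long> lastConnected = new HashMap<>();
    private final List<String> flaps = new ArrayList<>();

    FlapDetector(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void onConnected(String host, long nowMillis) {
        lastConnected.put(host, nowMillis);
    }

    /** Returns true if this disconnect counts as a flap. */
    synchronized boolean onDisconnected(String host, long nowMillis) {
        Long connectedAt = lastConnected.remove(host);
        boolean flap = connectedAt != null && nowMillis - connectedAt <= windowMillis;
        if (flap) {
            flaps.add(host);
        }
        return flap;
    }

    /** Copy of all hosts that have flapped so far, in order. */
    synchronized List<String> flaps() {
        return new ArrayList<>(flaps);
    }
}
```

Combined with a check for nodes that stay unconnected, a flap on .155 would then surface as an alarm instead of a single connected log line.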