Hi all,
I’ve discovered an issue with Couchbase whereby the .NET client constantly reports the error ‘Unable to locate node’. This occurs after my application has been working fine for some time, and appears to correlate perfectly with the start of a ‘Rebalance’ on the Couchbase server I’m connecting to. There are 2 Couchbase servers in the cluster that I’m connecting to.
The problem does not resolve until I recycle the IIS application pool to restart the application.
Can anyone suggest what might be causing this issue?
Thanks.
MartinBlore -
During a rebalance, it’s not uncommon to temporarily get error messages like “NotMyVBucket” and, very occasionally, “Unable to locate node”, which is really just the client being in a transition state between config updates. However, once the rebalance has completed, the client should continue to perform operations without errors.
Can you provide a little more information on your environment and perhaps an example project that we can run tests against? Also, it would be helpful if you enabled logging with a file appender set to DEBUG level while the client is transitioning between states and attached the log to a Jira ticket. Be warned, the file will get huge quickly, so you only want to do this for a short amount of time.
http://docs.couchbase.com/couchbase-sdk-net-1.3/#configuring-logging
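For reference, a DEBUG-level file appender in log4net might look roughly like the fragment below. This is only a sketch of the log4net side: the appender name and the file path are placeholders, it assumes the log4net section is already registered under configSections (log4net.Config.Log4NetConfigurationSectionHandler), and hooking the client's logger up to log4net is covered on the docs page linked above rather than shown here.

```xml
<log4net>
  <!-- Plain file appender; name and path are placeholders. Use a disk with room to spare. -->
  <appender name="CouchbaseDebugFile" type="log4net.Appender.FileAppender">
    <file value="C:\temp\couchbase-client-debug.log" />
    <appendToFile value="true" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
    </layout>
  </appender>
  <root>
    <!-- DEBUG is very verbose; switch it back once the rebalance has been captured -->
    <level value="DEBUG" />
    <appender-ref ref="CouchbaseDebugFile" />
  </root>
</log4net>
```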
You can create a Jira ticket here:
http://www.couchbase.com/issues/browse/NCBC
Thanks,
Jeff
I just experienced the same error message when using the ExecuteGet method.
Turned out entering the missing password solved it for me.
Don’t think the message is very helpful if that indeed is the root cause.
Hey,
I work with Martin and I just want to clarify what you’re saying, Jeff. Can you confirm that if the client cannot locate the node where the data is stored (failover) and a rebalance is in operation, the client will continue to throw the “unable to locate node” error?
In our test scenarios here with a 4-node cluster, it takes up to 10 minutes to rebalance when we drop a node. I understand that this is dependent on the volume of data, but for a client not to be able to get to the data for this length of time seems a little strange to me.
/Carl
CarlLambert -
What I am saying is that this should be a temporary state. Internally, the client will retry the operations on the other nodes.
You may see the message in the logs, but the actual result of the operation more often than not should be successful. In other words, an operation may internally temporarily fail, and this will be logged, but should succeed in subsequent retries. In the end, the IOperationResult.Success should be true.
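If you want a safety net in the application as well, something along these lines could wrap a call and check the result before giving up. It is only a sketch, not part of the client: RetryHelper, the attempt count and the delays are made up for illustration, and it is written against plain delegates so it does not depend on any particular client version.

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // Runs `operation` until `succeeded` accepts its result or `maxAttempts` is used up.
    // Returns the last result either way, so the caller can still inspect a failure.
    public static T Retry<T>(Func<T> operation, Func<T, bool> succeeded,
                             int maxAttempts = 5, int initialDelayMs = 100)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        int delayMs = initialDelayMs;
        T result = operation();                       // first attempt, no delay

        for (int attempt = 1; attempt < maxAttempts && !succeeded(result); attempt++)
        {
            Thread.Sleep(delayMs);                    // give the client time to pick up the new config
            delayMs *= 2;                             // simple exponential backoff
            result = operation();
        }
        return result;
    }
}

// Hypothetical usage; `client` and "some-key" are placeholders:
//   var result = RetryHelper.Retry(() => client.ExecuteGet("some-key"), r => r.Success);
//   if (!result.Success) { /* log the failure and give up */ }
```

Keep the attempt count small: the check above cannot tell a transient failure during a rebalance from a result that is unsuccessful for some other reason (a key that simply does not exist, for example), so those would be retried as well.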
“In our test scenarios here with a 4-node cluster, it is taking up to 10 minutes to rebalance when we drop a node. I understand that this is dependent on the volume of data, but for a client not to get to the data for this length of time seems a little strange to me?”
No, the client should be successfully performing operations during this time, albeit at a lower rate of throughput. You’ll definitely see some warning or even error messages in your logs as the client adjusts to the new topology.
What version of the client and server are you using?
Note that server versions 2.5 and later have a feature called CCCP, which mitigates many of the problems associated with swap/rebalance on the clients. Unfortunately, the 1.3.X client will not support it, but the forthcoming 2.X client will.
-Jeff