We have an internal Couchbase testing environment: 1 cluster containing 3 server nodes. This morning when I opened the Couchbase console, 1 node was stuck on pending and another was down. I checked the two machines in question and they were running fine. So I checked the logs, and all they mentioned was something like: ‘Shutting down bucket “XXXXX” on ‘ns_1@xxx.xxx.xxx.xxx’ for server shutdown’. So I failed that node over, and immediately the pending node came back up and ran perfectly.
I figured a scheduled task might have hogged too much memory and crashed the ‘down’ node, so I went ahead and rebalanced the cluster and then tried to re-add that 3rd node, but all I’m getting is:
Attention - Prepare join failed. Could not connect to “xxx.xxx.xxx.xxx” on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers.
All firewalls are deactivated on all 3 machines internally to avoid those issues, and this has never happened before. I’m unsure if I should try uninstalling Couchbase and then reinstalling it instead.
One thing you can do is ping the new name/IP you are adding from the existing nodes and make sure all traffic and name resolution get through.
-cihan
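A ping only shows that ICMP gets through, while the join error above is about a TCP connection to port 8091. As a rough sketch of checking that port directly from one of the existing nodes (the function name `can_connect` is made up for illustration, and the masked address stays a placeholder):

```python
import socket


def can_connect(host, port=8091, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 8091 is the port named in the
    "Prepare join failed" message.
    """
    try:
        # create_connection resolves the name and opens a TCP socket,
        # so it exercises both name resolution and the port itself.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `can_connect("xxx.xxx.xxx.xxx")` with the real address of the problem node would return `False` if nothing is listening on 8091 there, even when ping succeeds. That would point at the Couchbase service on that machine rather than at the network.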
There are no issues pinging the IP from one of the other node machines. The whole issue seems to be only with Couchbase, as I have test websites and a bunch of other stuff all running fine on that machine.
A few weeks back I thought I updated to Couchbase 4.0rc (specifically on the current problematic node) - and through some hocus pocus thought it was all updated.
I’m now suspecting I was wrong, and that when the node failed, it attempted to restart as 4.0rc (and now 4.1) while the rest of the cluster nodes are still on 4.0dev.
Would I be correct in assuming this would cause the problem I am having, and would there be a way to update both remaining nodes without losing data? (Worst case scenario I know I can just do a backup, but recopying all the views is a 10 min hassle. Aren’t we all lazy?)
This still, however, does not explain why I can’t reach the Couchbase console from the problematic node itself (it isn’t in the cluster at the moment, so shouldn’t it let me reach the Couchbase setup page?)
UPDATE:
So I ended up doing a complete uninstall of Couchbase, rebooting, and reinstalling, and now it seems to be properly set up. When I try to add the node I get this error:
Most likely because it has Couchbase 4.1 while the nodes already in the cluster are on 4.0dev. So this comes back to: how can I update the cluster to 4.1 without losing the data (other than doing a backup)?
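To confirm a version mismatch like this rather than guess at it, the REST endpoint `/pools/default` on port 8091 lists each node together with its `version`. Here is a minimal sketch, assuming admin credentials and that the response has a `nodes` array with `hostname` and `version` fields. The helper names are made up for illustration:

```python
import base64
import json
import urllib.request
from collections import defaultdict


def fetch_pools_default(host, user, password, port=8091, timeout=5.0):
    """Fetch the cluster description from a node (hypothetical helper)."""
    url = "http://%s:%d/pools/default" % (host, port)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    req = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token.decode("ascii")}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def group_nodes_by_version(pools_default):
    """Map each reported version string to the hostnames running it."""
    groups = defaultdict(list)
    for node in pools_default.get("nodes", []):
        version = node.get("version", "unknown")
        groups[version].append(node.get("hostname", "?"))
    return dict(groups)
```

If `group_nodes_by_version(fetch_pools_default(host, user, password))` returns more than one key, the cluster is running mixed versions. Running the same query against the freshly installed node on its own shows which version it reports.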
Fully uninstalling Couchbase everywhere and doing a fresh installation plus cbrestore got everything working in the end. Not a pleasant solution, for sure.