How to stop the cluster safely

ksafonov · April 19, 2019, 10:03am

Hi,

We have a 4.5 cluster of about 50 nodes, and it is quite full of data. We’re going to replace some core switches in our local network, so connectivity between all nodes will be disrupted for anywhere from a few minutes to a few hours. We’re afraid the cluster could become unusable, so we’d like to stop it gracefully before we break the network.

Is there a way to stop all nodes “where they are”, so they don’t try to elect a new master, fail over all the others, rebalance, etc.?

Thanks,
Kirill

davef · April 19, 2019, 8:28pm

Hi Kirill

The best way is to:

1. Disable auto-failover, to prevent a node from being failed over accidentally while you’re shutting down or restarting the nodes in the cluster. (This can be done in the UI or via the REST API: https://docs.couchbase.com/server/4.5/rest-api/rest-cluster-autofailover-intro.html)
2. Shut down each node. There’s no problem doing this in parallel: https://docs.couchbase.com/server/4.5/install/startup-shutdown.html
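The two steps above can be sketched as a small Bash script. This is a hedged sketch, not official tooling: the `CB_USER`/`CB_PASS` variables, SSH access to each node with passwordless sudo, and the helper names `run`, `disable_autofailover` and `stop_all_nodes` are all assumptions. It uses the `/settings/autoFailover` endpoint described on the REST API page linked above (port 8091) and `service couchbase-server stop` on each node.

```shell
#!/usr/bin/env bash
# Sketch of the two steps above. Assumptions to adapt (all hypothetical):
#   CB_USER / CB_PASS  cluster admin credentials
#   node names         passed as arguments, reachable over SSH
#   sudo               passwordless on every node
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: disable auto-failover via the cluster REST API on port 8091.
# Any one node can take the request; the setting is cluster-wide.
disable_autofailover() {
  local host="$1"
  run curl -fsS -u "${CB_USER}:${CB_PASS}" \
    "http://${host}:8091/settings/autoFailover" -d enabled=false
}

# Step 2: stop couchbase-server on every node in parallel, then wait
# for every SSH session to return. Returns 1 if any of them failed.
stop_all_nodes() {
  local node pid rc=0
  local pids=()
  for node in "$@"; do
    run ssh -n "$node" sudo service couchbase-server stop &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, after sourcing the file and exporting the credentials, `DRY_RUN=1 stop_all_nodes cb-node-01 cb-node-02` prints the commands for review (the hostnames are placeholders), and the same call without `DRY_RUN` executes them. Auto-failover stays disabled until you turn it back on.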

-dave

ksafonov · April 26, 2019, 5:50pm

Thanks Dave,

We did as you suggested and it generally worked: the cluster came back up after the shutdown, and all data seems to be in place.

The only issue we had was that some nodes did not terminate for about 20 minutes: they were shown as “pend” in the Web UI, and “service couchbase-server status” showed the service was still running. As our maintenance window was running out, we had to force-kill those nodes (kill -9 on the respective processes).

How quickly does a node typically shut down (given there are no requests coming in to it)?
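The thread does not answer how long a shutdown normally takes, but the waiting described above can at least be bounded and timed. A minimal sketch, assuming `service couchbase-server status` exits 0 only while the service is running (the usual LSB init-script convention, not confirmed here for the 4.5 packages); `is_running` and `wait_for_stop` are hypothetical names. It deliberately kills nothing: whether to fall back to `kill -9` stays an operator decision.

```shell
#!/usr/bin/env bash
# Sketch: poll until couchbase-server has really stopped or a deadline
# passes, so the wait is bounded and its duration is recorded.
# ASSUMPTION: "service couchbase-server status" exits 0 only while the
# service is running; verify this on your hosts or redefine is_running.

is_running() {
  service couchbase-server status >/dev/null 2>&1
}

# wait_for_stop TIMEOUT_SECONDS POLL_INTERVAL_SECONDS
# Prints the approximate time taken and returns 0 once stopped;
# returns 1 if the service is still up when the timeout is reached.
wait_for_stop() {
  local timeout="${1:-1200}" interval="${2:-10}" waited=0
  while is_running; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "couchbase-server still running after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "couchbase-server stopped after ~${waited}s"
}
```

For example, in a second shell on a node that is shutting down, `wait_for_stop 1200 10` waits up to 20 minutes, checking every 10 seconds, and reports how long the stop actually took.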

Thanks,
Kirill

eldorado · January 2, 2020, 5:37am

Is this server shutdown process applicable for stopping and starting nodes when the cluster runs in Kubernetes (K8s) pods?
I am getting a bunch of errors when running it inside a pod:

root@-cluster-0001:/# systemctl stop couchbase-server
Failed to connect to bus: No such file or directory

root@-cluster-0001:/# service couchbase-server stop
couchbase-server: unrecognized service

I am on this Ubuntu release:

cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
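Both errors are consistent with a container in which nothing manages couchbase-server as a service: systemctl prints "Failed to connect to bus" when systemd is not running, and without systemd, `service` falls back to a script in /etc/init.d, which this image evidently lacks. The sketch below only diagnoses which case applies; it does not say how Couchbase should be stopped in Kubernetes, which depends on how the image and pod were set up. The function name and its optional root-prefix argument are hypothetical, the latter added only so the check can be tested.

```shell
#!/usr/bin/env bash
# Sketch: report which service manager, if any, could stop couchbase-server
# here. A running systemd creates /run/systemd/system at boot; SysV-style
# "service" needs an executable script in /etc/init.d.
# The optional first argument is a path prefix (hypothetical, for testing).

detect_service_manager() {
  local root="${1:-}"
  if [ -d "${root}/run/systemd/system" ]; then
    echo "systemd"
  elif [ -x "${root}/etc/init.d/couchbase-server" ]; then
    echo "sysv-init"
  else
    echo "none"
  fi
}
```

Run as `detect_service_manager` with no argument inside the pod; "none" means neither `systemctl` nor `service` has anything to talk to there.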
