Rebalance/Architecture

fischer7047 · January 17, 2014, 7:24pm

This is an admittedly general question, but looking for some advice. We have a three node couchbase 2.1 community cluster. We have big memory nodes and big buckets. Each node is 192GB and our largest bucket is ~130 million keys and takes up about 100GB. We have never successfully failed over a node our rebalanced the cluster with the big bucket. We recently rebooted a node and that bucket can’t warmup. It tries for 10 minutes or so, status shows it loading the keys, and then it throws a bunch of these:

Control connection to memcached on ‘ns_1@cclnxcouch1.pfizer.com’ disconnected: {{badmatch,
{error,
timeout}},
[{mc_client_binary,
stats_recv,
4},
{mc_client_binary,
stats,
4},
{ns_memcached,
has_started,
1},
{ns_memcached,
handle_info,
2},
{gen_server,
handle_msg,
5},
{ns_memcached,
init,
1},
{gen_server,
init_it,
6},
{proc_lib,
init_p_do_apply,
3}]}

and starts the warmup process over again. So the node appears to be forever stuck in pend. Aside from this specific problem, we’ve just found that for the large buckets we have a system that works very well UNTIL anything untoward happens. Is our mistake the big memory nodes? The nodes are local gigabit interconnect, is that insufficient to support the cluster? Couch is running on raid-ed SSD’s, we seem to do pretty well i/o wise. Any suggestions on how to make rebalance functional?

Thanks!

househippo · January 20, 2014, 7:27am

You are likely having a access log issue.
You can see it when look into the cbstats like this.

/opt/couchbase/bin/cbstats localhost:11210 warmup -b bucketname -u Administrator -p bucket_password

When you run the command there you will see Estimated time to load keys ,how many keys are loaded and more. Run the command every few minutes for about 15 minutes. if the data does not change or goes up and down then CB is having a problem with access logs.

Solution:
shut down the whole cluster. go to the data folder of CB /data/name_of_bucket … there you should see access.log and access.log.old … delete these files in all the nodes.
Restart the cluster and it should be back up.

Topic		Replies	Views
Rebalancing is taking lot of time on couchbase server (several days) Couchbase Server	3	3883	February 8, 2023
CB Server 5.1.1 rebalance has made my server unusable Couchbase Server	5	1132	January 30, 2019
Rebalancing taking lot of time and never completes Couchbase Server	4	2052	May 31, 2017
Continous Rebalance Failure, Memcached taking Very High CPU Couchbase Server	8	5945	January 21, 2016
Failure during rebalance Couchbase Server	5	5704	July 2, 2013

Rebalance/Architecture

Related topics