Afternoon,
We have just attempted to add a 6th node to our existing Couchbase cluster:
Cluster
5 server nodes
5 buckets (single replica)
~100 ops/sec on 3 of these buckets
Version: 3.0.1 Community Edition (build-1444)
with mixed results. On the first rebalance attempt (at 00:05), the logs show that both the ‘default’ and ‘cache’ buckets loaded on the new node, but the rebalance failed 2 seconds later:
- Bucket "cache" loaded on node 'ns_1@node6.mydomain.com' in 0 seconds.
- Bucket "default" loaded on node 'ns_1@node6.mydomain.com' in 0 seconds.
- Bucket “default” rebalance does not seem to be swap rebalance
Followed by:
<0.15466.6837> exited with {unexpected_exit,
{'EXIT',<0.26009.6833>,
{dcp_wait_for_data_move_failed,"default",
215,
'ns_1@node1.mydomain.com',
['ns_1@node3.mydomain.com',
'ns_1@node6.mydomain.com'],
{error,no_stats_for_this_vbucket}}}}
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.26009.6833>,
{dcp_wait_for_data_move_failed,"default",215,
'ns_1@node1.mydomain.com',
['ns_1@node3.mydomain.com',
'ns_1@node6.mydomain.com'],
{error,no_stats_for_this_vbucket}}}}
The error.log on node1 gives more detail:
[ns_server:error,2017-01-26T0:05:52.296,ns_1@node1.mydomain.com:<0.3538.6839>:dcp_replicator:wait_for_data_move_loop:134]No dcp backfill stats for bucket "default", partition 215, connection "replication:ns_1@node1.mydomain.com->ns_1@node3.mydomain.com:default"
[ns_server:error,2017-01-26T0:05:52.299,ns_1@node1.mydomain.com:<0.15466.6837>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.26009.6833>,
[ns_server:error,2017-01-26T0:05:52.300,ns_1@node1.mydomain.com:<0.15466.6837>:misc:sync_shutdown_many_i_am_trapping_exits:1434]Shutdown of the following failed: [{<0.26009.6833>,
[ns_server:error,2017-01-26T0:05:52.300,ns_1@node1.mydomain.com:<0.15466.6837>:misc:try_with_maybe_ignorant_after:1470]Eating exception from ignorant after-block:
[rebalance:error,2017-01-26T0:05:52.431,ns_1@node1.mydomain.com:<0.22374.6838>:ns_vbucket_mover:handle_info:203]<0.15466.6837> exited with {unexpected_exit,
[ns_server:error,2017-01-26T0:05:52.434,ns_1@node1.mydomain.com:<0.1489.6816>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.435,ns_1@node1.mydomain.com:<0.4179.6826>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.436,ns_1@node1.mydomain.com:<0.25831.6827>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.436,ns_1@node1.mydomain.com:<0.27215.6838>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.436,ns_1@node1.mydomain.com:<0.21901.6834>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.436,ns_1@node1.mydomain.com:<0.16343.6835>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
[ns_server:error,2017-01-26T0:05:52.435,ns_1@node1.mydomain.com:<0.27790.6835>:ns_single_vbucket_mover:spawn_and_wait:129]Got unexpected exit signal {'EXIT',<0.22374.6838>,
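In case it helps anyone checking their own logs, here is a rough sketch of how the "No dcp backfill stats" lines could be tallied by bucket, partition and connection across several rebalance attempts. It assumes the error.log line format shown above; the regex and the function name `count_backfill_failures` are our own and not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Matches the "No dcp backfill stats" message logged by dcp_replicator.
# Both straight and typographic double quotes are accepted, since logs
# pasted through a forum or e-mail client often end up with the latter.
BACKFILL_RE = re.compile(
    r'No dcp backfill stats for bucket ["“](?P<bucket>[^"”]+)["”], '
    r'partition (?P<partition>\d+), '
    r'connection ["“](?P<connection>[^"”]+)["”]'
)


def count_backfill_failures(lines):
    """Return a Counter keyed by (bucket, partition, connection)."""
    counts = Counter()
    for line in lines:
        match = BACKFILL_RE.search(line)
        if match:
            key = (
                match["bucket"],
                int(match["partition"]),
                match["connection"],
            )
            counts[key] += 1
    return counts
```

Passing an open file handle for error.log as `lines` works, because the function only iterates line by line. A single key with a count equal to the number of attempts would confirm that every failure hit the same partition and replication connection.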
We attempted the rebalance another 5 times, compacting the default bucket in between, and each attempt produced the same error for the same partition (215). However, on the last attempt, the following was recorded in the UI:
Updated bucket default (of type membase) properties:
[{num_replicas,1},
{ram_quota,8912896000},
{auth_type,sasl},
{autocompaction,false},
{purge_interval,undefined},
{flush_enabled,true},
{num_threads,3},
{eviction_policy,value_only}]
At that point the UI reported that the ‘default’ bucket (and one other) is now served by all 6 nodes. However, our other 3 buckets are still running on 5 nodes.
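For anyone wanting to check the bucket-to-node layout without the UI, below is a minimal sketch. It assumes the bucket list has already been fetched as JSON from the REST endpoint `/pools/default/buckets` on port 8091, and that each bucket entry carries a `name` and a `nodes` list whose items have a `hostname`. That is our understanding of the response shape and we have not verified it against 3.0.1 specifically. The helper names are hypothetical.

```python
def nodes_per_bucket(buckets):
    """Map each bucket name to the sorted hostnames of the nodes serving it.

    `buckets` is the parsed JSON list assumed to come from
    /pools/default/buckets (field names are an assumption, see above).
    """
    return {
        bucket["name"]: sorted(node["hostname"] for node in bucket.get("nodes", []))
        for bucket in buckets
    }


def buckets_missing_node(buckets, hostname):
    """Return the names of buckets that are not yet served by `hostname`."""
    layout = nodes_per_bucket(buckets)
    return sorted(name for name, hosts in layout.items() if hostname not in hosts)
```

With a response like ours, calling `buckets_missing_node(buckets, "node6.mydomain.com:8091")` should list the 3 buckets that have not yet been moved onto the new node.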
Any suggestions as to the cause of the “No dcp backfill stats for bucket” error? The UI states that a rebalance is still required. Should we attempt another one to transfer data from the other 3 buckets?