I am trying to upgrade an existing 2-node 4.6.4 Enterprise Edition (build 4590) cluster to 5.0.1 using the rolling upgrade process. I failed over to node 1, uninstalled 4.6.4 on node 2, installed 5.0.1 on node 2, and joined it to the cluster on node 1. Everything looks good until I try to rebalance: the process starts but then stops with the message "Rebalance failed. See logs for detailed reason. You can try again." The logs from the UI are below. I captured the other log files as well, so let me know what additional data would be useful. When I rebuilt the node as 4.6.4, the rebalance worked fine.
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.7433.1>,
{{{child_interrupted,
{'EXIT',<0.6180.1>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},
{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},
{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygw …
ns_orchestrator 000
ns_1@Prd-CCNode-02.sc.com
10:56:08 PM Sun Feb 11, 2018
<0.7058.1> exited with {unexpected_exit,
{'EXIT',<0.7433.1>,
{{{child_interrupted,
{'EXIT',<0.6180.1>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/proc_lib.erl"},
{line,239}]}]},
{gen_server,call,
[{'janitor_agent-sync_attachment',
'ns_1@Prd-CCNode-02.sc.com'},
{if_rebalance,<0.32138.0>,
{wait_index_updated,874}},
infinity]}}}}
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:56:03 PM Sun Feb 11, 2018
Haven’t heard from a higher priority node or a master, so I’m taking over.
mb_master 000
ns_1@Prd-CCNode-01.sc.com
10:56:02 PM Sun Feb 11, 2018
Bucket “sync_attachment” rebalance appears to be swap rebalance
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:54:43 PM Sun Feb 11, 2018
Started rebalancing bucket sync_attachment
ns_rebalancer 000
ns_1@Prd-CCNode-02.sc.com
10:54:43 PM Sun Feb 11, 2018
Starting rebalance, KeepNodes = ['ns_1@Prd-CCNode-01.sc.com',
'ns_1@Prd-CCNode-02.sc.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator 004
ns_1@Prd-CCNode-02.sc.com
10:54:43 PM Sun Feb 11, 2018
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.20069.0>,
{{{child_interrupted,
{'EXIT',<0.19595.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},
{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},
{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/proc_lib.erl"},
{line,239}]}]},
{gen_server,call,
[{'janitor_agent-sync_attachment',
'ns_1@Prd-CCNode-02.sc.com'},
{if_rebalance,<0.12388.0>,
{wait_index_updated,958}},
infinity]}}}}
ns_orchestrator 000
ns_1@Prd-CCNode-02.sc.com
10:50:28 PM Sun Feb 11, 2018
<0.19609.0> exited with {unexpected_exit,
{'EXIT',<0.20069.0>,
{{{child_interrupted,
{'EXIT',<0.19595.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/proc_lib.erl"},
{line,239}]}]},
{gen_server,call,
[{'janitor_agent-sync_attachment',
'ns_1@Prd-CCNode-02.sc.com'},
{if_rebalance,<0.12388.0>,
{wait_index_updated,958}},
infinity]}}}}
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:50:27 PM Sun Feb 11, 2018
Bucket “sync_attachment” rebalance appears to be swap rebalance
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:49:06 PM Sun Feb 11, 2018
Started rebalancing bucket sync_attachment
ns_rebalancer 000
ns_1@Prd-CCNode-02.sc.com
10:49:06 PM Sun Feb 11, 2018
Starting rebalance, KeepNodes = ['ns_1@Prd-CCNode-01.sc.com',
'ns_1@Prd-CCNode-02.sc.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
ns_orchestrator 004
ns_1@Prd-CCNode-02.sc.com
10:49:05 PM Sun Feb 11, 2018
Haven’t heard from a higher priority node or a master, so I’m taking over. (repeated 1 times)
mb_master 000
ns_1@Prd-CCNode-01.sc.com
10:48:58 PM Sun Feb 11, 2018
IP address seems to have changed. Unable to listen on 'ns_1@Prd-CCNode-02.sc.com'. (POSIX error code: 'nxdomain')
menelaus_web_alerts_srv 000
ns_1@Prd-CCNode-02.sc.com
10:48:12 PM Sun Feb 11, 2018
Client-side error-report for user undefined on node 'ns_1@Prd-CCNode-02.sc.com':
User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Got unhandled javascript error:
message: The transition errored;
menelaus_web 102
ns_1@Prd-CCNode-02.sc.com
10:48:12 PM Sun Feb 11, 2018
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.9777.0>,
{{{child_interrupted,
{'EXIT',<0.9154.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},
{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},
{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygw …
ns_orchestrator 000
ns_1@Prd-CCNode-02.sc.com
10:48:12 PM Sun Feb 11, 2018
<0.9628.0> exited with {unexpected_exit,
{'EXIT',<0.9777.0>,
{{{child_interrupted,
{'EXIT',<0.9154.0>,socket_closed}},
[{dcp_replicator,spawn_and_wait,1,
[{file,"src/dcp_replicator.erl"},{line,231}]},
{dcp_replicator,handle_call,3,
[{file,"src/dcp_replicator.erl"},{line,109}]},
{gen_server,handle_msg,5,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/gen_server.erl"},
{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,
"c:/cygwin64/home/vagrant/OTP_SR~1/lib/stdlib/src/proc_lib.erl"},
{line,239}]}]},
{gen_server,call,
…
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:48:12 PM Sun Feb 11, 2018
Haven’t heard from a higher priority node or a master, so I’m taking over.
mb_master 000
ns_1@Prd-CCNode-01.sc.com
10:47:44 PM Sun Feb 11, 2018
Bucket “sync_attachment” rebalance appears to be swap rebalance
ns_vbucket_mover 000
ns_1@Prd-CCNode-02.sc.com
10:47:30 PM Sun Feb 11, 2018
Started rebalancing bucket sync_attachment
ns_rebalancer 000
ns_1@Prd-CCNode-02.sc.com
10:47:30 PM Sun Feb 11, 2018
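One entry in the log above stands out: "IP address seems to have changed. Unable to listen on 'ns_1@Prd-CCNode-02.sc.com'" with POSIX error code 'nxdomain', which generally means a host name could not be resolved. Whether that is related to the `socket_closed` rebalance failure is not established here, but it is cheap to rule out. Below is a minimal sketch of a check that could be run on each node, assuming the host part of each node name (everything after `ns_1@`) is what needs to resolve. The helper names `resolve` and `report` are made up for illustration and are not part of any Couchbase tooling.

```python
import socket

# Host names taken from the node names in the log (the part after "ns_1@").
NODE_HOSTS = ["Prd-CCNode-01.sc.com", "Prd-CCNode-02.sc.com"]


def resolve(host, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for host, or [] if it does not resolve.

    The resolver argument exists only so the lookup can be swapped out
    (for example in a test); by default the system resolver is used.
    """
    try:
        infos = resolver(host, None)
    except socket.gaierror:
        # Name resolution failed (the situation an 'nxdomain' error points at).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def report(hosts, resolver=socket.getaddrinfo):
    """Map every host name to the list of addresses it resolves to."""
    return {host: resolve(host, resolver) for host in hosts}
```

Calling `report(NODE_HOSTS)` on both machines and comparing the results should show whether each node can resolve its own name and its peer's name; an empty list for either host would be consistent with the 'nxdomain' alert.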