I have a 4-node cluster running 4.0.0-4051 Community Edition (build-4051), each node with 32GB RAM and a 240GB SSD. I am using about 17% of the disk space, excluding indexes. About 26GB of RAM per server is allocated to buckets; the Data RAM quota is set to 28GB and the Index RAM quota to 1GB.
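To put those quotas in perspective, here is the per-node memory arithmetic as a small sketch. The constants are the figures quoted above and `ram_headroom` is a hypothetical helper name; this is plain arithmetic, not a Couchbase sizing rule.

```python
# Per-node figures quoted in the question, in GB.
NODE_RAM_GB = 32
DATA_QUOTA_GB = 28
INDEX_QUOTA_GB = 1
BUCKETS_ALLOCATED_GB = 26


def ram_headroom(node_ram: float, *quotas: float) -> float:
    """Return the RAM left on a node after subtracting the given service quotas."""
    return node_ram - sum(quotas)


headroom = ram_headroom(NODE_RAM_GB, DATA_QUOTA_GB, INDEX_QUOTA_GB)
unallocated_data = DATA_QUOTA_GB - BUCKETS_ALLOCATED_GB

# 32 - (28 + 1) = 3 GB remain for the OS, the Query service and everything else.
print(f"Left outside the Data/Index quotas: {headroom} GB "
      f"({headroom / NODE_RAM_GB:.1%} of node RAM)")
# 28 - 26 = 2 GB of the Data quota is not assigned to any bucket.
print(f"Data quota not assigned to buckets: {unallocated_data} GB")
```

In other words, roughly 3GB per node (about 9% of RAM) is all that sits outside the Data and Index quotas.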
Each node is set up with the Data, Index and Query services.
I began setting up a new XDCR replication, but nodes started to become unstable. After a full restart of the cluster, it won't come back up at all: each node stays in the amber state or only intermittently goes green.
I have switched off XDCR (which made the nodes go green more often) and deleted all the production views.
I have tried restarting the whole cluster at once and also starting up a single machine on its own, with no luck.
I am not entirely sure what I am looking for in the error log, but here are a few lines that may be relevant:
[ns_server:error,2015-10-13T10:57:16.769+02:00,ns_1@10.1.34.219:index_status_keeper_worker<0.385.0>:index_rest:get_json:45]Request to http://127.0.0.1:9102/getIndexStatus failed: {error,
{econnrefused,
[{lhttpc_client,
send_request,1,
[{file,
"/home/couchbase/jenkins/workspace/sherlock-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,220}]},
{lhttpc_client,
execute,9,
[{file,
"/home/couchbase/jenkins/workspace/sherlock-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,169}]},
{lhttpc_client,
request,9,
[{file,
"/home/couchbase/jenkins/workspace/sherlock-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,92}]}]}}
=========================INFO REPORT=========================
{net_kernel,{'EXIT',<0.65.2>,shutdown}}
[error_logger:info,2015-10-13T11:01:57.047+02:00,ns_1@10.1.34.219:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================INFO REPORT=========================
{net_kernel,{'EXIT',<0.1798.2>,shutdown}}
[ns_server:debug,2015-10-13T11:02:38.024+02:00,ns_1@10.1.34.219:<0.1663.2>:janitor_agent:query_vbucket_states_loop:109]Exception from query_vbucket_states of "geoip":'ns_1@10.1.34.218'
{'EXIT',{{nodedown,'ns_1@10.1.34.218'},
{gen_server,call,
[{'janitor_agent-geoip','ns_1@10.1.34.218'},
query_vbucket_states,infinity]}}}
[ns_server:debug,2015-10-13T11:02:38.024+02:00,ns_1@10.1.34.219:<0.1663.2>:janitor_agent:query_vbucket_states_loop_next_step:114]Waiting for "geoip" on 'ns_1@10.1.34.218'
[error_logger:info,2015-10-13T11:02:38.024+02:00,ns_1@10.1.34.219:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
==> babysitter.log <==
memcached<0.76.0>: 2015-10-13T10:41:18.276859+02:00 WARNING (record) Engine warmup is complete, request to stop loading remaining database
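The first error above is an `econnrefused` on `http://127.0.0.1:9102/getIndexStatus`, and the janitor reports `'ns_1@10.1.34.218'` as `nodedown`. One quick check is whether those endpoints accept TCP connections at all, using a probe like the sketch below. The `127.0.0.1:9102` endpoint is taken from the log; `10.1.34.218:8091` assumes the default Couchbase admin/REST port on the node reported as down, and `is_listening` is a hypothetical helper name. A failed connection only shows that nothing is reachable on that port, not why.

```python
import socket

# Endpoints to probe. 127.0.0.1:9102 is the indexer URL from the ns_server
# error above; 10.1.34.218:8091 assumes the default Couchbase admin port
# on the node the janitor reported as nodedown. Adjust as needed.
ENDPOINTS = [
    ("127.0.0.1", 9102),
    ("10.1.34.218", 8091),
]


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (econnrefused), timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "open" if is_listening(host, port) else "closed/unreachable"
        print(f"{host}:{port} -> {status}")
```

Running it on each node in turn would show whether the indexer is listening locally and whether that node can reach 10.1.34.218 at all.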
Any help would be much appreciated.