Index node keeps going down every day

Dear all,
We have a Couchbase Server cluster with 4 data nodes, 4 index nodes, and 2 query nodes. The problem is that one of the index nodes keeps going down several times a day, and we could not find the cause by looking at the logs.

Here I copy part of the error log; sorry for the long post.

[ns_server:error,2020-02-07T00:12:06.584Z,ns_1@cbserver01.intranet.asia:<0.20976.1693>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:12:31.644Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”], those of them are active [“Analytics”]
[ns_server:error,2020-02-07T00:12:41.646Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“app_data”], those of them are active [“app_data”]
[ns_server:error,2020-02-07T00:12:46.652Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Attachment”], those of them are active [“Attachment”]
[ns_server:error,2020-02-07T00:13:01.663Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“app_data”], those of them are active [“app_data”]
[ns_server:error,2020-02-07T00:13:11.621Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”], those of them are active [“Analytics”]
[ns_server:error,2020-02-07T00:14:06.621Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”], those of them are active [“Analytics”]
[ns_server:error,2020-02-07T00:14:26.643Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“MiCare”,
“app_data”], those of them are active [“MiCare”,
“app_data”]
[ns_server:error,2020-02-07T00:15:06.646Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Larangan”,
“PRIOTIPS”], those of them are active [“Larangan”,
“PRIOTIPS”]
[ns_server:error,2020-02-07T00:15:46.629Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“app_data”], those of them are active [“app_data”]
[ns_server:error,2020-02-07T00:16:05.214Z,ns_1@cbserver01.intranet.asia:<0.3123.1697>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:16:26.621Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“MiCare”], those of them are active [“MiCare”]
[ns_server:error,2020-02-07T00:16:41.621Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“app_data”], those of them are active [“Analytics”,
“app_data”]
[ns_server:error,2020-02-07T00:16:45.492Z,ns_1@cbserver01.intranet.asia:<0.14364.1697>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:16:46.665Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Leads”,
“MiCare”,
“eventing_metadata”], those of them are active [“Leads”,
“MiCare”,
“eventing_metadata”]
[ns_server:error,2020-02-07T00:16:51.688Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content”], those of them are active [“content”]
[ns_server:error,2020-02-07T00:16:56.754Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“Attachment”,
“PRIOTIPS”,
“MiCare”,
“result”], those of them are active [“Analytics”,
“Attachment”,
“PRIOTIPS”,
“MiCare”,
“result”]
[ns_server:error,2020-02-07T00:17:01.701Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“entityState”], those of them are active [“entityState”]
[ns_server:error,2020-02-07T00:17:04.123Z,ns_1@cbserver01.intranet.asia:<0.6106.1697>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:17:16.623Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content_public”], those of them are active [“content_public”]
[ns_server:error,2020-02-07T00:17:21.643Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“PRIOTIPS”,
“app_data”,
“result”], those of them are active [“PRIOTIPS”,
“app_data”,
“result”]
[ns_server:error,2020-02-07T00:17:26.656Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“MiCare”,
“entityState”], those of them are active [“MiCare”,
“entityState”]
[ns_server:error,2020-02-07T00:17:31.661Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content_public”], those of them are active [“content_public”]
[ns_server:error,2020-02-07T00:17:36.669Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“Leads”,
“MiCare”,
“app_data”], those of them are active [“Analytics”,
“Leads”,
“MiCare”,
“app_data”]
[ns_server:error,2020-02-07T00:17:41.675Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Attachment”,
“PRIOTIPS”], those of them are active [“Attachment”,
“PRIOTIPS”]
[ns_server:error,2020-02-07T00:17:51.292Z,ns_1@cbserver01.intranet.asia:<0.16135.1696>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:17:51.689Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Attachment”,
“eventing_metadata”], those of them are active [“Attachment”,
“eventing_metadata”]
[ns_server:error,2020-02-07T00:17:56.694Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“PRIOTIPS”,
“app_data”], those of them are active [“Analytics”,
“PRIOTIPS”,
“app_data”]
[ns_server:error,2020-02-07T00:18:06.069Z,ns_1@cbserver01.intranet.asia:<0.11599.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:18:06.623Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“Leads”,
“PRIOTIPS”,
“MiCare”,
“app_data”,
“eventing_metadata”,
“result”], those of them are active [“Analytics”,
“Leads”,
“PRIOTIPS”,
“MiCare”,
“app_data”,
“eventing_metadata”,
“result”]
[ns_server:error,2020-02-07T00:18:11.099Z,ns_1@cbserver01.intranet.asia:<0.4819.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:18:11.628Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content_public”], those of them are active [“content_public”]
[ns_server:error,2020-02-07T00:18:16.634Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content”,
“entityState”], those of them are active [“content”,
“entityState”]
[ns_server:error,2020-02-07T00:18:21.640Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“eventing_metadata”], those of them are active [“eventing_metadata”]
[ns_server:error,2020-02-07T00:18:23.491Z,ns_1@cbserver01.intranet.asia:<0.8444.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:18:26.646Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“Attachment”,
“dataRT”,
“entityState”,
“result”], those of them are active [“Analytics”,
“Attachment”,
“dataRT”,
“entityState”,
“result”]
[ns_server:error,2020-02-07T00:18:31.652Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Larangan”], those of them are active [“Larangan”]
[ns_server:error,2020-02-07T00:18:36.659Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“result”], those of them are active [“result”]
[ns_server:error,2020-02-07T00:18:37.202Z,ns_1@cbserver01.intranet.asia:<0.3202.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:18:46.667Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“Analytics”,
“entityState”], those of them are active [“Analytics”,
“entityState”]
[ns_server:error,2020-02-07T00:18:56.678Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“content_public”], those of them are active [“content_public”]
[ns_server:error,2020-02-07T00:19:05.109Z,ns_1@cbserver01.intranet.asia:<0.23642.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:19:06.686Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“entityState”], those of them are active [“entityState”]
[ns_server:error,2020-02-07T00:19:16.125Z,ns_1@cbserver01.intranet.asia:<0.2410.1698>:janitor_agent:query_states_details:200]Failed to query vbucket states from some nodes:
[{‘ns_1@cbserver10.intranet.asia’,timeout}]
[ns_server:error,2020-02-07T00:19:21.620Z,ns_1@cbserver01.intranet.asia:ns_doctor<0.559.0>:ns_doctor:update_status:316]The following buckets became not ready on node ‘ns_1@cbserver10.intranet.asia’: [“MiCare”,
“app_data”], those of them are active [“MiCare”,
“app_data”]
[ns_server:error,2020-02-07T04:26:03.273Z,ns_1@cbserver01.intranet.asia:<0.11527.1808>:ns_doctor:wait_statuses_loop:267]Couldn’t get statuses for [‘ns_1@cbserver09.intranet.asia’]
[ns_server:error,2020-02-07T04:26:03.273Z,ns_1@cbserver01.intranet.asia:<0.14228.1810>:menelaus_web:loop:143]Server error during processing: [“web request failed”,
{path,"/pools/default"},
{method,‘POST’},
{type,error},
{what,
{badmatch,
{error,
{timeout,
[‘ns_1@cbserver09.intranet.asia’]}}}},
{trace,
[{menelaus_web_pools,
do_validate_memory_quota,3,
[{file,“src/menelaus_web_pools.erl”},
{line,382}]},
{lists,foldl,3,
[{file,“lists.erl”},{line,1248}]},
{validator,handle,4,
[{file,“src/validator.erl”},{line,56}]},
{menelaus_web_pools,
do_handle_pool_settings_post_loop,2,
[{file,“src/menelaus_web_pools.erl”},
{line,425}]},
{request_throttler,do_request,3,
[{file,“src/request_throttler.erl”},
{line,59}]},
{menelaus_web,loop,2,
[{file,“src/menelaus_web.erl”},
{line,121}]},
{mochiweb_http,headers,5,
[{file,
“/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl”},
{line,94}]},
{proc_lib,init_p_do_apply,3,
[{file,“proc_lib.erl”},{line,239}]}]}]
[ns_server:error,2020-02-07T04:26:10.498Z,ns_1@cbserver01.intranet.asia:<0.27698.1812>:ns_doctor:wait_statuses_loop:267]Couldn’t get statuses for [‘ns_1@cbserver09.intranet.asia’]
[ns_server:error,2020-02-07T04:26:10.498Z,ns_1@cbserver01.intranet.asia:<0.2866.1792>:menelaus_web:loop:143]Server error during processing: [“web request failed”,
{path,"/pools/default"},
{method,‘POST’},
{type,error},
{what,
{badmatch,
{error,
{timeout,
[‘ns_1@cbserver09.intranet.asia’]}}}},
{trace,
[{menelaus_web_pools,
do_validate_memory_quota,3,
[{file,“src/menelaus_web_pools.erl”},
{line,382}]},
{lists,foldl,3,
[{file,“lists.erl”},{line,1248}]},
{validator,handle,4,
[{file,“src/validator.erl”},{line,56}]},
{menelaus_web_pools,
do_handle_pool_settings_post_loop,2,
[{file,“src/menelaus_web_pools.erl”},
{line,425}]},
{request_throttler,do_request,3,
[{file,“src/request_throttler.erl”},
{line,59}]},
{menelaus_web,loop,2,
[{file,“src/menelaus_web.erl”},
{line,121}]},
{mochiweb_http,headers,5,
[{file,
“/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl”},
{line,94}]},
{proc_lib,init_p_do_apply,3,
[{file,“proc_lib.erl”},{line,239}]}]}]
[ns_server:error,2020-02-07T04:26:10.571Z,ns_1@cbserver01.intranet.asia:<0.11963.1809>:ns_doctor:wait_statuses_loop:267]Couldn’t get statuses for [‘ns_1@cbserver09.intranet.asia’]
[ns_server:error,2020-02-07T04:26:10.571Z,ns_1@cbserver01.intranet.asia:<0.22018.1812>:menelaus_web:loop:143]Server error during processing: [“web request failed”,
{path,"/pools/default"},
{method,‘POST’},
{type,error},
{what,
{badmatch,
{error,
{timeout,
[‘ns_1@cbserver09.intranet.asia’]}}}},
{trace,
[{menelaus_web_pools,
do_validate_memory_quota,3,
[{file,“src/menelaus_web_pools.erl”},
{line,382}]},
{lists,foldl,3,
[{file,“lists.erl”},{line,1248}]},
{validator,handle,4,
[{file,“src/validator.erl”},{line,56}]},
{menelaus_web_pools,
do_handle_pool_settings_post_loop,2,
[{file,“src/menelaus_web_pools.erl”},
{line,425}]},
{request_throttler,do_request,3,
[{file,“src/request_throttler.erl”},
{line,59}]},
{menelaus_web,loop,2,
[{file,“src/menelaus_web.erl”},
{line,121}]},
{mochiweb_http,headers,5,
[{file,
“/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl”},
{line,94}]},
{proc_lib,init_p_do_apply,3,
[{file,“proc_lib.erl”},{line,239}]}]}]
[ns_server:error,2020-02-07T07:10:34.736Z,ns_1@cbserver01.intranet.asia:<0.29696.1891>:ns_doctor:wait_statuses_loop:267]Couldn’t get statuses for [‘ns_1@cbserver09.intranet.asia’]
[ns_server:error,2020-02-07T07:10:34.736Z,ns_1@cbserver01.intranet.asia:<0.32647.1891>:menelaus_web:loop:143]Server error during processing: [“web request failed”,
{path,"/pools/default"},
{method,‘POST’},
{type,error},
{what,
{badmatch,
{error,
{timeout,
[‘ns_1@cbserver09.intranet.asia’]}}}},
{trace,
[{menelaus_web_pools,
do_validate_memory_quota,3,
[{file,“src/menelaus_web_pools.erl”},
{line,382}]},
{lists,foldl,3,
[{file,“lists.erl”},{line,1248}]},
{validator,handle,4,
[{file,“src/validator.erl”},{line,56}]},
{menelaus_web_pools,
do_handle_pool_settings_post_loop,2,
[{file,“src/menelaus_web_pools.erl”},
{line,425}]},
{request_throttler,do_request,3,
[{file,“src/request_throttler.erl”},
{line,59}]},
{menelaus_web,loop,2,
[{file,“src/menelaus_web.erl”},
{line,121}]},
{mochiweb_http,headers,5,
[{file,
“/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl”},
{line,94}]},
{proc_lib,init_p_do_apply,3,
[{file,“proc_lib.erl”},{line,239}]}]}]

Hi @lsianturi,

These are log messages from ns_server. I don't see any indexer-specific errors in these messages.

If the indexer process is crashing intermittently, you should look into indexer.log for evidence. If you want, you can attach the indexer.log file here. Also, look out for errors (if any) in starting the indexer process in the babysitter.log file.
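
If it helps, below is a rough sketch (not an official tool) for pulling just the error lines out of those two files before posting them; the log directory shown is the default for a Linux install and is an assumption that may differ in your environment.

```python
# Rough sketch: print only the error/fatal lines from the index service logs.
# Assumption: default Linux log location; adjust LOG_DIR for your install.
import os

LOG_DIR = "/opt/couchbase/var/lib/couchbase/logs"
FILES = ["indexer.log", "babysitter.log"]
MARKERS = ("[Error]", "[Fatal]", "panic", "fatal error")

for name in FILES:
    path = os.path.join(LOG_DIR, name)
    if not os.path.exists(path):
        print(f"{name}: not found, skipping")
        continue
    print(f"--- {name} ---")
    with open(path, errors="replace") as fh:
        for line in fh:
            if any(marker in line for marker in MARKERS):
                print(line.rstrip())
```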

By the way, which version of Couchbase Server are you using? Also, is it Community Edition or Enterprise Edition?
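
(If you are not sure, one way to check is the cluster REST API on port 8091; the snippet below is only a sketch — the hostname and credentials are placeholders, and the exact fields returned can vary slightly between releases.)

```python
# Sketch: read the cluster version from the /pools REST endpoint.
# Hostname and credentials below are placeholders.
import requests  # third-party: pip install requests

resp = requests.get(
    "http://cbserver01.intranet.asia:8091/pools",
    auth=("Administrator", "password"),
)
resp.raise_for_status()
info = resp.json()
print(info.get("implementationVersion"), "enterprise:", info.get("isEnterprise"))
```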

Hi Amit,
We are using Enterprise Edition 6.0.1 build 2037.
I could not attach a file due to the new-user restriction on this forum, so I copied part of the messages from the log file.
In indexer.log there is no message that starts with 'indexer'.

This is from indexer.log:
61040,“needs_restart”:false,“num_connections”:13,“num_cpu_core”:36,“storage_mode”:“plasma”,“timings/stats_response”:“12438 62598948789 392817285050905293”,“uptime”:“2h17m3.451900149s”}
2020-02-07T09:32:31.937+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:32.978+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:34.405+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-01/Mainstore#13868041856593789922:0 Plasma: mvccPurger: starting… purge_ratio 32.14593
2020-02-07T09:32:34.621+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-01/Mainstore#13868041856593789922:0 Plasma: mvccPurger: finished… purge_ratio 32.14593
2020-02-07T09:32:35.165+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-02/Mainstore#8407727724715604004:0 Plasma: mvccPurger: starting… purge_ratio 21.86638
2020-02-07T09:32:35.285+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-02/Mainstore#8407727724715604004:0 Plasma: mvccPurger: finished… purge_ratio 21.86638
2020-02-07T09:32:36.325+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:37.978+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:40.779+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:41.131+00:00 [Info] Analytics/9-Analytics-idx-ZAPDPROCESSED-02/Mainstore#5483939542977177012:0 Plasma: mvccPurger: starting… purge_ratio 25.86732
2020-02-07T09:32:41.203+00:00 [Info] Analytics/9-Analytics-idx-ZAPDPROCESSED-02/Mainstore#5483939542977177012:0 Plasma: mvccPurger: finished… purge_ratio 25.86732
2020-02-07T09:32:41.667+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-reconNewBasis-1_1/Mainstore#8885836278266010370:0 Plasma: mvccPurger: starting… purge_ratio 22.17085
2020-02-07T09:32:41.735+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-reconNewBasis-1_1/Mainstore#8885836278266010370:0 Plasma: mvccPurger: finished… purge_ratio 22.17085
2020-02-07T09:32:42.191+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-03/Mainstore#16210437619926211527:0 Plasma: mvccPurger: starting… purge_ratio 20.75649
2020-02-07T09:32:42.238+00:00 [Info] Analytics/9-idx-AGNT_HIERARCHY-03/Mainstore#16210437619926211527:0 Plasma: mvccPurger: finished… purge_ratio 20.75649
2020-02-07T09:32:42.630+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.130.6.1:46458. Error = Validate packet header: Invalid size 1586112596363247616. Kill Pipe.
2020-02-07T09:32:42.630+00:00 [Error] LeaderSyncProxy.updateAcceptEpochAfterQuorum(): Error encountered = Server Error : SyncProxy.listen(): channel closed. Terminate
2020-02-07T09:32:42.630+00:00 [Error] LeaderServer:startProxy(): Leader Fail to synchronization with follower (TCP conn = 10.130.6.1:46458)
2020-02-07T09:32:42.941+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:43.129+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 10.130.4.1:57162. Error = Validate packet header: Invalid size 1586112596363247616. Kill Pipe.
2020-02-07T09:32:43.129+00:00 [Error] LeaderSyncProxy.updateAcceptEpochAfterQuorum(): Error encountered = Server Error : SyncProxy.listen(): channel closed. Terminate
2020-02-07T09:32:43.129+00:00 [Error] LeaderServer:startProxy(): Leader Fail to synchronization with follower (TCP conn = 10.130.4.1:57162)
2020-02-07T09:32:43.487+00:00 [Info] Analytics/9-idx-OFFC_HIERARCHY-01_1/Mainstore#12038404776073276836:0 Plasma: mvccPurger: starting… purge_ratio 20.63797
2020-02-07T09:32:43.492+00:00 [Info] Analytics/9-idx-OFFC_HIERARCHY-01_1/Mainstore#12038404776073276836:0 Plasma: mvccPurger: finished… purge_ratio 20.63797
2020-02-07T09:32:44.049+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-LEADS_1/Mainstore#9002130954110231437:0 Plasma: mvccPurger: starting… purge_ratio 57.00000
2020-02-07T09:32:44.049+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-LEADS_1/Mainstore#9002130954110231437:0 Plasma: mvccPurger: finished… purge_ratio 57.00000
2020-02-07T09:32:44.575+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-05_1/Mainstore#264424942602215357:0 Plasma: mvccPurger: starting… purge_ratio 28.54803
2020-02-07T09:32:44.650+00:00 [Info] Analytics/9-Analytics-idx-SALES_DETAIL-05_1/Mainstore#264424942602215357:0 Plasma: mvccPurger: finished… purge_ratio 28.54803
2020-02-07T09:32:45.639+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification
2020-02-07T09:32:47.470+00:00 [Info] Analytics/9-Analytics-idx-ZAPDRERUN-01_1/Backstore#10001902263528051391:0 Plasma: mvccPurger: starting… purge_ratio 1113.75000
2020-02-07T09:32:47.472+00:00 [Info] Analytics/9-Analytics-idx-ZAPDRERUN-01_1/Backstore#10001902263528051391:0 Plasma: mvccPurger: finished… purge_ratio 1113.75000
2020-02-07T09:32:47.481+00:00 [Info] Analytics/9-Analytics-idx-ZAPDRERUN-01_1/Mainstore#10001902263528051391:0 Plasma: mvccPurger: starting… purge_ratio 1580.60714
2020-02-07T09:32:47.481+00:00 [Info] Analytics/9-Analytics-idx-ZAPDRERUN-01_1/Mainstore#10001902263528051391:0 Plasma: mvccPurger: finished… purge_ratio 1580.60714
2020-02-07T09:32:47.596+00:00 [Info] janitor: running cleanup.
2020-02-07T09:32:47.629+00:00 [Info] ServiceMgr::GetCurrentTopology
2020-02-07T09:32:47.629+00:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 3] [8f3cdd268933a076781632041131e2f9 33b49abbda86aad2a296561648b23e87 a056c5b9e7ac40fdf032d664d46b67fa bfc2712d0ea598cf62291db06e6b04ed] true }
2020-02-07T09:32:47.629+00:00 [Info] ServiceMgr::GetTaskList
2020-02-07T09:32:47.629+00:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 3] }
2020-02-07T09:32:47.631+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 3]
2020-02-07T09:32:47.631+00:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 3]
2020-02-07T09:32:47.723+00:00 [Info] memstats {“Alloc”:370401608, “TotalAlloc”:189521079736, “Sys”:877415128, “Lookups”:10057, “Mallocs”:1615041772, “Frees”:1612486966, “HeapAlloc”:370401608, “HeapSys”:808058880, “HeapIdle”:410959872, “HeapInuse”:397099008, “HeapReleased”:70991872, “HeapObjects”:2554806, “GCSys”:31748096, “LastGC”:1581067913282327413, “PauseTotalNs”:1305064703, “PauseNs”:[1300078, 1966144, 1719426, 2474290, 1903199, 2505512, 2595378, 1560888], “NumGC”:792}
2020-02-07T09:32:47.860+00:00 [Info] serviceChangeNotifier: received PoolChangeNotification

@lsianturi,

Thanks for the logs. With respect to the indexing service, what kind of downtime are you observing? Are you seeing index creation failures, or index scan errors?
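
(One quick way to see how the indexes look is to ask the cluster for index states through system:indexes on one of your query nodes; the sketch below is only illustrative — the hostname and credentials are placeholders for your environment.)

```python
# Sketch: list index names and states via the N1QL REST endpoint on a query node.
# Hostname and credentials are placeholders.
import requests  # third-party: pip install requests

resp = requests.post(
    "http://queryhost.intranet.asia:8093/query/service",
    data={"statement": "SELECT name, keyspace_id, state FROM system:indexes"},
    auth=("Administrator", "password"),
)
resp.raise_for_status()
for row in resp.json().get("results", []):
    print(row)
```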

In Couchbase Server 6.0.1 there was a known bug, MB-34003. By default, Couchbase Server can recover from network fluctuations, but in 6.0.1, due to the aforementioned bug, it sometimes does not recover after network changes/fluctuations.

Restarting the indexer service should solve this problem temporarily, but if network fluctuations happen again, the server can end up in this state again.

Please upgrade to the latest 6.0.x release to avoid index service downtime.

Hope this solves your problem.

Hi Amit,
thanks for pointing out the cause of the problem. It turned out the kernel needed to be upgraded, and after upgrading the kernel the index node is now stable.

Regards
Lambok
