Hi,
I’m running into an issue when a sudden spike in N1QL queries causes the query service on a Couchbase node to stop accepting any requests until I manually restart either the node or the service (pkill -9 -f /opt/couchbase/bin/cbq-engine
).
This is how the Query page looks like in the web console:
Note that the error is not transient, it does not fix by itself.
Do you have any idea on what could be the issue that query service does not automatically restart?
Couchbase version is Couchbase Server Community Edition 7.0.2 build 6703
Here are error logs:
[ns_server:error,2022-03-16T10:38:04.859Z,ns_1@172.17.0.2:<0.428.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_ep_bg_fetched%60%7D+or+kv_vb_curr_items%7Bstate%3D%27replica%27%7D+or+kv_vb_num_non_resident%7Bstate%3D%27active%27%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2022-03-16T10:38:04.860Z,ns_1@172.17.0.2:<0.433.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system-processes%60%2Cname%3D~%60sysproc_mem_resident%7Csysproc_mem_size%7Csysproc_cpu_utilization%7Csysproc_major_faults_raw%60%7D&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2022-03-16T10:38:04.860Z,ns_1@172.17.0.2:<0.436.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system%60%2Cname%3D~%60sys_cpu_utilization_rate%7Csys_cpu_stolen_rate%7Csys_swap_total%7Csys_swap_used%7Csys_mem_total%7Csys_mem_free%7Csys_mem_limit%7Csys_cpu_cores_available%7Csys_allocstall%60%7D&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
{inet,[inet],econnrefused}]}
[ns_server:error,2022-03-16T10:38:04.895Z,ns_1@172.17.0.2:ns_heart<0.382.0>:ns_heart:grab_one_service_status:409]Failed to grab service index status: {exit,
{noproc,
{gen_server,call,
['service_agent-index',get_status,
2000]}},
[{gen_server,call,3,
[{file,"gen_server.erl"},{line,223}]},
{ns_heart,grab_one_service_status,1,
[{file,"src/ns_heart.erl"},
{line,406}]},
{ns_heart,
'-grab_service_statuses/0-lc$^1/1-1-',
1,
[{file,"src/ns_heart.erl"},
{line,402}]},
{ns_heart,current_status_slow_inner,0,
[{file,"src/ns_heart.erl"},
{line,276}]},
{ns_heart,current_status_slow,1,
[{file,"src/ns_heart.erl"},
{line,235}]},
{ns_heart,update_current_status,1,
[{file,"src/ns_heart.erl"},
{line,172}]},
{ns_heart,handle_info,2,
[{file,"src/ns_heart.erl"},
{line,116}]},
{gen_server,try_dispatch,4,
[{file,"gen_server.erl"},
{line,637}]}]}
[ns_server:error,2022-03-16T10:38:04.930Z,ns_1@172.17.0.2:<0.450.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=cm_failover_safeness_level%7Bbucket%3D%60******************%60%7D%5B20s%5D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:04.939Z,ns_1@172.17.0.2:<0.473.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=cm_failover_safeness_level%7Bbucket%3D%60contentful-testing%60%7D%5B20s%5D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.051Z,ns_1@172.17.0.2:<0.510.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_ep_bg_fetched%60%7D+or+kv_vb_curr_items%7Bstate%3D%27replica%27%7D+or+kv_vb_num_non_resident%7Bstate%3D%27active%27%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.073Z,ns_1@172.17.0.2:<0.524.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system-processes%60%2Cname%3D~%60sysproc_mem_resident%7Csysproc_mem_size%7Csysproc_cpu_utilization%7Csysproc_major_faults_raw%60%7D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.087Z,ns_1@172.17.0.2:<0.533.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bcategory%3D%60system%60%2Cname%3D~%60sys_cpu_utilization_rate%7Csys_cpu_stolen_rate%7Csys_swap_total%7Csys_swap_used%7Csys_mem_total%7Csys_mem_free%7Csys_mem_limit%7Csys_cpu_cores_available%7Csys_allocstall%60%7D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.151Z,ns_1@172.17.0.2:<0.553.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=cm_failover_safeness_level%7Bbucket%3D%60******************%60%7D%5B20s%5D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.171Z,ns_1@172.17.0.2:<0.556.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=cm_failover_safeness_level%7Bbucket%3D%60contentful-testing%60%7D%5B20s%5D&timeout=5s
Reason: <<"Service Unavailable">>
[ns_server:error,2022-03-16T10:38:05.245Z,ns_1@172.17.0.2:service_status_keeper_worker<0.579.0>:rest_utils:get_json:57]Request to (indexer) getIndexStatus with headers [] failed: {error,
{econnrefused,
[{lhttpc_client,
send_request,
1,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,220}]},
{lhttpc_client,
execute,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,169}]},
{lhttpc_client,
request,9,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
{line,
93}]}]}}
[ns_server:error,2022-03-16T10:38:05.246Z,ns_1@172.17.0.2:service_status_keeper-index<0.580.0>:service_status_keeper:handle_cast:103]Service service_index returned incorrect status
Thanks, Jakub