Hi, we have a Community Couchbase 6.5.1 cluster, and recently "Internal server error, please retry your request" messages started appearing while running cbbackup.
13/10/2021 00:49:42 /[...]/cbbackup couchbase://localhost:8091 /[...]/backup_location -u backup_user -p backup_password -t 6 -m diff
[####################] 100.0% (1267/estimated 1267 msgs)
bucket: bucket-00-1, msgs transferred...
: total | last | per sec
byte : 1155738291 | 1155738291 | 12492257.8
2021-10-13 00:51:25,648: mt ['Internal server error, please retry your request']
[####################] 100.0% (1202/estimated 1202 msgs)
bucket: bucket-00-2, msgs transferred...
: total | last | per sec
byte : 2060126397 | 2060126397 | 31824927.9
2021-10-13 00:52:40,540: mt ['Internal server error, please retry your request']
[####################] 100.0% (1808/estimated 1808 msgs)
bucket: bucket-00-3, msgs transferred...
: total | last | per sec
byte : 3038797690 | 3038797690 | 39012470.3
2021-10-13 00:54:08,575: mt ['Internal server error, please retry your request']
[####################] 100.0% (8169/estimated 8169 msgs)
What could be the consequences of these messages? Is there any way to avoid them?
Thanks in advance and kind regards.
I have found some errors in the logs; could any of them be the cause of this issue?
/[...]/logs/info.log:[ns_server:error,2021-10-15T00:00:03.109Z,ns_1@couchbase-001.domain:service_status_keeper_worker<0.432.0>:rest_utils:get_json:62]Request to (indexer) getIndexStatus failed: {error,timeout}
/[...]/logs/indexer.log:2021-10-15T01:32:57.074+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 9.9.9.18:9100. Error = read tcp 9.9.9.17:54602->9.9.9.18:9100: use of closed network connection. Kill Pipe.
/[...]/logs/indexer.log:2021-10-15T02:13:01.122+00:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 9.9.9.17:35670. Error = EOF. Kill Pipe.
/[...]/logs/info.log:[ns_server:error,2021-10-15T02:13:42.971Z,ns_1@couchbase-001.domain:service_status_keeper-index<0.433.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status
These error messages repeat throughout the logs. Hostnames and IP addresses were changed to protect the environment.
Hi @dopessoa,
That error message indicates that a REST request dispatched by cbbackup received a 500 status code. To debug this properly we'll need to see the logs.
Could you please provide a log collection (collected via the Web UI under Logs → Collect Information)? Along with this, could you please re-run cbbackup with -vvv (this enables verbose debug logging) and provide that output as well.
Please feel free to use log-redaction (provided in the WebUI).
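For reference, re-running the same backup with verbose logging could look like the sketch below (paths elided as in the original command; the `tee` at the end is just one way to capture the debug output to a file):

```
/[...]/cbbackup couchbase://localhost:8091 /[...]/backup_location \
    -u backup_user -p backup_password -t 6 -m diff -vvv 2>&1 | tee cbbackup-debug.log
```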
Thanks,
James
@dopessoa
The ClusterManager calls getIndexStatus REST API on every Index node every 5 seconds. The fact that this is timing out might indicate either some kind of network problem or a problem with an Index node becoming overloaded or otherwise unresponsive.
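As a quick manual check, you could time that same endpoint by hand. This is a sketch with assumed host, credentials, and the index service's default HTTP port (9102); adjust all three for your cluster:

```
curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' \
    -u backup_user:backup_password \
    http://couchbase-001.domain:9102/getIndexStatus
```

A healthy Index node should answer in well under the 5-second polling interval; a long `time_total` or a timeout here would point at the indexer being overloaded or unreachable.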
Unfortunately the “Kill Pipe” messages are generally not very diagnostic, as there are many places in the code that log these after a normal, expected closure of a connection, i.e. cases where the only mechanism for the thread on one end of the pipe to realize the task is finished is when the thread on the other side closes it and then the thread on the first end gets and logs these errors when trying to read from the pipe again.
@dopessoa Note that “network problem” can include a firewall blocking a port that Couchbase Server is trying to use. These don’t always return errors indicating that the access was blocked – the connection attempts may just time out.
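One way to distinguish "blocked port" from "slow service" is a plain TCP connect test per port. A minimal Python sketch, assuming the default Couchbase ports (8091 for the cluster REST interface, 9100-9105 for the index service's inter-node ports) and a hypothetical hostname:

```python
import socket

def port_open(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

# Hypothetical host; default Couchbase admin + index service ports (assumption).
for port in (8091, 9100, 9101, 9102, 9103, 9104, 9105):
    state = "open" if port_open("couchbase-001.domain", port, timeout=2) else "blocked/unreachable"
    print(port, state)
```

If a firewall silently drops packets, the connect attempt will hang until the timeout rather than fail fast, which matches the timeout symptom described above.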
Thank you for your valuable inputs @Kevin.Cherkauer and @jamesl33!
I will follow up on the suggestion to run cbbackup verbosely, and I will check whether any of the Couchbase ports are being blocked.
If neither action fixes the situation or gives a clear idea of the source of the issue, I will upload the logs.
Regards,
Douglas
Hello,
With the -vvv option it was possible to notice that the errors follow a pattern: they only occur when fetching index metadata, and only when reaching the local node where the backup is run.
[...]
2021-10-18 02:10:01,331: mt Starting new HTTP connection (1): node-001.company
2021-10-18 02:10:17,894: mt "GET /getIndexMetadata?bucket=A-00-3 HTTP/1.1" 500 138
2021-10-18 02:10:17,895: mt ['Internal server error, please retry your request']
[...]
2021-10-18 02:39:03,577: mt Starting new HTTP connection (1): node-001.company
2021-10-18 02:39:17,737: mt "GET /getIndexMetadata?bucket=bucketB-00-1 HTTP/1.1" 500 138
2021-10-18 02:39:17,738: mt ['Internal server error, please retry your request']
[...]
2021-10-18 03:15:53,015: mt Starting new HTTP connection (1): node-001.company
2021-10-18 03:16:03,138: mt "GET /getIndexMetadata?bucket=bucketB-00-3 HTTP/1.1" 500 138
2021-10-18 03:16:03,138: mt ['Internal server error, please retry your request']
[...]
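To isolate the failing call, one could replay the same request outside cbbackup. A minimal sketch in Python (the host, port, and credentials here are assumptions; the index service's HTTP port is 9102 by default, so adjust as needed):

```python
import base64
import urllib.error
import urllib.request

def get_index_metadata(base_url, bucket, user, password, timeout=30):
    """Issue the same GET /getIndexMetadata request cbbackup makes and
    return (status_code, body) so the 500 can be reproduced in isolation."""
    url = f"{base_url}/getIndexMetadata?bucket={bucket}"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return err.code, err.read()

# Hypothetical usage mirroring the log excerpt above:
# status, body = get_index_metadata("http://node-001.company:9102",
#                                   "bucketB-00-1", "backup_user", "...")
# print(status, body[:200])
```

If this call reproduces the 500 on its own, the body of the error response (and the indexer.log entries at the same timestamp) should narrow down the cause independently of the backup run.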
About the log collection: this is a Community Edition cluster and I couldn't find the log-redaction option in the UI, so I collected the logs through the CLI utility.
Where can I make the logs available?
Thank you again @Kevin.Cherkauer and @jamesl33.
@dopessoa You can upload logs by replying and selecting the upload icon in the toolbar on the reply window – on mine it’s the seventh from the left and looks like a horizontal rectangle with an upward pointing arrow coming out of it.
Hi @Kevin.Cherkauer, I have noticed that the logs contain information that is sensitive for me, such as hostnames, IP addresses, and bucket names. Is there any way to redact those?
@dopessoa There is no Couchbase-provided tool to redact the logs. The forums are public, so if there is sensitive information in your logs then you should not upload them here unless you have already redacted them satisfactorily. (Note that the paid editions of Couchbase have a way to upload logs privately to an internal Couchbase location rather than to this forum.)