We are testing our application on Couchbase and currently have a 5-node cluster on Amazon AWS holding about 200 GB of data in total. We ran a backup using cbbackup, which failed at 97% (a separate issue). More importantly, we had some critical write failures during the roughly 2-hour backup.
The cbbackup documentation says you can back up a live cluster, but write failures during a backup are obviously not ideal. For performance reasons, we ran the backup from a remote machine that is not part of the cluster and connects to it over HTTP. We did notice that the TAP queue graph spiked dramatically during this period. That makes sense, because according to a Couchbase blog post on backups, replication is apparently TAP-intensive.
I’m going through the logs for more information about the failures. In the meantime, has anyone else had similar issues backing up what I assume counts as a large Couchbase instance?