cbbackupmgr version 7.0.2-6703
OS: linux Version: 5.10.178-162.673.amzn2.x86_64
Arch: amd64 vCPU: 2 Memory: 4038017024 (3.76GiB)
I have a simple Docker container, scheduled by AWS to run once a day, that runs the following commands:
cd /usr/local/halix/backup
/opt/couchbase/bin/cbbackupmgr config -a /usr/local/halix/backup -r prod
/opt/couchbase/bin/cbbackupmgr backup -a /usr/local/halix/backup -r prod -c http://$DB_URI:8091 -u $DB_USERNAME -p $DB_PASSWORD --full-backup
zip -r backup.zip .
/usr/local/halix/s3put -k $AWS_ACCESS_KEY -s $AWS_ACCESS_SECRET -b $S3_URL put backup.zip
It successfully uploads a zip every night, but the results are inconsistent: some backups are 2.6GB and others are 3.3GB. The difference appears to be that data in the third (last) bucket is missing or cut off. In those cases, backup-0.log also stops abruptly with no error logged.
In the 3.3GB backups, the log ends properly with:
2023-05-15T00:17:11.799+00:00 (Plan) Transfer for cluster complete
2023-05-15T00:17:11.799+00:00 (Plan) Transfer of all data complete
2023-05-15T00:17:11.800+00:00 (Cmd) Backup completed successfully
2023-05-15T00:17:11.800+00:00 (Stats) Stopping stat collection
In the 2.6GB backups, the log simply stops mid-stream, with no error or final message. Example:
2023-05-17T00:11:23.490+00:00 (DCP) (usage) (vb 1000) Creating DCP stream | {"uuid":0,"start_seqno":0,"end_seqno":5824,"snap_start":0,"snap_end":0,"retries":0}
2023-05-17T00:11:23.492+00:00 (DCP) (usage) (vb 357) Creating DCP stream | {"uuid":0,"start_seqno":0,"end_seqno":6139,"snap_start":0,"snap_end":0,"retries":0}
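Since the good and bad runs differ in how the log ends, one check I could add before zipping is to look for the completion message in the last few lines of the log. This is only a minimal sketch, assuming a POSIX shell in the container; `backup_log_complete` is a hypothetical helper name of my own, not something cbbackupmgr provides, and the log file path has to be passed in.

```shell
#!/bin/sh
# backup_log_complete is a hypothetical helper, not part of cbbackupmgr.
# It succeeds only if the last few lines of the given log file contain the
# completion message that appears at the end of the 3.3GB runs.
backup_log_complete() {
    tail -n 5 "$1" | grep -qF 'Backup completed successfully'
}
```

Calling it with the path to backup-0.log and skipping the zip and upload when it fails would at least stop truncated backups from being uploaded as if they were good.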
Since the zip command runs after the cbbackupmgr command returns, I assume the backup command is exiting rather than crashing the Docker container. Is it possible for this tool to return as complete while it is still actually doing work? Do I need to add a pause before zipping the backup directory to give the system time to finish writing backup files? I'm baffled, and we need to explain these inconsistencies before we can completely remove the old backup tools from our production environments. Any debugging ideas or suggestions as to what might cause what I'm seeing would be greatly appreciated. I'd upload the full logs, but it does not appear that my account is allowed to attach files. Thanks!
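One thing I noticed while writing this up: the script runs every command unconditionally, so zip and s3put run no matter what exit status cbbackupmgr returns. Below is a minimal sketch of a wrapper I could use to record that status, assuming a POSIX shell in the container. `run_step` is a hypothetical helper name, not part of any Couchbase tooling. It relies on the usual shell convention that a status above 128 means the process was terminated by a signal (for example 137 = 128 + 9, SIGKILL).

```shell
#!/bin/sh
# run_step is a hypothetical helper: it runs the given command, reports a
# non-zero exit status on stderr, and returns that status to the caller.
# Only the command name ($1) is printed, so credentials passed as
# arguments do not end up in the output.
run_step() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means the process died from signal N.
        echo "step killed by signal $((status - 128)): $1" >&2
    elif [ "$status" -ne 0 ]; then
        echo "step failed with exit status $status: $1" >&2
    fi
    return "$status"
}
```

Prefixing the backup line with `run_step` and appending `|| exit 1` would stop the script before the zip step whenever the backup does not exit cleanly, and the message would show whether the process was killed from outside.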