I ran a command-line backup of the travel-sample bucket, and the directory structure and files I see are different from those described in the documentation link. I don’t see any shard* files in the data directory; instead I see failoverlog*.fol, data*.rift, index*.sqlite, snapshot*.snp and stats.json files. Can someone please help me understand what these files are and how the data is stored? Is there any documentation on command-line backups?
Configuration: 3-node cluster running Couchbase 7.0
The archive layout created by ‘cbbackupmgr’ has changed a fair bit since that
documentation was written and as a consequence, the documentation is now
out-of-date.
In the past we’ve given slightly more information about the layout/storage
formats, however, we (the maintainers) agree that it should be removed for a
couple of reasons:
1. Ideally, it’s not something users need to understand/care about (cbbackupmgr
   should just work, it shouldn’t matter how the data is stored/formatted)
2. The format shouldn’t be changed/modified in any way, otherwise users may
   experience undefined behavior.
   a) In 7.0.0, a README.md was added to the repository which indicates this:
# Repository repo
Creation Time: 2021-08-09T12:18:01+01:00
Author: Unknown
Version: cbbackupmgr-master-831fc4b6
This is a repository created by the cbbackupmgr tool, please don't alter any of the files as this may result in unexpected
behaviour.
The format is subject to change, it has and will continue to evolve as we
add new features in the future.
As such, I’ve created DOC-8948 which can be used to track the removal of the
archive layout documentation.
If you’re interested, we do have some high level overviews of the backup
architecture/design/features in the form of Connect videos that are available
on our YouTube channel.
Thanks for sharing the link. I had doubts since there are many files created, and I wanted to know what each file format means.
Would it be possible to get some information on that? Also, even for empty buckets I see many files created under the data directory.
You’re correct, newer versions of cbbackupmgr will create a lot more files than previous versions for various reasons, most importantly performance.
Regarding what the file formats mean:
The .snp files contain DCP snapshot metadata
The .fol files contain DCP failover logs
Rift storage files (briefly covered in the S3 video as it was a requirement for native cloud integration)
a) The index_[\d+].sqlite.[\d+] files are storage indexes, they contain metadata/locations for document data
b) The data_[\d+].rift.[\d+] files contain packed document metadata/data
General metadata files .info, .backup, backup-meta.json etc. are all JSON files
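If you just want a quick overview of what is in a data directory, the list above can be turned into a small read-only script. This is only a rough sketch: the regular expressions are my assumption based on the file names mentioned in this thread, the labels and the classify/summarize helpers are made up for illustration, and none of this is an official description of the format. It only looks at file names and never opens or changes the files.

```python
import re
from collections import Counter

# NOTE: these patterns are assumptions derived from the file names quoted in
# this thread. The real naming scheme is internal to cbbackupmgr and may
# change between versions.
PATTERNS = [
    ("DCP snapshot metadata", re.compile(r".*\.snp$")),
    ("DCP failover log", re.compile(r".*\.fol$")),
    ("Rift storage index", re.compile(r"^index_\d+\.sqlite(\.\d+)?$")),
    ("Rift document data", re.compile(r"^data_\d+\.rift(\.\d+)?$")),
    ("JSON metadata", re.compile(r".*\.(info|backup|json)$")),
]


def classify(filename):
    """Return a descriptive label for a file name, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.match(filename):
            return label
    return "unknown"


def summarize(filenames):
    """Count how many of the given file names fall under each label."""
    return Counter(classify(name) for name in filenames)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = [
        "snapshot_0.snp",
        "failoverlog_0.fol",
        "index_0.sqlite.0",
        "data_0.rift.0",
        "stats.json",
    ]
    for label, count in sorted(summarize(sample).items()):
        print(f"{label}: {count}")
```

To run it against a real backup you could pass the result of os.listdir() for the data directory to summarize(); since it only reads names, it stays within the "please don't alter any of the files" warning in the README.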
It’s worth noting that the formats/locations have changed significantly between versions so this information won’t be correct for older versions (for example, those that use SQLite/ForestDB).
Regarding the creation of “many files” for a backup of an empty bucket, this is to be expected; cbbackupmgr will still open DCP streams to the bucket and persist the failover logs for each vBucket. This metadata is used when calculating the data which should be streamed when creating the next incremental backup.
Hopefully I’ve covered your questions; if I haven’t, please let me know.