Understanding the directory structure for backup

Jayesh · August 9, 2021, 6:45am

Hello Team,

I ran a command-line backup on the travel-sample bucket, and the directory structure and files that I see are different from those given in the documentation link. I don’t see any shard* files in the data directory; instead I see failoverlog*.fol, data*.rift, index*.sqlite, snapshot*.snp, and stats.json files. Can someone please help me understand what these files are and how the data is stored? Is there any documentation on command-line backups?

Configuration: 3-node cluster running Couchbase 7.0

jamesl33 · August 9, 2021, 11:49am

Hi @Jayesh,

The archive layout created by ‘cbbackupmgr’ has changed a fair bit since that
documentation was written and as a consequence, the documentation is now
out-of-date.

In the past we’ve given slightly more information about the layout/storage
formats, however, we (the maintainers) agree that it should be removed for a
couple of reasons:

1. Ideally, it’s not something users need to understand/care about (cbbackupmgr
   should just work, it shouldn’t matter how the data is stored/formatted)

2. The format shouldn’t be changed/modified in any way, otherwise users may
   experience undefined behavior.

   a) In 7.0.0, a README.md was added to the repository which indicates this:

# Repository repo

Creation Time: 2021-08-09T12:18:01+01:00
Author: Unknown
Version: cbbackupmgr-master-831fc4b6

This is a repository created by the cbbackupmgr tool, please don't alter any of the files as this may result in unexpected
behaviour.

The format is subject to change, it has and will continue to evolve as we
add new features in the future.

As such, I’ve created DOC-8948 which can be used to track the removal of the
archive layout documentation.

If you’re interested, we do have some high level overviews of the backup
architecture/design/features in the form of Connect videos that are available
on our YouTube channel.

Thanks for the interest, if you have any further questions, please let me know,
James

Jayesh · August 12, 2021, 4:46am

Hi @jamesl33 ,

Thanks for sharing the link. I had doubts since many files are created, and I wanted to know what each file format means.
Would it be possible to get some information on that? Also, even for empty buckets I see many files created under the data directory.

jamesl33 · August 16, 2021, 12:18pm

Hi @Jayesh,

You’re correct, newer versions of cbbackupmgr will create a lot more files than previous versions for various reasons; most importantly performance.

Regarding what the file formats mean:

1. The .snp files contain DCP snapshot metadata
2. The .fol files contain DCP failover logs
3. Rift storage files (briefly covered in the S3 video as it was a requirement for native cloud integration)
   a) The index_[\d+].sqlite.[\d+] files are storage indexes, they contain metadata/locations for document data
   b) The data_[\d+].rift.[\d+] files contain packed document metadata/data
4. General metadata files .info, .backup, backup-meta.json etc. are all JSON files

It’s worth noting that the formats/locations have changed significantly between versions so this information won’t be correct for older versions (for example, those that use SQLite/ForestDB).
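
For anyone who just wants a quick, read-only summary of what is in a data directory, the file types listed above could be grouped with a small script along these lines. This is only a sketch: the regular expressions are assumptions inferred from the file names mentioned in this thread, the layout is not a supported interface, and it may not match other versions. The function names `classify` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Patterns are assumptions based on the names mentioned in this thread;
# the real layout is undocumented and may change between versions.
PATTERNS = [
    ("DCP snapshot metadata", re.compile(r"^snapshot.*\.snp$")),
    ("DCP failover log", re.compile(r"^failoverlog.*\.fol$")),
    ("Rift storage index", re.compile(r"^index_\d+\.sqlite\.\d+$")),
    ("Rift document data", re.compile(r"^data_\d+\.rift\.\d+$")),
    ("JSON metadata", re.compile(r"(\.info|\.backup|\.json)$")),
]


def classify(filename: str) -> str:
    """Return a human-readable label for a single file name."""
    for label, pattern in PATTERNS:
        if pattern.search(filename):
            return label
    return "unknown"


def summarize(filenames) -> Counter:
    """Count how many files fall under each label."""
    return Counter(classify(name) for name in filenames)
```

To use it, pass the result of `os.listdir()` for a data directory to `summarize`. The script only reads file names; as the README says, the files themselves should not be altered.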

Regarding the creation of “many files” for a backup of an empty bucket, this is to be expected; cbbackupmgr will still open DCP streams to the bucket and persist the failover logs for each vBucket. This metadata is used when calculating the data which should be streamed when creating the next incremental backup.
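
To give a rough idea of why a failover log is worth persisting even for an empty vBucket, here is a simplified model of how a DCP client might decide where to resume a stream. It is not cbbackupmgr’s implementation: the function name `resume_seqno`, the tuple layout, and the "newest entry first" ordering are assumptions made for this sketch.

```python
from typing import List, Tuple

# Each entry is (vbucket_uuid, seqno), with the newest entry first.
# This ordering and shape are assumptions made for illustration.
FailoverLog = List[Tuple[int, int]]


def resume_seqno(server_log: FailoverLog, saved_uuid: int, saved_seqno: int) -> int:
    """Return the sequence number a client could safely resume from.

    saved_uuid/saved_seqno describe where the previous backup stopped.
    """
    for i, (uuid, _) in enumerate(server_log):
        if uuid == saved_uuid:
            if i == 0:
                # Still on the current history branch: continue where we stopped.
                return saved_seqno
            # A failover happened later; the newer entry records where the
            # history branched, so anything past that point cannot be trusted.
            branch_seqno = server_log[i - 1][1]
            return min(saved_seqno, branch_seqno)
    # The saved history is unknown to the server: start again from zero.
    return 0
```

In this model, a client with no saved failover log has nothing to compare against and would have to stream everything again, which is why the metadata matters for the next incremental backup.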

Hopefully I’ve covered your questions, if I haven’t please let me know.

Thanks,
James
