I’ve set up XDCR from the production cluster to a backup node and use cbbackup to dump all buckets on a daily basis. The Couchbase data directory is around 55GB. When I run cbbackup, it goes crazy in terms of memory usage: it uses up to 60GB of memory, which is far beyond the machine’s RAM (16GB, of which 7GB is used by Couchbase itself), so it ends up swapping excessively. All of this makes the backup process very, very slow, around 20 hours. Disk I/O (on the data partition) and CPU usage are fairly low; memory usage (and, because of the swapping, I/O on the swap drive) is clearly the bottleneck.
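For reference, the nightly job runs roughly the following (host, credentials and backup path are placeholders, not the real values):

```
# Nightly full backup of all buckets, taken on the XDCR backup node.
# Host, credentials and destination directory below are placeholders.
/opt/couchbase/bin/cbbackup http://localhost:8091 /data/backups/$(date +%F) \
    -u Administrator -p password -v
```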
Is there any way to reduce cbbackup’s memory usage?
With other datastores I’ve used before (*SQL, MongoDB), backing up 50GB of data was never a problem…
I’m running on Amazon EC2 - the backup host is an m3.xlarge instance, which provides two 40GB ephemeral drives. I’m using a RAID 0 of them as the swap volume.
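The swap volume was set up roughly like this (commands reproduced from memory, device names as on this instance):

```
# Stripe the two ephemeral SSDs into a RAID 0 array and use it as swap.
# /dev/xvdb and /dev/xvdc are this instance's ephemeral devices.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
mkswap /dev/md0
swapon /dev/md0
```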
The output of the dstat command is available in a gist. xvdb and xvdc are the ephemeral local SSD drives used for swap; xvdf holds the Couchbase data and the backup output (with an I/O capacity of 1,500 IOPS).
I am experiencing this same issue with version 3.0.1 Community on CentOS 6.5 at Rackspace. The Python version is “python.x86_64 2.6.6-52.el6”. I run this on one of the Couchbase nodes with the --bucket option to back up a specific bucket from all nodes’ data (not single-node mode).
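The command looks roughly like this, run from one of the nodes (bucket name, credentials and backup path are placeholders):

```
# Back up one bucket from the whole cluster (no --single-node),
# connecting through the local node. Bucket name and credentials are placeholders.
/opt/couchbase/bin/cbbackup http://localhost:8091 /backup/couchbase \
    -u Administrator -p password -b mybucket
```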
The cbbackup utility just uses all available memory until the VM’s kernel OOM-kills it, along with memcached.
Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 14119 (memcached) score 567 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 14119, UID 497, (memcached) total-vm:17794644kB, anon-rss:17453500kB, file-rss:8kB
Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 4630 (python) score 320 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 4630, UID 0, (python) total-vm:11581564kB, anon-rss:10773124kB, file-rss:4kB
Sorry to hear that you’re encountering OOM for the memcached process. It would be great if you could share the cbcollect_info log from the node from when the OOM killer kicked in.
I don’t think this has anything to do with Couchbase Server itself. As it happened to both me and emccormick, the problem is that the cbbackup process starts to consume all available memory, which eventually causes an out-of-memory situation; that, quite normally, triggers the OOM killer.
From my experiments, I have figured out that cbbackup loads the whole contents of a bucket into memory when dumping it. On large clusters this renders cbbackup useless, which is a great pity, because it becomes impossible to dump the data. Other solutions have been suggested to me (such as XDCR to another cluster for backup and taking volume snapshots there), but they are not ideal for many reasons - and we would love to be able to use the cbbackup tool anyway!
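If it helps anyone experimenting, the only knob I’ve found that looks related to memory is cbbackup’s -x extra-options list. I haven’t confirmed that the batch options below (batch_max_size, batch_max_bytes) exist in 3.0.1 Community, or that they meaningfully cap memory, so treat this purely as a sketch:

```
# Attempt to shrink cbbackup's per-batch buffering via -x extra options.
# Option names and values here are unverified assumptions, not a tested fix;
# host, credentials and backup path are placeholders.
/opt/couchbase/bin/cbbackup http://localhost:8091 /data/backups/$(date +%F) \
    -u Administrator -p password \
    -x batch_max_size=500,batch_max_bytes=200000
```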