Hi @edwardzhong, that “hard out of memory error” may not be because of the rebalance, but it is likely causing the rebalancing to pause or seem hung.
“Hard out of memory” indicates that write traffic is being rejected because there is not enough RAM to allocate for new incoming data. This applies to both your application traffic and the rebalance, so it’s very likely that the rebalance can’t proceed with moving data for that bucket.
There are a few options here, but in general I would recommend increasing the RAM quota for that bucket and seeing whether that allows the rebalance to proceed. After that, you may want to re-evaluate your overall sizing for this cluster to make sure that you have enough RAM to support what you’re trying to do.
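If it helps, here is a rough sketch of what raising a bucket’s quota could look like through the REST API from Python. The host name, credentials, bucket name, the 2048 MB figure and the helper name are all placeholders I made up for illustration. It assumes the bucket-edit endpoint (`/pools/default/buckets/<bucket>` on port 8091 with a `ramQuotaMB` form field); some server versions may expect additional bucket parameters in the same request, so please check the REST reference for your release. The request is only built here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_bucket_quota_request(host, bucket, ram_quota_mb, user, password):
    """Build (but do not send) a POST that sets a bucket's per-node RAM quota.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    url = "http://{}:8091/pools/default/buckets/{}".format(
        host, urllib.parse.quote(bucket, safe=""))
    # ramQuotaMB is the per-node quota for the bucket, in megabytes.
    body = urllib.parse.urlencode({"ramQuotaMB": ram_quota_mb}).encode("ascii")
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder values; substitute your own node, bucket and credentials.
req = build_bucket_quota_request(
    "cb-node1.example.com", "my_bucket", 2048, "Administrator", "password")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))

# To actually apply the change you would send it, for example:
# urllib.request.urlopen(req)
```

The same change can be made from the bucket settings in the web console, which is probably the easiest route for a one-off adjustment.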
Remember that Couchbase has a very tightly managed caching layer of its own; it doesn’t rely upon the filesystem buffer cache. I believe this may explain why you are seeing relatively low RAM usage. If you have set the bucket quota to a certain amount (as you indicated you had), then Couchbase will only use that much RAM for caching that particular dataset, even if you have much more RAM available in the system as a whole. For example, if your node has 16GB of RAM available but your bucket(s) are set to use only 1GB, then that’s all that Couchbase will use, and it may seem like the rest of the system resources aren’t being utilized. They’re not. Thankfully, you can both raise and lower the RAM quota for each bucket dynamically without restarting or affecting the system in any way; Couchbase will just start using more or less RAM for that bucket.
On a related note, your initial thought process was correct…you can add a new node, mark an old node for removal and then perform the rebalance. Couchbase will automatically move data only between those two nodes and you can verify this in the logs by seeing a message that “this bucket is a swap-rebalance” (or something to that effect). You won’t lose any data, the cluster will stay at the same size, and the other nodes in the cluster not involved in that rebalance won’t be tasked with moving any data.
Finally, you mentioned that you were looking to increase the capacity of the nodes in your cluster. You are going about it in exactly the right way; I just wanted to explain the whole process and what to be aware of. When you add nodes with more RAM to a cluster, you won’t be able to make use of that extra RAM right away, only once every node has at least that much. For example, let’s say you have 3 nodes with 8GB of RAM each and you want to increase them to 16GB. You can add/remove one node at a time (or multiple at a time), but as long as there is still at least one node in the cluster with only 8GB of RAM, all the nodes will use only that much. Once all the nodes have been swapped, the cluster will STILL be using only 8GB of RAM per node. At this point, you can raise the “cluster RAM quota” (also dynamically) from ~8GB per node to ~16GB per node (depending on how much headroom you want to give the OS, etc.), and then you can raise the bucket quotas and/or add more buckets.
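To make the 8GB to 16GB example concrete, here is a small Python sketch of the arithmetic. It is only a toy model of the rule described above (the per-node quota can go no higher than the smallest node, and it does not change until you raise it), not anything Couchbase runs. The function name is made up, and OS headroom is ignored for simplicity.

```python
def max_cluster_quota_gb(node_ram_gb):
    """Highest per-node quota the cluster could be given: the smallest node's RAM.

    Toy model; ignores the headroom you would normally leave for the OS.
    """
    return min(node_ram_gb)


nodes = [8, 8, 8]  # RAM per node in GB, as in the example above
quota = 8          # current cluster RAM quota per node, in GB

history = []
for i in range(len(nodes)):
    nodes[i] = 16  # swap one 8GB node for a 16GB node
    # The quota in use stays where it was; only the ceiling can move.
    history.append((list(nodes), quota, max_cluster_quota_gb(nodes)))

for nodes_now, quota_in_use, ceiling in history:
    print(nodes_now, "quota in use:", quota_in_use,
          "GB, could be raised to:", ceiling, "GB")

# The manual step: raising the cluster RAM quota once every node is swapped.
quota = max_cluster_quota_gb(nodes)
print("after raising the quota:", quota, "GB per node")
```

Note that the ceiling only moves to 16GB after the third swap, and even then the quota in use remains 8GB until it is raised by hand.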
This page might help explain the architecture a bit more: https://developer.couchbase.com/documentation/server/4.6/architecture/managed-caching-layer-architecture.html
And this forum post as well: Cluster does not use the increased RAM quota - #4 by drigby
Hope that helps explain things, please let us know if you have any other questions.
Perry