We are trying to deploy a new version of Couchbase using the Couchbase Operator for Kubernetes.
Our old Couchbase 6.0 CE single-node cluster holds around 60 million documents in one of the buckets, with indexes built on that bucket. The usual build time for the biggest index is under 1 hour.
Since memory-optimized index storage would require too much RAM on the new cluster, we switched the engine to standard GSI (Plasma storage).
The problem is that index builds are now very slow: more than 24 hours for a set of 5 indexes, which vary in size from 100K to 5G.
Since we initially used the standard storage class for persistent volumes, we suspected the slowness was related to disk throttling, so we tried the premium (SSD) storage option for persistent volumes instead. However, after data was moved from one of the data nodes to a new one during rebalancing, some indexes "got stuck" in the moving state, which is also taking an enormous amount of time, comparable to build time. Moving 240G of data off a single node took ~1 hour, but judging by the remaining-mutations graph, the index movement will take ~24 hours or more.
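To rule storage in or out, we can run a rough sequential-write check inside the index pod. This is just a sketch: the path /tmp/cb-disk-test is a placeholder and would in practice point at the persistent volume mount (e.g. the Couchbase data directory):

```shell
# Rough sequential-write throughput check. conv=fdatasync forces the data
# to disk before dd reports, so the MB/s figure reflects the volume itself
# rather than the page cache.
# /tmp/cb-disk-test is a placeholder path -- point it at the PV mount.
dd if=/dev/zero of=/tmp/cb-disk-test bs=1M count=256 conv=fdatasync
rm -f /tmp/cb-disk-test   # clean up the test file
```

Comparing the reported throughput against the storage class's documented limits should show whether the premium volume is actually delivering SSD-level performance.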
At the same time, neither the data nor the index nodes show significant CPU usage (at most ~20%), even though we use the smallest node type for the Couchbase cluster (n2d-highmem-2: 2 vCPUs, 16G RAM).
Could you recommend what the bottleneck might be in this case? Which resource metrics should we pay attention to?
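For example, would sampling the kernel's block-device counters inside the pods be enough to spot disk saturation? Something like the following (device names and the sampling interval are just examples):

```shell
# On each device line of /proc/diskstats, the 10th field after the device
# name is cumulative milliseconds spent doing I/O. Taking two samples
# N seconds apart and diffing that field approximates device utilization:
# delta_ms / (N * 1000).
cat /proc/diskstats
sleep 2
cat /proc/diskstats
```

A device sitting near 100% utilization while CPU stays at ~20% would point at disk I/O rather than compute as the limiting factor.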