In the outbound XDCR operations graph, the “mutations skipped by resolution” count is very high. I also noticed that the “goxdcr” process is using a lot of memory (>6GB). Is that normal, or is there something else going on that I should check?
Hi rj123, could you please provide more details? (Which server version are you using, how many clusters and nodes, the approximate number and size of the docs being replicated, the Network Usage Limit, and any other XDCR replication settings you have modified.)
The version was EE 5.5.1, with 3 nodes in the local cluster and 3 nodes in the remote cluster, and XDCR replication running in both directions. I also noticed that all the services were enabled on all the nodes: data, eventing, index, query, search and analytics. We only use N1QL queries, so I assume the data, index and query services would suffice. Transparent huge pages was also turned on. All other XDCR settings were left at their defaults. The number of replicas was set to 1. The total number of docs was around 100k, but several hundred were probably being updated at the same time. There was no load testing, yet the load average on the servers was very high (>20 on a 4-CPU, 8GB-memory server).
Based on some input I received, here is what we have done so far:
- Disabled transparent huge pages
- Upgraded the server to v5.5.3 on one cluster. The remote cluster has not been upgraded yet.
We will watch this for a couple of days to see whether the changes bring any improvement. Please share further input if you notice anything.
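For anyone wanting to confirm that transparent huge pages really are off on each node, here is a minimal Python sketch. It assumes the standard Linux sysfs location `/sys/kernel/mm/transparent_hugepage/enabled` (some older RHEL/CentOS kernels use `redhat_transparent_hugepage` instead), and the helper name `active_thp_mode` is just something made up for this example.

```python
from pathlib import Path

# Assumed sysfs location; adjust if your kernel exposes THP elsewhere.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(contents: str) -> str:
    """Return the active mode, i.e. the bracketed token in a string
    such as 'always madvise [never]'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {contents!r}")


if __name__ == "__main__":
    if THP_PATH.exists():
        mode = active_thp_mode(THP_PATH.read_text())
        print(f"THP mode: {mode}")
        if mode != "never":
            print("Transparent huge pages are not fully disabled on this node.")
    else:
        print(f"{THP_PATH} not found; THP may be unavailable on this kernel.")
```

Running it on every node after a reboot is a quick way to check that the setting survived the restart.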
This seems to have helped with the high-load issue. Regarding the “Mutations skipped by resolution” statistic, I believe that on its own it is not an indication of any problem.
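To illustrate why that counter can be high without anything being wrong, here is a rough Python sketch of revision-based conflict resolution as I understand it from the Couchbase documentation: the document with the higher revision sequence number wins, with CAS, expiry and flags as successive tie-breakers. The `DocMeta` class and `incoming_wins` function are hypothetical names for this example, not goxdcr code. With replication in both directions, a mutation that arrived from the remote cluster is considered for replication back, loses the comparison because the remote already holds the same version, and is counted as skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical container for the metadata compared during resolution."""
    rev_seqno: int  # incremented on every update to the document
    cas: int
    expiry: int
    flags: int


def incoming_wins(incoming: DocMeta, existing: DocMeta) -> bool:
    """Return True only if the incoming mutation should replace the existing doc.

    Tuples compare element by element, so later fields act purely as
    tie-breakers. Identical metadata means the target already has this
    version, so the mutation is skipped.
    """
    def key(m: DocMeta) -> tuple:
        return (m.rev_seqno, m.cas, m.expiry, m.flags)

    return key(incoming) > key(existing)


if __name__ == "__main__":
    on_remote = DocMeta(rev_seqno=5, cas=1000, expiry=0, flags=0)
    echoed_back = DocMeta(rev_seqno=5, cas=1000, expiry=0, flags=0)
    # The remote already holds this exact version, so it would be skipped.
    print(incoming_wins(echoed_back, on_remote))
```

In a bidirectional setup with a steady update rate, a skip count that roughly tracks the incoming mutation rate would be consistent with this behaviour.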