We’re currently using Couchbase Version 3.0.1 Community Edition (build-1444) and are experiencing an issue with XDCR.
We currently have two data centers, A and B, in production with XDCR enabled between them (1 billion docs in each of A and B). We’re now running unidirectional XDCR from data center A to a new data center C for those 1 billion docs. We find that it copies only 90% of the docs (about 900 million) to data center C. Data center C is still receiving XDCR requests, but its doc count stays at around 900 million. This has been going on for about one day.
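To make the size of the gap concrete, here is a minimal arithmetic sketch. The function name `replication_gap` is hypothetical, and the counts are the approximate figures quoted in the post, not values read from a live cluster.

```python
# Illustrative arithmetic only: the counts are the approximate figures quoted
# in the post, not values fetched from either cluster.

def replication_gap(source_items: int, target_items: int) -> tuple[float, int]:
    """Return (percent of source items present on target, number missing)."""
    if source_items <= 0:
        raise ValueError("source_items must be positive")
    missing = source_items - target_items
    percent = 100.0 * target_items / source_items
    return percent, missing


if __name__ == "__main__":
    pct, missing = replication_gap(1_000_000_000, 900_000_000)
    print(f"{pct:.1f}% replicated, {missing:,} items missing")
```

With the numbers above this prints `90.0% replicated, 100,000,000 items missing`.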
Could anyone give me some help on this? Thanks a lot.
Hi, from your description it seems that about 10% of the data consistently fails to replicate. Can you check whether your documents have a Time To Live (TTL) set? If so, the next setting I’d ask you to check is the ‘Metadata Purge Interval’, which could explain why 10% of the data is not being replicated. If not, I would suggest opening an issue in our JIRA tracker that references this post, along with cbcollect_info output from the cluster and any additional steps that might help us reproduce it.
Hi Steve, it looks like the purge is running very frequently, every hour (an interval of 0.04 days), and deleting the docs and their metadata. I can’t say with certainty that this is the issue without knowing all the details, but I would recommend setting the Metadata Purge Interval back to the default of 3 days and checking whether that fixes it.
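The Metadata Purge Interval is entered in days, so small values are easy to misread. This minimal sketch just converts the values mentioned in this thread (0.04, 3 and 6 days) to hours. The helper name `purge_interval_hours` is hypothetical.

```python
# Converts Metadata Purge Interval values (entered in days) to hours so the
# settings discussed in this thread can be compared directly.

HOURS_PER_DAY = 24


def purge_interval_hours(days: float) -> float:
    """Return the purge interval expressed in hours."""
    return days * HOURS_PER_DAY


if __name__ == "__main__":
    for days in (0.04, 3, 6):
        print(f"{days} days = {purge_interval_hours(days):.2f} hours")
```

This shows that 0.04 days is just under one hour (0.96 h), while the 3-day default corresponds to 72 hours.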
I changed the Metadata Purge Interval to 6 days, and after one night the doc count in cluster C had increased by only 1%, so it looks like there is some other issue here.
I also checked the XDCR errors in the web UI and saw the errors below. They appear both in our production XDCR (A <-> B) and in this backup XDCR (A -> C). Does this mean there are some bad docs in the bucket, so XDCR will not sync the docs in that bucket?
2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 455. Please see logs for details.
2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 446. Please see logs for details.
2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 433. Please see logs for details.
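When the error panel lists many lines like the ones above, it can help to reduce them to the set of affected vBuckets before digging into the detailed logs. This is a minimal sketch that assumes the exact "[Vb Rep] Error replicating vbucket N." wording shown above; the names `VB_ERROR` and `failing_vbuckets` are hypothetical.

```python
import re

# Pattern written for the exact "[Vb Rep] Error replicating vbucket N." wording
# shown in the XDCR error panel; other message formats will not match.
VB_ERROR = re.compile(r"\[Vb Rep\] Error replicating vbucket (\d+)\.")


def failing_vbuckets(lines):
    """Return the sorted, de-duplicated vBucket IDs mentioned in error lines."""
    ids = set()
    for line in lines:
        match = VB_ERROR.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


sample = [
    "2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 455. Please see logs for details.",
    "2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 446. Please see logs for details.",
    "2016-01-12 18:29:52 [Vb Rep] Error replicating vbucket 433. Please see logs for details.",
]

if __name__ == "__main__":
    print(failing_vbuckets(sample))  # [433, 446, 455]
```

Comparing the resulting list between the A <-> B and A -> C replications would show whether the same vBuckets fail in both.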
To find the root cause, I would suggest turning on verbose logging in the Advanced XDCR Settings. I would also suggest opening an issue in our JIRA tracker that references this post, along with cbcollect_info output from the cluster and any additional steps that might help us reproduce it.