XDCR V1 is not working

I want to run XDCR replication from cluster1 to cluster2. Both are Couchbase Server Community 4.1.1 running on Ubuntu. They are behind firewalls, so I opened port 8092 on both clusters to the IP addresses in the other cluster, and I’ve verified that both clusters can run view queries over port 8092 on the other cluster. I chose XDCR v1 because I only wanted to open port 8092. (Performance is not an issue; this is for a test and will be shut off once the initial replication is complete.)

I created an outgoing XDCR v1 replication from cluster1 to cluster2 and it said everything was created OK, and I started replication of one of the buckets, but it doesn’t seem to be doing anything. It’s been over 10 minutes and there are no errors reported in the logs on either cluster, but there are no documents replicated either. The Outbound XDCR Operations on cluster1 initially showed a few million mutations (the number of docs in the bucket), then it went down to 2 mutations and nothing is happening in any of the graphs. I tried restarting the node for cluster1 and now the number of mutations is correct, but it’s still not replicating anything.

NOTE: I did accidentally create an XDCR v2 replication initially, then saw there were constant errors replicating and that I needed to open a lot more ports, so I paused it, removed it, and then created the XDCR v1 replication.

It’s rather old functionality now, but my recollection is that port 8091 is also needed, as each side needs to see what the topology is on the other side. Since you control both sides, you might be able to open ports 8091/8092 to just the hosts in question.
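To illustrate the topology point: a node's REST port (8091) serves a cluster description at /pools/default, and that JSON lists each node's address and ports. The Python sketch below parses such a response and lists the endpoints it advertises. The helper name `advertised_endpoints` and the sample values are made up for illustration, and the `nodes`, `hostname`, and `ports.direct` fields are assumed from the response format, so compare them against what your own cluster returns.

```python
# Minimal sketch: list the node endpoints advertised in a /pools/default
# response (fetched, for example, from http://<node>:8091/pools/default).
# Field names are assumptions about the response format; verify them
# against real output from your cluster.

DEFAULT_REST_PORT = 8091


def advertised_endpoints(pools_default):
    """Return a list of (host, rest_port, data_port) tuples, one per node."""
    endpoints = []
    for node in pools_default.get("nodes", []):
        hostname = node["hostname"]
        host, sep, port_text = hostname.rpartition(":")
        if sep:
            rest_port = int(port_text)
        else:
            # No explicit port in the hostname field: fall back to 8091.
            host, rest_port = hostname, DEFAULT_REST_PORT
        # "direct" is assumed to be the data port (11210 by default).
        data_port = node.get("ports", {}).get("direct")
        endpoints.append((host, rest_port, data_port))
    return endpoints


# Hypothetical sample, not taken from a real cluster.
SAMPLE_RESPONSE = {
    "nodes": [
        {"hostname": "10.0.0.4:8091", "ports": {"direct": 11210, "proxy": 11211}},
        {"hostname": "10.0.0.5:8091", "ports": {"direct": 11210, "proxy": 11211}},
    ]
}

for host, rest_port, data_port in advertised_endpoints(SAMPLE_RESPONSE):
    print(f"node {host}: REST port {rest_port}, data port {data_port}")
```

If the addresses listed there are internal ones that the other cluster cannot route to, opening ports on the public IPs alone may not be enough.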

Sorry I forgot to mention that port 8091 is also open.

The only other thing I can think of is to make sure the hostnames/ports are resolvable and reachable from each cluster to the nodes of the other. Hope that helps.
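One way to run that check from a node in each cluster is a small script that resolves each remote hostname and then tries a plain TCP connection on each port. This is only a sketch in Python using the standard library; the function names are made up for illustration, and the ports you pass in (8091, 8092, 11210) are the ones mentioned in this thread. A successful TCP connect only shows the port is open to you, not that XDCR itself will work.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Resolve host, then attempt a TCP connection to (host, port).

    Returns (resolved_ip, reachable). resolved_ip is None if DNS
    resolution fails; reachable is True only if the connect succeeds.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return ip, False


def check_cluster(hosts, ports, timeout=3.0):
    """Check every host/port pair; return (host, port, ip, reachable) tuples."""
    results = []
    for host in hosts:
        for port in ports:
            ip, reachable = check_endpoint(host, port, timeout)
            results.append((host, port, ip, reachable))
    return results
```

For example, calling `check_cluster(remote_hosts, [8091, 8092, 11210])` on a cluster1 node, where `remote_hosts` is your own list of cluster2 hostnames, and then repeating it in the other direction, shows which pairs fail to resolve or connect.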

I double-checked that the hostnames resolve, then I opened all the required ports for XDCR v2 and tried creating a replication with XDCR v2, but it reported errors communicating over one of the ports I had opened (11210). I’m on Azure, trying to replicate from a cluster in one vnet to a cluster in another vnet, and I opened the ports using Azure security groups, so I think it’s an Azure issue. I eventually gave up and just created my new cluster inside the same vnet as the existing one, and then replication worked.

I know Azure is not recommended for XDCR, but I thought it just said it would be slow. Is XDCR across vnets not supported in Azure at all?

XDCR works fine on Azure. Please share screenshots to help us identify the issue.

I already gave up and just created a new cluster inside the same vnet and deleted the old one I was trying to replicate to. So I don’t have screenshots, but this is what I originally tried to do:

  1. Create a new resource group in Central US.
  2. Create two VMs inside that resource group, in their own vnet together and in their own network security group. The security group allowed inbound traffic on port 8091 from all IPs, and inbound traffic on ports 8092, 11210, and the other ports required by XDCR v2 only from the IP address of the VM I wanted to replicate from (its public IP address, not the internal one from inside its vnet).
  3. Install Couchbase on the two new VMs and make them a two-node cluster.
  4. Create an empty bucket to replicate to.
  5. On my original cluster, which is in an East US resource group within its own vnet and network security group, add an inbound rule for traffic to all ports required by XDCR v2, only from the two public IP addresses of the nodes in my new cluster.
  6. Run some view queries over port 8092 from the old cluster to the new one and vice versa, to verify that they could communicate over that port and resolve each other’s hostnames correctly.
  7. On the original cluster, create a replication target pointing to the new cluster.
  8. On the old cluster, start XDCR v2 replication to a bucket in the new cluster. It just kept reporting the error that it couldn’t communicate over port 11210.