Couchbase node initiates a failover during XDCR replication

When we try to set up XDCR replication to our remote data center, one of our production nodes always fails 3-6 hours into it. When this occurs, Couchbase initiates a failover. CPU, memory, and disk resources seem to be adequate. We could really use some help understanding why this occurs. Looking at the logs, we see the following:

  • Replication from bucket “default” to bucket “default” on cluster “Couchbase DR” created.
  • (6 hours later) Could not auto-failover node (‘ns_1@10.0.100.103’). There was at least another node down.
  • Starting failing over ‘ns_1@10.0.100.103’
  • Shutting down bucket “default” on ‘ns_1@10.0.100.103’ for deletion
  • Failed over ‘ns_1@10.0.100.103’: ok
  • Then the fail-over data:
    Node (‘ns_1@10.0.100.103’) was automatically failovered.
    [{last_heard,{1486,461532,134206}},
    {outgoing_replications_safeness_level,[]},
    {incoming_replications_conf_hashes,[]},
    {active_buckets,[]},
    {ready_buckets,[]},
    {local_tasks,[[{type,xdcr},
    {id,<<"c31a321ef6d1a091c15bfece55113287/default/default">>},
    {errors,[]},
    {changes_left,21607089},
    {docs_checked,6000},
    {docs_written,6000},
    {data_replicated,1162225},
    {active_vbreps,32},
    {waiting_vbreps,0},
    {time_working,399},
    {time_committing,0},
    {num_checkpoints,0},
    {num_failedckpts,0},
    {docs_rep_queue,24421},
    {size_rep_queue,2791713}]]},
    {memory,[{total,294963176},
    {processes,174874312},
    {processes_used,171989272},
    {system,120088864},
    {atom,1502673},
    {atom_used,1495115},
    {binary,28432056},
    {code,15041863},
    {ets,64090848}]},
    {system_memory_data,[{system_total_memory,20962406400},
    {free_swap,4294963200},
    {total_swap,4294963200},
    {cached_memory,1529925632},
    {buffered_memory,145485824},
    {free_memory,3445846016},
    {total_memory,20962406400}]},
    {node_storage_conf,[{db_path,"/opt/couchbase/var/lib/couchbase/data"},
    {index_path,"/opt/couchbase/var/lib/couchbase/data"}]},
    {statistics,[{wall_clock,{52925934918,5000}},
    {context_switches,{163817818663,0}},
    {garbage_collection,{13267299552,91799678196518,0}},
    {io,{{input,4786443488287},{output,13158370499012}}},
    {reductions,{30658253554678,145473}},
    {run_queue,0},
    {runtime,{5026268150,80}}]},
    {system_stats,[{cpu_utilization_rate,0.5},
    {swap_total,4294963200},
    {swap_used,0}]},
    {interesting_stats,[]},
    {cluster_compatibility_version,131072},
    {version,[{public_key,"0.13"},
    {lhttpc,"1.3.0"},
    {ale,"8cffe61"},
    {os_mon,"2.2.7"},
    {couch_set_view,"1.2.0a-8352437-git"},
    {mnesia,"4.5"},
    {inets,"5.7.1"},
    {couch,"1.2.0a-8352437-git"},
    {mapreduce,"1.0.0"},
    {couch_index_merger,"1.2.0a-8352437-git"},
    {kernel,"2.14.5"},
    {crypto,"2.0.4"},
    {ssl,"4.1.6"},
    {sasl,"2.1.10"},
    {couch_view_parser,"1.0.0"},
    {ns_server,"2.0.1-170-rel-community"},
    {mochiweb,"1.4.1"},
    {oauth,"7d85d3ef"},
    {stdlib,"1.17.5"}]},
    {supported_compat_version,[2,0]},
    {system_arch,"x86_64-unknown-linux-gnu"},
    {wall_clock,52925934},
    {memory_data,{20962406400,20629684224,{<15470.7605.0>,13313704}}},
    {disk_data,[{"/",21545540,16},
    {"/dev/shm",10235548,0},
    {"/boot",99150,60},
    {"/opt",412710064,24}]},
    {meminfo,<<"MemTotal: 20471100 kB\nMemFree: 3365084 kB\nBuffers: 142076 kB\nCached: 1494068 kB\nSwapCached: 0 kB\nActive: 13488072 kB\nInactive: 2527744 kB\nActive(anon): 12773140 kB\nInactive(anon): 1647564 kB\nActive(file): 714932 kB\nInactive(file): 880180 kB\nUnevictable: 767544 kB\nMlocked: 0 kB\nSwapTotal: 4194300 kB\nSwapFree: 4194300 kB\nDirty: 280 kB\nWriteback: 0 kB\nAnonPages: 15147212 kB\nMapped: 44608 kB\nShmem: 168 kB\nSlab: 116148 kB\nSReclaimable: 87212 kB\nSUnreclaim: 28936 kB\nKernelStack: 1488 kB\nPageTables: 42460 kB\nNFS_Unstable: 0 kB\nBounce: 0 kB\nWritebackTmp: 0 kB\nCommitLimit: 14429848 kB\nCommitted_AS: 21044432 kB\nVmallocTotal: 34359738367 kB\nVmallocUsed: 197232 kB\nVmallocChunk: 34359537336 kB\nHardwareCorrupted: 0 kB\nAnonHugePages: 1622016 kB\nHugePages_Total: 0\nHugePages_Free: 0\nHugePages_Rsvd: 0\nHugePages_Surp: 0\nHugepagesize: 2048 kB\nDirectMap4k: 10240 kB\nDirectMap2M: 20961280 kB\n">>}]
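To make the dump a bit easier to read, here is a rough Python sketch that decodes the `last_heard` value and summarizes a few of the `meminfo` fields. It assumes `last_heard` is an Erlang `{MegaSecs, Secs, MicroSecs}` timestamp and that the `meminfo` binary is plain /proc/meminfo text. The function names (`erlang_now_to_utc`, `parse_meminfo`, `summarize`) are made up for illustration and are not part of any Couchbase tool. It only restates numbers already in the dump and does not by itself explain the failover.

```python
from datetime import datetime, timezone


def erlang_now_to_utc(mega, sec, micro):
    """Convert an Erlang {MegaSecs, Secs, MicroSecs} tuple to a UTC datetime.

    Assumption: last_heard in the dump uses this three-part format.
    """
    seconds = mega * 1_000_000 + sec
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micro)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # kB, or a plain count
    return values


def summarize(info):
    """Derive a few ratios from the parsed meminfo fields (hypothetical helper)."""
    reclaimable = info["MemFree"] + info["Buffers"] + info["Cached"]
    return {
        "anon_pct": round(100 * info["AnonPages"] / info["MemTotal"], 1),
        "reclaimable_pct": round(100 * reclaimable / info["MemTotal"], 1),
        "commit_ratio": round(info["Committed_AS"] / info["CommitLimit"], 2),
    }


# Subset of the meminfo fields, copied from the failover dump above.
MEMINFO = """MemTotal: 20471100 kB
MemFree: 3365084 kB
Buffers: 142076 kB
Cached: 1494068 kB
AnonPages: 15147212 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
CommitLimit: 14429848 kB
Committed_AS: 21044432 kB
"""

if __name__ == "__main__":
    print("last_heard:", erlang_now_to_utc(1486, 461532, 134206).isoformat())
    for name, value in summarize(parse_meminfo(MEMINFO)).items():
        print(f"{name}: {value}")
```

With the values from this dump, anonymous pages come to roughly three quarters of MemTotal, and Committed_AS is larger than CommitLimit while swap is unused. Whether that is related to the node dropping out is not something the dump alone shows.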