Sync Gateway Channels View Indexing takes too long - maxing out server CPU usage

Hi,
We are using Sync Gateway v1.3 with Couchbase Server v4.1.0 on a 4-core, 12 GB machine. We have about 300K docs in the Sync Gateway bucket (including Sync Gateway docs - _rev and _deleted), and 40K users, of which 15K are active.

Sync Gateway is configured with 200,000 file descriptors, and our bucket is configured with revs_limit=10, rev_cache_size=1,000,000, channel_cache_max_length=5000, channel_cache_expiry=600.

Sync Gateway is maxing out the CPU usage on the machine. I keep seeing the following messages repeatedly in the Sync Gateway logs:

_time=2017-09-18T23:58:38.200+05:30 _level=INFO _msg=go-couchbase: call to ViewCustom("sync_gateway", "channels") in github.com/couchbase/sync_gateway/db.(*DatabaseContext).getChangesInChannelFromView took 5.87908525s

2017-09-19T00:00:03.508+05:30 changes_view: Query took 452.469978ms to return 18 rows, options = db.Body{"endkey":[]interface {}{"custom_channel", 0x5e02c2}, "limit":50, "stale":false, "startkey":[]interface {}{"custom_channel", 0x1d60b0}}

_time=2017-09-19T00:01:46.103+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:user:custom_user") in github.com/couchbase/go-couchbase.(*Bucket).casNext took 25.473439007s

_time=2017-09-19T00:08:18.537+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:rev:1469550927282-df3cc14d-109f-4532-ab6d-79b0a4c54af9:35:54-97c8f61c4494da874014dd8458aa599a") in github.com/couchbase/go-couchbase.(*Bucket).GetsRaw took 967.382148ms

_time=2017-09-19T00:09:35.383+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:seq") in github.com/couchbase/go-couchbase.(*Bucket).Incr took 229.813282ms

Could someone please explain the meaning of these log statements? From what I understand, Sync Gateway is repeatedly querying the channels view for the bucket, and the query is taking too long to execute. Is this due to a problem on the Couchbase Server side, or is it something we need to fix in Sync Gateway?

BTW, the Couchbase Server bucket shows CPU utilization of 100%. Any idea why that is happening?

@Abhilash

All Sync Gateway calls to Couchbase Server are taking a long time to return.

This could be symptomatic of the 100% CPU utilisation on the Couchbase Server node.

Have you looked for errors in the Couchbase Server logs?

Do you have high write throughput in SG when you are seeing this issue?
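To see which calls are slowest, it may help to tally the "took" durations from the go-couchbase lines in the Sync Gateway log. The following is only a rough sketch: the function name `summarize_slow_calls` is made up for illustration, and it assumes every line has the same shape as the ones quoted above, with durations in seconds or milliseconds only.

```python
import re
from collections import defaultdict

# Pattern for go-couchbase slow-operation lines, e.g.
#   ... call to Do("_sync:seq") in github.com/couchbase/go-couchbase.(*Bucket).Incr took 229.813282ms
# Assumption: only "s" and "ms" durations appear, as in the sample lines.
SLOW_CALL = re.compile(r'call to \w+\(.*?\) in (\S+) took ([\d.]+)(ms|s)\b')
UNIT_SECONDS = {"s": 1.0, "ms": 0.001}


def summarize_slow_calls(lines):
    """Return {caller: (count, worst_seconds)} for every matching log line."""
    counts = defaultdict(int)
    worst = defaultdict(float)
    for line in lines:
        match = SLOW_CALL.search(line)
        if match is None:
            continue  # not a go-couchbase slow-call line
        qualified_name, value, unit = match.groups()
        # Keep only the trailing Go function name, e.g. "casNext".
        caller = qualified_name.rsplit(".", 1)[-1]
        seconds = float(value) * UNIT_SECONDS[unit]
        counts[caller] += 1
        worst[caller] = max(worst[caller], seconds)
    return {caller: (counts[caller], worst[caller]) for caller in counts}
```

Passing an open log file (or any iterable of lines) to `summarize_slow_calls` gives a per-caller count and worst-case latency, which should show whether the view queries or the plain key-value operations are the bigger problem.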

I am receiving the following error in the Couchbase Server logs:

Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2017-10-03T15:00:03.758+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=3
MetadataService 2017-10-03T15:00:03.758+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=4
RemoteClusterService 2017-10-03T15:00:03.758+05:30 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2017/10/03 15:00:03 /opt/couchbase/bin/goxdcr terminated: exit status 1

SG throughput is low when these errors arise, though. I am seeing up to 100 ops/sec from SG on the Couchbase Server node.