Sync Gateway Channels View Indexing takes too long - maxing out server CPU usage

Abhilash · September 18, 2017, 6:59pm

Hi,
We are using Sync Gateway v1.3 with Couchbase Server v4.1.0 on a 4-core, 12 GB machine. We have about 300K docs in the Sync Gateway bucket (including Sync Gateway docs - _rev and _deleted), and 40K users, of which 15K are active.

Sync Gateway is configured with 200,000 file descriptors, and our bucket is configured with revs_limit=10, rev_cache_size=1,000,000, channel_cache_max_length=5000, channel_cache_expiry=600.

Sync Gateway is maxing out the CPU usage on the machine. I keep seeing the following messages in the Sync Gateway logs repeatedly.

_time=2017-09-18T23:58:38.200+05:30 _level=INFO _msg=go-couchbase: call to ViewCustom("sync_gateway", "channels") in github.com/couchbase/sync_gateway/db.(*DatabaseContext).getChangesInChannelFromView took 5.87908525s

2017-09-19T00:00:03.508+05:30 changes_view: Query took 452.469978ms to return 18 rows, options = db.Body{"endkey":[]interface {}{"custom_channel", 0x5e02c2}, "limit":50, "stale":false, "startkey":[]interface {}{"custom_channel", 0x1d60b0}}

_time=2017-09-19T00:01:46.103+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:user:custom_user") in github.com/couchbase/go-couchbase.(*Bucket).casNext took 25.473439007s

_time=2017-09-19T00:08:18.537+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:rev:1469550927282-df3cc14d-109f-4532-ab6d-79b0a4c54af9:35:54-97c8f61c4494da874014dd8458aa599a") in github.com/couchbase/go-couchbase.(*Bucket).GetsRaw took 967.382148ms

_time=2017-09-19T00:09:35.383+05:30 _level=INFO _msg=go-couchbase: call to Do("_sync:seq") in github.com/couchbase/go-couchbase.(*Bucket).Incr took 229.813282ms

Could someone please explain the meaning of these log statements? From what I understand, Sync Gateway is repeatedly querying the channels view for the bucket, and the query is taking too long to execute. Is this due to a problem on the Couchbase Server side, or is it something we need to fix in Sync Gateway?

BTW, the Couchbase Server bucket shows CPU utilization of 100%. Any idea why that is happening?

andy · September 19, 2017, 12:07pm

@Abhilash

All Sync Gateway calls to Couchbase Server are taking a long time to return.

This could be symptomatic of the 100% CPU utilisation on the CBS server.

Have you looked for errors in the Couchbase Server logs?

Do you have high write throughput in SG when you are seeing these issues?
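
To see which Couchbase Server calls account for most of the wait, the "took" durations in the Sync Gateway log can be tallied per calling function. The following is only a minimal sketch, assuming the log lines have the same shape as the go-couchbase messages quoted above; `parse_go_duration` and `summarize_slow_calls` are hypothetical helper names, not part of Sync Gateway or go-couchbase.

```python
import re

# Seconds per unit, for durations printed in Go style (e.g. "452.469978ms").
UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}

# One number+unit component of a duration; multi-letter units are listed
# first so that "ms" is not read as "m" followed by "s".
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")

# Assumed line shape: 'call to <op>(...) in <go function> took <duration>'.
CALL_RE = re.compile(r"call to .*? in (\S+) took (\S+)")


def parse_go_duration(text):
    """Convert a Go-style duration string such as '1m5.2s' to seconds."""
    return sum(float(num) * UNIT_SECONDS[unit]
               for num, unit in PART_RE.findall(text))


def summarize_slow_calls(lines):
    """Return (function, count, total_seconds, max_seconds) tuples,
    sorted by total time, largest first. Lines that do not match the
    assumed shape are ignored."""
    stats = {}
    for line in lines:
        match = CALL_RE.search(line)
        if match is None:
            continue
        # Keep only the last path component, e.g. 'casNext'.
        name = match.group(1).rsplit(".", 1)[-1]
        seconds = parse_go_duration(match.group(2))
        count, total, longest = stats.get(name, (0, 0.0, 0.0))
        stats[name] = (count + 1, total + seconds, max(longest, seconds))
    rows = [(name, c, t, m) for name, (c, t, m) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Passing an open log file (or any iterable of lines) to `summarize_slow_calls` gives a quick view of whether the time is going to view queries or to plain key/value operations. If every kind of call is slow, as in the lines above, that points at the server rather than at one specific query.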

Abhilash · October 3, 2017, 9:38am

I am seeing the following error in the Couchbase Server logs:

Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2017-10-03T15:00:03.758+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=3
MetadataService 2017-10-03T15:00:03.758+05:30 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=4
RemoteClusterService 2017-10-03T15:00:03.758+05:30 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2017/10/03 15:00:03 /opt/couchbase/bin/goxdcr terminated: exit status 1

SG throughput is low when these errors arise, though. I am seeing up to 100 ops/sec from SG on the Couchbase Server node.
