Sync Gateway task stops before data nodes are ready when all servers are restarted

jda · August 18, 2020, 1:46pm

I have two Couchbase servers in a cluster and one Sync Gateway talking to both of them.

We have had some issues with the data center, so all servers have been taken down and restarted. Every time this happens, the sync_gateway task on the Sync Gateway server stops before the database servers have finished warming up. Querying the service status on the server shows this:

[root@sg1 logs]# systemctl status sync_gateway
● sync_gateway.service - Couchbase Sync Gateway server
   Loaded: loaded (/usr/lib/systemd/system/sync_gateway.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Tue 2020-08-18 14:30:10 CEST; 22min ago
  Process: 1955 ExecStart=/usr/bin/bash -c ${GATEWAY} --defaultLogFilePath "${LOGS}" ${CONFIG} (code=exited, status=1/FAILURE)
  Process: 1952 ExecStartPre=/bin/chown -R sync_gateway:sync_gateway /home/sync_gateway/data (code=exited, status=0/SUCCESS)
  Process: 1949 ExecStartPre=/bin/mkdir -p /home/sync_gateway/data (code=exited, status=0/SUCCESS)
  Process: 1946 ExecStartPre=/bin/chown -R sync_gateway:sync_gateway /home/sync_gateway/logs (code=exited, status=0/SUCCESS)
  Process: 1944 ExecStartPre=/bin/mkdir -p /home/sync_gateway/logs (code=exited, status=0/SUCCESS)
 Main PID: 1955 (code=exited, status=1/FAILURE)

Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service: main process exited, code=exited, status=1/FAILURE
Aug 18 14:30:10 sg1 systemd[1]: Unit sync_gateway.service entered failed state.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service failed.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service holdoff time over, scheduling restart.
Aug 18 14:30:10 sg1 systemd[1]: Stopped Couchbase Sync Gateway server.
Aug 18 14:30:10 sg1 systemd[1]: start request repeated too quickly for sync_gateway.service
Aug 18 14:30:10 sg1 systemd[1]: Failed to start Couchbase Sync Gateway server.
Aug 18 14:30:10 sg1 systemd[1]: Unit sync_gateway.service entered failed state.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service failed.

I have looked at the documentation for Sync Gateway configuration but have not found any way to configure, for example, a delay before it retries starting.

Any suggestions for solving this issue would be much appreciated. I know the best solution would be not to have to restart the servers - but it is a nuisance that we have to start the task manually once the servers are up.

Here are the main items of the config file:

        "maxFileDescriptors": 250000,
        "logging": {
                "console": {
                        "color_enabled": true,
                        "log_keys": ["HTTP+", "Sync"]
                }
        },
        "adminInterface": "0.0.0.0:4985",
        "interface": "0.0.0.0:4984",
        "databases": {
                "data": {
                        "use_views":false,
                        "num_index_replicas":0,
                        "bucket": "data",
                        "server": "http://db1,db2:8091",
                        "username": "xyz",
                        "password": ".......",
                        "enable_shared_bucket_access": true,
                        "import_docs": true,
            "users": { "GUEST": { "disabled": true, "admin_channels": ["!"] } },

(The server is behind a firewall, so only trusted IPs can access the admin interface.)

Environment:
OS: CentOS Server 7
Couchbase Community edition 6.5
Sync Gateway 2.7.2 CE

bbrks · August 18, 2020, 2:35pm

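Your status output shows `Result: start-limit`, so systemd gave up after too many quick restart attempts. You should be able to tune the restart parameters in the systemd service file (the delay between restarts and the start rate limit) to allow for a longer (or indefinite?) period of retry.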
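
As a rough sketch of what that tuning could look like on CentOS 7 (which ships systemd 219): a drop-in file can lengthen the pause between restart attempts and switch off the start rate limit. This assumes the packaged unit already sets a `Restart=` policy, which the "holdoff time over, scheduling restart" line in the status output suggests. The drop-in file name and the 30-second delay are illustrative assumptions, not values from the Sync Gateway documentation. On newer systemd releases the rate-limit setting is spelled `StartLimitIntervalSec=` and belongs in `[Unit]` instead.

```ini
# /etc/systemd/system/sync_gateway.service.d/override.conf
# (assumed drop-in location; `systemctl edit sync_gateway` creates a file like this)
[Service]
# Wait 30 seconds between restart attempts (illustrative value) so the
# Couchbase nodes have time to warm up before the next try.
RestartSec=30
# 0 disables start rate limiting on systemd 219, so systemd keeps
# retrying instead of ending up in "Result: start-limit".
StartLimitInterval=0
```

After creating the file, `systemctl daemon-reload` makes systemd pick up the override, and `systemctl reset-failed sync_gateway` followed by `systemctl start sync_gateway` clears the current failed state.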

jda · August 18, 2020, 2:42pm

Thanks. I’ll try adjusting those parameters.
