I have two Couchbase servers in a cluster and one Sync Gateway talking to both of them.
We have had some issues with the data center, so all servers have been taken down and restarted. Every time this happens, the sync_gateway
service on the Sync Gateway server stops before the database servers have finished warming up. This is what systemctl reports when I query the status:
[root@sg1 logs]# systemctl status sync_gateway
● sync_gateway.service - Couchbase Sync Gateway server
Loaded: loaded (/usr/lib/systemd/system/sync_gateway.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Tue 2020-08-18 14:30:10 CEST; 22min ago
Process: 1955 ExecStart=/usr/bin/bash -c ${GATEWAY} --defaultLogFilePath "${LOGS}" ${CONFIG} (code=exited, status=1/FAILURE)
Process: 1952 ExecStartPre=/bin/chown -R sync_gateway:sync_gateway /home/sync_gateway/data (code=exited, status=0/SUCCESS)
Process: 1949 ExecStartPre=/bin/mkdir -p /home/sync_gateway/data (code=exited, status=0/SUCCESS)
Process: 1946 ExecStartPre=/bin/chown -R sync_gateway:sync_gateway /home/sync_gateway/logs (code=exited, status=0/SUCCESS)
Process: 1944 ExecStartPre=/bin/mkdir -p /home/sync_gateway/logs (code=exited, status=0/SUCCESS)
Main PID: 1955 (code=exited, status=1/FAILURE)
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service: main process exited, code=exited, status=1/FAILURE
Aug 18 14:30:10 sg1 systemd[1]: Unit sync_gateway.service entered failed state.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service failed.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service holdoff time over, scheduling restart.
Aug 18 14:30:10 sg1 systemd[1]: Stopped Couchbase Sync Gateway server.
Aug 18 14:30:10 sg1 systemd[1]: start request repeated too quickly for sync_gateway.service
Aug 18 14:30:10 sg1 systemd[1]: Failed to start Couchbase Sync Gateway server.
Aug 18 14:30:10 sg1 systemd[1]: Unit sync_gateway.service entered failed state.
Aug 18 14:30:10 sg1 systemd[1]: sync_gateway.service failed.
I have looked at the documentation for the Sync Gateway configuration but have not found any way to configure, e.g., a delay before it retries starting.
Any suggestions for solving this issue would be much appreciated. I know the best solution would be not to have to restart the servers - but it is a nuisance that we have to start the task manually once the servers are up.
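For context, the status output above shows that the final failure comes from systemd's start rate limiting ("start request repeated too quickly"), not from a Sync Gateway setting. The kind of delay I am after could probably be expressed as a systemd drop-in like the sketch below. It is untested, and the file path and the values (30 seconds between attempts, rate limiting disabled) are my assumptions. Restart, RestartSec and StartLimitInterval are standard systemd [Service] options on CentOS 7.

```ini
# Hypothetical drop-in file (path is an assumption):
# /etc/systemd/system/sync_gateway.service.d/override.conf
[Service]
# Keep restarting the service whenever it exits with an error
Restart=on-failure
# Wait 30 seconds between restart attempts (value is a guess)
RestartSec=30
# 0 disables start rate limiting, so systemd does not give up
# with "start request repeated too quickly"
StartLimitInterval=0
```

A drop-in like this would need a `systemctl daemon-reload` before systemd picks it up.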
Here are the main items of the config file:
"maxFileDescriptors": 250000,
"logging": {
"console": {
"color_enabled": true,
"log_keys": ["HTTP+", "Sync"]
}
},
"adminInterface": "0.0.0.0:4985",
"interface": "0.0.0.0:4984",
"databases": {
"data": {
"use_views":false,
"num_index_replicas":0,
"bucket": "data",
"server": "http://db1,db2:8091",
"username": "xyz",
"password": ".......",
"enable_shared_bucket_access": true,
"import_docs": true,
"users": { "GUEST": { "disabled": true, "admin_channels": ["!"] } },
(The server is behind a firewall, so only trusted IPs can access the admin interface.)
Environment:
OS: CentOS Server 7
Couchbase Community edition 6.5
Sync Gateway 2.7.2 CE