Thanks for the help. This issue has been causing me a lot of trouble for the past few weeks, since I'm brand new to server administration and learning as I go. It will take a few hours for me to test hitting Sync Gateway directly, but I can answer the other questions:
nginx.conf relevant lines:
user www-data;
worker_processes 4;
worker_rlimit_nofile 8192;
pid /run/nginx.pid;

events {
    worker_connections 8192;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;
    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # nginx-naxsi config
    ##
    # Uncomment it if you installed nginx-naxsi
    ##
    # include /etc/nginx/naxsi_core.rules;

    ##
    # nginx-passenger config
    ##
    # Uncomment it if you installed nginx-passenger
    ##
    # passenger_root /usr;
    # passenger_ruby /usr/bin/ruby;

    ##
    # Virtual Host Configs
    ##
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Relevant sections of the nginx config for the application in /etc/nginx/sites-enabled (/dbws/ is the WebSocket endpoint for iOS, and /db/ is the endpoint Android and web clients use to reach Sync Gateway):
server {
    listen 443;
    # ssl setup stuff
    server_name <host name>;
    client_max_body_size 20M;

    # Make site accessible from http://localhost/
    server_name localhost;

    location /dbws/ {
        proxy_pass_header Accept;
        proxy_pass_header Server;
        keepalive_requests 1000;
        keepalive_timeout 360s;
        proxy_read_timeout 360s;
        proxy_pass http://localhost:4984/<bucket name>/;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_http_version 1.1;
    }

    location /db/ {
        proxy_pass http://localhost:4984/<bucket name>/;
    }
}
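For what it's worth, one variation I'm considering is the standard nginx WebSocket pattern with a map, so a single location could serve both upgraded and plain (longpoll) connections. This is only a sketch based on the common nginx example, not something in our current config; the variable names follow that example:

```nginx
# In the http block: choose the Connection header value based on whether
# the client sent an Upgrade header ('' means it did not).
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then one location can serve WebSocket and longpoll clients alike:
location /db/ {
    proxy_pass http://localhost:4984/<bucket name>/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_read_timeout 360s;
}
```

That would avoid sending `Connection: upgrade` to Sync Gateway for clients that never asked for an upgrade, which might be related to the "can't identify protocol" sockets I describe below.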
The keepalive settings were in nginx. I couldn't find any documentation listing all the possible Sync Gateway config params, but these are the ones we have:
{
    "log": ["CRUD", "REST+", "Shadow", "Access"],
    "facebook": {
        "register": true
    },
    "CORS": {
        "Origin": ["*"],
        "Headers": ["DNT", "X-Mx-ReqToken", "Keep-Alive", "User-Agent", "X-Requested-With", "If-Modified-Since", "Cache-Control", "Content-Type"],
        "MaxAge": 1728000
    },
    "MaxFileDescriptors": 20000,
    "databases": {
        "todos": {
            "server": "http://<hostname>:8091",
            "users": {
                "GUEST": {
                    "disabled": false
                }
            },
            "sync": `
                // @formatter:off
                // sync function
The Sync Gateway documentation for setting up an nginx reverse proxy (http://developer.couchbase.com/mobile/develop/guides/sync-gateway/nginx/configuring-nginx-for-sync-gateway/index.html) says to use all the parameters we have specified for the /dbws/ endpoint. But since Android, web, and iOS all share the generic Sync Gateway endpoint, the most I can specify there is keepalive_requests, keepalive_timeout, and proxy_read_timeout, so that connections at least last longer and aren't torn down before a longpoll completes. When I do this, we don't get any CLOSE_WAIT or "can't identify protocol" connections for Sync Gateway in the lsof output, but the number of connections just keeps growing (the situation mentioned in my first post). If I just proxy_pass to Sync Gateway, it works OK, though we get a ton of "can't identify protocol" and CLOSE_WAIT connections in the lsof output. I've since increased the number of open files for Sync Gateway, and it seems able to reap those dead connections fast enough that we don't exhaust our open files, but we haven't had any significant load yet to truly test it. Here is a snippet of the lsof output showing the "can't identify protocol" and CLOSE_WAIT connections; I'm pretty sure they appear because nginx is closing connections on Sync Gateway in a way it doesn't know how to handle:
sync_gate 14154 root 4239u sock 0,7 0t0 49412253 can't identify protocol
sync_gate 14154 root 3303u IPv6 49422016 0t0 TCP localhost:4984->localhost:58581 (CLOSE_WAIT)
sync_gate 14154 root 3145u IPv6 49426317 0t0 TCP localhost:4984->localhost:59705 (ESTABLISHED)
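To get the state counts I quote below, I tally the lsof output with a small helper roughly like this (a sketch; the `pgrep -f sync_gateway` invocation is an assumption about how the process is named on your box):

```shell
# count_states: tally lsof lines (read from stdin) by connection state.
# "can't identify protocol" sockets have no (STATE) suffix, so match the
# literal text; TCP lines end in "(STATE)".
count_states() {
    awk '
        /can.t identify protocol/ { unknown++ }
        /\(CLOSE_WAIT\)/          { cw++ }
        /\(ESTABLISHED\)/         { est++ }
        END {
            printf "unknown=%d close_wait=%d established=%d\n", unknown, cw, est
        }
    '
}

# Typical use on the server (process name is an assumption, adjust as needed):
# lsof -n -p "$(pgrep -f sync_gateway)" | count_states
```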
At the point I ran lsof to get these snippets, Sync Gateway had 2759 "can't identify protocol" lines, 1046 CLOSE_WAIT, and 279 ESTABLISHED; nginx had only ESTABLISHED connections. I don't want to expose Sync Gateway directly to the internet, though, because nginx is designed to handle many connections very efficiently, we already have SSL set up on it, and it hides our actual bucket name and port from clients.
By "use up all our connections" I meant open files. It generally hits that limit before actually exhausting the available ports. This is the error I see in the Sync Gateway log when it happens:
2015/03/10 18:59:04 http: Accept error: accept tcp [::]:4984: too many open files; retrying in 320ms
I've increased the number of open files for Sync Gateway to 20000, so it can handle the load now when I specify a keepalive of 65 seconds in nginx; but if I raise the keepalive for the Sync Gateway endpoint to 360s, it just keeps building up connections. Our Android clients longpoll the server every 3 minutes as long as the app is running or minimized on the device. That's due to our choice of continuous replication, though, and I need to talk to our developer about it because it isn't needed for our application.
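For reference, this is roughly how I raised the open-file limit to 20000 (a sketch for my Ubuntu box; the file path is standard, but the user and values are from my setup, so adjust for yours):

```
# /etc/security/limits.conf (for the user running sync_gateway; ours is root)
root  soft  nofile  20000
root  hard  nofile  20000
```

My understanding is that Sync Gateway also applies its own "MaxFileDescriptors" setting (20000 in the config above) as a soft limit, so I keep the two values in sync.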
Thanks again for any help or info you can give on this issue.