Started getting connection errors to sync gateway

In my test environment I have started to get problems with connecting to sync.gateway from my app. These are the errors I see in the app:

2024-09-04 10:33:02.885114+0200 Angler.iOS[11742:608135] [dotnet] [Thread Pool Worker]| ERROR)  [Network] Obj=/DNRepl@2B2B/C4RemoteRepl#4/Repl#5/Connection#1/C4Socket#3/ WebSocket failed to connect! (reason=Network error 6) Authentication failed prior to server certificate (bad / missing client cert?)
2024-09-04 10:33:02.887329+0200 Angler.iOS[11742:608398] [dotnet] [33]| ERROR)  [Replicator] Obj=/DNRepl@2B2B/C4RemoteRepl#4/Repl#5/ Got LiteCore error: Network error 6, "Authentication failed prior to server certificate (bad / missing client cert?)"
[DbDataStore] PushAndPull Replicator: 0/0, error CouchbaseLiteException (NetworkDomain / 6): Authentication failed prior to server certificate (bad / missing client cert?)., activity = Stopped
[DbDataStore] Error :: Couchbase.Lite.CouchbaseNetworkException: CouchbaseLiteException (NetworkDomain / 6): Authentication failed prior to server certificate (bad / missing client cert?).

I was using Nuget package 3.1.7 but upgrading to 3.2.0 made no change. Sync.gateway is: Couchbase Sync Gateway/3.1.3(6;52b979c) CE
It probably makes no difference but the Couchbase servers behind the sync.gateway are: Community Edition 7.2.4 build 7070

/John

Hmmm… The SG does not request a client cert.

Any chance that the SG is behind a proxy that wants a cert?
… or that some cert has expired?

That error seems to come from CBL’s .NET networking code, which ultimately calls into .NET APIs. I am not familiar with that code, but another possibility is a TLS handshake failure due to the protocol version or cipher choice.

Have you upgraded .NET itself, or the client OS, recently?

Yes it is behind a proxy (nginx) - but the certificate should be Ok (it’s Let’s Encrypt):

I set the app to use .NET 8 the last time I did an update - so the simulators worked with that. The Sync Gateway is reached through the nginx proxy to avoid making the Sync Gateway itself visible to the outside world. The nginx server uses Let’s Encrypt for TLS certs (see above) - and the cert seems to be valid. I can open the website under the same cert from the browser on the simulator without issues…

I am not sure if Xcode (and thus the OS on the simulators) has been updated since the last time I made a new version…

There is a litany of things that could go wrong in TLS. It’s probably one of the least stable parts of .NET as a whole and it often drives me crazy. This particular message comes as the result of a failure of the following API

SslStream.AuthenticateAsClientAsync

Now that I look at the area in question I realize it is using TLS 1.2, so perhaps a TLS 1.3-only server will fail (.NET Framework still supports only TLS 1.2, but I should probably make an enhancement so that all the other targets use it along with TLS 1.3).
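One way to test that theory from outside the app is to attempt a handshake with exactly one TLS version at a time and see which ones the server accepts. The following is only a rough sketch in Python, not the CBL code path: the function names `make_pinned_context` and `probe` are made up for illustration, and the host and port are whatever you pass on the command line.

```python
import socket
import ssl
import sys


def make_pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, port: int, version: ssl.TLSVersion, timeout: float = 5.0) -> str:
    """Attempt one handshake with the given TLS version and describe the result."""
    ctx = make_pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return f"{version.name}: ok, cipher={tls.cipher()[0]}"
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return f"{version.name}: failed ({exc})"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python tls_probe.py <host> [port]
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 443
    for v in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
        print(probe(host, port, v))
```

If the TLS 1.2 line fails while the TLS 1.3 line succeeds, that would point toward the protocol version. If both succeed, the cause is more likely somewhere else, such as the cipher selection or the client's certificate validation.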

Could this be a source of the problem?

That is a good question - I’m not sure how to see that. I have set the NGinx up to use LetsEncrypt and it handles it automatically. The “version” in the certificate is “3” - if that is what we are looking for:

The TLS version should be part of the nginx server configuration. The certificate version is a separate matter and is not relevant to the problem.

Ahh… sorry. I’m not too deep into TLS etc.

This is the entire server setup from Nginx:

server {
    if ($host = fangst.dalsgaard-data.dk) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    server_name  fangst.dalsgaard-data.dk;
    return 301 https://$host$request_uri;
    include /etc/nginx/default.d/*.conf;
}

server {
    listen 443 ssl;
    server_name fangst.dalsgaard-data.dk;
    ssl_certificate /etc/letsencrypt/live/fangst.dalsgaard-data.dk/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/fangst.dalsgaard-data.dk/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf;
    include /etc/nginx/default.d/*.conf;

    # Use a specific url pattern to identify sync requests - and remove that part before redirecting to the db server

    location /_sync {
        rewrite                 /_sync/(.*) /$1  break;
        proxy_pass              http://sync_gateway;
        proxy_http_version      1.1;
        proxy_set_header        Upgrade $http_upgrade;
        proxy_set_header        Connection "upgrade";
        proxy_set_header        Host $host;
        proxy_read_timeout      360s;
        proxy_send_timeout      360s;
        proxy_buffering         off;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        X-Forwarded-Proto $scheme;
        proxy_ssl_verify        off;
    }

    location / {
        :
        :
        .... removed as probably not relevant here.
    }

}

# Pool of sync gateways
# Remember to allow Nginx to redirect... See: https://stackoverflow.com/questions/23948527/13-permission-denied-while-connecting-to-upstreamnginx
upstream sync_gateway {
    server sg1.dalsgaard-data.dk:4984;
    #server db2.dalsgaard-data.dk:4984;
}

The file /etc/letsencrypt/options-ssl-nginx.conf contains:

# This file contains important security parameters. If you modify this file
# manually, Certbot will be unable to automatically provide future security
# updates. Instead, Certbot will print and log an error message with a path to
# the up-to-date file that you will need to refer to when manually updating
# this file. Contents are based on https://ssl-config.mozilla.org

ssl_session_cache shared:le_nginx_SSL:10m;
ssl_session_timeout 1440m;
ssl_session_tickets off;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;

ssl_ciphers "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384";

I suppose the answer to your question could lie in the last file…

Indeed it looks like it is set up to handle TLS 1.2. Maybe you could try removing TLS 1.3 from that ssl_protocols list and see what happens? There could also be some issue with the ssl_ciphers, I suppose, but I can’t be sure about that. If you use Wireshark, you might be able to see what is happening during the handshake.
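For what it's worth, here is a small Python sketch of the check I did by eye: pull the ssl_protocols directive out of the config text and see whether it overlaps with what a TLS 1.2-only client offers. The helper names `ssl_protocols` and `shared_protocols` are made up for illustration, and the parsing is deliberately naive (it takes the last ssl_protocols line it finds and ignores nginx's block scoping and include files).

```python
import re


def ssl_protocols(config_text: str) -> list[str]:
    """Return the protocols named by the last ssl_protocols directive, if any."""
    protocols: list[str] = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"ssl_protocols\s+(.+?)\s*;", line)
        if match:
            protocols = match.group(1).split()
    return protocols


def shared_protocols(server: list[str], client: set[str]) -> list[str]:
    """Protocols that both sides enable, in sorted order."""
    return sorted(set(server) & client)


# Lines copied from the options-ssl-nginx.conf posted above.
config = """
ssl_session_tickets off;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
"""

server = ssl_protocols(config)
print(server)
print(shared_protocols(server, {"TLSv1.2"}))  # client assumed to be TLS 1.2 only
```

A non-empty overlap only says the protocol versions are compatible on paper. It says nothing about the ciphers or about what nginx has actually loaded, so reloading nginx and checking the live handshake is still needed.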

Ok, I tried to remove TLSv1.3, but it still gives me the error:

2024-09-17 10:56:32.346423+0200 Angler.iOS[69722:2725570] [34]| ERROR)  [Network] {C4SocketImpl#9} WebSocket failed to connect! (reason=Network error 6) Authentication failed prior to server certificate (bad / missing client cert?)
2024-09-17 10:56:32.348004+0200 Angler.iOS[69722:2725030] [28]| ERROR)  [Replicator] {Repl#8} Got LiteCore error: Network error 6, "Authentication failed prior to server certificate (bad / missing client cert?)"

I also downgraded to version 3.1.7 - but that is related to trying to solve another (UI) issue where the app (in app.store) does not load correctly on iOS 18…

I’m not really an expert when it comes to why TLS connections fail, so as I mentioned it would be interesting to see if Wireshark shows anything. There should also be an inner exception there that shows more detail, and I’m not sure why it doesn’t get printed. Maybe putting an exception breakpoint on Couchbase.Lite.Sync.TlsCertificateException could shed some light? That’s an internal type but I think you should still be able to input it as an exception breakpoint.

I am sorry, but I will not have time to look into this in more detail before I start traveling on Friday. I will get back to this at the end of October.

Thanks for your help so far.