Any exception from Socket.Send() closes the connection, but SendTimeoutExpiredException does not.
This leaves the connection in the pool for quite a while. In our case the socket never recovers, and it never throws an exception either. So the connection stays in the pool and keeps causing timeout exceptions.
This is a pretty urgent issue for us. Could you suggest any workarounds that I can implement in my code until you fix it?
Thanks for reporting; this does seem to be a bug, and I am surprised it wasn’t identified sooner. I created ticket NCBC-2200 for tracking; a fix should be in one of the next two releases.
You can configure the SDK to use a custom IConnection implementation or pool, but it’s fairly tricky. I would suggest pulling from above (you’ll need to create an account) and then seeing how well it works. Code in Gerrit has not yet been merged into GitHub, so it’s only partially tested.
@jmorris, I checked the patch; it fixes just one place. There are more variations of Send() in that file, and they all seem to have the same bug, including the async versions.
We are having the same issue as above. We updated the client to version 2.7.16, but we still get the same errors and the client doesn’t handle the disconnection:
The operation has timed out. ["s":"kv","i":"11ba6","c":"296b20a52cd96057/6a54c9fffca3f628","b":"pricing","l":"10.70.2.11:52232","r":"10.70.4.22:11210","t":15000000] ckey: Cruise.Domain.Rule.Model.SaleRule68_DEP
Couchbase.IO.SendTimeoutExpiredException: The operation has timed out. ["s":"kv","i":"11ba6","c":"296b20a52cd96057/6a54c9fffca3f628","b":"pricing","l":"10.70.2.11:52232","r":"10.70.4.22:11210","t":15000000]
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Cruise.Cache.Base.Cache.CouchbaseCache.d__19`1.MoveNext()
As mdomashchenko said, can you check that the bug is resolved?
What makes you think that the connection was not closed and recreated after the timeout? TBH, in the original issue the connection was closed; it just happened a few milliseconds later, in the calling method. The fix just ensured that it is closed sooner.
Because I have a lot of logs similar to the previous one, and if I don’t restart the application, the connection doesn’t work. We implemented a workaround: closing and reinitializing the ClusterHelper as soon as we catch a SendTimeoutExpiredException.
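The workaround described here can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: TimeoutRecovery and ExecuteWithReset are hypothetical names, and the Couchbase-specific pieces are passed in as delegates so the helper itself depends only on System. It retries exactly once after the reset and does nothing to coordinate concurrent callers, which a real application would need to handle.

```csharp
using System;

// Hypothetical helper (not part of the Couchbase SDK): runs an operation and,
// if it fails with an exception the caller classifies as a send timeout,
// resets the client and retries the operation once.
public static class TimeoutRecovery
{
    public static T ExecuteWithReset<T>(
        Func<T> operation,
        Func<Exception, bool> isSendTimeout,
        Action reset)
    {
        try
        {
            return operation();
        }
        catch (Exception ex) when (isSendTimeout(ex))
        {
            // Assumed reset for the 2.x SDK: ClusterHelper.Close() followed by
            // ClusterHelper.Initialize(config).
            reset();

            // Single retry; a second failure propagates to the caller.
            return operation();
        }
    }
}

// Assumed usage with the 2.x SDK (GetFromCache and config are hypothetical):
//
// var value = TimeoutRecovery.ExecuteWithReset(
//     () => GetFromCache(key),
//     ex => ex is Couchbase.IO.SendTimeoutExpiredException,
//     () => { ClusterHelper.Close(); ClusterHelper.Initialize(config); });
```

Any IBucket reference obtained before the reset would presumably need to be fetched again with ClusterHelper.GetBucket() afterwards, since closing the ClusterHelper disposes the buckets it handed out.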
You’ll need to do some diagnosis; the SDK Doctor should be able to help, assuming the timeouts are consistent and reproducible. If nothing surprising is found, you can probably assume the cause is ephemeral or random. From there I would check whether something on the network is closing the connection (common in cloud environments), whether TCP Keep-Alives are enabled, and whether ClientConfiguration.TcpKeepAliveTime and ClientConfiguration.TcpKeepAliveInterval are tuned to the environment.
That being said, connections could be closed locally, by the OS, by a network appliance or by the server itself, so it’s a matter of isolating the cause, as timeouts are a symptom. Furthermore, I suspect that if you look into the logs you’ll see that connections are being recreated after the timeout.
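The keep-alive settings mentioned above are set on ClientConfiguration before the cluster is initialized. The fragment below is a sketch assuming the 2.x SDK: the host name and the specific values are placeholders rather than recommendations, CouchbaseSetup is a hypothetical class name, and EnableTcpKeepAlives is quoted from memory, so check it against your SDK version. Both keep-alive values are understood to be in milliseconds.

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;

public static class CouchbaseSetup
{
    public static void Initialize()
    {
        var config = new ClientConfiguration
        {
            // Placeholder bootstrap address; replace with your own node(s).
            Servers = new List<Uri> { new Uri("http://couchbase.example.com:8091/pools") },

            // Assumed property name for switching keep-alives on.
            EnableTcpKeepAlives = true,

            // Idle time before the first keep-alive probe (illustrative value).
            TcpKeepAliveTime = 60 * 1000,

            // Interval between subsequent probes (illustrative value).
            TcpKeepAliveInterval = 1000
        };

        ClusterHelper.Initialize(config);
    }
}
```

If a load balancer or firewall drops idle connections after some fixed period, keeping TcpKeepAliveTime below that period is the usual aim.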