While bringing up the application, we get the exception below in the application log.
To troubleshoot, we checked the following:
1. There is no network latency between the app node and the CB node.
2. We can see that a connection is established over port 8093 between the app node and the CB node.
3. We checked the compatibility between the SDK (3.x) and CB Server (7.1.x).
Could you please help with the error below?
[vf-usage-metering:318:349]|com.mobileum.app.vf.cache.cb.VFCallDistributionCacheCBRefresher|useRecord|105
Fri Oct 13|15:14:02.054|W|cb-events|[com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","connectedForMs":18449,"coreId":"0xc31449500000001","local":"XX.XX.XX.XX:50514","numOutstandingRequests":0,"remote":"XX.XX.XX.XX:8093","type":"QUERY"}|com.couchbase.endpoint||
Fri Oct 13|15:14:02.071|W|cb-events|[com.couchbase.endpoint][EndpointConnectionFailedEvent][10s] Connect attempt 1 failed because of TimeoutException: Did not observe any item or terminal signal within 10000ms in 'source(MonoDefer)' (and no fallback has been configured) {"circuitBreaker":"DISABLED","coreId":"0xc31449500000001","remote":"XX.XX.XX.XX:8093","type":"QUERY"}: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 10000ms in 'source(MonoDefer)' (and no fallback has been configured)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:295)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
@dh Thanks for the response, but we have no such tool (i.e. Wireshark) available. I am using netstat to check whether the connection is being established. Let me check the query.log file and update here.
If there is no problem present in the query.log, could you install and use Wireshark or similar? ( https://www.wireshark.org/download.html ) You would need admin privileges but this could be solely on the application machine to start with.
Netstat only tells us that a connection was established (and perhaps stats on the amount of data waiting to be sent/received); packet sniffing with something like Wireshark would let us see what the application last sent and when, and what the server responded with and when. This may short-cut the path to understanding why the exception was raised.
( If you were to go this route and nothing was evident on the client, then it may be necessary to monitor from the server - in case it is something between the two forcing the disconnection. You only stated observing low latency, not whether or not individual connections are severed. )
That’s just the logging of an event. The SDK does all sorts of things on its own: connections, reconnections, etc. Is any operation actually failing?
A timeout while establishing a connection usually means the target is in the cloud and the client is not allowed to connect. (If it is not cloud and nothing is listening at the address, there would be repeated ConnectionRefused failures.)
> We can see that connection is established over port 8093 between app node and CB node.
From where? The client? If the client is not allowed, AWS accepts the connection but does not let it complete, so the client hangs until it times out. AWS does that to prevent rapid port-scanning.
You might get more information by accessing the endpoint with curl --verbose, by running SDK Doctor, or by calling waitUntilReady() in your application.
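As a rough way to tell the cases above apart from the application machine (connect succeeds, connect is refused, connect hangs until timeout), a plain TCP probe could look like the sketch below. `EndpointProbe` and its `Outcome` names are hypothetical and not part of the Couchbase SDK; the host, port 8093 and the 10 second timeout are placeholders taken from the log. It only tests the TCP connect, so a CONNECTED result does not prove the Query service will actually answer.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper (not part of the Couchbase SDK): classifies a plain TCP connect attempt. */
final class EndpointProbe {

    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    /** Tries to open a TCP connection and reports how the attempt ended. */
    static Outcome probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs, e.g. packets silently dropped on the path.
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // Typically "Connection refused": host reachable, nothing listening on the port.
            return Outcome.REFUSED;
        } catch (IOException e) {
            // Unknown host, no route to host, and similar failures.
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real Query node and port as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8093;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 10_000));
    }
}
```

Running it from the app node against the Query node gives one more data point next to netstat, but it is no substitute for a packet capture.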
@dh Thanks for your response. We did not find any errors in the query.log file. After further discussion, the Dev team told us that the application comes up and runs without issue, but after a few minutes it starts experiencing the problem. Before the error above appears in the application log, we also see the errors below. Any clue why these errors are being logged?
> Mon Oct 16|11:28:15.374|W|cb-events|[com.couchbase.endpoint][UnexpectedEndpointDisconnectedEvent] The remote side disconnected the endpoint unexpectedly {"circuitBreaker":"DISABLED","connectedForMs":243572,"coreId":"0x28bf2e1f00000001","local":"10.10.16.118:44856","numOutstandingRequests":1,"remote":"10.10.18.196:8093","type":"QUERY"}|com.couchbase.endpoint||
If the Query service has not restarted (did you check for this?) nor logged anything then it has to be between the application and the server - likely something like a firewall etc. (Hence the Wireshark idea to monitor the actual traffic.)
(Or it is simply a request that has timed out though that’d normally be reflected by a timeout exception, and of course 243 seconds seems like quite an odd timeout. )
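If there are many of these disconnect events, it may help to pull `connectedForMs` and `numOutstandingRequests` out of each one and see whether the durations cluster around one value (which might hint at a fixed timeout somewhere on the path) and whether a request was in flight. A minimal sketch, assuming the events are logged in the format shown in this thread; `DisconnectEventFields` is a hypothetical helper name:

```java
import java.time.Duration;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts numeric fields from an SDK disconnect event log line. */
final class DisconnectEventFields {

    private static final Pattern CONNECTED_FOR = Pattern.compile("\"connectedForMs\":(\\d+)");
    private static final Pattern OUTSTANDING = Pattern.compile("\"numOutstandingRequests\":(\\d+)");

    /** Returns the first numeric capture of the pattern, or empty if the field is absent. */
    private static OptionalLong extract(Pattern pattern, String logLine) {
        Matcher m = pattern.matcher(logLine);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    static OptionalLong connectedForMs(String logLine) {
        return extract(CONNECTED_FOR, logLine);
    }

    static OptionalLong outstandingRequests(String logLine) {
        return extract(OUTSTANDING, logLine);
    }

    public static void main(String[] args) {
        // Context JSON of the event quoted in this thread.
        String line = "{\"circuitBreaker\":\"DISABLED\",\"connectedForMs\":243572,"
                + "\"numOutstandingRequests\":1,\"remote\":\"10.10.18.196:8093\",\"type\":\"QUERY\"}";
        Duration connected = Duration.ofMillis(connectedForMs(line).orElse(0));
        long outstanding = outstandingRequests(line).orElse(0);
        // Prints: connected for PT4M3.572S, outstanding requests: 1
        System.out.println("connected for " + connected + ", outstanding requests: " + outstanding);
    }
}
```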
Perhaps others may have other ideas, but the Query service won’t sever the connection without responding unless it itself suffers an issue (which should be logged or at least you’d see evidence of a restart).
If this is Enterprise Edition, it may be best pursued via a support case.
Thinking a little more about it, it seems you could monitor/capture system:active_requests (or the active requests endpoint) say every 30 seconds to see if there is a constant request executing during the 4 minutes prior to the exception being raised. This may help direct next steps.
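The periodic capture could be sketched roughly as below, polling the Query service's active requests endpoint over HTTP and printing each sample with a timestamp. This assumes the endpoint is reachable at `/admin/active_requests` on port 8093 with basic auth; the base URL, credentials, sample count and the `ActiveRequestsPoller` class are placeholders, so adjust them to your setup (and use https if the cluster requires TLS).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

/** Hypothetical helper: samples the Query service's active requests at a fixed interval. */
final class ActiveRequestsPoller {

    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Fetches the current active requests as a raw JSON string. */
    static String fetchOnce(String baseUrl, String user, String password) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/admin/active_requests").openConnection();
        conn.setRequestProperty("Authorization", basicAuth(user, password));
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(5_000);
        int status = conn.getResponseCode();
        if (status != 200) {
            throw new IOException("Unexpected HTTP status " + status);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 3) {
            System.err.println("usage: ActiveRequestsPoller <baseUrl> <user> <password> [samples]");
            return;
        }
        int samples = args.length > 3 ? Integer.parseInt(args[3]) : 20;
        for (int i = 0; i < samples; i++) {
            try {
                System.out.println(Instant.now() + " " + fetchOnce(args[0], args[1], args[2]));
            } catch (IOException e) {
                System.out.println(Instant.now() + " poll failed: " + e);
            }
            Thread.sleep(30_000); // every 30 seconds, as suggested above
        }
    }
}
```

Comparing the timestamps of the samples with the time of the disconnect event should show whether one request was executing for the whole period before the exception.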
And of course, mreiche’s suggestions for checking from the client will help direct things too.
If it happens during a query, and apparently it does - I would suspect there is network equipment configured to drop idle connections. (A query request that is sent and waiting for a response appears idle to anything in between)
The SDK closes idle HTTP connections after one second by default, so this was apparently not an idle connection. Unless you’ve specified a different idle timeout value? The more information you show in your questions, the better the help that can be provided. The SDK dumps out all the config settings on startup.
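For reference, this is roughly where the idle timeout would be set if it had been customized, together with the waitUntilReady() check suggested earlier in the thread. It is a sketch against the Java SDK 3.x API and needs the SDK on the classpath; the connection string, credentials and the one second value are placeholders, not a recommendation.

```java
import com.couchbase.client.core.env.IoConfig;
import com.couchbase.client.core.service.ServiceType;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ClusterOptions;
import com.couchbase.client.java.diagnostics.WaitUntilReadyOptions;
import com.couchbase.client.java.env.ClusterEnvironment;
import java.time.Duration;

final class ExplicitIdleTimeoutExample {
    public static void main(String[] args) {
        // Set the idle HTTP connection timeout explicitly so it is visible in code.
        ClusterEnvironment env = ClusterEnvironment.builder()
                .ioConfig(IoConfig.idleHttpConnectionTimeout(Duration.ofSeconds(1)))
                .build();

        // Placeholder address and credentials.
        Cluster cluster = Cluster.connect(
                "couchbase://127.0.0.1",
                ClusterOptions.clusterOptions("username", "password").environment(env));
        try {
            // Throws a timeout exception if the Query service is not usable within 10 seconds.
            cluster.waitUntilReady(
                    Duration.ofSeconds(10),
                    WaitUntilReadyOptions.waitUntilReadyOptions().serviceTypes(ServiceType.QUERY));
            System.out.println("Query service is ready");
        } finally {
            cluster.disconnect();
            env.shutdown();
        }
    }
}
```

If waitUntilReady() fails here while netstat shows an established connection, that points toward something on the path accepting the connection without letting it complete.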