We are using the latest Java SDK, 2.7.20, and we recently migrated to Couchbase 6.6. We introduced a health check that checks the cluster health every few seconds. We are seeing intermittent RequestCancelledException errors when running reactive N1QL queries with the fail-fast retry strategy.
It takes 1 second before the error is propagated back to the client, which leaves no room to retry the N1QL query and still meet our SLA of responding within 100-200 ms.
Stack trace
"throwable_class=com.couchbase.client.core.RequestCancelledException" "throwable_message=Could not dispatch request, cancelling instead of retrying." "stacktrace_elements=com.couchbase.client.core.retry.RetryHelper.retryOrCancel(RetryHelper.java:51),com.couchbase.client.core.service.PooledService.unsubscribeAndRetry(PooledService.java:420),com.couchbase.client.core.service.PooledService.access$1200(PooledService.java:50),com.couchbase.client.core.service.PooledService$9.onError(PooledService.java:401),rx.observers.SafeSubscriber._onError(SafeSubscriber.java:153),rx.observers.SafeSubscriber.onError(SafeSubscriber.java:115),rx.subjects.SubjectSubscriptionManager$SubjectObserver.onError(SubjectSubscriptionManager.java:227),rx.subjects.AsyncSubject.onError(AsyncSubject.java:116),com.couchbase.client.core.endpoint.AbstractEndpoint$2.onSuccess(AbstractEndpoint.java:465),com.couchbase.client.core.endpoint.AbstractEndpoint$2.onSuccess(AbstractEndpoint.java:422),rx.internal.operators.SingleOperatorOnErrorResumeNext$2.onSuccess(SingleOperatorOnErrorResumeNext.java:63),rx.internal.operators.SingleTimeout$TimeoutSingleSubscriber.onSuccess(SingleTimeout.java:79),com.couchbase.client.core.endpoint.AbstractEndpoint$4$1.operationComplete(AbstractEndpoint.java:399),com.couchbase.client.core.endpoint.AbstractEndpoint$4$1.operationComplete(AbstractEndpoint.java:383),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609),com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117),com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:262),com.couchbase.client.deps.io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98),com.couchbase.client.deps.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170),com.couchbase.client.deps.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164),com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472),com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500),com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989),com.couchbase.client.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74),com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30),java.lang.Thread.run(Thread.java:748)"
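The latency concern above (an error surfacing after about 1 second against a 100-200 ms budget) can be bounded on the caller's side regardless of how long the client takes to give up. Below is a minimal sketch of that idea using only the JDK, assuming a blocking caller for clarity: each attempt gets its own short timeout, and all attempts together must fit inside a total budget. `BudgetedRetry`, `call`, and the parameter names are hypothetical and are not part of the Couchbase SDK. In the RxJava-based reactive API the same idea would presumably be expressed with the `timeout` and `retry` operators instead.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not an SDK class: retries an async call inside a fixed time budget.
final class BudgetedRetry {
    private BudgetedRetry() {}

    /**
     * Starts up to maxAttempts attempts. Each attempt may take at most perAttempt,
     * and the loop stops as soon as totalBudget has elapsed.
     * Returns the first successful result, otherwise rethrows the last failure.
     */
    static <T> T call(Supplier<CompletableFuture<T>> attempt,
                      Duration perAttempt,
                      Duration totalBudget,
                      int maxAttempts) throws Exception {
        long deadline = System.nanoTime() + totalBudget.toNanos();
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // budget used up, do not start another attempt
            }
            // Never wait longer than what is left of the overall budget.
            long wait = Math.min(remaining, perAttempt.toNanos());
            CompletableFuture<T> future = attempt.get();
            try {
                return future.get(wait, TimeUnit.NANOSECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // stop waiting on the slow attempt
                last = e;
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                last = (cause instanceof Exception) ? (Exception) cause : e;
            }
        }
        if (last == null) {
            last = new TimeoutException("time budget exhausted before the first attempt");
        }
        throw last;
    }
}
```

With a per-attempt timeout of, say, 50 ms and a total budget of 150 ms, a caller would get at most three attempts before the failure is reported. Those numbers are illustrative only and would need to be tuned against the real query latency.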
@himanshu.mps it looks like the socket connect attempt was not successful. Can you check the logs from the same timeframe to see why the query socket could not be established?
@daschl Attached are the logs from our server. Hope this gives a clue. If you need more logs, let us know which settings you want us to enable to debug the issue.