Hello, I am seeing the following exception:
2018-01-25 16:26:35.913 WARN 3152 --- [tionScheduler-2] c.c.c.c.endpoint.AbstractGenericHandler : [/127.0.0.1:11210][KeyValueEndpoint]: Got error while consuming KeepAliveResponse.
java.util.concurrent.TimeoutException: null
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$2.call(EventLoopsScheduler.java:189) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) [rxjava-1.3.4.jar:1.3.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_144]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
I think you answered your own question in the subject - how big are your 100,000 documents on average? What’s the network throughput to the server? What timeout have you set (or are you using the default)? From those numbers you can estimate how long it takes just to transmit the requests, before any processing on either the client or the server.
Most likely you’re hitting the operation timeout by sending such a large amount of data at once. I suggest you increase the timeout, reduce the batch size, or both.
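For example, something along these lines (just a minimal sketch, assuming Java SDK 2.x with RxJava 1.x; the address, credentials, bucket name, timeout and batch size are all illustrative, not taken from your setup):

import java.util.List;
import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import rx.Observable;

public class BatchedUpsert {
    public static void main(String[] args) {
        // Raise the key/value timeout (used by the blocking API) from the 2.5s default to 10s.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .kvTimeout(10000)
                .build();

        // Placeholder connection details.
        CouchbaseCluster cluster = CouchbaseCluster.create(env, "127.0.0.1");
        cluster.authenticate("username", "password");
        Bucket bucket = cluster.openBucket("my-bucket");

        List<JsonDocument> docs = loadDocuments(); // however your 100,000 documents are produced

        // Write the documents in batches of 10,000; concatMap only starts the next
        // batch once the previous one has completed, which keeps memory bounded.
        Observable.from(docs)
                .buffer(10000)
                .concatMap(batch -> Observable.from(batch)
                        .flatMap(doc -> bucket.async()
                                .upsert(doc)
                                // the async API does not apply kvTimeout itself
                                .timeout(10, TimeUnit.SECONDS))
                        .toList())
                .toBlocking()
                .forEach(batch -> System.out.println("Wrote batch of " + batch.size()));

        bucket.close();
        cluster.disconnect();
    }

    static List<JsonDocument> loadDocuments() {
        throw new UnsupportedOperationException("supply your own documents here");
    }
}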
Note that the warning comes from the SDK’s internal keepalive check. It isn’t a problem in itself, but rather a symptom of another problem.
My guess is that you are first overloading the ring buffer and getting backpressure exceptions, then creating even more work, which causes a lot of JVM memory pressure that in turn shows up as GC activity.
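If you do hit the ring buffer limit, the 2.x SDK rejects the write quickly with a BackpressureException, and that particular failure is safe to retry after a backoff. A minimal sketch, assuming the async API and the SDK’s RetryBuilder helper (the delays and attempt count are illustrative):

import java.util.concurrent.TimeUnit;
import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.util.retry.RetryBuilder;
import rx.Observable;

public class BackpressureRetry {
    // Upsert that retries only when the request ring buffer was full, backing off
    // exponentially between 50ms and 1s, and giving up after 10 attempts.
    static Observable<JsonDocument> upsertWithRetry(AsyncBucket bucket, JsonDocument doc) {
        return bucket.upsert(doc)
                .retryWhen(RetryBuilder
                        .anyOf(BackpressureException.class)
                        .delay(Delay.exponential(TimeUnit.MILLISECONDS, 1000, 50))
                        .max(10)
                        .build());
    }
}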
Had the same issue with 100,000 documents in one operation. Try doing 10 batches of 10,000 - that worked for me. You can also try playing with these settings:
Hello, I am having basically the same issue. Is there a way to throttle as needed to avoid these timeouts, rather than just tuning batch size, timeout length, etc.? I can see that couchbaseBucket.async().upsert(docToInsert).retryWhen(…) does not catch these timeouts and apply a delay to retry, so I guess this throttling would have to happen somewhere else?
Also, I assume data is being lost when I see this timeout. However, it’s not obvious to me from the trace where I can catch the timeout exception and resend the data. Can you please clarify the correct way to do that?
If you’re using the async API and nothing else slows the rate at which you add requests (network IO, for example), you are likely to put too much memory pressure on the system and start seeing timeouts. Each asynchronous operation can be thought of as a memory allocation: you can issue a great many of them very quickly, and if you never wait for some of them to complete, they will overwhelm the system.
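One way to add that waiting on the application side is to cap how many operations are in flight at once, for example with the RxJava 1.x flatMap overload that takes a concurrency limit. A minimal sketch, assuming the 2.x async API (the concurrency limit and timeout are illustrative; tune them for your document sizes and network):

import java.util.List;
import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import rx.Observable;

public class ThrottledUpsert {
    // Upserts every document but never keeps more than maxInFlight requests
    // outstanding; new upserts are only started as earlier ones complete.
    static void upsertAll(Bucket bucket, List<JsonDocument> docs, int maxInFlight) {
        Observable.from(docs)
                .flatMap(doc -> bucket.async()
                                .upsert(doc)
                                // the async API applies no timeout on its own
                                .timeout(10, TimeUnit.SECONDS),
                        maxInFlight)
                .toList()       // collect results so we block until everything is done
                .toBlocking()
                .single();      // any failure surfaces here as an exception
    }
}

With the concurrency limit in place, the ring buffer and the heap stay bounded no matter how many documents you feed in, so the throttling happens naturally instead of showing up as timeouts.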
Note that one of the features of Java SDK 3.x is automatic retry of idempotent operations. Its error handling also differentiates between operations that unambiguously failed and operations that may or may not have been applied. Check out the documentation and blog posts on this.
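A rough sketch of what that error differentiation looks like in SDK 3.x (connection details, bucket name and document are placeholders):

import com.couchbase.client.core.error.AmbiguousTimeoutException;
import com.couchbase.client.core.error.UnambiguousTimeoutException;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;

public class Sdk3TimeoutHandling {
    public static void main(String[] args) {
        // Placeholder connection details.
        Cluster cluster = Cluster.connect("127.0.0.1", "username", "password");
        Collection collection = cluster.bucket("my-bucket").defaultCollection();

        try {
            collection.upsert("doc-id", JsonObject.create().put("field", "value"));
        } catch (UnambiguousTimeoutException e) {
            // The server definitely did not apply the write, so it is safe to retry as-is.
        } catch (AmbiguousTimeoutException e) {
            // The write may or may not have been applied. Re-running the same upsert
            // produces the same end state, so retrying is still safe here, but
            // non-idempotent operations need more care.
        }

        cluster.disconnect();
    }
}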
Meta note: it’s much better not to pick up a three-year-old thread with a slightly different question; here you’re asking about the async API, and a timeout exception is rather generic.