Hello, I am seeing the following exception:
2018-01-25 16:26:35.913 WARN 3152 --- [tionScheduler-2] c.c.c.c.endpoint.AbstractGenericHandler : [/127.0.0.1:11210][KeyValueEndpoint]: Got error while consuming KeepAliveResponse.
java.util.concurrent.TimeoutException: null
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$2.call(EventLoopsScheduler.java:189) [rxjava-1.3.4.jar:1.3.4]
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) [rxjava-1.3.4.jar:1.3.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_144]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_144]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_144]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
I think you answered your own question in the subject - how big are your 100,000 documents on average? What’s the network throughput to the server? What timeout have you set (or are you using the default)? From those numbers you can estimate how long it takes just to transmit the requests, before any processing on either the client or the server.
Most likely you’re hitting the operation timeout by sending such a large amount of data at once. I suggest you increase the timeout, reduce the batch size, or both.
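For example, something along these lines (just a minimal sketch, assuming Java SDK 2.x with RxJava 1.x; the address, credentials, bucket name, timeout and batch size are all illustrative, not taken from your setup):

import java.util.List;
import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import rx.Observable;

public class BatchedUpsert {
    public static void main(String[] args) {
        // Raise the key/value timeout (used by the blocking API) from the 2.5s default to 10s.
        CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
                .kvTimeout(10000)
                .build();

        // Placeholder connection details.
        CouchbaseCluster cluster = CouchbaseCluster.create(env, "127.0.0.1");
        cluster.authenticate("username", "password");
        Bucket bucket = cluster.openBucket("my-bucket");

        List<JsonDocument> docs = loadDocuments(); // however your 100,000 documents are produced

        // Write the documents in batches of 10,000; concatMap only starts the next
        // batch once the previous one has completed, which keeps memory bounded.
        Observable.from(docs)
                .buffer(10000)
                .concatMap(batch -> Observable.from(batch)
                        .flatMap(doc -> bucket.async()
                                .upsert(doc)
                                // the async API does not apply kvTimeout itself
                                .timeout(10, TimeUnit.SECONDS))
                        .toList())
                .toBlocking()
                .forEach(batch -> System.out.println("Wrote batch of " + batch.size()));

        bucket.close();
        cluster.disconnect();
    }

    static List<JsonDocument> loadDocuments() {
        throw new UnsupportedOperationException("supply your own documents here");
    }
}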
Note that the warning comes from the SDK’s internal keepalive check. It isn’t a problem in itself, but rather a symptom of another problem.
My guess is that you are first overloading the ring buffer and getting backpressure exceptions, then creating even more work, which causes a lot of JVM memory pressure that in turn shows up as GC activity.
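If you do hit the ring buffer limit, the 2.x SDK rejects the write quickly with a BackpressureException, and that particular failure is safe to retry after a backoff. A minimal sketch, assuming the async API and the SDK’s RetryBuilder helper (the delays and attempt count are illustrative):

import java.util.concurrent.TimeUnit;
import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.util.retry.RetryBuilder;
import rx.Observable;

public class BackpressureRetry {
    // Upsert that retries only when the request ring buffer was full, backing off
    // exponentially between 50ms and 1s, and giving up after 10 attempts.
    static Observable<JsonDocument> upsertWithRetry(AsyncBucket bucket, JsonDocument doc) {
        return bucket.upsert(doc)
                .retryWhen(RetryBuilder
                        .anyOf(BackpressureException.class)
                        .delay(Delay.exponential(TimeUnit.MILLISECONDS, 1000, 50))
                        .max(10)
                        .build());
    }
}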
Had the same issue with 100,000 documents in one operation. Try doing 10 batches of 10,000 - that worked for me. You can also try playing with these settings:
Hello, I am having basically the same issue. Is there a way to throttle as needed to avoid these timeouts, rather than just tuning batch size, timeout length, etc.? I can see that couchbaseBucket.async().upsert(docToInsert).retryWhen(…) does not catch these timeouts and apply a delay to retry, so I guess this throttling would have to happen somewhere else?
Also, I assume data is being lost when I see this timeout. However, it’s not obvious to me from the trace where I can catch the timeout exception and resend the data. Can you please clarify the correct way to do that?
If you’re using the async API and nothing else slows the rate at which you add requests (network IO, for example), you are likely to put too much memory pressure on the system and start seeing timeouts. Each asynchronous operation can be thought of as a memory allocation: you can issue a great many of them very quickly, and if you never wait for some of them to complete, they will overwhelm the system.
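One way to add that waiting on the application side is to cap how many operations are in flight at once, for example with the RxJava 1.x flatMap overload that takes a concurrency limit. A minimal sketch, assuming the 2.x async API (the concurrency limit and timeout are illustrative; tune them for your document sizes and network):

import java.util.List;
import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import rx.Observable;

public class ThrottledUpsert {
    // Upserts every document but never keeps more than maxInFlight requests
    // outstanding; new upserts are only started as earlier ones complete.
    static void upsertAll(Bucket bucket, List<JsonDocument> docs, int maxInFlight) {
        Observable.from(docs)
                .flatMap(doc -> bucket.async()
                                .upsert(doc)
                                // the async API applies no timeout on its own
                                .timeout(10, TimeUnit.SECONDS),
                        maxInFlight)
                .toList()       // collect results so we block until everything is done
                .toBlocking()
                .single();      // any failure surfaces here as an exception
    }
}

With the concurrency limit in place, the ring buffer and the heap stay bounded no matter how many documents you feed in, so the throttling happens naturally instead of showing up as timeouts.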
Note that one of the features of Java SDK 3.x is automatic retry of idempotent operations. Its error handling also differentiates between operations that unambiguously failed and operations that may or may not have been applied. Check out the documentation and blog posts on this.
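A rough sketch of what that error differentiation looks like in SDK 3.x (connection details, bucket name and document are placeholders):

import com.couchbase.client.core.error.AmbiguousTimeoutException;
import com.couchbase.client.core.error.UnambiguousTimeoutException;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;

public class Sdk3TimeoutHandling {
    public static void main(String[] args) {
        // Placeholder connection details.
        Cluster cluster = Cluster.connect("127.0.0.1", "username", "password");
        Collection collection = cluster.bucket("my-bucket").defaultCollection();

        try {
            collection.upsert("doc-id", JsonObject.create().put("field", "value"));
        } catch (UnambiguousTimeoutException e) {
            // The server definitely did not apply the write, so it is safe to retry as-is.
        } catch (AmbiguousTimeoutException e) {
            // The write may or may not have been applied. Re-running the same upsert
            // produces the same end state, so retrying is still safe here, but
            // non-idempotent operations need more care.
        }

        cluster.disconnect();
    }
}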
Meta note: it’s much better not to pick up a three-year-old thread with a slightly different question; here you’re asking about the async API, and a timeout exception is rather generic.