We are planning to replace our current NoSQL servers with Couchbase. Our evaluation and testing of Couchbase Server 3.0 showed a massive improvement over our current choice, and we’re looking forward to switching over completely. However, we ran into some unexpected trouble on our staging servers.
We are using the Java Client 2.0.0 in our Scala Play application. We have a global Cluster object and a global Bucket object, and we use those to get and upsert. Sometimes (not always), with the Couchbase server still doing minimal ops/sec (~30), the Java client starts throwing timeout exceptions, and the service is unusable while this lasts. This goes away after some time, and seems to be related to load. Here’s what we see in our logs:
Started at Sat Nov 08 06:55:08 UTC 2014","ex":"java.lang.RuntimeException: java.util.concurrent.TimeoutException
at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:481) ~[io.reactivex.rxjava-1.0.0-rc.3.jar:1.0.0-rc.3]
at rx.observables.BlockingObservable.singleOrDefault(BlockingObservable.java:382) ~[io.reactivex.rxjava-1.0.0-rc.3.jar:1.0.0-rc.3]
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:76) ~[com.couchbase.client.java-client-2.0.0.jar:2.0.0-beta-14-gbe5dc12-dirty]
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:71) ~[com.couchbase.client.java-client-2.0.0.jar:2.0.0-beta-14-gbe5dc12-dirty]
...
I’ve not seen this in the development environment. We just started testing the 2.0.1 client in dev, but we had not seen the timeout exceptions there with 2.0.0 either, so I’m not sure upgrading our staging machines to 2.0.1 would fix this issue. I’m currently upgrading the staging servers to the new client, so I will know soon whether that helps.
So, the documentation suggests we use a global Cluster object to share connections. What about Bucket? Is opening/closing a bucket around a request a bad idea? Are there any other settings that we should be looking at tuning (timeouts, etc.)?
Any other ideas or pointers, either to reproduce this consistently (that would be a start!) or to resolve it, would be highly appreciated.
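In case it helps with reproducing this, here is a minimal sketch of how the blocking calls could be instrumented to correlate timeouts with load. It assumes Java 8 and relies only on what the stack trace above shows, namely that the timeout surfaces as a RuntimeException whose cause is a java.util.concurrent.TimeoutException. The class name TimeoutProbe and its methods are hypothetical helpers and not part of the Couchbase client.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper: wraps a blocking call, counts timeouts, tracks worst latency.
class TimeoutProbe {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong timeouts = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    // Runs the call, records its latency, and rethrows any exception unchanged.
    <T> T timed(String label, Supplier<T> call) {
        long start = System.nanoTime();
        calls.incrementAndGet();
        try {
            return call.get();
        } catch (RuntimeException e) {
            if (isTimeout(e)) {
                timeouts.incrementAndGet();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.err.printf("timeout in %s after %d ms (%d of %d calls so far)%n",
                        label, elapsedMs, timeouts.get(), calls.get());
            }
            throw e;
        } finally {
            // Keep the slowest observed call, whether it succeeded or failed.
            maxNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
        }
    }

    // Walks the cause chain (bounded, to guard against cycles) looking for a TimeoutException.
    static boolean isTimeout(Throwable t) {
        for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    long calls() { return calls.get(); }
    long timeouts() { return timeouts.get(); }
    long maxMillis() { return maxNanos.get() / 1_000_000; }
}
```

The idea would be to wrap each get and upsert, for example probe.timed("get", () -> bucket.get(id)) where bucket and id stand for our existing global Bucket and a document key, and then compare the timeout counts and worst-case latency against the ops/sec shown on the server side.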
Thanks