TimeoutException during load test

Hi

Everything runs fine, with very good response times, when the system is not under load. The following exception is raised during load testing (along with bad response times). Interestingly, no resource (CPU or memory on the Couchbase or app servers) is saturated: Couchbase is running at around 20% CPU and 50% memory, and JBoss at 70% CPU and 20% memory. While the system is under stress and the app servers are throwing the TimeoutException, if I query Couchbase directly through cbq-shell and a unit test outside the JBoss server, the response time is good. It seems some setting in the Couchbase driver (running on the JBoss server) is limiting the throughput.

Couchbase Enterprise 4.5.0
3 data nodes (c4.xlarge) across 3 availability zones (server groups match the AZs)
2 index nodes (c4.xlarge) across 2 availability zones (server groups match the AZs)
2 query nodes (c4.8xlarge) across 2 availability zones (server groups match the AZs)

Java SDK 2.4.2

16:13:27,229 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/DeviceServer].[resteasy-servlet]] (http--0.0.0.0-8443-502) Servlet.service() for servlet resteasy-servlet threw exception: org.jboss.resteasy.spi.UnhandledException: java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at org.jboss.resteasy.core.SynchronousDispatcher.handleApplicationException(SynchronousDispatcher.java:340) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.handleException(SynchronousDispatcher.java:214) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.handleInvokerException(SynchronousDispatcher.java:190) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:540) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:502) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:119) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50) [resteasy-jaxrs-2.3.2.Final.jar:]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.0.Final.jar:1.0.0.Final]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:489) [jbossweb-7.0.13.Final.jar:]
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153) [jboss-as-web-7.1.1.Final.jar:7.1.1.Final]
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368) [jbossweb-7.0.13.Final.jar:]
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877) [jbossweb-7.0.13.Final.jar:]
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671) [jbossweb-7.0.13.Final.jar:]
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930) [jbossweb-7.0.13.Final.jar:]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_95]
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:73) [java-client-2.4.1.jar:]
        at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:637) [java-client-2.4.1.jar:]
        at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:572) [java-client-2.4.1.jar:]
        at com.samsung.retail.cms.metadatastore.SearchEngine.deviceSearchPartnerExperience(SearchEngine.java:1133) [cms-0.0.1-SNAPSHOT.jar:]
        at com.samsung.retail.cms.rest.CmsDeviceProcessor.devicePartnerExperienceSearch(CmsDeviceProcessor.java:151) [classes:]
        at sun.reflect.GeneratedMethodAccessor329.invoke(Unknown Source) [:1.7.0_95]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_95]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_95]
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:155) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:257) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:222) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:211) [resteasy-jaxrs-2.3.2.Final.jar:]
        at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:525) [resteasy-jaxrs-2.3.2.Final.jar:]
        ... 20 more
Caused by: java.util.concurrent.TimeoutException

Hi, any advice? Should I tune queryEndpoints?

Hi @mikehcumail,

How heavy is your load, and how large is the query payload? Could the query ops/s under load saturate your network? Tuning query endpoints wouldn’t really help much with timeout exceptions. It can boost performance in cases where using only a few sockets is the bottleneck, since it lets query requests run in parallel.

During heavy load, the system handles around 300 query executions per second. That is when I receive the java.util.concurrent.TimeoutException. When the system is not under load, the query takes only 250ms to complete. So I guess the ops or queries are saturating the network? How can we increase the throughput in this situation?

It would be best to monitor network interface bandwidth utilization using iptraf or a similar tool, then allocate more resources if required or scale down the load.

I think it actually is possible to see a timeout, @subhashni, if we don’t have enough queryEndpoints. After we exhaust them, we queue requests, right? And since we disabled pipelining in 2.4.1 because of a defect, that queue could grow beyond what’s expected.
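To put rough numbers on this: if each query endpoint serves one request at a time (pipelining disabled) and a query takes about 250 ms, Little's law gives the number of requests that must be in flight, and so the number of connections needed, to sustain a given rate. Below is a minimal sketch in plain Java. The class and method names are made up for illustration, the 300 queries/s and 250 ms figures are the ones reported earlier in this thread, and it assumes latency stays at the unloaded value.

```java
public class EndpointEstimate {
    // Little's law: requests in flight = arrival rate * time each request takes.
    public static int requiredConnections(double queriesPerSecond, double latencySeconds) {
        return (int) Math.ceil(queriesPerSecond * latencySeconds);
    }

    // Upper bound on throughput when each connection serves one request at a time.
    public static double maxQueriesPerSecond(int connections, double latencySeconds) {
        return connections / latencySeconds;
    }

    public static void main(String[] args) {
        System.out.println(requiredConnections(300, 0.25)); // prints 75
        System.out.println(maxQueriesPerSecond(1, 0.25));   // prints 4.0
    }
}
```

The 75 is a total across all app servers and query nodes, so the figure per client and per node would be lower. It is only an estimate of the order of magnitude, not a recommended setting.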

We have an open documentation issue to make sure this is covered, but since you’re on 2.4.2, you can use the dynamic pool size. For instance, to allow the number of connections to start at 4 and grow to 128, something like this would work:

final CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .queryServiceConfig(QueryServiceConfig.create(4, 128))
        .build();

I’ll also note that when I’ve done benchmarking to try to maximize throughput on EC2, I had to go to an enhanced networking configuration between the clients and servers. This isn’t specific to Couchbase; it’s common if you’re trying to get everything out of the system. Since you’re split across AZs, I don’t think you can actually do this. A quick thing you can do to see what the ‘ceiling’ is, so to speak, is run iperf3 between a machine running the SDK code and a machine running the query service.

If you’re not seeing something close to the 750 Mbps Amazon advertises (assuming the client is capable; you don’t say what it is), then you will probably be held back to the actual figure. My testing was with rather different instances, but just to give you a sense of it, I was at 25% of the advertised instance bandwidth between the two nodes. I think I could get the full throughput out of the EC2 machine’s interface if it were talking to many nodes, but I couldn’t get that between two nodes without enhanced networking.
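As a back-of-the-envelope check on whether the network could be the limit, you can turn a measured link rate into a per-query payload budget. This is a small sketch in plain Java. The class and method names are hypothetical, the 750 Mbps and 25% figures are the ones mentioned above, the 300 queries/s comes from earlier in the thread, and it assumes a single client link carrying all of the query traffic.

```java
public class BandwidthBudget {
    // Bytes each query response may carry before the link itself becomes the limit.
    public static long maxBytesPerQuery(double linkMbps, double usableFraction,
                                        double queriesPerSecond) {
        // Mbps -> bits/s -> bytes/s, scaled by the fraction actually achieved.
        double bytesPerSecond = linkMbps * 1_000_000 / 8 * usableFraction;
        return (long) (bytesPerSecond / queriesPerSecond);
    }

    public static void main(String[] args) {
        System.out.println(maxBytesPerQuery(750, 1.0, 300));  // prints 312500
        System.out.println(maxBytesPerQuery(750, 0.25, 300)); // prints 78125
    }
}
```

If the typical query result is well under these sizes, the network is less likely to be the bottleneck; if it is close, the iperf3 measurement becomes the number to plug in for the link rate.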

True, but the default query timeout is large (75s). Now with the bounded request queue, I’d expect lower throughput rather than timeouts.
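To illustrate the difference between the two behaviours: a bounded queue rejects work immediately once it is full, whereas an unbounded one keeps accepting requests that may then wait past their timeout. Here is a small sketch using java.util.concurrent.ArrayBlockingQueue as a stand-in. This is not the SDK's actual request buffer, and the class name and the capacity of 4 are arbitrary choices for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueSketch {
    // Offers `requests` items to a queue of the given capacity and
    // returns how many were rejected because the queue was full.
    public static int countRejected(int capacity, int requests) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(capacity);
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            // offer() returns false right away instead of blocking when full.
            if (!queue.offer(i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(countRejected(4, 6)); // prints 2
    }
}
```

Nothing drains the queue in this sketch, so every request beyond the capacity is turned away at once; the caller sees a fast failure and can back off, instead of a request that sits queued until it times out.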

Ah, true enough… unless @mikehcumail overrode it. Still, on 2.4.2 or later I’d recommend removing the queryEndpoints setting and going with the default, then applying the tuning I’d mentioned if more connections are needed. I suspect the environment is the issue at the moment.