N1QL queries throw TimeoutException after the application runs for some days

Hi. I’ve built a Java application that runs on Google Cloud, as does the Couchbase cluster. Everything was fine until I got the first TimeoutException, shown below, when executing N1QL queries in the application:

java.lang.RuntimeException: java.util.concurrent.TimeoutException
	at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:71)
	at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:652)
	at com.admin.cb.dao.impl.CbQueryDaoImpl.query(CbQueryDaoImpl.java:28)
	at com.admin.service.impl.AdvancedFunctionsServiceImpl.query(AdvancedFunctionsServiceImpl.java:37)
	at com.admin.controller.tools.AdvFunController.query(AdvFunController.java:68)
	at sun.reflect.GeneratedMethodAccessor1713.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:114)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at com.admin.filter.AccessFilter.doFilterInternal(AccessFilter.java:71)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
	at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:262)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
	at psiprobe.Tomcat85AgentValve.invoke(Tomcat85AgentValve.java:40)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:624)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:799)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:861)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1455)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException
	... 58 more

Restarting the application temporarily solves the problem, but after the application has run for some days, the same TimeoutException is thrown again.

What I observed is that some of the queries in a request from the application finish, but a TimeoutException is thrown when executing the remaining ones. For example, take a request from the application that contains 10 queries. The first 5 queries finish without a problem, but a TimeoutException is thrown when executing the 6th query. In the same request, it is always the 6th query that gets the TimeoutException. However, this query has no problem when run individually in the Couchbase web console or via the Java client.

Here’s more info.
Couchbase: version 4.6.3-4136, 4 nodes on a cluster, use Memory-Optimized Global Secondary Indexes
Java client: version 2.3.1

I don’t know much about Couchbase and currently have no idea how to solve this TimeoutException. Could anyone offer some suggestions?

Okay, so first it’s important to understand that a timeout is always the effect of a problem, never the root cause. So we need to find out what’s causing the timeout (that is, why a request took longer than the maximum wait specified).

  • Can you share the code you are using and the timeout settings?
  • Can you update the Java SDK to the latest version and see if the issue persists?

When you say only restarting helps, do you mean that once a timeout is thrown the app doesn’t work anymore? Are you properly catching the timeout so that your main threads don’t die? Just tossing out some initial ideas before getting more info.
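
To illustrate what catching the timeout could look like: the stack trace above shows the blocking query call surfacing the timeout as a RuntimeException whose cause is java.util.concurrent.TimeoutException. The following is a minimal sketch under that assumption only. TimeoutGuard and its methods are hypothetical helpers, not part of the Couchbase SDK, and the caller still has to decide how to respond to an empty result.

```java
import java.util.Optional;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper, not part of the Couchbase SDK. */
final class TimeoutGuard {

    private TimeoutGuard() {}

    /** Returns true if the error or any of its causes is a TimeoutException. */
    static boolean isTimeout(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the call and returns its result. A timeout is turned into an
     * empty Optional so the request thread can continue or fail cleanly;
     * every other runtime failure is rethrown unchanged.
     */
    static <T> Optional<T> callOrEmptyOnTimeout(Supplier<T> call) {
        try {
            return Optional.ofNullable(call.get());
        } catch (RuntimeException e) {
            if (isTimeout(e)) {
                return Optional.empty();
            }
            throw e;
        }
    }
}
```

A DAO method could then wrap its blocking call, for example TimeoutGuard.callOrEmptyOnTimeout(() -> bucket.query(...)), where bucket stands for whatever Bucket instance the application already holds.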

Thanks for the reply. Here’s the code I use for querying the Couchbase server.

public N1qlQueryResult query(String n1ql) {
     return this.getBucket().query(N1qlQuery.simple(n1ql), 180, TimeUnit.SECONDS);
}

private N1qlQueryResult query(Statement statement) {
     return this.getBucket().query(statement, 30, TimeUnit.SECONDS);
}

private N1qlQueryResult query(String projection, List<String> criteriaList, int limit, int offset, Sort orderBy) {
     // Start from TRUE so that a null or empty criteria list yields an unrestricted WHERE clause.
     Expression criteriaExpression = Expression.TRUE();
     if (criteriaList != null)
          // orElse avoids a NoSuchElementException when the list is empty.
          criteriaExpression = criteriaList.stream().map(criteria -> Expression.x(criteria)).reduce(Expression::and).orElse(Expression.TRUE());
     N1qlQueryResult result;
     if (orderBy != null)
          result = this.getBucket().query(Select.select(projection).from("`admin`")
                  .where(criteriaExpression)
                  .orderBy(orderBy)
                  .limit(limit).offset(offset)
          , 30, TimeUnit.SECONDS);
     else
          result = this.getBucket().query(Select.select(projection).from("`admin`")
                  .where(criteriaExpression)
                  .limit(limit).offset(offset)
          , 30, TimeUnit.SECONDS);
     return result;
}

The application still works when a TimeoutException is thrown. Simple requests that contain only 2 or 3 queries are able to finish. But for requests that contain more queries, a TimeoutException is thrown after some of the queries have run, so these requests cannot finish and return an error to the frontend. The application never stops working because of the TimeoutException.

One thing I forgot to mention is that if I don’t restart the application, it may ‘recover’ by itself within a day.

And we are considering upgrading the Java client, but not now.

Based on the query I see there, it looks like you may want to look at adding indexes or doing other optimization. It’s hard to say without more information, but it may be that the current execution complexity of the query does take more than the allotted 30s or 180s.

One quick way to find out would be to look at the query monitoring to see if it is indeed taking that long to execute. Then look at the EXPLAIN output for the query and see if you can optimize it with indexes or adjustments to the query.
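
As a rough sketch of the two checks suggested above: query monitoring is exposed through the system:active_requests and system:completed_requests keyspaces, and a plan can be requested by prefixing a statement with EXPLAIN. QueryDiagnostics below is a hypothetical helper that only builds the statement strings; it assumes the cluster version in use exposes these keyspaces, and note that completed requests are typically only recorded above a duration threshold.

```java
import java.util.Locale;

/** Hypothetical helper that builds diagnostic N1QL statements as plain strings. */
final class QueryDiagnostics {

    /** Requests currently executing on the query service. */
    static final String ACTIVE_REQUESTS = "SELECT * FROM system:active_requests";

    /** Recently finished requests that the query service chose to record. */
    static final String COMPLETED_REQUESTS = "SELECT * FROM system:completed_requests";

    private QueryDiagnostics() {}

    /** Prefixes a statement with EXPLAIN unless it already starts with it. */
    static String explain(String n1ql) {
        String trimmed = n1ql.trim();
        if (trimmed.toUpperCase(Locale.ROOT).startsWith("EXPLAIN ")) {
            return trimmed;
        }
        return "EXPLAIN " + trimmed;
    }
}
```

The resulting strings can be run in the query workbench of the web console, or passed to the existing query(String n1ql) method shown earlier in the thread.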

When you restart, is it purely the client, or the cluster as well? I ask because the query service also has some garbage collection to do from time to time, which could explain why it’s intermittent. You may want to try to correlate the timeouts with the stats.
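
One way to make that correlation easier is to record when each query started, how long it took, and whether it failed, so the entries can be lined up against the cluster statistics. The following is a minimal sketch; TimedCall, its timed method, and the log line format are all hypothetical, and the Consumer stands in for whatever logger the application already uses.

```java
import java.time.Instant;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper that logs start time, duration, and outcome of a call. */
final class TimedCall {

    private TimedCall() {}

    /**
     * Runs the call, then emits one log line with the label, the wall-clock
     * start time, the elapsed milliseconds, and "ok" or "failed".
     * Exceptions are logged and rethrown unchanged.
     */
    static <T> T timed(String label, Supplier<T> call, Consumer<String> log) {
        Instant startedAt = Instant.now();
        long startNanos = System.nanoTime();
        String outcome = "failed";
        try {
            T result = call.get();
            outcome = "ok";
            return result;
        } finally {
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
            log.accept(label + " started=" + startedAt
                    + " elapsedMs=" + elapsedMs + " outcome=" + outcome);
        }
    }
}
```

Wrapping each of the queries in a request this way, for example TimedCall.timed("query-6", () -> dao.query(n1ql), System.out::println) with dao being a placeholder for the existing DAO, would show whether the failing query really runs for the full timeout and at what times of day that happens.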