A bit of context:
I’m querying a large data set using the async bucket API and streaming it to an HTTP response.
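Roughly, the code follows this pattern (a simplified sketch; the design document / view name and the per-row serialization are placeholders, not the exact production code):

import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.view.AsyncViewResult;
import com.couchbase.client.java.view.AsyncViewRow;
import com.couchbase.client.java.view.ViewQuery;
import rx.Observable;

import javax.servlet.ServletOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ViewStreamer {

    private final AsyncBucket bucket;

    public ViewStreamer(AsyncBucket bucket) {
        this.bucket = bucket;
    }

    // Query the view via the async bucket and write each row to the HTTP response.
    public void stream(ServletOutputStream out) {
        Observable<AsyncViewRow> rows = bucket
                .query(ViewQuery.from("myDesignDoc", "myView")) // placeholder names
                .flatMap(AsyncViewResult::rows);

        rows.toBlocking().forEach(row -> {
            try {
                out.write(row.value().toString().getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }
}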
I see the following warning in the logs:
ERROR util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it’s garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
which leads to the following error:
com.couchbase.client.deps.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 3204448263, max: 3221225472)
Any hints to address this issue would be greatly appreciated.
I’ve filed JCBC-1277 for upgrading to the latest Netty version. In the meantime, one thing to try would be to enable advanced leak detection, which will pinpoint the location of the leak. Try adding this to your Java command line:
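Assuming the standard Netty leak-detection level property (with the shaded prefix described below), that would be:

-Dcom.couchbase.client.deps.io.netty.leakDetectionLevel=advanced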
(Replace advanced with paranoid to monitor all buffers instead of just 1% of them).
In the Couchbase Java SDK, the Netty system property names are prefixed with com.couchbase.client.deps. So for example if you want to tweak the io.netty.allocator.pageSize property, the name to use would be com.couchbase.client.deps.io.netty.allocator.pageSize.
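For example, on the command line (8192 is only an illustrative value; it happens to be Netty’s default page size):

-Dcom.couchbase.client.deps.io.netty.allocator.pageSize=8192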
I have modified the VM args as suggested. Additionally, the following argument was added:
-Dcom.couchbase.client.deps.io.netty.noPreferDirect=true
to use heap instead of direct memory (this helps trigger the problem faster).
Below is a stack trace which might help identify the root cause of the leak:
05-12-2018 13:12:20.622 [cb-io-1-4] ERROR c.c.c.d.i.n.u.ResourceLeakDetector.error - LEAK: ByteBuf.release() was not called before it’s garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at:
com.couchbase.client.deps.io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:572)
com.couchbase.client.deps.io.netty.buffer.PooledHeapByteBuf.copy(PooledHeapByteBuf.java:210)
com.couchbase.client.deps.io.netty.buffer.SlicedByteBuf.copy(SlicedByteBuf.java:181)
com.couchbase.client.deps.io.netty.buffer.AbstractByteBuf.copy(AbstractByteBuf.java:937)
com.couchbase.client.deps.io.netty.buffer.WrappedByteBuf.copy(WrappedByteBuf.java:699)
com.couchbase.client.deps.io.netty.buffer.AdvancedLeakAwareByteBuf.copy(AdvancedLeakAwareByteBuf.java:651)
com.couchbase.client.core.endpoint.view.ViewHandler.parseViewRows(ViewHandler.java:508)
com.couchbase.client.core.endpoint.view.ViewHandler.parseQueryResponse(ViewHandler.java:379)
com.couchbase.client.core.endpoint.view.ViewHandler.decodeResponse(ViewHandler.java:277)
com.couchbase.client.core.endpoint.view.ViewHandler.decodeResponse(ViewHandler.java:72)
com.couchbase.client.core.endpoint.AbstractGenericHandler.decode(AbstractGenericHandler.java:338)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageCodec$2.decode(MessageToMessageCodec.java:81)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
com.couchbase.client.deps.io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:299)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:415)
com.couchbase.client.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
com.couchbase.client.deps.io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1304)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:921)
com.couchbase.client.deps.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:135)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)
I believe the issue could be related to the ViewHandler implementation; it is probably related to backpressure (slow consumer vs. fast producer). As far as I can see, the culprit is the view row observable that is used when the ViewQueryResponse is instantiated.
Given this scenario, when the queried view returns a large data set (about 1 million rows), how is it supposed to handle such a load without putting too much pressure on the consumer and eventually hitting an OOM error?
As of Couchbase Java SDK 2.7.9, we’re still on Netty 4.0.56 (the most recent version in the 4.0.x line). Couchbase Java SDK 3 (currently in development, to be released alongside Couchbase Server 6.5) will use the newest Netty.
In SDK 3 the query/view result handling is also much improved, which should address the OOM and backpressure issues present in SDK 2.x.
Is there a workaround for the OOM problem? We had an occurrence last week in our production environment. We were running a query that returned a huge payload, and I believe that is what caused the OOM. Do you know what we can do in the meantime to prevent this from happening again?
We have found a workaround for the memory leak issue by using pagination, more specifically the startkey / startkey_docid parameters.
The idea is to expose a method which returns an observable over a potentially infinite number of rows. The returned observable uses pagination internally, keeping the client agnostic of this implementation detail.
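A minimal sketch of that approach, assuming RxJava 1 (as used by SDK 2.x), string view keys, and a hypothetical page size; the design document and view names are supplied by the caller:

import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.view.AsyncViewResult;
import com.couchbase.client.java.view.AsyncViewRow;
import com.couchbase.client.java.view.ViewQuery;
import rx.Observable;

public class PaginatedViewRows {

    private static final int PAGE_SIZE = 1000; // hypothetical page size

    private final AsyncBucket bucket;

    public PaginatedViewRows(AsyncBucket bucket) {
        this.bucket = bucket;
    }

    // Single observable over all rows; pages are fetched lazily as the
    // consumer works through them.
    public Observable<AsyncViewRow> allRows(String designDoc, String viewName) {
        return page(designDoc, viewName, null, null);
    }

    private Observable<AsyncViewRow> page(final String designDoc, final String viewName,
                                          final String startKey, final String startKeyDocId) {
        return Observable.defer(() -> {
            ViewQuery query = ViewQuery.from(designDoc, viewName).limit(PAGE_SIZE);
            if (startKey != null) {
                // Resume right after the last row of the previous page.
                query.startKey(startKey).startKeyDocId(startKeyDocId).skip(1);
            }
            return bucket.query(query)
                    .flatMap(AsyncViewResult::rows)
                    .toList()
                    .flatMap(rows -> {
                        Observable<AsyncViewRow> current = Observable.from(rows);
                        if (rows.size() < PAGE_SIZE) {
                            return current; // last page reached
                        }
                        AsyncViewRow last = rows.get(rows.size() - 1);
                        // Lazily chain the next page after the current one
                        // (assumes string keys for simplicity).
                        return current.concatWith(page(designDoc, viewName,
                                String.valueOf(last.key()), last.id()));
                    });
        });
    }
}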
Two ideas come to mind. The first is to do everything possible to make sure the application can keep up with the incoming data so it won’t OOM, for example by processing each row as lightly as possible.
The second is to paginate the data as @alexo suggests, so you’re fetching more manageable chunks.
As David says, SDK3 is going to address this directly by providing automatic backpressure handling, so it will slow down requesting rows from the producer if the consumer cannot keep up.
@graham.pople any meaningful processing would not be light enough to avoid OOM. In our case we just streamed the content to a servlet output stream, which in theory is very fast.
Looking forward to seeing SDK3, but I’m sure it is not backward compatible with the current client, which makes it harder to adopt for larger applications. By the way, what are the timelines for releasing SDK3?