Kafka Connect Couchbase Out of Memory

Trying to use the new Kafka Connect Couchbase connector in distributed mode and running into Out of Memory issues…

We are running in distributed mode with 2 tasks and have set the partition count to 1 for all topics so that, for the moment, we limit the amount of memory used by the producer.

We can see messages like:

[2017-03-02 13:21:01,224] INFO Poll returns 325524 result(s) (com.couchbase.connect.kafka.CouchbaseSourceTask:170)

and, looking through the code, this is the number of SourceRecords held in memory waiting to be sent to Kafka!

These records come from the blocking queue that is populated by the DCP streams, but I can't see any bound on that queue, so it can grow indefinitely…

Is there a way to limit the number of DCP events that can be received and processed in the connector, so that memory usage does not grow without bound? This happens when we start against a Couchbase bucket with a lot of history and data in it!
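For illustration, what I was expecting is something like a bounded queue between the DCP stream threads and the source task, so that a full queue blocks the producer side instead of accumulating records. This is just a sketch of that idea; the class and method names are hypothetical, not the connector's actual API:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedDcpBuffer {
    // Hypothetical bound; in the version we run, the queue appears unbounded.
    private final BlockingQueue<String> events = new ArrayBlockingQueue<>(10_000);

    // Called from a DCP stream thread: put() blocks when the queue is full,
    // applying backpressure instead of letting memory grow without limit.
    public void onDcpEvent(String event) {
        try {
            events.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called from the source task's poll(): drains at most maxBatch records.
    public int drainTo(List<String> sink, int maxBatch) {
        return events.drainTo(sink, maxBatch);
    }
}
```

With a bound like this, a slow Kafka producer would naturally throttle the DCP consumption rather than letting the in-memory record count reach hundreds of thousands.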

Many thanks.

Which version are you on? The current release is 3.1.1. We do have flow control, so I don’t think you should see this. Is there anything unusual about the items? @avsej may have some other thoughts.

@lbertrand thank you for the report. I've managed to reproduce it and will get back here with more info soon.

The problem is not on the queue side, but rather in how we batch records for Kafka.

Ticket on our bug tracker: https://issues.couchbase.com/browse/KAFKAC-66
The fix (will be released in 3.1.2): http://review.couchbase.org/74624
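To illustrate the batching idea in general terms (this is only a sketch, not the code from the linked change — the class and constant names here are made up): capping how many records a single poll() hands to Kafka keeps any one batch from holding the whole backlog in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingSketch {
    // Hypothetical per-poll cap; the real fix lives in the Gerrit change above.
    static final int MAX_BATCH = 2000;

    final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Return at most MAX_BATCH records per poll, leaving the rest queued
    // for subsequent polls instead of materializing everything at once.
    List<String> poll() {
        List<String> batch = new ArrayList<>(MAX_BATCH);
        queue.drainTo(batch, MAX_BATCH);
        return batch;
    }
}
```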

Thank you very much for looking into this…