We are trying to use the new Kafka Connect Couchbase connector in distributed mode and are running into out-of-memory issues.
We are running with 2 tasks and have set the partition count to 1 for all topics so that, for the moment, we limit the amount of memory used by the producer.
We can see messages like
[2017-03-02 13:21:01,224] INFO Poll returns 325524 result(s) (com.couchbase.connect.kafka.CouchbaseSourceTask:170)
and, looking through the code, this is the number of SourceRecord objects held in memory waiting to be sent to Kafka!
They come from the blocking queue that is populated by the DCP streams, but I can't see any bound on that queue, so it can grow indefinitely.
Is there a way to limit the number of DCP events that the connector receives and holds at once, so that memory usage does not grow without bound? This happens when we start against a Couchbase bucket with a lot of history and data in it.
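To illustrate the kind of bound I have in mind, here is a minimal sketch in plain Java. It is not the connector's actual code: the class name, the method names, and the use of String as the event type are all hypothetical, and I am assuming the DCP side could tolerate being blocked (or told to back off) when the queue is full, which is exactly the part I am unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a bounded hand-off between a DCP listener and poll().
// None of these names come from the Couchbase connector.
class BoundedEventQueueSketch {
    private final BlockingQueue<String> queue;
    private final int maxBatchSize;

    BoundedEventQueueSketch(int capacity, int maxBatchSize) {
        // A capacity turns the queue into a bounded buffer.
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.maxBatchSize = maxBatchSize;
    }

    // Blocking producer side: waits while the queue is full,
    // which applies backpressure to whoever delivers events.
    void onEvent(String event) throws InterruptedException {
        queue.put(event);
    }

    // Non-blocking producer side: returns false when the queue is full,
    // so the caller can decide to pause the stream instead of waiting.
    boolean tryOnEvent(String event) {
        return queue.offer(event);
    }

    // Consumer side, as a poll() implementation might use it:
    // hands back at most maxBatchSize events per call.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    int size() {
        return queue.size();
    }
}
```

With something like this, a single poll would return at most maxBatchSize records, and at most capacity events would be held in memory between polls.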
Many thanks.