Kafka connector Couchbase DCP queue

As for requirements:
We are trying to store events in Couchbase and use the Kafka connector to publish messages to Kafka.
Our understanding is that the Kafka connector reads messages via DCP, from the DCP queue.
What happens if the DCP queue goes down, even with replication enabled?
Can the connector read the events and publish them to Kafka without any loss of events once the queue is back?
Assume something like a failure of the node hosting the DCP queue in Couchbase.
Are there any ways of maintaining resiliency? Is it just a matter of adding more nodes, or is there a way to identify a position (offset) within Couchbase and have the Kafka connector read based on that offset?

Our Kafka/DCP lead will be back next week.

Thanks,

Can the connector be connected to more than one node?

@mreiche @david.nault
Can you please help with this?

@naveensrinevas17 You don’t have to worry about any of this. If a document is written to Couchbase, the latest version of the document will eventually be published to the Kafka topic. For more details, see Delivery Guarantees | Couchbase Docs
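
For context, registering the source connector typically just means posting a small configuration to the Kafka Connect REST API. Below is a minimal sketch (not an official example): host names, credentials, bucket, and topic are placeholders, and the property names follow the connector documentation but may differ between connector versions, so verify them against your installed version.

```python
# Minimal sketch of registering the Couchbase source connector via the
# Kafka Connect REST API. All names and credentials are placeholders;
# property names may differ between connector versions.
import json
import requests

connector = {
    "name": "couchbase-events-source",  # placeholder connector name
    "config": {
        "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
        "tasks.max": "2",  # tasks split the bucket's vBuckets between them
        # Bootstrap list: the connector discovers the rest of the cluster from
        # these nodes, so more than one entry only helps the initial connection.
        "couchbase.seed.nodes": "cb-node1.example.com,cb-node2.example.com",
        "couchbase.bucket": "events",             # placeholder bucket
        "couchbase.username": "kafka-connector",  # placeholder credentials
        "couchbase.password": "secret",
        "couchbase.topic": "couchbase-events",    # placeholder Kafka topic
    },
}

# Kafka Connect's REST API listens on port 8083 by default.
resp = requests.post(
    "http://connect-worker.example.com:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```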

@david.nault
I’ve gone through those documents, but there is no mention of a DCP queue, whereas our Couchbase prod instance has DCP.
Does the connector read from the DCP queue or from a change log?
I’d also assume the connector has to be connected to a single node.

Does the connector read from the DCP queue or from a change log?

I don’t know what the precise term is. It reads from something that is similar to a change log, which is exposed over DCP.

I’d also assume the connector has to be connected to a single node.

Each connector task connects to each node in the cluster.

If you’d like to learn more about DCP, the protocol is documented in detail here: https://github.com/couchbase/kv_engine/blob/master/docs/dcp/README.md

Obligatory disclaimer: DCP is not an officially supported protocol; instead, we recommend using a connector like the Couchbase Kafka Connector (which is supported).

Is there a specific failure scenario you’re concerned about?

We need to validate a use case connecting the Kafka connector to a Couchbase bucket.
With data being written to Couchbase at x traffic, we want to bring down one node of the Couchbase cluster and validate that messages are replicated once the node is back and that the connectors can read them and publish to Kafka.

We are checking a few things:
Can Couchbase replicate messages so that they can be delivered to Kafka without any loss, and if so, what are the recommended settings to use on Couchbase?
Can the Kafka connector effectively read all messages and deliver them to Kafka without failure, given that we can’t have multiple connectors on multiple nodes?

Is this understanding correct?
A message is written to Couchbase and the Kafka connector reads it from the change log or DCP queue. If this DCP queue goes down and comes back, does the connector start scanning from the beginning of the bucket or from a specific offset? For example, with 10 messages on the DCP queue, if the queue goes down while the connector has read 4 or 5 of them, we assume that once the queue is back the messages are replicated and the connector can read the rest and deliver them to Kafka.

Best advice I can give is to test those scenarios and verify the connector meets your requirements.
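
As a rough illustration of such a test, the sketch below writes a known set of documents to Couchbase and then checks that every key eventually appears on the topic the source connector publishes to. It assumes the Python Couchbase SDK and kafka-python, and that the connector keeps the Couchbase document ID as the Kafka record key (the default); all names, credentials, and counts are placeholders.

```python
# Rough test harness: write N documents to Couchbase, fail over / restart a
# Couchbase node while the connector runs, then verify every document key
# shows up on the Kafka topic. Names, credentials, and counts are placeholders.
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from kafka import KafkaConsumer

NUM_DOCS = 10_000

# 1. Write the test documents to Couchbase.
cluster = Cluster(
    "couchbase://cb-node1.example.com",
    ClusterOptions(PasswordAuthenticator("test-user", "secret")),
)
cluster.wait_until_ready(timedelta(seconds=10))
collection = cluster.bucket("events").default_collection()

expected_keys = set()
for i in range(NUM_DOCS):
    key = f"event::{i}"
    collection.upsert(key, {"sequence": i, "payload": "test"})
    expected_keys.add(key)

# 2. Fail over or restart a Couchbase node here while the connector keeps running.

# 3. Consume the topic the source connector publishes to and check every key arrived.
consumer = KafkaConsumer(
    "couchbase-events",
    bootstrap_servers="kafka-broker.example.com:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=60_000,  # stop after 60 s with no new records
)
seen_keys = {rec.key.decode("utf-8") for rec in consumer if rec.key is not None}

missing = expected_keys - seen_keys
print(f"missing {len(missing)} of {len(expected_keys)} documents")
```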

We do have a requirement.

We have producers writing documents to Couchbase and Kafka connectors reading the Couchbase documents and delivering them to Kafka.
We would like the documents to be deleted once a message has been delivered successfully.
Can the Kafka connector delete documents, or can it update the TTL of the Couchbase bucket?

Hi @naveensrinevas17,

Thank you for starting that separate topic. I have responded with some thoughts over there.

Thanks,
David

Thanks @david.nault
Can we use the Couchbase sink connector to delete documents from Couchbase, based on some identifier indicating that the documents have been read by the Couchbase source connector?

What kind of identifier?

What I suggested in the other thread is that the sink could listen to the topic the source publishes to, and delete every document mentioned in that topic; that way, if the sink sees a message, it knows the source has processed it, because it must have seen it in order to publish it to the topic.

Note that you’d have to configure the source connector to ignore deletions, otherwise you could end up with a feedback loop. See the documentation about the DropIfNullValue SMT: Quickstart | Couchbase Docs
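
For illustration, the extra source-connector properties for ignoring deletions might look something like the fragment below; the transform class path shown is an assumption and should be verified against the Quickstart page for your connector version.

```python
# Extra properties on the *source* connector so that deletions are not
# re-published to the topic (avoiding the feedback loop mentioned above).
# The transform class path is an assumption -- verify it against the
# Quickstart page for your connector version.
source_extra_config = {
    # Drop records whose value is null (Couchbase deletions/expirations),
    # so the delete performed by the cleanup side is never echoed back.
    "transforms": "ignoreDeletes",
    "transforms.ignoreDeletes.type": "com.couchbase.connect.kafka.transform.DropIfNullValue",
}
```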

Thanks,
David


@david.nault
Not sure if I understand correctly; I was asking for ways to delete messages in Couchbase (the source), not in Kafka (the destination).
I was also asking whether there are ways or tools that can be used to update the TTL at run time.
We would like to delete documents in Couchbase after successful delivery, and not before.

Correct. And the way to delete documents in the source when the Couchbase Source Connector sends the message to Kafka is to configure a Couchbase Kafka Sink Connector which deletes the message from the original Couchbase source (which, for the purposes of this Couchbase Kafka Sink Connector, is the sink). So you’ll have two Couchbase Kafka Connectors configured: a Source, which sends the Couchbase update to Kafka, and a Sink, which receives the message sent by the Source connector and deletes the document in the Couchbase source.
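
The real setup would use the sink connector exactly as described above; purely to make the delete-on-delivery flow concrete, here is a consumer-side sketch of the same idea in Python (not the sink connector itself). It assumes the source connector uses the Couchbase document ID as the Kafka record key and that deletions are already filtered out on the source side; all host names, credentials, bucket and topic names are placeholders.

```python
# Consumer-side sketch of the delete-on-delivery idea (the sink connector
# described above would normally play this role). Assumes the source connector
# uses the Couchbase document ID as the Kafka record key, and that deletions
# are filtered out on the source side. All names are placeholders.
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import DocumentNotFoundException
from couchbase.options import ClusterOptions
from kafka import KafkaConsumer

cluster = Cluster(
    "couchbase://cb-node1.example.com",
    ClusterOptions(PasswordAuthenticator("cleanup-user", "secret")),
)
cluster.wait_until_ready(timedelta(seconds=10))
collection = cluster.bucket("events").default_collection()

# Read the topic the source connector publishes to; a record's presence here
# proves the source has already delivered that document to Kafka.
consumer = KafkaConsumer(
    "couchbase-events",
    bootstrap_servers="kafka-broker.example.com:9092",
    group_id="couchbase-cleanup",  # placeholder consumer group
    enable_auto_commit=True,
)

for record in consumer:
    if record.key is None or record.value is None:
        continue  # skip tombstones / keyless records
    doc_id = record.key.decode("utf-8")
    try:
        collection.remove(doc_id)  # delete the already-delivered document
    except DocumentNotFoundException:
        pass  # already deleted; nothing to do
```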

Thanks @david.nault, I’ll do a POC and let you know in case of any doubts.


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.