Looking for a Kafka Couchbase source connector expert to weigh in on this from the Couchbase side.
Looking for answers to the scenarios below, which came up during a Couchbase upgrade:
Couchbase was upgraded to version 7.1.1, and as part of the upgrade the hostnames and IP addresses of the cluster we connect to changed. Because we use a DNS SRV record, we were told no action was needed on the Kafka Connect side, but this was not the case: the Kafka source connector started failing, and we had to restart it to re-establish the connection to the new nodes.
We use Kafka Connect with the Couchbase source connector to stream document changes from Couchbase to ADLS Gen2 via Kafka and Databricks. After the upgrade on the Couchbase side, we restarted the Kafka source connectors to stream from the beginning; however, we observed that some document changes were not being streamed in real time even though the documents had changed on the Couchbase side.
What is the behavior of the Kafka Connect source connector during an upgrade when the cluster’s IP addresses change? What actions should we take from the source connector’s perspective? Do we need to delete and recreate our offset and status topics, delete the target topic, or would a restart be enough?
Do we need to delete and recreate the connector?
When we migrate to new IP addresses as part of a Couchbase upgrade, what happens to the vbuckets and DCP messages? Will they be carried over exactly as they were on the old cluster IPs, or will they change?
Sorry about that. If the nodes with new IP addresses are gradually rotated into the cluster, there shouldn’t be a problem… but if the addresses all change at the same time, the connector won’t know where to connect.
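A toy Python model may make the distinction concrete. It assumes (a simplification, not the connector’s actual code) that the client learns node addresses only from topology updates served by nodes it can already reach. The `survives_rotation` helper and the addresses are hypothetical.

```python
def survives_rotation(initial_nodes, live_after_each_change):
    """Toy model: can a client that only knows the last topology it saw keep up?

    initial_nodes: addresses the client knows at startup.
    live_after_each_change: list of sets, the live node addresses after each change.
    """
    known = set(initial_nodes)
    for live in live_after_each_change:
        if not known & live:
            # None of the addresses the client knows is still alive,
            # so there is nobody left to ask for the new topology.
            return False
        # A reachable node returns the current topology, so the client
        # now knows every live address, including newly added nodes.
        known = set(live)
    return True


old = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
# Rolling rotation: one node is replaced at a time.
gradual = [
    {"10.0.0.1", "10.0.0.2", "10.1.0.1"},
    {"10.0.0.1", "10.1.0.1", "10.1.0.2"},
    {"10.1.0.1", "10.1.0.2", "10.1.0.3"},
]
# Every address changes at once.
sudden = [{"10.1.0.1", "10.1.0.2", "10.1.0.3"}]

print(survives_rotation(old, gradual))  # True
print(survives_rotation(old, sudden))   # False
```

In the `sudden` case the model can only recover from an external source of fresh addresses, which is the role a DNS SRV lookup would play.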
A future version of the connector (hopefully 4.1.10) will be able to fetch a whole new set of node addresses from DNS SRV, but that feature is not yet available.
Kafka Connect doesn’t have great offset management tools (at least not in the year 2022). In my opinion, the most reliable way to restart the stream from the beginning is to rename the connector, so the Kafka Connect framework stores the source offsets in a different namespace.
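Renaming works because Kafka Connect stores each source offset in its offsets topic under a key made of the connector name and the source partition (a JSON array). A minimal Python sketch of that key shape, assuming compact JSON for readability; the partition map below is hypothetical and the Couchbase connector’s real partition fields may differ.

```python
import json


def offset_key(connector_name, source_partition):
    # Kafka Connect keys source offsets by [connector name, source partition].
    # Compact JSON is used here for illustration; the exact bytes depend on
    # the worker's internal converter settings.
    return json.dumps([connector_name, source_partition],
                      separators=(",", ":"), sort_keys=True)


# Hypothetical partition map, for illustration only.
partition = {"bucket": "travel-sample", "partition": "17"}

old_key = offset_key("cb-source", partition)
new_key = offset_key("cb-source-v2", partition)

print(old_key)             # ["cb-source",{"bucket":"travel-sample","partition":"17"}]
print(old_key == new_key)  # False: the renamed connector finds no saved offsets
```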
A restart alone should be enough. In a future version of the connector, if you’re using DNS SRV, you won’t even need to restart.
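One way to do that restart is through the Kafka Connect REST API. This Python sketch only builds the request; the worker URL and connector name are hypothetical. The `includeTasks`/`onlyFailed` query parameters were added in Apache Kafka 3.0 (KIP-745); on older workers, a plain `POST /connectors/{name}/restart` restarts only the connector instance, and tasks are restarted separately with `POST /connectors/{name}/tasks/{id}/restart`.

```python
import urllib.parse
import urllib.request


def restart_request(connect_url, connector, include_tasks=True):
    """Build a Kafka Connect REST request that restarts a connector."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{connect_url.rstrip('/')}/connectors/{name}/restart"
    if include_tasks:
        # Restart the connector and all of its tasks (Kafka 3.0+ only).
        url += "?includeTasks=true&onlyFailed=false"
    return urllib.request.Request(url, method="POST")


req = restart_request("http://localhost:8083", "cb-source")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to send it to a live worker
```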
I’m afraid I don’t know. If you want an authoritative answer, you could ask the experts in the Couchbase Server category whether Couchbase vbucket UUIDs are preserved during upgrade. The answer might depend on how the upgrade was done.
If you have the luxury of being able to restart the Kafka stream from the beginning (by stopping, renaming, and starting the connector), that’s the most reliable solution.
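The stop/rename/start sequence can be sketched against the Kafka Connect REST API: `DELETE /connectors/{old}` stops and removes the old connector, then `POST /connectors` creates it again under a new name with the same config. A minimal Python sketch, assuming `config` is what `GET /connectors/{old}/config` returned; the names and URL are hypothetical. As far as I know, deleting a connector leaves its saved offsets in the offsets topic under the old name, which is exactly why the new name starts fresh.

```python
import json
import urllib.parse
import urllib.request


def rename_requests(connect_url, old_name, new_name, config):
    """Build the DELETE + POST pair that replaces old_name with new_name."""
    base = connect_url.rstrip("/")
    delete = urllib.request.Request(
        f"{base}/connectors/{urllib.parse.quote(old_name, safe='')}",
        method="DELETE",
    )
    # Keep the "name" inside the config consistent with the new connector name.
    new_config = dict(config, name=new_name)
    create = urllib.request.Request(
        f"{base}/connectors",
        data=json.dumps({"name": new_name, "config": new_config}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return delete, create


old_config = {"name": "cb-source", "couchbase.stream.from": "SAVED_OFFSET_OR_BEGINNING"}
delete_req, create_req = rename_requests(
    "http://localhost:8083", "cb-source", "cb-source-v2", old_config)
# for r in (delete_req, create_req):
#     urllib.request.urlopen(r)  # send DELETE first, then POST
```

Sending the DELETE before the POST avoids two connectors streaming the same bucket at once.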
Thanks @david.nault. I have raised another topic with the Couchbase Server group. I still have a question about streaming from the beginning. When we configure the connector to stream from the beginning, I was hoping we wouldn’t have to rename it. Does the connector still refer to the saved offsets when streaming from the beginning? Will it not create new offset entries?
Another way to tell the connector to stream from the beginning is to set the couchbase.stream.from config property to BEGINNING.
When run with this setting, it still periodically saves source offsets. Once the offsets are saved (which might take quite a while depending on how Kafka Connect is configured – or might never happen if there are no documents in a certain vbucket) you can restart the connector with couchbase.stream.from set to SAVED_OFFSET_OR_BEGINNING. Then it should resume from where it left off. I emphasise “should” because recently I saw a case where it looked like this wasn’t happening; still investigating that.
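The two-phase approach could be driven with `PUT /connectors/{name}/config`, which replaces the whole connector config, so the full current config must be sent, not just the changed key. A minimal Python sketch that builds (but does not send) both requests; the URL, connector name, and config values are illustrative assumptions. How often source offsets are committed is governed by the worker setting `offset.flush.interval.ms` (60000 ms by default); the sketch leaves the waiting between phases to you.

```python
import json
import urllib.parse
import urllib.request


def set_stream_from(connect_url, connector, config, mode):
    """Build a PUT /connectors/{name}/config request with couchbase.stream.from = mode."""
    new_config = dict(config)  # copy so the caller's dict is not modified
    new_config["couchbase.stream.from"] = mode
    name = urllib.parse.quote(connector, safe="")
    return urllib.request.Request(
        f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=json.dumps(new_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Illustrative config; in practice use GET /connectors/{name}/config.
current = {
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "couchbase.bucket": "travel-sample",
}
phase1 = set_stream_from("http://localhost:8083", "cb-source", current, "BEGINNING")
# ...wait until source offsets have been committed for the vbuckets you care about...
phase2 = set_stream_from("http://localhost:8083", "cb-source", current,
                         "SAVED_OFFSET_OR_BEGINNING")
```

Applying each PUT normally makes the worker restart the connector with the new setting, so phase 2 doubles as the restart described above.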