My question is regarding the ordering of messages flowing from Couchbase to a Kafka topic. In my opinion this is not well documented, hence this question.
Use case
The use case is to capture the changes for every document and obtain the latest change for each one. Up to the point where the messages reach the Kafka topic, the flow is real-time; the consumer of the Kafka topic, however, may be a batch application. This means I may end up receiving multiple events for the same document in a given batch.
My understanding
- A Couchbase bucket has multiple vBuckets.
- When a document is inserted into the bucket, it is assigned to one of the vBuckets based on a hash of the document key. This means that if a document with the same key is updated, the update goes to the same vBucket.
- Couchbase streams the change events over DCP (Database Change Protocol) for other applications, such as the Kafka source connector, to consume.
- Couchbase guarantees the ordering of events per vBucket; cluster-wide ordering is not guaranteed. This means that if a document with the same key is updated multiple times, the ordering of those events is guaranteed (they all belong to one vBucket).
- When the Kafka source connector reads the DCP events, it reads them in the same order in which they arrived on the DCP streams.
- Ordering within a vBucket is therefore guaranteed. So far so good.
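To make my mental model of the key-to-vBucket mapping concrete, here is a minimal sketch. The exact hash Couchbase uses is an implementation detail of the client/server (the bit manipulation below is an assumption on my part); the only property I rely on is that the mapping is deterministic, so every mutation of the same key lands in the same vBucket:

```python
import zlib

NUM_VBUCKETS = 1024  # the usual default vBucket count per bucket

def vbucket_for_key(key: bytes) -> int:
    # Sketch only: a CRC32-style hash of the document key, reduced to
    # a vBucket id. The real formula may differ, but it is deterministic,
    # which is what guarantees per-document ordering within one vBucket.
    return (zlib.crc32(key) >> 16) & (NUM_VBUCKETS - 1)

# Repeated updates to one document always hit the same vBucket stream:
assert vbucket_for_key(b"user::42") == vbucket_for_key(b"user::42")
```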
Question
- When the Kafka source connector publishes the messages to the Kafka topic, does it maintain the same ordering?
- How does the source connector decide the Kafka partition for each message? (assuming the topic has more than one partition)
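For context on the second question, my assumption is that if the connector uses the document ID as the Kafka record key, then Kafka's default key-based partitioning would keep all events for one document in one partition, preserving per-document order. A minimal illustration of that property (Kafka's DefaultPartitioner actually uses a murmur2 hash; any deterministic hash shows the idea):

```python
import hashlib

def partition_for_key(key: bytes, num_partitions: int) -> int:
    # Sketch only, not Kafka's real murmur2-based partitioner.
    # The property that matters: the same key always maps to the same
    # partition, so ordering per key survives within that partition.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All mutations of document "user::42" would land in one partition:
p1 = partition_for_key(b"user::42", 12)
p2 = partition_for_key(b"user::42", 12)
assert p1 == p2
```

Whether the connector actually keys records by document ID (and therefore gets this behaviour) is exactly what I am asking.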