Currently I am using the DCP Client as a streaming processor in NiFi and storing the SessionState in a JSON file every x seconds. My problem now is that I can’t store the SessionState after each Mutation or Deletion and I also cannot rollback the Mutations that I sent out, as the receiving processor is not a Couchbase Server.
So how do I persist the DCP Streaming Client, so when I restart the processor there is no lost data that was never streamed, but also no data streamed twice?
Basically, how do I keep track which documents were streamed and which not? Would a Kafka NiFi processor be a solution?
Exactly-once delivery is a hard problem. The Couchbase Kafka connector currently does not guarantee “exactly once” delivery due to a limitation of the Kafka Connect framework. Since you’re already consuming DCP streams directly*, perhaps you could experiment with Idempotent Kafka Producers, but that has limitations as well.
The best advice I can offer would be to architect the system in such as way that “at least once” is a sufficient guarantee. Which, to be fair, can also be a hard problem.
* Obligatory disclaimer: DCP is not officially supported.