Hi,
Is it possible to get all mutations on a document in a DCP stream?
It seems that even when data is not compacted for a bucket, DCP returns only the latest copy of a document.
DCP SDK version 0.19
No. DCP will only ever return the latest version of a document - primarily because this is all the Data Service stores.
Thanks @drigby.
One more related question.
Does DCP retain the latest mutation for every key indefinitely?
For example, in the first DCP run we get 10 mutations, all creates.
Then all docs are deleted.
If I do the next DCP run a month later, starting from the previous endSeqNo, do I get 10 deletes?
If it retains the latest mutations indefinitely, does the first DCP run over a billion-key data set return millions of delete mutations for older docs?
Found https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/concept-tombstone.html.
The metadata purge interval, set when creating a bucket, controls when the tombstones for deleted documents are removed.
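For reference, it looks like this can also be changed on an existing bucket through the REST bucket-settings endpoint. A minimal sketch (Java 11+), assuming the autoCompactionDefined, parallelDBAndViewCompaction and purgeInterval form parameters from the Couchbase REST docs; host, credentials and bucket name are placeholders, and depending on server version you may need to resend other bucket settings in the same request:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SetPurgeInterval {
    public static void main(String[] args) throws Exception {
        String auth = Base64.getEncoder().encodeToString(
                "Administrator:password".getBytes(StandardCharsets.UTF_8));

        // Override auto-compaction for this bucket and set the metadata purge
        // interval to 1 day (the value is in days; fractions are allowed).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:8091/pools/default/buckets/mybucket"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "autoCompactionDefined=true"
                        + "&parallelDBAndViewCompaction=false"
                        + "&purgeInterval=1"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```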
@drigby does DCP send any message to indicate that it has purged records since the last backup?
Yes, it’s part of the negotiation when a DCP stream is established - see the protocol documentation at: https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/protocol-flow.md
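With the java-dcp-client, that negotiation surfaces as a rollback control event which tells you the seqno to restart from. A rough sketch against the 0.x-era API (handler and helper signatures vary between versions, so treat the exact names as approximate):

```java
import com.couchbase.client.dcp.Client;
import com.couchbase.client.dcp.ControlEventHandler;
import com.couchbase.client.dcp.DataEventHandler;
import com.couchbase.client.dcp.StreamFrom;
import com.couchbase.client.dcp.StreamTo;
import com.couchbase.client.dcp.message.RollbackMessage;
import com.couchbase.client.dcp.transport.netty.ChannelFlowController;
import com.couchbase.client.deps.io.netty.buffer.ByteBuf;

public class RollbackAwareStream {
    public static void main(String[] args) throws Exception {
        final Client client = Client.configure()
                .hostnames("127.0.0.1")
                .bucket("mybucket")
                .build();

        client.controlEventHandler(new ControlEventHandler() {
            @Override
            public void onEvent(ChannelFlowController flowController, ByteBuf event) {
                if (RollbackMessage.is(event)) {
                    // The server rejected our start seqno (for example because
                    // the purger removed tombstones past it) and tells us where
                    // to restart from -- possibly all the way back to 0.
                    short vbid = RollbackMessage.vbucket(event);
                    long seqno = RollbackMessage.seqno(event);
                    System.out.println("Rollback vb " + vbid + " to seqno " + seqno);
                    client.rollbackAndRestartStream(vbid, seqno).subscribe();
                }
                event.release();
            }
        });

        client.dataEventHandler(new DataEventHandler() {
            @Override
            public void onEvent(ChannelFlowController flowController, ByteBuf event) {
                event.release(); // real code would process mutations/deletions here
            }
        });

        client.connect().await();
        client.initializeState(StreamFrom.BEGINNING, StreamTo.INFINITY).await();
        client.startStreaming().await();
    }
}
```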
(This is probably where I should highlight that AFAIK the Java DCP client is unsupported; and I don’t know how much of this it exposes to clients…)
Since the client missed some deletes, I am guessing that is considered a history branch and the client gets a rollback to 0.
So this seems to be a case where the client gets a rollback without any failover etc., i.e. it can happen on a single-node cluster too.
We are using https://github.com/couchbase/java-dcp-client.
Yes, I believe that’s correct. If you wait “too long” you can effectively be forced into rematerialization. That kind of rollback always has to be handled. With the supported products that use the DCP client (the Kafka connector, the Elasticsearch connector) we try to provide options for how the connector should behave when this occurs. In some cases, it’s “sit on your hands and ask for help”.
I’m curious what you’re looking to do with this @zxcvmnb. Is it possible to describe what you’re aiming for a bit more?
We use the Java SDK and DCP to fetch mutations regularly. However, some runs might fail, and the next successful run might come after a metadata purge has happened.
If we get a rollback to some previous seqno/timestamp, then we handle it.
In this case the vBucket UUID will not change, so how does the SDK figure out it has to send a rollback?
Does it check my start seqno/timestamp against the last purged seqno/timestamp?
Generally a DCP client will stash state somewhere (not in the cluster) and use that as a starting point. The SDK doesn’t determine that a rollback is needed; rather, the cluster does, based on the requested sequence number. But you do also need to manage vBucket UUID changes, where a sequence may go back to an earlier point.
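A sketch of that stash-and-resume pattern, assuming the 0.x client’s sessionState() export and recoverOrInitializeState() (the file-based stash and names here are just an example; verify the exact API against your client version):

```java
import com.couchbase.client.dcp.Client;
import com.couchbase.client.dcp.ControlEventHandler;
import com.couchbase.client.dcp.DataEventHandler;
import com.couchbase.client.dcp.StreamFrom;
import com.couchbase.client.dcp.StreamTo;
import com.couchbase.client.dcp.state.StateFormat;
import com.couchbase.client.dcp.transport.netty.ChannelFlowController;
import com.couchbase.client.deps.io.netty.buffer.ByteBuf;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResumableStream {
    // Hypothetical local stash for the session state; any durable store works.
    private static final Path STATE_FILE = Paths.get("dcp-session-state.json");

    public static void main(String[] args) throws Exception {
        final Client client = Client.configure()
                .hostnames("127.0.0.1")
                .bucket("mybucket")
                .build();

        // No-op handlers, just so the sketch is complete.
        client.controlEventHandler(new ControlEventHandler() {
            @Override
            public void onEvent(ChannelFlowController fc, ByteBuf event) {
                event.release();
            }
        });
        client.dataEventHandler(new DataEventHandler() {
            @Override
            public void onEvent(ChannelFlowController fc, ByteBuf event) {
                event.release();
            }
        });

        client.connect().await();

        if (Files.exists(STATE_FILE)) {
            // Resume: the exported state carries, per vBucket, the failover log
            // (vBucket UUIDs) and the last seen seqno. The cluster compares
            // those against its own history and replies with a rollback event
            // if they have diverged -- the client never decides this itself.
            byte[] persisted = Files.readAllBytes(STATE_FILE);
            client.recoverOrInitializeState(StateFormat.JSON, persisted,
                    StreamFrom.BEGINNING, StreamTo.INFINITY).await();
        } else {
            client.initializeState(StreamFrom.BEGINNING, StreamTo.INFINITY).await();
        }

        client.startStreaming().await();

        // ... consume events; periodically checkpoint the session state:
        Files.write(STATE_FILE, client.sessionState().export(StateFormat.JSON));

        client.disconnect().await();
    }
}
```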
Of course, as mentioned, this is not officially supported, but it is Open Source so you have the power yourself.
Hi, I found info in the DCP docs indicating we might get multiple versions of a key. See https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/concepts.md.
In the Deduplication section:
However, when multiple disk snapshots are merged logically into a single DCP backfill snapshot deduplication is not done.
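If I read that correctly, a consumer can receive the same key more than once inside a single backfill snapshot, so a handler has to tolerate duplicates. A rough sketch of how one might observe this with the 0.x message helpers (method names from memory, so treat them as approximate):

```java
import com.couchbase.client.dcp.Client;
import com.couchbase.client.dcp.ControlEventHandler;
import com.couchbase.client.dcp.DataEventHandler;
import com.couchbase.client.dcp.message.DcpMutationMessage;
import com.couchbase.client.dcp.message.DcpSnapshotMarkerRequest;
import com.couchbase.client.dcp.message.MessageUtil;
import com.couchbase.client.dcp.transport.netty.ChannelFlowController;
import com.couchbase.client.deps.io.netty.buffer.ByteBuf;
import com.couchbase.client.deps.io.netty.util.CharsetUtil;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotDuplicateProbe {
    public static void registerHandlers(final Client client) {
        // Keys seen in the current snapshot, per vBucket (markers are per partition).
        final Map<Short, Set<String>> seen = new ConcurrentHashMap<>();

        client.controlEventHandler(new ControlEventHandler() {
            @Override
            public void onEvent(ChannelFlowController flowController, ByteBuf event) {
                if (DcpSnapshotMarkerRequest.is(event)) {
                    // A new snapshot begins for this vBucket; reset its key set.
                    seen.put(MessageUtil.getVbucket(event), ConcurrentHashMap.<String>newKeySet());
                    flowController.ack(event);
                }
                event.release();
            }
        });

        client.dataEventHandler(new DataEventHandler() {
            @Override
            public void onEvent(ChannelFlowController flowController, ByteBuf event) {
                if (DcpMutationMessage.is(event)) {
                    short vbid = DcpMutationMessage.partition(event);
                    String key = DcpMutationMessage.key(event).toString(CharsetUtil.UTF_8);
                    Set<String> keys = seen.computeIfAbsent(vbid, k -> ConcurrentHashMap.<String>newKeySet());
                    if (!keys.add(key)) {
                        // Two versions of one key within a single backfill
                        // snapshot -- possible when disk snapshots were merged.
                        System.out.println("Duplicate key in snapshot: " + key);
                    }
                }
                event.release();
            }
        });
    }
}
```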
This seems to contradict the earlier answer. Am I misunderstanding something here?
@drigby?