Is there any bound to Mutation deduping

Hi @naftali,

Yes, even a few seconds apart should be enough to track your changes. Current customers already do this, understanding that if they get a burst of 100 updates in a second they will only see the most current document.
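To picture what that burst behaviour means, here is a toy model in plain JavaScript. It is only an illustration of "last update per key wins" and is not how the server implements dedup; the function name coalesce and the version field are made up for this sketch.

```javascript
// Toy model only: coalesces a burst of pending updates so each key keeps its latest value.
function coalesce(pendingUpdates) {
    const latest = new Map();
    for (const { id, doc } of pendingUpdates) {
        latest.set(id, doc); // a later update to the same id overwrites the earlier one
    }
    return latest;
}

// A burst of 100 updates to the same document id.
const burst = [];
for (let i = 1; i <= 100; i++) {
    burst.push({ id: "customer:001", doc: { version: i } });
}

const seen = coalesce(burst);
console.log(seen.size, seen.get("customer:001")); // 1 { version: 100 }
```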

Below I give some alternatives that might be applicable to your architecture, where I use Eventing (a real-time lambda) to help out.

CASE 1: Avoiding Dedup

Note that if you control how your data is written, you could always write a unique key, say basename:baseid:timestamp_millis_since_epoch:random_int; in this case dedup would never impact you.
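A minimal sketch of how a writer might build such a key, assuming a JavaScript client. makeUniqueKey is a hypothetical helper, not part of any Couchbase SDK, and the 7-digit random range is just an illustrative choice.

```javascript
// Hypothetical helper: builds <basename>:<baseid>:<timestamp_millis_since_epoch>:<random_int>
function makeUniqueKey(basename, baseid, now = Date.now()) {
    // Random integer in [1000000, 9999999) to separate writes within the same millisecond
    const randomInt = Math.floor(Math.random() * (9999999 - 1000000)) + 1000000;
    return [basename, baseid, now, randomInt].join(":");
}

const key = makeUniqueKey("customer", "001");
console.log(key); // e.g. customer:001:1638368667660:4821937 (your values will differ)
```

With this layout, basename and baseid should not themselves contain ":" since the key is later split on that character.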

Then, to ensure you can easily access the most recent document, you could use an Eventing Function to update a document basename:baseid with the latest unique basename:baseid:timestamp_millis_since_epoch:random_int. For example, say I have a basename of “customer” and the following Eventing Function deployed:

function OnUpdate(doc, meta) {
    // Given unique keys <basename>:<baseid>:<timestamp_millis_since_epoch>:<random_int>
    // we update a common key <basename>:<baseid> with the most current update
    // or mutation.
    var ary = meta.id.split(":");
    // Skip anything that is not a four-part "customer" key (including <basename>:<baseid> itself)
    if (ary.length !== 4 || ary[0] !== "customer") return;
    var basekey = ary[0] + ":" + ary[1];
    // src_col is an alias to a Bucket binding for the source bucket/collection in r+w mode
    src_col[basekey] = doc;
}

Create document in the source bucket/collection
customer:001:1638368667660:999181{"a": 1}

The basename:baseid document is created in the source bucket
customer:001{"a":1}

Create another document in the source bucket/collection
customer:001:1638368667661:999182{"b": 2}

The basename:baseid document is updated in the source bucket
customer:001{"b":2}

The final document set in the source bucket/collection

  customer:001{"b":2}
  customer:001:1638368667660:999181{"a": 1}
  customer:001:1638368667661:999182{"b": 2}

So here you have not only the most current document you want as basename:baseid, but also the history with millisecond timestamps available as basename:baseid:timestamp_millis_since_epoch:random_int.

You could improve this by using a long counter instead of random_int; then you would have correct ordering. Note that the Eventing Function would stay the same.
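A rough sketch of the counter idea on the writer side, assuming a single writer process (sharing a counter across several writers would need something like an atomic counter document, which is not shown here). makeOrderedKey and compareHistoryKeys are hypothetical names.

```javascript
// Hypothetical single-process counter; it resets on restart, so the timestamp still leads the ordering.
let counter = 0;

// Builds <basename>:<baseid>:<timestamp_millis_since_epoch>:<counter>
function makeOrderedKey(basename, baseid, now = Date.now()) {
    counter += 1;
    return [basename, baseid, now, counter].join(":");
}

// Orders history keys by timestamp, then by counter, comparing both numerically.
function compareHistoryKeys(a, b) {
    const pa = a.split(":");
    const pb = b.split(":");
    return (Number(pa[2]) - Number(pb[2])) || (Number(pa[3]) - Number(pb[3]));
}

// Two writes in the same millisecond still sort in write order.
const t = 1638368667660;
const keys = [makeOrderedKey("customer", "001", t), makeOrderedKey("customer", "001", t)];
console.log(keys.slice().reverse().sort(compareHistoryKeys));
```

The key still has four ":"-separated parts, which is why the Eventing Function does not need to change.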

CASE 2: Approximate Reverse Dedup

As you indicated, a day (or even a second or two) between updates will result in each mutation not being subject to dedup (barring some major issue like a network outage or a long rebalance).

Thus you could do what I call an “Approximate Reverse Dedup”, where you take basename:baseid and, in real time, use Eventing to create a new basename:baseid:timestamp_millis_since_epoch:random_int document. For example, say I have a basename of “customer” and the following Eventing Function deployed:

function getRandomInt(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min) + min); //The maximum is exclusive and the minimum is inclusive
}

function OnUpdate(doc, meta) {
    // ignore all approximate reverse dedup records, process only <basename>:<baseid>
    var ary = meta.id.split(":");
    if (ary.length !== 2 || ary[0] !== "customer") return;
    
    // Given a base key (for this mutation) of <basename>:<baseid> we want to store the 
    // history as <basename>:<baseid>:<timestamp_millis_since_epoch>:<random_int>
    var myMillis = Date.now();
    var myRand = getRandomInt(1000000,9999999);
    var fullkey =  meta.id + ":" + myMillis + ":" + myRand;
    // src_col is an alias to a Bucket binding for the source bucket/collection in r+w mode
    src_col[fullkey] = doc;
}

Create document in the source bucket/collection

customer:001{"a":1}

A new basename:baseid:timestamp_millis_since_epoch:random_int is created in the source bucket (note your timestamp and random_int will differ)

customer:001:1638370428288:8976100{"a":1}

Update the document in the source bucket/collection

customer:001{"b":2}

A new basename:baseid:timestamp_millis_since_epoch:random_int is created in the source bucket
and of course the original document was updated

customer:001:1638370461488:9534833{"b":2}

The final document set in the source bucket/collection

  customer:001{"b":2}
  customer:001:1638370428288:8976100{"a":1}
  customer:001:1638370461488:9534833{"b":2}

CASE 3: Keep the last N documents

If you just want access to the last N versions, you might also consider this design pattern; refer to the Eventing documentation, specifically the Scriptlet Function: Advanced Keep the Last N User Items | Couchbase Docs
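To give a feel for the idea (this is not the scriptlet from the documentation, just a plain JavaScript sketch), the pruning decision can be expressed as a small function over history keys of the form basename:baseid:timestamp_millis_since_epoch:random_int. keysToPrune is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the history keys that fall outside the newest n,
// using the timestamp (third ":"-separated part) to decide what is newest.
function keysToPrune(historyKeys, n) {
    const newestFirst = historyKeys.slice().sort((a, b) => {
        return Number(b.split(":")[2]) - Number(a.split(":")[2]);
    });
    return newestFirst.slice(n);
}

const history = [
    "customer:001:1638368667660:999181",
    "customer:001:1638368667661:999182",
    "customer:001:1638370428288:8976100"
];
console.log(keysToPrune(history, 2)); // [ 'customer:001:1638368667660:999181' ]
```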

Best

Jon Strabala
Principal Product Manager - Server
