Log Document Mutation history

Hi All,
I’d like to know the best approach to satisfy the following need: log all the document mutations that occur in a bucket, so that we can look at a document’s modification history, i.e. every version of the document data over time.

I am thinking about the Eventing service, so as not to impact our software (.NET SDK), but maybe it is not the correct approach.

Thanks,
Regards

Dario

Hi @Dario_Mazza,

Without knowing your data volumes you could make an Eventing Function like the following:

  • It will keep track of every change, subject to DCP deduplication. By this I mean that if things change very fast, Couchbase’s underlying Database Change Protocol (DCP) may dedup multiple versions into a single item. So if you had a tight SDK loop changing a document this will not work, but if you have a few milliseconds between changes you should be good.
  • The most recent change will have the same KEY in arc_bkt (the bucket binding alias arc_bkt points at your target bucket, say “archive”, in r+w mode). This is easy to find, as it has the same KEY as the source document.
  • All changes will also be stored under archive_KEY = KEY + “:” + CAS, where CAS is an incrementing, timestamp-like number (its first ten digits are seconds since the epoch). If you want to find the history you may need a primary index on the “archive” bucket.
  • An expiry of 12 hours is set to clean up the history; you can adjust this.
  • The Metadata Purge Interval in the detailed bucket settings is three days by default; you can set this lower if you want to free up space faster.
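
As a small illustration of the adjustable expiry mentioned above, here is a minimal sketch of a helper that turns a retention period in hours into the Date object used for expiry_date. The name expiryAfterHours and its second parameter are hypothetical (they are not part of the Function in this post); the arithmetic is the same as the inline expression the Function uses.

```javascript
// Hypothetical helper: compute an expiry Date a given number of hours from 'now'.
// 'now' is in milliseconds since the epoch and defaults to the current time.
function expiryAfterHours(hours, now = Date.now()) {
    return new Date(now + hours * 60 * 60 * 1000);
}

// Example with a fixed 'now' of 0 so the result is predictable
console.log(expiryAfterHours(12, 0).toISOString()); // 1970-01-01T12:00:00.000Z
```

Changing the retention period would then mean changing the single number passed to the helper, for example 48 instead of 12.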

Eventing Function name: ArchiveDocHistory12Hours

// A bucket binding to an archive bucket aliased as 'arc_bkt'
// in r+w mode is required. 

function OnUpdate(doc, meta) {
    var previous_key_info = arc_bkt[meta.id];
    var key_info = extractKeyDataToArchive(doc);
    if (!previous_key_info) {
        // No archived copy existed for this key, so seed the archive.
        // On first deployment with feed boundary "Everything" this
        // fills in every existing document.
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,{"id": meta.id + ":" + meta.cas, expiry_date: new Date(Date.now() + 12 * 60 * 60 * 1000)},key_info);
        log(meta.id,"had no prior key_info archived");
        // new doc was added (or initial run) 
        // *** Add any needed code here ...
        return;
    }
    // Determine whether the data returned by extractKeyDataToArchive has changed
    var changed = false;
    if (crc64(previous_key_info) != crc64(key_info)) {
        changed = true;
        // since the key_info changed, update the archived version
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,{"id": meta.id + ":" + meta.cas, expiry_date: new Date(Date.now() + 12 * 60 * 60 * 1000)},key_info);
    }
    // *** Add any needed code here ...
    log(meta.id,"key_info changed:",changed,
        "current", key_info, "previous",previous_key_info);
}

function extractKeyDataToArchive (doc) {
    // *** Adjust the data you want to archive; this could be a
    // *** subset of the doc, however here we archive the entire doc
    return doc;
}

This code was loosely based on a previous forum post, “Onupdate eventing function to get the old value” (#2 by jon.strabala), where extractKeyDataToArchive() was used to archive only one version of a key subset of the data.
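
If you only care about part of the document, a subset variant of extractKeyDataToArchive() might look like the sketch below. The choice of fields (id and externalId) is an assumption taken from the sample document used in this post; adjust it to your own data.

```javascript
// Variant of extractKeyDataToArchive that archives only selected fields.
// The field names are assumptions based on the sample document.
function extractKeyDataToArchive(doc) {
    return {
        id: doc.id,
        externalId: doc.externalId
    };
}

// Quick check outside Eventing (plain Node.js)
const sample = { id: 124, externalId: 1111, more: { big: "..." } };
console.log(JSON.stringify(extractKeyDataToArchive(sample)));
// {"id":124,"externalId":1111}
```

With this variant, a mutation that touches only the “more” field should leave the checksum comparison unchanged, so no new history entry would be written for it.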

Okay say you have one document in the “source” bucket …

KEY: anydoc:124
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

When you deploy the Function you will get two items in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

The log(…) message for the mutation from the initial deployment will be emitted to the Application log:

2021-06-07T07:02:16.606-07:00 [INFO] "anydoc:124" "had no prior key_info archived"

Decoding the first ten digits of the CAS suffix on the archive key anydoc:124:1623073710383104000, i.e. 1623073710, as a Unix timestamp in seconds, we see that the archived doc was mutated at (I am in Pacific Time):

Mon Jun 07 2021 13:48:30 GMT+0000
Mon Jun 07 2021 06:48:30 GMT-0700 (Pacific Daylight Time)
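
The decoding above can be sketched in plain JavaScript (run in Node.js, outside Eventing). The function name casSuffixToDate is hypothetical, and the sketch assumes, as the example key suggests, that the CAS suffix can be read as nanoseconds since the epoch. BigInt is used because the value is too large for a regular JavaScript number to hold exactly.

```javascript
// Hypothetical helper: turn the CAS suffix of an archive key into a Date.
// Assumes the suffix is a nanoseconds-since-epoch style value.
function casSuffixToDate(archiveKey) {
    // Everything after the last ":" is the CAS
    const cas = archiveKey.slice(archiveKey.lastIndexOf(":") + 1);
    // Nanoseconds -> milliseconds, using BigInt to avoid precision loss
    const millis = Number(BigInt(cas) / 1000000n);
    return new Date(millis);
}

console.log(casSuffixToDate("anydoc:124:1623073710383104000").toISOString());
// 2021-06-07T13:48:30.383Z
```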

Now let’s update the source doc: change externalId from 1111 to 7777 and nothing else (in the source bucket), and of course leave the Eventing Function deployed. You will now see three documents in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075290424279040
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}

Now let’s update the source doc again: change externalId from 7777 to 8888 and nothing else (in the source bucket), and of course leave the Eventing Function deployed. You will now see four documents in the “archive” bucket.

KEY: anydoc:124
{
  "id": 124,
  "externalId": 8888,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623073710383104000
{
  "id": 124,
  "externalId": 1111,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075290424279040
{
  "id": 124,
  "externalId": 7777,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
KEY: anydoc:124:1623075564122079232
{
  "id": 124,
  "externalId": 8888,
  "more": {
    "megabytes of uninteresting data": "just emulating ...."
  }
}
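
The walkthrough above can be reproduced outside Couchbase with an in-memory stand-in. This is only a rough sketch: the Map named archive and the function onUpdateSim are hypothetical, there is no expiry, and a JSON string comparison is used in place of crc64(). It is meant to show how the latest copy and the KEY:CAS history entries accumulate, not how the Eventing runtime behaves.

```javascript
// In-memory stand-in for the 'arc_bkt' binding (hypothetical, no expiry).
const archive = new Map();

// Simplified version of the OnUpdate logic: returns true if a new version was archived.
function onUpdateSim(doc, meta) {
    const current = JSON.stringify(doc);
    const previous = archive.get(meta.id);
    if (previous === current) return false;          // nothing changed
    archive.set(meta.id, current);                   // latest version under the original KEY
    archive.set(meta.id + ":" + meta.cas, current);  // history entry under KEY:CAS
    return true;
}

const base = { id: 124, externalId: 1111 };
onUpdateSim(base, { id: "anydoc:124", cas: "1623073710383104000" });
onUpdateSim({ ...base, externalId: 7777 }, { id: "anydoc:124", cas: "1623075290424279040" });
onUpdateSim({ ...base, externalId: 8888 }, { id: "anydoc:124", cas: "1623075564122079232" });

console.log(archive.size);                                        // 4
console.log(JSON.parse(archive.get("anydoc:124")).externalId);    // 8888
```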

Best

Jon Strabala
Principal Product Manager - Server