Hi @Dario_Mazza,
Without knowing your data volumes, you could make an Eventing Function like the following:
- It will keep track of any change, subject to DCP deduplication (by this I mean that if things change super fast, Couchbase's underlying database change protocol may dedup multiple versions into a single item). So if you had a tight SDK loop changing a document this will not work, but if you have a few milliseconds between changes you should be good.
- The most recent change will have the same KEY in arc_bkt (a bucket binding of arc_bkt to your target bucket, say “archive”, in r+w mode). This is easy to find, since it shares the source KEY.
- All changes will also have an archive_KEY = KEY + “:” + CAS, where CAS is an incrementing number loosely based on nanoseconds since the epoch. If you want to query the history you may need a primary index on the “archive” bucket.
- An expiry of 12 hours is set to clean up the history; you can adjust this.
- The Metadata Purge Interval in the detailed bucket settings defaults to three days; you can set it lower if you want to free up space faster.
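The key scheme from the bullets above can be sketched in plain JavaScript. This is just an illustration, not part of the Function itself, and the helper names (makeArchiveKey, historyExpiry) are mine:

```javascript
// Hypothetical helpers illustrating the archive-key scheme described above.
// In the real Eventing Function, the key and CAS come from meta.id and meta.cas.

// Each history copy is stored under KEY + ":" + CAS
function makeArchiveKey(key, cas) {
    return key + ":" + cas;
}

// Each history copy gets an absolute expiry 12 hours from "now"
function historyExpiry(nowMs) {
    return new Date(nowMs + 12 * 60 * 60 * 1000);
}

console.log(makeArchiveKey("anydoc:124", "1623073710383104000"));
// anydoc:124:1623073710383104000
```

Because the CAS is monotonically increasing, the history keys for a given document sort chronologically.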
Eventing Function name: ArchiveDocHistory12Hours
// A bucket binding to an archive bucket aliased as 'arc_bkt'
// in r+w mode is required.
function OnUpdate(doc, meta) {
    var previous_key_info = arc_bkt[meta.id];
    var key_info = extractKeyDataToArchive(doc);
    if (!previous_key_info) {
        // The archive of the key information did not exist: seed it.
        // This happens on the initial deployment (Feed Boundary: Everything)
        // or when a new doc is added.
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,
            {"id": meta.id + ":" + meta.cas,
             "expiry_date": new Date(Date.now() + 12 * 60 * 60 * 1000)},
            key_info);
        log(meta.id, "had no prior key_info archived");
        // New doc was added (or initial run)
        // *** Add any needed code here ...
        return;
    }
    // Determine if the subset from extractKeyDataToArchive has changed
    var changed = false;
    if (crc64(previous_key_info) != crc64(key_info)) {
        changed = true;
        // Since the key_info changed, update the archive version
        arc_bkt[meta.id] = key_info;
        // Also keep KEY:cas (cas is a timestamp-like value) for 12 hours
        couchbase.upsert(arc_bkt,
            {"id": meta.id + ":" + meta.cas,
             "expiry_date": new Date(Date.now() + 12 * 60 * 60 * 1000)},
            key_info);
    }
    // *** Add any needed code here ...
    log(meta.id, "key_info changed:", changed,
        "current", key_info, "previous", previous_key_info);
}

function extractKeyDataToArchive(doc) {
    // *** Adjust the data you want to archive; this could be a
    // *** subset of the doc, however here we archive the entire doc
    return doc;
}
This code was loosely based on a previous forum post, Onupdate eventing function to get the old value - #2 by jon.strabala, where extractKeyDataToArchive() was used to archive only one version of a key subset of the data.
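As a sketch of that subset variant (the choice of fields here is just an assumption for illustration), extractKeyDataToArchive() could keep only the fields you care about, so the large "more" payload is never copied into the archive:

```javascript
// Hypothetical variant: archive only a small, interesting subset of the doc.
function extractKeyDataToArchive(doc) {
    return { id: doc.id, externalId: doc.externalId };
}

var doc = {
    id: 124,
    externalId: 1111,
    more: { "megabytes of uninteresting data": "just emulating ...." }
};
console.log(JSON.stringify(extractKeyDataToArchive(doc)));
// {"id":124,"externalId":1111}
```

With a subset like this, the crc64() comparison in OnUpdate only fires when one of the archived fields actually changes.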
Okay, say you have one document in the “source” bucket …
KEY: anydoc:124
{
"id": 124,
"externalId": 1111,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
When you deploy the Function you will get two items in the “archive” bucket.
KEY: anydoc:124
{
"id": 124,
"externalId": 1111,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623073710383104000
{
"id": 124,
"externalId": 1111,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
The log(…) message for the mutation from the initial deployment will be emitted to the Application log:
2021-06-07T07:02:16.606-07:00 [INFO] "anydoc:124" "had no prior key_info archived"
Decoding part of the CAS suffix on the archive key anydoc:124:1623073710383104000, i.e. the first ten digits 1623073710 (I am in Pacific Time), as a unix timestamp in seconds, we see that the archived doc was mutated on
Mon Jun 07 2021 13:48:30 GMT+0000
Mon Jun 07 2021 06:48:30 GMT-0700 (Pacific Daylight Time)
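That decode can be reproduced in plain JavaScript; here is a minimal sketch (using BigInt, since the 19-digit CAS exceeds Number's safe-integer range):

```javascript
// Decode a CAS value (roughly nanoseconds since the epoch) to a Date.
function casToDate(casString) {
    var seconds = Number(BigInt(casString) / 1000000000n);
    return new Date(seconds * 1000);
}

console.log(casToDate("1623073710383104000").toISOString());
// 2021-06-07T13:48:30.000Z
```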
Now let’s update the source doc: change externalId from 1111 to 7777 and change nothing else (in the source bucket), and of course leave the Eventing Function deployed. Now you will see three documents in the bucket “archive”.
KEY: anydoc:124
{
"id": 124,
"externalId": 7777,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623073710383104000
{
"id": 124,
"externalId": 1111,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623075290424279040
{
"id": 124,
"externalId": 7777,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
Now let’s update the source doc again: change externalId from 7777 to 8888 and change nothing else (in the source bucket), again leaving the Eventing Function deployed. Now you will see four documents in the bucket “archive”.
KEY: anydoc:124
{
"id": 124,
"externalId": 8888,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623073710383104000
{
"id": 124,
"externalId": 1111,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623075290424279040
{
"id": 124,
"externalId": 7777,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
KEY: anydoc:124:1623075564122079232
{
"id": 124,
"externalId": 8888,
"more": {
"megabytes of uninteresting data": "just emulating ...."
}
}
Best
Jon Strabala
Principal Product Manager - Server