I need fast access to document keys and their CAS values, so I can tell which documents have changed. We will keep a mapping of keys to CAS. When the ETL runs, we will compare against it to see whether any CAS has changed, which tells us whether there is any work to do.
What is the best way to get this information efficiently?
I am thinking of creating an index like this:
('type', meta), to support the query select meta().id, meta().cas from default where type = x
Can I index meta()?
How would I write this index?
Is there an obvious better way to do what I’m doing?
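The key-to-CAS comparison described in the question can be sketched as a pure function, independent of how the rows are fetched. This is a minimal sketch, not a Couchbase API. `diffCasSnapshots` is a hypothetical helper, and each row is assumed to look like the output of the query above (`{ id, cas }`). CAS values are compared as strings so that 64-bit values are never rounded, because a JavaScript number cannot hold every 64-bit integer exactly.

```javascript
// Hypothetical helper (not part of any Couchbase SDK): compares the key -> CAS
// snapshot saved by the previous ETL run with the rows fetched by this run.
// Each row is assumed to look like { id: "doc-key", cas: "1600000000123456789" }.
function diffCasSnapshots(previous, rows) {
  const current = new Map();
  const changed = []; // keys that are new or whose CAS differs
  for (const row of rows) {
    // Compare as strings: a 64-bit CAS can exceed Number.MAX_SAFE_INTEGER.
    const key = String(row.id);
    const cas = String(row.cas);
    current.set(key, cas);
    if (previous.get(key) !== cas) {
      changed.push(key);
    }
  }
  // Keys present last time but missing now (deleted, or no longer matching the filter).
  const missing = [];
  for (const key of previous.keys()) {
    if (!current.has(key)) {
      missing.push(key);
    }
  }
  // `current` is the snapshot to persist for the next run.
  return { changed, missing, snapshot: current };
}

// Usage example with made-up keys and CAS values.
const lastRun = new Map([["doc::1", "100"], ["doc::2", "200"], ["doc::3", "300"]]);
const result = diffCasSnapshots(lastRun, [
  { id: "doc::1", cas: "100" }, // unchanged
  { id: "doc::2", cas: "250" }, // modified
  { id: "doc::4", cas: "400" }, // new
]);
console.log(result.changed, result.missing); // [ 'doc::2', 'doc::4' ] [ 'doc::3' ]
```

A missing key does not by itself say why the document disappeared. It may have been deleted, or its `type` may have changed so that it no longer matches the query.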
If your goal is to do some processing whenever a document changes, the Eventing service might be useful. Or possibly the Couchbase Kafka connector, if you’ve got a Kafka broker handy.
In recent versions of Couchbase, the leading characters of the CAS string (the first 13 digits) are effectively a timestamp in milliseconds since the epoch. You might use this to your advantage.
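The following plain JavaScript sketch shows how the leading digits map to a time. It is meant for Node.js, not for running inside Eventing. It assumes, as described above, that the CAS is a nanosecond-resolution timestamp since the epoch, and `casToMillis` is a hypothetical helper name. Integer division with BigInt gives the same value as taking the first 13 digits of a 19-digit CAS, but it does not depend on the digit count.

```javascript
// Hypothetical helper: approximate wall-clock time (ms since epoch) from a CAS string.
// Assumption: the CAS encodes nanoseconds since the Unix epoch.
function casToMillis(cas) {
  // BigInt keeps all 64 bits exact; integer division drops the sub-millisecond part.
  return Number(BigInt(cas) / 1000000n);
}

const exampleCas = "1600000000123456789"; // made-up CAS value
console.log(casToMillis(exampleCas)); // 1600000000123
console.log(exampleCas.substring(0, 13)); // 1600000000123 (the same digits)
console.log(new Date(casToMillis(exampleCas)).toISOString()); // 2020-09-13T12:26:40.123Z
```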
For example, if you used Eventing (or the Couchbase Kafka connector), you could ignore documents whose CAS predates your last ETL run. Suppose I only want data changed in the most recent hour. In Eventing I could set LOOKBACKSEC to 3600 and run an Eventing Function with a feed boundary of “Everything”.
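```javascript
function OnUpdate(doc, meta) {
    // Look-back window in seconds: 3600 = only documents changed in the last hour.
    // Set to 0 to process every document in the feed.
    var LOOKBACKSEC = 3600;
    if (LOOKBACKSEC > 0) {
        var cutoff_millis = Date.now() - LOOKBACKSEC * 1000;
        // The first 13 digits of the CAS string are roughly milliseconds since the epoch.
        var doc_millis = parseInt(meta.cas.substring(0, 13), 10);
        if (doc_millis < cutoff_millis) return;
    }
    // ..... your code here: the document changed in the last hour ....
}
```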
Deploy it and leave the Eventing Function “live”. It will emit everything changed within the last hour, then keep emitting current values as documents change (subject to DCP deduplication).
In the above Eventing Function, you might use the curl() function to send changed items to an external REST endpoint. See “Function: Basic cURL POST” in the Couchbase Docs for how to use the curl() call.
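Here is a minimal sketch that combines the look-back filter with curl(). It assumes a URL binding configured in the function settings with the alias etlEndpoint, which is a hypothetical name. It also assumes a hypothetical path changed on your REST service. Check the request and response fields against the Couchbase docs page mentioned above for your server version.

```javascript
function OnUpdate(doc, meta) {
    // Same look-back filter as above: skip documents changed more than an hour ago.
    var LOOKBACKSEC = 3600;
    if (LOOKBACKSEC > 0) {
        var cutoff_millis = Date.now() - LOOKBACKSEC * 1000;
        var doc_millis = parseInt(meta.cas.substring(0, 13), 10);
        if (doc_millis < cutoff_millis) return;
    }
    try {
        // Hypothetical: etlEndpoint is a URL binding alias, "changed" is a path on your service.
        var request = {
            path: "changed",
            headers: { "Content-Type": "application/json" },
            body: { id: meta.id, cas: meta.cas, doc: doc }
        };
        var response = curl("POST", etlEndpoint, request);
        if (response.status !== 200) {
            log("POST for " + meta.id + " returned status", response.status);
        }
    } catch (e) {
        // curl() throws on transport-level failures, e.g. an unreachable endpoint.
        log("POST for " + meta.id + " failed:", e);
    }
}
```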
Note that no indexes are required. However, if you have lots of changes (mutations) to emit, you will need a fast REST endpoint and a lot of workers for your Eventing Function.