Couchbase XDCR doubt

Let's say we have two clusters, A and B (assume millions of documents).

  1. Cluster A has index C and data D which is part of index C.
  2. Cluster B also has index C but no data D.
  3. Starting data replication using XDCR from cluster A to cluster B.
  4. It will copy data D to cluster B.

My question: will index C on cluster B be updated with data D automatically?

Guys, please reply. One of my projects is blocked on this.

As data arrives in a cluster, the index is updated automatically and asynchronously. It does not matter how the data gets into the cluster, whether via XDCR, an application, or documents created in the UI. For more information about XDCR, please see the XDCR Overview in the documentation; the section on XDCR Payloads covers this further.

This is also true for Eventing and for any other action that is triggered by data arriving in the cluster.

Thanks. I got it.

  1. Will Eventing work if I set the bucket alias to “read only”?
  2. I am getting the attached error when running an UPDATE query inside Eventing. How do I fix it?

@Nitesh_Gupta Not too sure about that error. @jon.strabala Can you help Nitesh please.

Hi @Nitesh_Gupta

What version of Couchbase are you running?

Note, this seems to indicate that you couldn’t deploy your Function at all, i.e. the failure happens before any UPDATE is executed via N1QL, so it is not a N1QL error.

In the future, can you please provide your Eventing Function (an export from the UI is preferable)? Alternatively, you can direct us to an Enterprise support ticket, i.e. an uploaded cbcollect_info.

One thing that comes immediately to mind is that Eventing will prevent deployment of a handler whose embedded N1QL writes to the Function's source bucket. I assume the Eventing Function’s source bucket is “promotions” and you have a statement in it with something like:

UPDATE `promotions` UNSET foo 

Updated this post on 12/7/2020: the statement above is wrong when the bucket holds more than one (1) item, because it would update everything. What we really want is the below:

var key = meta.id;
UPDATE `promotions` USE KEYS $key UNSET foo;

Okay, so for Eventing you most likely have something like the following (I am guessing a bit here, and I called my Function 0A_myexample):

// BAD example with inline N1QL will NEVER deploy
// The function name is "0A_myexample"
// The source bucket is "promotions"

function OnUpdate(doc, meta) {
    log('docId', meta.id);

    // skip if the field foo has already been removed 
    // IMPORTANT if you use N1QL to avoid infinite recursion.
    if (!doc.foo) return; 

    // must use a var not meta.id as a parameter to inline N1QL
    var key = meta.id;

    // try to use inline N1QL - this will fail if bucket `promotions` 
    // is the source bucket.
    UPDATE `promotions` USE KEYS $key UNSET foo;
    // it will fail to even deploy with an error as follows:
    // deploy failed: {"code":51,"info":"Function: 0A_myexample 
    //              N1QL dml to source bucket promotions"}
}

As you can see we get the expected error that you experienced:

deploy failed: {"code":51,"info":"Function: 0A_myexample N1QL dml to source bucket promotions"}

If you update the source bucket from within N1QL, you will get a recursive mutation. As such, you have to be VERY VERY careful: you could crash your server (as in the Data Nodes) if you don’t suppress recursion in your JavaScript. This is why we prevent such deployments by default.
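To make the recursion risk concrete, here is a minimal local simulation that runs under plain Node.js, not inside Eventing. Nothing in it is a Couchbase API: makeBucket(), guarded() and unguarded() are hypothetical names, and the bucket is modelled as an object that simply re-fires the handler on every write, which is roughly what a source-bucket mutation does. A safety cap of 50 writes stands in for "runs forever".

```javascript
// Local simulation only: nothing here is a Couchbase API.
// makeBucket() is a hypothetical stand-in for a source bucket that
// re-fires the handler on every write.
function makeBucket(handler, maxWrites) {
    var writes = 0;
    var store = {};
    return {
        writes: function () { return writes; },
        write: function (id, doc) {
            store[id] = JSON.parse(JSON.stringify(doc)); // keep a copy
            writes += 1;
            if (writes >= maxWrites) return;             // safety cap for the demo
            handler(store[id], { id: id }, this);        // the write triggers the handler again
        }
    };
}

// Handler with the guard: stops once the field is gone.
function guarded(doc, meta, bucket) {
    if (!doc.foo) return;
    var copy = Object.assign({}, doc);
    delete copy.foo;
    bucket.write(meta.id, copy);
}

// Handler without the guard: every write causes another write.
function unguarded(doc, meta, bucket) {
    var copy = Object.assign({}, doc);
    delete copy.foo;
    bucket.write(meta.id, copy);
}

// Seed one document and count how many writes the bucket sees.
function run(handler) {
    var bucket = makeBucket(handler, 50);
    bucket.write("promo::1", { foo: 1, bar: 2 });
    return bucket.writes();
}

var guardedWrites = run(guarded);
var unguardedWrites = run(unguarded);
console.log(guardedWrites, unguardedWrites); // prints: 2 50
```

With the guard, the loop ends after the seed write plus one cleanup write. Without it, the handler keeps writing until the demo cap stops it; on a real cluster there is no such cap.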

The good news is that there is a way to work around this. The below would work for you:

// GOOD (but NOT OPTIMAL) example with the N1QL() function call
// The function name is "0A_myexample"
// The source bucket is "promotions"
// requires an index on bucket "promotions" note I added one as follows:
// CREATE PRIMARY INDEX `#primary_promo` ON `promotions`

function OnUpdate(doc, meta) {
    log('docId', meta.id);

    // skip if the field foo has already been removed 
    // IMPORTANT if you use N1QL to avoid infinite recursion.
    if (!doc.foo) return; 

    // try to use the N1QL() function call - this will succeed since 
    // we don't perform recursion checks - but if we don't have the 
    // statement "if (!doc.foo) return" as above you will get infinite 
    // recursion - this would be very very bad.
    N1QL("UPDATE `promotions` USE KEYS '"+meta.id+"' UNSET foo");
    // The N1QL() function will succeed in deploying and work as 
    // expected - however it is not optimal as in 10X to 100X slower 
    // than KV.
}

Now, I did say the above is not optimal, as in 10X to 100X slower than KV, so let’s rewrite it in pure Eventing KV map mode and skip N1QL entirely to get a high-performance version of the above.

// BEST (OPTIMAL) example with no N1QL and no N1QL() function call
// The function name is "0A_myexample"
// The source bucket is "promotions"
// requires a bucket alias "src_bkt" in read-write mode to bucket 'promotions'

function OnUpdate(doc, meta) {
    log('docId', meta.id);

    // Optional: KV map recursion is automatically suppressed internally
    // by Eventing, but having this check is faster because it avoids
    // unneeded writes when the property has already been removed, e.g. if
    // you undeploy and subsequently redeploy "From Everything".
    if (!doc.foo) return; 
    
    // remove the field
    delete doc["foo"]; // or delete doc.foo;
    
    // write the updated doc back to the bucket promotions
    src_bkt[meta.id] = doc;
}
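If you want to sanity-check the handler logic before deploying, a sketch like the following runs the same OnUpdate body under plain Node.js. The log() function and the src_bkt object here are local stand-ins that I am assuming for testing only; in a deployed Function both are provided by Eventing (the alias via the Function's bucket binding) and must not be defined yourself. The document keys are made up.

```javascript
// Local stand-ins for what Eventing provides at runtime. These are
// assumptions for testing outside Couchbase and are NOT part of the
// deployed Function.
var logged = [];
function log() { logged.push(Array.prototype.slice.call(arguments)); }
var src_bkt = {};   // plain object standing in for the read-write alias

// Same handler body as in the post.
function OnUpdate(doc, meta) {
    log('docId', meta.id);
    if (!doc.foo) return;       // nothing to remove
    delete doc["foo"];          // remove the field
    src_bkt[meta.id] = doc;     // write the updated doc back
}

// Drive the handler with two sample documents (hypothetical keys).
OnUpdate({ foo: "x", price: 10 }, { id: "promo::1" });
OnUpdate({ price: 20 }, { id: "promo::2" });

console.log(JSON.stringify(src_bkt)); // prints: {"promo::1":{"price":10}}
```

Only the document that still had foo is written back; the second one is skipped by the early return, which is the write-avoidance described in the handler's comments.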

Hope the above helps. Note that essentially we have rewritten the example at Function: redactSharedData | Couchbase Docs. That function doesn’t remove a field, but you will get the idea.

Cheers

Jon Strabala

Thanks @jon.strabala for the detailed explanation. You are right, I was trying only the UPDATE ... UNSET query. Later I found out that I was on the wrong version: source bucket mutations are possible only from 6.5 onwards.
I have a few more doubts:

  1. You mentioned the optimal query. If I try the same optimal query on lower versions, where source bucket mutations are not supported, will this also degrade CB performance? Will it impact data and indexes already on that bucket, or on all buckets? The data size is around 55 million documents.
  2. If there is degradation, will recreating the same bucket and restoring the data and indexes help?
  3. If even recreating the bucket, data, and indexes does not help, then what will help?
  4. Regarding XDCR: I have XDCR from server A to server B, and A will copy all data to B. Now I remove that XDCR and add one more field to each document on B. Then I set up XDCR from B to A. Will all the data be copied back to A with that extra field in all those documents?

  • You mentioned the optimal query. If I try the same optimal query on lower versions, where source bucket mutations are not supported, will this also degrade CB performance? Will it impact data and indexes already on that bucket, or on all buckets? The data size is around 55 million documents.

I prefer to call the optimal version KV access so as not to confuse it with N1QL. The performance hit will be much less than with N1QL, and you will not need an index. However, writing to the source bucket via KV from Eventing is slightly less performant than writing to a non-source bucket.

I typically test with 200M and 750M documents, so I don’t see a big issue with 55M docs, although you may want to raise the number of workers to the number of cores for your first run (assuming you deploy with the feed boundary set to Everything).

Note, if I am mistaken and you want to stay with N1QL, you should consider USE KEYS, as that is much faster than a free-form WHERE clause. I apologize, I was testing with one item in my prior post, so I updated my first post in this thread to be more accurate for your 55M use case.

  • If there is degradation, will recreating the same bucket and restoring the data and indexes help?

Sorry, I’m not really an index guy, but I don’t think you will have any issues; I do this sort of thing all the time. You will need a decent quota on your bucket (at least 20% residency for the initial run, the more the better). The only thing that comes to mind, if you do a lot of updates, is the metadata purge interval under your bucket settings, which cleans out the tombstones.

  • If even recreating the bucket, data, and indexes does not help, then what will help?

It seems that you are asking about “what ifs” down multiple paths. You should just try it and see how it all works. Of course, when processing 55M docs I would comment out the log(…) messages in the Eventing Functions.

  • Regarding XDCR: I have XDCR from server A to server B, and A will copy all data to B. Now I remove that XDCR and add one more field to each document on B. Then I set up XDCR from B to A. Will all the data be copied back to A with that extra field in all those documents?

This seems fine, hopefully @shivani_g can verify that the XDCR will work as expected.

Yes, XDCR will copy all the data back from B to A since the data is updated. Updates are sent across XDCR just like new inserts. @neilhuang fyi in case you want to add something.

If the source and target buckets are using Time-based conflict resolution, and the clusters are both running NTP, then that is true.

Otherwise, you are running Revision-Number based conflict resolution:

If you are doing what you explicitly stated here, then the data will be replicated back from B to A. With one caveat:

Given a document that has been modified more times in cluster A than the same document in cluster B, it is possible that when B replicates the document to A, B's version loses conflict resolution. (A would have a higher revision ID because it was modified more often than B: most-writes-wins.) You will not see B’s document replicated to cluster A in this case.
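The caveat can be illustrated with a small local model, runnable under plain Node.js. This is a simplified sketch and not XDCR's actual implementation: the function name incomingWins and the field names revSeqno and cas are assumptions made for the illustration, the CAS tie-break direction is assumed, and any further tie-breakers are left out.

```javascript
// Simplified local model of revision-based conflict resolution.
// revSeqno stands for "how many times the document was modified";
// the field names are illustrative assumptions.
function incomingWins(incoming, existing) {
    if (incoming.revSeqno !== existing.revSeqno) {
        return incoming.revSeqno > existing.revSeqno; // more modifications wins
    }
    return incoming.cas > existing.cas;               // assumed tie-break on CAS
}

// Case 1: A was left alone while B added one field to the document.
var onA = { revSeqno: 5, cas: 100 };
var fromB = { revSeqno: 6, cas: 200 };
var case1 = incomingWins(fromB, onA);

// Case 2: A modified the same document three more times in the meantime.
var busyA = { revSeqno: 8, cas: 150 };
var case2 = incomingWins(fromB, busyA);

console.log(case1, case2); // prints: true false
```

In the first case B's document replaces A's copy. In the second, A keeps its own version and B's extra field never shows up on A for that document.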

thanks @neilhuang for the detailed explanation.