I have a user U with access to multiple channels, including channel C, and a document D (with no conflicts) that has been added to C. Everything appears to be working: I establish a continuous replication, and user U sees D on their mobile device.
Document D is then moved server-side to channel X (it is taken from channel C and added to channel X). User U does not subscribe to channel X. There have been NO changes to the channels that user U subscribes to in this exercise. The document was moved from a channel that user U could access to a channel that U never had access to.
Is it safe to expect that 1) an existing continuous replication will be told that D has been removed from C (the mobile device would get a tombstone synced), and that 2) a fresh continuous replication against a clean database should never see D at all (not even a tombstone - it should be a fresh sync of active revisions only)? Would revision history settings have an impact on the observed behavior?
I'd like to make sure our assumptions are correct here as we try to chase down some customer issues.
DanP
PS: Couchbase Server 4.1.1 with Sync Gateway 1.3.1.
PPS: The following can also be assumed for the questions above. If you pull the _changes feed with active_only=true while logged in as U, you will not find D. If you pull the _changes feed with active_only=false, the results contain a single row indicating that D was removed from C. Finally, hitting _all_docs with channels=true and a key of "D" returns the correct channel, X, for document D.
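For anyone reproducing these checks, here is a minimal Python sketch that only builds the request URLs described above (it sends nothing). The host, public port 4984, and database name `mydb` are placeholder assumptions, and passing the doc ID through the `keys` query parameter as a JSON array is an assumption about how "a key of D" was supplied. The requests would need to be sent authenticated as U for the per-user _changes results to apply.

```python
import json
from urllib.parse import urlencode

# Placeholder values (assumptions): substitute your own Sync Gateway host and database.
SG_PUBLIC = "http://localhost:4984"
DB = "mydb"


def changes_url(active_only: bool) -> str:
    """Build a one-shot (feed=normal) _changes URL matching the queries in this thread."""
    params = [("style", "all_docs")]
    if active_only:
        # Omitted entirely when False, since false is the default.
        params.append(("active_only", "true"))
    params += [("include_docs", "false"), ("feed", "normal")]
    return f"{SG_PUBLIC}/{DB}/_changes?{urlencode(params)}"


def all_docs_channels_url(doc_id: str) -> str:
    """Build an _all_docs URL asking for the channel membership of one doc.

    Assumption: the doc ID is passed via a JSON-array `keys` query parameter.
    """
    params = [("channels", "true"), ("keys", json.dumps([doc_id]))]
    return f"{SG_PUBLIC}/{DB}/_all_docs?{urlencode(params)}"


if __name__ == "__main__":
    print(changes_url(active_only=True))
    print(changes_url(active_only=False))
    print(all_docs_channels_url("D"))
```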
The expectations described are generally correct, but vary a bit in the details:
an existing continuous replication will be told D has been removed from C (the mobile device would get a tombstone synced)
An existing continuous replication will be sent a removal notification for document D. This isn't the same thing as a tombstone: the _changes entry will include a "removed" property with a list of channels, and attempting to retrieve the document with a subsequent _bulk_get request will return a stub document with "_removed":true.
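A minimal sketch of how a client could tell these cases apart when reading a _changes response. The row shapes in the docstring are assumptions based on the description above (a "removed" list of channels for a removal, "deleted": true for a tombstone), not a complete specification of the feed.

```python
from typing import Any, Dict


def classify_change(entry: Dict[str, Any]) -> str:
    """Classify one row of a _changes response.

    Assumed row shapes (illustrative, not exhaustive):
      removal:   {"id": "D", "removed": ["C"], "changes": [{"rev": "..."}]}
      tombstone: {"id": "D", "deleted": True, "changes": [{"rev": "..."}]}
    """
    if entry.get("removed"):
        # Access change: the user lost every channel granting this doc.
        # The document's revision tree itself is not affected.
        return "removed"
    if entry.get("deleted"):
        # A real deletion: the current revision is a tombstone.
        return "deleted"
    return "active"


if __name__ == "__main__":
    print(classify_change({"id": "D", "removed": ["C"], "changes": [{"rev": "2-a"}]}))
```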
a fresh continuous replication against a clean database should never see D at all (not even a tombstone - it should be a fresh sync of active revisions only)?
A fresh _changes request without active_only set to true will still be sent the removal notification. I believe a completely fresh Couchbase Lite replication begins with a one-shot active_only=true request, but I don't know how long active_only is maintained (particularly in its interaction with limit handling). If you're using Couchbase Lite, information on the build and platform would probably help get an answer to that question.
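To make the open question concrete, here is a hypothetical sketch of a two-phase pull: a one-shot active_only=true catch-up followed by a continuous feed from the checkpoint. It models the behavior described above only as stated (a belief, not confirmed Couchbase Lite internals); the function name is illustrative, and limit paging is deliberately left out because that interaction is exactly what is unknown.

```python
from typing import Dict, Optional


def next_changes_params(last_seq: Optional[str]) -> Dict[str, str]:
    """Hypothetical query parameters for the next _changes request.

    Models the *described* behavior only; not Couchbase Lite source code.
    """
    if last_seq is None:
        # Initial catch-up: active_only=true skips removals and tombstones
        # for documents this client has never seen.
        return {"feed": "normal", "style": "all_docs", "active_only": "true"}
    # After the checkpoint, active_only is dropped, so any removal that
    # happens later (like D moving from C to X) is delivered to the client.
    return {"feed": "continuous", "style": "all_docs", "since": last_seq}


if __name__ == "__main__":
    print(next_changes_params(None))
    print(next_changes_params("1234"))
```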
Is it safe to expect that 1) an existing continuous replication will be told D has been removed from C (the mobile device would get a tombstone synced), and that
Yes, it should receive an entry on the changes feed for D's doc ID with the _removed flag set, which tells Couchbase Lite that the user has lost access to the document. It's not considered a tombstone document, since access control is separate from the revision tree.
a fresh continuous replication against a clean database should never see D at all (not even a tombstone - it should be a fresh sync of active revisions only)?
Correct: a fresh replication as user U should never show D as an active doc, since it has been removed from every channel that user can access.
Would revision history settings have an impact on observed behavior?
Can you be more specific? Do you mean the Sync Gateway revs_limit config setting?
Thank you to both @traun and @adamf for the speedy replies.
We move tickets through aging channels in order to minimize the document load on mobile devices. There have been complaints about high sync counts, and as document counts have climbed, we're looking harder at ways to mitigate the problem.
As an example of things driving these questions:
A user logs in "fresh" on a clean database, and we watch the progress count climb past 54,000 synced items before the replication becomes "ready". This is using a continuous replication on Couchbase Lite for Android 1.4.x.
When the sync is complete, we take the ID of a document we know has changed channels and is marked removed (it's a GUID, but call it D) and call getDocument on it. We get back a response indicating that it is removed (I now understand that this is not a tombstone, i.e. not a deletion marker on a branch of the document's revision tree). Nonetheless, we are not getting back NULL or NOT FOUND. Something comes back, so it appears that something for that ID must have been transferred during the sync and persisted.
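A small sketch of the three outcomes being distinguished here. It assumes the local lookup yields either nothing (None) or the document's properties as a dict, and that a pulled removal is persisted with the "_removed": true stub body mentioned earlier in the thread. The function is hypothetical, not a Couchbase Lite API.

```python
from typing import Any, Dict, Optional


def local_doc_state(props: Optional[Dict[str, Any]]) -> str:
    """Interpret a local lookup result (hypothetical helper).

    Assumption: a synced removal is stored with the stub body {"_removed": true}.
    """
    if props is None:
        # Nothing was ever transferred or persisted for this ID.
        return "not found"
    if props.get("_removed") is True:
        # Something was synced and stored: the removal stub, not a tombstone.
        return "removed stub"
    if props.get("_deleted") is True:
        # A genuine deletion (tombstone revision).
        return "deleted"
    return "live"


if __name__ == "__main__":
    print(local_doc_state({"_id": "D", "_rev": "2-x", "_removed": True}))
```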
As that same user, hit Sync Gateway directly to pull _changes with the query params "style=all_docs&active_only=true&include_docs=false&feed=normal", and you get back a JSON response with 16,482 rows, 3,198 of which are tagged "removed". The document ID mentioned above appears nowhere in this pull; it is not one of the 16,482 rows.
As that same user, pull _changes with the query params "style=all_docs&include_docs=false&feed=normal", and you get back 54,482 rows, 40,938 of which are tagged "removed" and 268 of which are tagged "deleted". In this pull, one of the 40,938 rows tagged removed is the document ID from above.
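Counts like these can be reproduced from a saved response with a short sketch along these lines. It assumes the usual {"results": [...], "last_seq": ...} envelope of a feed=normal request, with removal rows carrying a "removed" list and tombstone rows carrying "deleted": true.

```python
import json
from typing import Any, Dict


def tally_changes(response_text: str, doc_id: str) -> Dict[str, Any]:
    """Count rows in a saved one-shot (feed=normal) _changes response.

    Assumes removal rows carry a non-empty "removed" list and tombstone rows
    carry "deleted": true; also reports whether doc_id appears at all.
    """
    results = json.loads(response_text)["results"]
    return {
        "rows": len(results),
        "removed": sum(1 for r in results if r.get("removed")),
        "deleted": sum(1 for r in results if r.get("deleted")),
        "contains_doc": any(r.get("id") == doc_id for r in results),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "results": [
            {"seq": 1, "id": "A", "changes": [{"rev": "1-a"}]},
            {"seq": 2, "id": "D", "removed": ["C"], "changes": [{"rev": "2-d"}]},
            {"seq": 3, "id": "E", "deleted": True, "changes": [{"rev": "2-e"}]},
        ],
        "last_seq": "3",
    })
    print(tally_changes(sample, "D"))
```

Running it on each of the two saved responses (with and without active_only=true) puts the row, removed, and deleted counts side by side, along with whether D appears.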
Based on the above, we concluded that when we move a doc from one channel to another, something is still taking time to transfer (a lot of time appears to be spent on roughly 40K removed items that shouldn't sync), and possibly taking up space (given that the removed document appears to have been persisted locally). If the initial sync used active_only=true, it seems unlikely that anything about the removed doc could have been persisted, because it would not have been part of the response.
We're really just trying to understand this so that we can minimize the time the initial sync takes once a customer's data has matured and accumulated a lot of docs (some customers are approaching 200,000 removed docs). I'd hate to have users wade through those, or have precious mobile storage taken up by them.
@traun, regarding the revs_limit question: I asked because I wondered whether those 40K+ extra revisions might come from pulling 2-3 revisions for each of the 16,482 docs, rather than the 54K count during sync being directly related to what we saw with a direct active_only=false pull.
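One way to test the revisions-per-document theory on a saved style=all_docs response is to compare the row count with the number of distinct IDs and the total number of entries across the rows' "changes" arrays. As far as I understand, that array lists a document's leaf revisions (more than one only when there are conflicts), not its history, and revs_limit bounds how much history is retained rather than how many rows the feed returns; treat that as an assumption to verify against the Sync Gateway docs.

```python
import json
from typing import Dict


def rows_vs_revisions(response_text: str) -> Dict[str, int]:
    """Compare rows, distinct doc IDs, and leaf-revision entries in a saved
    style=all_docs _changes response.

    If several revisions per doc were inflating the count, it would show up as
    leaf_revs or rows exceeding unique_docs.
    """
    results = json.loads(response_text)["results"]
    return {
        "rows": len(results),
        "unique_docs": len({r["id"] for r in results}),
        "leaf_revs": sum(len(r.get("changes", [])) for r in results),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "results": [
            {"seq": 1, "id": "A", "changes": [{"rev": "3-a"}]},
            # A conflicted doc: two leaf revisions in one row.
            {"seq": 2, "id": "B", "changes": [{"rev": "2-b"}, {"rev": "2-c"}]},
        ],
        "last_seq": "2",
    })
    print(rows_vs_revisions(sample))
```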