Couchbase Lite Replicator Not Syncing after write

Hi

I am using Couchbase Lite 3.0.0 on Android and Sync Gateway version 3.
I am hoping someone can shed light on this issue. For some users, if a document is written to the database before the replicator is started the first time the database is created, no documents are ever pulled during replication, even after the application is restarted. The issue is persistent for those users: once it has occurred, it occurs every time.
Something similar seems to happen when a sync is interrupted for these users: the replicator never seems to fully sync that user's documents afterwards.

There are no errors in the device logs or sync gateway logs and I was wondering if anyone can shed light on what could be causing this.

I have also created an example project that shows this behavior here: https://github.com/meirrosendorff/couchbaseExample

The example project has two sets of credentials in it: one that works when a document is created before syncing and one that doesn’t. The app must be deleted and reinstalled between tests because the issue only occurs the first time the database is created.

Below are the device logs as well as the Sync Gateway logs.

Any assistance would be greatly appreciated

Server Logs: log_files_18_July_2022 2.zip (4.2 MB)

Device Logs: deviceLogs.txt.zip (4.9 KB)

Hey @meirrosendorff : This appears to be an Android device, right?

I have a spider sense feeling that the answer is going to lie in the sync gateway configuration, so could you post your sync gateway configuration JSON file as well (along with any command line options you have used?)

Yes, correct, it's Android. The issue does not seem to be OS-specific, though; it happens on all operating systems I have tested.

We are using the latest version of Sync Gateway, so there is no longer a single configuration file; the configuration is held in memory. Here is the current config:

{
    "bucket": "link.v3",
    "name": "link",
    "sync": "function(doc, oldDoc) {\n          if (doc._deleted) { return }\n          if (!doc.meta || !doc.meta.sync) {\n            throw ({\n              forbidden: \"Missing required properties [meta|sync]\"\n            })\n          }\n          if (oldDoc && oldDoc.meta.sync.toString() !== doc.meta.sync.toString() ) {\n            throw ({\n              forbidden: \"Channel cannot be changed\"\n            })\n          }\n            requireAccess(doc.meta.sync)\n            channel(doc.meta.sync)\n        }",
    "import_docs": true,
    "enable_shared_bucket_access": true,
    "allow_conflicts": false,
    "num_index_replicas": 1
}
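
For readers who find the escaped string hard to follow, here is the same sync function unescaped and wrapped in a small test harness. This is only a sketch: `makeHarness`, the stubbed `requireAccess` and `channel`, and the channel names `user-a` and `user-b` are hypothetical stand-ins for illustration, not Sync Gateway's real implementation (for example, the stub ignores wildcard channels and roles).

```javascript
// Hypothetical harness: stubs the helpers that Sync Gateway normally injects.
function makeHarness(userChannels) {
  const assigned = [];

  // Stub: pass if the user has access to at least one listed channel.
  const requireAccess = (ch) => {
    const wanted = Array.isArray(ch) ? ch : [ch];
    if (!wanted.some((c) => userChannels.includes(c))) {
      throw { forbidden: "missing channel access" };
    }
  };

  // Stub: record the channels the document is routed to.
  const channel = (ch) => {
    assigned.push(...(Array.isArray(ch) ? ch : [ch]));
  };

  // The sync function from the config above, unescaped and reindented.
  function sync(doc, oldDoc) {
    if (doc._deleted) { return; }
    if (!doc.meta || !doc.meta.sync) {
      throw { forbidden: "Missing required properties [meta|sync]" };
    }
    if (oldDoc && oldDoc.meta.sync.toString() !== doc.meta.sync.toString()) {
      throw { forbidden: "Channel cannot be changed" };
    }
    requireAccess(doc.meta.sync);
    channel(doc.meta.sync);
  }

  // Run the sync function and report whether the write would be accepted.
  return (doc, oldDoc) => {
    assigned.length = 0;
    try {
      sync(doc, oldDoc);
      return { ok: true, channels: [...assigned] };
    } catch (e) {
      return { ok: false, reason: e.forbidden };
    }
  };
}

const run = makeHarness(["user-a"]);
console.log(run({ meta: { sync: "user-a" } }));   // accepted, routed to user-a
console.log(run({ type: "note" }));               // rejected: no meta.sync
console.log(run({ meta: { sync: "user-b" } }));   // rejected: no channel access
console.log(run({ meta: { sync: "user-b" } }, { meta: { sync: "user-a" } })); // rejected: channel changed
```

The point of the sketch is that a document written locally before the first sync is only accepted if it carries `meta.sync` and the user already has access to that channel, which may be worth checking for the document created in the example project.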

This is the old config file from before we changed over to the latest gateway; I believe the issue was still occurring then. serviceconfig.backup.json.zip (2.3 KB)

I also suspected it would be the Sync Gateway, but we have a few buckets using the same configuration and this is the only one we are seeing this issue on.

We are currently looking into it.

Since there are no errors, as you pointed out, there is nothing to go on to diagnose this issue. If you are feeling brave, you could make a Wireshark capture of the working and non-working connections to see what is going on (make sure to start recording before the initial handshake, though, or the traces will be worthless).

I must be honest, I am struggling to sniff traffic coming from the app, but I will keep trying. The best I have managed is to set up a man-in-the-middle proxy, and as far as I can see there is no difference in the websockets between the working and non-working versions. I have attached images of the details of each socket in case anything jumps out at you guys.

Something interesting I also noticed is that every time the replicator connects, it first makes a GET call before connecting the websocket and receives back a 400 Unauthorised response. This happens in both the working and non-working connections, so I am not sure whether it has any significance.

Here is the initial unauthorised GET: [screenshot]

Here are the details for the non-working connection: [screenshots]

Here are the details for the working connection: [screenshots]

Is there anything worth checking on the Sync Gateway side that we can compare between a working user profile and a broken one? As I said, this seems to be linked to the user.

Sorry I missed this, but I suggest using Wireshark since there is a BLIP plugin for it (the screenshots don’t look like Wireshark). It will decode the packets and print information about them. You won’t find much of anything in the initial HTTP packets. All the conversation is carried out in subsequent websocket messages.

Hi @borrrden, we managed to do the Wireshark capture of the syncing vs non-syncing credentials, though we were unsure how to use the BLIP plugin. We were unable to find anything obvious; would you mind taking a look at the logs and letting us know?

Please let me know if I have recorded everything correctly.

Appreciate the assistance.

wg.zip (525.8 KB)

For future reference, BLIP has been built into Wireshark since 3.0.0. You only need to type “blip” into the filter to see it. It’s often hard to find the BLIP packets without filtering. I do see them in your traces. I don’t see any evidence of wrongdoing directly in the trace, which suggests the server side simply is not sending documents because it thinks that there are none to send for that user. Both traces are asking for documents from the server (MSG 3 on each) in the same way. The only difference is that the successful user has never tried to replicate until that point, according to the server. It has no checkpoint, while the unsuccessful user has a checkpoint whose value is 6 (too low for the 2047 documents). I think the next step is to grab more verbose logs from the bad run from the server. Those should contain more information about what is going on. I will tag @bbrks to address that.
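
To make the checkpoint remark concrete, here is a deliberately simplified toy model of a pull that resumes from a checkpoint. Everything in it is an assumption for illustration: `pullChanges`, the shape of the feed, and the channel name `user-a` are hypothetical, and this is not how Sync Gateway is actually implemented. It only shows the expectation that a checkpoint of 6 should skip the first six changes and still deliver the rest.

```javascript
// Toy model only: NOT Sync Gateway's implementation.
// A pull returns every change after the checkpoint that is in one of the user's channels.
function pullChanges(changes, since, userChannels) {
  return changes.filter(
    (c) => c.seq > since && userChannels.includes(c.channel)
  );
}

// Hypothetical feed: 2047 documents in a single channel, sequences 1..2047.
const feed = Array.from({ length: 2047 }, (_, i) => ({
  seq: i + 1,
  docId: `doc-${i + 1}`,
  channel: "user-a",
}));

const fresh = pullChanges(feed, 0, ["user-a"]);   // no checkpoint yet
const resumed = pullChanges(feed, 6, ["user-a"]); // checkpoint of 6

console.log(fresh.length);   // 2047
console.log(resumed.length); // 2041
```

Under this model the user with the checkpoint would still receive 2041 documents, so the fact that nothing arrives points at something the trace does not show, which is why the verbose server logs are the next step.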

Thank you for taking a look at the logs, appreciate it.
What kind of logs are you looking for, and how would I go about capturing them?

I am referring to the Sync Gateway logs. You posted some before, but they are only the warning and error levels. I simply want to receive all of them up to verbose instead.

See this page for details on how to set up logging.

Understood,

I have opened a support ticket for this (reference #46943) and will work through it with the guys in the support portal.

Thank you so much for all the help.