Not all documents replicated from sync gateway

My iOS app, with CB Lite 1.4.1, uses continuous replication from sync gateway version 1.4 (Couchbase server 4.6.2-3905) to pull documents. When the app is first run it starts pulling down data and eventually the replication goes idle, but not all documents have been replicated. If I kill and restart the app, more documents are replicated. It may require several restarts before all of the documents are replicated from sync gateway. I’ve turned on logging and I see this pattern in the log:

First app run…

2018-01-15 17:00:43.158 [info][tid:com.facebook.react.JavaScript] Starting Couchbase Sync Gateway 
    replication with https://sgw.CENSORED/master/
    17:00:43.163‖ Router: POST http://lite.couchbase./_replicate + body stream
        Accept: application/json
        Content-Type: application/json
        Connection: keep-alive
        Authorization: Basic dXNlcjpwdw==
        Content-Length: 246
    17:00:43.164‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] STARTING ...
    17:00:43.165‖ Database: {{ Begin transaction (level 1)...
    17:00:43.165‖ Database: }} Commit transaction (level 1)
    17:00:43.167‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/]: Reachability state = 
    <sgw.CENSORED>:reachable (02), suspended=0
    17:00:43.167‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/]: Going online
    17:00:43.167‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/]: No local checkpoint; not getting 
    remote one
    17:00:43.167‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] Progress: set active = 1
    17:00:43.167‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: Starting...
    17:00:43.168‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: POST 
    //sgw.CENSORED/master/_changes?feed=normal&heartbeat=300000&style=all_docs {
        "Accept-Encoding" = gzip;
        Authorization = "Basic CENSORED";
        "Content-Length" = 80;
        "Content-Type" = "application/json";
        Host = "sgw.CENSORED";
        "User-Agent" = "CouchbaseLite/1.3 (iOS 1.4.1/86c63659b56d87406eac3b1e4db631fe67098139)";
    }
    17:00:43.168‖ ChangeTracker: Changes feed using proxy settings {
        ExceptionsList =     (
            "*.local",
            "169.254/16"
        );
        FTPPassive = 1;
        "__SCOPED__" =     {
            en4 =         {
                ExceptionsList =             (
                    "*.local",
                    "169.254/16"
                );
                FTPPassive = 1;
            };
        };
    }
    17:00:43.168‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: Started... 
    <https://sgw.CENSORED/master/_changes?feed=normal&heartbeat=300000&style=all_docs>
    17:00:43.169‖ Router: Response -- status=200, body=24 bytes
            Server: CouchbaseLite 1.4.1 (build 88)
            Content-Type: application/json
    17:00:43.169‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: Event 1 on 
    <__NSCFInputStream: 0x60800010f300>
    17:00:46.952‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/]: Server is Couchbase Sync 
    Gateway/1.4.0
    17:00:46.953‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] Progress: 0 / 100
    17:00:46.954‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] queued 100 remote revisions from 
    seq=973093:878993 (100 in bulk, 0 individually)
    17:00:46.954‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] bulk-fetching 100 remote 
    revisions...
    17:00:46.955‖ RemoteRequest: CBLBulkDownloader[/master/_bulk_get]: Starting...
    17:00:46.955‖ RemoteRequest: CBLRemoteSession starting CBLBulkDownloader[/master/_bulk_get]
    17:00:46.957‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] Progress: 0 / 200`

Eventually getting to this point…

17:07:05.065‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] Progress: 92712 / 92800
    17:07:05.066‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] queued 100 remote revisions from 
    seq=973093:972921 (100 in bulk, 0 individually)    17:07:05.066‖ Sync: 
    CBLRestPuller[https://sgw.CENSORED/master/] bulk-fetching 100 remote 
    revisions...
    17:07:05.066‖ RemoteRequest: CBLBulkDownloader[/master/_bulk_get]: Starting...
    17:07:05.066‖ RemoteRequest: CBLRemoteSession starting CBLBulkDownloader[/master/_bulk_get]
    17:07:05.070‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: EndEncountered 
    <__NSCFInputStream: 0x60800010f300>
    17:07:05.070‖ ChangeTracker: CBLSocketChangeTracker[0x6080001da7c0 master]: RESUME
    17:07:05.073‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/]: Caught up with changes!

At that point no more documents are replicated. After the restart replication starts again from this 973093 and finds more documents:

    17:22:08.747‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] queued 15 remote revisions from 
    seq=973093:887952 (15 in bulk, 0 individually)
    17:22:08.747‖ Sync: CBLRestPuller[https://sgw.CENSORED/master/] bulk-fetching 15 remote revisions...

In this case I had to restart my app two more times before it completed replication. It looks like the rest puller thinks it’s caught up when it isn’t. Any ideas what’s wrong here?

EDIT: I just tried this on the Android version of my app and I’m seeing similar behavior. I had to force stop the app a couple of times to get all documents replicated.

Thanks,
David

Are there any errors in the Sync Gateway logs?

CBLite has additional log levels that can be enabled which might help:

In particular LogSync + LogSyncVerbose

Also, a tool like Wireshark or HTTPScoop running on the same machine as Sync Gateway will allow you to look at the HTTP requests, which might reveal a pattern. Eg, do you see any HTTP errors and retries?

These resources might help to understand the replication protocol:


CloudApp

I’ve been tailing the sgw log and have not seen any errors. I already have the log levels cranked up. The back end is a sync gateway cluster talking to a cb cluster in a docker swarm. I think next I’ll load a local sync gateway with the same set of docs and see if I can reproduce, in which case it would be easier to monitor the network traffic.

I’ve not seen any HTTP errors monitoring the network and sync gateway logs. I modified the replicate request to specify a poll value of 30000ms and websocket false, which I believe should disable websockets in case that’s part of the problem. However, I still see the behavior that the first time the app replicates with sync gateway it eventually reaches a state where no more changes are received, but upon killing and restarting it resumes replicating.

The last replication request before I killed the client emitted this log:

00:47:12.420‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: Starting...
00:47:12.421‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: POST //CENSORED.amazonaws.com/master/_changesfeed=normal&heartbeat=300000&style=all_docs&since=94107 {
    "Accept-Encoding" = gzip;
    Authorization = "Basic CENSORED";
    "Content-Length" = 94;
    "Content-Type" = "application/json";
    Host = "CENSORED.amazonaws.com";
    "User-Agent" = "CouchbaseLite/1.3 (iOS 1.4.1/86c63659b56d87406eac3b1e4db631fe67098139)";
}
00:47:12.422‖ ChangeTracker: Changes feed using proxy settings {
    ExceptionsList =     (
        "*.local",
        "169.254/16"
    );
    FTPPassive = 1;    
    "__SCOPED__" =     {
        en4 =         {
            ExceptionsList =             (
                "*.local",
                "169.254/16"
            );
            FTPPassive = 1;
         };
    };
}
00:47:12.422‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: Started... <http://CENSORED.amazonaws.com/master/_changes?feed=normal&heartbeat=300000&style=all_docs&since=94107>
00:47:12.423‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: Event 1 on <__NSCFInputStream: 0x618000311670>
00:47:12.601‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: EndEncountered <__NSCFInputStream: 0x618000311670>
00:47:12.601‖ ChangeTracker: CBLSocketChangeTracker[0x6180003c0e10 master]: Next poll of _changes feed in 30 sec...

Then, after killing and restarting the app, replication resumes with this logged:

2018-01-17 00:49:27.826 [info][tid:com.facebook.react.JavaScript] Starting Couchbase Sync Gateway replication with http://CENSORED.amazonaws.com/master
00:49:27.832‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master] STARTING ...
00:49:27.835‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master]: Reachability state = <CENSORED.amazonaws.com>:reachable (02), suspended=0
00:49:27.835‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master]: Going online
00:49:27.835‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master] Progress: set active = 1
00:49:27.835‖ RemoteRequest: CBLRemoteJSONRequest[GET http://CENSORED.amazonaws.com/master/_local/4fc015c942e4c8a18d80fe9f33d93e6d141d7b74]: Starting...
00:49:27.836‖ RemoteRequest: CBLRemoteSession starting CBLRemoteJSONRequest[GET http://CENSORED.amazonaws.com/master/_local/4fc015c942e4c8a18d80fe9f33d93e6d141d7b74]
00:49:27.836‖ Router: Response -- status=200, body=24 bytes
        Server: CouchbaseLite 1.4.1 (build 88)
        Content-Type: application/json
00:49:28.559‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master]: Server is Couchbase Sync Gateway/1.4.0
00:49:28.560‖ RemoteRequest: CBLRemoteJSONRequest[GET http://CENSORED.amazonaws.com/master/_local/4fc015c942e4c8a18d80fe9f33d93e6d141d7b74]: Got response, status 200
00:49:28.560‖ RemoteRequest: CBLRemoteSession done with CBLRemoteJSONRequest[GET http://CENSORED.amazonaws.com/master/_local/4fc015c942e4c8a18d80fe9f33d93e6d141d7b74]
00:49:28.560‖ RemoteRequest: CBLRemoteJSONRequest[GET http://CENSORED.amazonaws.com/master/_local/4fc015c942e4c8a18d80fe9f33d93e6d141d7b74]: Finished loading
00:49:28.560‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master]: Replicating from lastSequence=94106:30367
00:49:28.560‖ ChangeTracker: CBLSocketChangeTracker[0x6000001d7160 master]: Starting...
00:49:28.561‖ ChangeTracker: CBLSocketChangeTracker[0x6000001d7160 master]: POST //CENSORED.amazonaws.com/master/_changes?feed=normal&heartbeat=300000&style=all_docs&since=94106%3A30367 {
    "Accept-Encoding" = gzip;
    Authorization = "Basic CENSORED";
    "Content-Length" = 102;
    "Content-Type" = "application/json";
    Host = "CENSORED.amazonaws.com";
    "User-Agent" = "CouchbaseLite/1.3 (iOS 1.4.1/86c63659b56d87406eac3b1e4db631fe67098139)";
}
00:49:28.561‖ ChangeTracker: Changes feed using proxy settings {
    ExceptionsList =     (
        "*.local",
        "169.254/16"
    );
    FTPPassive = 1;
    "__SCOPED__" =     {
        en4 =         {
            ExceptionsList =             (
                "*.local",
                "169.254/16"
            );
            FTPPassive = 1;
        };
    };
}
00:49:28.561‖ ChangeTracker: CBLSocketChangeTracker[0x6000001d7160 master]: Started... <http://CENSORED.amazonaws.com/master/_changes?feed=normal&heartbeat=300000&style=all_docs&since=94106:30367>
00:49:28.562‖ ChangeTracker: CBLSocketChangeTracker[0x6000001d7160 master]: Event 1 on <__NSCFInputStream: 0x6080001152a0>
00:49:31.597‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master] Progress: 0 / 100
00:49:31.602‖ Sync: CBLRestPuller[http://CENSORED.amazonaws.com/master] queued 11 remote revisions from seq=94106:30368 (11 in bulk, 0 individually)

In the first case it’s requesting changes since 94107 and getting nothing. In the second case it’s requesting changes since 94106:30367 and finding documents. When the client has eventually replicated all documents it again requests changes since 94107 and is idle. Why is the client apparently prematurely getting to the the 94107 sequence number? What’s happening when the pull replication starts again after the app is restarted that allows the client to figure out it needs to start at 94106:30367 instead of where it previously left off at 94107?

This is mysterious. I’ve filed a Github issue and pasted in your initial description (thanks for the detailed info & logs!) Let’s continue there.