Couchbase Lite replication stalls out on device, then crashes

I’m working on an iOS app with CB Lite 2.6.1. When I tested pull replication of a fairly large set of small documents from Sync Gateway in the simulator, I had no problems: it replicated all 700K documents fairly quickly. On my iPhone 7 test device, it works fine until it hits about the 350K document mark, then starts getting connection timeouts and eventually stalls out completely, stuck in the connecting state. I’m unable to replicate beyond 429K documents.

While this is happening, any other activity on the database, such as running queries or saving or fetching a document, essentially hangs, taking 60+ sec to complete, if it completes at all.

CPU is only at about 45%. Memory usage fluctuates between 80 MB and 120 MB, and disk activity is around 100 MB/s. Any idea what’s going on here?

David

After killing the app and restarting, now it crashes during replication:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000124
VM Region Info: 0x124 is not in any region.  Bytes before following region: 4335976156
      REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                 000000010271c000-0000000102b00000 [ 3984K] r-x/r-x SM=COW  ...seLiteExample

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [2976]
Triggered by Thread:  3

Thread 3 name:  Dispatch queue: Repl->wss://sgw.dev.XXXX/master/_blipsync
Thread 3 Crashed:
0   CouchbaseLiteSwift            	0x00000001039da204 std::__1::__function::__func<litecore::repl::Replicator::getRemoteCheckpoint()::$_0, std::__1::allocator<litecore::repl::Replicator::getRemoteCheckpoint()::$_0>, void (litecore::blip::MessageProgress const&)>::operator()(litecore::blip::MessageProgress const&) + 811524 (Replicator.cc:479)
1   CouchbaseLiteSwift            	0x00000001039da1f8 std::__1::__function::__func<litecore::repl::Replicator::getRemoteCheckpoint()::$_0, std::__1::allocator<litecore::repl::Replicator::getRemoteCheckpoint()::$_0>, void (litecore::blip::MessageProgress const&)>::operator()(litecore::blip::MessageProgress const&) + 811512 (Replicator.cc:0)
2   CouchbaseLiteSwift            	0x00000001039fdaec std::__1::__function::__func<litecore::repl::Worker::sendRequest(litecore::blip::MessageBuilder&, std::__1::function<void (litecore::blip::MessageProgress const&)>)::$_0, std::__1::allocator<litecore::repl::Worker::sendRequest(litecore::blip::MessageBuilder&, std::__1::function<void (litecore::blip::MessageProgress const&)>)::$_0>, void (litecore::blip::MessageProgress)>::operator()(litecore::blip::MessageProgress&&) + 957164 (Worker.cc:0)
3   CouchbaseLiteSwift            	0x00000001039fd850 invocation function for block in std::__1::function<void (litecore::blip::MessageProgress)> litecore::actor::Actor::_asynchronize<litecore::blip::MessageProgress>(std::__1::function<void (litecore::blip::MessageProgress)>)::'lambda'(litecore::blip::MessageProgress)::operator()(litecore::blip::MessageProgress) + 956496 (Actor.hh:0)
4   CouchbaseLiteSwift            	0x0000000103a7d458 litecore::actor::GCDMailbox::safelyCall(void () block_pointer) const + 1479768 (GCDMailbox.cc:91)
5   CouchbaseLiteSwift            	0x0000000103a7d52c invocation function for block in litecore::actor::GCDMailbox::enqueue(void () block_pointer) + 1479980 (GCDMailbox.cc:102)
6   libdispatch.dylib             	0x0000000188e25610 _dispatch_call_block_and_release + 24
7   libdispatch.dylib             	0x0000000188e26184 _dispatch_client_callout + 16
8   libdispatch.dylib             	0x0000000188dd2464 _dispatch_lane_serial_drain$VARIANT$mp + 608
9   libdispatch.dylib             	0x0000000188dd2e58 _dispatch_lane_invoke$VARIANT$mp + 420
10  libdispatch.dylib             	0x0000000188ddc340 _dispatch_workloop_worker_thread + 588
11  libsystem_pthread.dylib       	0x0000000188e75fa4 _pthread_wqthread + 276
12  libsystem_pthread.dylib       	0x0000000188e78ae0 start_wqthread + 8

Yikes! Please file a bug report. This sounds like at least three different issues (replicator, slow db access, crash) but we’ll figure it out there.

The crash is known, but we don’t have a release with the fix yet. The workaround is to avoid pull-only replications. (This doesn’t happen on all pull-only replications, but if the previous replication was aborted due to a crash or other disconnect, a subsequent pull-only replication is likely to trigger this.)
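As a rough sketch of that workaround, assuming the CB Lite 2.x Swift API, the replicator can be configured as push-and-pull instead of pull-only. The database name, the Sync Gateway URL, and the function name startReplication below are placeholders for illustration, not values from this thread.

```swift
import Foundation
import CouchbaseLiteSwift

// Sketch only: "example-db" and the wss:// URL are hypothetical placeholders.
func startReplication() throws -> Replicator {
    let database = try Database(name: "example-db")
    let target = URLEndpoint(url: URL(string: "wss://sgw.example.com/master")!)

    let config = ReplicatorConfiguration(database: database, target: target)
    // Workaround: use .pushAndPull rather than .pull to avoid pull-only replication.
    config.replicatorType = .pushAndPull
    config.continuous = false

    let replicator = Replicator(config: config)
    // Log progress and errors so a stall or timeout is visible in the console.
    _ = replicator.addChangeListener { change in
        if let error = change.status.error {
            print("Replication error: \(error)")
        } else {
            let progress = change.status.progress
            print("Replicated \(progress.completed) of \(progress.total)")
        }
    }
    replicator.start()
    return replicator
}
```

Note that a push-and-pull replication will also push any local changes to Sync Gateway, so this only fits if that is acceptable for the app.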

I filed an issue: https://github.com/couchbase/couchbase-lite-core/issues/878