Couchbase Sync Gateway Data Usage

Hi

For some background, we have an Android application that uses Couchbase as its main database. The database is connected to our Sync Gateway with a continuous push-pull replicator.

I am looking to better understand strategies for minimising mobile data usage from my application. Some questions I thought of:

  1. Does the entire document sync every time I make an update, or just the fields that changed?
  2. If I am going to write to a document multiple times, should I wait until everything is ready and then write to the document only once?
  3. Would many small documents be more efficient than one large document, so that each write uses only the bare minimum amount of data?
  4. Does anyone have suggestions for tools to watch the traffic or track usage, so I can get an idea of which areas use the most data?
  5. Does anyone have strategies of their own that they have found to optimise data usage?

Thank you for any assistance

If you enable Delta Sync, only the changed part of the document will be synchronised in most cases, at the cost of some temporary additional bucket storage. This is explained in greater detail in the link above.

Only the latest version of a document is ever sent, not every change made to a doc since the last time it was seen, so that might help.
However, if you want to stop actively connected clients from pulling each update as it happens, then yes, you’d want to batch your updates. You can do this by batching your writes on the update side, or by controlling the replicators so they don’t run continuously.
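
To illustrate batching on the update side, here is a minimal sketch in plain Python. It does not use any Couchbase API: `BatchedDocumentWriter` and its `save` callback are hypothetical names, standing in for whatever code writes the document to your local database. The idea is just that several field changes collapse into a single save, and therefore a single new revision for connected clients to pull.

```python
class BatchedDocumentWriter:
    """Collects field changes in memory and saves them in one go."""

    def __init__(self, save):
        # `save` is a hypothetical callback that persists a dict of changes,
        # for example by applying them to the document in a single save.
        self._save = save
        self._pending = {}

    def set(self, field, value):
        # A later value for the same field overwrites the earlier one, so
        # only the final state is ever written.
        self._pending[field] = value

    def flush(self):
        # One save produces one new revision for the replicator to handle.
        if not self._pending:
            return False
        changes, self._pending = self._pending, {}
        self._save(changes)
        return True


saved_revisions = []
writer = BatchedDocumentWriter(saved_revisions.append)
writer.set("status", "picked")
writer.set("status", "packed")
writer.set("weight_kg", 2.4)
writer.flush()
```

Here three calls to `set` result in one entry in `saved_revisions`. When to call `flush` (on a timer, on screen exit, when a form is complete) depends on how fresh other clients need the data to be.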

As long as you’re using delta sync effectively (clients already have a recent previous version of a document to base the delta on), one large document would be better, as there’s some small amount of replication overhead per-document.

However, this does depend on client behaviour (how long they’re offline/out of date for). Delta sync falls back to full document replication if the client doesn’t have a usable version, so if clients are often out of date by more than the ~24hr window and you only ever update a small part of the data, it would probably be better to use more, smaller documents.
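
The tradeoff above can be put into a rough back-of-the-envelope model. This is only a sketch: `estimated_bytes` is a hypothetical helper, and the document sizes and the per-document overhead of 150 bytes are made-up placeholders, not measured values. It ignores compression and protocol details entirely.

```python
def estimated_bytes(docs, per_doc_overhead, delta_usable):
    """Rough bytes transferred for one round of updates.

    docs: list of (doc_size, changed_bytes) pairs, one per document.
    delta_usable: True if the client holds a revision a delta can be based on.
    """
    total = 0
    for doc_size, changed_bytes in docs:
        if changed_bytes == 0:
            continue  # untouched documents are not replicated
        payload = changed_bytes if delta_usable else doc_size
        total += payload + per_doc_overhead
    return total


OVERHEAD = 150  # placeholder value, not a measured figure

# One 50 KB document with 300 changed bytes, versus the same data split into
# 100 documents of 500 bytes, three of which each have 100 changed bytes.
one_large = [(50_000, 300)]
many_small = [(500, 100)] * 3 + [(500, 0)] * 97

large_delta = estimated_bytes(one_large, OVERHEAD, delta_usable=True)
small_delta = estimated_bytes(many_small, OVERHEAD, delta_usable=True)
large_full = estimated_bytes(one_large, OVERHEAD, delta_usable=False)
small_full = estimated_bytes(many_small, OVERHEAD, delta_usable=False)
```

With these placeholder numbers the large document wins while deltas are usable (450 vs 750 bytes), and loses badly once the client has to fall back to full documents (50,150 vs 1,950 bytes). Real figures will differ, so treat this only as a way to reason about which side of the tradeoff your clients are on.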

With the above answers about delta sync in mind, use the delta sync stats to see how effective enabling it is (the stats are also available in Prometheus format).
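
As a sketch of what to do with those stats once you have them, the snippet below computes the share of requested deltas that were actually sent. The key names `deltas_requested` and `deltas_sent` are my assumption of what the delta sync stats are called, so check them against the stats output of your Sync Gateway version. The sample values are invented, and `delta_hit_ratio` is a hypothetical helper.

```python
def delta_hit_ratio(stats):
    """Fraction of requested deltas that were sent as deltas.

    `stats` is a dict of delta sync counters; the key names used here are
    assumed and should be verified against your Sync Gateway's stats.
    """
    requested = stats.get("deltas_requested", 0)
    sent = stats.get("deltas_sent", 0)
    if requested == 0:
        return None  # nothing requested yet, so no ratio to report
    return sent / requested


# Invented sample values, for illustration only.
sample = {"deltas_requested": 200, "deltas_sent": 150}
ratio = delta_hit_ratio(sample)
```

A ratio well below 1 would suggest that clients often lack a usable base revision and are falling back to full documents.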

Also check the CBL push and pull replication stats.

and these two stats (BLIP is the name of our replication protocol transport):

Replication traffic is compressed with DEFLATE by default, and the compression context spans the whole replication stream, which means the compression benefits are shared across multiple documents and the replication metadata. Don’t worry about compressing document data yourself!
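
To see why a shared compression context helps, here is a small standalone demonstration using Python's `zlib` module. It is not the replication protocol itself, only an illustration of the same idea, and the documents are invented sample data.

```python
import json
import zlib

# Twenty small, similar JSON documents (invented sample data).
docs = [
    json.dumps({"type": "order", "id": i, "status": "pending",
                "customer": f"customer-{i}"}).encode()
    for i in range(20)
]

# Each document compressed on its own, with a fresh context every time.
separate_total = sum(len(zlib.compress(d)) for d in docs)

# All documents compressed through one long-lived context. Flushing after
# each document means every message could be sent as soon as it is produced,
# while later documents can still refer back to earlier ones.
stream = zlib.compressobj()
shared_total = 0
for d in docs:
    shared_total += len(stream.compress(d) + stream.flush(zlib.Z_SYNC_FLUSH))

print(separate_total, shared_total)
```

The shared-context total comes out smaller because repeated keys and values only need to be encoded in full once. Compressing each document yourself before saving it would take that opportunity away from the stream.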

Hi @bbrks

Thank you for such an incredibly detailed answer. It seems Delta Sync is the place to start, along with getting some idea of my current data usage.

Why is delta sync not enabled by default? Is it just a storage issue?

Yes, this is the only real reason. The bucket size increase when delta sync is enabled depends on doc_size and updates_per_day, both of which are unknown to Couchbase, so it was safer to make it opt-in and let the customer decide which tradeoff they’d like, given the impact of running out of disk space.
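
For a rough feel of the storage tradeoff, you can multiply those two unknowns together. The sketch below assumes old revision bodies are kept for a fixed retention window; the parameter name `rev_max_age_seconds` and its 24-hour default are my assumption, based on the ~24hr window mentioned earlier, and `extra_bucket_bytes` is a hypothetical helper. The input numbers are invented.

```python
def extra_bucket_bytes(doc_size, updates_per_day, rev_max_age_seconds=86_400):
    """Rough upper estimate of temporary extra storage for delta sync.

    Assumes every update keeps one old copy of the document body for
    `rev_max_age_seconds` (assumed name and default, see above).
    """
    retention_days = rev_max_age_seconds / 86_400
    return doc_size * updates_per_day * retention_days


# Invented example: 10 KB documents, 5,000 updates per day across the bucket.
estimate = extra_bucket_bytes(doc_size=10_000, updates_per_day=5_000)
```

With these example numbers the estimate is about 50 MB of temporary extra storage. Plug in your own document sizes and update rates to judge whether that is acceptable for your bucket.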

FWIW, I suggest you look at the WorkManagerReplicator in the Kotlin Extensions package.

The problem with our replicator is that it may try to turn on the device’s broadband radio fairly frequently. This is especially bad because the radio stays on for a bit after use, in the expectation that it will probably be used again. Just as the radio decides to go back to low power, the replication process may wake it up again, and the end result can be nearly continuous high power.
Android’s WorkManager not only prevents this, it also takes remote requests from multiple applications across the device and bundles them, so that the radio goes on once for all of them. It can save a boatload of battery.