Correct usage of the Sync Gateway _purge API

Hi all,

What is the correct format for deleting multiple documents from Sync Gateway using the _purge API documented at this link?

It looks like the _purge API is intended to be invoked for a single document at a time. So we would need to loop through the document IDs to be deleted and, for each of them, invoke it with this body:


{
  "my_document_id": [
    "*"
  ]
}

Is that the correct approach?

The use case is that in our server-side application we are implementing a way to completely delete from Sync Gateway the Documents that are deleted from Couchbase Server when older than a certain period of time. The goal is to reduce the amount of data transferred to mobile clients over time.

Thank you

Hi @gmaggini,

The purge endpoint accepts multiple doc IDs in the format:

{
  "doc1": ["*"],
  "doc2": ["*"],
  "doc3": ["*"]
}
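As a rough sketch of how a server-side job might build that body and send it in batches (the admin URL, database name, and batch size of 100 are assumptions for illustration, not values from this thread, and the helper names are hypothetical):

```python
import json
import urllib.request


def build_purge_body(doc_ids):
    """Map every document ID to ["*"], i.e. purge all revisions of it."""
    return {doc_id: ["*"] for doc_id in doc_ids}


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def purge_documents(admin_url, db, doc_ids, batch_size=100):
    """POST one _purge request per batch to the Sync Gateway admin API.

    `admin_url` and `db` are placeholders for your own deployment;
    `batch_size` is an arbitrary choice, not a documented limit.
    """
    for batch in chunked(list(doc_ids), batch_size):
        request = urllib.request.Request(
            url=f"{admin_url}/{db}/_purge",
            data=json.dumps(build_purge_body(batch)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises an HTTPError on a 4xx/5xx response.
        with urllib.request.urlopen(request) as response:
            response.read()
```

A call might look like `purge_documents("http://localhost:4985", "mydb", ["doc1", "doc2", "doc3"])`, assuming the admin API is reachable at that address.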

I hope this helps!

Do you also need to worry about removing the documents from the clients? If so, purge may not be an appropriate way to delete these documents, as purges aren’t replicated, and clients could resurrect the documents.

For your use case, if you do want clients to receive deletion notifications, the best approach may be to write documents with an expiry/TTL (you can do this within your sync function) to allow automatic cleanup of older documents.

If you do still want to rely on an external process for this cleanup, you can do a normal delete (not purge) through Sync Gateway (either on the Document REST API endpoint, or the Bulk Docs endpoint if you want to batch it). These deletes will be replicated to clients so that they’re also cleaned up there.
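For the batched variant, a minimal sketch of a Bulk Docs payload that marks documents as deleted could look like the following. It assumes the CouchDB-style `_deleted` flag and that the external process already knows each document's current revision ID; the document IDs, revision IDs, and helper name are made up for illustration.

```python
import json


def build_bulk_delete_body(revisions):
    """Build a _bulk_docs payload that tombstones each document.

    `revisions` maps a document ID to its current revision ID.
    """
    return {
        "docs": [
            {"_id": doc_id, "_rev": rev, "_deleted": True}
            for doc_id, rev in revisions.items()
        ]
    }


# Hypothetical IDs and revisions, for illustration only.
body = build_bulk_delete_body({"doc1": "1-abc", "doc2": "3-def"})
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the request body to the Bulk Docs endpoint. Unlike a purge, each entry leaves a tombstone behind, which is what lets the deletion replicate to clients.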

Thank you @bbrks for your reflection, really appreciated!

To my knowledge (and this is our assumption), deletions in Sync Gateway still take up space on clients and servers, as the documents are still physically present to indicate they have been deleted.

Our scenario is that the clients receive data via one-way replication Server → Clients, hence I would assume the documents won’t be resurrected.

It is also acceptable if documents received by the clients stay on the client forever, while I am more concerned that new clients would not need to download all “deletions” of documents (for a faster first-time replication).

Based on the statements above, we opted for _purge, but please let me know if we missed anything in our logic.

Thank you !

Given the nature of your setup (one-way sync to clients), purge would work for you, if you’re OK with the external process used to remove the documents and don’t expect to enable two-way sync in the future.

Deletions do indeed take up some space (we call them tombstones) so that we can replicate deletions to clients. However, we have several things in place to alleviate your concerns, if you were to use expiry and deletions instead of purge:

  • New Couchbase Lite clients do not pull tombstones, only active documents, to ensure the first replication is only pulling necessary items.
  • Tombstones are periodically cleaned up on the server after a length of time determined by a setting called the “Metadata Purge Interval”.
    • Reference: Auto-Compaction | Couchbase Docs
    • We suggest this is set to be approximately how long you expect clients to remain offline/unsynced for.
      • 30 days is a reasonable value for Couchbase Mobile deployments, but could be lower if you expect clients to be online more regularly.
  • Couchbase Lite also has its own compaction mechanism:

Thank you @bbrks for sharing those links. There is only one remaining aspect I did not mention explicitly: we use Channels on Sync Gateway to provide fine-grained document permissions, and we have shared_bucket_access set to true.

I read here (Cache Ejection) that the channel cache may be what causes our Couchbase Lite synchronization to grow over time, which is why we were looking into the _purge API on Sync Gateway, as this should purge the cache automatically.

However, from what you linked above, we could achieve the same result automatically if we configured Auto-Compaction on the server (let’s say 10 days) and then set compact_interval_days on Sync Gateway to 1 day (which should be the default), as the compaction on Sync Gateway will also automatically flush the cache.

Hope I got it right? :)