Sync Gateway Replication Issue After Reingesting Documents in a Channel

Hello,

I’m reaching out for guidance regarding a replication issue we’re experiencing after updating document content in one of our “modules.”

Background:

In our setup, a module is a logical grouping of documents, each with a unique module_id. Each module is assigned to a channel named after its module_id. Our Sync Gateway setup uses a straightforward sync function that assigns each new document to the corresponding module_id channel. Until recently, Sync Gateway was working perfectly, and we could replicate module data to our clients without issue.

Problem:

We had a module with 279 documents, some of which were large (around 6MB). Here’s what we did:

  1. Deleted the Module Content: We executed the following SQL++ query to delete all documents in the module:

DELETE FROM collection_name WHERE module_id=;

  2. Reingested Updated Module Content: We reloaded the module with updated content, which now has 275 documents.

Current Issue:

After this update, we attempted to replicate the module to our clients, but it now hangs after syncing 36 documents and shows the total document count as 39, which is inconsistent with the 275 documents we expect.

Things We Have Tried:

  1. Deleting and recreating the Sync Gateway database.
  2. Running a compaction.

Neither of these steps resolved the issue.

Can anyone suggest further troubleshooting steps? We’re open to any ideas on how to fully sync this updated module. We are using Sync Gateway 3.2.0 Enterprise Edition and Couchbase Lite 3.2.0 Enterprise Edition.

Thanks in advance for any insights!

@priya.rajagopal, tagging you since you have helped me in the past with Sync Gateway issues.

When we delete the Sync Gateway database and recreate it, we see a large number of log lines like this:

2024-11-12T01:28:08.711Z [DBG] Import+: db:main-content col:content Ignoring delete mutation for ee1ff2e9-210c-4d0a-97fe-0f59206d3963 - no existing Sync Gateway metadata.
2024-11-12T01:28:08.712Z [DBG] Import+: db:main-content col:content 3c37bf7b-7968-4068-a766-3cc554a5e0eb will not be imported: Could not unmarshal _sync out of document body: readObjectStart: expect { or n, but found , error found in #0 byte of ...||..., bigger context ...||...

Could that be an issue?
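
To gauge how many documents fall into each of the two cases, the log lines can be tallied by message type. The following is a minimal sketch, not an official tool: it assumes the log lines have exactly the shape quoted above (timestamp, `[DBG]`, `Import+:`, `db:`, `col:`, then the message), and the names `tally_import_messages`, `classify`, and `sample` are made up for illustration. Real log files may carry extra fields, in which case the pattern would need adjusting.

```python
import re
from collections import Counter

# Assumed line shape, taken from the two quoted lines:
# "<timestamp> [DBG] Import+: db:<db> col:<collection> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[DBG\] Import\+: db:(?P<db>\S+) col:(?P<col>\S+) (?P<msg>.*)$"
)


def classify(msg):
    """Map an Import+ message to a coarse category (hypothetical labels)."""
    if msg.startswith("Ignoring delete mutation"):
        return "ignored_delete"
    if "will not be imported" in msg:
        return "not_imported"
    return "other"


def tally_import_messages(lines):
    """Count Import+ debug messages per category; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[classify(match.group("msg"))] += 1
    return counts


# The two log lines quoted in this post, used as sample input.
sample = [
    "2024-11-12T01:28:08.711Z [DBG] Import+: db:main-content col:content "
    "Ignoring delete mutation for ee1ff2e9-210c-4d0a-97fe-0f59206d3963 "
    "- no existing Sync Gateway metadata.",
    "2024-11-12T01:28:08.712Z [DBG] Import+: db:main-content col:content "
    "3c37bf7b-7968-4068-a766-3cc554a5e0eb will not be imported: Could not "
    "unmarshal _sync out of document body: readObjectStart: expect { or n, "
    "but found , error found in #0 byte of ...||..., bigger context ...||...",
]

print(dict(tally_import_messages(sample)))
```

Feeding it the lines of a full log file instead of `sample` would show whether the "will not be imported" count is anywhere near the number of documents missing from the replication.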

Also, we recently switched from the Community Edition to the Enterprise Edition. Could that lead to any syncing issues?

Reingested Updated Module Content: We reloaded the module with updated content, which now has 275 documents.

Can you elaborate a bit on how you are repopulating these documents? I think it may lead us to the cause of why SG cannot sync those docs.

After this update, we attempted to replicate the module to our clients, but it now hangs after syncing 36 documents and shows the total document count as 39, which is inconsistent with the 275 documents we expect.

Where are you getting total document count from? The client, or server?
SG logs covering the time of a replication will help to debug further.

Also, we recently switched from the Community Edition to the Enterprise Edition. Could that lead to any syncing issues?

There are a few config options that are different in EE, but to my recollection none that would impact syncing generally. They mostly enable enhanced features like import sharding (HA and sharing the import workload across nodes).

When we delete the Sync Gateway database and recreate it, we see a large number of log lines like this:

That log line is basically the scenario where SG can see there was a deleted document in the bucket that had no mobile xattr data present (a deletion is a soft-delete, sometimes referred to as a tombstone - which actually contains metadata about the document/deletion).

In this case SG will ignore it because we had no reference to an older mobile-aware document.

Can you elaborate a bit on how you are repopulating these documents? I think it may lead us to the cause of why SG cannot sync those docs.

Repopulation is done by ingesting the same documents through our API. It creates new identifiers for each document.

Where are you getting total document count from? The client, or server?
SG logs covering the time of a replication will help to debug further.

Android client.

So, what I decided to do was delete those documents again, set the metadata purge interval to 0.04 days in the web console, and ingest them again. After this, we are able to replicate the documents, but the replicator never reaches the stopped (or completed) state. It gets stuck in the IDLE state because it sees something like 1147 completed out of 1150 available (those numbers may not be exact, but there is always a small difference).

I’m not really sure what action this maps to on the Couchbase side. Is your API writing documents to the Couchbase bucket directly with an SDK? Writing via Sync Gateway’s REST API? Or something else?

By doing this you’re losing all mobile metadata on those soft-deleted documents, meaning clients cannot correlate their local copies of documents with what used to exist on server.

I am curious what the use-case is to delete and reimport documents. Deletes can be replicated to clients, so if you’re doing this to remove a subset of documents it’s unnecessary. Purging and re-importing has the potential to leave documents on the client.
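
To make the difference between a replicated delete and a purge concrete, here is a toy model. It is only a sketch under simplifying assumptions and is not how Sync Gateway or Couchbase Lite are implemented: `ToyServer`, `pull`, `run_scenario`, and the document IDs are all hypothetical, and the "change feed" is just a dictionary in which a tombstone is stored as `None`.

```python
class ToyServer:
    """Toy stand-in for a server-side change feed (not Sync Gateway's design)."""

    def __init__(self):
        # doc_id -> body, or None when the document is a tombstone
        self.docs = {}

    def put(self, doc_id, body):
        self.docs[doc_id] = body

    def soft_delete(self, doc_id):
        # A tombstone keeps the ID visible, so the deletion can be replicated.
        self.docs[doc_id] = None

    def purge(self, doc_id):
        # A purge leaves no trace, so there is nothing to tell the client.
        self.docs.pop(doc_id, None)


def pull(client_docs, server):
    """Apply everything the server still knows about to the client copy."""
    for doc_id, body in server.docs.items():
        if body is None:
            client_docs.pop(doc_id, None)  # replicated delete
        else:
            client_docs[doc_id] = body
    return client_docs


def run_scenario(remove):
    """Sync once, remove the old doc with `remove`, reingest under a new ID, sync again."""
    server, client = ToyServer(), {}
    server.put("old-doc", {"module_id": "m1"})
    pull(client, server)
    remove(server, "old-doc")
    server.put("new-doc", {"module_id": "m1"})
    pull(client, server)
    return sorted(client)


print(run_scenario(ToyServer.soft_delete))  # ['new-doc']
print(run_scenario(ToyServer.purge))        # ['new-doc', 'old-doc']
```

In the purge case the client ends up holding both the old and the new document, which is the leftover-documents situation described above.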

I’m not really sure what action this maps to on the Couchbase side. Is your API writing documents to the Couchbase bucket directly with an SDK? Writing via Sync Gateway’s REST API? Or something else?

Writing to the bucket directly with an SDK.

By doing this you’re losing all mobile metadata on those soft-deleted documents, meaning clients cannot correlate their local copies of documents with what used to exist on server.

I am curious what the use-case is to delete and reimport documents. Deletes can be replicated to clients, so if you’re doing this to remove a subset of documents it’s unnecessary. Purging and re-importing has the potential to leave documents on the client.

There was a change in the data model. The easiest thing for us to do was to just reingest the content.

Shouldn’t Sync Gateway work just fine if documents are deleted and new documents are created?