oldDoc is null when deleting document through sync gateway

Hi,

I’m seeing some behaviour on my sync gateway that I can’t make sense of. When deleting a document through the admin REST API, the oldDoc argument in the sync function is null in some cases.

One of the documents for which this happens looks like this when requesting it through the sync gateway:
{
“_id”: “cb0badc4b0e964173247fa0cb2bab898”,
“_rev”: “663-05d0ac14fc567b26f81413f85e09edff”,
“type”: “foo”,
“userId”:“bar”
}

When running a GET cb0badc4b0e964173247fa0cb2bab898?open_revs=all, I see two leaf revisions:
–9aad31a4adae350f872f600bdf6a3f44c7ccc8f50a9140a2f1e254e955ac
Content-Type: application/json

{"_deleted":true,"_id":"cb0badc4b0e964173247fa0cb2bab898","_rev":"2507-83b1c60d11c7f2f4379232f92775d946"}
--9aad31a4adae350f872f600bdf6a3f44c7ccc8f50a9140a2f1e254e955ac
Content-Type: application/json

{"_id":"cb0badc4b0e964173247fa0cb2bab898","_rev":"663-05d0ac14fc567b26f81413f85e09edff","type":"foo","userId":"bar"}
--9aad31a4adae350f872f600bdf6a3f44c7ccc8f50a9140a2f1e254e955ac--

Since rev 2507 is deleted, rev 663 should always be picked.

When I now try to DELETE cb0badc4b0e964173247fa0cb2bab898?rev=663-05d0ac14fc567b26f81413f85e09edff, my sync function fails with many “No old revision” errors until it finally fails with in one of my guards:

2018-04-10T11:27:58.825Z CRUD+: No old revision "cb0badc4b0e964173247fa0cb2bab898" / "2506-2a6b78ebefb0549fafe80795fcf5b92c"
...
2018-04-10T11:27:59.005Z CRUD+: No old revision "cb0badc4b0e964173247fa0cb2bab898" / "1508-059f837867266744554dc2dee463481d"
Sync fn rejected: new=map[_deleted:true _id:cb0badc4b0e964173247fa0cb2bab898 _rev:2507-83b1c60d11c7f2f4379232f92775d946]  old= --> 403 No parent revision

Two things here are confusing.

  1. Shouldn’t oldDoc be the body of rev 663 instead of null, since DELETE equals to an update with _deleted=true on that revision?
  2. Why is the log output showing _rev:2507-83b1c60d11c7f2f4379232f92775d946 although I specifically tried to delete ?rev=663-05d0ac14fc567b26f81413f85e09edff

It seems to me that the sync gateway is picking the wrong revision to delete. Am I missing something or did I run into a bug?

Cheers,
Anton

I agree - it’s unexpected that SG would need to walk the revision tree from 2507 when trying to DELETE 663.

Can you provide a gist with the raw _sync metadata for your document, to provide some insight on the state of the revision tree for this document?

Unfortunately I had to purge that document. But I found another one with the exact same behaviour:
Two leaf revisions, 4559 and 5067:

--2ad6ab8b1a7869d759e63bcb9f1fe708dd24dc4d627f44b59619e73a544f
Content-Type: application/json

{"_id":"1e801180ef17205a772314bb96c60dfb","_rev":"4559-5b9429c0ca943c139c698dca7f17ac28","type":"foo","userId":"d6dc690d-5171-4af2-88a3-587ba8d063a1"}
--2ad6ab8b1a7869d759e63bcb9f1fe708dd24dc4d627f44b59619e73a544f
Content-Type: application/json

{"_deleted":true,"_id":"1e801180ef17205a772314bb96c60dfb","_rev":"5067-77fa7c9e633cda5ee7a87bcb5851ff87"}
--2ad6ab8b1a7869d759e63bcb9f1fe708dd24dc4d627f44b59619e73a544f--

Deleting the winner via DELETE …/1e801180ef17205a772314bb96c60dfb?rev=4559-5b9429c0ca943c139c698dca7f17ac28 fails and generates log output with the other revision 5067 on the SG:

2018-04-11T11:16:36.437Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4074-90a3c6cd32f703f70ab0d7eb5514f18b"
2018-04-11T11:16:36.437Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4073-bee68f85909e260529a320a5a43a6374"
2018-04-11T11:16:36.437Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4072-054f8c6fe07882c509629d73ae650c27"
2018-04-11T11:16:36.438Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4071-112f37ecc9b035df98609a20ac11600d"
2018-04-11T11:16:36.438Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4070-e5880de6816b9b8c1237db764b9fa011"
2018-04-11T11:16:36.438Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4069-0c8cf138af2bda87110968fd75e8ab61"
2018-04-11T11:16:36.438Z CRUD+: No old revision "1e801180ef17205a772314bb96c60dfb" / "4068-f3c1a75ec137cfcc0c77518d70374314"
2018-04-11T11:16:36.438Z Sync fn rejected: new=map[_deleted:true _id:1e801180ef17205a772314bb96c60dfb _rev:5067-77fa7c9e633cda5ee7a87bcb5851ff87]  old= --> 403 No parent revision

The high revision numbers were due to a client-side bug that caused a lot of updates, which is probably also what broke the document. Here is a gist to the full document (with some properties stripped). Unfortunately it’s a little bigger than my previous example.

It is also worth noting that in both cases I could see the parent revision (4558-89cfa15d7f5331ad22a1cf2585b226da for this doc) in the sync metadata, but could not access it
on the SG:

GET .../1e801180ef17205a772314bb96c60dfb?rev=4558-89cfa15d7f5331ad22a1cf2585b226da
{
    "error": "not_found",
    "reason": "missing"
}

@adamf the difference in generation numbers for these revisions is high. Maybe a factor? I looked at the gist, and the revisions diverge at generation 4068. That seems awfully suspicious given the problem happens with rev 5067.

@antweb I’m guessing you know about revs_limit? (See https://developer.couchbase.com/documentation/mobile/current/guides/sync-gateway/config-properties/index.html#1.5/databases-foo_db-revs_limit). I’m wondering if the history is getting pruned and that’s triggering a problem. You might try setting revs_limit up and see if that resolves it as a workaround.

This looks like a consequence of the fact that when a tombstone causes an existing revision becomes the active revision, the sync function is executed based on the new active revision. In that scenario Sync Gateway appears to be walking up the revision tree looking for an ancestor body to use as old doc (the ‘No old revision’ logging you’re seeing), not finding one (which is expected in the circumstances), and ending up with an empty oldDoc.

This handling feels counter-intuitive, and I’ve filed https://github.com/couchbase/sync_gateway/issues/3438 for followup. Can you let me know what Sync Gateway version you’re using?

I suspect that revs_limit plays into this, since I have only observed this behaviour when the revisions diverge heavily (as you were saying, about 1000 in the second example, possibly more in the first).

The way I understand the problem at the moment is that deleting rev 5066 caused rev 4559 to become active (since it is the only non-deleted leaf). This revision has no history anymore, since it has been pruned which also explains the “not_found” on the parent rev 4885.

My question still is why I’m not simply receiving the body of 4559 as oldDoc. I can create new revisions with 4559 as parent, so implicitly deleting it via _deleted=true should be a workaround. And why does the sync gateway not start walking the revision tree at rev 4559 but at 5066?

I’m currently running SG version 1.5.1-4.