getDocument() returns null after upgrading to 2.8.5 from 2.8.1

Upgrading from CBL 2.8.1 to 2.8.5 introduced a bug which I wasn’t able to track down. Without making any changes to the app’s code other than updating dependencies seems to introduced the issue.

The issue is the following:

new Thread(() -> {
    // get document
   // update document  -- document is null -> crash
}

In the following update I removed the thread, caught the exception and logged the document ID.

    try {
        // log document ID
        // get document
        // update document  -- document is null -> crash
    } catch(Excepti0n) {
    // log exception
}

The document is still null sometimes so it wasn’t the fault of running the code off the main thread. Looking up the documents in Couchbase Server dashboard with the logged document IDs shows the documents, their contents and meta data. The meta data does not show that the documents were deleted at some point and that being the reason that getDocument() returned null.

I checked the changes of Document.java but wouldn’t be able to point at line which might have introduced this bug.

A possible related issue is that users report that things disappeared in the app. It might be that a document was null, was not able to be parsed and hence didn’t show up. Luckily I was able to observe this on one of my devices, too. It looked like the document was updating and then it disappeared. Restarting the app, etc. did not help. Updating the document on another device, made it appear on the other device again. So that means that it was not deleted via the replicator somehow but probably not parseable or findable. Unfortunately I wasn’t able to find any logs about this in my issue tracker of in sg_info logs.

I’m sure there were bugs fixed and improvements made from 2.8.1 to 2.8.5. But these two issues make me want to go back to 2.8.1. Is it safe to do so?

Hey @benjamin_glatzeder , can you give me a real repro case and some real logs?

This has been a bit of a rabbit hole but I’m none the wiser as of yet. Let’s go over what I found out one-by-one.

Logs

A NullPointerException is thrown when running code similar to

if (document.contains("myKey")

Before running this code and before retrieving the document from the database I log the document ID like so:

FirebaseCrashlytics.getInstance().log("document ID = " + documentId);

After retrieving the document I log if it is null

FirebaseCrashlytics.getInstance().log("document == null " + (document == null ? "yes" : "no"));

Here’s a screenshot of the logs in Firebase Crashlytics:

The document that is retrieved from the database and then updated is always the same in the screenshot. Copy-pasting the document ID in Couchbase Server’s dashboards shows the document.

Repro

Not yet. I might have seen another issue - which may have resulted from the failed document update - on one of my devices though a few days ago.
As briefly as possible about the code path where the issue arises:

  • one or more documents of type A are updated
  • live query picks up the changes and some code updates a single document of type B. That document stores progress statistics. That is the document that isn’t found in the database.

If two devices are active at the same time it is possible that

  • Device_1 updates documents of type A and syncs them
  • Device_2 gets the changes and updates the statistics document
  • Device_1 updates the statistics document
  • Possible conflicts

I gathered those warnings:

2021-05-25 22:06:11.583 E/CouchbaseLite/REPLICATOR: {N8litecore4repl6PusherE#2} Got error response to rev 'my::doc::id' #155-05455665d517246cad29eadc6cdff324000cb088 (seq #1383): HTTP 409 'Document revision conflict'
2021-05-25 22:06:11.760 D/LOG: document == null no
2021-05-25 22:06:11.771 W/CouchbaseLite/DATABASE: Parallel executor overflow: 1
2021-05-25 22:06:11.802 W/System.err: CouchbaseLiteException{CouchbaseLite,10409,'Document revision conflict
2021-05-25 22:06:11.803 W/System.err:    (CouchbaseLite Android v2.8.5-1 (CE/release, Commit/40e7b34008@c65a5127e818 Core/2.8.5 (1) at 2021-03-10T01:41:36.538Z) on Java; Android 10; Nokia 1 Plus)'}
2021-05-25 22:06:11.805 W/System.err:     at com.couchbase.lite.internal.CBLStatus.toCouchbaseLiteException(CBLStatus.java:96)
2021-05-25 22:06:11.805 W/System.err:     at com.couchbase.lite.internal.CBLStatus.toCouchbaseLiteException(CBLStatus.java:58)
2021-05-25 22:06:11.805 W/System.err:     at com.couchbase.lite.internal.CBLStatus.convertC4Error(CBLStatus.java:38)
2021-05-25 22:06:11.805 W/System.err:     at com.couchbase.lite.AbstractReplicator.documentEnded(AbstractReplicator.java:783)
2021-05-25 22:06:11.806 W/System.err:     at com.couchbase.lite.AbstractReplicator$ReplicatorListener.lambda$documentEnded$1(AbstractReplicator.java:279)
2021-05-25 22:06:11.807 W/System.err:     at com.couchbase.lite.-$$Lambda$AbstractReplicator$ReplicatorListener$AQbJ61Z6BP7-H2rKuYcxertPVV0.run(Unknown Source:6)
2021-05-25 22:06:11.807 W/System.err:     at com.couchbase.lite.internal.AbstractExecutionService$InstrumentedTask.run(AbstractExecutionService.java:94)
2021-05-25 22:06:11.807 W/System.err:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
2021-05-25 22:06:11.808 W/System.err:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
2021-05-25 22:06:11.808 W/System.err:     at java.lang.Thread.run(Thread.java:919)

As much as I tried I wasn’t able to force a “document == null yes” log. I’m planning to update the code and manually update the statistics document instead of observing changes and acting on them. This will make sure that a conflict won’t happen in this scenario.

Why 2.8.5

I assume that the issue was introduced after CBL 2.8.1 since the app code I described above hasn’t changed much in the last 2-3 years. The first app update released with 2.8.5 showed 97% of users didn’t run into a crash. App versions before that show nearly 100% crash-free users. The app update also included building for targetSdkVersion 30 (Android 11) from targetSdkVersion 29. The issue that a document retrieved from the database was not found was the most prominent crash and it appeared for the first time in this app version, too. Then I started logging the document ID. The issue persists, the document can be viewed in the dashboard of Couchbase Server and meta data doesn’t show that the document was deleted at some point in time.

Would it be safe to go back to 2.8.1? If the issue keeps popping up I’d know it’s not an issue with 2.8.5 at least.

Yeah… I cannot even figure out what “the issue” is.

You haven’t given me a stacktrace from the Exception, no way to repro, nothing. All I’ve got in the way of clues is that document.contains causes an NPE. The most likely scenario is that document is null. If it is a CBL Document, there might be other possibilites, but without a little more information, I really just have absolutely nothing to go on.

database.getDocument("docId") returns null in rare cases since updating to CBL 2.8.5 from CBL 2.8.1. This happened often enough that I noticed and this is the first time this issue appeared. The screenshot with the logs shows that the “statistics document” is not null, once, twice, suddenly it’s null. Yes, it’s a CBL document. Since this issue appeared I log the document ID in this code path to check if it is actually deleted. I check this in the Couchbase Server dashboard. There I click the Documents link, then I paste the logged document ID and the JSON content and meta data appears. The meta data doees not show that the document was deleted. Hence CBL should return the document, not null.

To answer the other question, in general it is safe to go back to a previous version provided that the newer version didn’t change any internal schema. 2.8.1 → 2.8.5 did not, and in cases other than this the way you will be able to tell is that you will immediately get an exception opening the database informing you that the database is too new to be opened by the library. I also happen to know that Android SDK 30 has been a real buzzkill in terms of things that it’s been changing without regard to anyone’s feelings :stuck_out_tongue: but I don’t know enough to say if that is going to be a factor here or not. I’d suspect not.

Was the document ID retrieved with a CBL query?

@hyling Thanks for chiming in! It’s a simple get, no query. Code with fully qualified class names would look like this:

com.couchbase.lite.DatabaseConfiguration configuration = new com.couchbase.lite.DatabaseConfiguration();
com.couchbase.lite.Database database = new com.couchbase.lite.Database("db-name", configuration);
com.couchbase.lite.Document document = database.getDocument("doc-id");

If I’m not mistaken you’re an iOS developer. If you don’t mind sharing did you already update to the latest CBL version on iOS and did you see any similar issue?

@borrrden Thanks for the info! I’ll release an update with CBL 2.8.1 soon and will report back if the issue persists.

@blake.meike I was thinking about the issue a bit longer and came up with a theory: it might be that the flash (hard disk drive) is busy and someone in the call chain doesn’t want to wait and throws an error and thus CBL returns null. Here’s a probable call chain illustration:

CBL-getDocument -- lower level code -- Android system code -- driver code -- flash (hard disk drive)

The issue with the getDcoument() call appeared now in several different hot code paths. It looks like it is always a one-off crash. E.g. there is one “getDocument()-crash” that appeared 72 times for 67 users.

The issue appeared on several Android OS version so far, not just Android 11 (SDK 30). As much as I tried I wasn’t able to reproduce the issue in the last days.

@benjamin_glatzeder Not a bad theory. I don’t think such asynchronicity is possible but I know that code isn’t bullet proof. Without a log and some code, I probably won’t be able to find out.

There are a number of ways that that method can return null. Most of them have to do with the document not being found, the database not existing, or it not being open.

@blake.meike @borrrden The last app update was released with CBL 2.8.1. The newly discovered crashes/exceptions are still present. Thus CBL 2.8.5 does not introduce the ‘getDocument()-returns-null’ issue.

I went over the few commits for the nth time between the last app update without the new issue and the app update where the issue first appeared. The only two things I can point to are the CBL update and upgrading Android SDK 29 to 30. Clearly it’s not the CBL update. I’ll let the exceptions fail gracefully from now on.

For what it’s worth I simulated clicks on three devices for over 24 hours on each device. Every second a click was simulated and a code path was executed which had these rare issues. Not one device ran into the issue.

Here’s some meta data of a document which shows it was updated over 100.00 times

    "rev": "54100233896330-1686db24aff200000000000000000000",
    "expiration": 0,
    "flags": 0,
    "type": "json"
  },
  "xattrs": {
    "_sync": {
      "rev": "128417-17b963907accea343060c47067ff3e9335000a40",
      "sequence": 246402248,
      "recent_sequences": [
        246402248
      ],
      "history": {
        "revs": [
          "128273-78657612715d10d2108baff8a46b1800e301079c",
          "128072-affa820d4f4ac7fa70a18ac35ff426fc1437c96e",
          "128219-c2bfd33c8a2cd720d4ff1369390b6a57b026fb80",
          "128058-8e5930cc45e7d62792902478a6cacee9782b7e60",
          "128272-5d762c31a0afca24cc9f0c99b81e60900cb04fad",
          "128199-0e091906069022e83b6d324b0294735b74b34e07",
          "128141-96d796850d80e2fe824fb9edaff242fdc7475b37",

@benjamin_glatzeder , as I keep saying, we are going to need some kind of documentation of this failure, to track it down… A logcat log or a stacktrace.

Here is the implementation of the method you are calling:

 public Document getDocument(@NonNull String id) {
        Preconditions.checkArgNotNull(id, "id");

        synchronized (lock) {
            mustBeOpen();
            try { return new Document((Database) this, id, false); }
            catch (CouchbaseLiteException ex) {
                // only 404 - Not Found error throws CouchbaseLiteException
                return null;
            }
        }
    }

It has been this way since 2.7.x, and possibly before that. As you can see from the comment, the code is assuming that the only reason that it will get an exception is that Core can’t find the document. From looking at this I conclude that either:

  1. your document is missing
  2. the assumption is incorrect and the error is generated in some other way.

For what it is worth, by the way, I think it is, fragile, to say the least, to assume that that error is a 404, not check it, and not log it. I’ve fixed that in recent versions.

@benjamin_glatzeder I’m glad you’re bringing these up. I certainly want to make sure this isn’t happening in my app!

Yes, I work on an iOS app which is still on CBL 2.8.0. I haven’t seen this issue but the way I retrieve documents would mask the issue. I always fetch the document id from a CBL query that selects for documents where the type property is X, documents that are null wouldn’t be return in the query so I would never see this issue. Documents could be disappearing and I wouldn’t know until a user reported it.

You said you saw this on one of you devices. Have you tried extracting the database from the device and querying for that doc id with the cblite tool? That might help isolate the problem, it has helped with several of my bugs. I’d also be interested in knowing if that doc id is returned if you query for all documents.

1 Like

So zero devices ran into the issue although trying over and over again.

I’ll make a note of the tool you linked! This definitely will come in handy when I finally run into the issue!