CBL 2.6.0 Performance: saving 11,000 MutableDocuments takes 20 seconds; what am I doing wrong?

TL;DR
What is the recipe or best practice to save a large collection of MutableDocuments to a database?

Back Story
Our code makes an API call to fetch about 11,000 model classes from an API call. We then persist these into a local CBL database, and we are NOT (yet) using a sync gateway. So everything is on the device, and we’re just using CBL as our local database of choice, with the goal of setting up a sync gateway in the future.

The model class has a Couchbase MutableDocument property, which is instantiated by the particular model constructor invoked by the API consumer code, since our model is immutable. I say this so you know that by the time we pass the 11,000 models to our CBL database, the MutableDocument for each model already exists, and the only work to do is to invoke Database.Save(myMutableDocument) for each MutableDocument.

The model class has 23 attributes, most of which are simple value types; only a few are collections which contain fewer than 5 items each.

I’ve done a lot of A/B testing in the code, with and without Database.InBatch, with and without PLINQ operators, but the gist of my code is this:

myMutableDocuments.AsParallel().ForAll(mutableDocument => database.Save(mutableDocument))

Also, we’re doing this in the background, on a brand new, empty database. Only after this temporary database is built do we swap it with the existing database file so that our repository classes can see it.

The 20 seconds is on older devices like iPhone 6, but even on newer devices it’s like 10-15 seconds. That just can’t be right…

Do you have indexes ? In Couchbase Lite 2.x, indexing is done eagerly at the time of update so reads are extremely performant. If you have bulk updates, it may be cheaper to create it after the update …

Only after this temporary database is built do we swap it with the existing database file so that our repository classes can see it.

Hmm… (unrelated to issue) but didn’t quite understand why you have a temporary database …Why are updates not made directly to the primary one ?

A Database instance is internally single-threaded, so trying to add the documents in parallel is a bad idea. You’ll create a lot of lock contention and keep multiple cores busy for no purpose.

Do you need to keep all the MutableDocuments in memory at once? That’s going to increase memory usage, which can itself reduce performance. It would be more efficient to create and save one document at a time, within a batch.

If you’re still having performance problems, is there a way you can profile the app during this operation and see where the time goes? Instruments is a great tool for this on iOS; it looks like you’re using .NET, but Instruments should still produce useful info by showing the stack traces inside the Couchbase Lite Core C++ library.

Hi @priya.rajagopal, yes we have some indexes, but I create them after all of the documents have been inserted, to avoid the per-insert penalty.

If we can find a way to persist 11,000 records in less than 2-3 seconds, we would happily abort our approach of building a brand new “temporary” database in the background, and then swapping it for the “live” database seen by the user. We just can’t disrupt or delay the user experience for 20 seconds on mobile.

If you use a separate Database instance on a background thread for the insertions, your regular Database instance will continue to be useable for reads. It will block on writes, though. If that’s a problem, you could have the insertion task insert docs in smaller batches so it’s only holding the write lock for a short time.

Hi @jens, thanks for the clarification and guidance on threading!

Our Refit endpoint has already returned 11,000 models in memory to upstream code, before we’re downstream at the database level. So far, memory pressure hasn’t been a significant issue from some of our profiling.

I switched over to a physical Android Pixel C tablet, since it’s better hardware than an iPhone 6. Taking your advice about single threading, here’s the updated method, which takes just under 11 seconds on Pixel C, so roughly 1 second per 1000 documents:

                _logger.Debug($"Starting to save {documents.Count()} documents...");
                var stopwatch = System.Diagnostics.Stopwatch.StartNew();
                try
                {
                    database.InBatch(() =>
                    {
                        documents
                            .ForEach(mutableDocument =>
                            {
                                try
                                {
                                    database.Save(mutableDocument);
                                }
                                catch (Exception e)
                                {
                                    _logger.Error($"Exception when saving document: ", e);
                                }
                            });
                    });
                }
                catch (Exception e)
                {
                    _logger.Error($"Exception when saving documents: ", e);
                }

                stopwatch.Stop();
                _logger.Debug($"Saved documents to database in {stopwatch.ElapsedMilliseconds}ms");

I should also note that we’re not using any listeners on the database; it’s literally just our persistent offline storage.

So on a data refresh, if the existing database has 11k documents, and I save 11k new documents (new objects in memory), how does CBL handle change detection? Will it just blindly upsert all 11k documents? Or, for each Save invocation, will it quickly abort if some kind of equality check determines that the document already exists? If so, is that equality check based on the document ID, or is it customizable? We’re already explicitly assigning a unique value as the Couchbase ID for every document.

Or, is the recommend approach for us to do the diffing, and determine which documents needs to be updated, and then just save only those documents?

As expected, using different batch sizes of documents, per database transaction, doesn’t change anything. Ultimately, inserting 11k documents consecutively is going to take about the same amount of time, whether you do it in batches or not.

From the logs on Pixel C; each of these were done in a separate app session:

Saved 10935 documents in batches of 50 to database in 11294ms
Saved 10935 documents in batches of 200 to database in 10938ms
Saved 10935 documents in batches of 1000 to database in 10471ms
Saved 10935 documents in batches of 3000 to database in 10733ms

**Disclaimer: CBL user **

I guess it is worth comparing it to a baseline app. Here’s a single Activity creating 10,000 tiny documents:

public class MainActivity extends AppCompatActivity {

    private static final String DATABASE_NAME = "db";
    private final String TAG = "CblDemo";
    private Database database;
    private int noOfDocuments = 10000;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        findViewById(R.id.button).setOnClickListener(v -> new Thread(this::createDocumentsInThread).start());
        findViewById(R.id.button2).setOnClickListener(v -> new Thread(this::createDocumentsInThreadInBatch).start());
    }

    private void createDocumentsInThread() {
        long t0 = System.currentTimeMillis();
        /* Create database */
        try {
            createDatabase();
        } catch (CouchbaseLiteException e) {
            e.printStackTrace();
        }
        long t1 = System.currentTimeMillis();
        Log.d(TAG, "time in ms to create database = " + (t1 - t0));
        for (int i = 0; i < noOfDocuments; i++) {
            try {
                createDocument(i);
            } catch (CouchbaseLiteException e) {
                e.printStackTrace();
            }
        }
        long t2 = System.currentTimeMillis();
        Log.d(TAG, "time in ms to store documents = " + (t2 - t1));
    }

    private void createDocumentsInThreadInBatch() {
        long t0 = System.currentTimeMillis();
        /* Create database */
        try {
            createDatabase();
        } catch (CouchbaseLiteException e) {
            e.printStackTrace();
        }
        long t1 = System.currentTimeMillis();
        Log.d(TAG, "time in ms to create database = " + (t1 - t0));
        try {
            database.inBatch(() -> {
                for (int i = 0; i < noOfDocuments; i++) {
                    try {
                        createDocument(i);
                    } catch (CouchbaseLiteException e) {
                        e.printStackTrace();
                    }
                }
                long t2 = System.currentTimeMillis();
                Log.d(TAG, "time in ms to store documents = " + (t2 - t1));
            });
        } catch (CouchbaseLiteException e) {
            e.printStackTrace();
        }
    }

    private void createDatabase() throws CouchbaseLiteException {
        CouchbaseLite.init(MainActivity.this);
        DatabaseConfiguration configuration = new DatabaseConfiguration();
        database = new Database(DATABASE_NAME, configuration);
    }

    private void createDocument(int i) throws CouchbaseLiteException {
        MutableDocument mutableDocument = new MutableDocument();
        mutableDocument.setInt("no", i);
        database.save(mutableDocument);
    }
}

Layout

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <Button
        android:id="@+id/button"
        android:layout_width="0dp"
        android:layout_height="wrap_content"
        android:layout_marginStart="16dp"
        android:layout_marginTop="16dp"
        android:layout_marginEnd="16dp"
        android:text="Start!"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/button2"
        android:layout_width="0dp"
        android:layout_height="wrap_content"
        android:layout_marginStart="16dp"
        android:layout_marginTop="16dp"
        android:layout_marginEnd="16dp"
        android:text="In Batch!"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@id/button" />
</androidx.constraintlayout.widget.ConstraintLayout>

gradle

apply plugin: 'com.android.application'

android {
    compileSdkVersion 29
    buildToolsVersion "29.0.2"
    defaultConfig {
        applicationId "com.cbl.demobulkinsert"
        minSdkVersion 21
        targetSdkVersion 29
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}
dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'androidx.appcompat:appcompat:1.1.0'
    implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'androidx.test.ext:junit:1.1.1'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
    implementation 'com.couchbase.lite:couchbase-lite-android:2.6.2'
}

Test device: Xiaomi Mi A2 (~160 USD on Amazon.com). App was reinstalled between test runs. Here are my test results:

Debug apk
Normal insert
D/CblDemo: time in ms to create database = 47
D/CblDemo: time in ms to store documents = 5200
In batch insert
D/CblDemo: time in ms to create database = 74
D/CblDemo: time in ms to store documents = 1466

Release apk
Normal insert
D/CblDemo: time in ms to create database = 46
D/CblDemo: time in ms to store documents = 5035
In batch insert
D/CblDemo: time in ms to create database = 53
D/CblDemo: time in ms to store documents = 1092

Inserting documents in batch increases performance by a lot! Running the test with a release build shows a further performance increase. I think it all comes down to how complex the documents are and how big their body is. This also includes field names, e.g. field name ‘s’ vs field name ‘street_name’. Writing this I found another performance increase.

// Replacing this line
MutableDocument mutableDocument = new MutableDocument();
// with
MutableDocument mutableDocument = new MutableDocument(String.valueOf(i));

This may not be recommended but it does improve insert times. Could you share your numbers running the code above? Or could you share demo code so I can do a test run?

Hi @benjamin_glatzeder, thanks for pitching in to help!

When I said “batch” above, I meant “partitioning” the 11k models into smaller “batches”, iterating over the batches, and then saving each batch of documents inside of a CBL database transaction. Yes, using a CBL transaction is faster for inserts, and we’re already doing that. I was just saying that if you do several units of work consecutively instead of concurrently, there’s very little you can gain from the order in which you structure consecutive units of work.

By way of interest, here are some updated timings which confirm that using a CBL database transaction is 24% - 30% faster:

Saved 10935 documents in batches of 50 to database in 11294ms which is 24% faster than 14910ms by using a transaction
Saved 10935 documents in batches of 200 to database in 10938ms which is 27% faster than 14980ms by using a transaction
Saved 10935 documents in batches of 1000 to database in 10471ms which is 35% faster than 16049ms by using a transaction
Saved 10935 documents in batches of 3000 to database in 10733ms which is 30% faster than 15413ms by using a transaction

Here is a single model with real-world data that has been obfuscated without changing the character count of the JSON; it’s 1375 characters. We’re already explicitly passing the entryGuid value as the first parameter of the MutableDocument constructor:

{
  "entryGuid": "7kwb5269-28a8-4c23-b90c-37c53867dcc7",
  "type": "person",
  "firstName": "Andy",
  "lastName": "Huntsman",
  "displayName": "Huntsman, Andy",
  "organizationName": "Huntsman, Andy",
  "locationGuid": "baf01c9f-18fb-4b60-b709-93051143696a",
  "siteGuid": "2fdba5c0-d6dc-468f-9046-1cab227fab8f",
  "siteName": "Main Site",
  "userStatus": "Regular User",
  "departmentName": "Information Technology > Apps",
  "residenceArea": "Anytown - NY (USA)",
  "workAreaName": "Office 1",
  "contacts": [
    {
      "type": "Work",
      "method": "Phone",
      "contact": "58119"
    },
    {
      "type": "Work",
      "method": "Fax",
      "contact": "58419"
    },
    {
      "type": "Work Alternate",
      "method": "Phone",
      "contact": "51000"
    },
    {
      "type": "Personal",
      "method": "Mobile",
      "contact": "+1 8885551212"
    },
    {
      "type": "Personal",
      "method": "Email",
      "contact": "andrewhuntsman@acme.com"
    },
    {
      "type": "Work",
      "method": "Email",
      "contact": "AJHUNTSMAN@acme.com"
    }
  ],
  "spouseEntryGuid": "2281d49d-3e84-4f5c-be90-00a505c011f2",
  "personPrimaryPhotoGuid": "3510be56-47c6-4546-8954-9116743f2efc",
  "primaryPhotoUrl": "https:\/\/myphotoserver.blob.core.windows.net\/primaryphotos\/8891be56-47c6-4546-8954-9116743f2efc?sv=2018-03-28&sr=b&sig=vJTpHXMYefdVrehAo%2FJiDHNnq60rMlwk3d4QHZQSRmk%3D&se=2019-11-15T23%3A00%3A48Z&sp=r",
  "emergencyContactGuids": [ ],
  "teamName": "Blue Team",
  "lastUpdated": "0001-01-01T00:00:00+00:00",
  "expirationDate": "0001-01-01T00:00:00+00:00",
  "couchbaseType": "Person",
  "couchbaseId": "7kwb5269-28a8-4c23-b90c-37c53867dcc7"
}

1000 inserts/sec is around what I’d expect, within an order of magnitude at least. It depends a lot on the speed of the SSD/flash storage, and on CPU. Do you have tools to profile and see where the time is going?

how does CBL handle change detection?

Our docs should explain that. We use MVCC. If you create a new Document and save it, you’ll fail with a conflict error if a doc with that ID already exists, unless you use the “always overwrite” save mode. To update an existing doc, you have to get the document first, then modify it, then save. (Again, unless you want to risk using the always-overwrite mode, which I recommend avoiding. We really just added it as a concession to people who don’t want to think about concurrency issues.)

Just to follow up quickly. I was able to create a Java POJO with a converter fast and parsing the json to a POJO, too. But writing mutable doc statements, e.g. mutableDoc.setString("entryGuid", pojo.getEntryGuid); takes too much time for me at the moment. If you could provide these statements I’m happy to benchmark with my devices.

In any case I took a shortcut and wrote a 2000 character lorem ipsum text to a single field, like so:

MutableDocument mutableDocument = new MutableDocument();
mutableDocument.setString("text", text);
database.save(mutableDocument);

and all this 10,000 times. Xiaomi Mi A2 = under 2 seconds, Xiaomi Mi A1 under 4 seconds, Nexus 5 2013 under 5 seconds. Just did another test with a relatively new Samsung tablet: Samsung Galaxy Tab A 10.5. This was consistently the slowest: below 6 seconds.
Again I’m happy to benchmark with the exact POJO and share the updated code and results!

Edit: only tested with debug code. Release build should be a bit faster.

Thanks @jens, we definitely see a correlation between performance and device hardware, which isn’t surprising. The time is literally just for saving 11k pre-existing MutableDocument objects. Here’s the method:

            if (documents?.Any() ?? false)
            {
                var batchSize = 3000;
                _logger.Debug($"Starting to save {documents.Count()} documents in batches of {batchSize}...");
                var stopwatch = System.Diagnostics.Stopwatch.StartNew();
                try
                {
                    documents
                        .InBatchesOf(batchSize)
                        .ForEach(documentBatch =>
                        {
                            database.InBatch(() =>
                            {
                                documentBatch
                                    .ForEach(mutableDocument =>
                                    {
                                        try
                                        {
                                            database.Save(mutableDocument);
                                        }
                                        catch (Exception e)
                                        {
                                            _logger.Error($"Exception when saving document: ", e);
                                        }
                                    });
                            });
                        });
                }
                catch (Exception e)
                {
                    _logger.Error($"Exception when saving documents: ", e);
                }

                stopwatch.Stop();
                _logger.Debug($"Saved {documents.Count()} documents in batches of {batchSize} to database in {stopwatch.ElapsedMilliseconds}ms");
            }

Thanks, @benjamin_glatzeder but it’s probably safe to assume that raw character count alone is not an accurate apples-to-apples comparison, since there’s likely some performance penalty relative to the number of attributes in the model.

We never mutate any of our Couchbase models, as the data is read-only and immutable. We just take the models from our API endpoint, create a MutableDocument for each one, and save them to CBL. That’s it.

The creation of the MutableDocument is NOT part of any of our timings, and is done before we go to the database, so it’s not a factor in any of our performance benchmarks or concerns.

Here’s the method in our model which creates a MutableDocument:

                new MutableDocument(EntryGuid.ToString(), new Dictionary<string, object>
                {
                    {JsonProperties.EntryGuid, EntryGuid.ToString()},
                    {JsonProperties.EntryType, EntryType},
                    {JsonProperties.FirstName, FirstName},
                    {JsonProperties.LastName, LastName},
                    {JsonProperties.MaidenName, MaidenName},
                    {JsonProperties.DisplayName, DisplayName},
                    {JsonProperties.OrganizationName, OrganizationName},
                    {JsonProperties.LocationGuid, LocationGuid.ToString()},
                    {JsonProperties.SiteGuid, SiteGuid.ToString()},
                    {JsonProperties.SiteName, SiteName},
                    {JsonProperties.UserStatus, UserStatus},
                    {JsonProperties.DepartmentName, DepartmentName},
                    {JsonProperties.ResidenceArea, ResidenceArea},
                    {JsonProperties.WorkAreaName, WorkAreaName},
                    {JsonProperties.TeamName, TeamName},
                    {JsonProperties.Contacts, Contacts.Select(contact => contact.ToDictionary()).ToList()},
                    {JsonProperties.SpouseEntryGuid, SpouseEntryGuid.ToString()},
                    {JsonProperties.PersonPrimaryPhotoGuid, PersonPrimaryPhotoGuid.ToString()},
                    {JsonProperties.PrimaryPhotoUrl, PrimaryPhotoUrl},
                    {JsonProperties.EmergencyContactGuids, EmergencyContactGuids.Select(guid => guid.ToString()).ToList()},

                    {ICouchbaseDocumentExtensions.JsonProperties.CouchbaseType, CouchbaseType.ToString()},
                    {ICouchbaseDocumentExtensions.JsonProperties.CouchbaseId, EntryGuid.ToString()},
                    {IPerishableDataExtensions.JsonProperties.LastUpdated, LastUpdated},
                    {IPerishableDataExtensions.JsonProperties.ExpirationDate, ExpirationDate}
                });

Andy, I’ll repeat that I’d love to see some profiling of your code. I’m not sure whether the interesting bits will be in the managed or native code, and there’s probably no profiler that can work on both, so maybe try both .NET profiling (if it exists for iOS?) and Instruments.

I’m currently away on vacation, but I’ll see what I can do when I return towards the end of the month.

Hi All,

Here is a simple benchmarking app that tests the scenario discussed in this thread: https://github.com/ryanholden8/CouchbaseBenchmark

Please feel free to pull down and test.

Found this interesting benchmark on Realm’s site: MongoDB For Mobile Use Cases | MongoDB
If you click the Inserts tab you’ll see 1.7K inserts per second for Couchbase.
Which is about what we are seeing as well when we look at 14 seconds (includes the 2 seconds for document creation) for 11k on our iPad Air vs 1.7k second inserts for their test on their iPad Air.
Realm’s benchmarks are from 2014.

Here are some timings from the benchmarking app:

iPhone SE - iOS 13.1.3

Delete DB: 0 ms
Create New DB: 21 ms
Create 11 Indexes: 91 ms
Create Models: 43 ms
Create 11000 Documents: 1016 ms
Batch Save 11000 Documents: 8093 ms

iPhone 6 - iOS 12.4

Delete DB: 0 ms
Create New DB: 45 ms
Create 11 Indexes: 559 ms
Create Models: 684 ms
Create 11000 Documents: 3904 ms
Batch Save 11000 Documents: 20690 ms

iPad Air - iOS 12.4

Delete DB: 0 ms
Create New DB: 34 ms
Create 11 Indexes: 231 ms
Create Models: 263 ms
Create 11000 Documents: 2308 ms
Batch Save 11000 Documents: 12621 ms

Pixel C - Oreo 8.1.0

Delete DB: 0 ms
Create New DB: 18 ms
Create 11 Indexes: 236 ms
Create Models: 126 ms
Create 11000 Documents: 912 ms
Batch Save 11000 Documents: 18955 ms

Realm’s benchmark results were for Couchbase Lite 1.0 or 1.1, which was admittedly much slower, and the implementation of their benchmark for Couchbase Lite was extremely inefficient since they didn’t make use of indexing (map/reduce views.)

It’s worth pointing out that your benchmark is specifically for the .NET version of Couchbase Lite. While the different platforms share core code, there’s overhead introduced by the bindings to the platform APIs, especially when those cross the native/managed barrier. So performance is likely to differ in Java, Objective-C and Swift.

There’s a lot of code in the Shared directory of your repo. Could you describe what the benchmark does at a high level? I can see it has custom data model classes which it then encodes to CBL Documents; what does it do after saving them?

Oh also, you can glean a lot of insight by running the iOS version of the app in Instruments. It won’t help with the C# code but it will show how much time is spent in the various parts of the C++ code in Couchbase Lite Core.

(I’d offer to do this but I don’t have a .NET development environment nor any experience with it.)

I’m happy to run a benchmark and share the results. Could you please make a release apk available. I lack the knowledge of setting up a C# developer IDE.

The end product of the build is the same as a native build (.ipa or .app) so if OP were to post a build I think you’d be able to run Instruments on it right (or will signing prevent this)?

EDIT The only caveat here is that everything in the .NET runtime is necessarily AOT compiled to run on iOS so all .NET stuff will show up as native code and might be hard to reason about.