Couchbase Lite document design, structure and N1QL performance

This is partly a design question and partly a performance question.

We have some lookup data that is pretty much static. However, this lookup data is huge (roughly 2 million entries), in the form of key-value pairs, and some of the values are arrays.

This lookup data is of various types. We are now designing a document structure that uses this lookup data as a reference.

We have two ways to put this data in Couchbase:

  1. Use one document per lookup. This will create 2 million documents, each probably around 100 bytes.
  2. Use one document per lookup type. Each document will probably be 10-15 MB.

Going with the second approach will reduce the number of documents to 10-15.
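
To make the two shapes concrete, here is a rough sketch in Python with plain dicts standing in for documents. Every field name (`type`, `lookupType`, `key`, `value`, `entries`) and the sample values are hypothetical, not taken from our real data. The JSON size is only a rough proxy, since the on-device encoding may differ.

```python
import json

# Approach 1: one small document per lookup entry (about 2 million of these).
# All field names and values here are hypothetical.
per_lookup_doc = {
    "type": "lookup",
    "lookupType": "country",
    "key": "IN",
    "value": "India",
}

# Approach 2: one large document per lookup type (about 10-15 of these),
# with every entry of that type held in a single array.
per_type_doc = {
    "type": "lookupSet",
    "lookupType": "country",
    "entries": [
        {"key": "IN", "value": "India"},
        {"key": "US", "value": "United States"},
    ],
}


def encoded_size(doc):
    """Return the size in bytes of the document serialized as compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
```

Calling `encoded_size` on a few representative real entries would give a quick check of the 100-byte and 10-15 MB estimates.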

The main document that uses these lookups may read roughly 100 to 200 lookups each time.

Basically, we are trying to create a completely offline mobile system that can use all of the data without connecting to a server.

Now the question is: how might this affect performance on the mobile side? How could N1QL indexes help us reduce reads?

If we use approach 1, populating the data may potentially traverse 2 million docs; if we use approach 2, it may traverse just those 10-15 docs, but it might keep those docs in memory.

Is there any guideline or solution that could help us decide before building the entire solution?

I don’t think approach 2 will work for you if you need to query those documents. Couchbase Lite 2 currently can’t index the contents of arrays, so any querying of the lookups in those big documents will have to be done by brute force, i.e. O(n).

(A lesser issue is that the replicator currently sends a modified document in its entirety (not just the fields that have changed), so large documents are expensive to replicate. But 10-15MB isn’t that bad if these docs don’t change that often.)

An intermediate approach would be to group the lookups by some criterion and put each group in one document. Then you could query on that criterion, which would be a quick indexed search, and do a linear scan for the right lookup within that document’s array. If you put, say, 100 lookups in each document you’d have 20,000 documents, which would probably give you reasonable performance.
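
As a rough illustration of that intermediate approach, here is a minimal Python sketch. It does not use the Couchbase Lite API: plain dicts stand in for documents, and the dict keyed by group number stands in for the indexed query. The grouping criterion (a CRC32 hash of type and key), the field names, and the helper names `group_id`, `build_group_docs` and `find_lookup` are all assumptions made for the example. Any criterion you can compute from the lookup you are searching for would do.

```python
import zlib
from collections import defaultdict

GROUP_COUNT = 20_000  # about 2,000,000 lookups at roughly 100 per document


def group_id(lookup_type, key, group_count=GROUP_COUNT):
    """Derive a stable group number from the lookup's type and key."""
    digest = zlib.crc32(f"{lookup_type}:{key}".encode("utf-8"))
    return digest % group_count


def build_group_docs(lookups, group_count=GROUP_COUNT):
    """Pack (lookup_type, key, value) tuples into one document per group."""
    docs = defaultdict(lambda: {"type": "lookupGroup", "entries": []})
    for lookup_type, key, value in lookups:
        gid = group_id(lookup_type, key, group_count)
        docs[gid]["group"] = gid
        docs[gid]["entries"].append(
            {"lookupType": lookup_type, "key": key, "value": value}
        )
    return dict(docs)


def find_lookup(docs_by_group, lookup_type, key, group_count=GROUP_COUNT):
    """Fetch the group document (the indexed step), then scan its array."""
    doc = docs_by_group.get(group_id(lookup_type, key, group_count))
    if doc is None:
        return None
    for entry in doc["entries"]:  # linear scan over a small array
        if entry["lookupType"] == lookup_type and entry["key"] == key:
            return entry["value"]
    return None
```

With a hash as the criterion the groups are only about 100 entries on average, not exactly 100. In a real database the `group` field is what you would index and query on, and the scan over `entries` would happen in application code once the document is loaded.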