Hello,
We are considering of using Couchbase database. I have question about how we could achieve following use-case using this storage engine:
- we would like to store many (millions) documents per company
- each document has external unique identifier
- there is dynamic list of companies - companies can be added but also removed
- as list of companies can change dynamically we would prefer to have one bucket for all those documents and distinguish somehow to which company particular document belongs
- documents are retrieved using unique identifier - main use case
- any company can ask to export all its document - very rare operation
- in case of company removal we should delete all documents that belong to given company - very rare operation
From above list of requirements we see that using single bucket where document is referred by its unique identifier would handle main use case by just using Key Value Couchbase operations.
Anyway for mentioned two very rare operations: company documents export and company documents removal we wouldn’t be probably able to achieve functionality without adding some indexes/views etc. am I right?
It leads us to two questions:
-
which indexing mechanism use: Global Secondary Index or View?
- Views couldn’t be queried using N1QL as far as I understand
- Primary index of N1QL will allow you to query any field of your JSON - as we would like to query only one property it looks like too wide and too expensive
- GSI - looks like option which would support mentioned requirements and provide lower overhead in compare with above features
- we could even consider creating ad-hoc partial SGI index dedicated to tenant just for the time of export/removal
- Is GSI still doesn’t self replicate among nodes? I’m referring to Differences between primary and secondary Indexes? - #17 by cihangirb “we don’t support replication/replicas with GSI but you can manually do this. The way to do this would be to create the 2 identical indexes on 2 separate nodes (a.k.a mirror the index). If one node fails and takes out one copy of the index, all traffic will hit the remaining available index.”
-
where store information about company name?
- We could store it as part of document identifier
doc-${COMPANY_NAME}-${DOCUMENT_ID}: { "someDocumentKey": "someDocValue" }
it would allow us to query with N1QL
LIKE
statement (Indexing Metadata Information | Couchbase Docs)// get all company document IDs SELECT META().id FROM bucketName WHERE META().id LIKE "%COMPANY_NAME%"
- alternatively we can put info about company name in custom field
doc-${DOCUMENT_ID}: { "companyName": "XYZ", "someDocumentKey": "someDocValue" }
and then query against this field
Is there any performance differentiator in case when we will query against part of document identifier instead of custom field in document?
Would both cases should result in the similar overhead in terms of additional storage/rebalance times/speed of solution?
Could you guide us which Indexing mechanism would be better and how we should approach storing info about company name to support such use case? Or maybe you see different solution that we could try?
Thanks in advance for any insights and info !