I’m trying to delete roughly 50 million documents using N1QL. I’ll be doing this across multiple queries, in batches. I tested with a batch size of 10k and it finished in about 5 hours.
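For context, each batch is just a plain N1QL DELETE with a LIMIT, fired at the query service. The sketch below shows the rough shape of one batch; the host, credentials, bucket name and filter predicate are placeholders, not our real schema:

```bash
# One delete batch of at most 10k documents, posted to the N1QL query service.
# Host, credentials, bucket name and WHERE predicate are placeholders.
curl -s -u Administrator:password http://localhost:8093/query/service \
  --data-urlencode 'statement=DELETE FROM `my_bucket`
                    WHERE type = "obsolete_event" AND created_at < "2019-01-01"
                    LIMIT 10000'
```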
My question is: if we delete so many documents in such a short duration, how will it affect Couchbase’s re-indexing? Will it make Couchbase slow? If so, what can I do to avoid that?
Thanks in advance!
PS, to summarize: will firing so many deletes in quick succession lead to any performance issues in Couchbase? Will it become slow to respond to other queries?
Our website uses Couchbase heavily, and these deletes will be performed while the site is up, so I’m worried we’ll see a performance hit in Couchbase.
May I know which version of Couchbase Server you are using? Is it Community Edition or Enterprise Edition?
All document operations - create, update and delete - lead to index updates. With Couchbase, index updates happen asynchronously, meaning a delete operation returns before it is reflected in the index. After you delete 50 million documents, it will take some time for those deletions to be reflected in the index. That said, if you are using Enterprise Edition, those deletes will be reflected in the index fairly quickly, because Enterprise Edition uses a much faster index storage engine than Community Edition.
There can still be some delay before the deletions are reflected in the index. To ensure that a scan doesn’t return stale data, you can use a consistent scan.
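For example, here is a sketch of a consistent scan issued through the query REST service (host, credentials, bucket and predicate are placeholders). With request_plus scan consistency, the scan waits until the index has caught up with all mutations made before the query was submitted:

```bash
# Consistent scan: wait for the index to catch up with all mutations
# made before this query was submitted (request_plus).
# Host, credentials, bucket and predicate are placeholders.
curl -s -u Administrator:password http://localhost:8093/query/service \
  --data-urlencode 'statement=SELECT COUNT(*) AS remaining FROM `my_bucket` WHERE type = "obsolete_event"' \
  --data-urlencode 'scan_consistency=request_plus'
```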
In any case, it is good to batch the deletions in smaller sets so that consistent scans don’t slow down (especially if you are using Community Edition). I suggest waiting for some time after processing one delete batch before triggering the next, so that the deletions don’t adversely affect scan requests.
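Something along these lines would work as a throttle. This is only a sketch - the credentials, bucket, predicate, batch size and pause are placeholders you would tune for your own workload:

```bash
#!/usr/bin/env bash
# Sketch of a throttled delete loop: run one LIMIT-ed delete batch,
# pause, and repeat until no more documents match.
# Credentials, bucket, predicate, batch size and pause are placeholders.
QUERY_URL="http://localhost:8093/query/service"
STATEMENT='DELETE FROM `my_bucket` WHERE type = "obsolete_event" LIMIT 10000'

while true; do
  deleted=$(curl -s -u Administrator:password "$QUERY_URL" \
              --data-urlencode "statement=$STATEMENT" |
            jq -r '.metrics.mutationCount // 0')
  echo "deleted $deleted documents in this batch"
  [ "$deleted" -eq 0 ] && break   # nothing left to delete
  sleep 30                        # give the indexer time to catch up
done
```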
That said, CE also works well with millions of documents being indexed, but EE gives better guarantees around consistent (sub-10 ms) latency and consistent throughput.
We are using Enterprise Edition 6.0.2. I understand the re-indexing delay and I’m fine with that.
My question is (perhaps I should have phrased it better): will firing so many deletes in quick succession lead to any performance issues in Couchbase? Will it become slow to respond to other queries?
Our website uses Couchbase heavily, and these deletes will be performed while the site is up, so I’m worried we’ll see a performance hit in Couchbase.
The CPU available to the indexer process is shared between (1) the task that indexes incoming deletions (or creations/updates) and (2) the task that serves scans.
A burst of incoming deletions will certainly need some CPU to process the deletes. But if there is enough CPU left for scans (i.e. your CPU is not already saturated without the deletes), there will be very little impact on scan latency and throughput.
I suggest trying this out in a non-production setup first to understand the impact it may have on scans. Also, if possible, perform the deletes during off-peak hours.
By the way, what type of index are you using - Memory-Optimized or Standard GSI?
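If you are not sure, the configured GSI storage mode should show up in the cluster’s index settings. A sketch, assuming the REST settings endpoint is reachable in your deployment (host and credentials are placeholders):

```bash
# Check the GSI storage mode configured on the cluster.
# Host and credentials are placeholders.
curl -s -u Administrator:password http://localhost:8091/settings/indexes
# In the response, "storageMode": "memory_optimized" indicates MOI;
# "plasma" (EE) or "forestdb" (CE) indicate Standard GSI.
```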
We have enough CPU capacity. Utilization usually stays under 40% in production, and our deletion process, when run on a non-prod environment, used around 20% CPU. So both together in production should still be around 60%.
We are using Memory Optimized GSI.
So what I understand from your reply is that (given enough CPU and MO-GSI) re-indexing will not hamper scan latency beyond a negligible amount. Is that the right conclusion?
Yes, that’s the right conclusion. It’s good to know that the experiment on the non-prod environment was successful. Also, having a memory-optimized index helps, as it does not require any disk I/O for index updates. This eliminates another dimension that could affect performance.
That said, I still suggest being “extra cautious” and deleting the documents in batches, so that there are no surprises.
In case someone else is deleting a huge number of documents and wondering about indexing, fragmentation and CPU: our deletion process completed successfully on our Enterprise Edition 6.0.2 cluster, which uses an MO-GSI index. The process took under 3 hours to delete 30 million records; we used N1QL to fire delete queries with a batch size of 5k.