What is the best way to delete a large number of documents (as many as 5 million) using the Java SDK? Is simply running a single N1QL statement to delete all 5 million records in one go a good idea? How can batching be achieved with the Java SDK? I also need to determine which doc IDs were deleted successfully and which were not.
If you delete that many documents at once using N1QL you may encounter query timeouts. You also won’t get information about what was deleted or what could not be deleted.
I recommend using the asynchronous bucket API to delete by document key (AsyncBucket.remove(String id)). Create an RxJava Observable that emits the document keys you want to delete and use flatMap to emit the Observables that will delete each document.
This approach will allow you to manage the number of requests in flight, selectively retry failures, and track your progress.
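To make the retry and tracking part concrete, here is a rough sketch using only the JDK rather than the Couchbase SDK or RxJava. The names TrackedDelete, Result, deleteAll, and the retryable predicate are all hypothetical, and the remove function is a stand-in for whatever asynchronous delete-by-key call you use. It assumes the keys are distinct and that maxAttempts is at least 1. It does not limit requests in flight; that is a separate concern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Predicate;

final class TrackedDelete {
    /** Keys that were removed, plus the last error for every key that was not. */
    static final class Result {
        final List<String> deleted = new ArrayList<>();
        final Map<String, Throwable> failed = new LinkedHashMap<>();
    }

    /**
     * Removes every key, retrying only failures that `retryable` accepts,
     * for at most maxAttempts passes. `remove` is a hypothetical stand-in
     * for an asynchronous delete-by-key call.
     */
    static Result deleteAll(List<String> keys,
                            Function<String, CompletableFuture<Void>> remove,
                            Predicate<Throwable> retryable,
                            int maxAttempts) {
        Result result = new Result();
        List<String> pending = new ArrayList<>(keys);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            // Start all removals for this pass before waiting on any of them.
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (String key : pending) {
                futures.add(remove.apply(key));
            }
            List<String> retry = new ArrayList<>();
            for (int i = 0; i < pending.size(); i++) {
                String key = pending.get(i);
                try {
                    futures.get(i).join();
                    result.deleted.add(key);
                    result.failed.remove(key); // clear an error from an earlier pass
                } catch (CompletionException | CancellationException ex) {
                    Throwable cause = ex.getCause() != null ? ex.getCause() : ex;
                    result.failed.put(key, cause);
                    if (retryable.test(cause)) {
                        retry.add(key); // only retryable failures get another pass
                    }
                }
            }
            pending = retry;
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake remove: "missing" always fails with an error we choose not to retry.
        Function<String, CompletableFuture<Void>> fakeRemove = key -> {
            CompletableFuture<Void> f = new CompletableFuture<>();
            if (key.equals("missing")) {
                f.completeExceptionally(new IllegalStateException("not found"));
            } else {
                f.complete(null);
            }
            return f;
        };
        List<String> keys = new ArrayList<>();
        keys.add("doc1");
        keys.add("missing");
        Result r = deleteAll(keys, fakeRemove, t -> t instanceof TimeoutException, 3);
        System.out.println("deleted: " + r.deleted);
        System.out.println("failed:  " + r.failed.keySet());
    }
}
```

Which errors count as retryable is your call; a timeout is usually worth another attempt, while a "document not found" error is not.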
To control the number of requests in flight you’ll need to use the backpressure API in RxJava. One way is described here: Writing Resilient Reactive Applications (scroll to “Bulk Pattern, BackpressureException and Reactive Pull Backpressure”).
You can also limit in-flight delete requests by capping the number of inner Observables that the flatMap operator performing the deletions subscribes to at once. Choose an Observable.flatMap() overload that takes a maxConcurrent parameter (in RxJava 1.x, flatMap(Func1 func, int maxConcurrent)).
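If it helps to see what a cap on in-flight requests does, here is a small JDK-only sketch of the same idea. It is not RxJava and not the Couchbase SDK: a Semaphore plays the role that maxConcurrent plays in flatMap, and BoundedInFlight, removeAll, and the remove function are hypothetical names for illustration. Failures are ignored here so that the example shows only the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class BoundedInFlight {
    /**
     * Calls `remove` for every key while keeping at most maxConcurrent
     * removals outstanding. Returns once every removal has completed,
     * whether it succeeded or failed.
     */
    static void removeAll(List<String> keys,
                          Function<String, CompletableFuture<Void>> remove,
                          int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        for (String key : keys) {
            permits.acquireUninterruptibly(); // wait for a free slot
            remove.apply(key)
                  .whenComplete((ok, err) -> permits.release()); // free the slot
        }
        // All permits are available again only when nothing is in flight.
        permits.acquireUninterruptibly(maxConcurrent);
    }

    public static void main(String[] args) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Fake remove: sleeps briefly and records how many calls overlap.
        Function<String, CompletableFuture<Void>> fakeRemove = key -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            return CompletableFuture.runAsync(() -> {
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            });
        };
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            keys.add("doc" + i);
        }
        removeAll(keys, fakeRemove, 4);
        System.out.println("peak in flight: " + peak.get());
    }
}
```

With flatMap the operator does this bookkeeping for you, so in the RxJava version you would only pass the limit as the second argument.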