I am trying to update 6+ million documents in a Couchbase Server 3.0.1 Community Edition cluster. I am using the Java SDK and have tried various ways to read a batch of documents from a view, update them, and replace them back into the bucket.
It seems that as the process progresses, the throughput drops so low that it is not even 300 ops/s. I tried several variations of the bulk operation pattern (using Observables) to speed it up, but in vain. I even let the process run for hours, only to see a TimeoutException later.
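For reference, this is roughly the bulk pattern I was using (a minimal sketch; the "lastSeen" field and the updateBatch name are just illustrative):

import com.couchbase.client.java.Bucket;
import rx.Observable;

import java.util.List;

// Bulk get -> mutate -> replace via the async API (SDK 2.x / RxJava 1.x).
// "ids" is one batch of document IDs; blocks until the whole batch is written.
public static void updateBatch(Bucket bucket, List<String> ids)
{
    Observable
        .from(ids)
        .flatMap(id -> bucket.async().get(id))       // fetch each document
        .map(doc -> {
            doc.content().put("lastSeen", System.currentTimeMillis()); // example mutation
            return doc;
        })
        .flatMap(doc -> bucket.async().replace(doc)) // write it back
        .toBlocking()
        .lastOrDefault(null);                        // wait for the batch to finish
}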
The last option I tried was to read all the document IDs from the view into a temp file, so that I could later read the file back and update the records. But after 3 hours and only 1.7M IDs read from the view, the DB throws a TimeoutException.
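The plan for the read-back side was roughly this (sketch; updateBatch is the bulk routine above, and the batch size of 1000 is arbitrary):

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

// Read the IDs back from the temp file and update them in fixed-size batches.
try (DataInputStream in = new DataInputStream(new FileInputStream("rowIds.tmp")))
{
    List<String> batch = new ArrayList<>(1000);
    try
    {
        while (true)
        {
            batch.add(in.readUTF());
            if (batch.size() == 1000)
            {
                updateBatch(statsBucket, batch);
                batch.clear();
            }
        }
    }
    catch (EOFException end)
    {
        // readUTF() signals end-of-file this way
    }
    if (!batch.isEmpty())
    {
        updateBatch(statsBucket, batch); // flush the final partial batch
    }
}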
Note that the Couchbase cluster contains 3 servers with 8 cores, 24 GB RAM, and 1 TB SSD each, and the Java code doing the updates runs on the same network. There is no other load on this cluster.
It seems that even reading all the IDs from the view is impossible. I checked the network throughput, and the DB server was delivering data at barely 1 Mbps.
Below is the sample code used to read all the doc IDs from the view:
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.view.Stale;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

final Bucket statsBucket = db.getStatsBucket();
int skipCount = 0;
int limitCount = 10000;

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    while (true)
    {
        // Page through the view 10,000 rows at a time.
        ViewResult result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                .skip(skipCount)
                .limit(limitCount)
                .stale(Stale.TRUE));
        Iterator<ViewRow> rows = result.iterator();
        if (!rows.hasNext())
        {
            break; // no more rows: all IDs have been written
        }
        while (rows.hasNext())
        {
            out.writeUTF(rows.next().id()); // persist just the document ID
        }
        skipCount += limitCount;
        System.out.println(skipCount);
    }
}
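Since skip() appears to get more expensive as skipCount grows, one variation I have been considering is a single streaming query with no skip/limit at all (sketch; as I understand it, ViewResult is Iterable and streams rows as they are consumed):

ViewResult result = statsBucket.query(
        ViewQuery.from("Stats", "AllLogs").stale(Stale.TRUE));

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    // One HTTP query for the whole view, instead of repeated skip() scans.
    for (ViewRow row : result)
    {
        out.writeUTF(row.id());
    }
}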
Is there a way to do this?