Unable to update 6m+ documents on community edition 3.0.1

I am trying to update 6 million+ documents in a Community Edition 3.0.1 server cluster. I am using the Java SDK and have tried various ways to read a batch of documents from a View, update them, and replace them back into the bucket.

As the process progresses, the throughput drops so low that it is not even 300 ops/s. I tried speeding it up with the bulk operation pattern (using Observables), but in vain. I even let the process run for hours, only to see a TimeoutException later.
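The bulk update pattern I tried looks roughly like this (a simplified sketch using the SDK's bundled rx.Observable; here "bucket" is the target Bucket, "ids" stands in for one batch of document IDs read from the View, and the mutation is a placeholder for my real update):

// Bulk read-mutate-replace with the Java SDK 2.x async API (RxJava 1.x).
Observable
    .from(ids)                                     // one batch of document IDs
    .flatMap(id -> bucket.async().get(id))         // fetch the documents in parallel
    .map(doc -> {
        doc.content().put("processed", true);      // placeholder mutation
        return doc;
    })
    .flatMap(doc -> bucket.async().replace(doc))   // write them back in parallel
    .toBlocking()
    .lastOrDefault(null);                          // block until the whole batch is done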

The last option I tried was to read all the document IDs from the View into a temp file, so that I could read the file back and update the records. But after 3 hours, with only 1.7m IDs read from the View, the DB throws a TimeoutException.

Note that the Couchbase cluster contains 3 servers with 8 cores, 24 GB RAM and 1 TB SSD each, and the Java code updating the data runs in the same network. There is no other load on this cluster.

It seems that even reading all the IDs from the View is impossible. I checked the network throughput, and the DB server was delivering data at barely 1 Mbps.

Below is the sample code used to read all the doc IDs from the view:

final Bucket statsBucket = db.getStatsBucket();
int skipCount = 0;
int limitCount = 10000;

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    while (true)
    {
        // Page through the view with skip/limit (this turned out to be the
        // root of the problem, as explained further down).
        ViewResult result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                .skip(skipCount)
                .limit(limitCount)
                .stale(Stale.TRUE));

        Iterator<ViewRow> rows = result.iterator();

        // No more rows: we have paged past the end of the view.
        if (!rows.hasNext())
        {
            break;
        }

        // Persist each document ID to the temp file.
        while (rows.hasNext())
        {
            out.writeUTF(rows.next().id());
        }

        skipCount += limitCount;
        System.out.println(skipCount);
    }
}
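For completeness, reading the IDs back from the temp file is the mirror image of this (a sketch; readUTF() signals the end of the file with an EOFException, which terminates the loop):

try (DataInputStream in = new DataInputStream(new FileInputStream("rowIds.tmp")))
{
    while (true)
    {
        String id;
        try
        {
            id = in.readUTF();   // IDs come back in the order they were written
        }
        catch (EOFException eof)
        {
            break;               // end of the temp file
        }

        // Placeholder: fetch, mutate and replace the document with this ID.
        System.out.println(id);
    }
}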

Is there a way to do this?

@anil, @ingenthr, can anyone check if this is a known issue?

Hey @daschl, can you advise please?

I found the solution. The ViewQuery.skip() method is not really skipping and should not be used for pagination. skip() still reads all the rows from the beginning of the view and only starts emitting output after that many rows have been read, much like traversing a linked list.

The solution is to use startKey() and startKeyDocId(). The value passed to these methods is the key/ID of the last item you read in the previous page. I got this solution from here: http://tugdualgrall.blogspot.in/2013/10/pagination-with-couchbase.html

So the final code to read all items in a view is:

final Bucket statsBucket = db.getStatsBucket();
int limitCount = 10000;
int readCount = 0;

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    String lastKeyDocId = null;

    while (true)
    {
        ViewResult result;

        if (lastKeyDocId == null)
        {
            // First page: force the index to update once, then read from the start.
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                    .limit(limitCount)
                    .stale(Stale.FALSE));
        }
        else
        {
            // Subsequent pages: seek directly to the last key read and skip just
            // that one row, instead of re-reading from the beginning of the view.
            // NOTE: this works because our view emits the document ID as the key;
            // otherwise pass the key to startKey() and the ID to startKeyDocId().
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                    .limit(limitCount)
                    .stale(Stale.TRUE)
                    .startKey(lastKeyDocId)
                    .skip(1));
        }

        Iterator<ViewRow> rows = result.iterator();

        if (!rows.hasNext())
        {
            break;
        }

        while (rows.hasNext())
        {
            lastKeyDocId = rows.next().id();
            out.writeUTF(lastKeyDocId);
        }

        readCount += limitCount;
        System.out.println(readCount);
    }
}

ah, I should have seen that! glad that you found a solution :blush:

@shreyas for reference, you can also take a look at the Paginator we have in the 1.x series: src/main/java/com/couchbase/client/protocol/views/Paginator.java on the release14 branch of the couchbase/couchbase-java-client repo on GitHub.
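From memory, usage is roughly like this ("client" is a connected 1.x CouchbaseClient; the design document and view names mirror the ones above):

// Page through a view with the 1.x client's built-in Paginator
// (it is an Iterator over ViewResponse pages).
View view = client.getView("Stats", "AllLogs");
Query query = new Query();
query.setStale(Stale.OK);   // 1.x equivalent of Stale.TRUE in the 2.x SDK

Paginator pages = client.paginatedQuery(view, query, 10000); // 10000 rows per page
while (pages.hasNext())
{
    ViewResponse page = pages.next();
    for (ViewRow row : page)
    {
        System.out.println(row.getId());
    }
}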