Dotnet bulk subdocument API

Is there a way to update multiple documents at once using the subdocument API? I need to update a single field in a large number of documents, and it seems incredibly wasteful to

  1. Call the subdoc API thousands of times
    or
  2. Send the entire document in a batch operation when all I’m updating is a single field.

Would it be better to run this as a query string instead?
UPDATE bucket USE KEYS […, …, …, …, …] SET doc.field = …

Or can multiple docs be updated at once using the subdoc API?

var queryResult = await buck.MutateIn<Annotation>(annot0)
    .MutateIn<Annotation>(annot1)
    .MutateIn<Annotation>(annot2)
    .MutateIn<Annotation>(annot3)
    <.....>
    .Upsert("s", "B", false)
    .ExecuteAsync();

Hey @lukeb

There’s no way to update multiple documents at once with Sub-Document, no. Bear in mind that a query, under the hood, will still ultimately be doing a KV update on all of those documents, so it likely won’t really gain you anything over just doing the multiple Sub-Document calls.

Couchbase is a high-performance key-value store designed to operate at scale, and personally, given reasonable hardware, I wouldn’t be at all concerned about a few thousand calls.
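
For reference, the per-document approach is just a loop of independent MutateIn calls, each a single small KV operation that sends only the one field. A rough sketch against the 2.x API (the method name and the bucket/key parameters are placeholders; the "s"/"B" values are mirrored from your snippet):

using System.Collections.Generic;
using System.Threading.Tasks;
using Couchbase.Core;

// One sub-document mutation per key; each call only sends the single field,
// not the whole document.
public static async Task UpdateOneByOneAsync(IBucket bucket, IEnumerable<string> annotationKeys)
{
    foreach (var key in annotationKeys)
    {
        var frag = await bucket.MutateIn<dynamic>(key)
                               .Upsert("s", "B", false)
                               .ExecuteAsync();
        if (!frag.Success)
        {
            // handle or retry the failed mutation here
        }
    }
}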

I’m more concerned with the thousand+ round trips to the database to send the requests. I didn’t realise Couchbase was internally sending them one-by-one even with the bulk operations. It seems like, to minimize the round trips, wrapping it up into a query string is the best way to go?

As a side note, is there a limit to the size of a query string that I can send through the API? I can’t find any information on this. My USE KEYS […, …, …, …,] could potentially be really large; is there a risk of it failing at some point?

Whether you have a single node or many nodes (in which case, of course, you’d have to break up the requests), the Sub-Document operations are always pipelined for efficiency. You can avoid multiple roundtrips by following some of the techniques that maximize the pipelining in the batching operations section of the docs.

And yes, there is a limit to the query size. That approach would also be less efficient, since the entire statement would need to be parsed, then there is distribution among the nodes (again, assuming a multi-node cluster). I don’t remember the max size off the top of my head, but @Marco_Greco would.

Thanks for the link to those docs; for some reason that didn’t come up in any of my searches. It’s not 100% clear to me how to take advantage of batching, though. This statement:
“When using an SDK in an asynchronous (non-blocking) model, all requests are inherently batched.”
suggests that all I need to do is make sure I’m using ExecuteAsync() on the subdoc API and batching will be handled internally?

But then why do we need to manually batch, as demonstrated here?
https://docs.couchbase.com/dotnet-sdk/2.7/batching-operations.html#batching-operations-using-sdk
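
In other words, is the batching just a matter of kicking off all the ExecuteAsync() calls first and awaiting them as a group, something like this? (rough sketch, not taken from that page; the method name, bucket and keys are placeholders):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Couchbase.Core;

// Start every sub-document mutation without awaiting it individually, so the
// SDK can pipeline many requests on its connections, then await the whole group.
public static async Task UpdateBatchedAsync(IBucket bucket, IEnumerable<string> keys)
{
    var tasks = keys.Select(key => bucket.MutateIn<dynamic>(key)
                                         .Upsert("s", "B", false)
                                         .ExecuteAsync())
                    .ToList();

    var results = await Task.WhenAll(tasks);

    foreach (var failed in results.Where(r => !r.Success))
    {
        // handle or retry failed mutations here
    }
}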

Maximum query size is 64 MB, but you also have to take into account that after you parse a USE KEYS clause that large, you then have a huge parse tree and an even bigger plan.
The execution layer would then have an evaluated array of keys of probably 8 million entries, so if you go that route and use the max statement size, you could actually blow up memory.
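
If you do go the query route, chunking the key list into several smaller statements avoids both the statement-size limit and the giant in-memory key array. A rough sketch (the chunk size, bucket name, field and value are placeholders; this assumes the 2.x QueryRequest API with named parameters):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Couchbase.Core;
using Couchbase.N1QL;

// Send the UPDATE in chunks so no single statement, parse tree or key array gets huge.
public static async Task UpdateInChunksAsync(IBucket bucket, IReadOnlyList<string> keys)
{
    const int chunkSize = 500; // arbitrary; tune for your workload

    for (var i = 0; i < keys.Count; i += chunkSize)
    {
        var chunk = keys.Skip(i).Take(chunkSize).ToArray();

        var request = new QueryRequest(
                "UPDATE `bucket` USE KEYS $keys SET s = $value")
            .AddNamedParameter("keys", chunk)
            .AddNamedParameter("value", "B");

        var result = await bucket.QueryAsync<dynamic>(request);
        if (!result.Success)
        {
            // inspect result.Errors and retry or abort here
        }
    }
}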