Deleting 100k documents takes ~30 seconds

Hey, I am currently testing how long it takes to delete ~100K documents. The documents are indexed, and the deletion takes around 30 seconds. I can see in the query plan that ‘SendDelete’ actually takes 98% of the query time. Is there any way to speed it up? I am currently testing it directly on Couchbase Server, but I am planning to expose a REST API for this operation using Spring Data or the Java SDK.

I am using this query:

DELETE
FROM `bucket`.`scope`.`collection`
WHERE field1 = 'field1Value' AND _class = "javaClass"

The index is, of course, created on field1 and _class.

What’s your server version?

It is 7.6.2 and 7.2.4

OK, 7.6 certainly has parallel mutations, which ought to speed things up. I presume the profile/explain is showing simply the index scan feeding a filter and then SendDelete? (i.e. no fetch; there shouldn’t be one if the query is covered by your index and you’re not using a RETURNING clause)

Some of the performance will of course be contingent on the speed of the data service - so how much memory your collection can use, how much is cached etc. - and of course the size of the data.

Locally on a fairly small laptop (8 GiB RAM) I can delete 100k documents filtering on two indexed fields in < 5s; each document being ~ 120 bytes. Most of the time - as expected - is in “servTime” for SendDelete: the time it takes the data service to effect the deletions.

Since you’re testing, can you compare fetching the 100k keys (select meta().id ....) and deleting directly from the data service (using the relevant SDK API) perhaps ?


Yes, explain is showing only the index scan, filter and SendDelete (which is taking 98% of the time).

Data is fairly small - each document weighs about 254 bytes.

Locally, on Couchbase in Docker, it takes 10 seconds to delete the data.

The problem is on the Capella server (7.6.4) - deletion takes 30 seconds. For fetching 100k keys, Capella displays that it took 565 ms, but the data is displayed as truncated JSON after around 6 seconds.

I presume your Capella cluster is 7.2.4 - since we’ve not released 7.6.4 yet. :slight_smile:

7.2.4 doesn’t have the parallelism in the SendDelete. It will be slower than 7.6 as a result. You will possibly be better off fetching the 100k keys into your client and deleting in parallel (probably up to about 4 streams). [edit: Using the direct interface to remove the documents, not issuing further SQL.]

(I also of course know nothing of the rest of the cluster configuration nor concurrent load.)
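
As a rough sketch of that client-side approach with the Java SDK 3.x - the connection details, names and the choice of 4 streams here are illustrative assumptions, not tested code - it could look something like:

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;
import com.couchbase.client.java.query.QueryResult;
import reactor.core.publisher.Flux;

import java.util.List;

public class ParallelDelete {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("couchbase://host", "user", "password");
        ReactiveCollection collection = cluster.bucket("bucket")
                .scope("scope")
                .collection("collection")
                .reactive();

        // Covered query: only document keys come back, no document fetch.
        QueryResult result = cluster.query(
                "SELECT RAW meta().id FROM `bucket`.`scope`.`collection` "
                + "WHERE field1 = 'field1Value' AND _class = 'javaClass'");
        List<String> keys = result.rowsAs(String.class);

        // Delete directly through the data service, ~4 removals in flight at a time.
        Flux.fromIterable(keys)
            .flatMap(collection::remove, 4)
            .blockLast();

        cluster.disconnect();
    }
}

The flatMap concurrency argument is what caps the number of parallel streams; raising it shifts more load onto the data service.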

I set up a quick Basic single-node 7.6.3 Capella cluster to test (4 CPU, 16 GiB), inserted 100k documents of about 249 bytes each, and indexed the two fields used in the delete. The stats reported by the Capella UI (Query editor) were:

LAST RUN  RTT       MUTATIONS  ELAPSED  EXECUTION
just now  19567 ms  100000     13.8s    13.8s

A subsequent run was 13s. Plan was as expected with SendDelete taking most of the time (12.928s).

Nearly identically, a fetch of meta().id in the UI is truncated at ~5.7s with the clear note “Dataset too large, JSON truncated. To see all data, view as table or download.”; execution taking ~0.56s. If you download the results, they’re all there.

FWIW, for me, using SELECT RAW meta().id... to reduce the result size allows it all to be retrieved directly in the UI in ~5.6s, with an execution time of < 0.5s (of course this is key-length dependent).

Sorry, I meant 7.6.2

Is there any way to use a delete like this in a DELETE REST endpoint without holding the connection open for 30 seconds?

Every once in a while I test N1QL to see if it has caught up, and my tests are always the same: in the best-case scenarios it is 10X slower than views, up to 100X slower.
There are many people reporting this, but they still want to phase out views.
If you want bare-metal speed, create a view and delete the documents by id.

Would that actually help? The problem is not with fetching what to delete; the actual deletion is what takes a lot of time.

The problem is not the fetch; the problem is that N1QL is slow even for the most basic operations. Try it - I am not sure, but that could be the reason.
And judging by DH’s answer, I think I am right: “You will possibly be better off fetching the 100k keys into your client and deleting in parallel (probably up to about 4 streams)”

This comment specifically relates to 7.2 vs 7.6 and the addition of parallel mutations driven by the Query service. Nothing more.

Then you’ll have the parallel mutations. I can only guess that the setup or load of your Capella instance differs - my newly created AWS cluster (in a region local to me) doesn’t take 30 seconds. Please engage Capella support to see if there is anything that can be done.

Did not help :frowning: it takes even more time

Sorry, what did you attempt that didn’t help ?

Have you tested deleting 100k documents via the SDK (without involving SQL++) to get a relative evaluation of what the environment permits? You might try something like:

UPSERT INTO your_collection (key k, value v)
SELECT to_string(x) k
      ,{"a":1, "b":2, "c":array_range(repeat("a",20),100)} v
FROM array_range(0,100000) x
;

(With an index on (a,b) to simulate your two-field index.)

to populate a test collection with a set of data with predictable keys that you can easily generate without needing to fetch, just for the purposes of such a test.
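
As a minimal sketch of timing that SDK-only deletion (assuming the Java SDK 3.x and the predictable keys generated by the UPSERT above; connection details are placeholders, not verified code):

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ReactiveCollection;
import reactor.core.publisher.Flux;

public class SdkDeleteTiming {
    public static void main(String[] args) {
        Cluster cluster = Cluster.connect("couchbase://host", "user", "password");
        ReactiveCollection collection = cluster.bucket("bucket")
                .scope("scope").collection("your_collection").reactive();

        long start = System.nanoTime();
        Flux.range(0, 100_000)
            .map(String::valueOf)           // keys were generated with to_string(x)
            .flatMap(collection::remove, 4) // direct KV removals, no SQL++ involved
            .blockLast();
        System.out.printf("Deleted 100k in %.1fs%n", (System.nanoTime() - start) / 1e9);

        cluster.disconnect();
    }
}

That gives a baseline for what the environment permits without the Query service in the path.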

(SQL++/Query service is intentionally designed to share resources; it can’t be responsible for pushing other services to/past limits for a single request. You can choose with your application to be “less well behaved” and dominate available processing with a single application task.)

Most importantly though, have you engaged Capella support? Even via SQL++, your cluster doesn’t seem to be performing as well as it might (i.e. as compared to my quick test yesterday).

Can you break up the query into ten different queries, each deleting 10K, and execute the 10 queries all at once, so that instead of a single query deleting 100K, each query deletes 10K?

DELETE
FROM `bucket`.`scope`.`collection`
WHERE `field1` = 'field1Value' AND `_class` = "javaClass"
ORDER BY `field1` LIMIT 10000 OFFSET 0;

Just change the OFFSET value by 10,000 each time.

That will still take the same amount of time.

The trouble with a paginated deletion as you suggest is that you won’t be able to delete all 100k documents by changing the OFFSET. After the first execution there are only 90k documents left (starting with offset 0 of the remaining set). So at offset 50k, you’ll be beyond the remaining 50k documents and nothing will be deleted (and the 50k documents would remain). If you did want to batch deletes, just use the 10k limit.

As @mreiche noted, serially this’ll take the same time - actually, assuming no issues beyond the Query service, slightly longer.

If you ran multiple statements like these in parallel, you’d end up with many trying to delete the same document keys (they all read mostly the same keys from the index before the first deletions are effected), so it will probably take longer, since you’d have to run many more than, say, just 10 (of 10k each).

If you did try to give each a different offset, you may or may not delete all 100k, depending on the timing of the parallel executions fetching the necessary keys vs the first deletions being effected.

If you can add a filter to break up the set (rather than pagination/limits), this would likely be best for parallel execution via SQL (e.g. date ranges). Otherwise you really need to retrieve the entire set of keys and sub-divide it, with each statement having a USE KEYS clause for the set of keys it is responsible for deleting.

Or do just the same but use the SDK to perform the deletion directly, rather than constructing a parameter to pass to the Query service as an extra hop. (It's better to pass a long key list as a parameter than as literal SQL text; less parsing.)
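
A hedged sketch of that parameterised USE KEYS variant with the Java SDK 3.x (the chunking, names and options here are illustrative assumptions only):

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.json.JsonArray;
import com.couchbase.client.java.json.JsonObject;

import java.util.List;

import static com.couchbase.client.java.query.QueryOptions.queryOptions;

public class ChunkedUseKeysDelete {
    // Deletes one chunk of previously fetched keys; the key list travels as a
    // named parameter ($keys) rather than being spliced into the statement text.
    static void deleteChunk(Cluster cluster, List<String> chunk) {
        cluster.query(
            "DELETE FROM `bucket`.`scope`.`collection` USE KEYS $keys",
            queryOptions().parameters(JsonObject.create().put("keys", JsonArray.from(chunk))));
    }
}

Each chunk can then be issued from its own statement/thread, in line with the subdivision described above.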