Query to Delete old documents which will help to resolve DiskSpace Issues

Hi,

Our Couchbase server has 3 nodes with Data & Index node defined on the same, we are trying to get 2 nodes more to make them Index nodes as suggested by everyone. Now we have disk space issues reaching 97% (1.2B docs) & above and want to delete old documents so that we have space for new one. Can you suggest different types to do that and its dependencies. I am new to Couchbase and detailed information would be grateful with explanations. Thank you.

You can use DELETE statement
You can also use SDKs to set expiration on the document which will automatically deletes

Thanks for your reply, I am using the Delete statement but able to delete only 10k records at a time as it’s effecting the performance if I try more records. I am looking for a stable solution where I can delete all records older than 6 months. Appreciate your help.

Use SDK expiration option

I am using Couchbase 4.6 version. Can you help where I can setup the SDK expiration.

https://docs.couchbase.com/server/4.1/developer-guide/expiry.html

What @vsr1 is referring to is when you insert() or upsert() a document through the SDK, there is an expiration field. It won’t help with documents that are already in there, but it will help in the future if you want documents to automatically be deleted 6 months later, to set the expiration when inserting/upserting.

If you don’t have a very large number of documents, what I might suggest for your current state is to put together a simple query that looks for the meta.id() alone and then visits them to remove them. This would use a primary index, if you have one, so it’ll be efficient. You can do this with a single query in batches and then do the deletion from the SDK’s API on visitation. With virtual xattrs, there may even be a way to understand when the document was added.

This should be very efficient for the system as you wouldn’t be asking the query service to retrieve, unmarshal each doc.

What SDK are you using? I can’t guarantee it, but I might be able to point to some sample code.

1 Like

Addition to @ingenthr suggestion.

Assume timestamp is field that specifies time it created

CREATE INDEX ix1 ON bucket(timestamp);
DELETE FROM bucket WHERE timestamp < “2018-01-01” ;

You can even do batches and different ranges

Thank you and can I have a sample code please. Appreciate that.

For which language/platform?