Search index consumes too much storage

Hi,

I’m using couchbase 6.0.1
I have one bucket containing 100 million documents.
I’m trying to create a search index with the filter type_ = ‘Customer’ (this should cover around 20 million documents).

My total bucket size is around 150 GB,
and my search node has 200 GB of storage.
When I check the search node storage, it is out of space (more than 197 GB consumed).

My questions:
Why does it consume all of my storage?
If I filter by type, it should store only the filtered documents, right? But when I check, the total doc count is 100 million.


FYI, my index replica count is 0, but when I check inside the @fts folder, it generates 6 folders of roughly 35 GB each.

@Han_Chris1 ,

What index type are you using in the index definition? Is it “scorch”? If not, update it to “scorch”.
(This will trigger an index rebuild, but it results in a reduced index size.)
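
If I remember the definition layout correctly, the index type sits under the store section of the index definition JSON. A minimal fragment might look like the following (only a sketch; the index name, source bucket and mapping sections are omitted):

    "type": "fulltext-index",
    "params": {
      "store": {
        "indexType": "scorch"
      }
    }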

Total doc count indicates the number of documents processed so far, not the actual number of items in the index. The stat label has been corrected in the latest releases.

Our upcoming release will give you a much smaller index size. You could try the beta version here: NoSQL Database Download | Couchbase 30-Day Free Trial

Cheers!

Hi @sreeks,

Yes, I’m using the default for version 6.0 (scorch).

Or maybe my config is wrong?
How many folders should be generated inside the @fts folder for one index?

The six folders correspond to the default 6 index partitions.

Is your data set subject to an update-heavy or delete-heavy workload?

Can you please try recreating the index via a curl command with the store properties mentioned here?
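
To be concrete, here is a rough sketch of what that curl might look like. This is not a definitive command: the host, credentials, index name customer-idx and bucket name my-bucket are placeholders, the mapping section is omitted, and I am assuming the Search REST endpoint on port 8094 with the store options placed under params.store:

    curl -XPUT -H "Content-Type: application/json" \
      -u Administrator:password \
      http://localhost:8094/api/index/customer-idx \
      -d '{
            "type": "fulltext-index",
            "name": "customer-idx",
            "sourceType": "couchbase",
            "sourceName": "my-bucket",
            "params": {
              "store": {
                "indexType": "scorch",
                "numSnapshotsToKeep": 1,
                "scorchMergePlanOptions": {
                  "maxSegmentSize": 100000
                }
              }
            }
          }'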

Hi @sreeks ,

Thanks for your quick response,

Yes, actually I’ve reloaded the data multiple times as part of the testing.
Do you mean that when I reload with the same doc IDs, it will consume more disk space?
I’ve tried to recreate the index with these additional properties:

"numSnapshotsToKeep": 1,
"scorchMergePlanOptions": {
  "maxSegmentSize": 100000
}

But does this mean I’m putting a limit on the size, so there is a possibility of missing data?

hey @Han_Chris1 ,

Yes, updates and deletes have an impact on the index size, as we use append-only storage. Obsolete data is reclaimed during background compaction cycles, which run concurrently at a slower pace. You may find more context at point 5 here: Full-Text Search Service Production Systems: 7 Useful Tips

The recommended settings only restrict storage properties like the maximum segment size; they won’t result in any data loss or corruption.


Thanks @sreeks, my problem is solved!