Community Docker 7.2.2 (ARM)
I have a collection of ~100M documents, 86 GiB on disk (as shown on the Couchbase web dashboard). Each document includes a receive timestamp t_r (a JSON number holding a fractional Unix timestamp such as 1698253421.123456) and a REST path as a JSON string, indicating where the document was downloaded from. I've created an index on each of these fields. Surprisingly, the index on the timestamp has ballooned to 900 GB on disk, even though its reported "data size" is just 15 GiB.
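For reference, the two indexes were created with statements along these lines (bucket and index names below are placeholders, not my exact DDL):

```sql
-- placeholder names; the actual bucket/index names differ
CREATE INDEX idx_t_r  ON `mybucket`(t_r);
CREATE INDEX idx_path ON `mybucket`(path);
```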
That works out to roughly 98% fragmentation??
The indexer is currently rebuilding the index after what appeared to be ForestDB database corruption. I had to remove the corrupted file manually because the indexer wouldn't honor the drop command; then, while the indexer was trying to recreate the removed index, I issued the drop command again followed by a create index command. According to the web UI, the rebuild is about 61% done.
Question:
Is it normal for the Couchbase indexer to experience such significant data size bloat when rebuilding an index from the ground up?
I might be cutting it too close on resources. My AWS t4g instance has only 8 GB of RAM, and after noticing the Couchbase services overshooting their budgets, I limited the Index service to just 1 GB (with 1 GB for Data and 256 MB for Search). Before I set these low limits, the indexer process appeared to get killed (presumably by the OOM killer) a few times.
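In case the exact quota setup matters: I applied the limits through the web UI, but if I read the couchbase-cli docs correctly, the equivalent call would be something like the following (host and credentials are placeholders; sizes are in MiB):

```shell
# Placeholder host/credentials; sets per-service RAM quotas in MiB
couchbase-cli setting-cluster -c localhost:8091 \
  -u Administrator -p password \
  --cluster-ramsize 1024 \
  --cluster-index-ramsize 1024 \
  --cluster-fts-ramsize 256
```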