CREATE INDEX is slow on a huge number of documents

webber · May 18, 2016, 7:06pm

I am creating two indexes on a Couchbase cluster.

One is a primary index and the other is a secondary index. I am creating these indexes on approximately 30 billion documents. The secondary index is on 3 fields.

I started creating them 7 hours ago, but progress is currently only at 35% and 39%.
Does it usually take this long to create indexes on such a large data set, or is something wrong with my environment?
When do you think the creation will finish?
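One rough way to answer the timing question is to extrapolate linearly from the progress so far. The sketch below assumes the build rate stays constant and that the progress percentage is proportional to the number of documents processed; neither is guaranteed. The labels "index A" and "index B" are hypothetical, since the post does not say which percentage belongs to which index.

```python
# Rough finish-time estimate for an index build.
# ASSUMPTION: progress advances linearly with elapsed time and the progress
# percentage is proportional to documents processed; real builds may speed
# up or slow down.

def estimate_remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Return the estimated hours left by extrapolating the observed rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = elapsed_hours * 100.0 / percent_done
    return total_hours - elapsed_hours


def docs_per_second(total_docs: float, elapsed_hours: float, percent_done: float) -> float:
    """Average indexing throughput implied by the progress figure."""
    return total_docs * percent_done / 100.0 / (elapsed_hours * 3600.0)


if __name__ == "__main__":
    # Figures from the post: ~30 billion docs, 7 hours elapsed, 35% and 39% done.
    # "index A" / "index B" are hypothetical labels.
    for name, pct in [("index A", 35.0), ("index B", 39.0)]:
        left = estimate_remaining_hours(7.0, pct)
        rate = docs_per_second(30e9, 7.0, pct)
        print(f"{name}: ~{left:.1f} h left, ~{rate:,.0f} docs/s")
```

With the numbers from the post this prints about 13.0 h and 10.9 h remaining, at an implied average of roughly 416,667 and 464,286 documents per second.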

The cluster has 4 nodes (16 cores each), and the total index RAM quota is 10 GB. 2 of the nodes run the index service.

The index settings are as follows:

Indexer Threads: 8
In Memory Snapshot Interval: 200 ms
Stable Snapshot Interval: 5000 ms
Max Rollback Points: 5
Indexer Log Level: info

Thanks

cihangirb · May 18, 2016, 8:49pm

Index build times can be high for a few reasons:

Retrieving the data from the data service is slow.
The index nodes can't persist the index to disk fast enough.

There are a few options:

Use the defer_build option to build both indexes together. A deferred build ensures that the data is scanned once to build both indexes.
You could also partition your indexes and add more nodes to parallelize the index build. To partition, you can specify a filter (a WHERE clause in CREATE INDEX). However, note that some queries may not be able to take advantage of range scans on a partitioned index.
Lastly, 4.5 adds another option called memory-optimized indexes, which can build the index much faster in memory. However, given the document count, I don't think you will be able to fit your index in memory.
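The first two options can be sketched as N1QL statements. The Python below only builds the statement strings and does not connect to a cluster; the bucket name, field names, index names, partition boundary, and node addresses are hypothetical placeholders. It relies on the `defer_build` and `nodes` keys of the CREATE INDEX WITH clause, and issues a single BUILD INDEX so the deferred indexes are built together.

```python
import json

# Hypothetical helper: it only BUILDS N1QL statement strings and never talks
# to Couchbase. All bucket/field/index/host names are placeholders and are
# assumed to need no escaping beyond backtick quoting.


def q(name: str) -> str:
    """Backtick-quote a N1QL identifier."""
    return f"`{name}`"


def deferred_build_statements(bucket: str, sec_name: str, fields: list) -> list:
    """Create the primary and secondary index deferred, then build both at once."""
    opts = json.dumps({"defer_build": True})
    keys = ", ".join(q(f) for f in fields)
    return [
        f"CREATE PRIMARY INDEX ON {q(bucket)} USING GSI WITH {opts};",
        f"CREATE INDEX {q(sec_name)} ON {q(bucket)}({keys}) USING GSI WITH {opts};",
        # One BUILD INDEX covering both so the data is scanned once.
        f"BUILD INDEX ON {q(bucket)}({q('#primary')}, {q(sec_name)}) USING GSI;",
    ]


def partitioned_statements(bucket: str, base_name: str, fields: list,
                           bounds: list, nodes: list) -> list:
    """Split one logical index into filtered indexes on the leading field.

    Partition i covers bounds[i-1] <= field < bounds[i] and is pinned to
    nodes[i]. Documents where the leading field is missing or NULL match no
    partition, and queries must carry a predicate that matches a partition.
    """
    if not bounds:
        raise ValueError("need at least one boundary value")
    if len(nodes) != len(bounds) + 1:
        raise ValueError("need exactly one node per partition")
    lead = q(fields[0])
    keys = ", ".join(q(f) for f in fields)
    edges = [None, *bounds, None]
    stmts, names = [], []
    for i, node in enumerate(nodes):
        low, high = edges[i], edges[i + 1]
        conds = []
        if low is not None:
            conds.append(f"{lead} >= {json.dumps(low)}")
        if high is not None:
            conds.append(f"{lead} < {json.dumps(high)}")
        name = f"{base_name}_p{i}"
        names.append(q(name))
        opts = json.dumps({"nodes": [node], "defer_build": True})
        stmts.append(
            f"CREATE INDEX {q(name)} ON {q(bucket)}({keys}) "
            f"WHERE {' AND '.join(conds)} USING GSI WITH {opts};"
        )
    stmts.append(f"BUILD INDEX ON {q(bucket)}({', '.join(names)}) USING GSI;")
    return stmts


if __name__ == "__main__":
    for s in deferred_build_statements("big_bucket", "idx_abc", ["a", "b", "c"]):
        print(s)
    # Two partitions, one per index node (hypothetical addresses).
    for s in partitioned_statements("big_bucket", "idx_abc", ["a", "b", "c"],
                                    ["m"], ["10.0.0.1:8091", "10.0.0.2:8091"]):
        print(s)
```

The generated statements would then be run through a query client such as cbq. Splitting on the leading index key keeps each partition a contiguous range of that key, which is why queries need a predicate on it to hit the right partition.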

What are the document key size and the index key size? Just curious.
thanks
-cihan

webber · May 19, 2016, 3:16pm

Hi cihangirb,

Thank you for your reply. The index key size is 45 bytes.
I am using 4 nodes for the cluster; each node has 16 cores and SSD storage on AWS.
I don't think retrieval or saving is slow, but what do you think?
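As a back-of-the-envelope check on whether a memory-optimized index could fit, multiply the document count by the index key size. This sketch assumes one 45-byte entry per document for the secondary index and ignores the document ID that each GSI entry also carries (its size is not given in the thread) as well as any storage overhead, so it is a lower bound.

```python
# Lower-bound size of the secondary index's raw key data.
# ASSUMPTIONS: one 45-byte entry per document; document IDs stored with each
# entry and storage-engine overhead are ignored, so the real size is larger.


def raw_index_bytes(num_docs: int, key_bytes: int) -> int:
    """Raw bytes of index keys if every document yields one entry."""
    return num_docs * key_bytes


if __name__ == "__main__":
    docs = 30_000_000_000           # ~30 billion documents (from the thread)
    raw = raw_index_bytes(docs, 45)  # 45-byte index key (from the thread)
    quota = 10 * 10**9              # 10 GB total index RAM quota, decimal GB
    print(f"raw keys: {raw / 10**12:.2f} TB, {raw / quota:.0f}x the RAM quota")
```

With the thread's figures this prints `raw keys: 1.35 TB, 135x the RAM quota`, which is consistent with the earlier doubt that the index would fit in memory.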

Thanks

eldorado · December 25, 2019, 10:36pm

@webber, did you ever find a solution to this problem?

varun.velamuri · December 26, 2019, 7:26am

@eldorado,

For your information, the underlying storage engine most likely in use when @webber tried this use case was ForestDB (given that the original post is from May 2016). The current storage engine is Plasma, which is very different and performs better than ForestDB.

Thanks,
Varun

eldorado · December 26, 2019, 8:48am

@varun.velamuri, sure. I know Plasma is a better bet than ForestDB, but I was looking for information on what he chose if he ever resolved the issue. In a lot of cases I see threads left dangling with no solution, so it would be really helpful to close them with resolutions. But thanks for pointing that out.

Related topics

Topic	Category / tags	Replies	Views	Activity
Why is index creation so slow?	Couchbase Server, index	4	332	April 9, 2024
Slow indexing speed (primary and GSI)	Couchbase Server	5	1143	March 19, 2021
Index initial build times taking forever since cluster rebalance	Couchbase Server	3	1451	August 11, 2017
Slow Index Create (Any suggestion please)	Couchbase Server, index	5	117	November 6, 2024
Creating the SGI faster	Couchbase Server	78	4281	January 27, 2020