I am trying to create indexes on several large collections to improve the performance of some queries. One collection has 84.5m items. However, when I tried to create a new index, it showed 300m+ mutations remaining. I am not sure whether this is because multiple modifications to a document were not compacted in the DCP queue, or due to something else.
I would like to check if there is a way to decrease the number of mutations. This is for performance testing, and there are almost no changes to the documents after the bulk load.
Building indexes results in mutations to indexes, not to documents. The number of mutations depends not only on the number of documents, but also on the indexes.
Thanks @mreiche. I also believe that building an index will not cause new mutations to the documents in the collection. However, the number of remaining mutations when building an index should be related to the number of documents and to the number of mutations of those documents in the keyspace/collection.
E.g., if I insert 1m documents into the collection, it will push 1m mutation events into the DCP queue for the indexes related to the collection. If I then run an update command that touches all 1m documents, it will push another 1m mutation events into the queue. However, DCP will deduplicate the 2m mutation events and eventually keep only 1m.
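To make the expectation above concrete, here is a minimal toy model (plain Python, not Couchbase internals) of the deduplication behavior I'm describing: if the queue keeps at most one pending mutation per document key, its depth tracks distinct documents rather than total writes.

```python
# Toy model of per-key deduplication: a later mutation to the same
# document key replaces its queued entry instead of growing the queue.
def dedup_queue(mutations):
    queue = {}  # doc key -> latest mutation payload
    for key, payload in mutations:
        queue[key] = payload  # overwrite any earlier mutation for this key
    return queue

# Scaled-down version of the scenario: inserts followed by updates
# to the same keys (5 documents standing in for 1m).
inserts = [(f"doc{i}", "insert") for i in range(5)]
updates = [(f"doc{i}", "update") for i in range(5)]
q = dedup_queue(inserts + updates)
print(len(q))  # 5 distinct documents remain queued, not 10
```

Under that model, 1m inserts plus 1m updates to the same keys would leave only 1m pending events, which is why the 300m+ figure surprised me.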
What I actually observed was a bit different. I had 85m documents in the collection. When I created an index, the index-build progress message showed 500m+ mutation events remaining. And as I create more indexes, the build time gets longer and longer.
Please correct me if I am wrong. I would also appreciate any advice on reducing the index build time, or at least making it consistent across all indexes tied to one collection.
Now when I try to create a new index, the remaining mutations are climbing to 670m. I remember it was initially around 85m, a little over the number of documents in the collection.
I’m not an index expert, but since the index usage doesn’t depend on the number of documents, why not do your iterative testing with a small number of documents? That’s the best advice I have for reducing the index building time.
I’m a bit confused by:
“Now when I try to create a new index, the remaining mutations are climbing to 670m. I remember it was initially around 85m, a little over the number of documents in the collection.”
It was 300m according to your first post. Almost 4x the number of documents.
“One collection has 84.5m items. However, when I tried to create a new index, it showed 300m+ mutations remaining.”
“Now when I try to create a new index, the remaining mutations are climbing to 670m”
It appears that the number of mutations depends on the index definition. I wonder if that 670m includes deleting the original index?
Couchbase Server identifies each update performed on the data with ever-increasing sequence numbers. These sequence numbers are shared across the collections of the same bucket. The index service uses them to calculate “remaining mutations”. For example, if the high seqno at the data service is 1000, and the index has been updated up to seqno 300, then the index service will show 700 remaining mutations. But not all 700 mutations need to be transferred across services: only the mutations belonging to the collection on which the index is being built will be transferred to the index service.
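The arithmetic above can be sketched as follows (a hypothetical helper for illustration, not an actual Couchbase API). Because seqnos are bucket-wide, the reported gap can far exceed the document count of any single collection.

```python
def remaining_mutations(data_high_seqno, index_processed_seqno):
    """Remaining mutations as reported by the index service:
    the gap between the data service's high sequence number
    (bucket-wide) and the last seqno applied to the index."""
    return data_high_seqno - index_processed_seqno

# The example from the explanation: high seqno 1000, index at seqno 300.
print(remaining_mutations(1000, 300))  # 700
```

Only the subset of those 700 seqnos that belongs to the indexed collection actually gets streamed to the index service; the rest are skipped.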