In our test environment we keep seeing index mutations getting stuck, mainly on one bucket (ticket_bucket), but as you can see it's also starting to occur on other buckets:
This is a quiet, low-volume system, so it's not a performance issue. This is the second time we've seen it (the first time, we blew the environment away and started again). Any help would be greatly appreciated.
I deleted the ticket_bucket primary index and tried to recreate it, but got this:
"GSI CreatePrimaryIndex() - cause: Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, indexer failure, or presence of duplicate index name."
@pc, thanks for sharing the logs. It looks like one of the projectors (the process that forwards mutations from the data service to the index service) is stuck and not responding. You can kill the projector process to unblock it. Also, if you can share the projector log file, we can check what the issue is with the projector.
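In case it's useful, a minimal sketch of how you might do that on a default Linux install (the binary path and PID placeholder here are assumptions; on a Kubernetes deployment you would first exec into the relevant pod):

ps -ef | grep /opt/couchbase/bin/projector    # locate the projector PID
kill <projector_pid>                          # terminate the stuck projector

The projector runs under ns_server supervision, so it should be respawned automatically within a few seconds.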
2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.258388772s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket location_bucket
2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m32.640886572s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket prediction_bucket
2021-06-04T10:25:42.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.047651308s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket weather_bucket
If you can capture both the projector and indexer log files when you see the pending mutations in the UI, we can correlate them and try to figure out the problem.
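For reference, on a default Linux install the relevant files are usually at the paths below (paths are an assumption; on a Kubernetes deployment you would copy them out of the pod):

/opt/couchbase/var/lib/couchbase/logs/projector.log
/opt/couchbase/var/lib/couchbase/logs/indexer.log

Alternatively, running cbcollect_info on the data and index nodes will capture both along with the rest of the diagnostics.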
It appears that you have run into a known projector bug. From the projector logs, the control channel is full. This can happen when there are more than 10 buckets in the cluster:
2021-06-17T08:58:25.310+00:00 [Warn] FEED[<=>MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2(127.0.0.1:8091)] ##15 control channel has 10000 messages
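To confirm you are hitting the same condition, you could search the projector log for that warning (log path assumed from a default Linux install):

grep "control channel has" /opt/couchbase/var/lib/couchbase/logs/projector.log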
This issue has been fixed in 6.6.0 and later versions. As a workaround, please change the setting "projector.backChanSize" to 50000. E.g.,
curl -u <username>:<password> http://<indexer_ip>:9102/settings -X POST -d '{"projector.backChanSize":50000}'
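If you want to double-check that the new value took effect, you should be able to read the settings back from the same endpoint (this assumes the indexer admin port 9102 used above):

curl -s -u <username>:<password> http://<indexer_ip>:9102/settings | grep -o '"projector.backChanSize":[0-9]*'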
After the setting change, please restart the projector. After the restart, you should see the message below in the projector logs:
settings projector.backChanSize will updated to 50000