I have a cluster of a few Couchbase servers. One of the servers reported a time drift. The time drift was fixed, but after a few hours the server became unresponsive. After a service restart, when we try to rebalance, we get this error:
Rebalance exited with reason {{badmatch,failed},
[{ns_rebalancer,rebalance_body,7,
[{file,"src/ns_rebalancer.erl"},
{line,500}]},
{async,'-async_init/4-fun-1-',3,
[{file,"src/async.erl"},{line,199}]}]}.
Rebalance Operation Id = 9141d98b8c73600ffbe28c80cb4e20a4
On the server, in the indexer.log file, we are seeing these messages:
We don’t have a support contract. This setup is a test setup for a potential customer, but the customer is very worried that Couchbase is not stable and can’t recover by itself, despite the fact that I told them the opposite. And actually this is the first time in 10 years that a cluster is not recovering. @mreiche can you help debug this?
We are seeing these logs in the indexer.log file on one of the servers:
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 994
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 366
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 528
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 996
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 833
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 943
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 967
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 541
2024-12-01T15:06:27.465+00:00 [Info] StreamState::connection error - set repair state to RESTART_VB for MAINT_STREAM keyspaceId bucket vb 464
2024-12-01T15:06:27.465+00:00 [Info] Timekeeper::handleStreamConnErr Stream repair is already in progress for stream: MAINT_STREAM, keyspaceId: bucket
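For anyone trying to see how widespread the stream repair is, a small script can group the affected vBuckets by stream and keyspace. This is only a rough sketch: the regular expression is derived from the log lines quoted above, not from any documented log format, and the function name `summarize_repairs` is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the "set repair state" lines quoted in this thread.
# It is an assumption, not a documented indexer.log format.
REPAIR_RE = re.compile(
    r"set repair state to (?P<state>\w+) for (?P<stream>\w+) "
    r"keyspaceId (?P<keyspace>\S+) vb (?P<vb>\d+)"
)


def summarize_repairs(lines):
    """Group vBucket ids by (stream, keyspace, repair state).

    Lines that do not match the pattern are ignored.
    """
    summary = defaultdict(set)
    for line in lines:
        match = REPAIR_RE.search(line)
        if match:
            key = (match["stream"], match["keyspace"], match["state"])
            summary[key].add(int(match["vb"]))
    # Return sorted lists so the result is stable and easy to read.
    return {key: sorted(vbs) for key, vbs in summary.items()}
```

To run it against a real file, pass an open file handle, for example `summarize_repairs(open("indexer.log", errors="replace"))`, and print the number of vBuckets per key. A count close to the full vBucket range would suggest the indexer lost its whole stream connection rather than a few individual vBuckets.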
You could move the existing logs aside, then start the cluster, perform whatever operation demonstrates the problem, and then upload the new logs. Please also say where they were uploaded to.
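The "move the logs aside" step could look roughly like the sketch below. It assumes the Couchbase service is stopped while the files are moved, and the helper name `archive_logs` is hypothetical. The log directory depends on the installation, so the path in the usage note is an assumption to check on your own nodes.

```python
import shutil
import time
from pathlib import Path


def archive_logs(log_dir, archive_root):
    """Move every regular file in log_dir into a new timestamped folder.

    Subdirectories are left in place. Returns the destination folder and
    the names of the files that were moved.
    """
    log_dir = Path(log_dir)
    dest = Path(archive_root) / time.strftime("logs-%Y%m%dT%H%M%S")
    # Fail rather than mix two archives if the folder already exists.
    dest.mkdir(parents=True, exist_ok=False)
    moved = []
    for entry in sorted(log_dir.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(dest / entry.name))
            moved.append(entry.name)
    return dest, moved
```

A call such as `archive_logs("/opt/couchbase/var/lib/couchbase/logs", "/tmp/cb-log-archive")` would then leave an empty log directory, so the logs written after the restart and the failed rebalance contain only the reproduction. The first path is assumed to be the usual Linux location and may differ on your setup.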