Hi everyone, after a hard shutdown (power loss) of the VM running Couchbase Server, queries on our single-node Couchbase cluster no longer run, because the Indexer service continuously loops between the "Warming Up" and "Ready" states.
The indexes may have been damaged in that situation.
Question: apart from improving the reliability of the VMs so they do not get turned off abruptly, and eventually adding more nodes, what other approach would you recommend for improving the reliability of indexes? I would expect Couchbase to be resilient in this kind of scenario, however rare, so perhaps you have a recommendation on the best configuration to use.
The indexer process is not expected to loop between the "Warmup" and "Ready" states. Usually, it comes back to the "Ready" state and stays there unless there is a crash. Could you kindly share the indexer log so that we can try to root-cause the issue?
You can find the indexer logs at /opt/couchbase/var/lib/couchbase/logs/
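If it is easier, you can also gather everything in one go with cbcollect_info (a minimal sketch; the output path here is arbitrary, and on Windows the tool lives under the Couchbase Server bin directory instead of /opt/couchbase/bin):

    # Run on the affected node; bundles all service logs, including indexer.log
    /opt/couchbase/bin/cbcollect_info /tmp/node-logs.zip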
Having index replicas would ease your problem: if an index on one node is not available, the query will be served by a replica on another available node. Please refer to this blog article: Couchbase Index Replicas | Drop Indexes | Couchbase for more details on index replicas.
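For example, a minimal sketch of a replicated index (Enterprise Edition; the index, bucket, and field names here are hypothetical):

    CREATE INDEX idx_terminal_status ON `business`(`terminal_status`)
    WITH {"num_replica": 1}; /* maintains one extra copy on another index node */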
On Couchbase Community Edition, index replicas are not supported. You can instead create equivalent indexes on other index nodes for better index availability, as shown in the sketch below.
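A sketch of equivalent indexes on Community Edition (hypothetical index names, bucket, and node addresses; the "nodes" option pins each index to a specific index node):

    CREATE INDEX idx_terminal_status_1 ON `business`(`terminal_status`)
    USING GSI WITH {"nodes": ["node1.example.com:8091"]};
    CREATE INDEX idx_terminal_status_2 ON `business`(`terminal_status`)
    USING GSI WITH {"nodes": ["node2.example.com:8091"]};

The query engine treats identically defined indexes as equivalent and will use whichever copy is available.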
'd:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188', errno = 32: 'The process cannot access the file because it is being used by another process.'
2020-05-13T10:47:05.722-07:00 [ERRO][FDB] Successfully used partially compacted file 'd:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.189' for recovery replacing old file d:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188.
Observations:
* System has 16 GB of RAM and sufficient disk space
* Data service: 6 GB
* Index service: 2 GB
* Full-text service: 4 GB
* 2 GB for Windows
* No sign of performance degradation other than the error discussed here
* 5 buckets were configured
* Buckets are usually not very large (typically < 10,000 documents); only 1 bucket has 700,000 documents
* That bucket has 1 large GSI secondary index of 700 MB and many others of 20 MB each
* GSI indexing is set to "circular write", with compaction scheduled every day at 00:00 (default)
We are currently looking into dropping the large secondary index, but we are not sure whether it could be the cause of the problem.
We are also considering increasing the memory quota for the Index service, but we could not find any sizing recommendations for it (only the Data service is well covered here: Sizing Guidelines | Couchbase Docs).
Do you have any recommendations? And, in your experience, what could be causing the reported error?
The log message Successfully used partially compacted file ‘d:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.189’ for recovery replacing old file d:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188.
indicates that a compacted file was created successfully, but there was an issue before the indexer could switch over to it, so recovery had to be performed. When recovery opened the last ForestDB file, it found the successfully compacted file and switched over to it, so the compacted file becomes the current file. However, it looks like the open is being called again and again, and I am not sure why that is happening.
I think we need the complete indexer logs to analyze the sequence of events to debug this issue further.
Also, if you have multiple nodes in the cluster, you can try to fail over this node and rebalance it back into the cluster. This would get the indexer out of this situation. Note that the existing indexes on this node will be dropped when the failover and rebalance-in happen, so you will have to create the indexes again.
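For reference, the failover and rebalance-in can be done with couchbase-cli, along these lines (a sketch with hypothetical hostnames and credentials; check your version's --help for the exact flags):

    # hard failover of the affected node
    couchbase-cli failover -c cluster-host:8091 -u Administrator -p password --server-failover bad-node:8091 --hard
    # mark the node for full recovery, then rebalance it back in
    couchbase-cli recovery -c cluster-host:8091 -u Administrator -p password --server-recovery bad-node:8091 --recovery-type full
    couchbase-cli rebalance -c cluster-host:8091 -u Administrator -p password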
@varun.velamuri thank you. Is it possible to send you the log files privately instead of attaching them?
Regarding your note: we do have a single node per cluster.
@sduvuru do you think this problem may be somehow linked to this commit? (Coincidentally, it is by you!)
As this commit is from August 2019, it is surely not included in CB Community 5.1.1, right? In which version would it be included?
You can send a mail to varun [dot] velamuri [at] couchbase [dot] com, attaching the log files. If the file size is too large, please let me know and I will create a Google Drive link and share it with you.