Data loss problem (4.0.0-4051)

stefan.k · March 28, 2017, 7:48pm

We are running a 3-node cluster of Couchbase Community Edition 4.0.0-4051.

Since today, some documents seem to have disappeared. Examining the cluster nodes, I found that there is a problem with one node.

The memcached.log rotates like crazy, showing thousands of entries like:

2017-03-28T21:32:37.612737+02:00 WARNING (BUCKET) Fatal error in persisting SET ``DOC-ID-1'' on vb 6!!! Requeue it...
2017-03-28T21:32:37.612804+02:00 WARNING (BUCKET) Fatal error in persisting SET ``DOC-ID-134'' on vb 6!!! Requeue it...
2017-03-28T21:32:37.612872+02:00 WARNING (BUCKET) Fatal error in persisting SET ``DOC-ID-14'' on vb 6!!! Requeue it...
2017-03-28T21:32:37.612957+02:00 WARNING (BUCKET) Fatal error in persisting SET ``DOC-ID-341'' on vb 6!!! Requeue it...
2017-03-28T21:32:37.613035+02:00 WARNING (BUCKET) Fatal error in persisting SET ``DOC-ID-94'' on vb 6!!! Requeue it...
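
To see whether warnings like these are confined to one vBucket or spread across several, the log can be tallied per vBucket. The following is only a rough sketch: the regular expression is inferred from the sample lines above rather than from any documented log format, and the function name `count_persist_errors` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample warnings quoted above (an assumption,
# not a documented format): "... persisting <OP> <key> on vb <N>!!!"
PERSIST_ERROR = re.compile(r"Fatal error in persisting \w+ .* on vb (\d+)!!!")


def count_persist_errors(lines):
    """Return a Counter mapping vBucket id -> number of persistence warnings."""
    counts = Counter()
    for line in lines:
        match = PERSIST_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Two of the lines from the log excerpt, plus one unrelated line.
sample = [
    "2017-03-28T21:32:37.612737+02:00 WARNING (BUCKET) Fatal error in "
    "persisting SET ``DOC-ID-1'' on vb 6!!! Requeue it...",
    "2017-03-28T21:32:37.612804+02:00 WARNING (BUCKET) Fatal error in "
    "persisting SET ``DOC-ID-134'' on vb 6!!! Requeue it...",
    "2017-03-28T21:32:38.000000+02:00 NOTICE some unrelated message",
]
print(count_persist_errors(sample))
```

For a real log, pass an open file object instead of the sample list, e.g. `count_persist_errors(open("memcached.log"))`. If every warning names the same vBucket, that points at a single damaged data file rather than a node-wide problem.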

Trying to run cbtransfer results in the following error:

error: could not read couch store file: /data/couchbase/data/BUCKET/6.couch.14; exception: malformed data in file

This particular file is a JSON document:

{
   "ep_max_checkpoints" : "2",
   "ep_tap_queue_fill" : "0",
   "ep_flushall_enabled" : "1",
   "ep_tap_backlog_limit" : "5000",
   "mem_used" : "8752200",
   "ep_tap_queue_backfillremaining" : "0",
   "ep_chk_persistence_timeout" : "10",
   "vb_pending_queue_size" : "0",
   "vb_pending_ops_create" : "0",
   "ep_dcp_count" : "0",
   "ep_alog_sleep_time" : "1440",
...
   "ep_item_eviction_policy" : "value_only",
   "ep_vb_total" : "0",
   "ep_total_new_items" : "0",
   "vb_replica_meta_data_memory" : "0",
   "vb_replica_ops_create" : "0",
   "ep_tap_bg_fetched" : "0",
   "vb_replica_queue_fill" : "0",
   "ep_diskqueue_fill" : "0",
   "ep_max_num_workers" : "3"
}

All other files with this naming scheme are binary data files. The file’s date corresponds to a reboot of the cluster.

What is going on here? Is there a way to fix this problem?

Thanks in advance!
Stefan

drigby · March 29, 2017, 11:07am

You’ve encountered some form of filesystem or hardware-related error. I suggest you check your OS logs to see if there are any reported issues. Running fsck (or a similar check) on the filesystem is also recommended.

Having said all that, if one or more of your X.couch.Y files is showing as JSON then it’s pretty badly corrupted (they should be couchstore files) and you’ll likely need to recover from your last backup.
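
To find out whether other X.couch.Y files were overwritten in the same way, something along these lines could scan a bucket's data directory for files that parse as JSON. This is a minimal sketch, not a Couchbase tool: the function name `find_json_couch_files` is hypothetical, and it only checks for "parses as JSON", so it says nothing about whether the remaining binary files are valid couchstore files.

```python
import json
import re
from pathlib import Path

# File naming scheme taken from the thread: <vbucket>.couch.<revision>
COUCH_NAME = re.compile(r"^\d+\.couch\.\d+$")


def find_json_couch_files(bucket_dir):
    """Return the X.couch.Y files in bucket_dir whose content parses as JSON."""
    suspects = []
    for path in sorted(Path(bucket_dir).iterdir()):
        if not path.is_file() or not COUCH_NAME.match(path.name):
            continue
        # Cheap pre-check so large binary files are not read in full.
        with path.open("rb") as handle:
            head = handle.read(64)
        if not head.lstrip().startswith(b"{"):
            continue
        try:
            json.loads(path.read_bytes())
        except ValueError:  # covers JSON and Unicode decoding errors
            continue
        suspects.append(path)
    return suspects
```

Called as `find_json_couch_files("/data/couchbase/data/BUCKET")` (the path from the cbtransfer error above), it returns the suspicious files. Run it read-only against the data directory, ideally on a copy.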
