TLDR: The number of documents in my database abruptly dropped from 1,150,000 to 850,000 today, and I can’t figure out why.
After some strange Sync Gateway behaviour (documents weren’t being properly replicated to all client devices), I went to diagnose the issue on Couchbase Server. I started creating a backup before looking at anything else, and when I went back to the console I saw the document count drop by the amount mentioned above; then the console crashed. I’ve been trying to diagnose the issue for the past four hours or so, and am barely any closer to figuring out what caused it.
A few things that seem important:
The first error in the logs was thrown by memcached:
Service 'memcached' exited with status 137. Restarting. Messages:
2017-10-09T10:42:20.087445Z WARNING 43: Slow STAT operation on connection: 703 ms ([ 127.0.0.1:51593 - 127.0.0.1:11209 (Admin) ])
2017-10-09T10:45:23.882821Z WARNING 42: Slow STAT operation on connection: 539 ms ([ 127.0.0.1:53991 - 127.0.0.1:11209 (Admin) ])
2017-10-09T10:54:30.075347Z WARNING 45: Slow STAT operation on connection: 578 ms ([ 127.0.0.1:39257 - 127.0.0.1:11209 (Admin) ])
2017-10-09T10:58:33.994958Z WARNING 43: Slow STAT operation on connection: 607 ms ([ 127.0.0.1:51593 - 127.0.0.1:11209 (Admin) ])
2017-10-09T11:07:45.909795Z WARNING 43: Slow STAT operation on connection: 524 ms ([ 127.0.0.1:51593 - 127.0.0.1:11209 (Admin) ])
Following this, the logs show this error: Control connection to memcached on 'ns_1@127.0.0.1' disconnected: {{badmatch, {error, closed}},
… (there’s a lot more)
and: Service 'memcached' exited with status 134. Restarting
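One thing I tried in order to make sense of the two exit statuses: assuming they follow the common Unix convention of 128 plus the signal number (I haven’t confirmed that this is how Couchbase reports them), a small Python sketch can map them back to signal names. The helper name decode_exit_status is just something I made up for illustration.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret an exit status under the 128 + signal-number convention.

    This is an assumption about how the status is reported, not something
    taken from the Couchbase documentation.
    """
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"normal exit with code {status}"


# The two statuses that appear in my logs
for status in (137, 134):
    print(status, decode_exit_status(status))
```

If that convention applies, then on Linux 137 would correspond to SIGKILL (the process was killed from outside, which can happen when the kernel runs out of memory) and 134 to SIGABRT (the process aborted itself). That’s only my reading of the numbers, though.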
Following this there were a few compaction errors:
Compactor for view `<my_bucket>/_design/sync_housekeeping` (pid [{type,
view},
{name,
<<"<my_bucket>/_design/sync_housekeeping">>},
…
Following this, Couchbase crashed.
I’ve considered the possibility that compaction just hadn’t run before, but it doesn’t seem possible that that many expired documents were in the database - there’s no way for clients to delete more than one document at a time.
Happy to provide more detailed logs or any other information. I just wanted to get this up asap, because I’m really not sure what the next steps are. It seems highly unlikely that my data is actually gone, but something has definitely gone wrong.
Thanks!