Hi guys, I have set up two Couchbase (version 3) nodes and was trying to insert about 100 million documents for testing purposes, but the load failed halfway through (at around 30 million) with the error “Write Commit Failure. Disk write failed for item in Bucket "myhoney" on node”.
I’m not sure why this happened, but it caused one of the nodes to die (status: down), which prevents me from accessing the web admin on that node on port 8091. I can, however, still connect to the web admin on the other node just fine, and I went on to fail over the dead node.
I’m not sure how to recover the dead node. What I did try was restarting it, but for some reason it keeps giving me “connection timed out”.
I’m still new to Couchbase, so does anyone know why I was getting the “Write Commit Failure” error in the first place? And how do I recover/restart this node without reinstalling Couchbase?
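For context, the load was just a plain upsert loop, roughly along the lines of the sketch below (written from memory against the Python 2.x SDK; the key scheme and document body here are made up for illustration, only the bucket name "myhoney" is real):

from couchbase.bucket import Bucket

# connect directly to the bucket (no bucket password set)
bucket = Bucket('couchbase://localhost/myhoney')

for i in range(100000000):
    # upsert() creates the document, or replaces it if the key already exists
    bucket.upsert('doc::%d' % i, {'seq': i})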
I’m not sure if this is useful, but this is what I found in info.log:
[user:info,2015-06-10T13:07:03.702,ns_1@ec2-54-66-131-63.ap-southeast-2.compute.amazonaws.com:<0.15588.56>:menelaus_web_alerts_srv:global_alert:81]Write Commit Failure. Disk write failed for item in Bucket "myhoney" on node ec2-54-66-131-63.ap-southeast-2.compute.amazonaws.com.
[ale_logger:error,2015-06-10T13:07:05.597,ns_1@ec2-54-66-131-63.ap-southeast-2.compute.amazonaws.com:ale<0.35.0>:ale:handle_info:253]ale_reports_handler terminated with reason {'EXIT',
{noproc,
{gen_server,call,
['sink-disk_debug',
{log,
<<"[error_logger:error,2015-06-10T13:07:05.591,ns_1@ec2-54-66-131-63.ap-southeast-2.compute.amazonaws.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]\n=========================CRASH REPORT=========================\n crasher:\n initial call: ale_disk_sink:-spawn_worker/1-fun-0-/0\n pid: <0.193.57>\n registered_name: []\n exception error: no match of right hand side value {error,enospc}\n in function ale_disk_sink:'-write_data/3-fun-0-'/2 (src/ale_disk_sink.erl, line 487)\n in call from ale_disk_sink:time_stat/3 (src/ale_disk_sink.erl, line 527)\n in call from ale_disk_sink:write_data/3 (src/ale_disk_sink.erl, line 485)\n in call from ale_disk_sink:worker_loop/1 (src/ale_disk_sink.erl, line 450)\n ancestors: ['sink-disk_debug',ale_dynamic_sup,ale_sup,<0.31.0>]\n messages: []\n links: [<0.183.57>,#Port<0.197089>]\n dictionary: []\n trap_exit: false\n status: running\n heap_size: 610\n stack_size: 27\n reductions: 596\n neighbours:\n\n">>},
infinity]}}}; restarting
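One thing I did notice in that crash report is {error,enospc}. I believe enospc is the POSIX “no space left on device” error, so could it be that the data disk simply filled up partway through the inserts? I was going to verify free space on the data path with a quick check like the one below (assuming the default Linux data directory /opt/couchbase/var/lib/couchbase/data; adjust if your install uses a different path):

import shutil

# free-space check on the Couchbase data directory
# (default Linux path; yours may differ if you changed it during setup)
total, used, free = shutil.disk_usage('/opt/couchbase/var/lib/couchbase/data')
print('free: %.1f GB of %.1f GB' % (free / 1e9, total / 1e9))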
And here is what I found in memcached.log.13.txt:
Wed Jun 10 08:16:23.622715 UTC 3: 215 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:23.622857 UTC 3: 272 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:23.622872 UTC 3: 273 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:23.622893 UTC 3: 620 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:23.622904 UTC 3: 623 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:23.622879 UTC 3: 622 Closing connection due to read error: Connection timed out
Wed Jun 10 08:16:52.092113 UTC 3: (myhoney) Requst to vbucket 1023 deletion is in EWOULDBLOCK until the database file is removed from disk
Wed Jun 10 08:16:52.095526 UTC 3: (myhoney) Deletion of vbucket 1023 was completed.
Wed Jun 10 08:16:52.096006 UTC 3: (myhoney) Requst to vbucket 1022 deletion is in EWOULDBLOCK until the database file is removed from disk
Wed Jun 10 08:16:52.099495 UTC 3: (myhoney) Deletion of vbucket 1022 was completed.
Wed Jun 10 08:16:52.099809 UTC 3: (myhoney) Requst to vbucket 1021 deletion is in EWOULDBLOCK until the database file is removed from disk
Wed Jun 10 08:16:52.103598 UTC 3: (myhoney) Deletion of vbucket 1021 was completed.