I’ve found out why the server wasn’t coming back up, and the reason is as surprising as it is simple: the server is running out of memory. I’m a little surprised there is no clear out-of-memory error in the log, but at least I know what the problem is now. It’s likely due to the number of buckets we’re using.
And I can’t even find out why… I’ve been trawling the log files for any indication of why our Couchbase database won’t come back up. All I’m finding are errors saying the buckets “aren’t ready yet”, followed eventually by the cluster going down because it can’t find any live nodes. The only somewhat descriptive error I’ve found so far is this one:
[stats:error,2013-06-21T17:19:29.785,ns_1@127.0.0.1:<0.11698.2>:stats_collector:handle_info:106]Exception in stats collector: {exit,
{{{badmatch,{error,closed}},
[{mc_client_binary,stats_recv,4},
{mc_client_binary,stats,4},
{ns_memcached,has_started,1},
{ns_memcached,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['ns_memcached-careers',
{stats,<<>>},
180000]}},
[{gen_server,call,3},
{ns_memcached,do_call,3},
{stats_collector,grab_all_stats,1},
{stats_collector,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
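Trawling the logs by hand is slow. As a minimal sketch, assuming every entry starts with a header in the same layout as the one above (`[category:level,timestamp,node:pid:module:function:line]`), a short Python script can pull out the error-level entries, tally which modules raise them, and list the buckets they mention. `error_entries` and `summarize` are hypothetical helpers, not part of any Couchbase tooling, and the header pattern is inferred from this single entry.

```python
import re
from collections import Counter

# Assumed header layout, inferred from the entry above:
# [stats:error,2013-06-21T17:19:29.785,ns_1@127.0.0.1:<0.11698.2>:stats_collector:handle_info:106]
HEADER = re.compile(
    r"^\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[0-9T:.\-]+),"
    r"(?P<node>[^:]+):(?P<pid><[0-9.]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
)

# Bucket names show up inside Erlang atoms such as 'ns_memcached-careers'.
BUCKET = re.compile(r"'ns_memcached-([^']+)'")


def error_entries(lines):
    """Yield (timestamp, module, function) for each error-level header line."""
    for line in lines:
        m = HEADER.match(line)
        if m and m.group("level") == "error":
            yield m.group("timestamp"), m.group("module"), m.group("function")


def summarize(lines):
    """Count error headers per (module, function) and collect bucket names seen."""
    lines = list(lines)  # materialize so a file object can be scanned twice
    errors = Counter((mod, fn) for _, mod, fn in error_entries(lines))
    buckets = {b for line in lines for b in BUCKET.findall(line)}
    return errors, buckets
```

To use it, open a log file and pass the file object to `summarize`; where the logs live depends on your installation.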
Our node is running close to the limit of 10 buckets, and one or two of them hold more than 100K documents each. We’ve built CouchDB-based solutions with more documents than this before, so I wouldn’t expect that alone to stop it from starting. Does anyone have tips or suggestions on where to look next? This is our development server, so I can just wipe it and rebuild the database, but I need to know why this happened so that it never happens in a production setup.
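One rough sanity check is whether the sum of the per-bucket RAM quotas, plus memory for the operating system, the Erlang VM and anything else on the box, fits in the machine’s physical RAM. The sketch below just does that arithmetic; `ram_headroom_mb` is a hypothetical helper, and the node size, quotas and overhead allowance are made-up example figures rather than Couchbase defaults.

```python
def ram_headroom_mb(physical_mb, bucket_quotas_mb, other_overhead_mb):
    """Return the MB left after every bucket quota and an assumed
    allowance for everything else (OS, Erlang VM, other processes)
    are subtracted from physical RAM. A negative value means the
    node is overcommitted and may run out of memory."""
    return physical_mb - sum(bucket_quotas_mb) - other_overhead_mb


# Hypothetical node: 4 GB of RAM, nine buckets at 400 MB each,
# and an assumed 1 GB allowance for everything else.
headroom = ram_headroom_mb(4096, [400] * 9, 1024)
print(f"headroom: {headroom} MB")  # prints "headroom: -528 MB"
```

The overhead allowance is the uncertain part; measuring what the non-bucket processes actually use on your own node gives a better figure than any guess.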
Any help or suggestions would be greatly appreciated.
I would like to file a bug/improvement request so that better feedback is provided in this kind of situation. Can you give me more information:
- Which OS are you using?
- RAM configuration?
- Total RAM across all machines?
- RAM quota, global and per bucket?
- Was the cluster up and running, then crashed, and now cannot restart?
What are the steps to reproduce the problem?
Thanks
Tug
@tgrall