Hey guys, non-emergency, but I think you might be interested in this. Maybe it is related to the other memcached crashes.
This is reproducible! I have a cluster of 3 physical machines running Couchbase Server 4.5.0-2601 Enterprise Edition (build-2601) on Ubuntu (kernel 4.4.0.x).
I removed a bucket that I no longer needed from the cluster but did not tell the web server about it (Glassfish 4.1.1, running on a 4th machine). When the web server hits the cluster with its first read, the whole CB cluster goes down. The nodes try to come back up and rebalance, but I did not let them keep trying for longer than a few minutes. Stopping and restarting the cluster brings it back up clean.
The fixed code (on my side), with the extra bucket removed, runs clean. I am keeping the buggy code around in case you want to check any fixes against it. To set it up, it seems you need to create a web app that is configured to use 2 or more buckets, then remove one bucket without updating the web server. Alternatively, create an app that is configured to use a bucket that does not exist.
I will upgrade to 4.6 when it is out.
It’d definitely be interesting if you could upload a cbcollect_info and maybe point to it here on the forums. I suspect @drigby or @pvarley would like to take a look and may be able to tell you if it’s a known issue.
Note that 4.6 is currently available as a Developer Preview (DP).
I ran cbcollect_info this morning and uploaded the output to …/neurocollective/cbcrash.zip. I hope it contains everything (this is my first time using cbcollect_info). If not, I will keep the environment around so I can reproduce the problem.
What host / machine did you upload it to?
Dave.
I followed the online instructions and sent it to:
https://s3.amazonaws.com/customers.couchbase.com/neurocollective/cbcrash.zip
Tom
For a start, you’re running on an unsupported OS (Ubuntu 16.04); Ubuntu 14.04 is the newest supported Ubuntu release. I suggest you try 14.04 and see how that works.
I’ll try and take a look at the crash dump soon.
Dave, I have been running on 16.04 since it came out and have not experienced any problems with Couchbase on it, apart from a few configuration steps that differ on 16.04 from what the documentation describes. The issue here is a programming error on my side: not updating the application server about a change in the Couchbase configuration. It is fixed by correcting the code. The only reason I posted it is that I can see it being used as a malicious attack on a Couchbase cluster, and the cluster does not recover gracefully from it. Also, having been a software developer for 30-plus years, I find it useful when an issue like this is reproducible: it lets you track it right down to the faulty code.
Tom
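For anyone hitting the same misconfiguration, one application-side guard is to compare the bucket names the app is configured with against the buckets the cluster actually reports, and refuse to start if any are missing. This only protects the app from sending requests to a deleted bucket; it does not address the server-side crash itself. The sketch below is in Python for brevity (the original app is Java on Glassfish) and assumes the bucket list comes from a JSON body shaped like the response of the cluster's `GET /pools/default/buckets` REST endpoint, i.e. an array of objects with a `name` field. The helpers `bucket_names_from_rest_response`, `find_missing_buckets` and `validate_bucket_config` are hypothetical names, not part of any Couchbase SDK.

```python
import json
from typing import Iterable, List


def bucket_names_from_rest_response(body: str) -> List[str]:
    """Extract bucket names from a JSON array of bucket objects.

    Assumption: the body looks like the cluster's GET /pools/default/buckets
    response, i.e. a list of objects that each carry a "name" key.
    """
    return [bucket["name"] for bucket in json.loads(body)]


def find_missing_buckets(configured: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return the configured bucket names the cluster no longer has, in config order."""
    existing_set = set(existing)
    return [name for name in configured if name not in existing_set]


def validate_bucket_config(configured: Iterable[str], existing: Iterable[str]) -> None:
    """Fail fast at application startup instead of reading from a deleted bucket."""
    missing = find_missing_buckets(configured, existing)
    if missing:
        raise ValueError("configured buckets not found on cluster: " + ", ".join(missing))


if __name__ == "__main__":
    # Hypothetical example: the app still lists "sessions", which was removed from the cluster.
    response = '[{"name": "users"}, {"name": "events"}]'
    existing = bucket_names_from_rest_response(response)
    print(find_missing_buckets(["users", "events", "sessions"], existing))  # ['sessions']
```

Calling `validate_bucket_config` once at deployment time, before any bucket is opened, turns the stale configuration into a clear startup error on the web server rather than live traffic against the cluster.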
Hi Tom,
I was able to reproduce your issue by installing the same Couchbase version on a clean installation of Ubuntu 16.04.1 in VirtualBox, but when I tried to reproduce it on a clean installation of Ubuntu 14.04.5, everything worked as expected.
To be fair, you are copying files built against one system over to another system, and we’ve not guaranteed binary compatibility across OS versions for all of the libraries we use. It might work, or it might not. I did write some extra unit tests to simulate this situation, and I was not able to reproduce the crash when building (and running) on the Ubuntu 16 system (running the full server also works without a problem).
Ubuntu 16 is currently an unsupported platform, and issues like this will be fixed once we certify it. I’m currently looking into some other authentication-related problems, so I’ll keep an eye out for anything that might be related to this.