We have a 7-node cluster with 577,218,160 keys and a RAM quota of 45,000 MB per node (315,000 MB total). Every so often the server throws temporary errors to the clients. I finally discovered that the source of this problem is the following:
Number of back-offs sent per second to drivers due to “out of memory” situations from this bucket (measured from ep_tmp_oom_errors).
My question is: how can I reduce the number of items kept in the memcached layer and avoid ep_tmp_oom_errors?
This TMPFAIL usually occurs when you are adding items faster than they can be persisted to disk. The best approach is usually to add more IO, possibly by adding more nodes. Another approach is to add more memory: since IO is batched, more memory lets you buffer more writes. There's no guarantee that adding memory will work, though, since the batching may not help enough and you'd still need more IO.
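On the client side, the usual way to live with these back-offs is to treat TMPFAIL as retryable and retry with exponential back-off and jitter. Here is a minimal, SDK-agnostic sketch in Java; the TemporaryFailure class-name check is illustrative, since each SDK surfaces TMPFAIL under its own exception type:

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Minimal sketch: retry an operation that may be rejected with the
 * server's temporary "out of memory" back-off (TMPFAIL).
 */
public final class TmpFailRetry {

    @FunctionalInterface
    public interface Op<T> { T run() throws Exception; }

    public static <T> T withBackoff(Op<T> op, int maxAttempts) throws Exception {
        long delayMs = 50; // initial back-off
        for (int attempt = 1; ; attempt++) {
            try {
                return op.run();
            } catch (Exception e) {
                // Only retry the server's temporary-failure signal; the
                // class-name check stands in for your SDK's real exception type.
                boolean temporary = e.getClass().getSimpleName().contains("TemporaryFailure");
                if (!temporary || attempt >= maxAttempts) throw e;
                // Exponential back-off with jitter gives the cluster time to
                // drain its write queue and free memory before the next try.
                Thread.sleep(delayMs + ThreadLocalRandom.current().nextLong(delayMs));
                delayMs = Math.min(delayMs * 2, 5_000);
            }
        }
    }
}
```

This doesn't fix the underlying sizing problem, but it turns a hard client failure into graceful back-pressure.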
Depending on your use case, Couchbase Server 3.0's "tunable memory" option may also let you reduce or eliminate this occasional OOM condition: it allows you to keep less metadata in memory, at the cost of disk (or SSD) latencies for item accesses outside the working set.
I have to say, I don’t agree that it’s a limitation and we do document it in a few places.
Memory is faster than disk, so if you want to support both acknowledgement at memory and acknowledgement at memory-and-disk (Couchbase supports both), you have to decide what to do when memory becomes full.
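For concreteness, here is how the two acknowledgement levels look from the Java SDK 2.x (host, bucket, and document names are placeholders):

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class AckLevels {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("default");
        JsonDocument doc = JsonDocument.create("user::1",
                JsonObject.create().put("name", "alice"));

        bucket.upsert(doc);                   // acknowledged once the item is in memory
        bucket.upsert(doc, PersistTo.MASTER); // blocks until it is also persisted to disk
    }
}
```

The second form trades latency for durability, which is exactly the kind of decision that gets forced on you once memory fills up.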
Pushing back on the application is a normal thing to do when building scalable systems; it is, for instance, exactly what an HTTP 503 does.
Well, you might not agree, but you are not the one who has to explain to his customers why the CB cluster is down when everything is supposedly working perfectly. We are running on 4x SSDs in RAID 10, and I can't configure my system to use disk only for older items or to evict unused elements from memcached. When it runs out of memcached memory, it just starts throwing temporary failures instead of ejecting more aggressively.
You are also not the one who spent four days debugging why CB is returning 503s. After reading your documentation (which, by the way, is incorrect in so many places that I lost count), I finally came to the conclusion that you don't give much control over what is kept in memory, so once the system reaches this limit it starts to fail, and the only thing I can do is add more memory or more nodes.
Now, you have admitted that this is a problem by introducing more control over the metadata in 3.0. If it was not a problem, why was that feature added?
We can disagree about how these distributed systems should work, but at least you have to be upfront about it, like Riak is. I was aware of the trade-offs of the different data stores just by reading through their documentation, which is correct and comprehensive compared to CB's.
I have run into roughly 5-8 major bugs since I was forced to use CB, and none of them were fun, but this one is the most annoying of them all. This problem by itself would be reason enough for me not to consider CB for anything serious.
Talking about other bugs:
node management is FUBAR: if we need to add or remove nodes, we tear down the entire cluster, wipe out all of the data, and rebuild; this became our official policy for dealing with CB
iterating over a view where you can't use startkey and endkey gets slower with every iteration, because the seek time for skip=N grows with N, so with 10,000,000 keys you can't use this feature (see the keyset-paging sketch after this list)
your documentation states that you can configure the low and high watermarks for memcached as percentages, but they are actually in bytes, so you are trolling anybody who is new to this system to an outrageous extent
rebalancing failed; this one is my favorite. I'm not sure why I am in the business of telling a data system that it needs to rebalance itself, but once you have this feature it should probably work
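For what it's worth, the standard workaround for the skip=N problem is keyset paging: remember the last key and doc id of each page and start the next query from there, so every page is a cheap seek instead of an O(N) skip. A minimal sketch, assuming the Java SDK 2.x view API, string view keys, and placeholder design-document/view names:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

public class ViewPaging {
    private static final int PAGE_SIZE = 100;

    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("default");
        String lastKey = null;
        String lastDocId = null;
        while (true) {
            ViewQuery query = ViewQuery.from("users", "by_name").limit(PAGE_SIZE);
            if (lastKey != null) {
                // Resume from the last row of the previous page, skipping
                // exactly that one row rather than everything before it.
                query = query.startKey(lastKey).startKeyDocId(lastDocId).skip(1);
            }
            ViewResult page = bucket.query(query);
            int rows = 0;
            for (ViewRow row : page) {
                lastKey = row.key().toString();
                lastDocId = row.id();
                rows++;
                // ... process the row ...
            }
            if (rows < PAGE_SIZE) break; // ran out of rows: last page
        }
    }
}
```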
Thanks for your support; now I can just use this little thread to go to management and justify why we are going to rip CB out of our production infrastructure and never put it back.
I would say that the limitation of "all metadata is stored in RAM" is a pretty severe one. Couchbase basically says you can fetch and store as many keys as you want and it overflows to disk, except when the sheer number of keys you have exceeds what can fit in RAM. Then you have to ask yourself, "what limitation am I pushing up against for this to happen?" That, by definition, is a limitation.
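To put rough numbers on the cluster from the top of the thread: assuming something like 56 bytes of per-key metadata (the ballpark from the 2.x sizing guidance) plus an average key of around 20 bytes, both figures illustrative, 577,218,160 keys × ~76 bytes ≈ 44 GB of RAM goes to metadata alone, before a single value is cached.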
I don't want to argue with you about Couchbase, since based on your tone I assume you have already made up your mind about it.
It was not a problem; it was a new feature. Every database system has tradeoffs, and if you do not size your cluster properly you will run into errors on any of them. We saw more and more requirements where the data size is much larger than the available RAM, so we added this feature, and you can enable it optionally. I don't see anything wrong with how we approach things. And if you look at distributed systems, my opinion is that it is much better to signal quickly back to the producer that the consuming system is overloaded than to just accept everything and swamp the whole system.
Because you are more intelligent than a system will ever be. If you are a sysadmin, you can add and remove nodes at the same time. The choice was consciously made to split the actual node add/removal from the rebalance step. Since you read the documentation, I think you also know how failover works and plays nicely into rebalance.
Strong language aside: this is not how to operate a Couchbase cluster, so I wonder what you are doing over there. I recommend getting help from our support team if you don't know how to handle a Couchbase cluster and manage it properly. We have lots of customers running clusters 24/7 without running into issues like you do.
I'll stop arguing for now, but feel free to ask specific questions if you run into other trouble. I'm sure that with a normal tone we can resolve them.
I don't agree with you here. It is a limitation, but one that was made consciously, since there are always tradeoffs. If you care about predictable performance in all cases (and we come from a very cache-oriented background), that's the only way to do it properly, even when data is ejected.
That said, with 3.0 we added a new feature that you can optionally enable to work the way you expect here, but keep in mind that there are still tradeoffs. It's always about tradeoffs.
I understand it's about tradeoffs, and I understand the reasoning; I also understand that it was a conscious decision. I'm not totally sure what is being disagreed with here. I just pointed out that said decision does in fact create a limitation on the number of keys you can store, and a pretty painful one at that. We see our cluster falling over in production because of this, and there is nothing we can do about it other than upgrade. And I have to tell you, it's hard to sell that as a feature and not as a limitation.
@siriele On 3.0 and forward, in the bucket details there is a new selection button that says "Full Eviction" for cache metadata. The default is "Value Eviction", which is the old/default behaviour.
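If you'd rather not click through the UI, the same setting is exposed over the REST API as the bucket's evictionPolicy parameter ("valueOnly" vs. "fullEviction"). A minimal sketch using plain HttpURLConnection; host, credentials, and bucket name are placeholders, and note that depending on the server version the edit call may require re-sending other bucket settings such as ramQuotaMB:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class SetFullEviction {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://127.0.0.1:8091/pools/default/buckets/default");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        String credentials = Base64.getEncoder()
                .encodeToString("Administrator:password".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + credentials);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setDoOutput(true);

        // Switch the bucket from the default "valueOnly" to "fullEviction".
        byte[] body = "evictionPolicy=fullEviction".getBytes(StandardCharsets.UTF_8);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
        // Changing the eviction policy restarts the bucket, so expect a
        // brief availability blip while it warms back up.
    }
}
```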
Well, the feature is not that it starts sending TMPFAIL but that it gives you consistent performance (the TMPFAIL is a side effect of that).
Oh really? Full eviction? That's pretty snazzy, actually. Does the SDK need to be upgraded as well? I'm assuming yes? Or is it just good practice to keep the SDK up to date? By the way, thank you for pointing this out; I never knew how the relationship between the SDK and server versions actually shakes out.
@siriele In general we try to keep the SDKs compatible, so you are able to use older clients together with, say, 3.0, but you may be missing some new features that are only available in the newer ones.
So, for example, with Java you can use 1.4.5 with 3.0, but you won't get SSL support; that is available in 2.0.1 of the Java client (which is also backwards compatible).
In general, we recommend staying at least on the latest bugfix release of your series: if you are on 1.4 in Java you should run 1.4.5 if possible, and for 2.0 you should go to 2.0.1.
We also try to keep a compatibility matrix in the 2.0 docs, so for Java: http://docs.couchbase.com/developer/java-2.0/overview.html
If you tell us which SDKs you want to run we can guide you in the right direction of course.
The latest libcouchbase release is 2.4.3, and likewise the 2.4.x series is fairly stable.
There's a compatibility matrix in our documentation (at least for the C library) which shows which features are supported by which client and server versions, but it is limited to the specific features that require coordination between client and server. Most Couchbase features are independent of the client, unless they expose APIs directly to the application: