Couchbase is consuming more RAM than expected

Hi,

We are inserting 2 Billion documents into Couchbase.

We have a 3-node cluster with 15GB RAM on each node, and the replica factor is 1.

But when insertion reached 245 million documents, it stopped, and in the logs we saw RAM usage reach 80%.

We have 500GB of disk storage and only 126GB of it is used, but RAM usage reached its limit and stopped our program, which does the inserting. This RAM is not getting released.

Does Couchbase store some documents in RAM?

Why is it storing docs in RAM even though we have disk storage?

Thanks,
Akhil

@akhilravuri2,

First and foremost, RAM provides a cache, so we always expect to utilize RAM as much as possible in Couchbase.

Now, as to your issue (“RAM usage reached limit. and stop[p]ed”): what were your bucket settings for the “Ejection Method”?

Value Ejection: During ejection, only the value will be ejected (key and metadata will remain in memory).

Full Ejection: During ejection, everything (including key, metadata, and value) will be ejected.

Value Ejection needs more system memory, but provides the best performance. Full Ejection reduces the memory overhead requirement.

If you have your bucket set to “Value Ejection” (the default for “Couchstore”), your system will keep the “key” in memory, and this stopping can indeed happen. If you run such a test, you might see warnings pop up in the UI like:

[17 Jul, 2022, 10:03:31 AM] - Metadata overhead warning. 
Over 56% of RAM allocated to bucket "bb" on node "127.0.0.1" 
is taken up by keys and metadata.
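
If you want to see how much of a bucket’s RAM quota is being taken up by keys and metadata, a quick check with cbstats is one way to do it. This is just a sketch from memory (flag and stat names can vary a bit across server versions), assuming a default single-node install and a bucket named “bb”:

# Compare key/metadata memory against the bucket's memory quota (stat names may differ by version)
/opt/couchbase/bin/cbstats 127.0.0.1:11210 -u $CB_USERNAME -p $CB_PASSWORD -b bb all | grep -E 'ep_meta_data_memory|ep_max_size'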

I did a test with a single 100MB Couchstore bucket using the ejection method “Value Ejection” on a single-node test setup. I could only load 565,543 documents shaped like the one below:

{"type":"vbs_seed","id":40600001,"dummy":"not_set","timeStamp":"December 5, 2018"}

Essentially, I had a source file of 100K docs like the one above called “100k.json”, and then I used the following script to quickly hit capacity.

# Repeatedly bulk-load the same 100K-document file until the bucket hits its memory limit.
for i in `seq 1 500`; do
  echo "seq $i trying 100K load"
  # cbimport builds a unique key per doc via #UUID#:#MONO_INCR#, so every pass adds new items.
  /opt/couchbase/bin/cbimport json -c couchbase://127.0.0.1 -u $CB_USERNAME -p $CB_PASSWORD -b bb -d file://./100k.json -f lines -t 12 -g bigtest:#UUID#:#MONO_INCR#
done

After changing the ejection method to “Full Ejection”, I could once again load data (just rerun the bash script above); I ran the test up to 50M documents with no issues.
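
For reference, the same switch can be made from the command line; here is a rough sketch with couchbase-cli (flag names as I recall them on recent server versions, so double-check against yours, and note that the bucket restarts and warms up again when the eviction policy changes):

# Switch the "bb" bucket from value ejection to full ejection (the bucket will restart)
/opt/couchbase/bin/couchbase-cli bucket-edit -c 127.0.0.1 -u $CB_USERNAME -p $CB_PASSWORD \
  --bucket bb --bucket-eviction-policy fullEviction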

I do want to caution that Couchbase generally only officially supports a 10% residency ratio on buckets of type “Couchstore”. Our next release, 7.2, will have a storage backend of “Magma” (the default ejection method there will be “Full Ejection”) that will support very low residency ratios across all of our services.

Best

Jon Strabala
Principal Product Manager - Server‌

@jon.strabala

Thanks for the detailed explanation.

We have used Value Ejection.

Based on the Couchbase Server sizing guidelines, we have created the table below (a small script that replays the same arithmetic follows the table). Please let us know whether these requirements are correct if we go with Value Ejection:

|Variable|Calculation|Our Data Requirement|
|---|---|---|
|no_of_copies|1 + number_of_replicas|2|
|total_metadata|(documents_num) * (metadata_per_document + ID_size) * (no_of_copies)|2 billion * (56 + 44) bytes * 2 = 400 GB|
|total_dataset|(documents_num) * (value_size) * (no_of_copies)|2 billion * 1,000 bytes * 2 = 4,000 GB|
|working_set|total_dataset * (working_set_percentage)|4,000 GB * 0.2 = 800 GB|
|Cluster RAM quota required|(total_metadata + working_set) * (1 + headroom) / (high_water_mark)|(400 GB + 800 GB) * (1 + 0.25) / 0.85 = 1,765 GB|
|Number of nodes|Cluster RAM quota required / per_node_ram_quota|1,765 / 20 = 88 nodes|
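
To double-check the arithmetic, the rough script below just replays the sizing formula with the input values from the table above (treating 1 GB as 10^9 bytes):

# Replay the sizing formula from the table above
docs=2000000000        # 2 billion documents
meta_per_doc=56        # metadata bytes per document
id_size=44             # key size in bytes
value_size=1000        # value size in bytes
copies=2               # 1 active copy + 1 replica
working_set_pct=0.2
headroom=0.25
high_water_mark=0.85
per_node_ram_gb=20

total_metadata_gb=$(echo "$docs * ($meta_per_doc + $id_size) * $copies / 10^9" | bc -l)
total_dataset_gb=$(echo "$docs * $value_size * $copies / 10^9" | bc -l)
working_set_gb=$(echo "$total_dataset_gb * $working_set_pct" | bc -l)
cluster_ram_gb=$(echo "($total_metadata_gb + $working_set_gb) * (1 + $headroom) / $high_water_mark" | bc -l)
nodes=$(echo "$cluster_ram_gb / $per_node_ram_gb" | bc -l)

printf "metadata: %.0f GB, working set: %.0f GB, cluster RAM quota: %.0f GB, nodes: %.0f\n" \
  "$total_metadata_gb" "$working_set_gb" "$cluster_ram_gb" "$nodes"
# -> metadata: 400 GB, working set: 800 GB, cluster RAM quota: 1765 GB, nodes: 88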

Does Couchbase support keeping only the most recently used (LRU) keys within 15GB of RAM (or whatever RAM we allocate)?

Thanks,
Akhil

@akhilravuri2, sizing the data service isn’t part of my focus at Couchbase.

When you use a Bucket Type of “Couchbase” and a Storage Backend of “Couchstore”, have a residency ratio of 10% or greater, and use “Full Ejection”, you should have no issues.

If I read your comment correctly (ignoring headroom, and assuming that metadata and the docs fit within 4TB), you have 4TB that you are storing and want an 800 GB active set, or about 20% of your 4TB dataset. As long as your KV nodes contribute at least 400GB of RAM to the bucket in question, you would have 10% residency (our minimal support guideline).
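
As a quick sanity check on that 10% figure (this is just the definition of residency as bucket RAM divided by total data size, using your numbers):

# residency % = RAM available to the bucket / total data size (active + replica)
dataset_gb=4000      # from your sizing table
bucket_ram_gb=400    # RAM the KV nodes contribute to this bucket
echo "scale=1; $bucket_ram_gb * 100 / $dataset_gb" | bc    # -> 10.0 (percent)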

Professional services via your sales channel can give you a definitive answer for your exact use case.

Best

Jon Strabala
Principal Product Manager - Server‌