Couchbase pod consumes high CPU and memory continuously in idle state

Running a single-node Couchbase cluster on Kubernetes, utilization gets as high as 7 CPU and 7 Gi of memory in idle state.
Inside the container, the memcached process consumes 600% CPU, and utilization does not come down even after the pod is restarted.

Why is memcached using 600% CPU continuously in idle state? Is there a way to free memory and other resources?

Couchbase version 6.6.0 CE; observed the same on EE as well.
The cluster has 10 buckets (with indexes present).
Pod resources:
CPU limit: N/A
MEM limit: 10 Gi

Please let me know if more information is required.

Hi @dushyant,

Buckets are heavyweight items; think of every bucket as its own database. In the past we supported a maximum of only 10 buckets; more recently we support 30 buckets (version 6.X). In the latest version, 7.0.0, we now support collections, addressed by keyspaces of the form “bucket.scope.collection”.

So in Couchbase 7.0.0 you can put 100 (or 1000) collections in one bucket, which is much more efficient with respect to resources. Resource utilization on low-end clusters is something we are currently investigating quite seriously.
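As a rough illustration of the “bucket.scope.collection” keyspace idea, here is a minimal sketch that maps a few former buckets onto collections inside a single bucket. The bucket name "app", the scope name "main", and the old bucket names are all hypothetical; only the three-part keyspace form comes from the post above.

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Return a backtick-quoted bucket.scope.collection path."""
    return ".".join(f"`{part}`" for part in (bucket, scope, collection))


# Hypothetical 6.x bucket names, one per document category.
old_buckets = ["users", "orders", "sessions"]

# Each former bucket becomes one collection in a single shared bucket.
mapping = {name: keyspace("app", "main", name) for name in old_buckets}

for old, new in mapping.items():
    print(f"{old} -> {new}")
```

The point of the sketch is only the shape of the mapping: many categories, one bucket.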

Now back to your observation of 600% CPU continuously in idle state under 6.6.0 CE. Under the hood, Couchbase shards your data into 1024 vBuckets per bucket. The statistics side of the fence is responsible for a lot of this CPU activity. In 7.0.0 this has been addressed by re-architecting the statistics design. You also say you have indexes present; this too can create CPU load.
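To make the 1024-vBuckets-per-bucket point concrete, the sketch below counts how many vBuckets a node ends up tracking for a given number of buckets, and shows one way a key could be mapped to a vBucket. The CRC32-based mapping is an assumption for illustration (it follows the commonly described client-side scheme) and should not be read as the exact server implementation; the key "user::1001" is made up.

```python
import zlib

NUM_VBUCKETS = 1024  # vBuckets per bucket, as stated in the reply above


def vbucket_id(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket ID (illustrative CRC32 scheme)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def total_vbuckets(num_buckets: int) -> int:
    """Number of vBuckets tracked across all buckets on a single node."""
    return num_buckets * NUM_VBUCKETS


print(total_vbuckets(10))  # 10240 vBuckets for 10 buckets
print(total_vbuckets(1))   # 1024 vBuckets for a single bucket
print(vbucket_id("user::1001"))  # some ID in the range 0..1023
```

Any per-vBucket bookkeeping, such as statistics collection, has ten times as many units to visit with 10 buckets as with one, even when no documents are changing.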

In order to give you some advice (other than dropping your bucket count in 6.6.X and using a “type”: value field to discriminate your document categories), I would need to know a few more things:

  • Is your docker instance running on Linux, MacOS, or a PC?
  • Does your hypervisor run other VMs?
  • How many documents do you have?
  • How many indexes do you have?
  • What is your mutation rate (or number of changes per second)?
  • When you say 7 CPU, I assume you mean vCPUs (1/2 a core each), right?

Best

Jon Strabala
Principal Product Manager - Server‌

Hi @jon.strabala ,

Thank you for your response.

Please find my answers below:

Is your docker instance running on Linux, MacOS, or a PC?
Docker is running on Linux.

Does your hypervisor run other VMs?
Only Docker is running (let me know if you need any other detail).

How many documents do you have?
[in current idle state]

"interestingStats": {
    "cmd_get": 0,
    "couch_docs_actual_disk_size": 1159702158,
    "couch_docs_data_size": 1083027893,
    "couch_spatial_data_size": 0,
    "couch_spatial_disk_size": 0,
    "couch_views_actual_disk_size": 0,
    "couch_views_data_size": 0,
    "curr_items": 22333,
    "curr_items_tot": 22333,
    "ep_bg_fetched": 0,
    "get_hits": 0,
    "mem_used": 1938554944,
    "ops": 0,
    "vb_active_num_non_resident": 16264,
    "vb_replica_curr_items": 0
}

In an actual run, documents will be dynamically created, deleted, and updated.

How many indexes do you have?
We have 20 indexes created.

What is your mutation rate (or number of changes per second)?
As I mentioned before, the system is idle, with no operations on Couchbase (please let me know if you need any other detail).

When you say 7 CPU, I assume you mean vCPUs (1/2 a core each), right?
Yes, 7 vCPU.

The pod has been running for 24+ hours. I am also adding some details from ‘couchbase-cli server-info’. Please see below.

{
    "cbasMemoryQuota": 2130,
    "eventingMemoryQuota": 256,
    "ftsMemoryQuota": 1024,
    "hostname": "[::1]:8091",
    "indexMemoryQuota": 1024,
    "interestingStats": {
        "cmd_get": 0,
        "couch_docs_actual_disk_size": 1159702158,
        "couch_docs_data_size": 1083027893,
        "couch_spatial_data_size": 0,
        "couch_spatial_disk_size": 0,
        "couch_views_actual_disk_size": 0,
        "couch_views_data_size": 0,
        "curr_items": 22333,
        "curr_items_tot": 22333,
        "ep_bg_fetched": 0,
        "get_hits": 0,
        "mem_used": 1938554944,
        "ops": 0,
        "vb_active_num_non_resident": 16264,
        "vb_replica_curr_items": 0
    },
    "mcdMemoryAllocated": 617993,
    "mcdMemoryReserved": 617993,
    "memoryQuota": 4000,
    "nodeEncryption": false,
    "os": "x86_64-unknown-linux-gnu",
    "ports": {
        "direct": 11210,
        "distTCP": 21100,
        "distTLS": 21150,
        "httpsCAPI": 18092,
        "httpsMgmt": 18091
    },
    "recoveryType": "none",
    "services": [
        "fts",
        "index",
        "kv",
        "n1ql"
    ],
    "status": "healthy",
    "storageTotals": {
        "ram": {
            "quotaTotal": 4194304000,
            "quotaTotalPerNode": 4194304000,
            "quotaUsed": 2726297600,
            "quotaUsedPerNode": 2726297600,
            "total": 16106127360,
            "used": 6935519232,
            "usedByData": 1938554944
        }
    },
    "systemStats": {
        "allocstall": 0,
        "cpu_stolen_rate": 0,
        "cpu_utilization_rate": 30.19281732614457,
        "mem_limit": 16106127360,
        "swap_total": 0,
        "swap_used": 0
    },
    "uptime": "96993"
}

Looking forward to your comments/suggestions.

Thanks.

-Dushyant
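For anyone reading along, a few ratios can be derived from the server-info figures in the post above. This is only a sketch: the numbers are copied from that output, and two things are assumptions on my part, namely that the per-service quotas are in MiB (memoryQuota 4000 matches quotaTotal 4194304000 bytes) and that only the services listed under "services" reserve their quota. The variable names are mine.

```python
# Figures copied from the couchbase-cli server-info output in the post above.
curr_items = 22333
non_resident = 16264        # vb_active_num_non_resident
mem_used = 1938554944       # bytes used by data
quota_total = 4194304000    # bytes, data service quota
quota_used = 2726297600     # bytes, assumed to be quota allocated to buckets
ram_total = 16106127360     # bytes
ram_used = 6935519232       # bytes

# Quotas (MiB) of services in the "services" list that have a quota.
# Assumption: cbas and eventing quotas are not reserved, as they are not running.
service_quota_mib = {"kv": 4000, "index": 1024, "fts": 1024}
pod_limit_mib = 10 * 1024   # 10 Gi pod memory limit from the original post


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


resident_pct = pct(curr_items - non_resident, curr_items)
data_vs_quota_pct = pct(mem_used, quota_total)
bucket_quota_pct = pct(quota_used, quota_total)
ram_used_pct = pct(ram_used, ram_total)
configured_mib = sum(service_quota_mib.values())
headroom_mib = pod_limit_mib - configured_mib

print(f"resident items:        {resident_pct:.1f}%")
print(f"mem_used / quotaTotal: {data_vs_quota_pct:.1f}%")
print(f"quotaUsed / quotaTotal: {bucket_quota_pct:.1f}%")
print(f"RAM used / total:      {ram_used_pct:.1f}%")
print(f"configured quotas:     {configured_mib} MiB, headroom {headroom_mib} MiB")
```

Only about a quarter of the items are resident in memory even though mem_used is under half of the data quota, which may be worth a look at the per-bucket level.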

First, I don’t use CE, but IMHO this does seem rather odd given the limited number of documents you have. You can try the following, one step at a time. I don’t do Indexing (I am an Eventing person), but I think your issue might be related to the indexer and the projector (both part of indexing).

  • Experiment 1: try adding more quota for the Index service, say a total of 3 GB if you can. What happens to the CPU/MEM?

  • Experiment 2: drop all your indexes. What happens to the CPU/MEM?

  • Experiment 3: drop down to 5 buckets, 2 buckets, or just one bucket. What happens to the CPU/MEM?

    • If you have empty buckets, drop them (if you can).
    • If you can’t simply drop your buckets, add a “type”: “sometag” field to your documents so you can combine them into one bucket; when you do this you may need to modify your app.
  • Experiment 4: don’t use services like Eventing or FTS if you don’t need them; make a new cluster with fewer services. What happens to your CPU/MEM?

  • Experiment 5: export your data and try Couchbase 7.0.0, but with only one bucket holding 10 collections (should be much, much leaner with respect to resources). What happens to your CPU/MEM?

If you can do any of the above, great; if you can’t, let me know and I will pass this off to someone up the proverbial food chain here at Couchbase.
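The “type” field idea from Experiment 3 could look roughly like the sketch below, which merges documents from several buckets into one set and tags each with its former bucket name. This is plain Python over dictionaries, not SDK code; the bucket names, the sample documents, and the "bucket::key" prefix used to avoid key collisions are all assumptions for illustration.

```python
from collections import Counter


def merge_into_one_bucket(docs_by_bucket: dict) -> dict:
    """Tag each document with a "type" and prefix its key with the old bucket name."""
    merged = {}
    for bucket, docs in docs_by_bucket.items():
        for key, doc in docs.items():
            # The prefix keeps keys unique if two buckets reused the same key.
            merged[f"{bucket}::{key}"] = {**doc, "type": bucket}
    return merged


# Hypothetical contents of two former buckets.
docs_by_bucket = {
    "users": {"1001": {"name": "Ada"}},
    "orders": {"1001": {"total": 42}, "1002": {"total": 7}},
}

merged = merge_into_one_bucket(docs_by_bucket)
counts = Counter(doc["type"] for doc in merged.values())

print(sorted(merged))
print(dict(counts))
```

The application would then filter on the new field (for example, a WHERE type = "orders" clause in its queries) instead of selecting a bucket.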

Other

Couchbase is designed for high scalability (10 to 100 nodes); it is a database where you string things together via microservices, and each service can be scaled individually. I am concerned that you run on a single node (zero data redundancy). Only in the most recent release, 7.0.0, have we started to aggressively address low-end clusters (and servers).

The above is a 2 vCPU, 2 GB cluster (it has 3 nodes), which is smaller than what Couchbase currently recommends, but you can see how your CPU goes up as you add buckets (RED), whereas if you add collections your CPU resources are used much more efficiently (BLUE). I am including this graph to point out that switching from buckets to collections should be a big win for your use case.

Best

Jon Strabala
Principal Product Manager - Server‌

There is also some useful analysis of CPU resource usage in virtualised Linux environments at https://issues.couchbase.com/browse/MB-39618

I note that the improvements for this should already be in the version you’re using, but it may give you some other avenues of investigation.

@dushyant You can look for periodic Index statistics dumps in indexer_stats.log (newer versions) or indexer.log (where they used to be), such as:

2021-09-08T10:23:03.864-07:00 indexer {"avg_disk_bps":0,"avg_drain_rate":0,"avg_mutation_rate":0,"avg_resident_percent":100,"cpu_utilization":6.459354064593541,"index_not_found_errcount":0,"indexer_state":"Active","memory_free":7799390208,"memory_quota":1258291200,"memory_rss":195198976,"memory_total":34359738368,"memory_total_storage":16551936,"memory_used":108728680,"memory_used_queue":0,"memory_used_storage":1118099,"needs_restart":false,"num_cgo_call":16633,"num_connections":5,"num_cpu_core":12,"num_goroutine":705,"num_indexes":14,"num_storage_instances":17,"storage_mode":"plasma","timestamp":"1631121783864162000","timings/stats_response":"58 5559871 614040832595","total_data_size":630722,"total_disk_size":778240,"uptime":"1m0.845640755s"}

Formatted:

{
    "avg_disk_bps": 0,
    "avg_drain_rate": 0,
    "avg_mutation_rate": 0,
    "avg_resident_percent": 100,
    "cpu_utilization": 6.459354064593541,
    "index_not_found_errcount": 0,
    "indexer_state": "Active",
    "memory_free": 7799390208,
    "memory_quota": 1258291200,
    "memory_rss": 195198976,
    "memory_total": 34359738368,
    "memory_total_storage": 16551936,
    "memory_used": 108728680,
    "memory_used_queue": 0,
    "memory_used_storage": 1118099,
    "needs_restart": false,
    "num_cgo_call": 16633,
    "num_connections": 5,
    "num_cpu_core": 12,
    "num_goroutine": 705,
    "num_indexes": 14,
    "num_storage_instances": 17,
    "storage_mode": "plasma",
    "timestamp": "1631121783864162000",
    "timings/stats_response": "58 5559871 614040832595",
    "total_data_size": 630722,
    "total_disk_size": 778240,
    "uptime": "1m0.845640755s"
}

In particular, avg_resident_percent, memory_quota, and memory_rss (memory resident set size) might help with the diagnosis.
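If it helps, a stats line like the one above can be pulled apart with a few lines of Python. This is a minimal sketch that assumes each line has the shape "timestamp component {json}" as in the sample; the function name and the shortened sample line are mine, and real log files may contain other line formats that would need to be skipped.

```python
import json


def parse_indexer_stats(line: str) -> dict:
    """Split 'timestamp component {json}' and parse the JSON payload."""
    timestamp, component, payload = line.split(" ", 2)
    stats = json.loads(payload)
    stats["_logged_at"] = timestamp  # keep the log timestamp alongside the stats
    stats["_component"] = component
    return stats


# Shortened copy of the sample line, keeping only the fields of interest.
sample = (
    '2021-09-08T10:23:03.864-07:00 indexer '
    '{"avg_resident_percent":100,"memory_quota":1258291200,'
    '"memory_rss":195198976,"num_indexes":14}'
)

stats = parse_indexer_stats(sample)
rss_pct = 100.0 * stats["memory_rss"] / stats["memory_quota"]

print(f"resident percent: {stats['avg_resident_percent']}")
print(f"memory_rss is {rss_pct:.1f}% of memory_quota")
```

Comparing memory_rss against memory_quota over several dumps shows whether the indexer is anywhere near its quota while the system is idle.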