I’m building my own monitoring dashboards in Grafana for my Couchbase cluster and am confused by the following metrics:
ep_queue_size - Number of items queued for storage.
ep_diskqueue_items - Total items in disk queue.
disk_write_queue - Disk write queue, items.
The graphs for ep_queue_size and ep_diskqueue_items are identical. The graph for disk_write_queue matches the other two most of the time, but sometimes it is twice as large.
Could someone please explain the difference between the three metrics?
A general note about the stats data: it’s very difficult to find a good explanation of the metrics in the official documentation, and some metrics seem to be undocumented altogether.
ep_queue_size and ep_diskqueue_items are aliases for the same statistic: the number of items waiting to be persisted to disk. The only reason there are two names is historical. There was an effort to give the stats more consistent, user-friendly names, but we didn’t want to delete the “old” stat in case some users were depending on it.
disk_write_queue is a compound stat generated by ns_server (which is why you won’t find it listed in the stats.org link above). I just checked the ns_server source, and it’s the sum of ep_queue_size and ep_flusher_todo. As per stats.org, the definition of ep_flusher_todo is:
ep_flusher_todo - Number of items currently being written
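To make the relationship concrete, here is a minimal Python sketch of the sum described above. It is only an illustration, not the ns_server code: the function name, the dict layout, and the sample values are all hypothetical, and it assumes you already have the raw stats as a mapping from stat name to number. One plausible reading of the "twice as large" spikes is that, at those moments, the flusher is writing about as many items as are still queued.

```python
def disk_write_queue(stats: dict) -> int:
    """Derive disk_write_queue as ep_queue_size + ep_flusher_todo.

    `stats` is a hypothetical mapping of stat name to value; how you
    collect it (REST, cbstats, an exporter) is outside this sketch.
    """
    return int(stats["ep_queue_size"]) + int(stats["ep_flusher_todo"])


# Made-up sample values, not real cluster output.
# Flusher idle: nothing is currently being written.
idle = {"ep_queue_size": 120, "ep_diskqueue_items": 120, "ep_flusher_todo": 0}
# Flusher busy: as many items being written as are waiting in the queue.
busy = {"ep_queue_size": 120, "ep_diskqueue_items": 120, "ep_flusher_todo": 120}

print(disk_write_queue(idle))  # 120, same as ep_queue_size
print(disk_write_queue(busy))  # 240, twice ep_queue_size
```

In a dashboard, plotting ep_flusher_todo next to ep_queue_size should show whether the gap between disk_write_queue and the other two graphs lines up with flusher activity.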