Question about Ephemeral buckets

I have a normal Couchbase bucket that has something like 300k documents. In a Java application we use this bucket heavily and need to query it a lot, and this is hurting performance. So now we're thinking of in-memory data/caching. From what I understand, Ephemeral buckets are in-memory only, with no persistence. But can I initialize an Ephemeral bucket from a normal Couchbase bucket, on Couchbase startup for example? What are my options here?

Thanks,


No - it’s a different bucket type (like Couchbase or memcached) - you have to select the bucket type when you create the bucket.

You can populate an Ephemeral bucket using the same methods as a Couchbase bucket: restore from a backup, use XDCR to copy data from another live bucket, or populate it in some application-specific way.
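As a rough illustration of the "application-specific way", here is a minimal warm-up sketch that copies a known list of keys from the persistent bucket into the Ephemeral one. It assumes your application already knows which keys to copy; `KvStore`, `MapStore` and `warm` are hypothetical names (not Couchbase SDK APIs), and in-memory maps stand in for the two buckets so the sketch runs without a cluster.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class WarmEphemeralBucket {

    // Hypothetical minimal key/value interface standing in for a bucket.
    // This is NOT the Couchbase SDK API.
    interface KvStore {
        Optional<String> get(String key);
        void upsert(String key, String value);
    }

    // In-memory stand-in so the sketch runs without a cluster.
    static class MapStore implements KvStore {
        private final Map<String, String> data = new HashMap<>();
        public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
        public void upsert(String key, String value) { data.put(key, value); }
    }

    // Copies every listed key from source to target and returns how many were copied.
    // Keys missing from the source are skipped rather than treated as errors.
    static int warm(KvStore source, KvStore target, List<String> keys) {
        int copied = 0;
        for (String key : keys) {
            Optional<String> doc = source.get(key);
            if (doc.isPresent()) {
                target.upsert(key, doc.get());
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        MapStore persistent = new MapStore();
        persistent.upsert("user::1", "{\"name\":\"a\"}");
        persistent.upsert("user::2", "{\"name\":\"b\"}");
        MapStore ephemeral = new MapStore();
        int n = warm(persistent, ephemeral, List.of("user::1", "user::2", "user::3"));
        System.out.println("copied " + n + " documents"); // prints "copied 2 documents"
    }
}
```

For a full copy without knowing the keys, the built-in routes (backup/restore or XDCR) mentioned above are usually simpler.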

Thanks for the information. I ran benchmarks comparing an Ephemeral bucket with a normal Couchbase bucket and, to my surprise, the performance is almost identical.

For example, 300 queries (one query per loop iteration in Java) plus some minor operations took 1083 ms on the Ephemeral bucket. The same thing on the normal bucket took 1092 ms.

Maybe having an SSD helps the normal Couchbase bucket.

That sounds like you’re mostly benchmarking your network environment :wink:

If you just do that you’ll have only one concurrent operation in flight at any one time, which is unlikely to match a real-world workload (I’d expect you’d have many 10s, 100s or even 1000s of requests in flight).
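To see why a one-request-at-a-time loop mostly measures round-trip latency, here is a small self-contained simulation. It assumes a fixed simulated round trip of 1 ms per request; `fetch`, `sequential` and `concurrent` are hypothetical helpers, not a real bucket or SDK. With one request in flight, total time grows roughly as n × RTT; with many in flight, it shrinks roughly by the concurrency level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InFlightBenchmark {

    // Hypothetical fetch with a fixed simulated round-trip time (not a real bucket).
    static String fetch(String key, long rttMillis) {
        try {
            Thread.sleep(rttMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return "doc-" + key;
    }

    // One request in flight at a time: total time is roughly n * rtt.
    static List<String> sequential(List<String> keys, long rtt) {
        List<String> out = new ArrayList<>();
        for (String k : keys) out.add(fetch(k, rtt));
        return out;
    }

    // Up to `concurrency` requests in flight: total time is roughly (n / concurrency) * rtt.
    static List<String> concurrent(List<String> keys, long rtt, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String k : keys) {
                futures.add(CompletableFuture.supplyAsync(() -> fetch(k, rtt), pool));
            }
            List<String> out = new ArrayList<>();
            for (CompletableFuture<String> f : futures) out.add(f.join()); // keeps input order
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 300; i++) keys.add("key" + i);

        long t0 = System.nanoTime();
        sequential(keys, 1);
        long t1 = System.nanoTime();
        concurrent(keys, 1, 32);
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("concurrent: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

Absolute numbers will vary by machine; the point is the ratio between the two timings, which real network round trips would show in the same way.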

Additionally, you say “query” - are you actually benchmarking a N1QL query, or a direct Key/Value operation? If it’s the former, then most of your time is likely spent in the query engine, and not in the Key/Value Data Service.

If you post your benchmark code we might be able to comment further.

I redid the benchmarks with different approaches and would like to hear your opinion. I think only approach #3 actually uses the N1QL service, but it's only doing a key scan.

I think I have three questions:

  1. Why are the Ephemeral bucket and the normal bucket performing almost the same?
  2. What would be the best approach to stick with, in a scenario where I need to fetch X documents from the database to read values from them?
  3. Is the slowness in the last approach due to the overhead of converting JSON documents to actual mapped entities?

Notes: the normal Couchbase bucket has 1 GB of allocated memory, and the Ephemeral bucket has around 2.5 GB. The Java code and Couchbase run on the same machine.

Well, your benchmark is measuring GET performance for a small number (566) of keys. Both Couchbase and Ephemeral buckets cache documents in memory; a Couchbase bucket can additionally eject items (and re-read them from disk) if there's insufficient memory. So GET times would be expected to be similar, if not identical.
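A toy model of the read path makes this concrete. The sketch below treats a Couchbase bucket as a bounded in-memory cache backed by a "disk" map and counts how often a GET has to fall back to disk. It is a simplification under stated assumptions: LRU eviction is used only because it is easy to write with `LinkedHashMap`, and it is not Couchbase's actual ejection policy; `PersistentBucketModel` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class BucketReadPathModel {

    // Toy model of a persistent bucket: bounded memory cache in front of a disk map.
    static class PersistentBucketModel {
        final Map<String, String> disk = new HashMap<>();
        final Map<String, String> memory;
        int diskReads = 0;

        PersistentBucketModel(int memoryCapacity) {
            // Access-ordered LinkedHashMap = simple LRU (illustrative only).
            this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    // Eject from memory once over quota; the item remains on disk.
                    return size() > memoryCapacity;
                }
            };
        }

        void set(String key, String value) {
            disk.put(key, value);
            memory.put(key, value);
        }

        String get(String key) {
            String v = memory.get(key);
            if (v == null) {
                v = disk.get(key); // cache miss: slower disk fetch
                if (v != null) {
                    diskReads++;
                    memory.put(key, v); // bring it back into memory
                }
            }
            return v;
        }
    }

    public static void main(String[] args) {
        // Working set (566 keys) fits in memory: every GET is served from RAM,
        // which is also what an Ephemeral bucket would do.
        PersistentBucketModel fits = new PersistentBucketModel(1000);
        for (int i = 0; i < 566; i++) fits.set("k" + i, "v" + i);
        for (int i = 0; i < 566; i++) fits.get("k" + i);
        System.out.println("disk reads when data fits: " + fits.diskReads); // prints 0

        // Memory quota smaller than the working set: GETs start going to disk.
        PersistentBucketModel tight = new PersistentBucketModel(100);
        for (int i = 0; i < 566; i++) tight.set("k" + i, "v" + i);
        for (int i = 0; i < 566; i++) tight.get("k" + i);
        System.out.println("disk reads when data does not fit: " + tight.diskReads);
    }
}
```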

Depends on what workload you are trying to model / what your requirements are :wink:

Approach 1, where you use async() to let the SDK issue multiple requests in parallel, will likely be the fastest, but it can be more complex, especially when dealing with errors.
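Here is a minimal sketch of the parallel pattern with per-key error handling, written against plain `CompletableFuture` rather than any particular SDK version. `asyncGet` is a hypothetical function standing in for the SDK's async get, and `BulkResult`/`getAll` are illustrative names. Each key's outcome is recorded separately, so one "not found" or timeout does not fail the whole batch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class AsyncBulkGet {

    // Outcome of a bulk fetch: documents found, plus the error for each key that failed.
    static final class BulkResult {
        final Map<String, String> found = new ConcurrentHashMap<>();
        final Map<String, Throwable> failed = new ConcurrentHashMap<>();
    }

    // Starts every GET up front so they are all in flight together, then waits for all.
    // `asyncGet` is a hypothetical stand-in for an SDK async get, not a Couchbase API.
    static BulkResult getAll(List<String> keys, Function<String, CompletableFuture<String>> asyncGet) {
        BulkResult result = new BulkResult();
        CompletableFuture<?>[] pending = keys.stream()
                .map(key -> asyncGet.apply(key).handle((doc, err) -> {
                    // Record each outcome individually; handle() never rethrows,
                    // so allOf() below always completes normally.
                    if (err != null) result.failed.put(key, err);
                    else if (doc != null) result.found.put(key, doc);
                    return null;
                }))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(pending).join();
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> data = Map.of("a", "{\"v\":1}", "b", "{\"v\":2}");
        // Simulated async store: unknown keys fail, much like a "document not found" error.
        Function<String, CompletableFuture<String>> asyncGet = key -> data.containsKey(key)
                ? CompletableFuture.supplyAsync(() -> data.get(key))
                : CompletableFuture.<String>failedFuture(new IllegalStateException("not found: " + key));
        BulkResult r = getAll(List.of("a", "b", "missing"), asyncGet);
        System.out.println("found: " + r.found.keySet() + ", failed: " + r.failed.keySet());
    }
}
```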

Approach 2, a simple for() loop, is the simplest and might be fine for non-performance-sensitive use cases, or where you want to make sure you have one document (and maybe perform some updates on it) before accessing the next.
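For comparison, a sketch of the sequential pattern: each document is read, updated and written back before the loop moves on. A plain `Map` stands in for the bucket, and `incrementAll` is a hypothetical helper; with a real SDK each `get`/`put` would be a blocking network call, which is exactly why this style is simple but latency-bound.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequentialReadModifyWrite {

    // Processes one document fully (read, update, write back) before touching the
    // next, so each step can rely on the previous one having finished.
    // `store` is an in-memory stand-in for a bucket (hypothetical, not an SDK API).
    static int incrementAll(Map<String, Integer> store, List<String> keys) {
        int updated = 0;
        for (String key : keys) {
            Integer current = store.get(key); // blocking GET
            if (current == null) continue;    // skip missing documents
            store.put(key, current + 1);      // blocking write of the updated value
            updated++;
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Integer> counters = new HashMap<>(Map.of("page::home", 10, "page::about", 3));
        int n = incrementAll(counters, List.of("page::home", "page::about", "page::missing"));
        System.out.println(n + " updated, home=" + counters.get("page::home")); // prints "2 updated, home=11"
    }
}
```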

Approach 3, a N1QL query, will always be slower than a simple K/V operation, but it allows greater flexibility if you want to apply a WHERE clause to what you fetch. However, if you already know the key(s) you need, direct K/V is virtually always preferred.
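The difference between the two access styles can be sketched with plain collections: a key lookup goes straight to the document, while a WHERE-style selection must evaluate a condition against document contents. This is only a toy model of the shape of each path. A real N1QL query is parsed, planned and usually served through an index before documents are fetched from the Data Service, so it is not a linear scan, but those extra stages are why it costs more than a direct K/V GET. `byKeys` and `where` are hypothetical helper names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class KeyLookupVsFilter {

    // Direct K/V style: the keys are already known, so each document is one lookup.
    static List<String> byKeys(Map<String, String> docs, List<String> keys) {
        return keys.stream()
                .map(docs::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    // Query style: documents are selected by a condition on their content
    // (the WHERE clause). In this toy model that means checking every document.
    static List<String> where(Map<String, String> docs, Predicate<String> condition) {
        return docs.values().stream()
                .filter(condition)
                .sorted() // deterministic order for the example
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
                "user::1", "alice:admin",
                "user::2", "bob:guest",
                "user::3", "carol:admin");
        System.out.println(byKeys(docs, List.of("user::1", "user::3"))); // prints [alice:admin, carol:admin]
        System.out.println(where(docs, d -> d.endsWith(":admin")));      // prints [alice:admin, carol:admin]
    }
}
```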

No idea tbh; I don't know much about Spring performance.

Note that Ephemeral buckets aren't really about being faster than Couchbase buckets: both cache documents in memory for fast reads, and for writes a Couchbase bucket by default queues the disk write, so the SET() will often return before the item has hit disk (see Observe() if you want to change this behaviour).
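The write behaviour described above can be modelled in a few lines: a SET updates memory and queues the disk write on a background flusher, and the caller chooses whether to wait for persistence. This is a toy model of the behaviour, not Couchbase's implementation; `QueuedPersistenceModel` is a hypothetical name, and in the real SDK waiting for persistence is requested through the Observe-based options mentioned above rather than a future like this.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueuedPersistenceModel implements AutoCloseable {

    final Map<String, String> memory = new ConcurrentHashMap<>();
    final Map<String, String> disk = new ConcurrentHashMap<>();
    // A single background "flusher" drains the disk-write queue in FIFO order.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    // Updates memory immediately and queues the disk write. The returned future
    // completes once the item is "on disk"; callers can ignore it (fast default)
    // or wait on it (slower, persisted path).
    CompletableFuture<Void> set(String key, String value) {
        memory.put(key, value);
        return CompletableFuture.runAsync(() -> disk.put(key, value), flusher);
    }

    String get(String key) {
        return memory.get(key); // reads are served from memory
    }

    @Override
    public void close() {
        flusher.shutdown(); // already-queued writes still run before the thread exits
    }

    public static void main(String[] args) {
        try (QueuedPersistenceModel bucket = new QueuedPersistenceModel()) {
            bucket.set("a", "1");                // returns at once; disk write may still be queued
            System.out.println(bucket.get("a")); // prints 1
            bucket.set("b", "2").join();         // wait until persisted
            System.out.println(bucket.disk.containsKey("b")); // prints true
        }
    }
}
```

An Ephemeral bucket in this model would simply skip the flusher, which is where the CPU saving discussed below comes from.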

Ephemeral’s purpose is one of efficiency and simplicity - you don’t need to provision or manage disk; and because there’s no cost of writing to disk you can often perform the same amount of work on a smaller machine / cluster.

Note that if you really push a cluster, such that nodes are actually CPU-bound, you should see Ephemeral deliver lower latency / higher throughput. Until you reach that point you'll likely see them perform the same, except that Couchbase buckets would be expected to have higher CPU usage.

For example, in our own testing environment we run with 10 CPU cores, doing 50% reads and 50% writes against 200 million documents of 512 bytes each.

We see Ephemeral do 1,340,000 op/s and Couchbase do 998,000 op/s: 34% higher throughput for Ephemeral compared to the Couchbase bucket.
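The 34% figure is the relative throughput gain of Ephemeral over the Couchbase bucket, computed from the two measured rates:

```latex
\frac{1{,}340{,}000 - 998{,}000}{998{,}000}
  = \frac{342{,}000}{998{,}000}
  \approx 0.343 \approx 34\%
```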
