Hi, I am trying to improve performance on our server by using Sync Gateway _bulk_get calls instead of the default Couchbase Node.js library getMulti (which tends to fail immediately). However, I found that Sync Gateway _bulk_get performs really badly as the number of documents increases. I have 4 _bulk_get requests going at once, each one with 2000 to 5000 documents, and they are taking 10 to 17 seconds each. Is this the expected performance level of Sync Gateway _bulk_get? If not, is there any server-level config or hardware config I may be missing that could improve the performance?
The Sync Gateway server is running on the same host as the requester, and it is a dev server, so there is no other load on it. The version is Sync Gateway v1.1.1.
Here is some of my raw test data. START and END are epoch timestamps in milliseconds, and TOTAL is the elapsed time in ms:
NUM_DOCS–2179 START–1457112126419
NUM_DOCS–3894 START–1457112127003
NUM_DOCS–4925 START–1457112127406
NUM_DOCS–2745 START–1457112127748
NUM_DOCS–2179 END–1457112137220 TOTAL–10801
NUM_DOCS–2745 END–1457112137667 TOTAL–9919
NUM_DOCS–3894 END–1457112142450 TOTAL–15447
NUM_DOCS–4925 END–1457112144064 TOTAL–16658
EDIT: In comparison, here are the results for the same calls using the Couchbase Node.js getMulti:
couchBaseGetMulti NUM_DOCS–2179 START–1457114480373
couchBaseGetMulti NUM_DOCS–2745 START–1457114480557
couchBaseGetMulti NUM_DOCS–3894 START–1457114480634
couchBaseGetMulti NUM_DOCS–4925 START–1457114480677
couchBaseGetMulti NUM_DOCS–2179 END–1457114481033 TOTAL–660
couchBaseGetMulti NUM_DOCS–2745 END–1457114481286 TOTAL–729
couchBaseGetMulti NUM_DOCS–3894 END–1457114481679 TOTAL–1045
couchBaseGetMulti NUM_DOCS–4925 END–1457114482159 TOTAL–1482
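To put the two sets of timings side by side, the TOTAL values can be converted to documents per second. This is only a quick JavaScript sketch: the helper name `docsPerSecond` and the `runs` array are made up for illustration, and the figures are copied from the test data in this post.

```javascript
// Hypothetical helper: throughput in documents per second, rounded to a whole number.
function docsPerSecond(numDocs, totalMs) {
  return Math.round(numDocs / (totalMs / 1000));
}

// Timings copied from the test data in this post (TOTAL values, in ms).
const runs = [
  { numDocs: 2179, bulkGetMs: 10801, getMultiMs: 660 },
  { numDocs: 2745, bulkGetMs: 9919, getMultiMs: 729 },
  { numDocs: 3894, bulkGetMs: 15447, getMultiMs: 1045 },
  { numDocs: 4925, bulkGetMs: 16658, getMultiMs: 1482 },
];

for (const run of runs) {
  console.log(
    `${run.numDocs} docs: _bulk_get ${docsPerSecond(run.numDocs, run.bulkGetMs)} docs/s, ` +
      `getMulti ${docsPerSecond(run.numDocs, run.getMultiMs)} docs/s`
  );
}
```

Per request, with four requests in flight, that works out to roughly 200 to 300 docs/s for _bulk_get against more than 3000 docs/s for getMulti, so the gap is more than 10x in every run.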
Using Sync Gateway for bulk retrieval is usually going to be less efficient than retrieval directly against the bucket using the Couchbase SDK, as Sync Gateway will be doing extra work to identify the active revision for the document, apply read security, and strip out the Sync Gateway metadata.
Is this a one-time operation, or do you expect to be making multiple requests for the same 2-5K docs? If it’s the latter, you might see some performance improvement by increasing the size of the revision cache for your Sync Gateway (rev_cache_size in your SG config, more details here http://developer.couchbase.com/documentation/mobile/1.2/develop/guides/sync-gateway/configuring-sync-gateway/config-properties/index.html). This will result in additional memory requirements for Sync Gateway, but will avoid the round trip to Couchbase Server for each call.
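For illustration, a database entry with a larger revision cache might look something like the fragment below. Treat it as a sketch only: the database name `mydb`, the server URL, the bucket name, and the value 20000 are placeholders, and the exact placement of `rev_cache_size` should be checked against the config reference linked above for your Sync Gateway version.

```json
{
  "databases": {
    "mydb": {
      "server": "http://localhost:8091",
      "bucket": "my_bucket",
      "rev_cache_size": 20000
    }
  }
}
```

The value is a number of cached document revisions, so a larger setting trades Sync Gateway memory for fewer round trips to Couchbase Server.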
Thanks for the info, but it doesn’t quite apply to my situation, since many different users are going to be making these requests and they will be asking for different docs each time. I usually try to keep requests to a few hundred docs at most; this was a stress test to see how it would handle an extreme case with so many at once. With a few dozen to a couple hundred docs it is a very fast request, so overall I’m happy with the performance.
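For what it's worth, keeping each request to a few hundred docs can be done by splitting a large ID list into batches before calling _bulk_get. Below is a minimal JavaScript sketch of that idea; `chunkIds`, `buildBulkGetBody`, the batch size of 200, and the `doc_N` IDs are hypothetical names and values picked for illustration, and the body shape (a `docs` array of objects with an `id`) is my understanding of what _bulk_get accepts, so it is worth checking against the REST API docs for your version.

```javascript
// Hypothetical helper: split an array of document IDs into batches of at most batchSize.
function chunkIds(ids, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical helper: build the JSON body for one _bulk_get request.
function buildBulkGetBody(ids) {
  return JSON.stringify({ docs: ids.map((id) => ({ id })) });
}

// Example: 450 made-up IDs split into batches of 200.
const ids = Array.from({ length: 450 }, (_, i) => `doc_${i}`);
const batches = chunkIds(ids, 200);
const bodies = batches.map(buildBulkGetBody);

console.log(batches.map((batch) => batch.length)); // [ 200, 200, 50 ]
```

Each entry in `bodies` can then be POSTed to the database's _bulk_get endpoint, either one after another or with a small number of requests in flight at a time.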
Our server uses nodejs, so we’ve only been using the couchbase nodejs library to do direct retrievals. I wanted to try sync gateway bulk get since the getMulti call in the nodejs library is spotty and randomly fails if there is normal load on the server. I don’t know if it’s an issue with the library or with my server setup, but until I have time to investigate which one it is I figured I should try other options.