Hi, I am trying to figure out why my couchbase nodejs Bucket.get requests seem to randomly fail (getMulti is the same way) in our production instance. I enabled logging and tried running a single getMulti (with 1000 docIds to retrieve) and saw a ton of errors in the logs, each 0x17 error corresponds to it failing to retrieve a single document. If I do the same thing on a different server that does not usually fail I see no errors. I think I have the server configured incorrectly but I followed the couchbase install steps and didn’t see a ton of stuff about special server configuration being required. Based on the logs below can you tell if I have something setup wrong? My prod instance is 2 nodes connected in a cluster, but the script I am running is running locally on one of the nodes.
The following is what the log looks like on the server that is working:
2016-04-08T18:08:31.457Z INFO - ================== BEGINNING MULTI-GET PERFORMANCE TESTING ====================
0ms [I0] {53551} [INFO] (instance - L:372) Version=2.4.0, Changeset=0ac7fc4
0ms [I0] {53551} [INFO] (instance - L:373) Effective connection string: couchbase://127.0.0.1/todos. Bucket=todos
0ms [I0] {53551} [DEBUG] (instance - L:63) Adding host 127.0.0.1:8091 to initial HTTP bootstrap list
0ms [I0] {53551} [DEBUG] (instance - L:63) Adding host 127.0.0.1:11210 to initial CCCP bootstrap list
99ms [I0] {53551} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
99ms [I0] {53551} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
99ms [I0] {53551} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
99ms [I0] {53551} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
99ms [I0] {53551} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
99ms [I0] {53551} [TRACE] (confmon - L:292) Start refresh requested
99ms [I0] {53551} [TRACE] (confmon - L:271) Current provider is CCCP
99ms [I0] {53551} [INFO] (cccp - L:118) Requesting connection to node 127.0.0.1:11210 for CCCP configuration
99ms [I0] {53551} [DEBUG] (lcbio_mgr - L:416) <127.0.0.1:11210> (HE=0x1607eb0) Creating new connection because none are available in the pool
99ms [I0] {53551} [DEBUG] (lcbio_mgr - L:321) <127.0.0.1:11210> (HE=0x1607eb0) Starting connection on I=0x16084e0
99ms [I0] {53551} [INFO] (connection - L:441) <127.0.0.1:11210> (SOCK=0x1608610) Starting. Timeout=2000000us
99ms [I0] {53551} [TRACE] (connection - L:335) <127.0.0.1:11210> (SOCK=0x1608610) Received completion handler. Status=0. errno=0
99ms [I0] {53551} [INFO] (connection - L:116) <127.0.0.1:11210> (SOCK=0x1608610) Connected
99ms [I0] {53551} [DEBUG] (lcbio_mgr - L:271) <127.0.0.1:11210> (HE=0x1607eb0) Received result for I=0x16084e0,C=0x1608610; E=0x0
99ms [I0] {53551} [DEBUG] (lcbio_mgr - L:223) <127.0.0.1:11210> (HE=0x1607eb0) Assigning R=0x158edb0 SOCKET=0x1608610
99ms [I0] {53551} [DEBUG] (ioctx - L:101) <127.0.0.1:11210> (CTX=0x160bae0,unknown) Pairing with SOCK=0x1608610
99ms [I0] {53551} [DEBUG] (ioctx - L:151) <127.0.0.1:11210> (CTX=0x160bae0,sasl) Destroying. PND=0,ENT=1,SORC=1
100ms [I0] {53551} [DEBUG] (ioctx - L:101) <127.0.0.1:11210> (CTX=0x1608cf0,unknown) Pairing with SOCK=0x1608610
100ms [I0] {53551} [DEBUG] (ioctx - L:151) <127.0.0.1:11210> (CTX=0x1608cf0,bc_cccp) Destroying. PND=0,ENT=1,SORC=1
100ms [I0] {53551} [INFO] (lcbio_mgr - L:491) <127.0.0.1:11210> (HE=0x1607eb0) Placing socket back into the pool. I=0x16084e0,C=0x1608610
100ms [I0] {53551} [INFO] (confmon - L:174) Setting new configuration. Received via CCCP
100ms [I0] {53551} [DEBUG] (bootstrap - L:55) Instance configured!
2016-04-08T18:08:31.570Z - Couchbase connected to todos bucket
194ms [I0] {53551} [INFO] (lcbio_mgr - L:407) <127.0.0.1:11210> (HE=0x1607eb0) Found ready connection in pool. Reusing socket and not creating new connection
194ms [I0] {53551} [DEBUG] (lcbio_mgr - L:223) <127.0.0.1:11210> (HE=0x1607eb0) Assigning R=0x160bd50 SOCKET=0x1608610
194ms [I0] {53551} [DEBUG] (ioctx - L:101) <127.0.0.1:11210> (CTX=0x17cdec0,unknown) Pairing with SOCK=0x1608610
194ms [I0] {53551} [DEBUG] (server - L:507) <127.0.0.1:11210> (SRV=0x160ba40,IX=0) Setting initial timeout=2483ms
2016-04-08T18:08:31.832Z INFO - PERFORMANCE:: Time to retrieve 1000 docs: 368ms
2016-04-08T18:08:31.832Z INFO - Get Multi was successful!
2016-04-08T18:08:31.832Z INFO - ================== ENDING MULTI-GET PERFORMANCE TESTING ====================
373ms [I0] {53551} [DEBUG] (server - L:722) <127.0.0.1:11210> (SRV=0x160ba40,IX=0) Finalizing ctx 0x17cdec0
373ms [I0] {53551} [DEBUG] (ioctx - L:151) <127.0.0.1:11210> (CTX=0x17cdec0,memcached) Destroying. PND=0,ENT=0,SORC=1
On the production server the log looks like this:
2016-04-08T18:07:22.585Z - info: ================== BEGINNING MULTI-GET PERFORMANCE TESTING ====================
0ms [I0] {21732} [INFO] (instance - L:372) Version=2.4.0, Changeset=0ac7fc4
0ms [I0] {21732} [INFO] (instance - L:373) Effective connection string: couchbase://127.0.0.1/todos. Bucket=todos
0ms [I0] {21732} [DEBUG] (instance - L:63) Adding host 127.0.0.1:8091 to initial HTTP bootstrap list
0ms [I0] {21732} [DEBUG] (instance - L:63) Adding host 127.0.0.1:11210 to initial CCCP bootstrap list
19ms [I0] {21732} [DEBUG] (confmon - L:89) Preparing providers (this may be called multiple times)
19ms [I0] {21732} [DEBUG] (confmon - L:99) Provider FILE is DISABLED
19ms [I0] {21732} [DEBUG] (confmon - L:96) Provider CCCP is ENABLED
19ms [I0] {21732} [DEBUG] (confmon - L:96) Provider HTTP is ENABLED
19ms [I0] {21732} [DEBUG] (confmon - L:99) Provider MCRAW is DISABLED
19ms [I0] {21732} [TRACE] (confmon - L:292) Start refresh requested
19ms [I0] {21732} [TRACE] (confmon - L:271) Current provider is CCCP
19ms [I0] {21732} [INFO] (cccp - L:118) Requesting connection to node 127.0.0.1:11210 for CCCP configuration
19ms [I0] {21732} [DEBUG] (lcbio_mgr - L:416) <127.0.0.1:11210> (HE=0x2075f60) Creating new connection because none are available in the pool
19ms [I0] {21732} [DEBUG] (lcbio_mgr - L:321) <127.0.0.1:11210> (HE=0x2075f60) Starting connection on I=0x20763f0
19ms [I0] {21732} [INFO] (connection - L:441) <127.0.0.1:11210> (SOCK=0x2076520) Starting. Timeout=2000000us
19ms [I0] {21732} [TRACE] (connection - L:335) <127.0.0.1:11210> (SOCK=0x2076520) Received completion handler. Status=0. errno=0
19ms [I0] {21732} [INFO] (connection - L:116) <127.0.0.1:11210> (SOCK=0x2076520) Connected
19ms [I0] {21732} [DEBUG] (lcbio_mgr - L:271) <127.0.0.1:11210> (HE=0x2075f60) Received result for I=0x20763f0,C=0x2076520; E=0x0
19ms [I0] {21732} [DEBUG] (lcbio_mgr - L:223) <127.0.0.1:11210> (HE=0x2075f60) Assigning R=0x20ecc30 SOCKET=0x2076520
19ms [I0] {21732} [DEBUG] (ioctx - L:101) <127.0.0.1:11210> (CTX=0x20bd820,unknown) Pairing with SOCK=0x2076520
19ms [I0] {21732} [DEBUG] (ioctx - L:151) <127.0.0.1:11210> (CTX=0x20bd820,sasl) Destroying. PND=0,ENT=1,SORC=1
19ms [I0] {21732} [DEBUG] (ioctx - L:101) <127.0.0.1:11210> (CTX=0x20bdbe0,unknown) Pairing with SOCK=0x2076520
20ms [I0] {21732} [DEBUG] (ioctx - L:151) <127.0.0.1:11210> (CTX=0x20bdbe0,bc_cccp) Destroying. PND=0,ENT=1,SORC=1
20ms [I0] {21732} [INFO] (lcbio_mgr - L:491) <127.0.0.1:11210> (HE=0x2075f60) Placing socket back into the pool. I=0x20763f0,C=0x2076520
20ms [I0] {21732} [INFO] (confmon - L:174) Setting new configuration. Received via CCCP
20ms [I0] {21732} [DEBUG] (bootstrap - L:55) Instance configured!
2016-04-08T18:07:22.615Z - Couchbase connected to todos bucket
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:416) <10.0.0.10:11210> (HE=0x20c2350) Creating new connection because none are available in the pool
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:321) <10.0.0.10:11210> (HE=0x20c2350) Starting connection on I=0x20c28e0
719ms [I0] {21732} [INFO] (connection - L:441) <10.0.0.10:11210> (SOCK=0x20c2a30) Starting. Timeout=2500000us
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:416) <10.0.0.30:11210> (HE=0x20c3370) Creating new connection because none are available in the pool
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:321) <10.0.0.30:11210> (HE=0x20c3370) Starting connection on I=0x20c39d0
719ms [I0] {21732} [INFO] (connection - L:441) <10.0.0.30:11210> (SOCK=0x20c3b00) Starting. Timeout=2500000us
719ms [I0] {21732} [TRACE] (connection - L:335) <10.0.0.30:11210> (SOCK=0x20c3b00) Received completion handler. Status=0. errno=0
719ms [I0] {21732} [INFO] (connection - L:116) <10.0.0.30:11210> (SOCK=0x20c3b00) Connected
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:271) <10.0.0.30:11210> (HE=0x20c3370) Received result for I=0x20c39d0,C=0x20c3b00; E=0x0
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:223) <10.0.0.30:11210> (HE=0x20c3370) Assigning R=0x20c3320 SOCKET=0x20c3b00
719ms [I0] {21732} [TRACE] (server - L:488) <10.0.0.30:11210> (SRV=0x218bd40) Session not yet negotiated. Negotiating
719ms [I0] {21732} [DEBUG] (ioctx - L:101) <10.0.0.30:11210> (CTX=0x20c4680,unknown) Pairing with SOCK=0x20c3b00
719ms [I0] {21732} [DEBUG] (ioctx - L:151) <10.0.0.30:11210> (CTX=0x20c4680,sasl) Destroying. PND=0,ENT=1,SORC=1
719ms [I0] {21732} [DEBUG] (ioctx - L:101) <10.0.0.30:11210> (CTX=0x20ccc40,unknown) Pairing with SOCK=0x20c3b00
719ms [I0] {21732} [DEBUG] (server - L:507) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Setting initial timeout=2490ms
719ms [I0] {21732} [TRACE] (connection - L:335) <10.0.0.10:11210> (SOCK=0x20c2a30) Received completion handler. Status=0. errno=0
719ms [I0] {21732} [INFO] (connection - L:116) <10.0.0.10:11210> (SOCK=0x20c2a30) Connected
719ms [I0] {21732} [DEBUG] (lcbio_mgr - L:271) <10.0.0.10:11210> (HE=0x20c2350) Received result for I=0x20c28e0,C=0x20c2a30; E=0x0
720ms [I0] {21732} [DEBUG] (lcbio_mgr - L:223) <10.0.0.10:11210> (HE=0x20c2350) Assigning R=0x20be840 SOCKET=0x20c2a30
720ms [I0] {21732} [TRACE] (server - L:488) <10.0.0.10:11210> (SRV=0x20bd780) Session not yet negotiated. Negotiating
720ms [I0] {21732} [DEBUG] (ioctx - L:101) <10.0.0.10:11210> (CTX=0x20c4c20,unknown) Pairing with SOCK=0x20c2a30
727ms [I0] {21732} [DEBUG] (ioctx - L:151) <10.0.0.10:11210> (CTX=0x20c4c20,sasl) Destroying. PND=0,ENT=1,SORC=1
727ms [I0] {21732} [DEBUG] (ioctx - L:101) <10.0.0.10:11210> (CTX=0x20c7a10,unknown) Pairing with SOCK=0x20c2a30
727ms [I0] {21732} [DEBUG] (server - L:507) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Setting initial timeout=2483ms
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185570, opaque=198, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x21855c0, opaque=200, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185610, opaque=201, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185660, opaque=202, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x21856b0, opaque=203, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185700, opaque=204, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185750, opaque=206, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x21857a0, opaque=208, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x21857f0, opaque=210, opcode=0x0) with error 0x17
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185840, opaque=211, opcode=0x0) with error 0x17
... (about 200 more lines like this)
3216ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x2185840, opaque=211, opcode=0x0) with error 0x17
3224ms [I0] {21732} [INFO] (bootstrap - L:191) Not requesting a config refresh because of throttling parameters. Next refresh possible in 6796ms or 99 errors. See LCB_CNTL_CONFDELAY_THRESH and LCB_CNTL_CONFERRTHRESH to modify the throttling settings
3224ms [I0] {21732} [ERROR] (server - L:458) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Server timed out. Some commands have failed
3224ms [I0] {21732} [DEBUG] (server - L:462) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Scheduling next timeout for 0 ms
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a9680, opaque=124, opcode=0x0) with error 0x17
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a96d0, opaque=128, opcode=0x0) with error 0x17
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a9720, opaque=129, opcode=0x0) with error 0x17
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a9770, opaque=130, opcode=0x0) with error 0x17
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a97c0, opaque=135, opcode=0x0) with error 0x17
3224ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20a9810, opaque=139, opcode=0x0) with error 0x17
... (about 400 more lines like this)
3243ms [I0] {21732} [WARN] (server - L:375) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Failing command (pkt=0x20c1090, opaque=999, opcode=0x0) with error 0x17
3243ms [I0] {21732} [INFO] (bootstrap - L:191) Not requesting a config refresh because of throttling parameters. Next refresh possible in 6777ms or 98 errors. See LCB_CNTL_CONFDELAY_THRESH and LCB_CNTL_CONFERRTHRESH to modify the throttling settings
3243ms [I0] {21732} [ERROR] (server - L:458) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Server timed out. Some commands have failed
3243ms [I0] {21732} [DEBUG] (server - L:462) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Scheduling next timeout for 2500 ms
3243ms [I0] {21732} [WARN] (server - L:375) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Failing command (pkt=0x20b0f80, opaque=614, opcode=0x0) with error 0x17
... (about 200 lines more)
2016-04-08T18:07:25.839Z - info: PERFORMANCE:: Time to retrieve 1000 docs: 3250ms
2016-04-08T18:07:25.840Z - info: Get Multi was successful!
2016-04-08T18:07:25.840Z - info: ================== ENDING MULTI-GET PERFORMANCE TESTING ====================
3252ms [I0] {21732} [INFO] (bootstrap - L:191) Not requesting a config refresh because of throttling parameters. Next refresh possible in 6768ms or 97 errors. See LCB_CNTL_CONFDELAY_THRESH and LCB_CNTL_CONFERRTHRESH to modify the throttling settings
3252ms [I0] {21732} [ERROR] (server - L:458) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Server timed out. Some commands have failed
3252ms [I0] {21732} [DEBUG] (server - L:462) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Scheduling next timeout for 2500 ms
3252ms [I0] {21732} [WARN] (server - L:624) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Got socket error 0x10
3252ms [I0] {21732} [INFO] (server - L:756) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Got handler after close. Checking pending calls
3252ms [I0] {21732} [DEBUG] (server - L:722) <10.0.0.30:11210> (SRV=0x218bd40,IX=1) Finalizing ctx 0x20ccc40
3252ms [I0] {21732} [DEBUG] (ioctx - L:151) <10.0.0.30:11210> (CTX=0x20ccc40,memcached) Destroying. PND=0,ENT=1,SORC=1
3252ms [I0] {21732} [WARN] (server - L:624) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Got socket error 0x10
3252ms [I0] {21732} [INFO] (server - L:756) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Got handler after close. Checking pending calls
3252ms [I0] {21732} [DEBUG] (server - L:722) <10.0.0.10:11210> (SRV=0x20bd780,IX=0) Finalizing ctx 0x20c7a10
3252ms [I0] {21732} [DEBUG] (ioctx - L:151) <10.0.0.10:11210> (CTX=0x20c7a10,memcached) Destroying. PND=0,ENT=1,SORC=1