There seem to be plenty of bug reports around here for the infamous "cluster object was closed"
issue with the Node SDK… I just wanted to chime in on this thread with my own report.
Context
- Couchbase (6.6.0 and 6.6.1), installed via the kubernetes operator
- Node SDK 3.0.4 to 3.1.1 (tried all versions), used with intra-cluster DNS and the recommended connection string (the my-cluster-srv DNS service)
- ~4 pods with the same nodejs app connected to the cluster, and 2 sync gateway pods
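For anyone who wants to double-check the DNS side of this setup, here is a minimal sketch of how the SRV name for a connection string can be derived. It assumes the usual Couchbase convention (a `_couchbase._tcp.` or `_couchbases._tcp.` prefix, only used when the connection string has a single host and no port). The helper name `srvNameFor` is mine, not part of the SDK.

```typescript
// Hypothetical helper (not an SDK function): derive the DNS SRV name that
// would be queried for a Couchbase connection string.
// Returns null when SRV lookup would not apply (unknown scheme, several
// hosts, or an explicit port).
function srvNameFor(connStr: string): string | null {
  const m = /^(couchbases?):\/\/([^\/?]+)/.exec(connStr);
  if (m === null) {
    return null;
  }
  const scheme = m[1];
  const hosts = m[2];
  // SRV bootstrap is only attempted for a single host without a port.
  if (hosts.indexOf(",") !== -1 || hosts.indexOf(":") !== -1) {
    return null;
  }
  return "_" + scheme + "._tcp." + hosts;
}

// Connection string taken from the sdkdoctor run in this post.
const srvName = srvNameFor(
  "couchbase://oaf-couchbase-srv.default.svc.cluster.local/fs-bucket-v0"
);
```

The resulting name can then be resolved from inside a pod, for example with Node's `dns.promises.resolveSrv` or with `dig`, to see which hosts and ports the record returns.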
Behaviour
Things seem to work fine for a few minutes, until suddenly, on one given container (but NOT the others), the logs (activated with DEBUG=couchnode:lcb:error) are flooded with these errors:
2021-01-21T07:46:53.722Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)
2021-01-21T07:46:53.726Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)
2021-01-21T07:46:53.730Z couchnode:lcb:error (cccp @ ../deps/lcb/src/bucketconfig/bc_cccp.cc:187) <NOHOST:NOPORT> (CTX=(nil),) Could not get configuration: LCB_ERR_TIMEOUT (201)
Until it all comes to a stop with this error:
FATAL ERROR:
libcouchbase experienced an unrecoverable error and terminates the program
to avoid undefined behavior.
The program should have generated a "corefile" which may used
to gather more information about the problem.
If your system doesn't create "corefiles" I can tell you that the
assertion failed in ../deps/lcb/src/mcserver/negotiate.cc at line 50
This does not crash the container, but it seems to make it hang somehow: there are no more logs (including application logs), and the port that the app listens on becomes unresponsive, causing my livenessProbe to fail and Kubernetes to eventually kill and restart the container.
Other pods are fine at that moment, but they also fail in the same way at random times.
Sync Gateway is fine all along.
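In case it helps anyone triaging the same flood, here is a rough sketch for tallying the couchnode debug lines by error code. The line format is assumed from the three LCB_ERR_TIMEOUT lines pasted above. The names `LcbLogLine`, `parseLcbLine` and `countByError` are mine, not part of the SDK.

```typescript
// Illustrative shape for one parsed libcouchbase debug line.
interface LcbLogLine {
  timestamp: string;        // ISO timestamp at the start of the line
  level: string;            // e.g. "error"
  subsystem: string;        // e.g. "cccp"
  errorName: string | null; // e.g. "LCB_ERR_TIMEOUT", null if none found
}

// Parse a line such as:
// 2021-01-21T07:46:53.722Z couchnode:lcb:error (cccp @ ...) <...> (...) msg
// Returns null for lines that do not look like couchnode lcb output.
function parseLcbLine(line: string): LcbLogLine | null {
  const head = /^(\S+) couchnode:lcb:(\w+) \((\w+) @ /.exec(line);
  if (head === null) {
    return null;
  }
  const err = /(LCB_ERR_[A-Z_]+) \(\d+\)/.exec(line);
  return {
    timestamp: head[1],
    level: head[2],
    subsystem: head[3],
    errorName: err === null ? null : err[1],
  };
}

// Count how many lines carry each LCB_ERR_* name.
function countByError(lines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    const parsed = parseLcbLine(line);
    if (parsed === null || parsed.errorName === null) {
      continue;
    }
    const seen = counts[parsed.errorName];
    counts[parsed.errorName] = (seen === undefined ? 0 : seen) + 1;
  }
  return counts;
}
```

Feeding it the container's log lines gives a per-error count, which makes it easier to compare the failing pod against the healthy ones.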
Diags:
Couchbase UI is fine.
sdkdoctor never seems to complain:
|====================================================================|
| ___ ___ _ __ ___ ___ ___ _____ ___ ___ |
| / __| \| |/ /__| \ / _ \ / __|_ _/ _ \| _ \ |
| \__ \ |) | ' <___| |) | (_) | (__ | || (_) | / |
| |___/___/|_|\_\ |___/ \___/ \___| |_| \___/|_|_\ |
| |
|====================================================================|
Note: Diagnostics can only provide accurate results when your cluster
is in a stable state. Active rebalancing and other cluster configuration
changes can cause the output of the doctor to be inconsistent or in the
worst cases, completely incorrect.
08:54:32.016 INFO ▶ Parsing connection string `couchbase://oaf-couchbase-srv.default.svc.cluster.local/fs-bucket-v0`
08:54:32.016 INFO ▶ Connection string was parsed as a potential DNS SRV record
08:54:32.020 INFO ▶ Connection string identifies the following CCCP endpoints:
08:54:32.020 INFO ▶ 1. 10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶ 2. 10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶ 3. 10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local:11210
08:54:32.020 INFO ▶ Connection string identifies the following HTTP endpoints:
08:54:32.020 INFO ▶ Connection string specifies bucket `fs-bucket-v0`
08:54:32.027 WARN ▶ The hostname specified in your connection string resolves both for SRV records, as well as A records. This is not suggested as later DNS configuration changes could cause the wrong servers to be contacted
08:54:32.027 INFO ▶ Performing DNS lookup for host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.029 INFO ▶ Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.32.0.19`
08:54:32.030 INFO ▶ Performing DNS lookup for host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.031 INFO ▶ Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.36.0.7`
08:54:32.032 INFO ▶ Performing DNS lookup for host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local`
08:54:32.034 INFO ▶ Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` refers to a server with the address `10.35.0.39`
08:54:32.034 INFO ▶ Attempting to connect to cluster via CCCP
08:54:32.035 INFO ▶ Attempting to fetch config via cccp from `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.042 INFO ▶ Attempting to fetch config via cccp from `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.050 INFO ▶ Attempting to fetch config via cccp from `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local:11210`
08:54:32.054 WARN ▶ Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0005.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 WARN ▶ Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0003.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 WARN ▶ Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0004.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
08:54:32.054 INFO ▶ Selected the following network type: external
08:54:32.054 INFO ▶ Identified the following nodes:
08:54:32.054 INFO ▶ [0] 95.216.208.78
08:54:32.054 INFO ▶ mgmtSSL: 30971, eventingAdminPort: 30535, mgmt: 31351
08:54:32.054 INFO ▶ n1ql: 30386, fts: 30561, eventingSSL: 31810
08:54:32.054 INFO ▶ cbas: 30104, capi: 30103, kv: 31941
08:54:32.054 INFO ▶ kvSSL: 31297, capiSSL: 32655, n1qlSSL: 30074
08:54:32.054 INFO ▶ ftsSSL: 31779, cbasSSL: 32761
08:54:32.054 INFO ▶ [1] 95.217.218.135
08:54:32.054 INFO ▶ eventingSSL: 30673, n1qlSSL: 31678, kvSSL: 31871
08:54:32.054 INFO ▶ capiSSL: 30075, n1ql: 30863, cbas: 31413
08:54:32.054 INFO ▶ cbasSSL: 30705, mgmtSSL: 30953, ftsSSL: 30924
08:54:32.054 INFO ▶ kv: 32210, capi: 30896, fts: 30585
08:54:32.054 INFO ▶ eventingAdminPort: 31922, mgmt: 31705
08:54:32.054 INFO ▶ [2] 135.181.30.248
08:54:32.054 INFO ▶ n1ql: 32549, eventingAdminPort: 32752, eventingSSL: 31661
08:54:32.054 INFO ▶ capiSSL: 32329, kv: 31872, capi: 30976
08:54:32.054 INFO ▶ n1qlSSL: 32370, fts: 30763, cbasSSL: 31852
08:54:32.054 INFO ▶ mgmt: 32453, kvSSL: 30068, ftsSSL: 32228
08:54:32.054 INFO ▶ mgmtSSL: 32355, cbas: 30578
08:54:32.054 INFO ▶ Fetching config from `http://95.216.208.78:31351`
08:54:32.090 INFO ▶ Received cluster configuration, nodes list:
[
{
"addressFamily": "inet",
"alternateAddresses": {
"external": {
"hostname": "95.216.208.78",
"ports": {
"capi": 30103,
"capiSSL": 32655,
"kv": 31941,
"mgmt": 31351,
"mgmtSSL": 30971
}
}
},
"clusterCompatibility": 393222,
"clusterMembership": "active",
"configuredHostname": "oaf-couchbase-0003.oaf-couchbase.default.svc:8091",
"couchApiBase": "http://oaf-couchbase-0003.oaf-couchbase.default.svc:8092/",
"couchApiBaseHTTPS": "https://oaf-couchbase-0003.oaf-couchbase.default.svc:18092/",
"cpuCount": 8,
"externalListeners": [
{
"afamily": "inet",
"nodeEncryption": false
},
{
"afamily": "inet6",
"nodeEncryption": false
}
],
"hostname": "oaf-couchbase-0003.oaf-couchbase.default.svc:8091",
"interestingStats": {
"cmd_get": 0,
"couch_docs_actual_disk_size": 4752118309,
"couch_docs_data_size": 3594756877,
"couch_spatial_data_size": 0,
"couch_spatial_disk_size": 0,
"couch_views_actual_disk_size": 12589014,
"couch_views_data_size": 12589014,
"curr_items": 1091813,
"curr_items_tot": 2185369,
"ep_bg_fetched": 0,
"get_hits": 0,
"mem_used": 1808067560,
"ops": 0,
"vb_active_num_non_resident": 656502,
"vb_replica_curr_items": 1093556
},
"mcdMemoryAllocated": 25088,
"mcdMemoryReserved": 25088,
"memoryFree": 13474230272,
"memoryTotal": 32884228096,
"nodeEncryption": false,
"nodeUUID": "bccd30747f9e69e0269c24020361c680",
"os": "x86_64-unknown-linux-gnu",
"otpNode": "ns_1@oaf-couchbase-0003.oaf-couchbase.default.svc",
"ports": {
"direct": 11210,
"distTCP": 21100,
"distTLS": 21150,
"httpsCAPI": 18092,
"httpsMgmt": 18091
},
"recoveryType": "none",
"services": [
"cbas",
"eventing",
"fts",
"index",
"kv",
"n1ql"
],
"status": "healthy",
"systemStats": {
"allocstall": 0,
"cpu_cores_available": 8,
"cpu_stolen_rate": 0,
"cpu_utilization_rate": 31.35483870967742,
"mem_free": 13474230272,
"mem_limit": 32884228096,
"mem_total": 32884228096,
"swap_total": 0,
"swap_used": 0
},
"thisNode": true,
"uptime": "44946",
"version": "6.6.1-9213-enterprise"
},
{
"addressFamily": "inet",
"alternateAddresses": {
"external": {
"hostname": "95.217.218.135",
"ports": {
"capi": 30896,
"capiSSL": 30075,
"kv": 32210,
"mgmt": 31705,
"mgmtSSL": 30953
}
}
},
"clusterCompatibility": 393222,
"clusterMembership": "active",
"configuredHostname": "oaf-couchbase-0004.oaf-couchbase.default.svc:8091",
"couchApiBase": "http://oaf-couchbase-0004.oaf-couchbase.default.svc:8092/",
"couchApiBaseHTTPS": "https://oaf-couchbase-0004.oaf-couchbase.default.svc:18092/",
"cpuCount": 8,
"externalListeners": [
{
"afamily": "inet",
"nodeEncryption": false
},
{
"afamily": "inet6",
"nodeEncryption": false
}
],
"hostname": "oaf-couchbase-0004.oaf-couchbase.default.svc:8091",
"interestingStats": {
"cmd_get": 0,
"couch_docs_actual_disk_size": 4599740551,
"couch_docs_data_size": 3572644462,
"couch_spatial_data_size": 0,
"couch_spatial_disk_size": 0,
"couch_views_actual_disk_size": 11864147,
"couch_views_data_size": 11864147,
"curr_items": 1091273,
"curr_items_tot": 2181978,
"ep_bg_fetched": 0,
"get_hits": 0,
"mem_used": 1846524952,
"ops": 0,
"vb_active_num_non_resident": 640880,
"vb_replica_curr_items": 1090705
},
"mcdMemoryAllocated": 25088,
"mcdMemoryReserved": 25088,
"memoryFree": 9966014464,
"memoryTotal": 32884191232,
"nodeEncryption": false,
"nodeUUID": "00583abf725fca65006ff32e80185f0c",
"os": "x86_64-unknown-linux-gnu",
"otpNode": "ns_1@oaf-couchbase-0004.oaf-couchbase.default.svc",
"ports": {
"direct": 11210,
"distTCP": 21100,
"distTLS": 21150,
"httpsCAPI": 18092,
"httpsMgmt": 18091
},
"recoveryType": "none",
"services": [
"cbas",
"eventing",
"fts",
"index",
"kv",
"n1ql"
],
"status": "healthy",
"systemStats": {
"allocstall": 0,
"cpu_cores_available": 8,
"cpu_stolen_rate": 0,
"cpu_utilization_rate": 76.33289986996098,
"mem_free": 9966014464,
"mem_limit": 32884191232,
"mem_total": 32884191232,
"swap_total": 0,
"swap_used": 0
},
"uptime": "44130",
"version": "6.6.1-9213-enterprise"
},
{
"addressFamily": "inet",
"alternateAddresses": {
"external": {
"hostname": "135.181.30.248",
"ports": {
"capi": 30976,
"capiSSL": 32329,
"kv": 31872,
"mgmt": 32453,
"mgmtSSL": 32355
}
}
},
"clusterCompatibility": 393222,
"clusterMembership": "active",
"configuredHostname": "oaf-couchbase-0005.oaf-couchbase.default.svc:8091",
"couchApiBase": "http://oaf-couchbase-0005.oaf-couchbase.default.svc:8092/",
"couchApiBaseHTTPS": "https://oaf-couchbase-0005.oaf-couchbase.default.svc:18092/",
"cpuCount": 8,
"externalListeners": [
{
"afamily": "inet",
"nodeEncryption": false
},
{
"afamily": "inet6",
"nodeEncryption": false
}
],
"hostname": "oaf-couchbase-0005.oaf-couchbase.default.svc:8091",
"interestingStats": {
"cmd_get": 0,
"couch_docs_actual_disk_size": 4690227191,
"couch_docs_data_size": 3548986919,
"couch_spatial_data_size": 0,
"couch_spatial_disk_size": 0,
"couch_views_actual_disk_size": 12224473,
"couch_views_data_size": 12224473,
"curr_items": 1090141,
"curr_items_tot": 2179107,
"ep_bg_fetched": 0,
"get_hits": 0,
"mem_used": 1873142368,
"ops": 0,
"vb_active_num_non_resident": 739779,
"vb_replica_curr_items": 1088966
},
"mcdMemoryAllocated": 25088,
"mcdMemoryReserved": 25088,
"memoryFree": 20557443072,
"memoryTotal": 32884228096,
"nodeEncryption": false,
"nodeUUID": "ae669a001fa9bf0f31524b8c5aef9195",
"os": "x86_64-unknown-linux-gnu",
"otpNode": "ns_1@oaf-couchbase-0005.oaf-couchbase.default.svc",
"ports": {
"direct": 11210,
"distTCP": 21100,
"distTLS": 21150,
"httpsCAPI": 18092,
"httpsMgmt": 18091
},
"recoveryType": "none",
"services": [
"cbas",
"eventing",
"fts",
"index",
"kv",
"n1ql"
],
"status": "healthy",
"systemStats": {
"allocstall": 0,
"cpu_cores_available": 8,
"cpu_stolen_rate": 0,
"cpu_utilization_rate": 36.88946015424165,
"mem_free": 20557443072,
"mem_limit": 32884228096,
"mem_total": 32884228096,
"swap_total": 0,
"swap_used": 0
},
"uptime": "42851",
"version": "6.6.1-9213-enterprise"
}
]
08:54:32.093 INFO ▶ Successfully connected to Key Value service at `95.216.208.78:31941`
08:54:32.099 INFO ▶ Successfully connected to Management service at `95.216.208.78:31351`
08:54:32.103 INFO ▶ Successfully connected to Views service at `95.216.208.78:30103`
08:54:32.105 INFO ▶ Successfully connected to Query service at `95.216.208.78:30386`
08:54:32.106 INFO ▶ Successfully connected to Search service at `95.216.208.78:30561`
08:54:32.108 INFO ▶ Successfully connected to Analytics service at `95.216.208.78:30104`
08:54:32.109 INFO ▶ Successfully connected to Key Value service at `95.217.218.135:32210`
08:54:32.118 INFO ▶ Successfully connected to Management service at `95.217.218.135:31705`
08:54:32.119 INFO ▶ Successfully connected to Views service at `95.217.218.135:30896`
08:54:32.121 INFO ▶ Successfully connected to Query service at `95.217.218.135:30863`
08:54:32.121 INFO ▶ Successfully connected to Search service at `95.217.218.135:30585`
08:54:32.124 INFO ▶ Successfully connected to Analytics service at `95.217.218.135:31413`
08:54:32.131 INFO ▶ Successfully connected to Key Value service at `135.181.30.248:31872`
08:54:32.137 INFO ▶ Successfully connected to Management service at `135.181.30.248:32453`
08:54:32.142 INFO ▶ Successfully connected to Views service at `135.181.30.248:30976`
08:54:32.144 INFO ▶ Successfully connected to Query service at `135.181.30.248:32549`
08:54:32.149 INFO ▶ Successfully connected to Search service at `135.181.30.248:30763`
08:54:32.155 INFO ▶ Successfully connected to Analytics service at `135.181.30.248:30578`
08:54:32.163 INFO ▶ Memd Nop Pinged `95.216.208.78:31941` 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
08:54:32.169 INFO ▶ Memd Nop Pinged `95.217.218.135:32210` 10 times, 0 errors, 0ms min, 0ms max, 0ms mean
08:54:32.182 INFO ▶ Memd Nop Pinged `135.181.30.248:31872` 10 times, 0 errors, 0ms min, 1ms max, 0ms mean
08:54:32.182 INFO ▶ Diagnostics completed
Summary:
[WARN] The hostname specified in your connection string resolves both for SRV records, as well as A records. This is not suggested as later DNS configuration changes could cause the wrong servers to be contacted
[WARN] Bootstrap host `10-36-0-7.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0005.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
[WARN] Bootstrap host `10-32-0-19.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0003.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
[WARN] Bootstrap host `10-35-0-39.oaf-couchbase-srv.default.svc.cluster.local` is not using the canonical node hostname of `oaf-couchbase-0004.oaf-couchbase.default.svc`. This is not neccessarily an error, but has been known to result in strange and challenging to diagnose errors when DNS entries are reconfigured.
Found multiple issues, see listing above.
I haven’t tried downgrading to 2.x, as I’m not sure it works with TypeScript…