CB 7 setup on CentOS7 fails

We have been using older versions of CB successfully on CentOS 6 and 7 for quite some time.
Now, when I install CB server 7.0.x CE (have tried 7.0.0, 7.0.1 and 7.0.2) on a clean CentOS 7.9 VM, the setup after installation fails.

Starting point is a CentOS 7.9 VM:

  • yum update
  • firewalld stopped
  • THP disabled
  • 20 GB disk
  • ulimit settings applied
  • install couchbase-server-community-7.0.2-centos7.x86_64.rpm

All looks OK so far:

Complete!
[root@cb7-a ~]# systemctl status couchbase-server
● couchbase-server.service - Couchbase Server
   Loaded: loaded (/usr/lib/systemd/system/couchbase-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-11-23 14:41:07 CET; 43s ago
     Docs: https://docs.couchbase.com
 Main PID: 5276 (beam.smp)
   CGroup: /system.slice/couchbase-server.service
           ├─5276 /opt/couchbase/lib/erlang/erts-10.7.2.7/bin/beam.smp -A 16 -sbwt none -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel logger [{handler, default, undefined}] inetr...
           ├─5291 /opt/couchbase/lib/erlang/erts-10.7.2.7/bin/epmd -daemon
           ├─5346 erl_child_setup 200000
           ├─5370 /opt/couchbase/lib/erlang/erts-10.7.2.7/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -sbwt none -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable...
           ├─5392 erl_child_setup 200000
           ├─5415 sh -s disksup
           ├─5417 /opt/couchbase/lib/erlang/lib/os_mon-2.5.1.1/priv/bin/memsup
           ├─5418 /opt/couchbase/lib/erlang/lib/os_mon-2.5.1.1/priv/bin/cpu_sup
           ├─5422 /opt/couchbase/lib/erlang/erts-10.7.2.7/bin/beam.smp -P 327680 -K true -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -pa /opt/couchbase/lib/erlang/lib/asn1-5.0.12/ebin /opt/couchba...
           ├─5429 erl_child_setup 200000
           ├─5454 sh -s disksup
           ├─5455 /opt/couchbase/lib/erlang/lib/os_mon-2.5.1.1/priv/bin/memsup
           ├─5457 /opt/couchbase/lib/erlang/lib/os_mon-2.5.1.1/priv/bin/cpu_sup
           ├─5462 inet_gethost 4
           ├─5463 inet_gethost 4
           ├─5464 /opt/couchbase/bin/priv/godu
           ├─5491 inet_gethost 4
           ├─5492 inet_gethost 4
           ├─5496 /opt/couchbase/bin/saslauthd-port
           ├─5627 /opt/couchbase/bin/goport -graceful-shutdown=false -window-size=524288
           ├─5632 /opt/couchbase/bin/goxdcr -sourceKVAdminPort=8091 -xdcrRestPort=9998 -isEnterprise=false -ipv4=required -ipv6=optional
           ├─5705 sh -s ns_disksup
           ├─5706 /opt/couchbase/bin/priv/godu
           ├─5712 /opt/couchbase/bin/goport -graceful-shutdown=false -window-size=524288
           ├─5717 /opt/couchbase/bin/prometheus --config.file /opt/couchbase/var/lib/couchbase/config/prometheus.yml --web.enable-admin-api --web.enable-lifecycle --storage.tsdb.retention.size 1024MB --storage.tsdb.retention.tim...
           └─5728 portsigar for ns_1@cb.local 5276

But when trying to initialise the CB cluster via the CLI, this being the first node, it fails with varying responses:

[root@cb7-a ~]# /opt/couchbase/bin/couchbase-cli cluster-init -c 127.0.0.1 --cluster-username <un> --cluster-password <pwd> --cluster-ramsize=256
ERROR: Unable to connect to host at http://127.0.0.1:8091: HTTPConnectionPool(host='127.0.0.1', port=8091): Max retries exceeded with url: /pools (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd1e01fabe0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[root@cb7-a ~]# /opt/couchbase/bin/couchbase-cli cluster-init -c 127.0.0.1 --cluster-username <un> --cluster-password <pwd> --cluster-ramsize=256 
ERROR: Internal server error, please retry your request
[root@cb7-a ~]# /opt/couchbase/bin/couchbase-cli cluster-init -c 127.0.0.1 --cluster-username <un> --cluster-password <pwd> --cluster-ramsize=256 
ERROR: Unable to connect to host at http://127.0.0.1:8091: HTTPConnectionPool(host='127.0.0.1', port=8091): Max retries exceeded with url: /settings/stats (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f11c86e47f0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Also, when trying via the Web Console, it doesn’t get further than the ‘Couchbase > New Cluster > Configure’ screen. Hitting the Save button sometimes shows an error; other times it just stays on the same screen without any feedback about success or failure.

Trying to retrieve the server list results in “unknown pool”:

[root@cb7-a ~]# /opt/couchbase/bin/couchbase-cli server-list -c 127.0.0.1 --username <un> -p <pwd> 
ERROR: "unknown pool"

It’s hard to determine the relevant errors from the logs, but /opt/couchbase/var/lib/couchbase/logs/info.log mostly shows lines like:

[ns_server:warn,2021-11-23T15:25:46.406+01:00,ns_1@cb.local:memcached_refresh<0.271.0>:ns_memcached:connect:1196]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}.

while /opt/couchbase/var/lib/couchbase/logs/error.log mostly indicates issues with Prometheus:

[ns_server:error,2021-11-23T14:41:22.564+01:00,ns_1@cb.local:<0.867.0>:prometheus:post_async:188]Prometheus http request failed:
URL: http://127.0.0.1:9123/api/v1/query
Body: query=%7Bname%3D~%60kv_curr_items%7Ckv_curr_items_tot%7Ckv_mem_used_bytes%7Ccouch_docs_actual_disk_size%7Ccouch_views_actual_disk_size%7Ckv_ep_db_data_size_bytes%7Ckv_ep_bg_fetched%60%7D+or+kv_vb_curr_items%7Bstate%3D%27replica%27%7D+or+kv_vb_num_non_resident%7Bstate%3D%27active%27%7D+or+label_replace%28sum+by+%28bucket%2C+name%29+%28irate%28kv_ops%7Bop%3D%60get%60%7D%5B1m%5D%29%29%2C+%60name%60%2C%60cmd_get%60%2C+%60%60%2C+%60%60%29+or+label_replace%28irate%28kv_ops%7Bop%3D%60get%60%2Cresult%3D%60hit%60%7D%5B1m%5D%29%2C%60name%60%2C%60get_hits%60%2C%60%60%2C%60%60%29+or+label_replace%28sum+by+%28bucket%29+%28irate%28kv_cmd_lookup%5B1m%5D%29+or+irate%28kv_ops%7Bop%3D~%60set%7Cincr%7Cdecr%7Cdelete%7Cdel_meta%7Cget_meta%7Cset_meta%7Cset_ret_meta%7Cdel_ret_meta%60%7D%5B1m%5D%29%29%2C+%60name%60%2C+%60ops%60%2C+%60%60%2C+%60%60%29+or+sum+by+%28bucket%2C+name%29+%28%7Bname%3D~%60index_data_size%7Cindex_disk_size%7Ccouch_spatial_data_size%7Ccouch_spatial_disk_size%7Ccouch_views_data_size%60%7D%29&timeout=5s
Reason: {failed_connect,[{to_address,{"127.0.0.1",9123}},
                         {inet,[inet],econnrefused}]}

although Prometheus is listening on port 9123:

[root@cb7-a ~]# ss -ntalp
State       Recv-Q Send-Q    Local Address:Port    Peer Address:Port  
LISTEN      0      128           127.0.0.1:21300              *:*      users:(("beam.smp",pid=5422,fd=17))
LISTEN      0      128                   *:22                 *:*      users:(("sshd",pid=885,fd=3))
LISTEN      0      100           127.0.0.1:25                 *:*      users:(("master",pid=1104,fd=13))
LISTEN      0      128                   *:8091               *:*      users:(("beam.smp",pid=5370,fd=53))
LISTEN      0      128                   *:58491              *:*      users:(("rpc.statd",pid=893,fd=8))
LISTEN      0      128                   *:8092               *:*      users:(("beam.smp",pid=5422,fd=27))
LISTEN      0      128                   *:671                *:*      users:(("ypbind",pid=914,fd=7))
LISTEN      0      128           127.0.0.1:9123               *:*      users:(("prometheus",pid=27196,fd=8))
LISTEN      0      128                   *:21100              *:*      users:(("beam.smp",pid=5370,fd=33))
LISTEN      0      128           127.0.0.1:9998               *:*      users:(("goxdcr",pid=27184,fd=11))
LISTEN      0      128                   *:111                *:*      users:(("rpcbind",pid=627,fd=8))
LISTEN      0      128           127.0.0.1:21200              *:*      users:(("beam.smp",pid=5276,fd=17))
LISTEN      0      128                   *:4369               *:*      users:(("epmd",pid=5291,fd=3))
LISTEN      0      64                    *:32915              *:*     
LISTEN      0      128                [::]:22              [::]:*      users:(("sshd",pid=885,fd=4))
LISTEN      0      64                 [::]:38936           [::]:*     
LISTEN      0      100               [::1]:25              [::]:*      users:(("master",pid=1104,fd=14))
LISTEN      0      128                [::]:53255           [::]:*      users:(("rpc.statd",pid=893,fd=10))
LISTEN      0      128                [::]:111             [::]:*      users:(("rpcbind",pid=627,fd=11))
LISTEN      0      128                [::]:4369            [::]:*      users:(("epmd",pid=5291,fd=4))

(Firewalld is disabled and stopped)

As indicated above, the result is the same with 7.0.0, 7.0.1 and 7.0.2 (of course removing the previous CB rpm and the /opt/couchbase directory in between). With CB CE 6.6.0 the installation and setup do succeed on this VM.

After a number of hours the VM started to use a lot of CPU, and it turned out Prometheus had gone into high load.
Note that CB had still failed to set up, so no traffic was going through CB whatsoever…

top - 09:09:12 up 17:16,  1 user,  load average: 6.72, 6.84, 6.73
Tasks: 136 total,   2 running, 134 sleeping,   0 stopped,   0 zombie
%Cpu(s): 62.3 us,  9.8 sy,  0.0 ni, 27.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8008732 total,  6599820 free,   481352 used,   927560 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7259872 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                      
23414 couchba+  20   0  770964  80284  19136 S 137.5  1.0   0:07.25 prometheus 
 1362 couchba+  20   0 3410580 100292   4704 S 131.2  1.3   1358:43 beam.smp  
23410 couchba+  20   0   10200   5212   1108 S  18.8  0.1   0:00.74 goport         
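
One detail in the listings above may be worth checking: the Prometheus PID is 5717 in the `systemctl status` output, 27196 in the `ss` output, and 23414 in `top`, where it has only 0:07 of CPU time after 17 hours of uptime. That could mean the process is being restarted repeatedly. A minimal sketch for confirming this follows; `pids_of` and `pids_changed` are hypothetical helper names, and the sketch assumes `pgrep` (procps) is installed.

```shell
#!/bin/sh
# Hypothetical helpers for spotting a process that keeps restarting.

# Print the PIDs of processes whose name matches exactly, as one sorted line.
pids_of() {
    pgrep -x "$1" 2>/dev/null | sort -n | tr '\n' ' '
}

# Succeed (exit 0) if the set of PIDs for the given process name
# differs between two samples taken <interval> seconds apart.
pids_changed() {
    name="$1"
    interval="${2:-30}"   # seconds between the two samples (assumed default)
    before=$(pids_of "$name")
    sleep "$interval"
    after=$(pids_of "$name")
    [ "$before" != "$after" ]
}
```

For example, `pids_changed prometheus 60 && echo "prometheus PID changed, it may be restarting"` would report a change within one minute. A changing PID only hints at a restart loop; the cause still has to come from the logs.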

Just something to confirm: do you have SSE4.2 enabled on the VM?
(There are differing system requirements between 6 and 7: System Resource Requirements | Couchbase Docs)

Ah, indeed, SSE4.2 was not enabled in the VM; enabling it fixed it. Thanks!

With such an important (i.e. blocking) change in requirements, it would be good to include a check in the pre-install section of the rpm package. It does check/report on memory, THP and CPU cores (already since CB 3, I think), but not on this new requirement.
(I expect this to fail as well on existing customer platforms that are due to be upgraded, so we will need to add our own check.)
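
A minimal sketch of such a check, assuming a Linux x86 host where `/proc/cpuinfo` exposes the CPU flags and SSE4.2 appears there as `sse4_2`. The function name `has_sse42` is hypothetical, and its optional file argument exists only so the check can be exercised against sample data; this is not what the Couchbase package itself does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the CPU flags line in the given
# cpuinfo file lists sse4_2. The file defaults to /proc/cpuinfo.
has_sse42() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line and look for sse4_2 as a whole word,
    # so that e.g. sse4_1 does not match.
    grep -m1 '^flags' "$cpuinfo" | grep -qw 'sse4_2'
}
```

Before installing or upgrading, it could be used along the lines of `has_sse42 || { echo "SSE4.2 not available on this CPU/VM" >&2; exit 1; }`.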