Issues with cluster and index creation

So I set up a cluster to play with. It is still a test environment, but I have now tried to migrate all my production data (from another database platform) into CB.

I’m using CB Community Edition 5.1.1, running on CentOS 7.5. At first I thought this issue was related to the Java SDK I’m using (as that is where I get the error), but I found out I could reproduce the issue using the admin console…

The cluster consists of two nodes: db1.domain.dk and db2.domain.dk. If both servers are running then all is good. However, one of the reasons for having a cluster is that it can keep running even when one of the servers is “gone”. So I stopped the Couchbase server on db2, and then the problems started. My migration code checks for indexes and creates them if they don’t exist. However, creating the index fails when one of the cluster nodes is down.
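
A check-then-create step like the one described could be sketched roughly as follows. This is only an illustration under assumptions, not the actual migration code: `IndexPlanner`, `missingIndexStatements` and the name-to-field map are hypothetical, and the set of existing index names is assumed to come from a query such as SELECT name FROM system:indexes WHERE keyspace_id = 'data'. Running the resulting statements through the SDK is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides which CREATE INDEX statements still need to run.
final class IndexPlanner {

    private IndexPlanner() {}

    // Wraps an identifier in backticks; rejects names that would break the quoting.
    static String quote(String identifier) {
        if (identifier.contains("`")) {
            throw new IllegalArgumentException("identifier must not contain a backtick: " + identifier);
        }
        return "`" + identifier + "`";
    }

    // wanted:   index name -> indexed field (single-field indexes only in this sketch)
    // existing: index names already present on the bucket
    static List<String> missingIndexStatements(String bucket,
                                               Map<String, String> wanted,
                                               Set<String> existing) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> entry : wanted.entrySet()) {
            if (!existing.contains(entry.getKey())) {
                statements.add("CREATE INDEX " + quote(entry.getKey())
                        + " ON " + quote(bucket)
                        + "(" + quote(entry.getValue()) + ")");
            }
        }
        return statements;
    }
}
```

For a bucket named data, a wanted index def_type on the field type, and no existing indexes, this returns the single statement CREATE INDEX `def_type` ON `data`(`type`).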

And I can even prove this by just running the query:

CREATE INDEX `def_type` ON `data`(`type`)

in the admin console of the cluster server that is running. The query takes forever, or rather it runs until the workbench gives up (errors | elapsed: 600032.00ms | execution: 600032.00ms | count: | size: 0), and the response is:

{
  "status": "Gateway Timeout",
  "status_detail": "The query workbench only supports queries running for 600 seconds. This value can be changed in the preferences dialog. You can also use cbq from the command-line for longer running queries. Certain DML queries, such as index creation, will continue in the background despite the user interface timeout."
}
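
Since the message says index creation may continue in the background, one way to find out whether it did is to poll the index state until it reports online, for instance with SELECT state FROM system:indexes WHERE name = 'def_type'. Below is a minimal sketch of such a polling loop. `IndexWaiter` and `waitUntilOnline` are hypothetical names, the state lookup is passed in as a `Supplier` so the actual query call is not shown, and it assumes "online" is the state reported for a ready index.

```java
import java.util.function.Supplier;

// Hypothetical helper: polls an index-state lookup until it reports "online"
// or the deadline passes.
final class IndexWaiter {

    private IndexWaiter() {}

    // Returns true once the lookup yields "online", false on timeout or interrupt.
    static boolean waitUntilOnline(Supplier<String> stateLookup,
                                   long timeoutMillis,
                                   long pollIntervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if ("online".equals(stateLookup.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: index never reported online in time
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

The lookup is always tried at least once, so a timeout of 0 still performs a single check.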

This is similar to the stacktraces I get from the SDK. I put them in here for your information (but without the code) - just in case they may provide some other relevant info:

Caused by: javax.faces.FacesException: Can't instantiate class: 'dk.dtu.aqua.catchlog.bean.MigrateBean'.. null
	at com.sun.faces.config.ManagedBeanFactory.newInstance(ManagedBeanFactory.java:234)
	at com.sun.faces.application.ApplicationAssociate.createAndMaybeStoreManagedBeans(ApplicationAssociate.java:291)
	... 58 more
Caused by: java.security.PrivilegedActionException: java.lang.ClassNotFoundException: class dk.dtu.aqua.catchlog.bean.MigrateBean : java.lang.RuntimeException: java.util.concurrent.TimeoutException: {"b":"data","r":"192.168.42.211:8093","s":"n1ql","c":"0B0184A12A9528A0/FFFFFFFF89DFEC8B","t":75000000,"i":"d653fae1-2c64-4b4d-a10e-786c93b81bda","l":"192.168.42.226:3900"}
	at java.security.AccessController.doPrivileged(AccessController.java:698)
	at com.sun.faces.config.ManagedBeanFactory.newInstance(ManagedBeanFactory.java:216)
	... 59 more
Caused by: java.lang.ClassNotFoundException: class dk.dtu.aqua.catchlog.bean.MigrateBean : java.lang.RuntimeException: java.util.concurrent.TimeoutException: {"b":"data","r":"192.168.42.211:8093","s":"n1ql","c":"0B0184A12A9528A0/FFFFFFFF89DFEC8B","t":75000000,"i":"d653fae1-2c64-4b4d-a10e-786c93b81bda","l":"192.168.42.226:3900"}
	at java.beans.Beans.instantiate(Beans.java:244)
	at java.beans.Beans.instantiate(Beans.java:88)
	at com.sun.faces.config.ManagedBeanFactory$1.run(ManagedBeanFactory.java:222)
	at java.security.AccessController.doPrivileged(AccessController.java:694)
	... 60 more
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: {"b":"data","r":"192.168.42.211:8093","s":"n1ql","c":"0B0184A12A9528A0/FFFFFFFF89DFEC8B","t":75000000,"i":"d653fae1-2c64-4b4d-a10e-786c93b81bda","l":"192.168.42.226:3900"}
	at rx.exceptions.Exceptions.propagate(Exceptions.java:57)
	at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:463)
	at rx.observables.BlockingObservable.single(BlockingObservable.java:340)
	at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:634)
	at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:556)
	at dk.dtu.aqua.catchlog.base.BaseCouchbaseDAO.rawQuery(BaseCouchbaseDAO.java:536)
	at dk.dtu.aqua.catchlog.bean.MigrateBean.createIndex(MigrateBean.java:157)
	at dk.dtu.aqua.catchlog.bean.MigrateBean.initIndexes(MigrateBean.java:168)
	at dk.dtu.aqua.catchlog.bean.MigrateBean.<init>(MigrateBean.java:110)
	at java.lang.J9VMInternals.newInstanceImpl(Native Method)
	at java.lang.Class.newInstance(Class.java:1762)
	at java.beans.Beans.instantiate(Beans.java:240)
	... 63 more
Caused by: java.util.concurrent.TimeoutException: {"b":"data","r":"192.168.42.211:8093","s":"n1ql","c":"0B0184A12A9528A0/FFFFFFFF89DFEC8B","t":75000000,"i":"d653fae1-2c64-4b4d-a10e-786c93b81bda","l":"192.168.42.226:3900"}
	at com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:131)
	at com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:127)
	at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
	at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166)
	at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191)
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
	at java.util.concurrent.FutureTask.run(FutureTask.java:277)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:191)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:811)

So my questions are:

  1. Could there be anything in the configuration/setup that causes this?
  2. Should I not expect to be able to run a cluster with Community Edition?
  3. How else can I resolve this issue?

Ok, it seems I had to confirm the failover on the db1 node, and then it started working.

Then I started the db2 server again, and I now see the option to “Rebalance”. After clicking that and confirming it, the db2 node disappeared. I guess it is re-syncing the cluster, but the server itself is not busy at all…

Edit:
Well, it seems to be “working as designed” - just me not having read that bit :wink: https://docs.couchbase.com/server/5.5/install/deployment-considerations-lt-3nodes.html

Not sure why, but I ended up having to add the server to the cluster again…

Will let it have some time to see if it gets back up and running. Right now it has state:

New node | Not taking traffic | ADD pending rebalance

I guess that is Ok. We’ll see… :wink:

Edit:
Ok, got tired of waiting - and pressed the [Rebalance] button. It seems to kick off some work… :wink:
… so I guess that is also a manual task?