I set up a 3-node cluster without auto-failover.
When I shut down one server and tried to run a query on another server in the cluster, I saw that the query was not working until I clicked Fail Over for the server that was down.
As I see it, this is not how it is supposed to work. In production, when users are doing a lot of actions on the site and one server suddenly stops working, data access for the DB stops working.
If you want a speedy recovery in the event of a node failure, then you must rely on auto failover. The replica is there so you don’t lose data during a node failure and to quickly make the document available after a fail over. The replica can be accessed via the client SDKs, but only using a document ID lookup, not a N1QL query.
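As a rough illustration of the document ID lookup against a replica, here is a minimal Python sketch. It assumes a collection-like object with `get` and `get_any_replica` methods, which is how recent Couchbase SDKs name these calls as far as I know; older SDK versions expose replica reads differently, so check the documentation for the SDK you use. The `KeyValueCollection` interface and the helper name are hypothetical.

```python
from typing import Any, Protocol, Tuple


class KeyValueCollection(Protocol):
    """Hypothetical minimal interface, modelled on an SDK collection object."""

    def get(self, key: str) -> Any: ...

    def get_any_replica(self, key: str) -> Any: ...


def get_with_replica_fallback(collection: KeyValueCollection, key: str) -> Tuple[Any, bool]:
    """Return (result, from_replica): try the active copy, then any replica."""
    try:
        return collection.get(key), False
    except Exception:
        # Assumption: any failure here means the active copy is unreachable.
        # Real code should catch only the SDK's timeout/unavailable errors,
        # otherwise a "document not found" error also triggers the fallback.
        return collection.get_any_replica(key), True
```

Keep in mind that a replica can lag behind the active copy, so a read served this way may be slightly stale, and writes still need the active node or a failover.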
In your scenario the Query Service will ask the Index Service for the relevant documents which it then retrieves from the Data Service. If the Data Service node with the primary or active copy of the document is down, the query service must fail the query. There’s no mechanism, that I know of, that will cause the Query Service to fall back to replicas if it cannot get the active copy.
If you use a covering index and a covered query, you can avoid going to the Data Service because the index holds all the data needed for the query.
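To make the covering idea concrete, here is a small sketch. The bucket name `users`, the index name, and the fields are all made up for illustration. The N1QL statements are held as strings, and the helper only models the basic rule that every field the query references must be an index key. The real decision is made by the query planner, so running `EXPLAIN` on the query is the way to confirm it.

```python
# Hypothetical bucket, index, and field names, for illustration only.
CREATE_INDEX = "CREATE INDEX idx_user_email_name ON `users`(email, name);"

# Covered: both referenced fields (name, email) are index keys.
COVERED_QUERY = "SELECT name FROM `users` WHERE email = 'a@example.com';"

# Not covered: age is not in the index, so the Data Service must be fetched from.
NOT_COVERED_QUERY = "SELECT name, age FROM `users` WHERE email = 'a@example.com';"


def is_covered(indexed_fields, referenced_fields):
    """Simplified rule: a query is covered when every field it references
    is one of the index keys."""
    return set(referenced_fields) <= set(indexed_fields)


index_keys = ["email", "name"]
covered = is_covered(index_keys, ["name", "email"])
not_covered = is_covered(index_keys, ["name", "age", "email"])
```

Here `covered` is `True` and `not_covered` is `False`. This only helps if the Index and Query Service nodes themselves are still up.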
Hi @jkurtz, thank you for your response.
I think my question was not clear enough, so I will try to explain again.
All I want to do is set up 2 or 3 servers so that if one of them dies, the DB will continue to work as normal.
That's it.
The problem: when I set up the servers, shut down one of them, and then tried to run any query from one of the online servers (through the dashboard), I got an error, and it prevents the DB from continuing to work as normal.
The question:
How can I allow the cluster to continue working as normal even when one of the servers is down?
I can see why you would want the system to support N1QL queries while a node is down, but it does not work that way. See "N1QL replica read for resiliency?".
With a three node cluster you can configure the server for auto-failover so the downtime is reduced to as little as 30 seconds. That can be lower in the upcoming 5.0 release.
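For reference, auto-failover can be turned on from the web console or through the cluster REST API. The sketch below builds, but does not send, a POST to the `/settings/autoFailover` endpoint with a 30-second timeout. The host, user name, and password are placeholders, and the helper name is hypothetical. Verify the endpoint parameters against the documentation for your server version before relying on it.

```python
import base64
import urllib.parse
import urllib.request


def build_auto_failover_request(host, user, password, timeout_seconds=30):
    """Build (but do not send) a POST that enables auto-failover."""
    body = urllib.parse.urlencode(
        {"enabled": "true", "timeout": timeout_seconds}
    ).encode()
    request = urllib.request.Request(
        f"http://{host}:8091/settings/autoFailover", data=body, method="POST"
    )
    # HTTP basic authentication with the cluster administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request
```

To apply it against a live cluster, pass the returned object to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_auto_failover_request("127.0.0.1", "Administrator", "password"))`.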
So what do Facebook, Google, or other big organizations do if one of their DB servers is down?
Does all user traffic in the world stop working for 30 seconds?! That does not make sense.
The Couchbase Data Service is a CP system: it favors consistency over availability, whereas AP systems favor availability.
Hence, during the failover, data will be unavailable and the same is true for query when it has to access data owned by the node being failed over.
Some Couchbase customers create multiple clusters, replicate data from one to the other, and fail the application over to the other cluster if one of the nodes is down. This may be expensive, but it’s one way to handle this with CP systems like Couchbase.
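On the application side, the multi-cluster approach could look roughly like the sketch below. It is only an outline under assumptions: `clusters` is an ordered list of already-connected cluster handles (primary first), `operation` is whatever query or lookup the application runs, and the function name is hypothetical. Replication between clusters is assumed to be set up separately.

```python
def run_with_cluster_failover(clusters, operation):
    """Run operation(cluster) on each cluster in order and return the
    first successful result; raise if every cluster fails."""
    last_error = None
    for cluster in clusters:
        try:
            return operation(cluster)
        except Exception as error:
            # Assumption: any error means this cluster cannot serve the request.
            # Real code should only fail over on availability errors.
            last_error = error
    raise RuntimeError("all clusters failed") from last_error
```

Because cross-cluster replication is typically asynchronous, the secondary cluster may be slightly behind the primary, so the application gets availability at the cost of possibly reading older data.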