Couchbase Server 4.0 introduced Global Secondary Indexes (GSI) to support the N1QL query engine. Now, with Couchbase Server 5.0 (as of the April developer build), we have added the ability to automatically replicate and rebalance these indexes.
At first, GSIs had to be managed largely by hand:
- To ensure High Availability, the end-user manually configured multiple indexes with the same definition and specified which nodes of the cluster they should be deployed on. If one node failed, the end-user had to manually re-create the index that was lost.
- When adding and removing nodes either for scaling out, upgrading or after a failure, the end-user had to manually move and recreate the indexes on the nodes that were coming and going.
I’m extremely proud to report that both of these key operational tasks are now built into Couchbase Server and handled automatically. I will walk you through them here.
GSI Replication
One of the most frequent requests that we heard from customers after the introduction of GSIs in 4.0 was for automatic management of replication: “I just want to tell you how many replicas and have you figure it out.”
Now you can:
CREATE INDEX `idx_airportname` ON `travel-sample`(`airportname`) WITH {"num_replica":1};
With one simple command, Couchbase will:
- Create two identical indexes
- Deploy and build those indexes on separate nodes
- Honor the Rack-Zone Awareness configuration
If not enough nodes are available, you are quickly presented with an error:
[
  {
    "code": 5000,
    "msg": "GSI CreateIndex() - cause: Fails to create index. There are not enough indexer nodes to create index with replica count of 2",
    "query_from_user": "CREATE INDEX `idx_airportname` ON `travel-sample`(`airportname`) WITH {\"num_replica\":2};"
  }
]
Pretty cool, eh?
When using “num_replica”, you are relying on Couchbase to determine the proper index layout within the cluster. For even more fine-grained control, specify the exact nodes you wish a single index to be replicated across:
CREATE INDEX `idx_airportname` ON `travel-sample`(`airportname`) WITH {"nodes":["ec2-35-167-251-49.us-west-2.compute.amazonaws.com:8091", "ec2-54-69-121-55.us-west-2.compute.amazonaws.com:8091"]};
Voila!
These indexes will now be automatically load balanced by the query engine and any failure (so long as at least one matching index remains) will not cause any disruption to the queries.
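To check that the index from the examples above was created and has come online, one option is to query the `system:indexes` catalog. This is only a minimal sketch: the post does not describe how replicas are reported there, so treat it as a check on the index definition rather than on each individual copy.

```sql
/* List the example index and its state ("online" once it is built).
   `using` is a reserved word in N1QL, so it is escaped with backticks. */
SELECT name, keyspace_id, state, `using`
FROM system:indexes
WHERE keyspace_id = "travel-sample"
  AND name = "idx_airportname";
```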
GSI Rebalancing
It’s not enough just to have automatic replica creation. We also needed to add automatic management in the form of rebalancing.
Again, as of the April developer build of Couchbase Server 5.0, the index service is now “rebalance aware”:
- When an index node is removed from the cluster, its indexes will be redistributed to the remaining nodes automatically. However, adding a new index node to the cluster will not cause the indexes to be re-distributed. This is by design to prevent excess movement of indexes and also to preserve the placement that an administrator may have specified. This redistribution is something that might be added in a later release if needed.
- If an index node fails and is replaced, the appropriate replicas will be re-created. Similarly, those index replicas are re-created on the remaining nodes of a cluster if a rebalance is performed without replacing the failed node.
I encourage you to give this all a try and find that it “just works”! Click here to download Couchbase Server.
Q1: WITH {"num_replica":1}; does this mean that this index will have one replica?
Q2: And should the CB cluster have at least 2 Index Service nodes?
Q3: What if the CB cluster has more than 2 Index Service nodes? How does the Index Service choose the node (if one node runs more than one service (index service/data service/query service))?
Q4: Will these 2 indexes work in Master-Slave or Master-Master mode?
Q5: If the indexes work in Master-Master mode, what algorithm does the query service use to choose an index? A random algorithm or something more complicated?
Q6: If I want to add more index replicas for scaling out, should I re-create the index with another num_replica value?
hey @atom992, great questions! I’ll answer them here and see what makes sense to add back into the main blog:
1 – "num_replica" indicates how many _additional_ copies of the index you want created (similar to how the data service works). So WITH {"num_replica":1}; will create two copies of the index you are requesting.
2 – Yes, the cluster needs to have at least that number of index service nodes. If you don’t have enough nodes, the command will return an error immediately.
3 – With more nodes, Couchbase will automatically choose which nodes to place the index and its replica(s) on. At the moment it only takes into account that a replica can’t be on the same node as the other index, and also takes into account the Rack-Zone Awareness configuration of the nodes to ensure an index and its replica(s) are spread across different groups as well. If you want to be more specific about where the indexes are placed, the administrator can use WITH {"nodes": [node1,node2]}
4 – All indexes and their replicas are master-master.
5 – The algorithm for load balancing between replicas is constantly improving. In the first release it was only round-robin. Lately we have added some latency heuristics so that if one index is slower than another it will get used less. In the future this will continue to improve, so I’m hesitant to describe _exactly_ what it is today. This should ideally be transparent to you.
6 – Yes, today you’ll need to re-create the index with a higher num_replica and then drop the previous one (a DROP deletes an index and all its replicas, so there is no need to manage that manually). We are working on an ALTER INDEX command, but that may not quite be ready with 5.0, so I didn’t write about it here.
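As a rough sketch of that re-create-then-drop sequence, reusing the idx_airportname example from the post: the name idx_airportname_v2 is made up for illustration (the new index needs a name different from the old one), and the cluster is assumed to have at least three index nodes so that two replicas can be placed.

```sql
/* Create the replacement under a new (hypothetical) name, this time with two replicas. */
CREATE INDEX `idx_airportname_v2` ON `travel-sample`(`airportname`) WITH {"num_replica":2};

/* Once the new index is online, drop the old one. Its replica is removed with it. */
DROP INDEX `travel-sample`.`idx_airportname`;
```

Because both indexes cover the same key, queries should keep being served while the switch happens.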
Hope that helps answer your questions. Please let me know if there’s anything else you would like more information about.
Is there a release date for Couchbase 5.0?
Hey Greg! We’re working on the Beta right now, and we are usually able to get to GA a few months after that. So we should definitely be there by the end of the year, and likely quite a bit before that.
Will we be able to adjust the number of replicas after index creation? A use case would be extending a big MOI with an additional field. Because index RAM could be too small for the new extended index and the old index at the same time, we used this workflow in such situations:
– Delete the “manual” index replica on the first index node. The other index node with the second copy is still able to answer the index requests.
– Build the new index on the first index node. After completion, this node is able to answer the requests to this index.
– Delete and rebuild the index on the second node.
Without being able to adjust the number of replicas, we would have to accept downtime in these situations or provide a lot of spare RAM that we don’t actually need.
Hi Klaus, the first iteration of this enhancement does not allow for modifying an index after creation. However, we are working on an ALTER INDEX statement for a future release that will allow you to do this.
You can still leverage much tighter manual control of your indexes, even if just temporarily when performing this sort of maintenance/expansion.
Let’s say you have one index with a replica and you want to migrate/change it. You “could” create a new temporary index without a replica, delete the old one, then create yet another new one with the replica and drop the temp one. I admit it’s not particularly elegant, but it does keep your application up and running while also not requiring 2x the amount of RAM.
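Spelled out in N1QL, that sequence might look roughly like the following. This is only a sketch: idx_airportname_tmp is a hypothetical name, and the key expression is kept the same in every step for simplicity, so substitute your changed definition where it applies.

```sql
/* 1. Temporary copy without a replica, so queries always have an index to use. */
CREATE INDEX `idx_airportname_tmp` ON `travel-sample`(`airportname`);

/* 2. Drop the old index. Its replica goes with it, which frees up index RAM. */
DROP INDEX `travel-sample`.`idx_airportname`;

/* 3. Create the final index, this time with a replica. */
CREATE INDEX `idx_airportname` ON `travel-sample`(`airportname`) WITH {"num_replica":1};

/* 4. Once the final index is online, remove the temporary one. */
DROP INDEX `travel-sample`.`idx_airportname_tmp`;
```

At no point are more than three copies of the index in memory, compared with four if the new replicated index were built alongside the old one.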
[…] With Couchbase 5.0 Beta, you can not only create indexes to speed up and scale queries to a new level, but also enjoy better index availability, and manageability. Just specify the number of index replicas to create, and the system will dynamically manage the placements of the index replicas across different nodes, server groups, and availability zones. Couchbase Server 5.0 also brings support for rebalancing indexes without any system downtime. For more information, see here. […]