We’re super excited to announce that Couchbase Server 4.5 developer preview (DP) has been released! This is an early developer milestone release that comes with many awesome improvements for more efficient querying, advanced data access, powerful indexing, and more comprehensive administration.
Some of the notable features of this DP release include an integrated query workbench, integrated full-text search, the sub-document API, array indexing, extended N1QL join syntax, and more. This milestone on the way to the GA release also includes more platform coverage than before: we've introduced Debian 8 builds, which add to the already wide list of platforms we run on (Windows, Mac OS X, Ubuntu, SuSE, Red Hat, Debian and even Oracle Linux).
As always, we will ship the Enterprise Edition (EE) first, and the Community Edition (CE) of 4.5 will be available within a few months. EE is free for everyone to use for development and testing.
You likely already know this, but in case it isn't clear to others: EE is available through a subscription for production deployments and when you need support in test and development environments. CE is available for any type of deployment.
thanks
-cihan
Have you gathered any information you can share at this point concerning the performance impact when doing a sub-document update, as compared to updating a whole document? Would it be essentially like doing N1QL as opposed to using the API?
As is often the case with performance questions, the answer is “it depends”.
As an extreme example, if you have a 20MByte document, then changing a single byte-sized field in that document would previously have required you to fetch the whole thing (20MB), change the 1 byte locally, then send back 20MB - a total of 40MB transferred, or 320Mbit. On a 1Gbit link that's going to take 320/1000 s, or ~320 milliseconds minimum, excluding any network latency (you need two round trips) and excluding time on either end to process the data.
With subdoc, you only need to send the request to change that 1 byte - something like REPLACE("key", "path.to.my.field", "new_value") - which is less than 100 bytes (800 bits) on the wire - so of the order of 800/1000000000 s, or 0.8 microseconds - i.e. around 400,000 times faster (!)
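The back-of-the-envelope arithmetic can be checked with a few lines of Python. This is a minimal sketch under the same simplifications as the estimate (serialization time only, no latency or processing cost); the 1 Gbit/s link and the 100-byte request size are the assumed figures from the example. Note that both transfers are converted from bytes to bits, a factor of 8.

```python
# Back-of-the-envelope wire time on an assumed 1 Gbit/s link, ignoring
# network latency and processing cost (same simplifications as the text).
LINK_BITS_PER_SECOND = 1_000_000_000
BITS_PER_BYTE = 8


def wire_time_seconds(num_bytes: int) -> float:
    """Time to push num_bytes through the link (serialization only)."""
    return num_bytes * BITS_PER_BYTE / LINK_BITS_PER_SECOND


DOC_BYTES = 20_000_000       # 20 MB document
SUBDOC_REQUEST_BYTES = 100   # assumed upper bound for a subdoc REPLACE request

# Full-document update: fetch the whole document, then send it back.
full_doc_seconds = wire_time_seconds(2 * DOC_BYTES)
# Sub-document update: send only the small path/value request.
subdoc_seconds = wire_time_seconds(SUBDOC_REQUEST_BYTES)

speedup = full_doc_seconds / subdoc_seconds

if __name__ == "__main__":
    print(f"full document: {full_doc_seconds * 1e3:.0f} ms")
    print(f"sub-document:  {subdoc_seconds * 1e6:.1f} us")
    print(f"ratio:         {speedup:,.0f}x")
```

Running it prints 320 ms for the full-document round trip, 0.8 us for the sub-document request, and a ratio of 400,000x.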
Now obviously that's not a real-world example - I've ignored any cost of parsing / processing the request, so it won't really be that much faster - and most people don't have 20MB documents anyway. However, the larger point remains: there's a cost to sending data over a network, so if you can reduce how much you send (and have to receive) you can realise both bandwidth reduction and latency improvements.
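To make the path idea behind REPLACE("key", "path.to.my.field", "new_value") concrete, here is a client-side sketch in plain Python. The helper `subdoc_replace` is hypothetical and is not the Couchbase SDK or server implementation; it only mimics the idea that a replace targets an existing field by a dotted path (array indices are not handled). It also compares the size of the whole serialized document with the size of a path-plus-value request, using an assumed request layout.

```python
import json
from typing import Any


def subdoc_replace(doc: dict, path: str, value: Any) -> None:
    """Replace the value at a dotted path, in place.

    Hypothetical helper for illustration only: like a sub-document REPLACE,
    the target field must already exist. Array indices are not supported.
    """
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node[key]  # KeyError if an intermediate field is missing
    if leaf not in node:
        raise KeyError(f"path does not exist: {path}")
    node[leaf] = value


doc = {
    "path": {"to": {"my": {"field": "old_value"}}},
    "payload": "x" * 10_000,  # stand-in for the bulk of a large document
}

# What a full-document update would have to send back over the wire.
full_update = json.dumps(doc)
# What a sub-document request needs to carry (assumed layout): op, path, value.
subdoc_request = json.dumps(
    {"op": "replace", "path": "path.to.my.field", "value": "new_value"}
)

subdoc_replace(doc, "path.to.my.field", "new_value")
```

Here `len(full_update)` is over 10,000 characters while `len(subdoc_request)` stays under 100, and the size of the request does not grow with the size of the document.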
There's some general information about the suitability of subdoc in the Sub-document API documentation; the key takeaway is probably the following quote:
In general, sub-document API is a good fit for applications where network bandwidth is at a premium, and at least one of the following is true:
* The document being operated on is not very small.
* The fragment being requested/modified is a small fraction of the total document size.
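The quoted rule of thumb can be written down as a small decision function. This is a sketch only: the documentation says "not very small" and "a small fraction" without giving numbers, so `MIN_DOC_BYTES` and `MAX_FRAGMENT_FRACTION` are hypothetical thresholds chosen purely for illustration.

```python
# Hypothetical thresholds: the guidance only says "not very small" and
# "a small fraction"; these numbers are assumptions for illustration.
MIN_DOC_BYTES = 1024
MAX_FRAGMENT_FRACTION = 0.10


def subdoc_is_good_fit(doc_bytes: int, fragment_bytes: int,
                       bandwidth_constrained: bool) -> bool:
    """Rough encoding of the quoted guidance.

    Bandwidth must be at a premium, and at least one of the two
    conditions (document not very small, fragment a small fraction)
    must hold.
    """
    if not bandwidth_constrained:
        return False
    not_very_small = doc_bytes >= MIN_DOC_BYTES
    small_fraction = fragment_bytes <= MAX_FRAGMENT_FRACTION * doc_bytes
    return not_very_small or small_fraction
```

For example, updating a 10-byte field in a 20MB document on a constrained network qualifies, while rewriting 150 bytes of a 200-byte document does not.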
Hi Dennis,
I’m the product manager for both the Couchbase ES connector and for FTS.
The most important difference is that FTS is currently a developer preview (DP) feature, and it will only go GA once it meets our internal performance KPIs. We don't expect FTS to hit those KPIs in time for Couchbase Server 4.5, so when Couchbase Server 4.5 goes GA, FTS will ship with it but will remain a DP feature.
To your question:
Both systems can handle the use cases you mentioned and handle them in similar ways. When it comes to scoring, FTS uses TF/IDF scoring and supports query-time boosting. FTS scoring will be very familiar to most people who have used Solr/ES/Lucene. ES has more options for customizing its scoring, although that's unlikely to be super significant if you're mostly using prefix and fuzzy search.
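For readers who haven't met TF/IDF, here is a textbook sketch of the idea with a query-time boost multiplier. It is an assumption-laden illustration, not the exact formula FTS, Lucene or ES use (those add normalization and other factors); the function name, corpus and boost values are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, corpus, boosts=None):
    """Classic tf * idf score of one document for a bag-of-words query.

    Textbook formulation only, not the exact FTS/Lucene formula.
    `boosts` maps a query term to a query-time multiplier (default 1.0).
    """
    boosts = boosts or {}
    tf = Counter(doc_tokens)  # term frequency within this document
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue
        # Number of documents containing the term (guarded against zero).
        doc_freq = max(1, sum(1 for d in corpus if term in d))
        idf = math.log(n_docs / doc_freq)  # rarer terms weigh more
        score += boosts.get(term, 1.0) * tf[term] * idf
    return score


corpus = [
    ["couchbase", "server", "search"],
    ["couchbase", "sdk"],
    ["full", "text", "search", "search"],
]

# Score every document for the query "search".
plain = [tf_idf_score(["search"], d, corpus) for d in corpus]
# Boost the term "text" three times at query time.
boosted = tf_idf_score(["text"], corpus[2], corpus, boosts={"text": 3.0})
```

The third document outranks the first for "search" because the term occurs twice, and the boost simply scales that term's contribution at query time without touching the index.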
ES is a powerful and mature search product, so there are things ES can do that FTS doesn't, and more things you can customize with ES. A full comparison would be quite long because, even though FTS is a new component for us, it's been in the works for some time and has a lot of functionality in it too. A few obvious things ES has that Couchbase does not, off the top of my head: support for the ELK stack (Elasticsearch / Logstash / Kibana), percolation, a query DSL, and manual control of sharding.
With FTS we've tried to stick to straightforward and common search use cases and make those easy. Easy in this context means not requiring a customer to set up and manage another system, manage transport between systems, secure those systems, and distribute and replicate indexes between systems (note that distributed indexes were not shipped in the Couchbase 4.5 DP that you can get today). Ease of management is the main advantage that FTS might have over other systems when you just look at the search features.
Is a sub-document mutation replicated to other nodes as a sub-document mutation or does it result in a copy of the full document, like how append() currently works?
@czajkowski Hello, how safe is it to use 4.5 right now? Is there a chance of corrupting data? If for whatever reason we want to go back to the stable 4.1 version, is that easy?
Hi @moon0326
4.5 is a preview as of now and we do not recommend placing valuable data in it. The preview has other limitations as well.
We are stabilizing 4.5 quickly and will have a refresh in the next few weeks but we won’t make it generally available until late Q2.
thanks
-cihan
I have always wondered: is the version on GitHub on par with your internal development, or is it only on par with the Community Edition on the download page?
Hi there @moon0326,
The new part of this in 4.5 is the at_plus consistency flag you can use in the API with N1QL.
Here is the capability in short: with request_plus you get consistency at least up to the point of your request's “timestamp”. If I make a request at 10.00 AM (t3 below), the index has to catch up to that moment. With the new at_plus consistency (the sample isn't clearly showing this, so we'll correct it), you instead wait only up to the timestamp of your last update. So if your application last updated a key at 9.58 AM (t1 below), we can run your query as soon as the indexer has indexed all updates up to 9.58. That gap (2 minutes in this example, but even 20ms) can make a huge difference in performance if you have a system with ongoing mutations. In the picture below, t represents time, and time flows from t1 to t3 and so on.
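A toy model shows why at_plus can start sooner than request_plus. It is a sketch under a loudly stated simplification: it collapses Couchbase's per-vBucket mutation tracking into a single global sequence number, and `IndexState` and `mutations_to_wait` are hypothetical names, not SDK or N1QL API calls. The sequence numbers stand in for t1 (this client's last write) and t3 (the moment the query is issued).

```python
from dataclasses import dataclass


@dataclass
class IndexState:
    """Hypothetical model: one global sequence number instead of
    per-vBucket mutation tracking."""
    indexed_upto: int  # last mutation the index has processed


def mutations_to_wait(index: IndexState, target_seqno: int) -> int:
    """How many more mutations the indexer must process before the query can run."""
    return max(0, target_seqno - index.indexed_upto)


# t1: this client's last write; t3: latest write in the bucket when the query is issued.
my_last_write_seqno = 100
bucket_seqno_at_request = 140
index = IndexState(indexed_upto=120)

# request_plus: wait for everything written up to the moment of the request.
wait_request_plus = mutations_to_wait(index, bucket_seqno_at_request)
# at_plus: wait only for the writes this client itself has made.
wait_at_plus = mutations_to_wait(index, my_last_write_seqno)
```

In this scenario request_plus still has to wait for 20 more mutations from other writers, while at_plus can run immediately because the index already covers the client's own last write.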