I have a use case where I store ~100 million events per day, gathered from client applications. Each event, together with its metadata, takes up to ~1.2 KB, which comes to a minimum of roughly 115 GB of disk space per day. The events are immutable, and the planned retention is 6 months to a year; at that rate the raw data alone is roughly 20 to 40 TB, which is more disk than I can afford. There are also many indexes and views defined for different needs.
I also stored the same events in ElasticSearch with its highest compression setting, and its disk usage is ~15 GB for the same daily volume (1.3 billion events take ~170 GB).
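For reference, "highest compression setting" here presumably means ElasticSearch's `best_compression` index codec. A minimal sketch of creating an index with it follows; the host and index name are placeholders, so adjust them for your cluster:

```python
import requests

# Placeholder host and index name; add auth as your cluster requires.
ES_URL = "http://localhost:9200"
INDEX = "events"

# "best_compression" switches the stored-fields codec from LZ4 to DEFLATE,
# trading some read/write speed for a smaller on-disk footprint.
settings = {
    "settings": {
        "index": {
            "codec": "best_compression"
        }
    }
}

resp = requests.put(f"{ES_URL}/{INDEX}", json=settings)
resp.raise_for_status()
print(resp.json())
```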
I know that Couchbase uses the Snappy compression library for storing documents. Is there a setting to increase its compression ratio? Do you have any suggestions besides trimming down the event documents (shortening keys and values)?
Sorry, I'm just going to chime in here and ask: are you saying ElasticSearch's on-disk compression is better than Couchbase's? I'm trying to decide whether I should have our developers use the CB FTS features and ditch ElasticSearch or not.
Both 5.5 and 6.0 have significant improvements in the disk space used by FTS indexes. In particular, look into "scorch", which is available for FTS index definitions in 5.5. It goes beyond simple Snappy compression to new data structures optimized for a number of cases (see vellum, the Go library).
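For anyone who wants to try it, here is a rough sketch of creating an FTS index backed by scorch through the FTS REST API. The host, credentials, bucket, and index names are placeholders, and the exact shape of the `store` params can differ between server versions, so treat the field names as assumptions and verify against the documentation:

```python
import requests

# Placeholder host, credentials, bucket, and index name.
FTS_URL = "http://localhost:8094"          # FTS service port
AUTH = ("Administrator", "password")
INDEX_NAME = "events_fts"

# Minimal index definition; "indexType": "scorch" is assumed to be the
# setting that selects the scorch storage engine in 5.5+.
index_def = {
    "type": "fulltext-index",
    "name": INDEX_NAME,
    "sourceType": "couchbase",
    "sourceName": "events",                # bucket holding the documents
    "params": {
        "store": {"indexType": "scorch"},
        "mapping": {"default_mapping": {"enabled": True, "dynamic": True}},
    },
}

resp = requests.put(f"{FTS_URL}/api/index/{INDEX_NAME}", auth=AUTH, json=index_def)
resp.raise_for_status()
print(resp.json())
```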
Maybe @tyler.mitchell can chime in with some other resources.
I have several different document types stored in the same bucket, and there are use cases where each document type is queried by different fields. In some cases the user decides which fields to query by, so I had to add multiple indexes and views per document type, which may have increased the overall disk usage. I don't think FTS would cover this scenario, so I had to store copies of some documents in ElasticSearch just to query them the way I want.
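To illustrate how the index count adds up (the bucket and field names below are made up), this is roughly what per-type access paths look like as partial GSI indexes created through the N1QL query service; each one carries its own disk footprint:

```python
import requests

# Placeholder host, credentials, and bucket/field names.
QUERY_URL = "http://localhost:8093/query/service"   # N1QL query service
AUTH = ("Administrator", "password")

# One partial index per document type / query pattern; every extra access
# path is another index that has to be stored and maintained on disk.
statements = [
    'CREATE INDEX idx_login_by_user ON `events`(userId, ts) WHERE type = "login"',
    'CREATE INDEX idx_purchase_by_item ON `events`(itemId, ts) WHERE type = "purchase"',
    'CREATE INDEX idx_error_by_code ON `events`(errorCode, ts) WHERE type = "error"',
]

for stmt in statements:
    resp = requests.post(QUERY_URL, auth=AUTH, data={"statement": stmt})
    resp.raise_for_status()
    print(resp.json().get("status"))
```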