Indexing and Search on Huge Bucket

jproyo · May 2, 2018, 2:09pm

Hi Community,

We are facing a challenge in handling a huge amount of data in Couchbase.
Basically, we are going to record statistics every day in a dedicated bucket, at a rate of about 30 million documents per month. On top of these stats we are writing several views that use the stale=ok strategy so as not to stress the Couchbase engines; these views summarize, count, and group the data in very specific ways.
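To make the view side concrete, here is a minimal sketch of how such a view could be queried over the view REST API with `stale=ok`. The host, bucket, design document, and view names are hypothetical placeholders, not our real ones, and the helper only builds the URL:

```python
from urllib.parse import urlencode

# Hypothetical names for illustration only: host "localhost", bucket "stats",
# design document "daily", view "by_client".
VIEW_PORT = 8092  # default port of the Couchbase view REST API


def view_query_url(host, bucket, design_doc, view, group_level=None, stale="ok"):
    """Build a view query URL; stale=ok lets the server answer from the
    existing index without triggering an index update first."""
    params = {"stale": stale}
    if group_level is not None:
        # group_level controls how many key components the reduce groups by
        params["group_level"] = group_level
    return (
        f"http://{host}:{VIEW_PORT}/{bucket}/_design/{design_doc}"
        f"/_view/{view}?{urlencode(params)}"
    )


if __name__ == "__main__":
    print(view_query_url("localhost", "stats", "daily", "by_client", group_level=2))
```

With `stale=ok` the results may lag behind recent writes, which is acceptable for our summary statistics.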

On the other hand, we also need specific SQL-like searches on these documents. By that I mean being able to find a handful of documents among these millions or billions, based on criteria such as date range, type, name, etc.

We have been exploring approaches such as connecting Apache Spark to Couchbase and handling all this data with that kind of tool, but we are not 100% sure this is the most appropriate way to solve it in Couchbase.

Is there another way to handle this kind of query on huge buckets like the one I describe above?
Would it be possible to set up a couple of N1QL secondary indexes on a bucket with billions of documents without stressing the Couchbase engines, and with everything continuing to work well?

Best,

ssmotra · May 2, 2018, 10:17pm

Hi,

Couchbase Analytics, which is in Developer Preview, would be a great fit for the use case you’ve described above. It is designed to allow ad-hoc querying of data in a Couchbase cluster without impacting the operational workloads.

You can download the 5.5 beta to try the Analytics service. We have a simple tutorial to get you started.

I am happy to chat more to give you an overview and show you a demo of the Analytics service.

Thanks,
Sachin

mmarmol · May 3, 2018, 2:00pm

Is there any way this can be done on the Community Edition? As Juan stated, we are saving 1M records per day, so ideally we need to query over 1B records.
The records come from different clients, and each query searches a subset of one client's records in a particular date range. Map-reduce works for general stats, but looking up individual records among 1B does not look doable.

papa-n1ql · May 3, 2018, 10:23pm

N1QL and GSI will work well for your secondary lookups and range scans, even across very large datasets. You won’t need analytics for that.
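As a rough illustration of such a lookup, the sketch below builds a composite GSI index definition and a parameterized range query for "one client, one date range". The bucket name `stats`, the field names `client` and `date`, and the index name are assumptions, since the thread does not give the real schema; it also assumes dates are stored as ISO-8601 strings so that string order matches chronological order. The helpers only produce statement text and parameters, to be passed to whichever SDK or query tool you use:

```python
# Hypothetical schema: bucket `stats`, documents with `client` and `date`
# fields, where `date` is an ISO-8601 string such as "2018-05-02".
BUCKET = "stats"


def create_index_statement(bucket=BUCKET):
    """Composite secondary index: equality key (client) first, range key (date) second."""
    return (
        f"CREATE INDEX idx_client_date ON `{bucket}`(`client`, `date`) USING GSI"
    )


def range_query(client, from_date, to_date, bucket=BUCKET, limit=100):
    """Return (statement, named_parameters) for one client's records in a date range."""
    statement = (
        f"SELECT s.* FROM `{bucket}` AS s "
        "WHERE s.`client` = $client "
        "AND s.`date` >= $from_date AND s.`date` < $to_date "
        f"LIMIT {int(limit)}"
    )
    params = {"client": client, "from_date": from_date, "to_date": to_date}
    return statement, params


if __name__ == "__main__":
    print(create_index_statement())
    stmt, params = range_query("client-42", "2018-04-01", "2018-05-01")
    print(stmt)
    print(params)
```

Putting the equality field first and the range field second lets the index scan cover only that client's entries within the date range, rather than touching the whole bucket.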

jproyo · May 4, 2018, 6:35am

Thank you for the response.

Does this also apply to Community Edition, which doesn't support Multidimensional Scaling? We are thinking of searching across several billion records.

Thanks,
