Hi,
I would like to use CB4 for a big data project that contains 10 billion documents in a bucket.
Each document contains up to 100 name value integers, 1 thumbnail image (20KB) and 1 binary file (1KB).
I mainly need to filter documents by date, time and other integer properties (no free text search needed).
I use .NET and Java SDKs.
My questions:
- Is couchbase the right technology?
- can CB4 hold such amount of documents in a single bucket?
- How many nodes do I need (without replication)?
Thanks
Oren
What’s your expected ‘active’ dataset and can you tolerate higher latencies? (Resident ratio less than 100%, some cache misses.)
Can you tolerate negative lookups going to disk? (Value vs Full ejection, higher latency and IO utilization)
What’s the insert/update/index rate?
Technically a single couchbase node with large enough storage and using full ejection can “hold” the data… it’s all about how you’re going to use it…
Thanks for the response. Some additional info, since I'm getting bad performance.
I have 2M documents, each contains 30 properties.
I use a single node on a strong machine with 40 logical CPU and 32GB RAM.
When filtering by a time range and 6 additional properties, the query takes more than 1 minute.
I have all relevant properties indexed (GSI).
The N1QL looks something like:
and it returns 100 documents after 1 minute.
How can I improve it? On SQL Server it takes 5-10 seconds.
SELECT id, time FROM suspectentity WHERE time >= '2016-01-19T09:00' AND time <= '2016-01-19T11:00' AND channelId=12 AND metadata.faceHat=true AND metadata.faceBeard=false AND metadata.clothingShirtColor=1 AND metadata.clothingPantsColor=1 LIMIT 10000;
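For reference, a single composite GSI covering all of the filter predicates might look like the sketch below. The index name and key order are my assumptions (field names are taken from the query above); as a general rule, placing the equality fields before the range field lets the index narrow the scan more effectively:

```sql
-- Hypothetical composite index for the query above.
-- Equality predicates (channelId, metadata.*) lead; the time range field follows.
CREATE INDEX idx_suspect_filter ON suspectentity
  (channelId, metadata.faceHat, metadata.faceBeard,
   metadata.clothingShirtColor, metadata.clothingPantsColor, time);
```

With separate single-field indexes, the query engine typically picks only one of them and then fetches and filters the remaining predicates on the documents themselves, which can explain the long runtimes.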
I have no experience with N1QL, but most likely you'll also need to show what your indexes and views look like for any expert to help diagnose this.
If I were to guess, though, I'd say your indexes are not defined properly for your needs.
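One quick way to check this (using the bucket and field names from the query above) is to run EXPLAIN and look at the plan: if it shows a PrimaryScan instead of an IndexScan on one of the GSI indexes, the indexes are not being used for these predicates:

```sql
-- Inspect the query plan; a PrimaryScan here means a full bucket scan.
EXPLAIN SELECT id, time FROM suspectentity
WHERE time >= '2016-01-19T09:00' AND time <= '2016-01-19T11:00'
  AND channelId = 12
LIMIT 10000;
```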