I am trying to create view indices for my data filters.
Let’s say my data filters are ‘by country’, ‘by gender’, and 'date.'
And note, ‘by gender’ can have ‘all’ option.
METHOD 1
view1 - ‘by gender’ all option
[“country”, 20150503]
view2
[“country”, “male”, 20150503]
METHOD 2
view1 - Having “all” for ‘by gender’ option which will increase the total index size
[“country”, “all”, 20150503]
It looks like METHOD 2 will create the larger index tree and longer time to lookup, but instead, it will be easy to manage the view.
Here I say the example with 3 options, but my application actually has 9 different options.
So my question is which METHOD would be more efficient?
The answer most depends on how you want to query the data. The compound key views work best if the data is hierarchical since you can’t filter each section separately. This is why something like [country, state, county, city] works well but [country, name] doesn’t. If you wanted to find everyone with a particular name, you’d have to search all countries.
I’d recommend you read through this doc on views.
@ingenthr Then, it would be very ineffective if the compound key length is long and each element is not hierarchical to each other.
For example,
[country, gender, first_name, last_name, age, height]
It seems like longer the compound key is more difficult to query the intended data.
If I want to do like…
SELECT * FROM the_compound_key WHERE first_name = 'John' AND age = 18;
In this case, I can only think of query from the compound key multiple times and remove the duplicates.
1 Like
Yes, you’re correct here. Sometimes you’ll build with multiple views over the same data and do the union of the views or have one view you query multiple times and do the union or difference. There are some use cases where this works out great-- typically the aggregation kinds of cases-- but others where it requires a lot of work to get to what you describe in the SELECT statement.
It’s worth noting that in order to have that SELECT statement execute quickly, it too will require multiple indexes (which are like views). One on first_name and one on age. SQL will “fail back” to just scanning all of the data though. It’s not exactly the same, but the steps are sort of the same if you’re implementing it in views. The burden is higher on the application code for sure-- this could be good or bad.
Also note that N1QL (which is in our 4.0 dev preview) gives you something closer to what you’re describing there without the typical relational database tradeoffs of needing to have a fixed schema and not being able to scale. You can define indexes on fields in the JSON documents.
1 Like