I store books in the database, every book might have multiple authors.
In 1.x, I wrote:
var v = db.viewNamed("ViewAuthor")
v.documentType = "book"
v.setMapBlock({ (doc, emit) in
for a in doc["author"] as! [String] {
emit(a, nil)
}
}, version: "1")
How can I create an index for array property in CBL 2.0?. Also, there are other types of document in the database, I want the index only apply on books, not other types.
So I made a small test case to see if I need to index an array property (Tests are running in debug mode and the db is not encrypted):
What I want to do:
Get all the books ID of a specific author.
Get all the authors inside the database.
TestCase1:
I create 10K books in the database, each book have an property called authors (an array of 20 strings). Each book has exact the same authors.
Tests:
For the first task, I create one query for each author. The query (uses ArrayFunction.Contains) will return the ID of all books where authors property contains the specific author.
For the second task, I create one query to just return the authors property from the 10K document, then remove duplicates.
Result:
Each query finished in 50ms - 70ms
Finshed in 70ms
TestCase2:
On top of TestCase1, I create documents of type Relationship to represent the pairing of a book and a author, which results in 20 * 10K more document in the database. And I also created two index:
// I want to use this index to distinguish Book from Relationship.
db.CreateIndex("TypeIndex", IndexBuilder.ValueIndex(
ValueIndexItem.Expression(Expression.Property("type"))
));
// I want the db to use this index to select the right Relationship for me.
db.CreateIndex("RelIndex", IndexBuilder.ValueIndex(
ValueIndexItem.Expression(Expression.Property("type")),
ValueIndexItem.Expression(Expression.Property("author"))
));
Tests:
Test 1 from TestCase1.
Test 2 from TestCase1.
For the first task, I create one query for each author. The query will return all the Relationship document which author matches the specific author. From the Relationship, I will get the book’s ID.
For the second task, I create one query to fetch author from all the Relationship and remove duplicates.
Result:
Each query finished in 90ms - 100ms
Finished in 110ms
Each query finished in ~60ms
Finished in 1 second.
Conclusion:
The ArrayFunction.Contains is fast enough that I don’t need to index the array.
Interesting findings:
If I close and reopen the database after inserting 210K document. Queries seems to be faster ( 90ms drops to 50ms ).
Adding another 200K document to database slow down queries, even if I CreateIndex for type property. (I was hoping this index will make the query only checks the Book document. But it seems the query still need some extra work for the 200K Relationship document)
Thanks for all the interesting info! A note about your last findings: Couchbase will optimize a database on close so that could be the cause of the speed up. Also indexes will check every entry in the DB always. The speed up comes from checking a certain property on a document during querying (more specifically when ordering it I think). Index in 2.0 has a different meaning than 1.x. In 1.x it meant you had control over which documents were added to the map portion of map reduce but with 2.0 it means if you index a certain property certain operations involving that property will become faster.
So what is the purpose of Index if creating a index doesn’t speed up query? (I tried select 20K docs from 40K docs, it costs the same time with or without index. )
Would you provide information about when exactly an Index can speed up the query? Maybe only for very complex query?
The index does speed up the query, just not quite in the way that you think. It doesn’t make plain selects any faster (at least I don’t think so @jens is the expert in this area). The two areas I am pretty sure there is an impact is WHERE clauses involving the indexed property, and ORDER BY involving the indexed property.
Sorry for not being clear enough. The tested query has where clause (the document must meet two condition to be selected), and using query.explain() shows something about using a specific index. Tried both in .net (4d ago) and swift (today), yet I don’t see any big difference. Not sure if 10K docs is enough to test the performance.
My response won’t be explaining your observations but I was curious if you attempted a JOIN query to fetch author details from “relationship” docs that matched the author Ids that you fetched from “books” document and if that made a difference.
If you are looking for an example, you can check out "JOIN Expression with Functions section in this blog
Indexes make a huge difference on queries that need to look up one or more single values from a document. For example, a query for docs of type patient where the age property is between 55 and 80.
The query you’re trying to do involves searching for multiple values — all the items in the author array — per document. The reason 2.0 can’t index that is because the underlying engine (SQLite) simply doesn’t support that type of index. We want to add support for this in the future, but it will involve building our own index system on top of SQLite, basically creating a table to use as an index. This happens to be exactly how we implemented map/reduce in 1.0, so we will likely end up porting over part of the map/reduce code and grafting it into the index and query support.