HI,
I know this question as alredy been posted but I cannot find a good answer.
I have this query
SELECT * FROM store WHERE _type=“Product” ORDER BY updated LIMIT 30
which take 4.8s with 70k products.
If I remove the order by clause the time is ~9ms
I have this indexes:
CREATE INDEX ALL__type ON store(_type)
CREATE INDEX ALL__updated ON store(updated)
CREATE INDEX ALL__updated2 ON store(_type,updated)
and this is the EXPLAIN result
[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “ALL__type”,
“index_id”: “1cefee2f50af8100”,
“keyspace”: “store”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“Product”"
],
“Inclusion”: 3,
“Low”: [
"“Product”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“keyspace”: “store”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((store._type) = “Product”)”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “30”,
“sort_terms”: [
{
“expr”: “(store.updated)”
}
]
},
{
"#operator": “Limit”,
“expr”: “30”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT * FROM store WHERE _type=“Product” ORDER BY updated LIMIT 30”
}
]
How can I improve the performance of this query with the order by? Have I created the right indexes?
Use the following index and query. How long does this take.
CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product';
SELECT *
FROM store USE INDEX ( Product__updated )
WHERE _type = 'Product' AND updated IS NOT NULL
ORDER BY updated ASC LIMIT 30;
(1) If a document does not contain “updated”, the index will not contain it.
(2) You don’t have to. But then the query will intersect all matching indexes, which will be slower.
(3) You can use DESC. In that case, the query will use the index to match the results, but it cannot use the index to sort, so the query will perform a sort.
We have a workaround for sort involving -MILLIS( ).
Can you explain why Couchbase is not able to use Product__updated index when updated IS NOT NULL condition is missing ? What is happening behind the scene ?
We can simply copy paste this query but we would like to understand what happen to be able to write other queries by ourselves.
SELECT * FROM store USE INDEX ( Product__updated ) WHERE _type="Product" AND updated IS NOT NULL ORDER BY updated LIMIT 100
Using an index with IS NOT NULL or IS NOT MISSING may help, but that means the document has the attribute (the “updated” attribute here).
This is forcing us to add an “updated” attribute to every document in the bucket (if we want to use the index).
In our case, we want to sort the “updated” documents, including the ones that have never been updated (so, those don’t include the “updated” attribute).
Thus, using an index implies that every document has every attributes used by the index.
In Mongo they differentiate Sparse index and Non-Sparse indexes:
Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.
In other words: the explicit index creation means CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated IS NOT MISSING;
Is there a way to create an index like: CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated CAN BE MISSING;
Good question. I did not realize that some of your documents do not have the updated attribute. Yes, you can handle this with N1QL. Let me propose something.