N1QL 4.5 ORDER BY performance

fabrizio.ruggeri · October 31, 2016, 12:42pm

Hi,
I know this question has already been posted, but I cannot find a good answer.
I have this query:

SELECT * FROM store WHERE _type="Product" ORDER BY updated LIMIT 30

which takes 4.8s with 70k products.
If I remove the ORDER BY clause, the time is ~9ms.

I have these indexes:
CREATE INDEX ALL__type ON store(_type)
CREATE INDEX ALL__updated ON store(updated)
CREATE INDEX ALL__updated2 ON store(_type,updated)

and this is the EXPLAIN result
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ALL__type",
              "index_id": "1cefee2f50af8100",
              "keyspace": "store",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"Product\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"Product\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "store",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "((store._type) = \"Product\")"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "self",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "30",
          "sort_terms": [
            {
              "expr": "(store.updated)"
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "30"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT * FROM store WHERE _type=\"Product\" ORDER BY updated LIMIT 30"
  }
]
How can I improve the performance of this query with the order by? Have I created the right indexes?

geraldss · October 31, 2016, 3:20pm

Add this to your query's WHERE clause:

AND updated IS NOT NULL

fabrizio.ruggeri · October 31, 2016, 3:25pm

So, I changed the query to:

SELECT * FROM store WHERE _type="Product" AND updated IS NOT NULL ORDER BY updated DESC LIMIT 30

The time is still ~4s.
Note that all the documents have “updated” with a value.

Here is the EXPLAIN:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ALL__updated2",
              "index_id": "95395d770a48a8a4",
              "keyspace": "store",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "successor(\"Product\")"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"Product\"",
                      "null"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "store",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((store._type) = \"Product\") and ((store.updated) is not null))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "self",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "30",
          "sort_terms": [
            {
              "desc": true,
              "expr": "(store.updated)"
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "30"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT * FROM store WHERE _type=\"Product\" AND updated IS NOT NULL ORDER BY updated DESC LIMIT 30"
  }
]

geraldss · October 31, 2016, 3:36pm

Ah, you want descending. How long does it take with

ORDER BY updated ASC

We currently have a limitation with DESC. We have a temporary workaround.

fabrizio.ruggeri · October 31, 2016, 3:38pm

Ah yes, I didn’t realize I sent you the version with DESC. By the way, nothing changes if I use ASC: ~3.90s (do you want the EXPLAIN?).

Thank you

geraldss · October 31, 2016, 5:39pm

Yes, EXPLAIN please.

fabrizio.ruggeri · November 2, 2016, 8:53am

Here is the output of
EXPLAIN SELECT * FROM store WHERE _type="Product" AND updated IS NOT NULL ORDER BY updated ASC LIMIT 30
which took ~4s:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ALL__updated2",
              "index_id": "95395d770a48a8a4",
              "keyspace": "store",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "successor(\"Product\")"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"Product\"",
                      "null"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "store",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((store._type) = \"Product\") and ((store.updated) is not null))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "self",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "30",
          "sort_terms": [
            {
              "expr": "(store.updated)"
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "30"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT * FROM store WHERE _type=\"Product\" AND updated IS NOT NULL ORDER BY updated ASC LIMIT 30"
  }
]

geraldss · November 2, 2016, 5:06pm

Use the following index and query. How long does this take?

CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product';

SELECT *
FROM store USE INDEX ( Product__updated )
WHERE _type = 'Product' AND updated IS NOT NULL
ORDER BY updated ASC LIMIT 30;
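
As a rough illustration of why an index keyed on updated and restricted to _type = 'Product' can help here, the following toy Python model compares sorting every matching document with reading the first entries of an already-sorted index. It is only a sketch, not Couchbase's implementation; the function names and the sample documents are made up for illustration.

```python
from itertools import islice


def build_partial_index(docs, field, type_value):
    """Toy stand-in for an index on `field` restricted to one `_type`.

    Returns (field_value, doc_id) pairs sorted by field value.
    Documents that lack `field` get no entry.
    """
    entries = [
        (doc[field], doc_id)
        for doc_id, doc in docs.items()
        if doc.get("_type") == type_value and field in doc
    ]
    return sorted(entries)


def first_n_via_index(index, n):
    """Read only the first n entries; no sort is needed at query time."""
    return [doc_id for _, doc_id in islice(index, n)]


def first_n_via_full_sort(docs, field, type_value, n):
    """Collect every matching document, sort them all, then keep n."""
    matching = [
        (doc[field], doc_id)
        for doc_id, doc in docs.items()
        if doc.get("_type") == type_value and field in doc
    ]
    return [doc_id for _, doc_id in sorted(matching)[:n]]


# Hypothetical sample data: 1000 products plus some unrelated documents.
docs = {
    f"p{i}": {"_type": "Product", "updated": (i * 7919) % 1000}
    for i in range(1000)
}
docs.update({f"u{i}": {"_type": "User", "updated": i} for i in range(50)})

# The sort cost is paid once, when the index is built.
index = build_partial_index(docs, "updated", "Product")

# Both approaches return the same 30 ids, but the index path touches 30 entries.
assert first_n_via_index(index, 30) == first_n_via_full_sort(
    docs, "updated", "Product", 30
)
```

In this model the full-sort path does work proportional to the number of matching documents on every query, while the index path stops after 30 entries.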

fabrizio.ruggeri · November 2, 2016, 5:19pm

OK, I set it up and the query works; it takes a few ms.
I discovered that this query works too:

SELECT *
FROM store USE INDEX ( ALL__updated )
WHERE _type="Product"
AND updated IS NOT NULL
ORDER BY updated ASC LIMIT 30;

fabrizio.ruggeri · November 2, 2016, 5:26pm

So:

(1) Why do I need “updated IS NOT NULL”?
(2) Why do I have to specify the index explicitly?
(3) Why doesn’t it work with DESC?

geraldss · November 2, 2016, 6:14pm

(1) If a document does not contain “updated”, the index will not contain it.

(2) You don’t have to. But then the query will intersect all matching indexes, which will be slower.

(3) You can use DESC. In that case, the query will use the index to match the results, but it cannot use the index to sort, so the query will perform a sort.

We have a workaround for sort involving -MILLIS( ).
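
The idea behind a negated sort key can be sketched with a toy Python model. This is only an illustration, not Couchbase's implementation and not necessarily the exact workaround referred to above; it assumes updated is a numeric timestamp, and all names are hypothetical. Storing the negated value means an ordinary ascending scan returns the newest entries first.

```python
def build_negated_index(docs, field):
    """Toy ascending index keyed on the negated field value.

    Documents that lack `field` get no entry.
    """
    return sorted(
        (-doc[field], doc_id) for doc_id, doc in docs.items() if field in doc
    )


def newest_first(index, n):
    """An ascending scan over negated keys yields descending original values."""
    return [(-neg_value, doc_id) for neg_value, doc_id in index[:n]]


# Hypothetical documents with millisecond timestamps.
docs = {
    "a": {"updated": 1478000000000},
    "b": {"updated": 1478100000000},
    "c": {"updated": 1477900000000},
    "d": {},  # no `updated`, so no index entry
}

index = build_negated_index(docs, "updated")
print(newest_first(index, 2))
```

The query side of such a scheme would have to sort on the same negated expression in ascending order for the index order to be usable.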

fabrizio.ruggeri · November 3, 2016, 8:21am

(1) If a document does not contain “updated”, the index will not contain it.

Yes, I understand. But why is the query slower if I do not include “and updated is not null”?

Can you point me to an explanation of the workaround please?

Thank you so much for the support.

geraldss · November 3, 2016, 2:55pm

Because in that case, the query cannot and does not use the index. That is the meaning of point (1).

bguerout · November 15, 2016, 9:39pm

Hello,
Thanks for your answer.

Can you explain why Couchbase is not able to use the Product__updated index when the updated IS NOT NULL condition is missing? What is happening behind the scenes?

We could simply copy and paste this query, but we would like to understand what happens so that we can write other queries ourselves.

SELECT * FROM store USE INDEX ( Product__updated ) WHERE _type="Product" AND updated IS NOT NULL ORDER BY updated LIMIT 100

geraldss · November 15, 2016, 10:40pm

Sure. The index only contains documents that contain the “updated” field.

Therefore, in order to use the index, the query must only return documents that contain the “updated” field.

In the query, you can also use

updated IS NOT MISSING
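
To make the reasoning concrete, here is a toy Python model (a sketch with made-up names and data, not how Couchbase is implemented). An index on updated has no entry for documents without that field, so a query that filters only on _type could need documents the index cannot supply; adding a predicate that excludes those documents makes the index sufficient.

```python
def index_entries(docs, field):
    """Doc ids present in a toy index on `field` (docs lacking it are skipped)."""
    return {doc_id for doc_id, doc in docs.items() if field in doc}


def query_results(docs, predicate):
    """Doc ids that a query with this predicate must be able to return."""
    return {doc_id for doc_id, doc in docs.items() if predicate(doc)}


def index_is_sufficient(docs, field, predicate):
    """True if every possible result of the query has an index entry."""
    return query_results(docs, predicate) <= index_entries(docs, field)


# Hypothetical documents; p3 has never been updated.
docs = {
    "p1": {"_type": "Product", "updated": 3},
    "p2": {"_type": "Product", "updated": 1},
    "p3": {"_type": "Product"},
}


def only_type(doc):
    # Models: WHERE _type = "Product"
    return doc.get("_type") == "Product"


def type_and_updated(doc):
    # Models: WHERE _type = "Product" AND updated IS NOT MISSING
    return doc.get("_type") == "Product" and "updated" in doc


print(index_is_sufficient(docs, "updated", only_type))
print(index_is_sufficient(docs, "updated", type_and_updated))
```

Unlike this model, a query planner presumably has to decide from the query text alone, without inspecting the data, which would explain why the extra predicate is needed even when every document happens to contain the field.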

yves · November 16, 2016, 10:39am

Hi @geraldss

Using an index with IS NOT NULL or IS NOT MISSING may help, but that means the document has the attribute (the “updated” attribute here).

This is forcing us to add an “updated” attribute to every document in the bucket (if we want to use the index).

In our case, we want to sort the “updated” documents, including the ones that have never been updated (so, those don’t include the “updated” attribute).

Thus, using an index implies that every document has every attribute used by the index.

MongoDB differentiates between sparse and non-sparse indexes:

Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.

In other words, the index creation above implicitly means: CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated IS NOT MISSING;

Is there a way to create an index like: CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated CAN BE MISSING;

Thanks for your help on the matter.

geraldss · November 16, 2016, 6:12pm

Hi @yves,

Good question. I did not realize that some of your documents do not have the updated attribute. Yes, you can handle this with N1QL. Let me propose something.
