N1ql 4.5 order by performance

HI,
I know this question as alredy been posted but I cannot find a good answer.
I have this query

SELECT * FROM store WHERE _type=“Product” ORDER BY updated LIMIT 30

which take 4.8s with 70k products.
If I remove the order by clause the time is ~9ms

I have this indexes:
CREATE INDEX ALL__type ON store(_type)
CREATE INDEX ALL__updated ON store(updated)
CREATE INDEX ALL__updated2 ON store(_type,updated)

and this is the EXPLAIN result
[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “ALL__type”,
“index_id”: “1cefee2f50af8100”,
“keyspace”: “store”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“Product”"
],
“Inclusion”: 3,
“Low”: [
"“Product”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“keyspace”: “store”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((store._type) = “Product”)”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “30”,
“sort_terms”: [
{
“expr”: “(store.updated)”
}
]
},
{
"#operator": “Limit”,
“expr”: “30”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT * FROM store WHERE _type=“Product” ORDER BY updated LIMIT 30”
}
]
How can I improve the performance of this query with the order by? Have I created the right indexes?

Add to your query WHERE clause.

AND updated IS NOT NULL

So, I changed the query to be

SELECT * FROM store WHERE _type=“Product” AND updated IS NOT NULL ORDER BY updated DESC LIMIT 30

The time is always ~4s
Ah, all the documents have “updated” with a value

Here the explain:

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “ALL__updated2”,
“index_id”: “95395d770a48a8a4”,
“keyspace”: “store”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor(“Product”)”
],
“Inclusion”: 0,
“Low”: [
"“Product”",
“null”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“keyspace”: “store”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(((store._type) = “Product”) and ((store.updated) is not null))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “30”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(store.updated)”
}
]
},
{
"#operator": “Limit”,
“expr”: “30”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT * FROM store WHERE _type=“Product” AND updated IS NOT NULL ORDER BY updated DESC LIMIT 30”
}
]

Ah, you want descending. How long does it take with

ORDER BY updated ASC

We currently have a limitation with DESC. We have a temporary workaround.

Ah yes, I didn’t realize I sent you the version with DESC. By the way, nothing change if I put ASC, ~3.90s (do you want explain?).

Thank you

Yes, EXPLAIN please.

Here the explain for
EXPLAIN SELECT * FROM store WHERE _type=“Product” AND updated IS NOT NULL ORDER BY updated ASC LIMIT 30
which took ~4s

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “ALL__updated2”,
“index_id”: “95395d770a48a8a4”,
“keyspace”: “store”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor(“Product”)”
],
“Inclusion”: 0,
“Low”: [
"“Product”",
“null”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“keyspace”: “store”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(((store._type) = “Product”) and ((store.updated) is not null))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “30”,
“sort_terms”: [
{
“expr”: “(store.updated)”
}
]
},
{
"#operator": “Limit”,
“expr”: “30”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT * FROM store WHERE _type=“Product” AND updated IS NOT NULL ORDER BY updated ASC LIMIT 30”
}
]

Use the following index and query. How long does this take.

CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product';

SELECT *
FROM store USE INDEX ( Product__updated )
WHERE _type = 'Product' AND updated IS NOT NULL
ORDER BY updated ASC LIMIT 30;

Ok, I set it up and the query works, it take some ms.
I discovered this query works too:

SELECT *
FROM store USE INDEX ( ALL__updated )
WHERE _type="Product"
AND updated IS NOT NULL
ORDER BY updated ASC LIMIT 30;

1 Like

So:

  • why I need “updated IS NOT NULL”?
  • why have I to specify explicitly the index to use?
  • why doesn’t it work with DESC?

(1) If a document does not contain “updated”, the index will not contain it.

(2) You don’t have to. But then the query will intersect all matching indexes, which will be slower.

(3) You can use DESC. In that case, the query will use the index to match the results, but it cannot use the index to sort, so the query will perform a sort.

We have a workaround for sort involving -MILLIS( ).

(1) If a document does not contain “updated”, the index will not contain it.

Yes, I understand. But why is the query slower if I do not include “and updated is not null”?

Can you point me to an explanation of the workaround please?

Thank you so much for the support.

Because in that case, the query cannot and does not use the index. That is the meaning of point (1).

Hello,
Thanks for your answer.

Can you explain why Couchbase is not able to use Product__updated index when updated IS NOT NULL condition is missing ? What is happening behind the scene ?

We can simply copy paste this query but we would like to understand what happen to be able to write other queries by ourselves.

SELECT * FROM store USE INDEX ( Product__updated ) WHERE _type="Product" AND updated IS NOT NULL ORDER BY updated LIMIT 100

Sure. The index only contains documents that contain the “updated” field.

Therefore, in order to use the index, the query must only return documents that contain the “updated” field.

In the query, you can also use

updated IS NOT MISSING

Hi @geraldss

Using an index with IS NOT NULL or IS NOT MISSING may help, but that means the document has the attribute (the “updated” attribute here).

This is forcing us to add an “updated” attribute to every document in the bucket (if we want to use the index).

In our case, we want to sort the “updated” documents, including the ones that have never been updated (so, those don’t include the “updated” attribute).

Thus, using an index implies that every document has every attributes used by the index.

In Mongo they differentiate Sparse index and Non-Sparse indexes:

Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is “sparse” because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.

In other words: the explicit index creation means CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated IS NOT MISSING;

Is there a way to create an index like: CREATE INDEX Product__updated ON store( updated ) WHERE _type = 'Product' AND updated CAN BE MISSING;

Thanks for your help on the matter.

Hi @yves,

Good question. I did not realize that some of your documents do not have the updated attribute. Yes, you can handle this with N1QL. Let me propose something.