Query performance ORDER BY

With just 20k documents, the following statement takes over 9 seconds to execute:

SELECT * FROM bucket WHERE type='apple' ORDER BY date_created

Indexes exist for type and date_created and every document has a value for date_created

But the following statement takes just 300ms to execute:

SELECT * FROM bucket WHERE type='apple' AND date_created IS NOT NULL ORDER BY date_created

I’m quite confused. If every document has a date_created, then why does filtering to WHERE date_created IS NOT NULL accomplish any performance improvement at all, much less the extreme difference shown here.

Will I truly be required to append a WHERE condition that checks for null any time I ORDER BY?

1 Like

It comes down to index selection; indexes are only selected based on filters - so without a filter on date_created you’ll find the expected index isn’t used to order the results (likely using an index on type only). (You could also use date_created IS NOT MISSING, the commonly recommended dummy filter).

(Ref: Selecting Indexes | Couchbase Docs - “Couchbase Server attempts to select an appropriate secondary index for a query, based on the filters in the WHERE clause.”)

You should be able to observe the change in plan if you EXPLAIN the two statements. (Ref: EXPLAIN | Couchbase Docs )#

HTH.

1 Like

If you have index

CREATE INDEX ix1 ON bucket(type,date_created);
CREATE INDEX ix11 ON bucket(type, date_created) WHERE type = "apple";

It will work.

UPSERT INTO bucket VALUES("f01", {"date_created":"2022-05-27","type":"apple"});
UPSERT INTO bucket VALUES("f02", { "type":"apple"});
CREATE INDEX ix1 ON bucket(type,date_created);
CREATE INDEX ix2 ON bucket(date_created) WHERE type = "apple";

Couchbase secondary index will not index the document when leading index key evaluate to MISSING (This is due to schema less and bucket can have unrelated documents and index size can ballon).
ix1 will have “f01”, “f02”
ix2 will have f01 only because above restriction date_created MISSING in f02

SELECT * FROM bucket WHERE type=‘apple’ ORDER BY date_created
According to query semantics it must return date_created value MISSING too. So ix2 will not qualify service query (to avoid wrong results). As you mentioned you have date_created in all documents but no way query knows. That is reason you need additional predicate of your intentions.

Checkout MB-30249

1 Like

Thanks for the info.

Here are my tests, and they are QUITE confusing:

date_created IS NOT NULL - 500ms
date_created IS VALUED - 9 seconds
date_created IS NOT MISSING - 9 seconds

My understanding is that IS VALUED is functionally equivalent to IS NOT NULL, yet they perform completely differently. What gives?

In each case, have you checked the EXPLAIN output for plan (index selection) changes?

I don’t really understand how to interpret the EXPLAIN output, but there are a couple of differences between date_created IS NOT NULL and date_created IS VALUED:

date_created IS NOT NULL produces:
OrderedIntersectScan (on idx_date_created)
while
date_created IS VALUED produces:
IntersectScan (on idx_date_created), but then lists the "Order" operator later, referencing the date_created column.

I’m assuming that the OrderedIntersectScan results in fast performance, while the plain IntersectScan does not, but again, I was under the impression from the docs that IS NOT NULL and IS VALUED mean the same thing and should therefore produce identical results with identical performance.

Thanks again for the input

If you’re able to post the [explain] plans we can take a look.

Sounds like a different index choice - hence the change from OrderedIntersectScan to IntersectScan; have you considered indexes recommended by the index advisor ? (Ref: ADVISE | Couchbase Docs | index-advisor.couchbase.com/)

IS VALUED isn’t exactly the same as IS NOT NULL. IS VALUED is the equivalent of (IS NOT NULL AND IS NOT MISSING). Specifically “Value is neither missing nor NULL” from Comparison Operators | Couchbase Docs

Side note: For best performance you’d typically be looking to avoid intersect scans:

"In general, the preference of scan is:

Covering Index
Index Scan
Intersect Scan
Union Scan, and finally
Fetch"

HTH.

Use composite index that can be used filters(if possible which can use index order) avoid intersectscans by providing index hints.