Couchbase Server DP4.5 ORDER BY performance

Hi

There has been discussion on this forum that DP 4.5 should improve ORDER BY performance in N1QL queries.

Using the index and query below it seems that ORDER BY still slows the query down quite a bit. Actually the value of LIMIT does not seem to matter much if ORDER BY is present. Array indexing introduced in DP4.5 seems to improve performance a great deal btw.

For example the commit below should have improved the ORDER BY performance according to another topic.
http://review.couchbase.org/#/c/57495/5

Is the commit in DP4.5 or are there other improvements coming? It’s hard to tell by looking at couchbase-query repository history since there’s no branch or tag for 4.5. Still the commit has been in master branch since early December so one could expect it to be included. I guess I could also build the latest version from master branch and test it.

Index:

CREATE INDEX ievents ON events (company, DISTINCT ARRAY r.`key` FOR r IN related END, time) WHERE docType = "EVENT";

Query:

SELECT time
FROM events 
WHERE docType = "EVENT" AND company="company_9064af3e-bcf5-4c1f-b783-f723924856bf"
AND (ANY r IN related SATISFIES r.`key` = "item_07139ab1-5bac-48b8-8a92-373de9103edf" END)
ORDER BY time
LIMIT 100;

If ORDER BY is removed from the query the results are still ordered by time probably because of the index. It seems a little odd that ORDER BY still has a great impact. LIMIT value makes a huge difference in performance when there’s no ORDER BY.

Could you please check if there’s something wrong with the index or query and tell the current status of any planned improvements?

BR,
Mikko

Hi Mikko,

The checkin http://review.couchbase.org/#/c/57495/5 improves the sort performance, but not the plan itself.

Right now, for this query, after the index scan, we fetch all the documents, then sort for ORDER BY.
Because the keys for ORDER BY (time) is different from index keys, we’d still need to sort the intermediate resultset, and then apply LIMIT.

In this query/index combination, we could avoid fetching the documents themselves. Look for this improvement in the upcoming refreshes.

Just out of curiosity, what are your data sizes and timings for your query? Thanks.

Thanks, Keshav.

Thanks for reply @keshav_m.

It would be great to get some insight on the planned roadmap for the improvement and I understand there may be no accurate plans at this point. By the upcoming refreshes do you mean 4.5 or something later? Also any estimate of the schedule would be appreciated (spring, summer, later this year, next year…).

I was running the queries on my development system and actually CB is running in a VM. My setup is definitely not optimal in performance sense but the difference is clearly there. I had something like 30 000 event documents in the bucket and nothing else. The query matched about 2 000 documents. Without ORDER BY it took about 200 - 300 ms and the ORDER BY query about 1.5 s. Later I generated more data and when the query matched more results the performance hit of ORDER BY got more severe which makes perfect sense to me now.

In production queries we probably want to narrow down the matching results anyway. A better ORDER BY performance would still definitely help. Also we would probably want to get full documents but I’d expect that if ORDER BY can utilize index data not every document will have to be fetched as long as LIMIT is there.

BR,
Mikko

Hi @keshav_m and others,

DP 4.5 seems to contain important N1QL performance improvements, thanks for your efforts!

However this ORDER BY issue is still worrying me and my understanding is that it affects very many N1QL users. I think it is quite typical that quite a big bunch of documents match some query criteria entered by application user, and results should be ordered (e.g. according to a timestamp) and usually also limited/paged.

Like @mikkogy asked above, it would be nice to hear when might we expect to see improvements in this issue?

Kind regards,
Taneli

Hi @tanelih,

These issues have been fixed. Look for the new beta in early April. If you urgently need a build before that, we can look into it.

Thanks,
Gerald

Great news!

I will try with the forthcoming beta.

Thank you very much,
Taneli

Hi @taneli , I face the same problem as you , I tried 4.1 CE and 4.5 EE and still the order by increase the query execution time a lot , can you please tell me if the fix @geraldss talk about has helped you or you get the same difference when apply order by ?

Hi @serbanalexandru94,

At least the original issue I asked about was fixed in 4.5 beta. Hopefully this helps at all.

BR, Mikko