There has been discussion on this forum that DP 4.5 should improve ORDER BY performance in N1QL queries.
Using the index and query below it seems that ORDER BY still slows the query down quite a bit. Actually the value of LIMIT does not seem to matter much if ORDER BY is present. Array indexing introduced in DP4.5 seems to improve performance a great deal btw.
Is the commit in DP4.5 or are there other improvements coming? It’s hard to tell by looking at couchbase-query repository history since there’s no branch or tag for 4.5. Still the commit has been in master branch since early December so one could expect it to be included. I guess I could also build the latest version from master branch and test it.
Index:
CREATE INDEX ievents ON events (company, DISTINCT ARRAY r.`key` FOR r IN related END, time) WHERE docType = "EVENT";
Query:
SELECT time
FROM events
WHERE docType = "EVENT" AND company="company_9064af3e-bcf5-4c1f-b783-f723924856bf"
AND (ANY r IN related SATISFIES r.`key` = "item_07139ab1-5bac-48b8-8a92-373de9103edf" END)
ORDER BY time
LIMIT 100;
If ORDER BY is removed from the query the results are still ordered by time probably because of the index. It seems a little odd that ORDER BY still has a great impact. LIMIT value makes a huge difference in performance when there’s no ORDER BY.
Could you please check if there’s something wrong with the index or query and tell the current status of any planned improvements?
Right now, for this query, after the index scan, we fetch all the documents, then sort for ORDER BY.
Because the keys for ORDER BY (time) is different from index keys, we’d still need to sort the intermediate resultset, and then apply LIMIT.
In this query/index combination, we could avoid fetching the documents themselves. Look for this improvement in the upcoming refreshes.
Just out of curiosity, what are your data sizes and timings for your query? Thanks.
It would be great to get some insight on the planned roadmap for the improvement and I understand there may be no accurate plans at this point. By the upcoming refreshes do you mean 4.5 or something later? Also any estimate of the schedule would be appreciated (spring, summer, later this year, next year…).
I was running the queries on my development system and actually CB is running in a VM. My setup is definitely not optimal in performance sense but the difference is clearly there. I had something like 30 000 event documents in the bucket and nothing else. The query matched about 2 000 documents. Without ORDER BY it took about 200 - 300 ms and the ORDER BY query about 1.5 s. Later I generated more data and when the query matched more results the performance hit of ORDER BY got more severe which makes perfect sense to me now.
In production queries we probably want to narrow down the matching results anyway. A better ORDER BY performance would still definitely help. Also we would probably want to get full documents but I’d expect that if ORDER BY can utilize index data not every document will have to be fetched as long as LIMIT is there.
DP 4.5 seems to contain important N1QL performance improvements, thanks for your efforts!
However this ORDER BY issue is still worrying me and my understanding is that it affects very many N1QL users. I think it is quite typical that quite a big bunch of documents match some query criteria entered by application user, and results should be ordered (e.g. according to a timestamp) and usually also limited/paged.
Like @mikkogy asked above, it would be nice to hear when might we expect to see improvements in this issue?
Hi @taneli , I face the same problem as you , I tried 4.1 CE and 4.5 EE and still the order by increase the query execution time a lot , can you please tell me if the fix @geraldss talk about has helped you or you get the same difference when apply order by ?