We have a query which is running slower than expected or if I may say slower than accepted. When I look at the explain plan, its the hashjoin that is taking 50-70% of the time. When I click on it, I see kernTime taking the most.
Is there anything that I can do OR is it that kernTime means that the server CPU is taking time?
There are 4 datasets in a bucket that are being joined. 2 datasets have 4k & 5k records, the other 2 have about 100 records. Fetch & IndexScan is not taking time. Is there a way that I can reduce the time taken by hashjoin?
Build phase of Hash Join might taken time. You can reduce the time by using right index. Also make sure Build side produces less rows and probe side produces more rows.
In Query Work Bench, After execution of query post output from Plan Text Tab.
Understand. But is there a way to make it (build phase) go faster? More docs will make the build phase take longer, correct? Can parallelism help?
Right index - to reduce the number of docs for the hash build, correct? We used a covering index but it did not help. On the contrary, a surprising observation was that when we dropped the covering index, it worked faster.
Ensured that build side produces less rows.
PFA the plan text. btw, the entire execution time is 4 sec, still I see multiple phases in the plan text showing 4 sec. Is that because the times are inclusive of the previous operation/phase? planText-1.txt.zip (3.2 KB)
I re-looked at the query and the plan and made 2 changes:
(1) re-introduced covering index for 3 of the 4 aliases. That reduced the time. So I take back my comment that covering indexes did not help.
(2) Found that hashjoin was having too many docs to build the table. So moved some of the filter conditions from WHERE clause to the JOIN clause and that helped reduce the docs going into the hash table and that reduced the time further. But for one alias, inspite of moving the filter conditions to the JOIN clause, Plan showed that IndexScan3 returned entire dataset. There were no spans. Why would that be? Could it be because its a RIGHT OUTER JOIN? Could it be a bug? Could it be due to inappropriate leading columns of the index?
So then I moved filter criteria for all aliases to the WHERE clause and now I can see that IndexScan3 has spans for all and number of documents returned are as expected.
Where should the filter criteria be? Alongwith the JOIN condition or as a part of the WHERE clause?
btw, I was looking at the Plan in QWB and all this while I thought that the colours were based on time taken by that step. But I realized it based on cpu. Orange means more cpu (not time) than green. Is that correct?
For an OUTER join, a filter in the ON-clause and a filter in the WHERE clause has different semantics. An ON-clause filter is evaluated as part of the join itself, and if the evaluation fails (i.e. returns false/null/missing) then the join returns the document from just the dominant side (similar to NULL-extended row for OUTER join in relational databases); in contrast, a WHERE clause filter is evaluated after all the joins are done, and if the evaluation fails (false/null/missing) the entire result document is discarded. Thus in an OUTER join if you move a filter from WHERE clause to ON clause (or vice versa) it changes the semantics of the query and it may not return the same result set.
Due to this difference the filters available to index scans may be different in an OUTER join query depending on whether it is specified in the ON clause vs. the WHERE clause.
To be sure, would it be correct to say that: for INNER JOINs, it does not matter whether the filter is a part of join i.e. ON or predicate i.e. WHERE clause. Both will have the same plan and perform the same.
the above is same as what bingjie.miao explained, correct?
By this you mean to say that we can change the left-hand-side to right-hand-side and vice versa to change the join from LEFT-outer to RIGHT-outer, correct? If yes, I guess we used LEFT OUTER join because of the below limitation:
A RIGHT OUTER join is only supported when it’s the only join in the query; or when it’s the first join in a chain of joins.
For example, if you have something like: SELECT ... FROM keyspace1 ks1 LEFT JOIN keyspace2 ks2 ON ks1.c1 = ks2.c2 AND ks1.type = "type1"
and you have an index on keyspace1: CREATE INDEX ix1_keyspace1 ON keyspace1(type)
you cannot use this index since the predicate on ks1.type is specified in the ON clause of a left outer join, meaning even if the predicate is not satisfies we still need to return the ks1 document due to semantics of the left outer join, thus we cannot use the index to filter out documents from keyspace1.
In contrast, if the query is: SELECT ... FROM keyspace1 ks1 LEFT JOIN keyspace2 ks2 ON ks1.c1 = ks2.c2 WHERE ks1.type = "type1"
since the predicate on ks1 is specified in the WHERE clause we can now use the above index for accessing keyspace1.