I am runnning a very simple ANSI join query, but I am getting different results using nested loop and hash join
Nested loop join:
SELECT count(*)
FROM ssb_lineorder JOIN ssb_ddate USE NL ON ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey
WHERE ssb_ddate.d_year = 1993
AND ssb_lineorder.lo_discount BETWEEN 1 AND 3
AND ssb_lineorder.lo_quantity < 25;
Hash join:
SELECT count(*)
FROM ssb_lineorder JOIN ssb_ddate USE HASH(BUILD) ON ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey
WHERE ssb_ddate.d_year = 1993
AND ssb_lineorder.lo_discount BETWEEN 1 AND 3
AND ssb_lineorder.lo_quantity < 25;
The first query gives me “0” but the second gives me more than ten thousands which is correct.
I suppose it is something to do with the index on the two buckets. I have index on EVERY single attribute on EVERY single bucket, i.e.
CREATE INDEX ixn ON bucket (attribute); (for each attribute in each bucket, one attribute per index)
The plans seem to push the predicates down to the scan node which is correct, but the plan of nested loop’s scan node obviously produces wrong results (from the “ItemsIn” and “ItemsOut” in the plan). I suppose it should be something wrong with the index but i dont know where. Is there any other possible cause of this difference?
I have to use nested loops because i am doing benchmarking. And I’ve read ANSI JOIN multiple times. They said we have to have index on the right hand side, which i do have.
SELECT count(1)
FROM ssb_ddate AS sd
JOIN ssb_lineorder AS sl USE NL ON sl.lo_orderdate = sd.d_datekey
WHERE sd.d_year = 1993
AND sl.lo_discount BETWEEN 1 AND 3
AND sl.lo_quantity < 25;
CREATE INDEX ix1 ON ssb_ddate( d_year, d_datekey);
CREATE INDEX ix2 ON ssb_lineorder(lo_discount, lo_quantity, lo_orderdate);
it might be using IntersectScans. AVOID IntersectScan with composite index and if needed with USE INDEX hint.
Also adjust index key of ix2 moving lo_orderdate to leading in case of NL vs ix2 as is for HASH.
Also try to use 6.6.1 MB-41605
Thanks for your reply! Is it possible to create one index per bucket to include every attribute in that index? I am running tens of thousands of queries with different predicates, which makes me find a universal index that is applicable.
N1QL indexes similar like b-tree it is not possible. You can check FTS index https://blog.couchbase.com/?s=fts. If you are going to use JOIN then GSI might be better.