Different results for nested loop and hash join

Yoga · January 24, 2021, 8:01am

Hi,

I am runnning a very simple ANSI join query, but I am getting different results using nested loop and hash join

Nested loop join:
SELECT count(*)
FROM ssb_lineorder JOIN ssb_ddate USE NL ON ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey
WHERE ssb_ddate.d_year = 1993
AND ssb_lineorder.lo_discount BETWEEN 1 AND 3
AND ssb_lineorder.lo_quantity < 25;

Hash join:
SELECT count(*)
FROM ssb_lineorder JOIN ssb_ddate USE HASH(BUILD) ON ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey
WHERE ssb_ddate.d_year = 1993
AND ssb_lineorder.lo_discount BETWEEN 1 AND 3
AND ssb_lineorder.lo_quantity < 25;

The first query gives me “0” but the second gives me more than ten thousands which is correct.
I suppose it is something to do with the index on the two buckets. I have index on EVERY single attribute on EVERY single bucket, i.e.
CREATE INDEX ixn ON bucket (attribute); (for each attribute in each bucket, one attribute per index)

The plans seem to push the predicates down to the scan node which is correct, but the plan of nested loop’s scan node obviously produces wrong results (from the “ItemsIn” and “ItemsOut” in the plan). I suppose it should be something wrong with the index but i dont know where. Is there any other possible cause of this difference?

Yoga · January 24, 2021, 8:03am

I have to use nested loops because i am doing benchmarking. And I’ve read ANSI JOIN multiple times. They said we have to have index on the right hand side, which i do have.

vsr1 · January 24, 2021, 9:49pm

SELECT count(1)
FROM ssb_ddate AS sd  
JOIN ssb_lineorder  AS sl USE NL ON sl.lo_orderdate = sd.d_datekey
WHERE sd.d_year = 1993
AND sl.lo_discount BETWEEN 1 AND 3
AND sl.lo_quantity < 25;

CREATE INDEX ix1 ON  ssb_ddate( d_year, d_datekey);
CREATE INDEX ix2 ON ssb_lineorder(lo_discount, lo_quantity, lo_orderdate);

Use above query and index. https://index-advisor.couchbase.com/indexadvisor/

it might be using IntersectScans. AVOID IntersectScan with composite index and if needed with USE INDEX hint.
Also adjust index key of ix2 moving lo_orderdate to leading in case of NL vs ix2 as is for HASH.
Also try to use 6.6.1 MB-41605

Yoga · January 24, 2021, 10:29pm

Thanks for your reply! Is it possible to create one index per bucket to include every attribute in that index? I am running tens of thousands of queries with different predicates, which makes me find a universal index that is applicable.

vsr1 · January 24, 2021, 10:33pm

N1QL indexes similar like b-tree it is not possible. You can check FTS index https://blog.couchbase.com/?s=fts. If you are going to use JOIN then GSI might be better.

Topic		Replies	Views
Different result of differnet join type SQL++ query , n1ql	1	487	January 12, 2021
INDEX JOIN... "LEFT OUTER NEST" vs "LEFT OUTER JOIN" SQL++	5	820	January 13, 2021
Joins and Clarification SQL++	1	1157	January 28, 2017
Joining two buckets using GSI SQL++ query , n1ql , index	1	1043	July 19, 2018
SubQuery with UPDATE and SELECT throwing Index not found error SQL++	2	762	March 13, 2022

Different results for nested loop and hash join

Related topics