I want to translate query 22 of the TPC-H benchmark to N1QL. The query involves two buckets named “customer” and “orders”; the former containing 150,000 documents and the latter 1,500,000. My first approach to write it is shown below:
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substr(c1.c_phone, 0, 2) as cntrycode,
c1.c_acctbal
from
customer c1
where
substr(c1.c_phone, 0, 2) in
['30', '17', '25', '10', '22', '15', '21']
and c1.c_acctbal > (
select raw
avg(c2.c_acctbal)
from
customer c2
where
c2.c_acctbal > 0.00
and substr(c2.c_phone, 0, 2) in
['30', '17', '25', '10', '22', '15', '21']
)[0]
and not exists (
select
*
from
orders o
use keys meta(c1).id
where
o.o_custkey = c1.id
)
) as custsale
group by
cntrycode
order by
cntrycode;
The problem is that the correlated sub-query in “exists” always returns an empty set. I have tried to isolate the problem by:
- Trying to count the number of customers without orders with the query below, but just like in the main query it results in an empty set.
select count(*)
from customer c1
where not exists (
select o.o_custkey
from orders o
use keys meta(c1).id
where c1.id = o.o_custkey
)
- Writing the query above in a different way , but the length is always 0, which (again) causes the query to count all customers
select count(*)
from customer c1
let corders = (
select o.o_custkey
from orders o
use keys meta(c1).id
where o.o_custkey = c1.id
)
where ARRAY_LENGTH(corders) = 0
- I checked to see if something about the data was off by finding the customers with at least 1 order (see the query below), but the result is 99996; which is correct as I know beforehand there are 50004 customers without orders: 150,000 - 99,996 = 50,004
select count(distinct c.id)
from customer c join orders o on c.id = o.o_custkey
where c.id = o.o_custkey
I have seen the “id” field in couchbase to be usually defined as a string, but mine is an integer (in TPC-H, the primary key of “customer” and “orders” are the fields “c_custkey” and “o_orderkey” respectively, but I simply reference them as “id” in their respective couchbase bucket ). May this be affecting USE KEYS somehow?
Could you help me understand what the problem is?
(NOTE: I am using Couchbase Server 6.0.0 Community)