Query DISTINCT RAW return multiple null values

andrea · May 13, 2024, 1:48pm

Hi,
Today i was running a query on the database and i saw something weird and i dont know if this is normal for the value null.
The query is like this:
SELECT DISTINCT RAW Email
FROM bucket
WHERE conditions
The values that i saw were normal expect for 2 null values instead of 1. If i run the same query with the condition “AND Email is null” it returns just one null value, same thing if i just run the distinct without raw. In the db i have 3 elements with Email = null. Someone know if the null value with the distinct raw has unexpected behaviours and it is better to not have them?

dh · May 13, 2024, 1:53pm

What version are you using?

A SELECT DISTINCT RAW should be producing a single null in the results, e.g.

cbq> select distinct raw x from [null,1,null,2,null] x;
{
    "requestID": "714a26b0-c2da-460f-a217-63b8e831eff6",
    "signature": "json",
    "results": [
    null,
    1,
    2
    ],

andrea · May 13, 2024, 1:56pm

Thx for the answer. Im using Couchbase Server Enterprise Edition 7.2.4 build 7070
This is the result of the query. I know its weird.

dh · May 13, 2024, 2:11pm

I’m unable to reproduce this locally using the same version.
Specifically I have run:

upsert into default (key k, value v) 
select to_string(x) k,{"Email":case when mod(x,3) == 0 then null else "email_"||to_string(x) end} v 
from array_range(0,20) x;

to generate 7 null values out of 20.

I have a primary index only on the bucket.

My result is as expected:

cbq> select distinct raw Email from default;
{
    "requestID": "3ff1f337-ed01-452e-8ee8-07df76afcf81",
    "signature": "json",
    "results": [
    null,
    "email_1",
    "email_10",
    "email_11",
    "email_13",
    "email_14",
    "email_16",
    "email_17",
    "email_19",
    "email_2",
    "email_4",
    "email_5",
    "email_7",
    "email_8"
    ],

You mentioned that adding a filter corrects the output - does this happen if you add it as:

WHERE Email IS NOT MISSING

?

Can you confirm what the EXPLAIN shows as the access method - and if this changes when you add the filter? (Notably if a different index is used and if it is covered in one vs having to FETCH in the other.)

andrea · May 13, 2024, 2:19pm

I tryed also with another field of the same documents that has 120 doc with value null and it returns 2 values null as well.
I tryed adding your filter “AND Email IS NOT MISSING” and i only get one null value. But the indexes used are the same.

dh · May 13, 2024, 2:25pm

I’m at a loss as I can’t see how this result is possible; whether covered or not, a Distinct operator should be the last item in the plan and this won’t produce duplicates.

All RAW does is strip the outermost “wrapper” - so the field name in this case. The same operator is used for “distinct” regardless of RAW specification.

If you run my first example (with the literal value list) does it produce the duplicated null for you?

andrea · May 13, 2024, 2:30pm

If I run your first example I get the right result as you.

I don’t really know how can this be possible. The real problem is that I have this data in a dev bucket and in a test bucket and I get the same results for both, with 2 null. I will just filter the results removing the nulls xD

dh · May 13, 2024, 2:30pm

Actually… I think I may have an idea.

If an entire document is inserted as “null” - e.g.

upsert into default values("0",null);

then you’ll see two nulls.

Can you check with:

select count(1) from default d where d is null;

?
(Substitute your collection name for “default”.)

andrea · May 13, 2024, 2:34pm

I’m sorry xD
[
{
“$1”: 0
}
]

dh · May 13, 2024, 2:35pm

Could you try with:

select count(1) from default d where d is missing;

too, please?

Actually that’ll probably not show anything either.

Can you verify that all your documents are of the expected form ?

andrea · May 13, 2024, 2:38pm

Nope, 0 results again.
[
{
“$1”: 0
}
]
The documents are all good. They are about 1100 users so it is also a small number.

dh · May 13, 2024, 2:39pm

SELECT count(1) FROM default WHERE Email IS MISSING

?

(This would imply using WHERE Email IS NOT MISSING in your statement.)

andrea · May 13, 2024, 2:45pm

This last query gives me like 6kk elements. But i think its normal because it gives me all the documents without Email.

dh · May 13, 2024, 2:51pm

OK. I believe the absent field is one null and the actual field with a value of null is another; I believe this is because they’re not equivalent (MISSING is not NULL) and RAW can’t represent MISSING (when not RAW an empty containing object represents MISSING).

The suggested filter (IS NOT MISSING) should remove the duplicate from your results.

andrea · May 13, 2024, 2:59pm

Thank you for the assistance. It wasn’t a real problem, but I was just curious of this behaviour and you clarified it to me

system · August 11, 2024, 3:00pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Help with DISTINCT Query? SQL++	1	596	May 29, 2019
Insert-select settings and subquery SQL++	4	629	June 16, 2022
Need help with N1QL SQL++ n1ql	1	30	October 28, 2024
Java dsl does not seem to support `SELECT DISTINCT RAW` statements Java SDK	2	1117	September 10, 2020
Couchbase return null value after save and read document using n1ql Java SDK query , connections , n1ql , sdk , array-query	1	2004	October 5, 2020

Query DISTINCT RAW return multiple null values

Related topics