Query DISTINCT RAW return multiple null values

Hi,
Today i was running a query on the database and i saw something weird and i dont know if this is normal for the value null.
The query is like this:
SELECT DISTINCT RAW Email
FROM bucket
WHERE conditions
The values that i saw were normal expect for 2 null values instead of 1. If i run the same query with the condition “AND Email is null” it returns just one null value, same thing if i just run the distinct without raw. In the db i have 3 elements with Email = null. Someone know if the null value with the distinct raw has unexpected behaviours and it is better to not have them?

What version are you using?

A SELECT DISTINCT RAW should be producing a single null in the results, e.g.

cbq> select distinct raw x from [null,1,null,2,null] x;
{
    "requestID": "714a26b0-c2da-460f-a217-63b8e831eff6",
    "signature": "json",
    "results": [
    null,
    1,
    2
    ],

Thx for the answer. Im using Couchbase Server Enterprise Edition 7.2.4 build 7070
This is the result of the query. I know its weird.

I’m unable to reproduce this locally using the same version.
Specifically I have run:

upsert into default (key k, value v) 
select to_string(x) k,{"Email":case when mod(x,3) == 0 then null else "email_"||to_string(x) end} v 
from array_range(0,20) x;

to generate 7 null values out of 20.

I have a primary index only on the bucket.

My result is as expected:

cbq> select distinct raw Email from default;
{
    "requestID": "3ff1f337-ed01-452e-8ee8-07df76afcf81",
    "signature": "json",
    "results": [
    null,
    "email_1",
    "email_10",
    "email_11",
    "email_13",
    "email_14",
    "email_16",
    "email_17",
    "email_19",
    "email_2",
    "email_4",
    "email_5",
    "email_7",
    "email_8"
    ],

You mentioned that adding a filter corrects the output - does this happen if you add it as:

WHERE Email IS NOT MISSING

?

Can you confirm what the EXPLAIN shows as the access method - and if this changes when you add the filter? (Notably if a different index is used and if it is covered in one vs having to FETCH in the other.)

I tryed also with another field of the same documents that has 120 doc with value null and it returns 2 values null as well.
I tryed adding your filter “AND Email IS NOT MISSING” and i only get one null value. But the indexes used are the same.
image

I’m at a loss as I can’t see how this result is possible; whether covered or not, a Distinct operator should be the last item in the plan and this won’t produce duplicates.

All RAW does is strip the outermost “wrapper” - so the field name in this case. The same operator is used for “distinct” regardless of RAW specification.

If you run my first example (with the literal value list) does it produce the duplicated null for you?

If I run your first example I get the right result as you.

I don’t really know how can this be possible. The real problem is that I have this data in a dev bucket and in a test bucket and I get the same results for both, with 2 null. I will just filter the results removing the nulls xD

Actually… I think I may have an idea.

If an entire document is inserted as “null” - e.g.

upsert into default values("0",null);

then you’ll see two nulls.

Can you check with:

select count(1) from default d where d is null;

?
(Substitute your collection name for “default”.)

I’m sorry xD
[
{
“$1”: 0
}
]

Could you try with:

select count(1) from default d where d is missing;

too, please?

Actually that’ll probably not show anything either.

Can you verify that all your documents are of the expected form ?

Nope, 0 results again.
[
{
“$1”: 0
}
]
The documents are all good. They are about 1100 users so it is also a small number.

SELECT count(1) FROM default WHERE Email IS MISSING

?

(This would imply using WHERE Email IS NOT MISSING in your statement.)

This last query gives me like 6kk elements. But i think its normal because it gives me all the documents without Email.

OK. I believe the absent field is one null and the actual field with a value of null is another; I believe this is because they’re not equivalent (MISSING is not NULL) and RAW can’t represent MISSING (when not RAW an empty containing object represents MISSING).

The suggested filter (IS NOT MISSING) should remove the duplicate from your results.

1 Like

Thank you for the assistance. It wasn’t a real problem, but I was just curious of this behaviour and you clarified it to me :slight_smile:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.