N1QL combining array in result like in-line view

Hi,

I have multiple documents that have an array of strings such that when I select out all of the documents based on the WHERE condition, the result separates the output for each document.

For example, I have two documents resulting from the query:

"results": [
    {
        "$1": [
            "abc",
            "def"
        ]
    },
    {
        "$1": [
            "xyz"
        ]
    }
],

A sample query could look like:

select myString from bucket where myValue in [1, 2];

What if I want to combine the result together? so that it would look like this:

"results": [
    {
        "$1": [
            "abc",
            "def",
            "xyz"
        ]
    }
],

I seem to have gotten it to work by using UNNEST and then ARRAY_AGG. I just want to make sure that I’m on the right track…

select array_agg(s) from bucket b unnest b.myArray s where b.myValue in [1, 2];

Now I want to take the output (e.g., the above) and rewrite the query so that it would use the values (e.g., like an in-line view from Oracle). This is how I would write it out before using the in-line view:

select count(*) from bucket where aValue in ["abc", "def", "xyz"];

How would I use it as a nested N1QL (if that’s possible)? When I try to place the original N1QL after the IN in round brackets ( … ), I don’t get any error but the result is wrong.

select count(*) from bucket where aValue in (select array_agg(s) from bucket b unnest b.myArray s where b.myValue in [1, 2]);

Thank you.
Regards, Steve

Instead of UNNEST do ARRAY_AGG() this makes array of arrays, Then use ARRAY_FLTTEN() to flatten the array of array into array.

SELECT RAW makes into VALUE instead of object. As this value is array and SELECT makes another array so do ARRAY_FLATTEN() on SLECT or pick 0 th element.

SELECT COUNT(1) FROM bucket 
WHERE aValue IN ARRAY_FLATTEN((SELECT RAW ARRAY_FLTTEN(ARRAY_AGG(b.myArray),2)
                               FROM bucket b WHERE b.myValue IN [1,2]), 2);

OR

SELECT COUNT(1) FROM bucket 
WHERE aValue IN (SELECT RAW ARRAY_FLTTEN(ARRAY_AGG(b.myArray),2)
                               FROM bucket b WHERE b.myValue IN [1,2])[0];

Thanks @vsr1 – exactly what I’m looking for.

I’m noticing that it won’t use the preferred index… Using the query you gave:

SELECT COUNT(1) FROM bucket 
WHERE aValue IN (SELECT RAW ARRAY_FLTTEN(ARRAY_AGG(b.myArray),2)
                               FROM bucket b WHERE b.myValue IN [1,2])[0];

The inside portion has an index to support:

SELECT RAW ARRAY_FLTTEN(ARRAY_AGG(b.myArray),2)
                               FROM bucket b WHERE b.myValue IN [1,2])

The index would look like:

create index my_idx1 on bucket(myValue);

This runs fast (e.g., less than 20ms).
The outside portion of the query also has an index like this:

create index my_idx2 on bucket(aValue);

If running the outside portion by itself (with literal values instead such as):

SELECT COUNT(1) FROM bucket 
WHERE aValue IN ('abc', 'def', xyz');

It is also around 20ms.

However when combined all together (e.g., like the query you gave), it takes around 10s (seconds not milliseconds). When I check the explain plan, it is using the outside index and does an IndexScan. It doesn’t use the inside index.

When I try to force it to use the inside index with USE INDEX my_idx1, it doesn’t use my_idx1 (inside) but rather my_idx2 (outside).

I even try to use the USE INDEX my_idx1 on the inside and USE INDEX my_idx2 on the outside (e.g., both were explicitly written out). It still uses the my_idx2 (outside) only. All scenario takes around 10s.

Is there a way to get the nested N1QL portion to use the correct index?

Inside query uses the my_idx1. EXPLAIN doesn’t have ability display subquery plans.
As subquery is not correlated, you can do EXPLAIN on subquery and verify it.
The right Index will be

create index my_idx1 on bucket(myValue, myArray);

Plan is prepared during the prepare time at that time subquery results are not available so the outside query has to do complete index scan on m_idx2 and then apply predicate. That is why it may taking time.

Best option will be in your Client program do following.

  1. Run the subquery and get the results
  2. Construct the new query dynamically that uses results from the step 1
  3. execute the new query

Thanks again @vsr1. I was trying to see if we can simplify and combine both (1) and (2) from the above plan so it only needs a single trip. Clearly with the multiple trip, it will still be faster than trying to combine it in my case.

Thanks for correcting the inside index.

At present, If right side of IN is constant we push the values to indexer and do IndexScan on required range. If not we need to do full index scan on that key and apply predicate later.