Array Covering index with UNNEST and condition

Hello,

With an array index with a condition on the array, I can’t get the index match a query with UNNEST and same condition on the array.

It works if I add the condition. It is covering without UNNEST, but not with it : the performances are really bad, several minutes for a count versus milliseconds without UNNEST!

Document :

{
    "data": {
        "type1": {
            "array": ["", ""]
        },
        "type2": {
            "array": [""]
        }
    },
    "type": "product"
}

Index

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.`array` FOR d IN OBJECT_INNER_VALUES(data) END, OBJECT_INNER_VALUES(data))
    WHERE ANY d IN OBJECT_INNER_VALUES(data)
              SATISFIES ARRAY_LENGTH(d.`array`) = 1
	      END
       AND type = 'product'

Non matching query

SELECT count(*) FROM default
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
         AND type = 'product'

Matching query but not covering

SELECT count(*) FROM default
    UNNEST OBJECT_INNER_VALUES(data) as datas
    WHERE ANY d IN OBJECT_INNER_VALUES(data)
              SATISFIES ARRAY_LENGTH(d.`array`) = 1
	      END
       AND type = 'product'

As you can see, I respected all UNNEST ARRAY covering index conditions, including Array Indexing and vsr1 recommendations

Try this.

CREATE INDEX covering_array_idx ON default(ALL ARRAY ARRAY_LENGTH(d.`array`) FOR d IN OBJECT_INNER_VALUES(data) END) WHERE type = 'product';
SELECT count(1) FROM default AS d1 UNNEST OBJECT_INNER_VALUES(data) as d WHERE ARRAY_LENGTH(d.`array`) = 1 AND d1.type = 'product';

It works really great with a count request, but as soon as I use the same request but to get the content of the array and other attributes, the request is worst than the first ! The explain show that it uses IntersectScan between type_idx and covering_array_idx.

Request :

SELECT d1.`key`, d.`key` as source, d.`array`
    FROM default as d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
          AND d1.type = 'product'

If you think you need always ARRAY_LENGTH(d.array) == 1 . Use Index similar like this and adjust your needs. If not remove WHEN clause.

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.arrayFOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.array) == 1 END) WHERE type = 'product';

Thank you, it is working, the performances are incredible now.

But, if I want an other field in my nested array, for example d.key as shown in my previous request, the index is used but not covering. I tried with the following index but it is not working :

CREATE INDEX covering_array_idx ON default(ALL ARRAY [d.array, d.`key`] FOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.array) == 1 END) WHERE type = 'product';

Try this. If that is not working include OBJECT_INNER_VALUES(data) as last index key

SELECT [d.array, d.`key`][0] AS array, [d.array, d.`key`][1] AS key
    FROM default as d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1 AND [d.array, d.`key`] IS NOT NULL
          AND d1.type = 'product'

Doesn’t work either! But no problem, I’m ok with actual performances, even if it is not covering.

Thanks a lot

Now I’ve an other problem. When I add meta().cas to the index and request, the count works great, it is covering and very quick.
But if I try to get an other document attribute, it do a full scan and it is too slow.

The explain shows that it do a fetch before filter!

Covering Request

SELECT count(*)
    FROM default as d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
           AND d1.type = 'product'
           AND meta().cas > 0

Full scan request

SELECT d1.`key`
    FROM default as d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
           AND d1.type = 'product'
           AND meta(d1).cas > 0

Add in the index keys

Yes sure, I already done that, it doesn’t work.

I got it partially working, if I add type to the index, I can cover product.key and d.array, but not d.key.

Index

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.arra yFOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.array) = 1 END, meta().cas, type) WHERE type = 'product';

Covering Request

SELECT d1.`key`, d.array
    FROM products
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
      AND d1.type = 'product'
      AND meta(d1).cas > 0

As soon as I add d.key, the performance are awful! If the index is not covering, it’s fine for me, as soon as it scans only document matching cas condition, but it is fetching all documents before filtering!

Try one of this

  1. Add OBJECT_INNER_VALUES(data) as index key at the end, becuase query has predicates on bucket and Unnest values so whole array needed in the index keys to cover.
  2. Move META().cas > 0 to Index WHERE clause
  1. Adding OBJECT_INNER_VALUES(data) doesn’t change anything
  2. I can’t move META().cas > 0 to Index Where clause cause it value can change. 0 is just an example.

The double array index seems promising but I can’t get it to work!

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.array FOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.array) = 1 END, meta().cas, key ) WHERE type = 'product';

SELECT d1.`key`, d.array
    FROM products AS d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
      AND d1.type = 'product'
      AND meta(d1).cas > 0

Check the query index and do right aliases.

I’m using the exact same index, and yes, with your request it is working perfectly, but I need also to get d.key, and as soon as I add it, it is not covering anymore.

SELECT d1.`key`, d.array, d.`key` as source
    FROM products AS d1
    UNNEST OBJECT_INNER_VALUES(data) as d
    WHERE ARRAY_LENGTH(d.`array`) = 1
          AND d1.type = 'product'
          AND meta(d1).cas > 0

If you need d.key You need to add OBJECT_INNER_VALUES(data) to index otherwise it will not cover.

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.array FOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.array) = 1 END, meta().cas, key, OBJECT_INNER_VALUES(data) ) WHERE type = 'product';

Sadly, it doesn’t work even if I add OBJECT_INNER_VALUES(data) in the index!

If not working you may have to use non-covered scan. What is couchbase version.

I’m using CB 5.0.1.

I have no problem with non-covered scan, getting all results takes 5 to 10 min, which is not really bad. But it takes the same time even if there is no updates! And it causes high consumption in Couchbase nodes which impact other queries!

PS : we are using CAS as a very precise timestamp, we are polling updates.

The following combinations uses covered index in 5.1.0-5552

CREATE INDEX covering_array_idx ON default(ALL ARRAY d.`array` FOR d IN OBJECT_INNER_VALUES(data) WHEN ARRAY_LENGTH(d.`array`) = 1 END, meta().cas, `key`, OBJECT_INNER_VALUES(data) ) WHERE type = 'product';^C
 SELECT d1.`key`, d.`array`, d.`key` as source FROM default AS d1 UNNEST OBJECT_INNER_VALUES(data) as d WHERE ARRAY_LENGTH(d.`array`) = 1 AND d1.type = 'product' AND meta(d1).cas > 0;

It seems that we need to upgrade then! It’ll take some time to upgrade in production because it is not very flexible here!

In the mean time I have to find an other solution!