Basic Ed: Why Does This Index/Query Combination Require a Fetch?

I have the following test index in Couchbase 7.2.2:

CREATE INDEX `test_index`
    ON `test_bucket`(`sort`)
 WHERE ((`type` = "type1") OR (`type` = "type2"))

with the following query:

SELECT META().`id`
  FROM `test_bucket`
 WHERE (`sort` IS VALUED)
   AND ((`type` = "type1") OR (`type` = "type2"))

This combination requires a fetch after after pulling from the index (the query plan is at the end of the post).

Why does this require the fetch to apply the filter?

More specifically, if you drop the OR from index filter you’ll see that there is no longer a fetch and the query plan includes “filter_covers” for the index filter. But with the addition of the OR there is no longer a “filter_covers” clause.

So maybe the better question should be why the OR kills the covered filter?

The query plan with the OR follows:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "index": "test_index",
                "index_id": "f32d68da4b75baa3",
                "index_projection": {
                    "primary_key": true
                },
                "keyspace": "test_bucket",
                "namespace": "default",
                "spans": [
                    {
                        "exact": true,
                        "range": [
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Fetch",
                "keyspace": "test_bucket",
                "namespace": "default"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "(((`test_bucket`.`sort`) is valued) and (((`test_bucket`.`type`) = \"type1\") or ((`test_bucket`.`type`) = \"type2\")))"
                        },
                        {
                            "#operator": "InitialProject",
                            "discard_original": true,
                            "result_terms": [
                                {
                                    "expr": "(meta(`test_bucket`).`id`)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

I’m not the indexing expert, and without knowledge of the inner workings, here are my thoughts. It doesn’t look like the best query plan.

I have 7.6.2. It gives similar results to what you see.
For that query, the query advisor gives:

CREATE INDEX adv_type_sort ON my_bucket(type,sort)

And when I create and use that query, I get a better plan (no fetch)

    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "covers": [
                    "cover ((`my_bucket`.`type`))",
                    "cover ((`my_bucket`.`sort`))",
                    "cover ((meta(`my_bucket`).`id`))"
                ],
                "index": "adv_type_sort",
                "index_id": "72a076be49320d6",
                "keyspace": "my_bucket",
                "namespace": "default",
                "optimizer_estimates": {
                    "cardinality": 1.0842021724855044e-19,
                    "cost": 1.3401241725630335e-18,
                    "fr_cost": 1.3401241725630335e-18,
                    "size": 33
                },
                "spans": [
                    {
                        "exact": true,
                        "range": [
                            {
                                "high": "\"type1\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"type1\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    },
                    {
                        "exact": true,
                        "range": [
                            {
                                "high": "\"type2\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"type2\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "((cover ((`my_bucket`.`sort`)) is valued) and ((cover ((`my_bucket`.`type`)) = \"type1\") or (cover ((`my_bucket`.`type`)) = \"type2\")))",
                            "optimizer_estimates": {
                                "cardinality": 2.5489470578119236e-57,
                                "cost": 1.340746999293169e-18,
                                "fr_cost": 1.340746999293169e-18,
                                "size": 33
                            }
                        },
                        {
                            "#operator": "InitialProject",
                            "discard_original": true,
                            "optimizer_estimates": {
                                "cardinality": 2.5489470578119236e-57,
                                "cost": 1.340746999293169e-18,
                                "fr_cost": 1.340746999293169e-18,
                                "size": 33
                            },
                            "preserve_order": true,
                            "result_terms": [
                                {
                                    "expr": "cover ((meta(`my_bucket`).`id`))"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }

FWIW, you can achieve the same logic by using an IN predicate on both the index and in the query and it will be covered.

HTH.

Index WHERE clause has OR clause that expression must be part of index keys to cover. Because OR/NOT in index where clause can quickly make complex (permutation/combinations are so many).
As we try to re-apply predicates we need the value of type from index we can’t determine value of type uniquely it required fetch

As @dh mentioned try IN clause. It always better if possible in index WHERE use IN vs OR

CREATE INDEX `test_index` ON `test_bucket`(`sort`)
 WHERE type  IN ["type1", "type2"];

SELECT META().`id`
  FROM `test_bucket`
 WHERE (`sort` IS VALUED) AND type  IN ["type1", "type2"];

Thank you. The actual queries I have include more complexity, and in covering those I end up with numerous more spans. Basically my choice in that case seems to be including the covered variables, having no fetch, and then getting a ton of needless spans (in my query, the “sort is valued” basically says “given me everything”, but you see two spans), or not covering, getting the intended span, but then needing to take the fetch hit. The fetch his is usually worse time wise, of course.

Thank you @vsr1. As @dh indicated as well, I had seen that IN could work in further testing after my post. However, the actual OR clauses I have are more complicated, and in fact, I see an interesting behavior where SOME PARTS of my index filter become “filter_covers” while the sub-clauses that include ORs end up having the above issue.

In my head, it seemed to me like an index filter should, in its entirety, effectively become a “filter_cover” without breaking it down, especially since it seems it must be at least completely match in order to use the index at all. I get the complexity may be in cases where things are MORE specific (i.e. the index filter for an attribute has “IS VALUED” but the actual query filter is for “= 55” or some specific sub-range).

I’m hoping to understand in practice how that could work. In my specific queries, I cannot use the “IN” trick above, and the expressions that I have are more complicated (multiple OR clauses some with ANDs in them). When I cover the attributes, I get 10’s of SPANS but no FETCH, and it didn’t make sense to me how it degenerated. My post with the simple case above was meant more than anything else to confirm for me that the simple inclusion of the OR clause could cause this behavior (ANDs don’t seem to necessarily do that as much).

I appreciate the help, everyone. Thank you!

I don’t know if it will help, but “IS VALUED” is an option on index fields. It might work better than specifying as a predicate in the WHERE.

CREATE INDEX test_index ON test_bucket(sort IS VALUED …

However, the actual OR clauses I have are more complicated

Is it possible to post them?

I’m not able to post the full detailed index and query publicly (I can shift to a ticket if better), but I can share that this is essentially the filter on the index:

WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` != "typeExcl1" AND `type` != "typeExcl2")
   OR  (`type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"]))

The base attribute, if set to a specific value, will force inclusion of a document unless the type is one of two values. Otherwise, we will also include any of 12 different other types in this particular index. As with the sample above, the leading index attribute is disjoint and all the span technically needs (like above, the two spans weren’t really necessary once type was covering).

I know this may not help, but I’ll shift to a support ticket if we need additional resolution. I definitely appreciate the confirmation on the OR and the suggestions regarding IN.

In my non-expert, intuitive opinion, it might be the != that result in scans. Using an index to find exactly one (or more) values is done with lookup(s). Using an index to find everything that is != a value would take a scan. If you can convert those != to IN(… all the other possible values …), it might come out better.

AND ((`base` == "baseVal" AND `type` IN ["type3", "type4",...]))

Something else that might help is to split that query into two queries and take the distinct of the union. It eliminates the ‘OR’ - I don’t know if it will perform better.

Edit: I tried this and it seems that the NOT IFMISSINGORNULL are a problem.

SELECT distinct * from
(select meta(`test_bucket`).`id` from test_bucket
WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` IN ["type3", "type4",...]))
UNION
select meta(`test_bucket`).`id` from test_bucket
WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"])))

These indexes result in a query plan with no fetches. On my above “union” query.

CREATE INDEX adv_base_type_meta_id ON `my_bucket`(`_deleted` INCLUDE MISSING,`softDelete`,`base`,`type`,(meta().`id`))

CREATE INDEX adv_type_meta_id ON `my_bucket`(`_deleted` INCLUDE MISSING,`softDelete`,`type`,(meta().`id`))

Avoid NOT/ORs , same need in query too

WHERE IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
  AND IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` NOT IN  ["typeExcl1",  "typeExcl2"])
       OR  `type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"]);

If you want avoid many spans and predicates exactly same, try some thing in query and index. It will not go below relation operators and do transformation. Those must be exactly matched with query and index

WHERE IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
  AND IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
  AND (META().`id` NOT LIKE "_sync:%")
  AND (((`base` == "baseVal" AND `type` NOT IN  ["typeExcl1",  "typeExcl2"])
       OR  `type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"])) = true

If too many spans is issue you can also trick it by define index key as type and WHERE exactly clause some thing like CONTAINS([…], type) = true/false or use SUBSTR(type,0)

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.