Basic Ed: Why Does This Index/Query Combination Require a Fetch?

polfer · August 12, 2024, 2:23pm

I have the following test index in Couchbase 7.2.2:

CREATE INDEX `test_index`
    ON `test_bucket`(`sort`)
 WHERE ((`type` = "type1") OR (`type` = "type2"))

with the following query:

SELECT META().`id`
  FROM `test_bucket`
 WHERE (`sort` IS VALUED)
   AND ((`type` = "type1") OR (`type` = "type2"))

This combination requires a fetch after after pulling from the index (the query plan is at the end of the post).

Why does this require the fetch to apply the filter?

More specifically, if you drop the OR from index filter you’ll see that there is no longer a fetch and the query plan includes “filter_covers” for the index filter. But with the addition of the OR there is no longer a “filter_covers” clause.

So maybe the better question should be why the OR kills the covered filter?

The query plan with the OR follows:

{
    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "index": "test_index",
                "index_id": "f32d68da4b75baa3",
                "index_projection": {
                    "primary_key": true
                },
                "keyspace": "test_bucket",
                "namespace": "default",
                "spans": [
                    {
                        "exact": true,
                        "range": [
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Fetch",
                "keyspace": "test_bucket",
                "namespace": "default"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "(((`test_bucket`.`sort`) is valued) and (((`test_bucket`.`type`) = \"type1\") or ((`test_bucket`.`type`) = \"type2\")))"
                        },
                        {
                            "#operator": "InitialProject",
                            "discard_original": true,
                            "result_terms": [
                                {
                                    "expr": "(meta(`test_bucket`).`id`)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

mreiche · August 12, 2024, 5:10pm

I’m not the indexing expert, and without knowledge of the inner workings, here are my thoughts. It doesn’t look like the best query plan.

I have 7.6.2. It gives similar results to what you see.
For that query, the query advisor gives:

CREATE INDEX adv_type_sort ON my_bucket(type,sort)

And when I create and use that query, I get a better plan (no fetch)

    "plan": {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan3",
                "covers": [
                    "cover ((`my_bucket`.`type`))",
                    "cover ((`my_bucket`.`sort`))",
                    "cover ((meta(`my_bucket`).`id`))"
                ],
                "index": "adv_type_sort",
                "index_id": "72a076be49320d6",
                "keyspace": "my_bucket",
                "namespace": "default",
                "optimizer_estimates": {
                    "cardinality": 1.0842021724855044e-19,
                    "cost": 1.3401241725630335e-18,
                    "fr_cost": 1.3401241725630335e-18,
                    "size": 33
                },
                "spans": [
                    {
                        "exact": true,
                        "range": [
                            {
                                "high": "\"type1\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"type1\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    },
                    {
                        "exact": true,
                        "range": [
                            {
                                "high": "\"type2\"",
                                "inclusion": 3,
                                "index_key": "`type`",
                                "low": "\"type2\""
                            },
                            {
                                "inclusion": 0,
                                "index_key": "`sort`",
                                "low": "null"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Filter",
                            "condition": "((cover ((`my_bucket`.`sort`)) is valued) and ((cover ((`my_bucket`.`type`)) = \"type1\") or (cover ((`my_bucket`.`type`)) = \"type2\")))",
                            "optimizer_estimates": {
                                "cardinality": 2.5489470578119236e-57,
                                "cost": 1.340746999293169e-18,
                                "fr_cost": 1.340746999293169e-18,
                                "size": 33
                            }
                        },
                        {
                            "#operator": "InitialProject",
                            "discard_original": true,
                            "optimizer_estimates": {
                                "cardinality": 2.5489470578119236e-57,
                                "cost": 1.340746999293169e-18,
                                "fr_cost": 1.340746999293169e-18,
                                "size": 33
                            },
                            "preserve_order": true,
                            "result_terms": [
                                {
                                    "expr": "cover ((meta(`my_bucket`).`id`))"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }

dh · August 12, 2024, 6:00pm

FWIW, you can achieve the same logic by using an IN predicate on both the index and in the query and it will be covered.

HTH.

vsr1 · August 12, 2024, 6:13pm

Index WHERE clause has OR clause that expression must be part of index keys to cover. Because OR/NOT in index where clause can quickly make complex (permutation/combinations are so many).
As we try to re-apply predicates we need the value of type from index we can’t determine value of type uniquely it required fetch

As @dh mentioned try IN clause. It always better if possible in index WHERE use IN vs OR

CREATE INDEX `test_index` ON `test_bucket`(`sort`)
 WHERE type  IN ["type1", "type2"];

SELECT META().`id`
  FROM `test_bucket`
 WHERE (`sort` IS VALUED) AND type  IN ["type1", "type2"];

polfer · August 12, 2024, 6:55pm

Thank you. The actual queries I have include more complexity, and in covering those I end up with numerous more spans. Basically my choice in that case seems to be including the covered variables, having no fetch, and then getting a ton of needless spans (in my query, the “sort is valued” basically says “given me everything”, but you see two spans), or not covering, getting the intended span, but then needing to take the fetch hit. The fetch his is usually worse time wise, of course.

polfer · August 12, 2024, 7:03pm

Thank you @vsr1. As @dh indicated as well, I had seen that IN could work in further testing after my post. However, the actual OR clauses I have are more complicated, and in fact, I see an interesting behavior where SOME PARTS of my index filter become “filter_covers” while the sub-clauses that include ORs end up having the above issue.

In my head, it seemed to me like an index filter should, in its entirety, effectively become a “filter_cover” without breaking it down, especially since it seems it must be at least completely match in order to use the index at all. I get the complexity may be in cases where things are MORE specific (i.e. the index filter for an attribute has “IS VALUED” but the actual query filter is for “= 55” or some specific sub-range).

I’m hoping to understand in practice how that could work. In my specific queries, I cannot use the “IN” trick above, and the expressions that I have are more complicated (multiple OR clauses some with ANDs in them). When I cover the attributes, I get 10’s of SPANS but no FETCH, and it didn’t make sense to me how it degenerated. My post with the simple case above was meant more than anything else to confirm for me that the simple inclusion of the OR clause could cause this behavior (ANDs don’t seem to necessarily do that as much).

I appreciate the help, everyone. Thank you!

mreiche · August 12, 2024, 7:16pm

I don’t know if it will help, but “IS VALUED” is an option on index fields. It might work better than specifying as a predicate in the WHERE.

CREATE INDEX test_index ON test_bucket(sort IS VALUED …

However, the actual OR clauses I have are more complicated

Is it possible to post them?

polfer · August 12, 2024, 7:59pm

I’m not able to post the full detailed index and query publicly (I can shift to a ticket if better), but I can share that this is essentially the filter on the index:

WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` != "typeExcl1" AND `type` != "typeExcl2")
   OR  (`type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"]))

The base attribute, if set to a specific value, will force inclusion of a document unless the type is one of two values. Otherwise, we will also include any of 12 different other types in this particular index. As with the sample above, the leading index attribute is disjoint and all the span technically needs (like above, the two spans weren’t really necessary once type was covering).

I know this may not help, but I’ll shift to a support ticket if we need additional resolution. I definitely appreciate the confirmation on the OR and the suggestions regarding IN.

mreiche · August 12, 2024, 8:28pm

In my non-expert, intuitive opinion, it might be the != that result in scans. Using an index to find exactly one (or more) values is done with lookup(s). Using an index to find everything that is != a value would take a scan. If you can convert those != to IN(… all the other possible values …), it might come out better.

AND ((`base` == "baseVal" AND `type` IN ["type3", "type4",...]))

Something else that might help is to split that query into two queries and take the distinct of the union. It eliminates the ‘OR’ - I don’t know if it will perform better.

Edit: I tried this and it seems that the NOT IFMISSINGORNULL are a problem.

SELECT distinct * from
(select meta(`test_bucket`).`id` from test_bucket
WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` IN ["type3", "type4",...]))
UNION
select meta(`test_bucket`).`id` from test_bucket
WHERE (NOT IFMISSINGORNULL(`_deleted`, FALSE))
  AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"])))

mreiche · August 12, 2024, 9:03pm

These indexes result in a query plan with no fetches. On my above “union” query.

CREATE INDEX adv_base_type_meta_id ON `my_bucket`(`_deleted` INCLUDE MISSING,`softDelete`,`base`,`type`,(meta().`id`))

CREATE INDEX adv_type_meta_id ON `my_bucket`(`_deleted` INCLUDE MISSING,`softDelete`,`type`,(meta().`id`))

vsr1 · August 13, 2024, 7:22pm

Avoid NOT/ORs , same need in query too

WHERE IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
  AND IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
  AND (META().`id` NOT LIKE "_sync:%")
  AND ((`base` == "baseVal" AND `type` NOT IN  ["typeExcl1",  "typeExcl2"])
       OR  `type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"]);

If you want avoid many spans and predicates exactly same, try some thing in query and index. It will not go below relation operators and do transformation. Those must be exactly matched with query and index

WHERE IFMISSINGORNULL(`_deleted`, FALSE) = FALSE
  AND IFMISSINGORNULL(`softDelete`, FALSE) = FALSE
  AND (META().`id` NOT LIKE "_sync:%")
  AND (((`base` == "baseVal" AND `type` NOT IN  ["typeExcl1",  "typeExcl2"])
       OR  `type` IN ["typeInc1", "typeInc2", "typeInc3", "typeInc4", "typeInc5",
                  "typeInc6", "typeInc7", "typeInc8", "typeInc9", "typeInc10",
                  "typeInc11", "typeInc12"])) = true

If too many spans is issue you can also trick it by define index key as type and WHERE exactly clause some thing like CONTAINS([…], type) = true/false or use SUBSTR(type,0)

system · November 11, 2024, 7:22pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Why is this Query performing a fetch SQL++ n1ql , index	1	1239	November 12, 2019
"OR" 100 times slower than "AND" SQL++ n1ql	9	1654	January 6, 2017
Index for a query with OR statement Couchbase Server n1ql , index	1	697	August 22, 2019
N1QL query works even with missing predicate created using couchbase secondary index SQL++ query , java , n1ql , couchbase-cli	1	1122	April 28, 2020
Covering Index issue in 4.5.1 Couchbase Server query , n1ql	1	1266	October 14, 2016

Basic Ed: Why Does This Index/Query Combination Require a Fetch?

Related topics