N1QL - Indexes when using tokens to search in array field with SATISFIES

Dear all,

I have a question about using indexes with tokens and satisifes condition in WHERE clause.
My bucket is made by document in which we have a field called searchKeyA as an array of strings (they are all keywords).
We want to search all documents whose searchKeyA contains every word passed as a parameter.

As an example, a parameter string can be “alfa beta gamma” and we need to get documents that have both alfa, beta, gamma in the searchKeyA field.

So, we first built an index on searchKeyA as following:

create index idx_searchKeyA on docs_bucket(searchKeyA) using gsi;

Now, we want to fetch documents with queries like the following:

select *
from docs_bucket
where every v in tokens("alfa beta gamma") satisfies v in searchKeyA end;

If we run the explain command we get a plan like this:

"plan": {
  "#operator": "Sequence",
  "~children": [
    {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "PrimaryScan",
          "index": "#primary",
          "keyspace": "docs_bucket",
          "namespace": "default",
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "docs_bucket",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "every `v` in tokens(\"alfa beta gamma\") satisfies (`v` in (`docs_bucket`.`searchKeyA`)) end"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    }
  ]
}

Why we are getting a PrimaryScan operation for this query?
Is there a better way to run queries like this?

Thank you for your precious help,
Stefano

You need to use ANY not every for array indexing to use.

ANY or SOME, EVERY, and ANY AND EVERY or SOME AND EVERY

Range predicates (ANY or SOME, EVERY, and ANY AND EVERY or SOME AND EVERY) allow you to test a boolean condition over the elements or attributes of a collection or object(s). They each evaluate to a boolean value.

ANY or SOME is TRUE if the collection is non-empty and at least one element matches.

EVERY is TRUE if the collection is empty, or if the collection is non-empty and every element matches.

ANY AND EVERY or SOME AND EVERY is TRUE if the collection is non-empty and every element matches.

Thank you @vsr1,
said that, I can’t use a single ANY because of my requisite on searching ALL the words in the parameter.
With ANY AND EVERY or SOME AND EVERY the query is too slow.

Is there any example of a similar situation on the net? Or, maybe better, may you suggest me another way to accomplish the query?

Regards,
Stefano

CREATE INDEX ix1 ON default(DISTINCT ARRAY v FOR v IN serachKey END);
SELECT * FROM default WHERE ANY v IN searchKey SATISFIES v = "alpha" END AND
                                                      ANY v IN searchKey SATISFIES v = "beta" END AND
                                                      ANY v IN searchKey SATISFIES v = "gama" END;

Thank you @vsr1,

I’m gathering better results after I changed my array index into a Covering Array index.
So, from this

create index idx_searchKeyA_node1 on docs_bucket(distinct array v FOR v IN searchKeyA END);

To this:

create index idx_searchKeyA_node1 on docs_bucket(distinct array v FOR v IN searchKeyA END, searchKeyA);

Now, with multiple ANYs I’m getting results in about 200ms; the executing plan has changed, now the server is using a Sequence → Sequence → IntersectScan → IndexScan cascading operators which are faster than before.

Bests,
Stefano

As you made it covered try this to avoid IntersectScan

ANY v IN searchKeyA SATISFIES v = “alpha” END AND “beta” IN searchKeyA AND “gama” IN searchKeyA;

Dear @vsr1,

Thank you for your suggestion, but turning the query in this way

leads to slower results.
I’ll keep your solution in mind for the future, btw.

My very best regards,
Stefano