How Using Analytics Indexes in JOIN

I have two entities deploy_dev.categories_game.data and deploy_dev.game.data
I created index:

CREATE INDEX categoriesIdx ON deploy_dev.game.data (UNNEST data.en.categories SELECT id:STRING) EXCLUDE UNKNOWN KEY


SELECT count(col_doc_game)
  FROM `deploy_dev`.`categories_game`.`data` col_doc
  JOIN `deploy_dev`.`game`.`data` as col_doc_game ON SOME category IN col_doc_game.data.en.categories SATISFIES category.id =  col_doc.id
  
  WHERE (col_doc.data.en.category_view=TRUE) GROUP BY col_doc.id

But this index is not used. Why ?

if instead of col_doc.id use const string, index is used

Hi @Lenty_Me, you need to pass the /*+ indexnl */ hint and cast the joined field col_doc.id to the same type as the indexed field’s type using to_string(). Try this:

SELECT count(col_doc_game)
FROM `deploy_dev`.`categories_game`.`data` col_doc
JOIN `deploy_dev`.`game`.`data` AS col_doc_game
ON SOME category IN col_doc_game.data.en.categories
  SATISFIES category.id /*+ indexnl */ = to_string(col_doc.id)
WHERE (col_doc.data.en.category_view=TRUE) GROUP BY col_doc.id;

Thank You It is real working.

But due to the use of the index, there was a problem with other queries.
“code”: 25000,
“msg”: “Internal error”,

In the query which do not use field from the index.
For example:

SELECT VALUE OBJECT_CONCAT(col_doc.data.en, IFMISSING(col_doc.data.`ru`, {}), {'doc_id': col_doc.id, 'ctime': META(col_doc).cas}, {})
  FROM `deploy_dev`.`game`.`data` col_doc
  WHERE (col_doc.data.en.provider.id IN []
      AND ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(col_doc.data.en.categories).alias, [
      "slots"
    ]))>0
      AND TRUE IN [ARRAY_LENGTH(col_doc.data.en.allowed_countries) = 0, "DE"  IN col_doc.data.en.allowed_countries]
      AND TRUE IN [ARRAY_LENGTH(col_doc.data.en.restricted_countries) = 0,  "DE"  NOT IN col_doc.data.en.restricted_countries]
      AND col_doc.data.en.codes.desktop!="")

Result is “code”: 25000, “msg”: “Internal error”,

If I remove index, query is working
If I remove condition “DE” IN col_doc.data.en.allowed_countries , query is working

Please advidvice me

It looks like there is a bug related to array indexing for those queries.
The IN clauses in the example query triggered the bug where you have a nested IN clause in another IN clause. Try rewriting the query to remove some of the IN clauses. Here is an example:

SELECT VALUE OBJECT_CONCAT(col_doc.data.en, IFMISSING(col_doc.data.`ru`, {}), {'doc_id': col_doc.id, 'ctime': META(col_doc).cas}, {})
FROM `deploy_dev`.`game`.`data` col_doc
WHERE (
col_doc.data.en.provider.id IN []
AND ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(col_doc.data.en.categories).alias, ["slots"])) > 0
AND (CASE WHEN ARRAY_LENGTH(col_doc.data.en.allowed_countries) = 0 THEN TRUE
	      WHEN "DE" IN col_doc.data.en.allowed_countries THEN TRUE
	      ELSE FALSE END)
AND (CASE WHEN ARRAY_LENGTH(col_doc.data.en.restricted_countries) = 0 THEN TRUE
	      WHEN "DE" NOT IN col_doc.data.en.restricted_countries THEN TRUE
	      ELSE FALSE END)
AND col_doc.data.en.codes.desktop!=""
);

Here is another version:

SELECT VALUE OBJECT_CONCAT(col_doc.data.en, IFMISSING(col_doc.data.`ru`, {}), {'doc_id': col_doc.id, 'ctime': META(col_doc).cas}, {})
FROM `deploy_dev`.`game`.`data` col_doc
WHERE (
col_doc.data.en.provider.id IN []
AND ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(col_doc.data.en.categories).alias, ["slots"])) > 0
AND (CASE WHEN ARRAY_LENGTH(col_doc.data.en.allowed_countries) = 0 THEN TRUE
	      WHEN ARRAY_CONTAINS(col_doc.data.en.allowed_countries,"DE") THEN TRUE
	      ELSE FALSE END)
AND (CASE WHEN ARRAY_LENGTH(col_doc.data.en.restricted_countries) = 0 THEN TRUE
	      WHEN ARRAY_CONTAINS(col_doc.data.en.restricted_countries,"DE") THEN FALSE
	      ELSE TRUE END)
AND col_doc.data.en.codes.desktop!=""
);

You could also disable array indexing for a query if you know your query won’t use array indexes by setting the compiler.arrayindex before the query like this:

SET `compiler.arrayindex` "false";
SELECT VALUE OBJECT_CONCAT(col_doc.data.en, IFMISSING(col_doc.data.`ru`, {}), {'doc_id': col_doc.id, 'ctime': META(col_doc).cas}, {})
FROM `deploy_dev`.`game`.`data` col_doc
WHERE (col_doc.data.en.provider.id IN []
  AND ARRAY_LENGTH(ARRAY_INTERSECT(ARRAY_STAR(col_doc.data.en.categories).alias, ["slots"]))>0
  AND TRUE IN [ARRAY_LENGTH(col_doc.data.en.allowed_countries) = 0, "DE"  IN col_doc.data.en.allowed_countries]
  AND TRUE IN [ARRAY_LENGTH(col_doc.data.en.restricted_countries) = 0,  "DE"  NOT IN col_doc.data.en.restricted_countries]
  AND col_doc.data.en.codes.desktop!="");

I would recommend you compare the results of the two versions against your original query to make sure the semantics is the same especially for cases where the allowed_countries and restricted_countries could be MISSING or NULL.

One final note, I am not sure about the purpose of this predicate:

col_doc.data.en.provider.id IN []

You could try rewriting this predicate. You might get a better simpler plan. The IN clause can sometimes be translated in a special way that may not necessarily be the most efficient.