Following this nice advice from @vsr1 in "How do I create an index to cope with UNNEST'ed queries?", I ended up with a query that finds the last edited or created document of a specific type:
SELECT
IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,
IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,
d.revisioninfo.modifiedcount
FROM data AS d
LEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du
WHERE d.type='FishingTrip'
ORDER BY modified DESC limit 1;
It uses this index:
CREATE INDEX `def_type_modified` ON `data`(`type`,(`revisioninfo`.`modifiedcount`),(`revisioninfo`.`created`),(`revisioninfo`.`createdby`),(`revisioninfo`.`updates`))
as one can see from the EXPLAIN output:
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan2",
"covers": [
"cover ((`d`.`type`))",
"cover (((`d`.`revisioninfo`).`modifiedcount`))",
"cover (((`d`.`revisioninfo`).`created`))",
"cover (((`d`.`revisioninfo`).`createdby`))",
"cover (((`d`.`revisioninfo`).`updates`))",
"cover ((meta(`d`).`id`))"
],
"index": "def_type_modified",
"index_id": "587679f61576574c",
"index_projection": {
"entry_keys": [
0,
1,
2,
3,
4
]
},
"keyspace": "data",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"FishingTrip\"",
"inclusion": 3,
"low": "\"FishingTrip\""
}
]
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Unnest",
"as": "du",
"expr": "to_array(cover (((`d`.`revisioninfo`).`updates`)))",
"outer": true
}
]
}
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(cover ((`d`.`type`)) = \"FishingTrip\")"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"as": "modified",
"expr": "ifmissingornull((`du`.`modified`), cover (((`d`.`revisioninfo`).`created`)))"
},
{
"as": "modifiedby",
"expr": "ifmissingornull((`du`.`modifiedby`), cover (((`d`.`revisioninfo`).`createdby`)))"
},
{
"expr": "cover (((`d`.`revisioninfo`).`modifiedcount`))"
}
]
}
]
}
}
]
},
{
"#operator": "Order",
"limit": "1",
"sort_terms": [
{
"desc": true,
"expr": "`modified`"
}
]
},
{
"#operator": "Limit",
"expr": "1"
},
{
"#operator": "FinalProject"
}
]
},
"text": "SELECT\n IFMISSINGORNULL(du.modified, d.revisioninfo.created) AS modified,\n IFMISSINGORNULL(du.modifiedby, d.revisioninfo.createdby) AS modifiedby,\n d.revisioninfo.modifiedcount\nFROM data AS d\nLEFT UNNEST TO_ARRAY(d.revisioninfo.updates) AS du\nWHERE d.type='FishingTrip'\nORDER BY modified DESC limit 1;"
}
However, now I have started testing with “live” data — and the above query takes around 1 second when I have 25,000–30,000 documents.
I suspect that the ORDER BY modified clause could be the culprit — but I'm not sure, only guessing, since "modified" is a computed alias and not directly part of the index. If I remove the ORDER BY, the response comes back in milliseconds!
I have tried to create the index like this (purely an attempt to “unfold” the query):
CREATE INDEX `def_type_modified` ON `data`(`type`,(`revisioninfo`.`modifiedcount`),(`revisioninfo`.`created`),(`revisioninfo`.`createdby`),(`revisioninfo`.`updates`),IFMISSINGORNULL(TO_ARRAY(d.revisioninfo.updates).modified, d.revisioninfo.created))
But it does not speed up the query…
How can I improve this?