ORDER BY with OR disjunctions is very slow, even when order field is indexed

treo · October 22, 2021, 8:45pm

I have the following N1QL query that takes me 10 seconds

SELECT multimedia,titles.title,publicationDate
FROM data
WHERE (sectionId="90"
        OR sectionId="93"
        OR sectionId="94")
    AND publicationDate <= '2021-09-15 16:45:04'
    AND published=TRUE
    AND type="news"
    AND META().id LIKE "editorial::%"
ORDER BY publicationDate DESC
LIMIT 500

The slowness is due to “ORDER BY”. without ORDER BY it takes only 60ms.

It is a bucket with 1300000 items. 100% RAM resident

Index contains 1000000 item and weighs 186.6 MB, 100% RAM resident ratio, 0 requests/seg

Enterprise Edition 6.6.2 build 9598. CPU and RAM idle servers. No load.

The index:

CREATE INDEX `data-idx` ON 
`data`(`sectionId`,`publicationDate` DESC,`type`,`published`,`isAmazonOnsite`,`updatedOn`)
WHERE ((META().`id`) LIKE "editorial::%") WITH { "num_replica":1 }

The Plan do a IndexScan3 (#itemsOut - 157043 (quick)) → but need to make a fetch of 157043 (slow) → filter od 157043 (slow) → to finally make the ORDER (quick) → LIMIT (quick)

If the “publicationDate DESC” field is already indexed, why doesn’t the indexScan3 return the data already filtered, sorted and limited to 500. without fetching to the data service?
How could it improve the speed?

{
  "#operator": "Sequence",
  "#stats": {
    "#phaseSwitches": 1,
    "execTime": "17.781µs"
  },
  "~children": [
    {
      "#operator": "Authorize",
      "#stats": {
        "#phaseSwitches": 3,
        "execTime": "14.58µs",
        "servTime": "1.806162ms"
      },
      "privileges": {
        "List": [
          {
            "Target": "default:data",
            "Priv": 7
          }
        ]
      },
      "~child": {
        "#operator": "Sequence",
        "#stats": {
          "#phaseSwitches": 1,
          "execTime": "3.83µs"
        },
        "~children": [
          {
            "#operator": "Sequence",
            "#stats": {
              "#phaseSwitches": 1,
              "execTime": "2.21µs"
            },
            "~children": [
              {
                "#operator": "IndexScan3",
                "#stats": {
                  "#itemsOut": 157043,
                  "#phaseSwitches": 628175,
                  "execTime": "297.608181ms",
                  "kernTime": "9.287087961s",
                  "servTime": "57.198871ms"
                },
                "index": "data-idx-editorial-portadillas",
                "index_id": "6f1e7d080213883f",
                "index_projection": {
                  "primary_key": true
                },
                "keyspace": "data",
                "namespace": "default",
                "spans": [
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"90\"",
                        "inclusion": 3,
                        "low": "\"90\""
                      },
                      {
                        "high": "\"2021-09-15 16:45:04\"",
                        "inclusion": 2,
                        "low": "null"
                      },
                      {
                        "high": "\"news\"",
                        "inclusion": 3,
                        "low": "\"news\""
                      },
                      {
                        "high": "true",
                        "inclusion": 3,
                        "low": "true"
                      }
                    ]
                  },
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"93\"",
                        "inclusion": 3,
                        "low": "\"93\""
                      },
                      {
                        "high": "\"2021-09-15 16:45:04\"",
                        "inclusion": 2,
                        "low": "null"
                      },
                      {
                        "high": "\"news\"",
                        "inclusion": 3,
                        "low": "\"news\""
                      },
                      {
                        "high": "true",
                        "inclusion": 3,
                        "low": "true"
                      }
                    ]
                  },
                  {
                    "exact": true,
                    "range": [
                      {
                        "high": "\"94\"",
                        "inclusion": 3,
                        "low": "\"94\""
                      },
                      {
                        "high": "\"2021-09-15 16:45:04\"",
                        "inclusion": 2,
                        "low": "null"
                      },
                      {
                        "high": "\"news\"",
                        "inclusion": 3,
                        "low": "\"news\""
                      },
                      {
                        "high": "true",
                        "inclusion": 3,
                        "low": "true"
                      }
                    ]
                  }
                ],
                "using": "gsi",
                "#time_normal": "00:00.354",
                "#time_absolute": 0.354807052
              },
              {
                "#operator": "Fetch",
                "#stats": {
                  "#itemsIn": 157043,
                  "#itemsOut": 157043,
                  "#phaseSwitches": 647807,
                  "execTime": "351.800071ms",
                  "kernTime": "1.04608118s",
                  "servTime": "8.273478909s"
                },
                "keyspace": "data",
                "namespace": "default",
                "#time_normal": "00:08.625",
                "#time_absolute": 8.62527898
              },
              {
                "#operator": "Sequence",
                "#stats": {
                  "#phaseSwitches": 1,
                  "execTime": "1.24µs"
                },
                "~children": [
                  {
                    "#operator": "Filter",
                    "#stats": {
                      "#itemsIn": 157043,
                      "#itemsOut": 157043,
                      "#phaseSwitches": 628175,
                      "execTime": "9.078187605s",
                      "kernTime": "617.803648ms"
                    },
                    "condition": "((((((((`data`.`sectionId`) = \"90\") or ((`data`.`sectionId`) = \"93\")) or ((`data`.`sectionId`) = \"94\")) and ((`data`.`publicationDate`) <= \"2021-09-15 16:45:04\")) and ((`data`.`published`) = true)) and ((`data`.`type`) = \"news\")) and ((meta(`data`).`id`) like \"editorial::%\"))",
                    "#time_normal": "00:09.078",
                    "#time_absolute": 9.078187605
                  },
                  {
                    "#operator": "InitialProject",
                    "#stats": {
                      "#itemsIn": 157043,
                      "#itemsOut": 157043,
                      "#phaseSwitches": 628175,
                      "execTime": "3.452829516s",
                      "kernTime": "6.243215938s"
                    },
                    "result_terms": [
                      {
                        "expr": "(`data`.`multimedia`)"
                      },
                      {
                        "expr": "((`data`.`titles`).`title`)"
                      },
                      {
                        "expr": "(`data`.`publicationDate`)"
                      }
                    ],
                    "#time_normal": "00:03.452",
                    "#time_absolute": 3.452829516
                  }
                ],
                "#time_normal": "00:00.000",
                "#time_absolute": 0.00000124
              }
            ],
            "#time_normal": "00:00.000",
            "#time_absolute": 0.00000221
          },
          {
            "#operator": "Order",
            "#stats": {
              "#itemsIn": 157043,
              "#itemsOut": 500,
              "#phaseSwitches": 314591,
              "execTime": "937.039252ms",
              "kernTime": "8.760740031s"
            },
            "limit": "500",
            "sort_terms": [
              {
                "desc": true,
                "expr": "(`data`.`publicationDate`)"
              }
            ],
            "#time_normal": "00:00.937",
            "#time_absolute": 0.937039252
          },
          {
            "#operator": "Limit",
            "#stats": {
              "#itemsIn": 500,
              "#itemsOut": 500,
              "#phaseSwitches": 1001,
              "execTime": "154.614µs"
            },
            "expr": "500",
            "#time_normal": "00:00.000",
            "#time_absolute": 0.000154614
          },
          {
            "#operator": "FinalProject",
            "#stats": {
              "#itemsIn": 500,
              "#itemsOut": 500,
              "#phaseSwitches": 1501,
              "execTime": "354.527µs",
              "kernTime": "9.370922ms"
            },
            "#time_normal": "00:00.000",
            "#time_absolute": 0.000354527
          }
        ],
        "#time_normal": "00:00.000",
        "#time_absolute": 0.00000383
      },
      "#time_normal": "00:00.001",
      "#time_absolute": 0.001820742
    },
    {
      "#operator": "Stream",
      "#stats": {
        "#itemsIn": 500,
        "#itemsOut": 500,
        "#phaseSwitches": 1003,
        "execTime": "8.182236ms",
        "kernTime": "9.701608428s"
      },
      "#time_normal": "00:00.008",
      "#time_absolute": 0.008182235999999999
    }
  ],
  "~versions": [
    "6.5.0-N1QL",
    "6.6.2-9598-enterprise"
  ],
  "#time_normal": "00:00.000",
  "#time_absolute": 0.000017780999999999998
}

A curiosity:

If I filter with OR disjunctions: 'WHERE (sectionId=“90” OR sectionId=“93” OR sectionId=“94”)'
takes 10 seconds
If I filter without OR disjunctions: 'WHERE (sectionId=“90”)'
take only 40ms (because it returns the 500 results directly from the IndexScan3 )

#  40ms
SELECT multimedia,titles.title,publicationDate
FROM data
WHERE (sectionId="90")
    AND publicationDate <= '2021-09-15 16:45:04'
    AND published=TRUE
    AND type="news"
    AND META().id LIKE "editorial::%"
ORDER BY publicationDate DESC
LIMIT 500

Notes:

I need the ORDER and the ORs
I could create as leading key “publicationDate DESC” and improve a lot the time of this query, but the reality is that this index is also used in more types of queries, And the users can select the fields by which he wants to sort or search.
A “Covering Index” (all the fields specified in the query) improves the times, but it is not an option either, because some queries return free text fields and descriptions (for which we do not filter) that would make the index gigantic.

Thanks

vsr1 · October 22, 2021, 10:58pm

Try this

Only way to get that is moving “publicationDate DESC” leading. Otherwise query need explicit sort.

CREATE INDEX `dataindexfts-idx` ON 
`dataindexfts`(`type`,`published`,`publicationDate` DESC, `sectionId`,`isAmazonOnsite`,`updatedOn`)
WHERE ((META().`id`) LIKE "editorial::%") WITH { "num_replica":1 }

FYI: even though it will not make any difference Instead of OR , use sectionId IN [90, 93, 94]

Other option is

SELECT t.*
FROM ( (SELECT multimedia,titles.title,publicationDate FROM dataindexfts WHERE sectionId="90" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)
        UNION
       (SELECT multimedia,titles.title,publicationDate FROM dataindexfts WHERE sectionId="93" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)
       UNION
       (SELECT multimedia,titles.title,publicationDate FROM dataindexfts WHERE sectionId="94" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)) AS t
ORDER BY t.publicationDate DESc
LIMIT 500;

treo · October 25, 2021, 10:33pm

Thank you vsr1, I asked with high hopes in your knowledge, and you have once again given us a brilliant solution. Thank you vsr1

The only way to make my N1QL efficient (with multiple OR and ORDER BY), is to use the sortable field as leading key. it improves the speed x20
I would have to have 1 index with different leading keys for each field that the user can sort by. This is the option I had initially considered (but I wanted to avoid having to create 10 identical indexes).
Effectively, there is no improvement by replacing the ORs (sectionId=“90” OR sectionId=“93” OR sectionId=“94”) by (sectionId IN [“90”, “93”, “94”]).
The 3 “subquerys UNION” option seems to me fabulous and in the example query it improves the speed x100 (in each UNION it only returns 500 results, directly from the Indexscan, and the ORDER is only 1500 items), I had not contemplated it and I am sure it helps other users of the forum with my same problem.
But I will not be able to use this strategy, because in my real use case (my initial post was a simplification), I have not only 3 “OR”, but I have 90 “OR” .Doing 90 UNIONs of 500 results is still x5 times faster than the original query (just ORDER BY of 45,000 items), but not fast enough for web results.

Thanks a lot vsr1, you helped me a lot.
Treo

vsr1 · October 26, 2021, 1:31am

It might give little better. Other option is avoid fetch till you have final 500 documents and then use N1QL or SDK fetch them.

SELECT t1.multimedia, t1.titles.title, t1.publicationDate
FROM dataindexfts AS t1 USE KEYS (
SELECT RAW t.id
FROM ( (SELECT META().id, publicationDate  FROM dataindexfts WHERE sectionId="90" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)
        UNION ALL
       (SELECT  META().id, publicationDate  FROM dataindexfts WHERE sectionId="93" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)
       UNION ALL
       (SELECT META().id, publicationDate  FROM dataindexfts WHERE sectionId="94" AND publicationDate <= '2021-09-15 16:45:04' AND published=TRUE AND type="news" AND META().id LIKE "editorial::%" ORDER BY publicationDate DESC LIMIT 500)) AS t
ORDER BY t.publicationDate DESc
LIMIT 500);

treo · October 26, 2021, 9:36pm

Thank you vsr1

This strategy is even better and it is efficient even with 90 UNIONs
We get only the meta(id) and the sort field → do the UNION ALL → reorder → limit to 500 → do only 500 fetch with USE KEYS. Superb.
It accelerates results x30.

Topic		Replies	Views
N1QL Index issue on sort SQL++	1	611	May 8, 2019
N1QL Query order by desc is slow SQL++	5	974	January 20, 2021
N1QL performance reduces drastically on using ORDER BY SQL++	4	2678	January 19, 2016
N1QL is very slow with "order by" clause SQL++ n1ql	4	2305	September 11, 2016
N1ql does not use index for "order by" SQL++	30	6234	November 4, 2016

ORDER BY with OR disjunctions is very slow, even when order field is indexed

Related topics