Index suggestion for LET, JOIN and UNNEST

edward · July 30, 2024, 2:25pm

I am tryng to improve the performance of the following query.
The query takes around 500-900ms to run.
By checking the the “plan” of the query, the LET section takes up most of the time (90%) to go through the list of availability of each product to look for the matched location.

The query involves a few join and unnest, I am not sure how to build an index that could speed up the LET part here.
The join includes joining 3 documents together, 2 of them are in the same collection.

Any help will be really appreciated. Thanks

The query

SELECT items.itemId AS __id, META(items).cas AS __cas,
               lists._id,
               lists.type,
               lists.uid,
               lists.updatedDate,
               {
                 "itemId":items.itemId, 
                 "item":OBJECT_PUT(products, "availability", ava), /* I want to return the availability object of the locationId that I am querying instead of the whole list of availability */
                 "score": items.score,
                 "lastPurchaseDate":items.lastPurchaseDate
               } AS joinedItem
FROM `bucket1`.`scope1`.`lists` AS lists USE KEYS "ITEM_LIST:USER100"                                    /* The USER100 here is passed in as a variable when forming the query */
    LEFT JOIN `bucket1`.`scope1`.`lists` AS rmlists ON KEYS CONCAT('REMOVED_ITEM_LIST:',lists.uid)
    UNNEST lists.items AS items
    INNER JOIN `bucket1`.`scope1`.`products` AS products ON KEYS CONCAT('PRODUCTS:',items.itemId) 
LET rmItem = FIRST i FOR i IN rmlists.items WHEN i.itemId = items.itemId AND i.deleteDate != '' END,     /* I want to filter out any item in the ITEM_LIST that exists in REMOVED_ITEM_LIST 
                                                                                                            and the rmlists.deleteDate is later than the items.lastPurchaseDate */
    ava = FIRST a FOR a IN products.availability WHEN a.locationId = "bbbb-bbbb" END                     /* The locationId here is a variable that being passed in when forming the query.
                                                                                                            I want to get the availability object of the locationId that I am querying */
WHERE (rmItem IS MISSING OR items.lastPurchaseDate > rmItem.deleteDate) 
    AND (ava IS NOT MISSING AND ava IS NOT NULL)
    AND (products.dontShow IS MISSING OR products.dontShow IS NULL OR products.dontShow = false)
    
    /* The following filters will sometimes be added depends on the business logic */
    AND items.score > 0
    AND ava.status = 1
    AND ava.offer = true
    AND products.brands IN ['BRAND_1']
    AND EVERY cat IN ['CAT_B', 'CAT_C'] SATISFIES cat IN products.category END 
    
/* The following sorting will be added depends on the business logic */
ORDER BY products.productOrder ASC
ORDER BY items.score DESC 
LIMIT 50

The data model

Collection: `bucket1`.`scope1`.`lists`
Doc key: REMOVED_ITEM_LIST:USER100
{
  "items": [ /* This array could have few hundred items */
    {
      "itemId": "1111",
      "deleteDate": "2024-01-01T07:11:03Z"
    },
    {
      "itemId": "2222",
      "deleteDate": "2024-07-28T09:07:14Z"
    }
  ],
  "_id": "REMOVED_ITEM_LIST:USER100",
  "type": "RemovedItemList",
  "name": "RemovedItemList",
  "uid": "USER100",
  "updatedDate": "2024-07-01T07:11:03Z"
}


Collection: `bucket1`.`scope1`.`lists`
Doc key: ITEM_LIST:USER100
{
  "items": [ /* This array could have few hundred items */
    {
      "itemId": "1111",
      "score": 0.5
      "lastPurchaseDate": "2024-03-28"
    },
    {
      "itemId": "2222",
      "score": 0.1
      "lastPurchaseDate": "2024-03-28"
    },
    {
      "itemId": "3333",
      "score": 0.9
      "lastPurchaseDate": "2024-07-10"
    }
  ],
  "type": "ItemList",
  "uid": "USER100",
  "updatedDate": "2024-07-10",
  "_id": "ITEM_LIST:USER100"
}


Collection: `bucket1`.`scope1`.`products`
Doc key: PRODUCTS:1111
{
  "productCode" : "1111"
  "name": "4TB SSD"
  "availability": [ /* This array could have few hundred items */
    {
      "locationId": "aaaa-aaaa",
      "status": 1
    },
    {
      "locationId": "bbbb-bbbb",
      "status": 1,
      "offer": true
    },
    {
      "locationId": "cccc-cccc",
      "status": 1
    }
  ],
  "brand": "BRAND_1",
  "category": [
    "CAT_A",
    "CAT_B",
    "CAT_C"
  ]
  "type": "product",
  "productOrder": 110,
  "dontShow": false,
  "updated": 1722343233
}

vsr1 · July 30, 2024, 5:39pm

There is no index needed above query as it uses USE KEYS or ON KEYS.

As you have 4 level Join it can explorde rmItem only depends on first 2 joins which are 1 row. So Add FROM subquery produces first 2 JOIN and calcuate rmItem then join with other two

edward · July 30, 2024, 6:39pm

Hi @vsr1 , thanks for taking a look to my question.

I am a bit confused on your suggestion. Do you mean I should break the query into two sub-query, the first query to filter out removed item in the lists by joining lists and rmList and the second query to join it with products and filter the result?

How is it going to solve the problem of the costly operation in LET to find out the ava?

Thanks so much.

vsr1 · July 30, 2024, 11:38pm

ava is costly nothing can be done

edward · July 31, 2024, 1:51pm

Is it possible to separate the ava look up into another query and create a covering index for the ava so that it can be retrieved from the index instead of going through the document?

vsr1 · July 31, 2024, 4:54pm

Not sure it helps but try

WITH products AS ( SELECT ava AS availability, p.*
                   FROM `bucket1`.`scope1`.`products` AS p
                   UNNEST p.availability AS ava
                   WHERE p.type = "product"
                         AND IFMISSIGNORNULL(p.dontShow, false) = false
                         AND ava.locationId = "bbbb-bbbb"
                         AND p.brands IN ['BRAND_1']
                          AND EVERY cat IN ['CAT_B', 'CAT_C'] SATISFIES cat IN p.category END)

SELECT t.*, OBJECT_PUT(t.joinedItem,"item",p) AS joinedItem, MISSING rmItem
FROM (SELECT items.itemId AS __id, META(lists).cas AS __cas,
             lists._id, lists.type, lists.uid, lists.updatedDate,
             (FIRST i FOR i IN rmlists.items WHEN i.itemId = items.itemId AND i.deleteDate != '' END) AS rmItem,
              {items.itemId, items.lastPurchaseDate, items.score} AS joinedItem
     FROM `bucket1`.`scope1`.`lists` AS lists USE KEYS "ITEM_LIST:USER100"
     LEFT JOIN `bucket1`.`scope1`.`lists` AS rmlists ON KEYS CONCAT('REMOVED_ITEM_LIST:',lists.uid)
     UNNEST lists.items AS items) AS t
JOIN products AS p ON t.__id = p.productCode
WHERE (t.rmItem IS MISSING OR t.joinedItem.lastPurchaseDate > t.rmItem.deleteDate)
ORDER BY p.productOrder ASC
LIMIT 50;

CREATE INDEX ix1 ON `bucket1`.`scope1`.`products`(ALL ARRAY ava.locationId FOR ava IN availability END, brands)
WHERE type = "product" AND IFMISSIGNORNULL(dontShow, false) = false;

Above query add as WITH clause and JOIN and see

system · October 29, 2024, 4:54pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Appropriate index for query with unnest SQL++ query , index	13	1467	June 6, 2021
Index on UNNEST JOIN SQL++ n1ql	5	3677	December 12, 2017
How do I create an index to cope with UNNEST'ed queries? SQL++ n1ql	10	1425	August 8, 2018
Improve query performance SQL++ query , n1ql	3	638	August 15, 2022
Need to tune a N1QL SQL++ query , n1ql , index	9	107	January 9, 2025

Index suggestion for LET, JOIN and UNNEST

Related topics