Index suggestion for LET, JOIN and UNNEST

I am tryng to improve the performance of the following query.
The query takes around 500-900ms to run.
By checking the the “plan” of the query, the LET section takes up most of the time (90%) to go through the list of availability of each product to look for the matched location.

The query involves a few join and unnest, I am not sure how to build an index that could speed up the LET part here.
The join includes joining 3 documents together, 2 of them are in the same collection.

Any help will be really appreciated. Thanks

The query

SELECT items.itemId AS __id, META(items).cas AS __cas,
               lists._id,
               lists.type,
               lists.uid,
               lists.updatedDate,
               {
                 "itemId":items.itemId, 
                 "item":OBJECT_PUT(products, "availability", ava), /* I want to return the availability object of the locationId that I am querying instead of the whole list of availability */
                 "score": items.score,
                 "lastPurchaseDate":items.lastPurchaseDate
               } AS joinedItem
FROM `bucket1`.`scope1`.`lists` AS lists USE KEYS "ITEM_LIST:USER100"                                    /* The USER100 here is passed in as a variable when forming the query */
    LEFT JOIN `bucket1`.`scope1`.`lists` AS rmlists ON KEYS CONCAT('REMOVED_ITEM_LIST:',lists.uid)
    UNNEST lists.items AS items
    INNER JOIN `bucket1`.`scope1`.`products` AS products ON KEYS CONCAT('PRODUCTS:',items.itemId) 
LET rmItem = FIRST i FOR i IN rmlists.items WHEN i.itemId = items.itemId AND i.deleteDate != '' END,     /* I want to filter out any item in the ITEM_LIST that exists in REMOVED_ITEM_LIST 
                                                                                                            and the rmlists.deleteDate is later than the items.lastPurchaseDate */
    ava = FIRST a FOR a IN products.availability WHEN a.locationId = "bbbb-bbbb" END                     /* The locationId here is a variable that being passed in when forming the query.
                                                                                                            I want to get the availability object of the locationId that I am querying */
WHERE (rmItem IS MISSING OR items.lastPurchaseDate > rmItem.deleteDate) 
    AND (ava IS NOT MISSING AND ava IS NOT NULL)
    AND (products.dontShow IS MISSING OR products.dontShow IS NULL OR products.dontShow = false)
    
    /* The following filters will sometimes be added depends on the business logic */
    AND items.score > 0
    AND ava.status = 1
    AND ava.offer = true
    AND products.brands IN ['BRAND_1']
    AND EVERY cat IN ['CAT_B', 'CAT_C'] SATISFIES cat IN products.category END 
    
/* The following sorting will be added depends on the business logic */
ORDER BY products.productOrder ASC
ORDER BY items.score DESC 
LIMIT 50

The data model

Collection: `bucket1`.`scope1`.`lists`
Doc key: REMOVED_ITEM_LIST:USER100
{
  "items": [ /* This array could have few hundred items */
    {
      "itemId": "1111",
      "deleteDate": "2024-01-01T07:11:03Z"
    },
    {
      "itemId": "2222",
      "deleteDate": "2024-07-28T09:07:14Z"
    }
  ],
  "_id": "REMOVED_ITEM_LIST:USER100",
  "type": "RemovedItemList",
  "name": "RemovedItemList",
  "uid": "USER100",
  "updatedDate": "2024-07-01T07:11:03Z"
}


Collection: `bucket1`.`scope1`.`lists`
Doc key: ITEM_LIST:USER100
{
  "items": [ /* This array could have few hundred items */
    {
      "itemId": "1111",
      "score": 0.5
      "lastPurchaseDate": "2024-03-28"
    },
    {
      "itemId": "2222",
      "score": 0.1
      "lastPurchaseDate": "2024-03-28"
    },
    {
      "itemId": "3333",
      "score": 0.9
      "lastPurchaseDate": "2024-07-10"
    }
  ],
  "type": "ItemList",
  "uid": "USER100",
  "updatedDate": "2024-07-10",
  "_id": "ITEM_LIST:USER100"
}


Collection: `bucket1`.`scope1`.`products`
Doc key: PRODUCTS:1111
{
  "productCode" : "1111"
  "name": "4TB SSD"
  "availability": [ /* This array could have few hundred items */
    {
      "locationId": "aaaa-aaaa",
      "status": 1
    },
    {
      "locationId": "bbbb-bbbb",
      "status": 1,
      "offer": true
    },
    {
      "locationId": "cccc-cccc",
      "status": 1
    }
  ],
  "brand": "BRAND_1",
  "category": [
    "CAT_A",
    "CAT_B",
    "CAT_C"
  ]
  "type": "product",
  "productOrder": 110,
  "dontShow": false,
  "updated": 1722343233
}

There is no index needed above query as it uses USE KEYS or ON KEYS.

As you have 4 level Join it can explorde rmItem only depends on first 2 joins which are 1 row. So Add FROM subquery produces first 2 JOIN and calcuate rmItem then join with other two

Hi @vsr1 , thanks for taking a look to my question.

I am a bit confused on your suggestion. Do you mean I should break the query into two sub-query, the first query to filter out removed item in the lists by joining lists and rmList and the second query to join it with products and filter the result?

How is it going to solve the problem of the costly operation in LET to find out the ava?

Thanks so much.

ava is costly nothing can be done

Is it possible to separate the ava look up into another query and create a covering index for the ava so that it can be retrieved from the index instead of going through the document?

Not sure it helps but try

WITH products AS ( SELECT ava AS availability, p.*
                   FROM `bucket1`.`scope1`.`products` AS p
                   UNNEST p.availability AS ava
                   WHERE p.type = "product"
                         AND IFMISSIGNORNULL(p.dontShow, false) = false
                         AND ava.locationId = "bbbb-bbbb"
                         AND p.brands IN ['BRAND_1']
                          AND EVERY cat IN ['CAT_B', 'CAT_C'] SATISFIES cat IN p.category END)

SELECT t.*, OBJECT_PUT(t.joinedItem,"item",p) AS joinedItem, MISSING rmItem
FROM (SELECT items.itemId AS __id, META(lists).cas AS __cas,
             lists._id, lists.type, lists.uid, lists.updatedDate,
             (FIRST i FOR i IN rmlists.items WHEN i.itemId = items.itemId AND i.deleteDate != '' END) AS rmItem,
              {items.itemId, items.lastPurchaseDate, items.score} AS joinedItem
     FROM `bucket1`.`scope1`.`lists` AS lists USE KEYS "ITEM_LIST:USER100"
     LEFT JOIN `bucket1`.`scope1`.`lists` AS rmlists ON KEYS CONCAT('REMOVED_ITEM_LIST:',lists.uid)
     UNNEST lists.items AS items) AS t
JOIN products AS p ON t.__id = p.productCode
WHERE (t.rmItem IS MISSING OR t.joinedItem.lastPurchaseDate > t.rmItem.deleteDate)
ORDER BY p.productOrder ASC
LIMIT 50;

CREATE INDEX ix1 ON `bucket1`.`scope1`.`products`(ALL ARRAY ava.locationId FOR ava IN availability END, brands)
WHERE type = "product" AND IFMISSIGNORNULL(dontShow, false) = false;

Above query add as WITH clause and JOIN and see

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.