Complex query performance optimizations

Hi,

maybe somebody know how to improve such query?

CREATE INDEX prod_adv_idx 
ON `production`(`timestamp`,`brandCode`,`transactionId`,`quantity`) 
WHERE ((`originalSource` = 'D2C') 
AND (`_class` = 'com.elc.d2c.transactions.model.TransactionItem'))

SELECT AVG(t1.itemsInTransaction) AS amount
FROM(
    SELECT SUM(trsI.quantity) AS itemsInTransaction
    FROM production trsI JOIN production trs ON KEYS trsI.transactionId
    WHERE trsI._class = "model.TransactionItem"
        AND trsI.originalSource = "Y1"
        AND trsI.code = "x1"
        AND trsI.timestamp BETWEEN "2020-02-02T23:00:00Z" AND "2022-02-03T22:59:59Z"
        AND trs.status != "deleted"
    GROUP BY trsI.transactionId ) AS t1

which takes 16.9 seconds.

Using Couchbase Enterprise Edition 6.6.3 build 9808.
DB has 170,655,089 items, 104GB / 126GB RAM, 541GB HDD. (1 bucket, 6 servers (2 for data, 4 for query and index)).

Thanks in advance!

The Index provided doesn’t qualify for query.
trsI.transactionId is ARRAY or single value?
Is trs need to check status only?

You no need group results? The reason is can it be SUM(trsI.quantity)/ COUNT(DISTINCT trsI.transactionId) ?

@vsr1 thank you for response.

  1. trsI.transactionId is a value - id of items with _class = “model.Transaction”
  2. trs should check status only
  3. Yes I have to group to exclude Transactions with status = “deleted”
CREATE INDEX ix1 ON `production`(code, `timestamp`,`transactionId`,`quantity`)
WHERE ((`originalSource` = 'Y1') AND (`_class` = "model.TransactionItem"));

See First query using index aggregates any better.

SELECT RAW AVG(st.sq)
FROM (SELECT SUM(t.quantity) AS sq, t.transactionId
      FROM production AS t
      WHERE t._class = "model.TransactionItem"
            AND t.originalSource = "Y1"
            AND t.code = "x1"
            AND t.timestamp BETWEEN "2020-02-02T23:00:00Z" AND "2022-02-03T22:59:59Z"
      GROUP BY t.transactionId) AS st
JOIN production AS trs ON KEYS st.transactionId
WHERE trs.status != "deleted";

OR

SELECT SUM(trsI.quantity)/COUNT(DISTINCT trsI.transactionId) AS amount
FROM production trsI
JOIN production trs ON KEYS trsI.transactionId
WHERE trsI._class = "model.TransactionItem"
      AND trsI.originalSource = "Y1"
      AND trsI.code = "x1"
      AND trsI.timestamp BETWEEN "2020-02-02T23:00:00Z" AND "2022-02-03T22:59:59Z"
      AND trs.status != "deleted";

@vsr1 Thanks for suggestions!