Using JOIN, Aggregate function SUM() and GROUP BY without affecting query execution time

Hi Couchbase Gurus,

Need help and advice here,
How can I use JOIN, SUM() and GROUP BY for my document products and product-aggregates.

Below are the data and information you need:

// Partial Query Statement:

SELECT
  META(`products`).id,
  products.productCode,
  products.name,
  SUM(prodAgg.sales) AS totalSales,
  SUM(prodAgg.qtyShipped) AS totalShipped
FROM `bucket` AS products
LEFT JOIN `bucket` AS prodAgg
  ON META(products).id = prodAgg.product
  AND prodAgg.`type` = 'product-aggregate'
WHERE products.`type` = 'product'
GROUP BY prodAgg.product
LIMIT 50

// unique product documents:

[
   {
     Id: product::0010,
     name: ProductName01 
     productCode: 0010,
     type: product 
   },
  {
     id: product::0011,
     name: ProductName11,
     productCode: 0011,
     type: product
   },
  {
     id: product::0014,
     name: ProductName14,
     productCode: 0014,
     type: product,
   },
]

// product-aggregate documents:

[
   {
     product: product::0010,
     qtyShipped: 5,
     sales: 65.89
   },
   {
     product: product::0010,
     qtyShipped: 15,
     sales: 75.50,
     type: ‘product-aggregate’ 
   },
  {
     product: product::0010,
     qtyShipped: 4,
     sales: 10.11,
     type: ‘product-aggregate’ 
   },
 {
     product: product::0011,
     qtyShipped: 5,
     sales: 65.89,
     type: ‘product-aggregate’ 
   },
   {
     product: product::0011,
     prod: 9,
     sales: 55.40,
     type: ‘product-aggregate’ 
   },
]

My expected result would something like these:

[
   {
     Id: product::0010,
     name: ProductName01 
     productCode: 0010,
     type: product,
     totalSales: 151.50,
     totalShipped: 24
   },
  {
     id: product::0011,
     name: ProductName11,
     productCode: 0011,
     type: product,
     totalSales: 121.29,
     totalShipped: 14
   },
  {
     id: product::0014,
     name: ProductName14,
     productCode: 0014,
     type: product,
     totalSales: 0,
     totalShipped: 0
   },
]

Also, how can I optimized my query? I have these number of documents for:
product-aggregates: 930977 +
products (unique): 656

Need you help guys. Thanks :slight_smile:

You have LEFT JOIN and GROUP BY derived from document key. There is small difference on grouping if document is not present (if acceptable) with following query by moving LIMIT inside.

SELECT
      p.id,
      p.productCode,
      p.name,
      SUM(prodAgg.sales) AS totalSales,
      SUM(prodAgg.qtyShipped) AS totalShipped
FROM (SELECT META(`products`).id,
            products.productCode,
            products.name
      FROM `bucket` AS products
      WHERE products.`type` = 'product'
      LIMIT 50) AS p
LEFT JOIN `bucket` AS prodAgg ON p.id = prodAgg.product AND prodAgg.`type` = 'product-aggregate'
GROUP BY p;

Hi @vsr1,

Thank you for your response, is there any way my LIMIT would outside the FROM () because the LIMIT is dynamic base on our api request. Also, we have ORDER BY. :slight_smile:

If you have ORDER BY then you can’t have limit inside then it is almost your query except group by products so that you can project other fields. Once you have GROUP BY, ORDER BY it needs materialize and process all of them.

What is your ORDER BY.

I would ORDER BY totalShipped or totalSales or productName

Hi @vsr1,

Can’t really figure it out how to put the LIMIT and ORDER BY outside FROM()
Do you have any suggestion for this ? need I badly :frowning:

Thanks you

CREATE INDEX ix1 ON (productCode, name) WHERE type = 'product';
CREATE INDEX ix2 ON (product, sales, qtyShipped) WHERE type = 'product-aggregate'

SELECT
      META(p).id,
      p.productCode,
      p.name,
      SUM(prodAgg.sales) AS totalSales,
      SUM(prodAgg.qtyShipped) AS totalShipped
FROM `bucket` AS p
LEFT JOIN `bucket` AS prodAgg ON META(p).id = prodAgg.product AND prodAgg.`type` = 'product-aggregate'
WHERE p.`type` = 'product' AND p.productCode IS NOT NULL
GROUP BY p.productCode, p.name, META(p).id
ORDER BY totalSales
LIMIT 50;

OR

Using 6.50+ EE

   CREATE INDEX ix1 ON (productCode, name) WHERE type = 'product';
    CREATE INDEX ix2 ON (product, sales, qtyShipped) WHERE type = 'product-aggregate';


WITH prodAgg AS (SELECT pa.product, SUM(pa.sales) AS totalSales, SUM(pa.qtyShipped) AS totalShipped
            FROM `bucket` AS pa
            WHERE pa.`type` = "product-aggregate" AND pa.product IS NOT NUL
            GROUP BY pa.product)
SELECT META(p).id,
      p.productCode,
      p.name,
      pag.totalSales,
      pag.totalShipped
FROM `bucket` AS p LEFT JOIN prodAgg AS pag USE HASH(BUILD) ON pga.product = META(p).id
WHERE p.`type` = 'product' AND p.productCode IS NOT NULL
ORDER BY totalSales;