How to group result based on time intervals

I have a requirement to group the data by time intervals, (15 min, 30 min, 1 hour, 2 hour, A day etc). And count the number of partySize (its a field in the doc)
To explain my case better below are the sample docs that I created

  • INSERT INTO places VALUES (“def01”,{ “createdAt”: “2023-08-07T01:00:00”, “name”: “andrea” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def02”, { “createdAt”: “2023-08-07T01:10:00”, “name”: “alex” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def03”, { “createdAt”: “2023-08-07T03:10:00”, “name”: “bob” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def04”, { “createdAt”: “2023-08-07T03:15:00”, “name”: “ben” , “partySize”:“5”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def05”, { “createdAt”: “2023-08-07T04:15:00”, “name”: “chris” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def06”, { “createdAt”: “2023-08-07T03:03:00”, “name”: “bacca” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def07”, { “createdAt”: “2023-08-07T03:17:00”, “name”: “buchi” , “partySize”:“1”, “tp”:“qp”});
  • INSERT INTO places VALUES (“def08”,{ “createdAt”: “2023-08-07T01:10:00”, “name”: “andy duplicate” , “partySize”:“1”, “tp”:“qp”});

I was able to achieve what i needed by below query (yet tested only on smaller data set).

1. Trying to group them by intervals of 15 minutes from start and end datetime
SELECT time AS createdAt,
SUM(
(CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time),‘millisecond’) BETWEEN 0 AND 899999 THEN d.partySize ELSE 0 END)
) AS count
FROM places AS d UNNEST DATE_RANGE_STR(“2023-08-07T01:00:00”, “2023-08-07T05:00:00”, ‘minute’, 15) AS time
WHERE d.tp=“qp” AND d.createdAt IS NOT NULL
GROUP BY time
ORDER BY time

Below is the output
[
{
“count”: 3,
“createdAt”: “2023-08-07T01:00:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:15:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:30:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:45:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:00:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:15:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:30:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:45:00”
},
{
“count”: 2,
“createdAt”: “2023-08-07T03:00:00”
},
{
“count”: 6,
“createdAt”: “2023-08-07T03:15:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T03:30:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T03:45:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:00:00”
},
{
“count”: 1,
“createdAt”: “2023-08-07T04:15:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:30:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:45:00”
}
]

2. try same query in span of 30 minutes

SELECT time AS createdAt,
SUM(
(CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time),‘minute’) BETWEEN 0 AND 30 THEN d.partySize ELSE 0 END)
) AS count
FROM places AS d UNNEST DATE_RANGE_STR(“2023-08-07T01:00:00”, “2023-08-07T05:00:00”, ‘minute’, 30) AS time
WHERE d.tp=“qp” AND d.createdAt IS NOT NULL
GROUP BY time
ORDER BY time

Below is the output, i have omitted the 0 value ouput but notice the time intervals are seperated by 30 mins
{
“count”: 3,
“createdAt”: “2023-08-07T01:00:00”
},
{
“count”: 8,
“createdAt”: “2023-08-07T03:00:00”
},
{
“count”: 1,
“createdAt”: “2023-08-07T04:00:00”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:30:00”
}

3. Tried with 1 hour, and this is very the flaw in the query surfaced, my approach was same but the results are incorrect.

SELECT time AS createdAt,
SUM(
(CASE WHEN DATE_DIFF_MILLIS(MILLIS(d.createdAt), MILLIS(time),‘hour’) BETWEEN 0 AND 23 THEN d.partySize ELSE 0 END)
) AS count
FROM places AS d UNNEST DATE_RANGE_STR(“2023-08-07T01:00:00”, “2023-08-07T05:00:00”, ‘hour’, 1) AS time
WHERE d.tp=“qp” AND d.createdAt IS NOT NULL
GROUP BY time
ORDER BY time

I took some inspiration from this post
Please Help @geraldapeoples @vsr1

I note that partySize is string data so needs conversion for summing.

By hour is the simplest:

SELECT SUM(TO_NUMBER(p.partySize)) periodTotal, period
FROM places p
LET period = DATE_TRUNC_STR(p.createdAt,"hour")
GROUP BY period
ORDER BY period
;

Sub-hour requires a little more processing in the LET statement:

SELECT SUM(TO_NUMBER(p.partySize)) periodTotal, period
FROM places p
LET ms = STR_TO_MILLIS(p.createdAt), period = MILLIS_TO_STR(ms - ms % (15*60*1000))
GROUP BY period
ORDER BY period
;
SELECT SUM(TO_NUMBER(p.partySize)) periodTotal, period
FROM places p
LET ms = STR_TO_MILLIS(p.createdAt), period = MILLIS_TO_STR(ms - ms % (30*60*1000))
GROUP BY period
ORDER BY period
;

If you want to fill in the missing intervals, then it is probably simplest to join to an expression that generates them:

SELECT SUM(IFMISSINGORNULL(TO_NUMBER(p.partySize),0)) periodTotal, allp period                                                      
FROM (date_range_str("2023-08-07T00:00:00","2023-08-08T00:00:00","minute",15)) allp
  LEFT OUTER JOIN places p ON STR_TO_MILLIS(allp) = STR_TO_MILLIS(p.createdAt)-STR_TO_MILLIS(p.createdAt)%(15*60*1000)
GROUP BY allp
ORDER BY allp
;

HTH.

A further variation that fills in the intervals:

WITH intvmins AS (60)                                                                                                               
,startTime AS ("2023-08-07T00:00:00")
,endTime AS (DATE_ADD_STR(startTime,8,"hour"))
,allp AS (DATE_RANGE_STR(startTime,endTime,"minute",intvmins))
,p AS (
  SELECT pl.partySize, prd
  FROM places pl
  LET ms = STR_TO_MILLIS(pl.createdAt), prd = ms - ms % (intvmins*60*1000)
)
SELECT SUM(IFMISSINGORNULL(TO_NUMBER(p.partySize),0)) periodTotal, allp period
FROM allp LEFT OUTER JOIN p ON STR_TO_MILLIS(allp) = p.prd
GROUP BY allp
ORDER BY allp
;

These examples rely on a primary scan on “places”, but if there are additional filters a suitable index could be used - i.e. with the inclusion of tp=“qp” and createdAt IS NOT NULL, something like:

CREATE INDEX ix1 ON `places`(`createdAt`,`partySize`) WHERE `tp` = 'qp' AND `createdAt` IS NOT NULL

could be used to improve performance.

Use IDIV() i.e. integer division truncate the time to interval you need an group the results
$intrvl = number of minutes groups you want i.e 15, 30, 60, 120, 24*60

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), $intrvl*60*1000)*$intrvl*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", $intrvl) AS d

      UNION ALL

      SELECT IDIV(STR_TO_MILLIS(createdAt), $intrvl*60*1000)*$intrvl*60*1000 AS time, TO_NUMBER(partySize) AS partySize
      FROM places AS p
      WHERE p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t
GROUP BY time
ORDER BY createdAt;

Hello @dh
Thanks for your help. I am running all the queries to check the result. Below query runs fine, but it does not generate all the intervals, i want to ones which do not have any count as well.

SELECT SUM(IFMISSINGORNULL(TO_NUMBER(p.partySize),0)) periodTotal, allp period                                                      
FROM (date_range_str("2023-08-07T01:00:00","2023-08-07T05:00:00","minute",15)) allp
  LEFT OUTER JOIN places p ON STR_TO_MILLIS(allp) = STR_TO_MILLIS(p.createdAt)-STR_TO_MILLIS(p.createdAt)%(15*60*1000)
  WHERE p.tp="qp" AND p.createdAt IS NOT NULL
GROUP BY allp
ORDER BY allp
;

[
{
“period”: “2023-08-07T01:00:00”,
“periodTotal”: 3
},
{
“period”: “2023-08-07T03:00:00”,
“periodTotal”: 2
},
{
“period”: “2023-08-07T03:15:00”,
“periodTotal”: 6
},
{
“period”: “2023-08-07T04:15:00”,
“periodTotal”: 1
}
]

Strange - the LEFT OUTER JOIN should be producing all results. I get:

cbq> SELECT SUM(IFMISSINGORNULL(TO_NUMBER(p.partySize),0)) periodTotal, allp period
   2 FROM (date_range_str("2023-08-07T00:00:00","2023-08-08T00:00:00","minute",15)) allp
   3    LEFT OUTER JOIN places p ON STR_TO_MILLIS(allp) = STR_TO_MILLIS(p.createdAt)-STR_TO_MILLIS(p.createdAt)%(15*60*1000)
   4 GROUP BY allp
   5 ORDER BY allp
   6 ;
{
    "requestID": "5d137f7a-1e56-498b-a8e8-dae914377bcd",
    "signature": {
        "periodTotal": "number",
        "period": "json"
    },
    "results": [
    {
        "periodTotal": 0,
        "period": "2023-08-07T00:00:00"
    },
    {
        "periodTotal": 0,
        "period": "2023-08-07T00:15:00"
    },
...
    {
        "periodTotal": 0,
        "period": "2023-08-07T23:45:00"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "110.494291ms",
        "executionTime": "110.317036ms",
        "resultCount": 96,
        "resultSize": 7008,
        "serviceLoad": 3,
        "sortCount": 96
    }
}

What version are you using ? - Perhaps it is a bug we’ve addressed.

am using
Community Edition 7.0.0 build 5302

OK, seems to be something we’ve addressed since 7.0.0. In 7.0.0 the WITH variant should work - does in my testing at least. As does @vsr1’s solution.

Am sorry, But when I try @vsr1 solution, i have the same problem

[
{
“count”: 3,
“createdAt”: “2023-08-07T00:30:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:30:00+05:30”
},
{
“count”: 8,
“createdAt”: “2023-08-07T02:30:00+05:30”
},
{
“count”: 1,
“createdAt”: “2023-08-07T03:30:00+05:30”
}
]

I have added an extra WHERE condition so that my index is used
CREATE INDEX idx_qp_createdAt ON places(createdAt,name) WHERE (tp = “qp”)

Strange, I installed 7.0.0-5302 and it worked… the exact statement I ran was:

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), 15*60*1000)*15*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", 15) AS d

      UNION ALL

      SELECT IDIV(STR_TO_MILLIS(createdAt), 15*60*1000)*15*60*1000 AS time, TO_NUMBER(partySize) AS partySize
      FROM places AS p
      WHERE p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t
GROUP BY time
ORDER BY createdAt;

and I got 16 results (with the example data).

As it is a UNION the first branch should be producing all 16 values - can you run that in isolation to confirm?

SELECT IDIV(STR_TO_MILLIS(d), 15*60*1000)*15*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", 15) AS d

Yes I did. And I have the 16 results too.

And more strangely when I ran the complete query for an interval for 15 minutes I get all the 16 results.

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), 15601000)1560*1000 AS time, 0 AS partySize
FROM DATE_RANGE_STR(“2023-08-07T01:00:00”, “2023-08-07T05:00:00”, “minute”, 15) AS d

  UNION ALL

  SELECT IDIV(STR_TO_MILLIS(createdAt), 15*60*1000)*15*60*1000 AS time, TO_NUMBER(partySize) AS partySize
  FROM places AS p
  WHERE p.tp="qp" AND p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t

GROUP BY time
ORDER BY createdAt;

output →
[
{
“count”: 3,
“createdAt”: “2023-08-07T01:00:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:15:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:30:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T01:45:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:00:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:15:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:30:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T02:45:00+05:30”
},
{
“count”: 2,
“createdAt”: “2023-08-07T03:00:00+05:30”
},
{
“count”: 6,
“createdAt”: “2023-08-07T03:15:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T03:30:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T03:45:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:00:00+05:30”
},
{
“count”: 1,
“createdAt”: “2023-08-07T04:15:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:30:00+05:30”
},
{
“count”: 0,
“createdAt”: “2023-08-07T04:45:00+05:30”
}
]

But when i run with an interval of 60 or 120 minutes (to group them by 1 hour or 2 hour)
I dont get the 0 valued intervals.

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), 120601000)12060*1000 AS time, 0 AS partySize
FROM DATE_RANGE_STR(“2023-08-07T01:00:00”, “2023-08-07T05:00:00”, “minute”, 15) AS d

  UNION ALL

  SELECT IDIV(STR_TO_MILLIS(createdAt), 120*60*1000)*120*60*1000 AS time, TO_NUMBER(partySize) AS partySize
  FROM places AS p
  WHERE p.tp="qp" AND p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-08T05:00:00" ) AS t

GROUP BY time
ORDER BY createdAt;

output

[
{
“count”: 3,
“createdAt”: “2023-08-06T23:30:00+05:30”
},
{
“count”: 8,
“createdAt”: “2023-08-07T01:30:00+05:30”
},
{
“count”: 1,
“createdAt”: “2023-08-07T03:30:00+05:30”
}
]

You need to change all the 15’s to 30’s or 60’s. So also the last argument in DATE_RANGE_STR. (Or you can use the original and just supply the parameter value.)

e.g. this gives 8 results:

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt                                                                    
FROM (SELECT IDIV(STR_TO_MILLIS(d), 30*60*1000)*30*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", 30) AS d

      UNION ALL

      SELECT IDIV(STR_TO_MILLIS(createdAt), 30*60*1000)*30*60*1000 AS time, TO_NUMBER(partySize) AS partySize
      FROM places AS p
      WHERE p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t
GROUP BY time
ORDER BY createdAt;

4 results:

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), 60*60*1000)*60*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", 60) AS d

      UNION ALL

      SELECT IDIV(STR_TO_MILLIS(createdAt), 60*60*1000)*60*60*1000 AS time, TO_NUMBER(partySize) AS partySize
      FROM places AS p
      WHERE p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t
GROUP BY time
ORDER BY createdAt;

2 results:

SELECT SUM(partySize) AS count, MILLIS_TO_STR(time) AS createdAt
FROM (SELECT IDIV(STR_TO_MILLIS(d), 120*60*1000)*120*60*1000 AS time, 0 AS partySize
      FROM DATE_RANGE_STR("2023-08-07T01:00:00", "2023-08-07T05:00:00", "minute", 120) AS d

      UNION ALL

      SELECT IDIV(STR_TO_MILLIS(createdAt), 120*60*1000)*120*60*1000 AS time, TO_NUMBER(partySize) AS partySize
      FROM places AS p
      WHERE p.createdAt >= "2023-08-07T01:00:00" AND p.createdAt < "2023-08-07T05:00:00" ) AS t
GROUP BY time
ORDER BY createdAt;

HTH.

2 Likes

I used named parameter to avoid many places change. Also check timezone (UTC) to represent properly. Also use filter Time when used days, or higher hours

My Bad , I missed adding the right number of minutes in the interval part.
Can you give me an example of filter time if am using a period of days. Also I was struggling with the time representation, what do I have to do so that the dates used are what I specify as start and end.

“createdAt”: “2023-08-07T01:00:00”
You don’t have timezone , when converted to MILLIS to STR, STR to MILLIS it might use query service timezone. Check those.

p.createdAt >= “2023-08-07T01:00:00” AND p.createdAt < “2023-08-07T05:00:00”
Your time start 1:00 say you have 2 hours it will consider (IDIV of 2 hours) 00:00:00 to 02:00:00 , …
You may loose 1 hour data in count
Same is true for higher hours or days.
It all depends on what you want how you calculate.

The following query generates based on start time on the predicate upto milliseconds.
Next one increments by interval minutes

If using 3 days set intrvl 3*24*60 (i.e. 3 days * 24 hours * 60 minutes)

$stime "2023-08-07T01:10:12.123Z"
$etime "2023-08-07T05:10:17.123Z"
$intrvl 15

SELECT SUM(partySize) AS count, MILLIS_TO_TZ(time,"UTC") AS createdAt
FROM (SELECT d AS time, MILLIS_TO_TZ(d,"UTC") AS t, 0 AS partySize
      FROM ARRAY_RANGE(STR_TO_MILLIS($stime), STR_TO_MILLIS($etime), $intrvl*60*1000) AS d

      UNION ALL

      SELECT (IDIV((STR_TO_MILLIS(createdAt) - STR_TO_MILLIS($stime)),$intrvl*60*1000)*$intrvl*60*1000)+STR_TO_MILLIS($stime) AS time,
            TO_NUMBER(partySize) AS partySize
      FROM [{ "createdAt": "2023-08-07T01:00:00Z", "name": "andrea" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T01:10:00Z", "name": "alex" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T03:10:00Z", "name": "bob" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T03:15:00Z", "name": "ben" , "partySize":"5", "tp":"qp"},
            { "createdAt": "2023-08-07T04:15:00Z", "name": "chris" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T03:03:00Z", "name": "bacca" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T03:17:00Z", "name": "buchi" , "partySize":"1", "tp":"qp"},
            { "createdAt": "2023-08-07T01:10:00Z", "name": "andy duplicate" , "partySize":"1", "tp":"qp"}] AS p
      WHERE p.createdAt BETWEEN $stime AND $etime ) AS t
GROUP BY time
ORDER BY createdAt;

Output removed 0 count
  {
        "count": 2,
        "createdAt": "2023-08-07T02:55:12.123Z"
    },
    {
        "count": 6,
        "createdAt": "2023-08-07T03:10:12.123Z"
    },
    {
        "count": 1,
        "createdAt": "2023-08-07T04:10:12.123Z"
    }

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.