Was wonder about the plausibility of a query type, suppose the I have a (Array) document:
[{ timestamp: ““2020-11-10T18:00:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:02:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:04:04.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:10:03.437611Z””, values: {…}}…]
I want to query the document and fetch items which have a 5 minutes interval between them, so from the above document i am suppose to be left with only two (the 1st and last entries) after executing the query.
How might you go about it? any help would very appreciated
what if i have a few in that same minute? its exactly what I am hopeful of learning, is there a way i could preserve some state that the where clause would rely on ?
this returns only the two first entries regardless of any other that might exist…
if the initial (Array) document would have been bigger, this query would have reflected it but with two u1\u2 entries with the same first 2 values detected but not incrementally
Try this it will work. There are two loops. Outer loop walks through all elements. If there is match with inner loop it included element and terminate inner loop for that element no need check other elements.
ARRAY (FIRST u1
FOR u2 IN arr1
WHEN ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000
END)
FOR u1 IN arr1
END
Thank you for the fast reply(!)
was not exactly sure how you meant i should put it, did smth like this:
SELECT RAW x FROM [0] dd
JOIN (
SELECT RAW entry
FROM Data entries UNNEST entries entry
WHERE STR_TO_MILLIS(entry.timestamp) >= STR_TO_MILLIS(“2020-11-10T18:00:03.437611Z”)
AND STR_TO_MILLIS(entry.timestamp) < STR_TO_MILLIS(“2020-11-10T19:17:40.7401607Z”)
) arr1 ON TRUE
LET x = ARRAY (FIRST u1
FOR u2 IN arr1
WHEN ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
END
) FOR u1 IN arr1 END
got an array of 364 nulls, wondering what am i missing
again, thanks for the help fellas, I appreciate it immensely
SELECT
ARRAY (FIRST u1 FOR u2 IN d.entries
WHEN u2.timestamp >= "2020-11-10T18:00:03.437611Z" AND u2.timestamp < "2020-11-10T19:17:40.7401607Z"
AND ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
END)
FOR u1 IN d.entries
WHEN u1.timestamp >= "2020-11-10T18:00:03.437611Z" AND u1.timestamp < "2020-11-10T19:17:40.7401607Z"
END AS res
FROM Data AS d
WHERE ANY v IN d.entries SATISFIES v.timestamp >= "2020-11-10T18:00:03.437611Z" AND v.timestamp < "2020-11-10T19:17:40.7401607Z" END;
CREATE INDEX ix1 ON Data( DISTINCT e.timestamp FOR e IN entries END);
Right, but I still think it is a typical statistics/analytics problem very close to building a histogram. Split your data into 5 minute bins and pick an item with max timestamp from each bin. It is not clear from your description how to treat gaps longer then 5 minutes though.
Thank you, @vsr1, Your solution is way above my level of expertise in N1QL. I do not need it right now but looking for implementing some standard analytics tasks in Community Edition lacking Analytics module.
Hi fellas, ended up going another way by mix and matching, thanks for everyone’s time!
FROM DATE_RANGE_STR(“2020-11-10T18:00:03.437611Z”, “2020-11-10T19:17:40.7401607Z”, ‘minute’, 5) time
LET
entries = (FROM TheData ens UNNEST ens e
WHERE META(ens).id LIKE “Entry_92c8546a-7564-45c1-8f60-e7a37987f580_3f30f81d-4b54-4693-bb3e-76969eee1e90_%”
AND MILLIS(e.timestamp) >= MILLIS(“2020-11-10T18:00:03.437611Z”)
AND MILLIS(e.timestamp) < MILLIS(“2020-11-10T19:17:40.7401607Z”)
SELECT RAW e)
, sel_ = FIRST v FOR i:v IN entries WHEN MILLIS(v.timestamp) > MILLIS(time) END
WHERE sel_ IS NOT NULL
SELECT RAW ARRAY_AGG(sel_)
LIMIT 1000
hopes this help somebody
i tried to join the subquery (TheData) but to no avail, hopefully this will just keep getting the data from a cache… will follow up
again thanks for the time!