Query to fetch items at certain interval between them (if it exist)

mazorsk · November 18, 2020, 8:05pm

Hi couchbasia!

Was wonder about the plausibility of a query type, suppose the I have a (Array) document:
[{ timestamp: ““2020-11-10T18:00:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:02:03.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:04:04.437611Z””, values: {…}}, { timestamp: ““2020-11-10T18:10:03.437611Z””, values: {…}}…]

I want to query the document and fetch items which have a 5 minutes interval between them, so from the above document i am suppose to be left with only two (the 1st and last entries) after executing the query.
How might you go about it? any help would very appreciated

Not sure know if its the correct place to post

mazorsk · November 18, 2020, 8:40pm

PS this is a solution as a view

function (doc, meta) {
var iv = doc.length > 0 ? new Date(doc[0].timestamp) : new Date();

for(var n in doc)
{
var e = doc[n];
var tm = new Date(e.timestamp);

if((iv.getTime() + 5 * 60 * 1000) < tm.getTime())
{
  iv = tm;
  emit(null, e);
}

}
}

thanks!

mazorsk · November 18, 2020, 9:20pm

i did not used the view since i do not know how to pass parameters to it (using .net sdk)

skaryshev · November 18, 2020, 10:55pm

Extract integer minutes from timestamp and use integer division / modulo
WHERE TONUMBER(DATE_PART_STR(timestamp, ‘minute’)) % 5 = 0

vsr1 · November 18, 2020, 11:23pm

ARRAY  (FIRST   u1
        FOR u2 IN arr1 
        WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  
        END)
FOR u1 IN arr1 
END

It includes all documents that has 5 minute apart even there are multiple in same minute

mazorsk · November 18, 2020, 11:36pm

what if i have a few in that same minute? its exactly what I am hopeful of learning, is there a way i could preserve some state that the where clause would rely on ?

mazorsk · November 18, 2020, 11:37pm

this returns only the two first entries regardless of any other that might exist…
if the initial (Array) document would have been bigger, this query would have reflected it but with two u1\u2 entries with the same first 2 values detected but not incrementally

vsr1 · November 18, 2020, 11:39pm

Try this it will work. There are two loops. Outer loop walks through all elements. If there is match with inner loop it included element and terminate inner loop for that element no need check other elements.

ARRAY  (FIRST   u1
        FOR u2 IN arr1 
        WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  
        END)
FOR u1 IN arr1 
END

mazorsk · November 19, 2020, 12:01am

Thank you for the fast reply(!)
was not exactly sure how you meant i should put it, did smth like this:

SELECT RAW x FROM [0] dd
JOIN (
SELECT RAW entry
FROM Data entries UNNEST entries entry
WHERE STR_TO_MILLIS(entry.timestamp) >= STR_TO_MILLIS(“2020-11-10T18:00:03.437611Z”)
AND STR_TO_MILLIS(entry.timestamp) < STR_TO_MILLIS(“2020-11-10T19:17:40.7401607Z”)
) arr1 ON TRUE
LET x = ARRAY (FIRST u1
FOR u2 IN arr1
WHEN ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
END
) FOR u1 IN arr1 END

got an array of 364 nulls, wondering what am i missing
again, thanks for the help fellas, I appreciate it immensely

vsr1 · November 19, 2020, 12:07am

Your original post expected is wrong. because 00, 03, 04, 10 minutes. All 4 qualifies because first 3 are 5 min apart from last

Change last one 18:06:03 it will get right results

SELECT ARRAY  (FIRST   u1 FOR u2 IN arr1 WHEN  ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5 *60*1000  END) FOR u1 IN arr1 END AS res
LET arr1 = [{ "timestamp": "2020-11-10T18:00:03.437611Z"}, { "timestamp": "2020-11-10T18:02:03.437611Z"}, { "timestamp": "2020-11-10T18:04:04.437611Z"}, { "timestamp": "2020-11-10T18:06:03.437611Z"}];

 "results": [
    {
        "res": [
            {
                "timestamp": "2020-11-10T18:00:03.437611Z"
            },
            {
                "timestamp": "2020-11-10T18:06:03.437611Z"
            }
        ]
    }
    ]

vsr1 · November 19, 2020, 12:17am

SELECT
    ARRAY (FIRST  u1 FOR u2 IN d.entries
           WHEN u2.timestamp >= "2020-11-10T18:00:03.437611Z" AND u2.timestamp < "2020-11-10T19:17:40.7401607Z"
                AND ABS(MILLIS(u1.timestamp)-MILLIS(u2.timestamp)) > 5601000
           END)
    FOR u1 IN d.entries
    WHEN u1.timestamp >= "2020-11-10T18:00:03.437611Z" AND u1.timestamp < "2020-11-10T19:17:40.7401607Z"
    END AS res
FROM Data AS d
WHERE ANY v IN d.entries SATISFIES v.timestamp >= "2020-11-10T18:00:03.437611Z" AND v.timestamp < "2020-11-10T19:17:40.7401607Z" END;

CREATE INDEX ix1 ON Data( DISTINCT e.timestamp FOR e IN entries END);

skaryshev · November 19, 2020, 4:01am

Right, but I still think it is a typical statistics/analytics problem very close to building a histogram. Split your data into 5 minute bins and pick an item with max timestamp from each bin. It is not clear from your description how to treat gaps longer then 5 minutes though.

vsr1 · November 19, 2020, 3:41pm

Hi @skaryshev,

If you need that you can try this. Grouping 0, 5th, 10th, 15th,… minute and projecting latest timestamp in the group

SELECT MAX([e.timestamp,e])[1].* 
FROM default AS e
WHERE ..............
GROUP BY IDIV(MILLIS(e.timestamp),300000)

skaryshev · November 19, 2020, 4:42pm

Thank you, @vsr1, Your solution is way above my level of expertise in N1QL. I do not need it right now but looking for implementing some standard analytics tasks in Community Edition lacking Analytics module.

mazorsk · November 19, 2020, 9:19pm

Hi fellas, ended up going another way by mix and matching, thanks for everyone’s time!

FROM DATE_RANGE_STR(“2020-11-10T18:00:03.437611Z”, “2020-11-10T19:17:40.7401607Z”, ‘minute’, 5) time
LET
entries = (FROM TheData ens UNNEST ens e
WHERE META(ens).id LIKE “Entry_92c8546a-7564-45c1-8f60-e7a37987f580_3f30f81d-4b54-4693-bb3e-76969eee1e90_%”
AND MILLIS(e.timestamp) >= MILLIS(“2020-11-10T18:00:03.437611Z”)
AND MILLIS(e.timestamp) < MILLIS(“2020-11-10T19:17:40.7401607Z”)
SELECT RAW e)
, sel_ = FIRST v FOR i:v IN entries WHEN MILLIS(v.timestamp) > MILLIS(time) END
WHERE sel_ IS NOT NULL
SELECT RAW ARRAY_AGG(sel_)
LIMIT 1000

hopes this help somebody

i tried to join the subquery (TheData) but to no avail, hopefully this will just keep getting the data from a cache… will follow up
again thanks for the time!

Topic		Replies	Views
Querying couch base view by multiple values Couchbase Server	0	739	May 16, 2019
Query series data with range Couchbase Server query , n1ql	8	79	March 26, 2025
Querying a View based on Multiple Keys to get a range of data Java SDK	7	6650	July 2, 2015
Interval and nearest match SQL++	22	2739	May 2, 2017
Inaccurate match on date SQL++	5	1407	April 6, 2017

Query to fetch items at certain interval between them (if it exist)

Related topics