Could you please post the sample document and index definition.
NOTE: Predicate date LIKE “%06/07/2017%” AND id IS NOT MISSING requires full index scan on the whole index (due to leading %). i.e. 10 million documents. On top it needs to GROUP BY.
I’ll post an example doc later today. Meanwhile, if we drop % and IS
MISSING, how much faster you think the execution will be? Also, will
primary index be enough, or do we need to create a specific secondary index?
It depends. The reason I asked sample document is
want date field format in the document if it is ISO 8601 timestamp it is much easier to query(i.e. “2017-07-06T10:02:46.808-07:00”). You need leading % and can push partial string to the Indexer and produce less items.
Is date format is same in all the documents.
Also requests (i.e sum argument) is not bucket name Requests
Based on that I will suggest index so that avoid fetch.
NOTE: provide sample document that fields referenced in query only
Requests is the name of the bucket, and requests is the field of the
document.
Here is how our date field of the corresponding document looks like:
“date”: "06/07/2017 22-56-04.648"
With this formatting its simply easier to read the date and time, comparing
to the ISO 8601 timestamp format. If needed, we can definitely change it to
whatever will make CouchBase’s life easier.
Here is the (shortened) example structure of the document:
For your query create the following index. It uses covered index (i.e no data Fetch).
CREATE INDEX ix1 ON Requests(date, id, name, requests);
SELECT name, SUM(requests)
FROM Requests
WHERE date LIKE "06/07/2017%" AND id IS NOT MISSING
GROUP BY name;
If the date is ISO 8601 format you can even do this
Only 2016 and 2017 first 6 months data
SELECT name, SUM(requests)
FROM Requests
WHERE date BETWEEN "2016-01-01" AND "2017-06-30" AND id IS NOT MISSING
GROUP BY name;
Only 2017 and group BY month
SELECT SUBSTR(date,5,2) AS month, SUM(requests)
FROM Requests
WHERE date BETWEEN "2017-01-01" AND "2017-12-31" AND id IS NOT MISSING
GROUP BY SUBSTR(date,5,2);
SELECT DATE_PART_STR(date,"month") AS month, SUM(requests)
FROM Requests
WHERE date BETWEEN "2017-01-01" AND "2017-12-31" AND id IS NOT MISSING
GROUP BY DATE_PART_STR(date,"month");
In ISO 8601 format if format is differ also comparable (i.e “2017-07-06T20:43:10.033-07:00” > “2017-07-06”)
You can also set max_parallelism parameter to make it parallel.
Following N1QL: A PRACTICAL GUIDE will have useful information.
By the way - it looks like CB 4.5 doesn’t follow N1QL BETWEEN specification - results that are being returned by the querying service don’t include the ending date.
It should have followed. Can you post query and sample document
Also try x BETEEN start AND end ====> x >= start AND x <= end and see if you getting results.
SELECT country, COUNT(*) AS requests
FROM Request_IDs
WHERE date BETWEEN “2017-07-11” AND "2017-07-12"
AND country='United States’
AND id IS NOT MISSING
GROUP BY country;
If the date string does not explicitly declare the value of a component, then a value of 0 is assumed. For example 2016-02-07 is equivalent to 2016-02-07T00:00:00. This is with the exception of the time zone, which if unspecified in the date string will default to the local system time zone.
Sorry, I didn’t paste the proper date implementation - we do follow the ISO 8601 specification.
For now we are adding 1 more day to the closing date of BETWEEN query - in order to get the data for the day before. We can live with that but it’s worth checking…