We have a requirement involving a data set of 70 million records. The dev team aims to display the count of a specific document type in the user interface using a group by query, along with other field values from this extensive data set. However, creating a covering index for this purpose would be resource-intensive. Is there an alternative method to maintain the count of these particular document types, allowing us to efficiently retrieve the count from the 70 million documents. Below is sample query and similarly there are around 10-12 queries.
select network, zone, regnw,count(1) from CRM where __t =âSRâ group by network, zone, regnw ;
See my comment above and vsr1âs last comment. Put documents of different types (â__tâ) in separate collections. The count of documents in a collection is accessible without a scan. Otherwise - it will require a scan.
You are counting based on group, keeping separate collection also will not help
SELECT network, zone, regnw, count(1)
FROM CRM AS c
WHERE __t ="SR"
GROUP BY network, zone, regnw ;
CREATE INDEX i1 ON CRN(__t, network, zone, regnw);
Use above index. 70M entries take time. Depends on how many types and how many groups and how often changes you can generate new rows in different collection consolidated results and get those with out aggregates or cache results.
UPSERT INTO newcol1 (KEY k, VALUE v )
SELECT CONCAT2("_", __t, TOSTR(network), TOSTR(zone), TOSTR(regnw)) AS k.
{__t, network, zone, regnw, "cnt":count(1)} AS v
FROM CRM AS c
WHERE __t IS NOT NULL
GROUP BY __t, network, zone, regnw ;
@vsr1, The approach that you suggested seems like a shift-left of the GROUP BY, meaning that instead of running the aggregation when counts are needed, we do it at the time of mutation. Is that correct?
However, since mutations occur at a rate of hundreds or thousands per second, this approach may not be feasible.
Additionally, since the mutation happens via KV access, the application would require two network round tripsâone for the KV-based mutation and another for the UPSERT.
We had considered a similar approach where the mutation happens via KV, and we maintain a separate document, incrementing a count by 1 each time using another KV mutation. This eliminates the need for an UPSERT, but it still suffers from the same challengeâtwo network round trips!