Help with DISTINCT Query?

dbragg · May 29, 2019, 8:59pm

I have a bucket with records structured as follows:
{
  "Id": null,
  "ApplicationName": "MyApp",
  "ApplicationInstance": "Testing",
  "TimeStamp": "2019-03-22T06:28:03.3798162-07:00",
  "LogLevel": "Info",
  "Message": "GetWaitingEmailMessagesCommand Success",
  "MetaData": {
    "request": {},
    "response": {
      "ExecutionTime": 69,
      "Result": ,
      "Exception": null,
      "ResultType": 0,
      "SourceCommandType": "EMT.Common.Business.BusinessCommands.EmailHistoryCommands.GetWaitingEmailMessagesCommand, EMT.Common.Business, Version=4.2.0.0, Culture=neutral, PublicKeyToken=null",
      "ShortSourceName": "GetWaitingEmailMessagesCommand",
      "ValidationResult": null
    }
  }
}

And an index that looks like this:
CREATE INDEX Logging_ApplicationName ON CentralLogs(ApplicationName)

And I am trying to run this query:

SELECT DISTINCT ApplicationName
FROM CentralLogs
WHERE ApplicationName IS NOT MISSING

The query uses the index, but it takes far too long and usually times out. It sometimes comes back with a response, but as the number of documents has grown (now 35 million) it rarely succeeds.

I’ve also tried:

SELECT ApplicationName
FROM CentralLogs
WHERE ApplicationInstance IS NOT MISSING
GROUP BY ApplicationName

…with the same result. The correct index is used, but still takes too long.

Can anyone suggest a query/index strategy to get these distinct values from the bucket?

vsr1 · May 29, 2019, 10:09pm

The index and query are right. With 35 million items, the query simply takes time. If you have Enterprise Edition (EE), the GROUP BY query can take advantage of index partitioning and index aggregation, described here: https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/, https://blog.couchbase.com/couchbase-gsi-index-partitioning/

CREATE INDEX Logging_ApplicationName ON CentralLogs(ApplicationName);

SELECT ApplicationName
FROM CentralLogs
WHERE ApplicationName IS NOT MISSING
GROUP BY ApplicationName;

SELECT DISTINCT ApplicationName
FROM CentralLogs
WHERE ApplicationName IS NOT MISSING;
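
For the partitioning option, a minimal sketch of what a partitioned version of the index might look like, assuming Enterprise Edition 5.5 or later. The index name Logging_ApplicationName_Part and the num_partition value are illustrative choices, not something from the original post:

```sql
/* Hypothetical partitioned variant of the same index (Enterprise Edition only).
   The index name and the partition count are assumptions for illustration. */
CREATE INDEX Logging_ApplicationName_Part ON CentralLogs(ApplicationName)
PARTITION BY HASH(ApplicationName)
WITH {"num_partition": 8};

/* Inspect the plan to see whether the grouping is pushed down to the indexer. */
EXPLAIN SELECT ApplicationName
FROM CentralLogs
WHERE ApplicationName IS NOT MISSING
GROUP BY ApplicationName;
```

If the grouping is pushed down, the index scan in the EXPLAIN output should carry an index_group_aggs entry. Whether that is fast enough at 35 million documents depends on the cluster, so treat this as something to try rather than a guaranteed fix.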
