I’ll continue this thread as I’m also experiencing slow performance issue with count(*)
query on Couchbase 4.5 Beta.
My use case is very typical and extremely simple. I store user session documents and each document has user
attribute which contains user identifier as a string.
Document example:
{
...
"user": "123456789",
...
}
I created the following index:
CREATE INDEX `user-idx` ON `default`(`user`) USING GSI;
Now I want to find 10 users with biggest number of open (existing) sessions. The query is simple:
select `user`, count(*) from `default` group by `user` order by count(*) desc limit 10;
That query takes ~8sec on ~50k documents in the database. Extremely long!
I searched the forum and found a suggestion to add where field is not null
clause to the query.
And here is the new query:
select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10;
This query takes ~2.5 sec, yes 3 times faster but still very slow comparing to RDBMS.
Here is the EXPLAIN
output:
cbq> explain select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10;
{
"requestID": "ca80774d-cd2c-4cbe-a318-924627b68991",
"signature": "json",
"results": [
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"covers": [
"cover ((`default`.`user`))",
"cover ((meta(`default`).`id`))"
],
"index": "user-idx",
"index_id": "f631c4c57f717e25",
"keyspace": "default",
"namespace": "default",
"spans": [
{
"Range": {
"Inclusion": 0,
"Low": [
"null"
]
}
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(cover ((`default`.`user`)) is not null)"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(*)"
],
"group_keys": [
"cover ((`default`.`user`))"
]
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(*)"
],
"group_keys": [
"cover ((`default`.`user`))"
]
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(*)"
],
"group_keys": [
"cover ((`default`.`user`))"
]
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "cover ((`default`.`user`))"
},
{
"expr": "count(*)"
}
]
}
]
}
}
]
},
{
"#operator": "Order",
"limit": "10",
"sort_terms": [
{
"desc": true,
"expr": "count(*)"
}
]
},
{
"#operator": "Limit",
"expr": "10"
},
{
"#operator": "FinalProject"
}
]
},
"text": "select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10"
}
],
"status": "success",
"metrics": {
"elapsedTime": "4.195335ms",
"executionTime": "4.101078ms",
"resultCount": 1,
"resultSize": 4955
}
}
As you can see the query uses IndexScan
, not IndexCountScan
and I don’t know why.
Unless I’m doing something wrong (what could it be??) I don’t really see that the problem is fixed in 4.5 Beta.