Hi, I have a bucket with 2M documents (each document is less than 2KB). When I try range scan query with N1QL, it takes more than 1 minute to finish the query (my query should scan 1.2M documents). This is slower than my expectation. I try similar query on a MongoDB cluster, it takes 4s to scan 1M documents over 3M documents.
Any tips to make the query faster? I doubt whether I’m using Couchbase in a wrong way:(
More contexts here:
- My query is “SELECT COUNT(*), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN ‘2016-11-22T01:00:00’ AND ‘2016-11-23T06:00:00’”. Here is one response.
{
"requestID": "d37772c4-10cf-4095-a156-8c739ac81c2d",
"clientContextID": "59::60",
"signature": {
"$1": "number",
"$2": "number"
},
"results": [
{
"$1": 1266665,
"$2": 0.49988867718860125
}
],
"status": 0,
"errors": [],
"warnings": [],
"metrics": {
"elapsedTime": "1m22.4896596s",
"executionTime": "1m22.48869s",
"resultCount": 1,
"resultSize": 76,
"mutationCount": 0,
"errorCount": 0,
"warningCount": 0,
"sortCount": 0
}
}
- Field “timestampUtc” is a datetime and I have created GSI INDEX on it.
Definition: CREATE INDEX timestampUtc_index ON test(`timestampUtc`) USING GSI
- This is explanation of my query.
{
"requestID": "a9628394-8d00-42b6-9084-248b4ddc1f75",
"clientContextID": "61::62",
"signature": "json",
"results": [
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "timestamputc_index",
"index_id": "93504482a482e79c",
"keyspace": "test",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"2016-11-23T06:00:00\""
],
"Inclusion": 3,
"Low": [
"\"2016-11-22T01:00:00\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "test",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "((`test`.`timestampUtc`) between \"2016-11-22T01:00:00\" and \"2016-11-23T06:00:00\")"
},
{
"#operator": "InitialGroup",
"aggregates": [
"avg((`test`.`edgeLatency`))",
"count(*)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"avg((`test`.`edgeLatency`))",
"count(*)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"avg((`test`.`edgeLatency`))",
"count(*)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "count(*)"
},
{
"expr": "avg((`test`.`edgeLatency`))"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT COUNT(*), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'"
}
],
"status": 0,
"errors": [],
"warnings": [],
"metrics": {
"elapsedTime": "5.9998ms",
"executionTime": "4.9983ms",
"resultCount": 1,
"resultSize": 3861,
"mutationCount": 0,
"errorCount": 0,
"warningCount": 0,
"sortCount": 0
}
}
- I have set max_parallelism=8 in my query. This makes my query faster but I cannot get extra improvement if max_parallelism is greater than 8.
- My Couchbase cluster has 12 nodes, each node is a DS13_v2 Azure VM. My bucket is the only bucket running on cluter except for a empty default bucket. My bucket has 60GB RAM quota.
|Size|CPU cores|Memory: GiB|Local SSD: GiB|Max data disks|Max cached disk throughput: IOPS / MBps (cache size in GiB)|Max uncached disk throughput: IOPS / MBps|Max NICs / Network bandwidth|
| ------| ------ | ------ |
|Standard_DS13_v2|8|56|112|16|32,000 / 256 (288)|25,600 / 384|8 high| - My cluster is using Couchbase Server 4.0.0-4051 Community Edition (build-4051). I have tried 4.1 CE and 4.5.1 EE on single machine and it has same performance. With MOI in 4.5.1 EE, the query is faster but it still takes more than 1 minute.
- Covering index doesn’t fit my scenarios.