Hello,
I’ve been helping a customer get the most out of their Couchbase/Sync Gateway servers and we’ve been encountering some issues that we would like to clarify in order to get an understanding of what is going on with inconsistent behaviours/performance when using different parts of the Couchbase stack.
So, we have a query that goes something along these lines:
SELECT *
FROM process
WHERE meta().id NOT LIKE "_sync%"
AND type = "workflow-job"
AND entity.state = "Pending"
AND entity.runAfter = "2018-02-26T09:07:57.693Z"
AND array_count(process.entity.dependsOnJobs) = 0
AND entity.runOn = "BCG-BE-6C4AE161"
ORDER BY entity.startDate ASC
LIMIT 10
Execution times for this query, without any index have been in the order of 10 to 40s, so in order to optimize things we’ve introduced an index for this specific query that goes like this:
CREATE INDEX idx_test ON `process`(type, entity.state) WHERE type = "workflow-job" AND entity.state = "Pending"
We also have another similar index that goes like this:
CREATE INDEX process_workflow_index2 ON `process`(entity.state) WHERE meta().id NOT LIKE "_sync%" AND type = "workflow-job" AND entity.state = "Pending"
The explain plan goes something like this:
[
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IntersectScan",
"scans": [
{
"#operator": "IndexScan",
"index": "idx_test",
"index_id": "b177b75eb23e40f6",
"keyspace": "process",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"workflow-job\"",
"\"Pending\""
],
"Inclusion": 3,
"Low": [
"\"workflow-job\"",
"\"Pending\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "IndexScan",
"index": "process_workflow_index2",
"index_id": "500a23cdebc0904a",
"keyspace": "process",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"Pending\""
],
"Inclusion": 3,
"Low": [
"\"Pending\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "IndexScan",
"index": "#primary",
"index_id": "f7935ad2b645fee2",
"keyspace": "process",
"namespace": "default",
"spans": [
{
"Range": {
"Inclusion": 0,
"Low": [
"null"
]
}
}
],
"using": "gsi"
}
]
},
{
"#operator": "Fetch",
"keyspace": "process",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "((((((not ((meta(`process`).`id`) like \"_sync:%\")) and ((`process`.`type`) = \"workflow-job\")) and (((`process`.`entity`).`state`) = \"Pending\")) and (((`process`.`entity`).`runAfter`) < \"2018-02-26T09:07:57.693Z\")) and (array_count(((`process`.`entity`).`dependsOnJobs`)) = 0)) and (((`process`.`entity`).`runOn`) = \"BGC-BE-6C4AE161\"))"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "self",
"star": true
}
]
}
]
}
}
]
},
{
"#operator": "Order",
"limit": "10",
"sort_terms": [
{
"expr": "((`process`.`entity`).`startDate`)"
}
]
},
{
"#operator": "Limit",
"expr": "10"
},
{
"#operator": "FinalProject"
}
]
},
"text": "SELECT * \nFROM process \nWHERE meta().id not like \"_sync:%\" \nAND type = \"workflow-job\" \nAND entity.state = \"Pending\" \nAND entity.runAfter < \"2018-02-26T09:07:57.693Z\" \nAND array_count(process.entity.dependsOnJobs) = 0 \nAND entity.runOn = \"BGC-BE-6C4AE161\"\nORDER BY entity.startDate ASC \nLIMIT 10"
}
]
Now, when using the query workbench, we’ve been able to consistently get responses within 100 to 800ms (quite an interval, but way better than it used to be).
However, when running the same exact query from the application code using Couchbase SDK, the response times will always increase back to 10 to 40s to get a reply and sometimes we will even get OOM errors that stop the application execution.
Why is this happening? Why do we get such inconsistent behaviour?
Also, why does the index have to have (type, entity.state) when we already use a WHERE clause in the index definition? At least the type should be able to be dismissed from the index (because of the WHERE) however once we remove it the query performance goes down significantly and the explain shows that the used index is the primary one (which should not exist, in the first place).
We are using:
- Couchbase Community 4.5.1 (soon to be upgraded to Enterprise 4.6) and
- Couchbase JS SDK v2.3.5
PS:
As a sidenote, I know that both presented indexes have, more or less, the same information (they are alike), thus the reason why the explain uses both indexes