CB Enterprise Edition 5.5.0 build 2958.
15 Data-only nodes, 4 Index-only nodes, 4 Query-only nodes. All nodes have 32 GB RAM.
We have been getting a lot of the above errors/failures lately.
(And the beauty of it is that in cbq you just get the prompt back with no error response, while in the GUI you get “unexpected error”.)
I tested one of the queries that hits this often. It works fine on a smaller subset of data (e.g. only events from the past 10 days), but if I widen the horizon (even to 15 days) it starts to fail. If I shut down all other systems interfacing with Couchbase, it works fine, so something is evidently competing for resources.
I saw that people mentioned OOM for a similar issue, but I don’t think that is the case here: we don’t see any OOM-related errors. Also, in the admin GUI, RAM usage on the Query servers never goes above 50%. (BTW, why can’t we see CPU/RAM utilization graphs for an entire server rather than just per bucket?)
In the GUI log it shows up as:
github.com/couchbase/query/execution.(*base).sendItem(0xc422993a40, 0x18d0980, 0xc420dac240, 0xe31340)
goproj/src/github.com/couchbase/query/execution/base.go:348 +0x57 fp=0xc421040db8 sp=0xc421040d78
github.com/couchbase/query/execution.(*IndexScan3).RunOnce.func1()
goproj/src/github.com/couchbase/query/execution/scan_index3.go:127 +0x5e4 fp=0xc421040f50 sp=0xc421040db8
github.com/couchbase/query/util.(*Once).Do(0xc422993b38, 0xc420b10f88)
goproj/src/github.com/couchbase/query/util/sync.go:51 +0x68 fp=0xc421040f78 sp=0xc421040f50
github.com/couchbase/query/execution.(*IndexScan3).RunOnce(0xc422993a40, 0xc420927080, 0x0, 0x0)
goproj/src/github.com/couchbase/query/execution/scan_index3.go:140 +0x9d fp=0xc421040fc0 sp=0xc421040f78
runtime.goexit()
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc421040fc8 sp=0xc421040fc0
created by github.com/couchbase/query/execution.(*base).runConsumer.func1
goproj/src/github.com/couchbase/query/execution/base.go:537 +0x2f6
[goport(/opt/couchbase/bin/cbq-engine)] 2018/08/24 12:56:52 child process exited with status 134
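The exit status in the last line may be worth decoding. Assuming goport reports it with the usual shell convention (status above 128 means "killed by signal status − 128"), which is an assumption we have not checked against goport itself, 134 corresponds to signal 6, which is SIGABRT on Linux. The Go runtime raises SIGABRT on a fatal error when `GOTRACEBACK=crash` is set, and that level also prints the `fp=`/`sp=` annotations seen in the trace, so this would be consistent with the whole cbq-engine process crashing rather than a single query failing. A small Go sketch of the decoding:

```go
package main

import "fmt"

// decodeExitStatus applies the common shell convention where a status in
// 129..192 means "terminated by signal (status - 128)".
func decodeExitStatus(status int) (sig int, bySignal bool) {
	if status > 128 && status <= 128+64 {
		return status - 128, true
	}
	return 0, false
}

// signalNames covers only a few Linux signal numbers relevant here.
var signalNames = map[int]string{
	6:  "SIGABRT",
	9:  "SIGKILL",
	11: "SIGSEGV",
	15: "SIGTERM",
}

func main() {
	if sig, ok := decodeExitStatus(134); ok {
		fmt.Printf("signal %d (%s)\n", sig, signalNames[sig]) // signal 6 (SIGABRT)
	}
}
```

A SIGKILL (status 137) would instead point more towards the kernel OOM killer, which is one way to rule that theory in or out.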
I have also attached a section of query.log from when this happens.
query_log_snippet.log.zip (60.6 KB)
Any ideas?
Thanks