4.5.0 Core dumps with Circular Write Mode compaction for Indexes

We are using 4.5.0-2601 Enterprise Edition (build-2601). I should note that we upgraded from 4.1.1 a couple of days ago.

Circular Write Mode compaction was scheduled to start at 2:22 AM. A series of core dumps began at 2:22 AM on the Couchbase node hosting the index and continued until 3:58 AM, at which point all disk space had been consumed.

Is this a known issue? If not, what information (core, logs, etc.) do you need to investigate? (The core file is ~10 GB.)

Here are the entries from the babysitter.log file for the first core dump:

[ns_server:info,2016-09-08T02:22:13.143-04:00,babysitter_of_ns_1@127.0.0.1:<0.122.0>:supervisor_cushion:handle_info:58]Cushion managed supervisor for indexer failed: {abnormal,1}
[error_logger:error,2016-09-08T02:22:13.143-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server <0.123.0> terminating
** Last message in was {#Port<0.3538>,{exit_status,1}}
** When Server state == {state,#Port<0.3538>,indexer,
{["[goport] 2016/09/08 02:22:13 /opt/couchbase/bin/indexer terminated: signal: aborted (core dumped)",
"\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/net/http/transport.go:853 +0x10a6",
"created by net/http.(*Transport).dialConn",
"\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8231abfb0 sp=0xc8231abfa8"],
["runtime.goexit()"]},
indexer,undefined,[],0}
** Reason for termination ==
** {abnormal,1}

[error_logger:error,2016-09-08T02:22:13.144-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: ns_port_server:init/1
pid: <0.123.0>
registered_name: []
exception exit: {abnormal,1}
in function gen_server:terminate/6 (gen_server.erl, line 744)
ancestors: [<0.122.0>,<0.121.0>,ns_child_ports_sup,ns_babysitter_sup,
<0.56.0>]
messages: [{'EXIT',#Port<0.3538>,normal}]
links: [<0.122.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 196650
stack_size: 27
reductions: 8440393
neighbours:

[ns_server:debug,2016-09-08T02:22:13.144-04:00,babysitter_of_ns_1@127.0.0.1:<0.129.0>:supervisor_cushion:init:39]starting ns_port_server with delay of 5000
[error_logger:error,2016-09-08T02:22:13.144-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server <0.122.0> terminating
** Last message in was {die,{abnormal,1}}
** When Server state == {state,indexer,5000,
{1473,284053,337615},
undefined,infinity}
** Reason for termination ==
** {abnormal,1}

[error_logger:error,2016-09-08T02:22:13.145-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.121.0>
registered_name: []
exception exit: {abnormal,1}
in function restartable:loop/4 (src/restartable.erl, line 69)
ancestors: [ns_child_ports_sup,ns_babysitter_sup,<0.56.0>]
messages: []
links: [<0.69.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 2586
stack_size: 27
reductions: 1482
neighbours:

[error_logger:error,2016-09-08T02:22:13.145-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,ns_child_ports_sup}
Context: child_terminated
Reason: {abnormal,1}
Offender: [{pid,<0.121.0>},
{name,
{indexer,"/opt/couchbase/bin/goport",[],
[use_stdio,exit_status,stderr_to_stdout,stream,
{log,"indexer.log"},
{env,
[{"GOPORT_ARGS",
"["/opt/couchbase/bin/indexer","-vbuckets=1024","-cluster=127.0.0.1:8091","-adminPort=9100","-scanPort=9101","-httpPort=9102","-streamInitPort=9103","-streamCatchupPort=9104","-streamMaintPort=9105","-storageDir=/opt/couchbase/var/lib/couchbase/data/@2i","-diagDir=/opt/couchbase/var/lib/couchbase/crash","-nodeUUID=9cb6edda57a624f08193c894ef30c1ce","-storageMode=forestdb"]"},
{"GOTRACEBACK","crash"},
{"CBAUTH_REVRPC_URL",
"http://%40:67c09ea0f990c89ccbc3532fd6b7b362@127.0.0.1:8091/index"}]}]}},
{mfargs,
{restartable,start_link,
[{supervisor_cushion,start_link,
[indexer,5000,infinity,ns_port_server,
start_link,
[#Fun<ns_child_ports_sup.2.49698737>]]},
86400000]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,worker}]

[error_logger:info,2016-09-08T02:22:13.161-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================PROGRESS REPORT=========================
supervisor: {local,ale_dynamic_sup}
started: [{pid,<0.131.0>},
{name,'sink-indexer'},
{mfargs,
{ale_disk_sink,start_link,
['sink-indexer',
"/opt/couchbase/var/lib/couchbase/logs/indexer.log",
[{rotation,
[{compress,true},
{size,41943040},
{num_files,10},
{buffer_size_max,52428800}]}]]}},
{restart_type,permanent},
{shutdown,5000},
{child_type,worker}]

[ns_server:debug,2016-09-08T02:22:13.220-04:00,babysitter_of_ns_1@127.0.0.1:<0.128.0>:restartable:start_child:98]Started child process <0.129.0>
MFA: {supervisor_cushion,start_link,
[indexer,5000,infinity,ns_port_server,start_link,
[#Fun<ns_child_ports_sup.2.49698737>]]}
[error_logger:info,2016-09-08T02:22:13.220-04:00,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================PROGRESS REPORT=========================
supervisor: {local,ns_child_ports_sup}
started: [{pid,<0.128.0>},
{name,
{indexer,"/opt/couchbase/bin/goport",[],
[use_stdio,exit_status,stderr_to_stdout,stream,
{log,"indexer.log"},
{env,
[{"GOPORT_ARGS",
"["/opt/couchbase/bin/indexer","-vbuckets=1024","-cluster=127.0.0.1:8091","-adminPort=9100","-scanPort=9101","-httpPort=9102","-streamInitPort=9103","-streamCatchupPort=9104","-streamMaintPort=9105","-storageDir=/opt/couchbase/var/lib/couchbase/data/@2i","-diagDir=/opt/couchbase/var/lib/couchbase/crash","-nodeUUID=9cb6edda57a624f08193c894ef30c1ce","-storageMode=forestdb"]"},
{"GOTRACEBACK","crash"},
{"CBAUTH_REVRPC_URL",
"http://%40:67c09ea0f990c89ccbc3532fd6b7b362@127.0.0.1:8091/index"}]}]}},
{mfargs,
{restartable,start_link,
[{supervisor_cushion,start_link,
[indexer,5000,infinity,ns_port_server,
start_link,
[#Fun<ns_child_ports_sup.2.49698737>]]},
86400000]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,worker}]

[ns_server:info,2016-09-08T02:22:13.681-04:00,babysitter_of_ns_1@127.0.0.1:<0.76.0>:ns_port_server:log:210]memcached<0.76.0>: 2016-09-08T02:22:13.480478-04:00 NOTICE (bwecl) DCP (Producer) eq_dcpq:secidx:proj-bwecl-MAINT_STREAM_TOPIC_50:70:b9:ed:fe:b7:a:54-11994220998227306265/2 - (vb 344) Stream closing, sent until seqno 268958 remaining items 0, reason: The stream closed early because the conn was disconnected

And here is the initial log from indexer.log:

2016-09-08T02:21:57.362-04:00 [Info] ForestDBSlice::OpenSnapshot SliceId 0 IndexInstId 955245833351623398 Creating New Snapshot SnapshotInfo: seqnos: 110248364, 110248364, 0 committed:false
2016-09-08T02:21:57.362-04:00 [Info] StorageMgr::handleCreateSnapshot Added New Snapshot Index: 955245833351623398 PartitionId: 0 SliceId: 0 Crc64: 985532960051374772 (SnapshotInfo: seqnos: 110248364, 110248364, 0 committed:false) SnapCreateDur 52.804µs SnapOpenDur 287.405µs
[FDB ERR] Read error: read offset 199762582933037056 exceeds the file's current offset 121453547520 in a database file '/opt/couchbase/var/lib/couchbase/data/@2i/bwecl_bwidx_bwecl_sp_startTime___269554588572028ac529f7faceb54b18_955245833351623398_0.index/data.fdb.2'

[FDB ERR] Failed to read the B+Tree block (block id: 48770161848886, block address: 0x7f333da2c000)
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0x0 pc=0x7f3559c72b22]

runtime stack:
runtime.throw(0x1214980, 0x2a)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/panic.go:530 +0x90 fp=0x7f3555ff8208 sp=0x7f3555ff81f0
runtime.sigpanic()
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/sigpanic_unix.go:12 +0x5a fp=0x7f3555ff8258 sp=0x7f3555ff8208

goroutine 11138540 [syscall, 3 minutes, locked to thread]:
runtime.cgocall(0xc8a648, 0xc820195b38, 0x0)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.6/go/src/runtime/cgocall.go:123 +0x11b fp=0xc820195ae8 sp=0xc820195ab8
github.com/couchbase/indexing/secondary/fdb._Cfunc_fdb_compact_upto(0x7f3554415aa0, 0x7f3544175b00, 0x1a7b91b, 0x0)
??:0 +0x41 fp=0xc820195b38 sp=0xc820195ae8
github.com/couchbase/indexing/secondary/fdb.(*File).CompactUpto(0xc823aa9680, 0xc823e77400, 0x91, 0xc824a039a0, 0x0, 0x0)
/home/couchbase/jenkins/workspace/watson-unix/goproj/src/github.com/couchbase/indexing/secondary/fdb/file.go:113 +0x32d fp=0xc820195bc8 sp=0xc820195b38
github.com/couchbase/indexing/secondary/indexer.(*fdbSlice).Compact(0xc8220f3680, 0xecf644b27, 0xc83b4bd6ca, 0x1998ca0, 0x0, 0x0)

The core file would be useful for further debugging, especially to rule out MB-20185. Thanks.

Sundar,
What is the best way to get the core file to you, given that it is 10 GB?

Many core files compress very well. As an alternative, if the core file is still too large, could you also look for what we call the minidump file (a file with hexadecimal characters in its name, ending in ".dmp") and send that instead? Thanks.

You can upload the file to this location using the curl command shipped with Couchbase Server:
curl --upload-file FILE_NAME https://s3.amazonaws.com/customers.couchbase.com/YOUR_COMPANY_NAME/
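If the file has been split into pieces, the per-chunk uploads can be scripted. A minimal sketch (the chunk naming, the DRY_RUN guard, and the upload_chunks helper are illustrative assumptions, not part of the official instructions):

```shell
# upload_chunks: issue a curl upload for every file matching core_segments*.
# With DRY_RUN=1 (the default here) it only prints the commands, so the
# loop can be inspected before any data is actually sent.
upload_chunks() {
    url="https://s3.amazonaws.com/customers.couchbase.com/YOUR_COMPANY_NAME/"
    for chunk in core_segments*; do
        [ -e "$chunk" ] || continue          # skip if no chunks matched the glob
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "curl --upload-file $chunk $url"
        else
            curl --upload-file "$chunk" "$url"
        fi
    done
}
```

Running `DRY_RUN=1 upload_chunks` prints one curl command per chunk; setting DRY_RUN=0 performs the actual uploads.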

I used the following Linux "split" command to chop the core file into 500 MB chunks:
split -b 500M core.28696 core_segments

I have uploaded all 21 files (core_segmentsaa … core_segmentsau) to
https://s3.amazonaws.com/customers.couchbase.com/Broadsoft
(Please verify that all 21 files are there; if any are missing, reply and I will upload them.)

You can use the Linux "cat" command to join them back together.
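The whole split/rejoin round trip is worth verifying before deleting the original. A sketch (the split_and_verify helper and file names are illustrative; the actual core was core.28696):

```shell
# split_and_verify FILE: split FILE into 500 MB chunks, rejoin them with cat,
# and confirm the rejoined copy is byte-identical to the original.
# Files smaller than the chunk size simply produce a single chunk.
split_and_verify() {
    f=$1
    split -b 500M "$f" "${f}.part_"          # produces FILE.part_aa, FILE.part_ab, ...
    cat "${f}.part_"* > "${f}.joined"        # the shell expands the glob in sorted
                                             # order, so the byte order is preserved
    cmp -s "$f" "${f}.joined" && echo "OK: rejoined file matches original"
}
```

On the receiving side the equivalent of the `cat "${f}.part_"*` line is all that is needed; the `cmp` step just guards against a missing or truncated chunk.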