Couchstore tests failing with SEGV_MAPERR

I am building the Couchbase master source on a powerpc64le platform and am getting a few segfaults, as shown below:

     89 - couchstore-mapreduce-builtin-test (SEGFAULT)
     90 - couchstore-mapreduce-map-test (SEGFAULT)        
     91 - couchstore-mapreduce-reduce-test (SEGFAULT)        
     92 - couchstore-testapp (OTHER_FAULT)

While debugging one of the failures (couchstore-mapreduce-builtin-test), I found that one of the variables does not show the expected value. I compared the gdb session with one from an x86 setup.

The debug log is pasted below for reference:

(gdb) b couchstore/src/views/mapreduce/mapreduce.cc:239
Breakpoint 1 at 0x100095d0: file /root/meghali/couchbase_master/couchstore/src/views/mapreduce/mapreduce.cc, line 239.
(gdb) r
Starting program: /root/meghali/couchbase_master/build/couchstore/couchstore_mapreduce-builtin-test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Running mapreduce builtin tests
[New Thread 0x3fffb5dff0b0 (LWP 32263)]
[New Thread 0x3fffb55ff0b0 (LWP 32264)]
[New Thread 0x3fffb4dff0b0 (LWP 32265)]
[New Thread 0x3fffb43ff0b0 (LWP 32266)]
Thread 1 "couchstore_mapr" hit Breakpoint 1, createJsContext () at /root/meghali/couchbase_master/couchstore/src/views/mapreduce/mapreduce.cc:239
239         Handle<Context> context = Context::New(isolate, NULL, global);
(gdb) p isolate
$1 = (v8::Isolate *) 0x3fffb4460000
(gdb) p global
$2 = <optimized out>
(gdb) n
Thread 1 "couchstore_mapr" received signal SIGSEGV, Segmentation fault.
0x00003fffb74f9448 in v8::NewContext(v8::Isolate*, v8::ExtensionConfiguration*, v8::MaybeLocal<v8::ObjectTemplate>, v8::MaybeLocal<v8::Value>, unsigned long, v8::DeserializeInternalFieldsCallback) () from /usr/local/lib/libv8.so
(gdb) q

Please note the “optimized out” value shown for the variable “global” above. Has anyone seen a similar issue before?
Any pointers on how to debug or root-cause this would be useful.
Thanks,
Meghali

You could try the Debug build type, which includes more debug information and lowers the optimization level, like this:

make EXTRA_CMAKE_OPTIONS='-DCMAKE_BUILD_TYPE=Debug'

Thanks @avsej for your response!
I rebuilt my code with this flag and re-ran the tests using “ctest”, but the same error is still seen.

Are the variables in the core dump still optimized out?

No, now it shows valid values, as below:

Thread 1 "couchstore_mapr" hit Breakpoint 1, createJsContext ()
    at /root/meghali/couchbase_master/couchstore/src/views/mapreduce/mapreduce.cc:241
241         Handle<Context> context = Context::New(isolate, NULL, global);
(gdb) p isolate
$1 = (v8::Isolate *) 0x3fffb4460000
(gdb) p global
$2 = {val_ = 0x1004dcf8}
(gdb) n
Thread 1 "couchstore_mapr" received signal SIGSEGV, Segmentation fault.

However, the segmentation fault is still there.

What version of v8 are you using? Couchbase is using 5.9 from this branch: https://github.com/couchbasedeps/v8/tree/5.9.223

You can find the list of build dependencies here: https://github.com/couchbase/tlm/blob/master/deps/manifest.cmake, and the exact source versions here: https://github.com/couchbase/tlm/blob/master/deps/packages/CMakeLists.txt
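
To compare a locally built V8 against the version the build expects, the pinned entry can be read out of a checked-out copy of the manifest. This is only a minimal sketch: it assumes each dependency is declared on one line of the form `DECLARE_DEP (name VERSION <version> ...)`, which should be checked against the actual file, and the function name `pinned_version` is made up for illustration.

```shell
# Print the VERSION field declared for one package in a tlm-style manifest.
# Assumption: entries look like  DECLARE_DEP (name VERSION <version> PLATFORMS ...)
pinned_version() {
    manifest=$1
    pkg=$2
    awk -v pkg="$pkg" '
        /^[ \t]*#/ { next }            # ignore commented-out entries
        {
            gsub(/[()]/, " ")          # drop parentheses; awk re-splits the fields
            if ($1 == "DECLARE_DEP" && $2 == pkg)
                for (i = 3; i < NF; i++)
                    if ($i == "VERSION") { print $(i + 1); exit }
        }
    ' "$manifest"
}
```

For example, `pinned_version tlm/deps/manifest.cmake v8` would print the version string pinned for v8, or nothing if no matching entry is found.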

As of now I am using the latest version, i.e. 6.2.0.
I will try downgrading to version 5.9 and see if that helps.

Hi @avsej, that really helped!
I downgraded V8 from 6.2.0 to 5.9, rebuilt it from the couchbasedeps/v8 repo, and now the four couchstore tests pass for me.

Start 89: couchstore-mapreduce-builtin-test
89/308 Test #89: couchstore-mapreduce-builtin-test … Passed 0.05 sec
Start 90: couchstore-mapreduce-map-test
90/308 Test #90: couchstore-mapreduce-map-test … Passed 4.02 sec
Start 91: couchstore-mapreduce-reduce-test
91/308 Test #91: couchstore-mapreduce-reduce-test … Passed 4.02 sec
Start 92: couchstore-testapp
92/308 Test #92: couchstore-testapp … Passed 0.57 sec

Thanks a lot for all your inputs and suggestions.

After this, I am now getting a similar SEGV_MAPERR error for the “couchdb” tests. I thought the root cause might be the same and that this fix would resolve them too, but that is not the case for me. The error log for one of the failing couchdb tests is below:

Start 1: couchdb-couch_set_view-02-old-index-cleanup
1: Test command: /usr/bin/python "/root/meghali/couchbase_master/couchdb/test/etap/runtest.py" "-c" "/root/meghali/couchbase_master/build/couchstore" "-p" "/root/meghali/couchbase_master/build/couchdb/src" "-m" "couch_set_view/test" "-e" "/usr/bin/escript" "-t" "/root/meghali/couchbase_master/couchdb/src/couch_set_view/test/02-old-index-cleanup.t" "--verbose"
1: Test timeout computed to be: 9.99988e+06
1: ERL_LIBS="/root/meghali/couchbase_master/build/couchdb/src"
1: ERL_FLAGS="-pa /root/meghali/couchbase_master/build/couchdb/src/../test/etap /root/meghali/couchbase_master/build/couchdb/src/couch_set_view/test /root/meghali/couchbase_master/build/couchdb/src/couch_view_parser /root/meghali/couchbase_master/build/couchdb/src/ejson /root/meghali/couchbase_master/build/couchdb/src/mochiweb /root/meghali/couchbase_master/build/couchdb/src/mapreduce /root/meghali/couchbase_master/build/couchdb/src/couch_index_merger /root/meghali/couchbase_master/build/couchdb/src/couchdb /root/meghali/couchbase_master/build/couchdb/src/erlang-oauth /root/meghali/couchbase_master/build/couchdb/src/etap /root/meghali/couchbase_master/build/couchdb/src/snappy /root/meghali/couchbase_master/build/couchdb/src/lhttpc /root/meghali/couchbase_master/build/couchdb/src/CMakeFiles /root/meghali/couchbase_master/build/couchdb/src/couch_dcp /root/meghali/couchbase_master/build/couchdb/src/couch_set_view"
1: # Current time local 2017-11-29 12:44:37
1: # Using etap version "0.3.4"
1: 1…73
1: Apache CouchDB 0.0.0 (LogLevel=info) is starting.
1: [info] [<0.87.0>] Database _replicator, design document _design/_replicator updated (new revision: 0-, deleted: false)
1: [info] [<0.119.0>] Database _users, design document _design/_auth updated (new revision: 0-, deleted: false)
1: Apache CouchDB has started. Time to relax.
1: [info] [<0.2.0>] Apache CouchDB has started on http://127.0.0.1:42516/
1: [info] [<0.59.0>] Deleting database couch_test_set_index_cleanup/0
1: [info] [<0.59.0>] Deleting file /root/meghali/couchbase_master/build/couchdb/tmp/lib/couch_test_set_index_cleanup/couch_test_set_index_cleanup/0.couch.1
1: [info] [<0.59.0>] Deleting couch file "/root/meghali/couchbase_master/build/couchdb/tmp/lib/couch_test_set_index_cleanup/couch_test_set_index_cleanup/0.couch.1" with renaming it to "/root/meghali/couchbase_master/build/couchdb/tmp/lib/couch_test_set_index_cleanup/.delete/f51057c0a4314b83bf8be72c2f6ea6c0"


1: Received signal 11 SEGV_MAPERR fffffffffffffff8
1:
1: ==== C stack trace ===============================
1:
1: [0x3fffa4ad5be4]
1: [0x3fffa4ad68f0]
1: [0x3fffacb704d8]
1: [0x3fffa453ba18]
1: [0x3fffa453bac8]
1: [0x3fffa4b0c36c]
1: [0x3fffa5acec98]


1: [0x3fffac978070]
1: [0x3fffac8c3a30]
1: [end of stack trace]
1:
1: 0/73 tests passed
1/1 Test #1: couchdb-couch_set_view-02-old-index-cleanup …***Failed 0.62 sec

Any pointers around this would really help.
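
Since the trace above only contains raw addresses, a first step could be to collect them from a saved ctest log so they can be looked up later. The following is a rough sketch, not part of the Couchbase tooling; the function name `stack_addrs` and the idea of saving the ctest output to a file are assumptions made for illustration.

```shell
# Print the frame addresses found between the "C stack trace" header
# and the "[end of stack trace]" marker in a saved ctest log.
stack_addrs() {
    sed -n '/==== C stack trace/,/\[end of stack trace\]/{
        s/.*\[\(0x[0-9a-fA-F]*\)\].*/\1/p
    }' "$1"
}
```

Run as `stack_addrs ctest.log` (with `ctest.log` being whatever file the output was saved to), this prints one address per line. Resolving them needs the same process image, for example a core dump loaded in gdb, where `info symbol ADDRESS` reports the enclosing symbol if one is known.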

I recommend rebuilding all dependencies, like this:

#!/bin/sh -ex

PACKAGES="
lz4
snappy
python-snappy
rocksdb
v8
zlib
boost
breakpad
curl
erlang
flatbuffers
flex
icu4c
jemalloc
json
libcouchbase
libcxx
libevent
libsqlite3
libuv
numactl
"

TLM=$PWD/tlm
CACHE=$HOME/.cbdepscache

mkdir -p "$CACHE"
for pkg in ${PACKAGES}; do
  # Configure and build each package in a fresh build directory
  cd "${TLM}/deps/packages"
  rm -rf "build-${pkg}"
  mkdir "build-${pkg}"
  cd "build-${pkg}"
  cmake .. -DPACKAGE="${pkg}"
  cmake --build . --target "${pkg}"
  # Copy the archive and its checksum into the cache (patterns quoted so the shell does not expand them)
  archive=$(find "${TLM}/deps/packages/build-${pkg}/deps/${pkg}" -name "${pkg}*.tgz")
  checksum=$(find "${TLM}/deps/packages/build-${pkg}/deps/${pkg}" -name "${pkg}*.md5")
  cp ${archive} "$CACHE/"
  cp ${checksum} "$CACHE/$(basename ${archive}).md5"
done

Also, when I was using FreeBSD, I noticed that Couchbase does not work well with the system allocator (which is also jemalloc), so I had to force it to use the Couchbase version like this (PWD is where you checked out the repo projects):

export LD_LIBRARY_PATH=$PWD/install/lib

Additionally, some dependencies might be frozen just for FreeBSD. Dependencies for platforms without official support are listed here:

If you rebuild more recent versions with the script above, you might want to update this file too.

Alright, I will try and understand this build mechanism. Thanks.

I tried rebuilding the dependencies and the tests still fail. This time there is an Erlang-related error that I am unable to understand or debug.

ERL_FLAGS="-pa /root/meghali/couchbase_master/build/couchdb/src/../test/etap  /root/meghali/couchbase_master/build/couchdb/src/couch_set_view/test  /root/meghali/couchbase_master/build/couchdb/src/couchdb  /root/meghali/couchbase_master/build/couchdb/src/lhttpc  /root/meghali/couchbase_master/build/couchdb/src/CMakeFiles  /root/meghali/couchbase_master/build/couchdb/src/couch_set_view"
escript: exception error: undefined function etap:plan/1
  in function  erl_eval:do_apply/6 (erl_eval.erl, line 657)
  in call from escript:eval_exprs/5 (escript.erl, line 865)
  in call from erl_eval:local_func/5 (erl_eval.erl, line 544)
  in call from escript:interpret/4 (escript.erl, line 781)
  in call from escript:start/1 (escript.erl, line 276)
  in call from init:start_it/1
  in call from init:start_em/1

Any pointers or suggestions would really help.
Thanks,
Meghali
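
An `undefined function etap:plan/1` error usually means the `etap` module could not be loaded from the code path, and the ERL_FLAGS in this run no longer include the `build/couchdb/src/etap` directory that appeared in the earlier run. One way to check is to see which of the `-pa` directories actually contain a compiled `etap.beam`. The helper below is a minimal sketch with a made-up name (`dirs_with_beam`); it assumes the directories contain no spaces or glob characters.

```shell
# List the directories from an ERL_FLAGS-style string that contain <module>.beam.
# Prints nothing when the module is not found in any of them.
dirs_with_beam() {
    module=$1
    flags=$2
    for token in $flags; do          # intentional word splitting on whitespace
        case $token in
            -*) continue ;;          # skip switches such as -pa
        esac
        if [ -f "$token/$module.beam" ]; then
            printf '%s\n' "$token"
        fi
    done
}
```

Calling `dirs_with_beam etap "$ERL_FLAGS"` with the value printed by the test runner shows whether any directory on the path provides the module. Empty output would suggest that etap was not built or that its directory is missing from the path.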

Continuing with the debugging:

We have observed a couple of test failures in the CouchDB couch_set_view test module with the similar error “Received signal 11 SEGV_MAPERR”. On debugging further, we found that all the failed tests crash at the update_ddoc function call; this is the common crash point for all of them.
Also, while running the couch_set_view tests, we observed that the couchdb-couch_set_view-02-old-index-cleanup test hangs partway through, inside the function create_ddoc_copy. The test failures are also random.

With this, we have a few questions:

  1. Are the couchdb tests interdependent, and does the order in which they run have any impact?

  2. Does running the couchdb tests need any specific configuration?

  3. Should the ‘ddoc’ copy be created by the couchdb-couch_set_view-02-old-index-cleanup test and later updated by other tests?

  4. We have built couchdb with some of the modules disabled, because the complete test run hangs during execution. Below is the list of enabled/disabled modules (the ones commented out with # are disabled).
    ADD_SUBDIRECTORY(couchdb)
    ADD_SUBDIRECTORY(lhttpc)
    ADD_SUBDIRECTORY(couch_set_view)
    ADD_SUBDIRECTORY(couch_index_merger)
    #ADD_SUBDIRECTORY(couch_view_parser)
    #ADD_SUBDIRECTORY(couch_dcp)
    ADD_SUBDIRECTORY(mapreduce)
    #ADD_SUBDIRECTORY(snappy)
    #ADD_SUBDIRECTORY(erlang-oauth)
    ADD_SUBDIRECTORY(ejson)
    #ADD_SUBDIRECTORY(mochiweb)
    ADD_SUBDIRECTORY(etap)
    With this set of modules, all of the test suites for couch_set_view currently fail for me.

  5. We have also observed that when running the tests individually, these tests sometimes pass and sometimes fail. Why is the behavior unstable?
    couchdb-couch_set_view-17-unindexable-partitions
    couchdb-couch_set_view-23-replica-group-missing
    couchdb-couch_set_view-30-query-fdleaks
    couchdb-couch_set_view-33-dcp-duplicates
    couchdb-couch_set_view-34-truncate
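
To put a number on how unstable a given test is, it can be run repeatedly and the failures counted. This is a generic sketch, not something provided by the Couchbase build; the function name `count_failures` is hypothetical, and it simply treats any non-zero exit status as a failure.

```shell
# Run a command N times and print how many of the runs failed.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded here; only the exit status is counted
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    printf '%s\n' "$failures"
}
```

From the build directory, something like `count_failures 20 ctest -R couchdb-couch_set_view-34-truncate` would report how many of 20 runs failed. Note that `-R` takes a regular expression, so it selects every test whose name matches.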

I have been using the method described above to build the dependencies; however, the ‘zlib’ package build is giving me errors:

$pwd
/root/couchbase/tlm/deps/packages

$git remote -v
couchbase git://github.com/couchbase/tlm (fetch)
couchbase git://github.com/couchbase/tlm (push)
$cmake --version
cmake version 3.14.20190516-g82c6ec
CMake suite maintained and supported by Kitware (kitware.com/cmake).

$cd tlm/deps/packages && mkdir build-zlib && cd build-zlib && cmake .. -DPACKAGE=zlib

$ cmake --build . --target zlib

make: *** No rule to make target 'zlib'. Stop.

$ pwd
/root/couchbase/tlm/deps/packages/build-zlib

$ ls
CMakeCache.txt CMakeFiles Makefile cmake_install.cmake tlm

Is there any reason why make cannot build the target here?
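
When make reports that a target does not exist, it can help to confirm what the configure step actually recorded and which targets were generated. The helper below is a small sketch with a made-up name (`cache_value`); it relies only on the `NAME:TYPE=VALUE` layout of entries in CMakeCache.txt.

```shell
# Print the value of one entry from a CMakeCache.txt file.
cache_value() {
    cache=$1
    var=$2
    # Cache entries are written as NAME:TYPE=VALUE; strip everything up to "="
    sed -n "s/^${var}:[^=]*=//p" "$cache"
}
```

In the build-zlib directory, `cache_value CMakeCache.txt PACKAGE` shows whether `-DPACKAGE=zlib` reached the cache, and with the Makefile generator `cmake --build . --target help` lists the targets that were generated. If `zlib` is not among them, the checked-out tlm revision may not define that package through this mechanism.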