Couchbase 4.5 Community Version MemCached Crashes on OOM

Hi:

This is the second time I have hit this issue: memory usage climbs until the OS kills memcached. The database runs on Ubuntu 14.04 with 4 cores and 30 GB of memory. I allocated at most 10 GB of memory (Data RAM Quota on the Couchbase settings page) to each server node (3 nodes in the cluster). However, all of the memory is gradually consumed by memcached (16.8 GB of physical memory used) and /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp (10.8 GB of physical memory used).

It seems like the Data RAM Quota setting does not work at all!

Detailed system logs:

Nov  8 06:18:56 ip-172-31-31-154 kernel: [3910127.844724] ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844729] ntpd cpuset=/ mems_allowed=0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844731] CPU: 0 PID: 1952 Comm: ntpd Not tainted 3.13.0-48-generic #80-Ubuntu
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844733] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844734]  0000000000000000 ffff88078482f980 ffffffff81721506 ffff880741cfb000
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844737]  ffff88078482fa08 ffffffff8171bdc1 0000000000000000 0000000000000000
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844740]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844742] Call Trace:
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844748]  [<ffffffff81721506>] dump_stack+0x45/0x56
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844751]  [<ffffffff8171bdc1>] dump_header+0x7f/0x1f1
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844755]  [<ffffffff811529be>] oom_kill_process+0x1ce/0x330
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844759]  [<ffffffff812d7225>] ? security_capable_noaudit+0x15/0x20
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844761]  [<ffffffff811530f4>] out_of_memory+0x414/0x450
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844764]  [<ffffffff81159460>] __alloc_pages_nodemask+0xa60/0xb80
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844767]  [<ffffffff81197ad3>] alloc_pages_current+0xa3/0x160
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844770]  [<ffffffff8114f577>] __page_cache_alloc+0x97/0xc0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844772]  [<ffffffff81150f85>] filemap_fault+0x185/0x410
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844776]  [<ffffffff81175d8f>] __do_fault+0x6f/0x530
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844778]  [<ffffffff81179f32>] handle_mm_fault+0x482/0xf10
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844782]  [<ffffffff8107766b>] ? recalc_sigpending+0x1b/0x50
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844784]  [<ffffffff81077f72>] ? __set_task_blocked+0x32/0x70
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844787]  [<ffffffff8172d534>] __do_page_fault+0x184/0x560
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844791]  [<ffffffff8101e737>] ? __restore_xstate_sig+0x87/0x500
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844793]  [<ffffffff8107766b>] ? recalc_sigpending+0x1b/0x50
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844795]  [<ffffffff81077f72>] ? __set_task_blocked+0x32/0x70
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844797]  [<ffffffff8172d92a>] do_page_fault+0x1a/0x70
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844800]  [<ffffffff81729d68>] page_fault+0x28/0x30
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844801] Mem-Info:
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844802] Node 0 DMA per-cpu:
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844804] CPU    0: hi:    0, btch:   1 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844805] CPU    1: hi:    0, btch:   1 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844806] CPU    2: hi:    0, btch:   1 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844807] CPU    3: hi:    0, btch:   1 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844808] Node 0 DMA32 per-cpu:
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844810] CPU    0: hi:  186, btch:  31 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844811] CPU    1: hi:  186, btch:  31 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844812] CPU    2: hi:  186, btch:  31 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844813] CPU    3: hi:  186, btch:  31 usd:   0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844813] Node 0 Normal per-cpu:
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844815] CPU    0: hi:  186, btch:  31 usd:  49
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844816] CPU    1: hi:  186, btch:  31 usd:  69
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844817] CPU    2: hi:  186, btch:  31 usd:  92
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844818] CPU    3: hi:  186, btch:  31 usd:   1
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820] active_anon:7728207 inactive_anon:70 isolated_anon:0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820]  active_file:81 inactive_file:138 isolated_file:0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820]  unevictable:0 dirty:0 writeback:0 unstable:0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820]  free:47604 slab_reclaimable:3635 slab_unreclaimable:4837
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820]  mapped:15 shmem:91 pagetables:18142 bounce:0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844820]  free_cma:0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844823] Node 0 DMA free:15904kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844827] lowmem_reserve[]: 0 3744 30661 30661
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844829] Node 0 DMA32 free:115648kB min:8000kB low:10000kB high:12000kB active_anon:3713032kB inactive_anon:116kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3836772kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:136kB slab_reclaimable:500kB slab_unreclaimable:1204kB kernel_stack:168kB pagetables:4332kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7 all_unreclaimable? yes
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844833] lowmem_reserve[]: 0 0 26917 26917
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844835] Node 0 Normal free:58864kB min:57496kB low:71868kB high:86244kB active_anon:27199796kB inactive_anon:164kB active_file:320kB inactive_file:552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:28067840kB managed:27563532kB mlocked:0kB dirty:0kB writeback:0kB mapped:56kB shmem:228kB slab_reclaimable:14040kB slab_unreclaimable:18144kB kernel_stack:2912kB pagetables:68236kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1529 all_unreclaimable? yes
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844838] lowmem_reserve[]: 0 0 0 0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844840] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15904kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844848] Node 0 DMA32: 8454*4kB (UEM) 830*8kB (UEM) 230*16kB (UEM) 127*32kB (UEM) 64*64kB (UEM) 52*128kB (UEM) 44*256kB (UEM) 27*512kB (UEM) 23*1024kB (UEM) 2*2048kB (EM) 1*4096kB (M) = 115784kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844857] Node 0 Normal: 732*4kB (UEM) 233*8kB (UEM) 233*16kB (UEM) 336*32kB (UEM) 192*64kB (UEM) 83*128kB (UEM) 32*256kB (UEM) 14*512kB (EM) 1*1024kB (E) 0*2048kB 0*4096kB = 58568kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844866] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844867] 356 total pagecache pages
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844869] 0 pages in swap cache
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844872] Swap cache stats: add 0, delete 0, find 0/0
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844873] Free swap  = 0kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844874] Total swap = 0kB
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844875] 7999901 pages RAM
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844876] 0 pages HighMem/MovableOnly
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844876] 126077 pages reserved
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844877] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844882] [  438]     0   438     4868       60      14        0             0 upstart-udev-br
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844884] [  443]     0   443    12483      174      27        0         -1000 systemd-udevd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844885] [  565]     0   565     3814       66      12        0             0 upstart-socket-
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844887] [  613]     0   613     2555      573       8        0             0 dhclient
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844889] [  796]     0   796     3818       52      12        0             0 upstart-file-br
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844890] [  811]   102   811     9804       96      22        0             0 dbus-daemon
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844892] [  818]   101   818    65211     9563      46        0             0 rsyslogd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844893] [  855]     0   855    10862       89      26        0             0 systemd-logind
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844895] [  937]     0   937     3634       41      11        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844896] [  941]     0   941     3634       38      12        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844898] [  943]     0   943    31313     3957      64        0             0 salt-minion
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844900] [  946]     0   946     3634       40      12        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844901] [  947]     0   947     3634       40      12        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844903] [  949]     0   949    13851     2163      33        0             0 munin-node
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844904] [  950]     0   950     3634       42      12        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844906] [  982]     0   982    15341      169      34        0         -1000 sshd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844907] [  988]     0   988     5913       62      17        0             0 cron
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844909] [  993]     0   993     4784       42      13        0             0 atd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844911] [ 1013]     0  1013     1091       35       7        0             0 acpid
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844912] [ 1049]     0  1049     4819       69      15        0             0 irqbalance
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844914] [ 1566]     0  1566     3634       41      12        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844915] [ 1567]     0  1567     3196       36      13        0             0 getty
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844917] [ 1952]   107  1952     7862      151      19        0             0 ntpd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844918] [ 3600]   999  3600     1873       41       9        0             0 epmd
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844920] [ 3634]   999  3634   310467     9005      60        0             0 beam.smp
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844922] [ 3662]   999  3662  3388160  2984636    6436        0             0 beam.smp
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844924] [ 3693]   999  3693     1111       27       6        0             0 sh
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844925] [ 3695]   999  3695     1082       26       8        0             0 memsup
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844927] [ 3696]   999  3696     1082       21       8        0             0 cpu_sup
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844929] [ 3697]   999  3697     1865       25       8        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844930] [ 3698]   999  3698     1865       29       8        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844932] [ 3761]   999  3761    28934      890      19        0             0 saslauthd-port
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844933] [ 3783]   999  3783  5357710  4618402   10379        0             0 memcached
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844935] [ 3967]   999  3967     1864       22       9        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844937] [ 3968]   999  3968     1864       28       9        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844938] [ 3974]   999  3974     1865       29       8        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844940] [ 3980]   999  3980   211640     9271      57        0             0 beam.smp
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844941] [ 3981]   999  3981   395691    47930     196        0             0 beam.smp
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844943] [ 4032]   999  4032     1111       27       7        0             0 sh
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844944] [ 4034]   999  4034     1082       26       8        0             0 memsup
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844946] [ 4035]   999  4035     1082       21       8        0             0 cpu_sup
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844947] [ 4041]   999  4041     2842     1829      10        0             0 godu
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844949] [ 4042]   999  4042     1111       28       6        0             0 sh
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844951] [ 4043]   999  4043     1329      138       8        0             0 godu
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844952] [ 4054]   999  4054     2523      420      10        0             0 sigar_port
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844954] [ 4055]   999  4055     1460      159       8        0             0 goport
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844955] [ 4060]   999  4060    74025     1551      29        0             0 goxdcr
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844957] [ 4207]   999  4207   108005      768      39        0             0 moxi
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844958] [ 4215]   999  4215     1865       25       9        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844960] [ 4216]   999  4216     2389       34      10        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844961] [ 4229]   999  4229     1460      172       8        0             0 goport
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844963] [ 4233]   999  4233   110314      841      35        0             0 projector
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844964] [ 4259]   999  4259     1460      173       8        0             0 goport
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844966] [ 4263]   999  4263   132255     4105      46        0             0 cbq-engine
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844967] [16612]   999 16612     1865       29       9        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844969] [16613]   999 16613     1865       29       9        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844970] [11227]   999 11227     2389       34      10        0             0 inet_gethost
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844972] [25044]     0 25044    44596     5130      57        0             0 check-new-relea
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844974] [52714]   999 52714     1396      160       8        0             0 goport
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844975] [52718]   999 52718   233685    22052     106        0             0 indexer
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.844977] Out of memory: Kill process 3783 (memcached) score 589 or sacrifice child
Nov  8 06:18:57 ip-172-31-31-154 kernel: [3910127.850211] Killed process 3783 (memcached) total-vm:21430840kB, anon-rss:18473608kB, file-rss:0kB
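
For anyone reading the process table in the log above: the `total_vm` and `rss` columns are counted in pages, not kB. Here is a minimal Python sketch that converts `rss` to GiB and ranks processes by it. It assumes 4 kB pages, and the helper name `rss_by_process` and the sample rows (copied from the log with the syslog prefix stripped) are only for illustration.

```python
import re

# A few rows copied from the OOM report's process table.
# Columns: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
SAMPLE = """\
[ 3634]   999  3634   310467     9005      60        0             0 beam.smp
[ 3662]   999  3662  3388160  2984636    6436        0             0 beam.smp
[ 3783]   999  3783  5357710  4618402   10379        0             0 memcached
[ 3981]   999  3981   395691    47930     196        0             0 beam.smp
[52718]   999 52718   233685    22052     106        0             0 indexer
"""

PAGE_KB = 4  # assumption: 4 kB pages, the usual x86-64 default

# Matches one table row; also works when the syslog prefix is still present,
# because the kernel timestamp "[3910127.844933]" contains a dot and is skipped.
ROW = re.compile(
    r"\[\s*(\d+)\]\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+-?\d+\s+(\S+)"
)


def rss_by_process(text):
    """Return (pid, name, rss_gib) tuples sorted by resident memory, largest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            pid, _total_vm, rss_pages, name = m.groups()
            rss_gib = int(rss_pages) * PAGE_KB / 1024**2  # pages -> kB -> GiB
            rows.append((int(pid), name, rss_gib))
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for pid, name, gib in rss_by_process(SAMPLE):
        print(f"{pid:>6} {name:<12} {gib:6.2f} GiB")
```

With these rows, memcached (pid 3783) comes out at about 17.6 GiB, which agrees with the `anon-rss:18473608kB` reported on the "Killed process" line, and the largest beam.smp (pid 3662) at about 11.4 GiB. Together they account for nearly all of the RAM on the box.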

I suspect the memory issue you are seeing with beam.smp is a known defect: MB-20521. With regard to the memory quota and memcached's memory usage, would it be possible to open a defect and upload the [logs](www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchbase+Technical+Support+Team) so we can investigate the problem further?